Since JRockit has no JDK 7 release, we might want to switch an existing WebLogic environment to the Oracle JDK. Here we use the Garbage-First (G1) collector, which is fully supported in Oracle JDK 7 update 4 and later releases.

When using the G1 collector, the heap is partitioned into a set of equal-sized heap regions, each a contiguous range of virtual memory. Certain region sets are assigned the same roles (eden, survivor, old) as in the older collectors, but there is not a fixed size for them. This provides greater flexibility in memory usage. When performing garbage collections, G1 operates in a manner similar to the CMS collector. G1 performs a concurrent global marking phase to determine the liveness of objects throughout the heap. After the mark phase completes, G1 knows which regions are mostly empty. It collects in these regions first, which usually yields a large amount of free space. This is why this method of garbage collection is called Garbage-First. As the name suggests, G1 concentrates its collection and compaction activity on the areas of the heap that are likely to be full of reclaimable objects, that is, garbage. G1 uses a pause prediction model to meet a user-defined pause time target and selects the number of regions to collect based on the specified pause time target.

The regions identified by G1 as ripe for reclamation are garbage collected using evacuation. G1 copies objects from one or more regions of the heap to a single region on the heap, and in the process both compacts and frees up memory. This evacuation is performed in parallel on multiprocessor machines to decrease pause times and increase throughput. Thus, with each garbage collection, G1 continuously works to reduce fragmentation while staying within the user-defined pause time target. Neither of the older collectors can do this: the CMS (Concurrent Mark Sweep) collector does not compact, and the ParallelOld collector performs only whole-heap compaction, which results in considerable pause times.

It is important to note that G1 is not a real-time collector. It meets the pause time target with high probability, but not with absolute certainty. Based on data from previous collections, G1 estimates how many regions can be collected within the user-specified target time. The collector thus has a reasonably accurate model of the cost of collecting the regions, and it uses this model to determine which and how many regions to collect while staying within the pause time target.
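The selection idea described above can be illustrated with a toy model. This is only a sketch and not the actual G1 algorithm: the function name select_regions, the input format and all numbers are made up for illustration. Regions with the most reclaimable space are considered first and are added to the collection set as long as their predicted cost still fits the pause budget.

```shell
#!/bin/sh
# Toy model of "garbage first" region selection (NOT the real G1 algorithm).
# Input on stdin, one region per line:
#   <region-id> <reclaimable-kB> <predicted-cost-ms>
# The helper name, the input format and the numbers are hypothetical.
select_regions() {
    budget_ms="$1"
    # Regions with the most reclaimable space come first ...
    sort -k2,2nr |
    # ... and are picked while the predicted cost still fits the pause budget.
    awk -v budget="$budget_ms" '
        used + $3 <= budget { used += $3; printf "%s%s", sep, $1; sep = " " }
        END { print "" }'
}

# Example: four regions and a 200 ms pause target.
printf '%s\n' 'r1 1800 90' 'r2 200 40' 'r3 1500 80' 'r4 900 60' | select_regions 200
```

With these example numbers the sketch prints "r1 r3": the two regions with the most garbage cost 170 ms together, and neither of the remaining regions fits into the remaining 30 ms.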

To switch a WebLogic installation to another JDK, we can adjust the setDomainEnv file by setting SUN_JAVA_HOME, JAVA_VENDOR and USER_MEM_ARGS, for example,

#!/bin/sh

...

WL_HOME="/home/weblogic/weblogic12.1.1/installation/wlserver_12.1"
export WL_HOME

BEA_JAVA_HOME="/home/weblogic/jrockit-jdk1.6.0_29-R28.2.2-4.1.0"
export BEA_JAVA_HOME

SUN_JAVA_HOME="/home/weblogic/jdk1.7.0_11"
export SUN_JAVA_HOME

JAVA_VENDOR="Sun"
export JAVA_VENDOR

if [ "${JAVA_VENDOR}" = "Oracle" ] ; then
	JAVA_HOME="${BEA_JAVA_HOME}"
	export JAVA_HOME
else
	if [ "${JAVA_VENDOR}" = "Sun" ] ; then
		JAVA_HOME="${SUN_JAVA_HOME}"
		export JAVA_HOME
	else
		JAVA_VENDOR="Oracle"
		export JAVA_VENDOR
		JAVA_HOME="/home/weblogic/jrockit-jdk1.6.0_29-R28.2.2-4.1.0"
		export JAVA_HOME
	fi
fi

...

if [ "${SERVER_NAME}" = "" ] ; then
	SERVER_NAME="AdminServer"
	export SERVER_NAME
fi

if [ "${SERVER_NAME}" = "AdminServer" ] ; then
	if [ "${JAVA_VENDOR}" = "Sun" ] ; then
		# Memory arguments for JDK
		USER_MEM_ARGS="-server -Xms512m -Xmx512m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:NewRatio=2 -XX:SurvivorRatio=128 -XX:MaxTenuringThreshold=0 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:G1HeapRegionSize=2048k -XX:+UseTLAB -XX:LargePageSizeInBytes=2048k -XX:+UseLargePages"
		export USER_MEM_ARGS
	fi

	if [ "${JAVA_VENDOR}" = "Oracle" ] ; then
		# JMX configuration for JRockit
		JMX_OPTIONS="-Djava.rmi.server.hostname=192.168.1.150 -Xmanagement:ssl=false,authenticate=false,port=7091,autodiscovery=true"
		export JMX_OPTIONS
		# Memory arguments for JRockit
		USER_MEM_ARGS="-jrockit -Xms512m -Xmx512m -Xns128m -XXkeepAreaRatio:25 -Xgc:pausetime -XpauseTarget:200ms -XX:+UseCallProfiling -XX:+UseLargePagesForHeap ${JMX_OPTIONS}"
		export USER_MEM_ARGS
	fi
fi
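The vendor switch in the script above boils down to a mapping from JAVA_VENDOR to a JDK directory. As a minimal sketch, the same mapping can be written as a case statement; the function name resolve_java_home is hypothetical and not part of setDomainEnv, and the fallback to JRockit mirrors the else branch of the original script.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the JAVA_VENDOR switch in setDomainEnv.
# It expects SUN_JAVA_HOME and BEA_JAVA_HOME to be set by the caller.
resolve_java_home() {
    case "$1" in
        Sun)    printf '%s\n' "${SUN_JAVA_HOME}" ;;   # Oracle (HotSpot) JDK
        Oracle) printf '%s\n' "${BEA_JAVA_HOME}" ;;   # JRockit
        *)      printf '%s\n' "${BEA_JAVA_HOME}" ;;   # unknown vendor: fall back to JRockit
    esac
}

# Example with the paths used in this post.
SUN_JAVA_HOME="/home/weblogic/jdk1.7.0_11"
BEA_JAVA_HOME="/home/weblogic/jrockit-jdk1.6.0_29-R28.2.2-4.1.0"
JAVA_HOME=$(resolve_java_home "Sun")
echo "JAVA_HOME=${JAVA_HOME}"
```

Running "${JAVA_HOME}/bin/java" -version afterwards is a quick way to confirm that the expected JVM is picked up before starting the server.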

To prevent the following error

Exception in thread "[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'"
Exception in thread "Timer-1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Timer-1"
***************************************************************************
The WebLogic Server encountered a critical failure
Reason: PermGen space
***************************************************************************
<Jan 30, 2013 4:24:14 PM CET> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to FORCE_SHUTTING_DOWN.>
Exception in thread "main"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"

set the permanent generation size (-XX:PermSize and -XX:MaxPermSize) to values that are large enough. In this example 256 MB was enough to start WebLogic; larger deployments may need more. To use the node manager to start and stop servers, we also have to adjust nodemanager.properties and set the Java home properties accordingly, for example,

...
javaHome=/home/weblogic/jdk1.7.0_11
...
JavaHome=/home/weblogic/jdk1.7.0_11/jre
...
StartScriptEnabled=true
...
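A typo in one of these paths is easy to miss, so it can help to check them before restarting the node manager. The following is a small sketch under a few assumptions: the function names are made up, and the properties are assumed to be written as key=value without spaces around the equals sign, as in the fragment above.

```shell
#!/bin/sh
# Hypothetical helpers to sanity-check the Java home entries in
# nodemanager.properties. Assumes "key=value" lines without extra whitespace.

# Print the value of every javaHome / JavaHome line (file argument or stdin).
java_home_entries() {
    sed -n 's/^[jJ]avaHome=//p' "$@"
}

# Report for each entry whether it contains an executable bin/java.
check_java_homes() {
    java_home_entries "$@" | while IFS= read -r home; do
        if [ -x "${home}/bin/java" ]; then
            echo "ok      ${home}"
        else
            echo "missing ${home}"
        fi
    done
}

# Example: list the entries of the fragment shown above.
printf '%s\n' \
    'javaHome=/home/weblogic/jdk1.7.0_11' \
    'JavaHome=/home/weblogic/jdk1.7.0_11/jre' \
    'StartScriptEnabled=true' | java_home_entries
```

Calling check_java_homes with the path of the actual nodemanager.properties file then reports an "ok" or "missing" line for each entry.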

To set up an environment that has an application that uses Coherence deployed on WebLogic, we create a WebLogic managed server that has the following settings

<server>
	<name>security-server</name>
    <ssl>
		<enabled>false</enabled>
    </ssl>
    <machine>machine1</machine>
    <listen-port>8001</listen-port>
    <cluster xsi:nil="true"></cluster>
    <web-server>
		<web-server-log>
			<number-of-files-limited>false</number-of-files-limited>
		</web-server-log>
    </web-server>
    <listen-address>middleware-magic.com</listen-address>
    <server-start>
		<java-vendor>Sun</java-vendor>
		<java-home>/home/weblogic/jdk1.7.0_11</java-home>
		<arguments>-server -Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:NewRatio=2 -XX:SurvivorRatio=128 -XX:MaxTenuringThreshold=0 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:G1HeapRegionSize=2048k -XX:+UseTLAB -XX:LargePageSizeInBytes=2048k -XX:+UseLargePages -Dtangosol.coherence.mode=prod -Dtangosol.coherence.distributed.localstorage=false</arguments>
    </server-start>
</server>

and a Coherence cache server that has the following settings

<coherence-server>
    <name>coherence-server1</name>
    <machine>machine1</machine>
    <coherence-cluster-system-resource xsi:nil="true"></coherence-cluster-system-resource>
    <unicast-listen-address>localhost</unicast-listen-address>
    <unicast-listen-port>8088</unicast-listen-port>
    <unicast-port-auto-adjust>true</unicast-port-auto-adjust>
    <coherence-server-start>
		<java-vendor>Sun</java-vendor>
		<java-home>/home/weblogic/jdk1.7.0_11</java-home>
		<class-path>/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar:/home/weblogic/weblogic12.1.1/installation/modules/features/weblogic.server.modules.coherence.server_12.1.1.0.jar:/home/weblogic/weblogic12.1.1/configuration/applications/base_domain/security/test.jar</class-path>
		<arguments>-server -Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:NewRatio=2 -XX:SurvivorRatio=128 -XX:MaxTenuringThreshold=0 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:G1HeapRegionSize=2048k -XX:+UseTLAB -XX:LargePageSizeInBytes=2048k -XX:+UseLargePages -Dtangosol.coherence.mode=prod -Dtangosol.coherence.cacheconfig=security-cache-config.xml -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true</arguments>
    </coherence-server-start>
</coherence-server>

The JVM arguments used above have the following meaning:

  • -server – selects the HotSpot Server VM, which uses the server JIT compiler.
  • -Xms – initial heap size.
  • -Xmx – maximum heap size.
  • -XX:PermSize and -XX:MaxPermSize – size of the permanent generation.
  • -XX:NewRatio=N – sets the young generation to heap size / (1 + N).
  • -XX:SurvivorRatio – ratio of eden/survivor space size.
  • -XX:MaxTenuringThreshold – sets the maximum number of young collections an object can survive before it is promoted to the old generation. With the value 0 used above, objects that survive a collection of the eden space are promoted immediately; we assume that these are objects related to the Coherence cache, which will live for a long time.
  • -XX:+UseG1GC – select the G1 collector.
  • -XX:MaxGCPauseMillis – sets the maximum pause time goal.
  • -XX:InitiatingHeapOccupancyPercent – percentage of the (entire) heap occupancy to start a concurrent GC cycle. It is used by the G1 collector to trigger a concurrent GC cycle based on the occupancy of the entire heap, not just one of the generations.
  • -XX:G1HeapRegionSize – sets the size of the uniformly sized regions. We set this equal to the large page size.
  • -XX:+UseTLAB – enables thread-local object allocation. More information on thread local allocation can be found in the ‘Compaction and Thread Local Area’ section of the Tune the JVM that runs Coherence post.
  • -XX:LargePageSizeInBytes – sets the large page size used for the Java heap. We set this equal to the operating system parameter Hugepagesize, which in our case is 2048 kB.
  • -XX:+UseLargePages – use large page memory. The steps to configure large pages in the operating system can be found in the ‘Call profiling and large pages’ section of the Tune the JVM that runs Coherence post.
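
To get a feeling for what the values in the list above amount to, the formulas can be worked through with shell arithmetic. This is a rough sketch only: the helper names are invented, the results are rounded down, and the generation formulas are the classical ones, while G1 sizes its generations in whole regions and adapts them at run time, so the actual sizes will differ.

```shell
#!/bin/sh
# Rough sizing arithmetic for the flags above (integer maths, rounded down).
# The helper names are made up for this sketch.

# Young generation in MB for a given heap (MB) and -XX:NewRatio value.
young_mb() { echo $(( $1 / (1 + $2) )); }

# One survivor space in kB for a given young size (MB) and -XX:SurvivorRatio:
# young = eden + 2 * survivor and eden = ratio * survivor.
survivor_kb() { echo $(( $1 * 1024 / ($2 + 2) )); }

# Number of G1 regions for a given heap (MB) and region size (kB).
region_count() { echo $(( $1 * 1024 / $2 )); }

# Huge page size in kB, read from a /proc/meminfo style file.
hugepage_kb() { awk '/^Hugepagesize:/ { print $2 }' "${1:-/proc/meminfo}"; }

# Values from the arguments above: -Xmx1024m, NewRatio=2, SurvivorRatio=128,
# G1HeapRegionSize=2048k.
young=$(young_mb 1024 2)
echo "young generation : ${young} MB"
echo "survivor space   : $(survivor_kb "$young" 128) kB"
echo "G1 regions       : $(region_count 1024 2048)"
if [ -r /proc/meminfo ]; then
    echo "Hugepagesize     : $(hugepage_kb) kB"
fi
```

For a 1024 MB heap this gives a young generation of 341 MB, survivor spaces of 2686 kB each and 512 regions of 2048 kB. The last line, printed only where /proc/meminfo exists, makes it easy to compare the operating system's huge page size with the values passed to -XX:G1HeapRegionSize and -XX:LargePageSizeInBytes.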

Testing

To test the Coherence part of the set-up we will perform a load test, by using the following scripts

#!/bin/sh

# coherence options
#JMX_OPTIONS="-Dcom.sun.management.jmxremote.port=7091 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
#JMX_OPTIONS="-Djava.rmi.server.hostname=192.168.1.150 -Xmanagement:ssl=false,authenticate=false,port=7091,autodiscovery=true"
COHERENCE_MANAGEMENT_OPTIONS="-Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true"
COHERENCE_OPTIONS="-Dtangosol.coherence.mode=prod -Dtangosol.coherence.cacheconfig=security-cache-config.xml ${COHERENCE_MANAGEMENT_OPTIONS} ${JMX_OPTIONS}"
export COHERENCE_OPTIONS

JHICCUP_HOME="/home/weblogic/temp/jHiccup"
export JHICCUP_HOME

JAVA_HOME="/home/weblogic/jdk1.7.0_11"
#JAVA_HOME="/home/weblogic/jrockit-jdk1.6.0_29-R28.2.2-4.1.0"
export JAVA_HOME

MEM_ARGS="-server -Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:NewRatio=2 -XX:SurvivorRatio=128 -XX:MaxTenuringThreshold=0 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:G1HeapRegionSize=2048k -XX:+UseTLAB -XX:LargePageSizeInBytes=2048k -XX:+UseLargePages"
#MEM_ARGS="-server -Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:NewRatio=2 -XX:SurvivorRatio=128 -XX:MaxTenuringThreshold=0 -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:+UseParallelOldGC -XX:+UseTLAB -XX:LargePageSizeInBytes=2048k -XX:+UseLargePages"
#MEM_ARGS="-jrockit -Xms1024m -Xmx1024m -Xgc:pausetime -XpauseTarget=200ms -XX:+UseCallProfiling -XX:+UseLargePagesForHeap"
export MEM_ARGS

CLASSPATH="/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar:/home/weblogic/weblogic12.1.1/configuration/applications/base_domain/security/test.jar"
export CLASSPATH

# start the test
${JAVA_HOME}/bin/java ${MEM_ARGS} ${COHERENCE_OPTIONS} com.tangosol.net.DefaultCacheServer
#$JHICCUP_HOME/jHiccup ${JAVA_HOME}/bin/java ${MEM_ARGS} ${COHERENCE_OPTIONS} com.tangosol.net.DefaultCacheServer

and

#!/bin/sh
# coherence options
COHERENCE_MANAGEMENT_OPTIONS="-Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true"
COHERENCE_OPTIONS="-Dtangosol.coherence.mode=prod -Dtangosol.coherence.cacheconfig=security-cache-config.xml -Dtangosol.coherence.distributed.localstorage=false"
export COHERENCE_OPTIONS

#MEASUREMENT_OPTIONS="-verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/home/weblogic/temp/gc.out"
#export MEASUREMENT_OPTIONS

JHICCUP_HOME="/home/weblogic/temp/jHiccup"
export JHICCUP_HOME

JAVA_HOME="/home/weblogic/jdk1.7.0_11"
#JAVA_HOME="/home/weblogic/jrockit-jdk1.6.0_29-R28.2.2-4.1.0"
export JAVA_HOME

MEM_ARGS="-server -Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:NewRatio=2 -XX:SurvivorRatio=128 -XX:MaxTenuringThreshold=0 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:G1HeapRegionSize=2048k -XX:+UseTLAB -XX:LargePageSizeInBytes=2048k -XX:+UseLargePages"
#MEM_ARGS="-server -Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:NewRatio=2 -XX:SurvivorRatio=128 -XX:MaxTenuringThreshold=0 -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:+UseParallelOldGC -XX:+UseTLAB -XX:LargePageSizeInBytes=2048k -XX:+UseLargePages"
#MEM_ARGS="-jrockit -Xms1024m -Xmx1024m -Xgc:pausetime -XpauseTarget=200ms -XX:+UseCallProfiling -XX:+UseLargePagesForHeap"
export MEM_ARGS

CLASSPATH="/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar:/home/weblogic/weblogic12.1.1/configuration/applications/base_domain/security/test.jar"
export CLASSPATH

# start the test
${JAVA_HOME}/bin/java ${MEM_ARGS} ${COHERENCE_OPTIONS} ${MEASUREMENT_OPTIONS} model.test.Test
#$JHICCUP_HOME/jHiccup ${JAVA_HOME}/bin/java ${MEM_ARGS} ${COHERENCE_OPTIONS} ${MEASUREMENT_OPTIONS} model.test.Test

The first script runs a cache server; the second runs a cache client, which inserts, updates, deletes and retrieves data from the cache. The cache server produces the following output:

[weblogic@middleware-magic temp]$ ./default-cache-server.sh
2013-02-08 13:18:23.816/0.518 Oracle Coherence 3.7.1.1 <Info> (thread=main, member=n/a): Loaded operational configuration from "jar:file:/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar!/tangosol-coherence.xml"
2013-02-08 13:18:23.911/0.613 Oracle Coherence 3.7.1.1 <Info> (thread=main, member=n/a): Loaded operational overrides from "jar:file:/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar!/tangosol-coherence-override-prod.xml"
2013-02-08 13:18:24.003/0.705 Oracle Coherence 3.7.1.1 <Info> (thread=main, member=n/a): Loaded operational overrides from "jar:file:/home/weblogic/weblogic12.1.1/configuration/applications/base_domain/security/test.jar!/tangosol-coherence-override.xml"
2013-02-08 13:18:24.010/0.712 Oracle Coherence 3.7.1.1 <D5> (thread=main, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified

Oracle Coherence Version 3.7.1.1 Build 28901
 Grid Edition: Production mode
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

2013-02-08 13:18:24.434/1.136 Oracle Coherence GE 3.7.1.1 <Info> (thread=main, member=n/a): Loaded cache configuration from "jar:file:/home/weblogic/weblogic12.1.1/configuration/applications/base_domain/security/test.jar!/security-cache-config.xml"
2013-02-08 13:18:24.569/1.271 Oracle Coherence GE 3.7.1.1 <Info> (thread=main, member=n/a): Loaded Reporter configuration from "jar:file:/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar!/reports/report-group.xml"
2013-02-08 13:18:25.047/1.749 Oracle Coherence GE 3.7.1.1 <D4> (thread=main, member=n/a): TCMP bound to /192.168.1.150:8088 using SystemSocketProvider
2013-02-08 13:18:55.444/32.146 Oracle Coherence GE 3.7.1.1 <Info> (thread=Cluster, member=n/a): Created a new cluster "SecurityCoherenceCluster" with Member(Id=1, Timestamp=2013-02-08 13:18:25.091, Address=192.168.1.150:8088, MachineId=65081, Location=site:,machine:middleware-magic,process:8514, Role=CoherenceServer, Edition=Grid Edition, Mode=Production, CpuCount=4, SocketCount=1) UID=0xC0A801960000013CB9BE0AC3FE391F98
2013-02-08 13:18:55.459/32.161 Oracle Coherence GE 3.7.1.1 <Info> (thread=main, member=n/a): Started cluster Name=SecurityCoherenceCluster

WellKnownAddressList(Size=1,
  WKA{Address=192.168.1.150, Port=8088}
  )

MasterMemberSet(
  ThisMember=Member(Id=1, Timestamp=2013-02-08 13:18:25.091, Address=192.168.1.150:8088, MachineId=65081, Location=site:,machine:middleware-magic,process:8514, Role=CoherenceServer)
  OldestMember=Member(Id=1, Timestamp=2013-02-08 13:18:25.091, Address=192.168.1.150:8088, MachineId=65081, Location=site:,machine:middleware-magic,process:8514, Role=CoherenceServer)
  ActualMemberSet=MemberSet(Size=1
    Member(Id=1, Timestamp=2013-02-08 13:18:25.091, Address=192.168.1.150:8088, MachineId=65081, Location=site:,machine:middleware-magic,process:8514, Role=CoherenceServer)
    )
  MemberId|ServiceVersion|ServiceJoined|MemberState
    1|3.7.1|2013-02-08 13:18:55.444|JOINED
  RecycleMillis=1200000
  RecycleSet=MemberSet(Size=0
    )
  )

TcpRing{Connections=[]}
IpMonitor{AddressListSize=0}

2013-02-08 13:18:55.530/32.232 Oracle Coherence GE 3.7.1.1 <D5> (thread=Invocation:Management, member=1): Service Management joined the cluster with senior service member 1
2013-02-08 13:18:55.759/32.461 Oracle Coherence GE 3.7.1.1 <Info> (thread=DistributedCache, member=1): Loaded POF configuration from "jar:file:/home/weblogic/weblogic12.1.1/configuration/applications/base_domain/security/test.jar!/security-pof-config.xml"
2013-02-08 13:18:55.788/32.490 Oracle Coherence GE 3.7.1.1 <Info> (thread=DistributedCache, member=1): Loaded included POF configuration from "jar:file:/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar!/coherence-pof-config.xml"
2013-02-08 13:18:55.850/32.552 Oracle Coherence GE 3.7.1.1 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 1
2013-02-08 13:18:55.881/32.583 Oracle Coherence GE 3.7.1.1 <Info> (thread=main, member=1):
Services
  (
  ClusterService{Name=Cluster, State=(SERVICE_STARTED, STATE_JOINED), Id=0, Version=3.7.1, OldestMemberId=1}
  InvocationService{Name=Management, State=(SERVICE_STARTED), Id=1, Version=3.1, OldestMemberId=1}
  PartitionedCache{Name=DistributedCache, State=(SERVICE_STARTED), LocalStorage=enabled, PartitionCount=257, BackupCount=1, AssignedPartitions=257, BackupPartitions=0}
  )

Started DefaultCacheServer...

2013-02-08 13:19:19.525/56.227 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest) joined Cluster with senior member 1
2013-02-08 13:19:19.612/56.314 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member 2 joined Service Management with senior member 1
2013-02-08 13:19:20.038/56.740 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member 2 joined Service DistributedCache with senior member 1
2013-02-08 13:20:56.110/152.812 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1125 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 75 packets rescheduled, PauseRate=0.0116, Threshold=4096
2013-02-08 13:20:58.125/154.827 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1154 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 45 packets rescheduled, PauseRate=0.0231, Threshold=3892
2013-02-08 13:27:11.534/528.236 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1784 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 1605 packets rescheduled, PauseRate=0.014, Threshold=3882
2013-02-08 13:27:12.915/529.617 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1166 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 215 packets rescheduled, PauseRate=0.0164, Threshold=3688
2013-02-08 13:27:47.460/564.162 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 3083 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 45 packets rescheduled, PauseRate=0.0214, Threshold=4096
2013-02-08 13:29:33.054/669.756 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1457 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4118 packets rescheduled, PauseRate=0.0207, Threshold=4096
2013-02-08 13:29:58.610/695.312 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1099 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 1244 packets rescheduled, PauseRate=0.0219, Threshold=3505
2013-02-08 13:32:54.405/871.107 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 3079 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 261 packets rescheduled, PauseRate=0.0226, Threshold=2713
2013-02-08 13:39:45.109/1281.811 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1001 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4116 packets rescheduled, PauseRate=0.0172, Threshold=4096
2013-02-08 13:41:05.814/1362.516 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 2553 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 3918 packets rescheduled, PauseRate=0.0185, Threshold=4096
2013-02-08 13:41:07.168/1363.870 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1146 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 2920 packets rescheduled, PauseRate=0.0194, Threshold=4086
2013-02-08 13:41:20.716/1377.418 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 2990 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 111 packets rescheduled, PauseRate=0.0214, Threshold=3505
2013-02-08 13:41:58.747/1415.449 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 3029 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4126 packets rescheduled, PauseRate=0.0231, Threshold=4096
2013-02-08 13:42:00.268/1416.970 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1284 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4117 packets rescheduled, PauseRate=0.024, Threshold=4096
2013-02-08 13:47:57.581/1774.283 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 3942 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 106 packets rescheduled, PauseRate=0.0215, Threshold=3854
2013-02-08 13:49:12.154/1848.856 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 2733 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 148 packets rescheduled, PauseRate=0.0223, Threshold=3505
2013-02-08 13:50:15.032/1911.734 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 3392 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 190 packets rescheduled, PauseRate=0.0235, Threshold=4096
2013-02-08 13:51:00.063/1956.765 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 2918 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4052 packets rescheduled, PauseRate=0.0247, Threshold=4023
2013-02-08 13:51:01.733/1958.435 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1457 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 3890 packets rescheduled, PauseRate=0.0255, Threshold=4096
2013-02-08 13:52:41.079/2057.781 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 2750 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 3726 packets rescheduled, PauseRate=0.0261, Threshold=3698
2013-02-08 14:08:09.516/2986.218 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1063 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 1753 packets rescheduled, PauseRate=0.0229, Threshold=4086
2013-02-08 14:10:04.268/3100.970 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1131 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4106 packets rescheduled, PauseRate=0.0227, Threshold=4086
2013-02-08 14:10:13.452/3110.154 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1010 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4115 packets rescheduled, PauseRate=0.023, Threshold=4096
2013-02-08 14:10:14.833/3111.535 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1139 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4106 packets rescheduled, PauseRate=0.0233, Threshold=4086
2013-02-08 14:10:31.388/3128.090 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1010 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 2878 packets rescheduled, PauseRate=0.0235, Threshold=2858
2013-02-08 14:10:32.819/3129.521 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1206 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 2872 packets rescheduled, PauseRate=0.0239, Threshold=2851
2013-02-08 14:10:43.136/3139.838 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1135 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 163 packets rescheduled, PauseRate=0.0245, Threshold=3283
2013-02-08 14:11:36.859/3193.561 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 4313 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 177 packets rescheduled, PauseRate=0.0254, Threshold=4096
2013-02-08 14:27:48.746/4165.448 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 2824 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4115 packets rescheduled, PauseRate=0.0223, Threshold=4086
2013-02-08 14:27:50.454/4167.156 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1490 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4098 packets rescheduled, PauseRate=0.0227, Threshold=4076
2013-02-08 14:28:39.448/4216.150 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 3848 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 86 packets rescheduled, PauseRate=0.0233, Threshold=4096
2013-02-08 14:29:44.679/4281.381 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 4360 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 199 packets rescheduled, PauseRate=0.0242, Threshold=4096
2013-02-08 14:30:35.106/4331.808 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 3074 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 4126 packets rescheduled, PauseRate=0.0247, Threshold=4096
2013-02-08 14:30:36.794/4333.496 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 1455 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 3904 packets rescheduled, PauseRate=0.0251, Threshold=3882
2013-02-08 14:31:46.922/4403.624 Oracle Coherence GE 3.7.1.1 <Warning> (thread=PacketPublisher, member=1): Experienced a 4512 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest); 142 packets rescheduled, PauseRate=0.0257, Threshold=4086
2013-02-08 14:32:32.260/4448.962 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): TcpRing disconnected from Member(Id=2, Timestamp=2013-02-08 13:19:19.335, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest) due to a peer departure; removing the member.
2013-02-08 14:32:32.260/4448.962 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member 2 left service Management with senior member 1
2013-02-08 14:32:32.260/4448.962 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member 2 left service DistributedCache with senior member 1
2013-02-08 14:32:32.260/4448.962 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member(Id=2, Timestamp=2013-02-08 14:32:32.26, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:8610, Role=ModelTestTest) left Cluster with senior member 1

What is worrying is how frequently the ‘Experienced a … ms communication delay’ messages appear, and the reason given for them (probable remote GC). To see what is happening on the operating system (RAM and CPU) we can use vmstat and mpstat. Note that when run without an interval argument, both tools report averages since boot rather than a snapshot of the test period:

[weblogic@middleware-magic ~]$ vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 4374164 162896 695112    0    0    14   146  443  602 28  3 68  1  0
[weblogic@middleware-magic ~]$ mpstat -P ALL
Linux 2.6.32-279.22.1.el6.x86_64 (middleware-magic.com) 	02/08/2013 	_x86_64_	(4 CPU)

02:33:15 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
02:33:15 PM  all   28.09    0.29    2.44    0.83    0.01    0.12    0.00    0.00   68.23
02:33:15 PM    0   27.13    0.42    2.53    0.81    0.00    0.12    0.00    0.00   68.98
02:33:15 PM    1   29.86    0.30    2.33    0.81    0.00    0.11    0.00    0.00   66.57
02:33:15 PM    2   27.70    0.21    2.55    0.91    0.00    0.13    0.00    0.00   68.49
02:33:15 PM    3   27.66    0.23    2.35    0.77    0.01    0.12    0.00    0.00   68.87
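
To get a feeling for how bad the delays in the cache server log are, the warnings can be summarized with a small script. The following is a minimal sketch in Python; the function name `summarize_delays` and the abbreviated sample lines are our own, and the regular expression assumes the message wording shown in the log above.

```python
import re
from statistics import mean

# Assumes the warning wording shown in the cache server log above.
DELAY_RE = re.compile(r"Experienced a (\d+) ms communication delay")


def summarize_delays(lines):
    """Return count, maximum and mean delay (ms) over all matching log lines."""
    delays = []
    for line in lines:
        match = DELAY_RE.search(line)
        if match:
            delays.append(int(match.group(1)))
    if not delays:
        return {"count": 0, "max_ms": 0, "mean_ms": 0.0}
    return {"count": len(delays), "max_ms": max(delays), "mean_ms": mean(delays)}


# Abbreviated sample lines (member details shortened for readability).
sample = [
    "2013-02-08 14:27:48.746/4165.448 Oracle Coherence GE 3.7.1.1 <Warning> "
    "(thread=PacketPublisher, member=1): Experienced a 2824 ms communication "
    "delay (probable remote GC) with Member(Id=2); 4115 packets rescheduled",
    "2013-02-08 14:27:50.454/4167.156 Oracle Coherence GE 3.7.1.1 <Warning> "
    "(thread=PacketPublisher, member=1): Experienced a 1490 ms communication "
    "delay (probable remote GC) with Member(Id=2); 4098 packets rescheduled",
    "2013-02-08 14:28:39.448/4216.150 Oracle Coherence GE 3.7.1.1 <Warning> "
    "(thread=PacketPublisher, member=1): Experienced a 3848 ms communication "
    "delay (probable remote GC) with Member(Id=2); 86 packets rescheduled",
    "2013-02-08 14:32:32.260/4448.962 Oracle Coherence GE 3.7.1.1 <D5> "
    "(thread=Cluster, member=1): Member 2 left service Management with senior member 1",
]

summary = summarize_delays(sample)
print(summary)
```

In practice the lines would be read from the cache server's log file instead of a hard-coded list. Every delay in the sample is well above the 200 ms pause time goal, which is what makes the messages worth investigating.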

Now let us see whether the communication delays are really caused by garbage collection (stop-the-world) pauses. We run the test again, this time with JRockit, and again set a pause time goal of 200 ms, just as was done with the G1 collector. The following shows the output of the cache server:

[weblogic@middleware-magic temp]$ ./default-cache-server.sh
2013-02-08 11:24:19.549/2.676 Oracle Coherence 3.7.1.1 <Info> (thread=Main Thread, member=n/a): Loaded operational configuration from "jar:file:/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar!/tangosol-coherence.xml"
2013-02-08 11:24:19.577/2.704 Oracle Coherence 3.7.1.1 <Info> (thread=Main Thread, member=n/a): Loaded operational overrides from "jar:file:/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar!/tangosol-coherence-override-prod.xml"
2013-02-08 11:24:19.595/2.722 Oracle Coherence 3.7.1.1 <Info> (thread=Main Thread, member=n/a): Loaded operational overrides from "jar:file:/home/weblogic/weblogic12.1.1/configuration/applications/base_domain/security/test.jar!/tangosol-coherence-override.xml"
2013-02-08 11:24:19.601/2.728 Oracle Coherence 3.7.1.1 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified

Oracle Coherence Version 3.7.1.1 Build 28901
 Grid Edition: Production mode
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

2013-02-08 11:24:19.916/3.043 Oracle Coherence GE 3.7.1.1 <Info> (thread=Main Thread, member=n/a): Loaded cache configuration from "jar:file:/home/weblogic/weblogic12.1.1/configuration/applications/base_domain/security/test.jar!/security-cache-config.xml"
2013-02-08 11:24:20.393/3.520 Oracle Coherence GE 3.7.1.1 <Info> (thread=Main Thread, member=n/a): Loaded Reporter configuration from "jar:file:/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar!/reports/report-group.xml"
2013-02-08 11:24:21.262/4.389 Oracle Coherence GE 3.7.1.1 <D4> (thread=Main Thread, member=n/a): TCMP bound to /192.168.1.150:8088 using SystemSocketProvider
2013-02-08 11:24:51.778/34.905 Oracle Coherence GE 3.7.1.1 <Info> (thread=Cluster, member=n/a): Created a new cluster "SecurityCoherenceCluster" with Member(Id=1, Timestamp=2013-02-08 11:24:21.331, Address=192.168.1.150:8088, MachineId=65081, Location=site:,machine:middleware-magic,process:7376, Role=CoherenceServer, Edition=Grid Edition, Mode=Production, CpuCount=4, SocketCount=1) UID=0xC0A801960000013CB9559D53FE391F98
2013-02-08 11:24:51.816/34.943 Oracle Coherence GE 3.7.1.1 <Info> (thread=Main Thread, member=n/a): Started cluster Name=SecurityCoherenceCluster

WellKnownAddressList(Size=1,
  WKA{Address=192.168.1.150, Port=8088}
  )

MasterMemberSet(
  ThisMember=Member(Id=1, Timestamp=2013-02-08 11:24:21.331, Address=192.168.1.150:8088, MachineId=65081, Location=site:,machine:middleware-magic,process:7376, Role=CoherenceServer)
  OldestMember=Member(Id=1, Timestamp=2013-02-08 11:24:21.331, Address=192.168.1.150:8088, MachineId=65081, Location=site:,machine:middleware-magic,process:7376, Role=CoherenceServer)
  ActualMemberSet=MemberSet(Size=1
    Member(Id=1, Timestamp=2013-02-08 11:24:21.331, Address=192.168.1.150:8088, MachineId=65081, Location=site:,machine:middleware-magic,process:7376, Role=CoherenceServer)
    )
  MemberId|ServiceVersion|ServiceJoined|MemberState
    1|3.7.1|2013-02-08 11:24:51.783|JOINED
  RecycleMillis=1200000
  RecycleSet=MemberSet(Size=0
    )
  )

TcpRing{Connections=[]}
IpMonitor{AddressListSize=0}

2013-02-08 11:24:51.944/35.071 Oracle Coherence GE 3.7.1.1 <D5> (thread=Invocation:Management, member=1): Service Management joined the cluster with senior service member 1
2013-02-08 11:24:52.255/35.382 Oracle Coherence GE 3.7.1.1 <Info> (thread=DistributedCache, member=1): Loaded POF configuration from "jar:file:/home/weblogic/weblogic12.1.1/configuration/applications/base_domain/security/test.jar!/security-pof-config.xml"
2013-02-08 11:24:52.265/35.392 Oracle Coherence GE 3.7.1.1 <Info> (thread=DistributedCache, member=1): Loaded included POF configuration from "jar:file:/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar!/coherence-pof-config.xml"
2013-02-08 11:24:52.303/35.430 Oracle Coherence GE 3.7.1.1 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 1
2013-02-08 11:24:52.375/35.503 Oracle Coherence GE 3.7.1.1 <Info> (thread=Main Thread, member=1):
Services
  (
  ClusterService{Name=Cluster, State=(SERVICE_STARTED, STATE_JOINED), Id=0, Version=3.7.1, OldestMemberId=1}
  InvocationService{Name=Management, State=(SERVICE_STARTED), Id=1, Version=3.1, OldestMemberId=1}
  PartitionedCache{Name=DistributedCache, State=(SERVICE_STARTED), LocalStorage=enabled, PartitionCount=257, BackupCount=1, AssignedPartitions=257, BackupPartitions=0}
  )

Started DefaultCacheServer...

2013-02-08 11:26:21.422/124.550 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member(Id=2, Timestamp=2013-02-08 11:26:21.174, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:7442, Role=ModelTestTest) joined Cluster with senior member 1
2013-02-08 11:26:21.716/124.843 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member 2 joined Service Management with senior member 1
2013-02-08 11:26:22.150/125.277 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member 2 joined Service DistributedCache with senior member 1
[INFO ][mgmnt  ] Local JMX connector started
2013-02-08 13:07:05.135/6168.262 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): TcpRing disconnected from Member(Id=2, Timestamp=2013-02-08 11:26:21.174, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:7442, Role=ModelTestTest) due to a peer departure; removing the member.
2013-02-08 13:07:05.139/6168.266 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member 2 left service Management with senior member 1
2013-02-08 13:07:05.140/6168.267 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member 2 left service DistributedCache with senior member 1
2013-02-08 13:07:05.144/6168.271 Oracle Coherence GE 3.7.1.1 <D5> (thread=Cluster, member=1): Member(Id=2, Timestamp=2013-02-08 13:07:05.141, Address=192.168.1.150:8090, MachineId=65081, Location=site:,machine:middleware-magic,process:7442, Role=ModelTestTest) left Cluster with senior member 1

Luckily, the communication issues are gone. To see what JRockit does on the operating system we again use vmstat and mpstat:

[weblogic@middleware-magic ~]$ vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 5393884 155152 694208    0    0    20   209  715   26 29  2 68  1  0
[weblogic@middleware-magic ~]$ mpstat -P ALL
Linux 2.6.32-279.22.1.el6.x86_64 (middleware-magic.com) 	02/08/2013 	_x86_64_	(4 CPU)

01:08:01 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
01:08:01 PM  all   28.52    0.41    2.22    1.16    0.01    0.13    0.00    0.00   67.55
01:08:01 PM    0   27.03    0.60    2.38    1.15    0.01    0.13    0.00    0.00   68.71
01:08:01 PM    1   31.10    0.41    2.20    1.14    0.00    0.11    0.00    0.00   65.03
01:08:01 PM    2   28.10    0.31    2.28    1.28    0.00    0.14    0.00    0.00   67.89
01:08:01 PM    3   27.86    0.33    2.02    1.08    0.01    0.13    0.00    0.00   68.57

A remark is in order: the ‘Experienced a … ms communication delay’ messages also do not appear when the parallel collector (-XX:+UseParallelGC) is used, which can also be seen in the following measurements.

Note the difference in garbage collection times between the G1 collector (the first two) and the parallel collector (the last two). The load test breaks just about everything the G1 collector is designed for, namely applications that:

  • Can operate concurrently with application threads, like the CMS collector.
  • Compact free space without lengthy GC-induced pause times.
  • Need more predictable GC pause durations.
  • Do not want to sacrifice a lot of throughput performance.
  • Do not require a much larger Java heap.

Based on the measurements, we will resort to the parallel collector and change the JVM parameters to the following for the WebLogic managed server:

<server>
    <name>security-server</name>
    <ssl>
		<enabled>false</enabled>
    </ssl>
    <machine>machine1</machine>
    <listen-port>8001</listen-port>
    <cluster xsi:nil="true"></cluster>
    <web-server>
		<web-server-log>
			<number-of-files-limited>false</number-of-files-limited>
		</web-server-log>
    </web-server>
    <listen-address>middleware-magic.com</listen-address>
    <server-start>
		<java-vendor>Sun</java-vendor>
		<java-home>/home/weblogic/jdk1.7.0_11</java-home>
		<arguments>-server -Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:NewRatio=2 -XX:SurvivorRatio=128 -XX:MaxTenuringThreshold=0 -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:+UseParallelOldGC -XX:+UseTLAB -XX:LargePageSizeInBytes=2048k -XX:+UseLargePages -Dtangosol.coherence.mode=prod -Dtangosol.coherence.distributed.localstorage=false</arguments>
    </server-start>
  </server>

and to the following for the Coherence cache server:

<coherence-server>
    <name>coherence-server1</name>
    <machine>machine1</machine>
    <coherence-cluster-system-resource xsi:nil="true"></coherence-cluster-system-resource>
    <unicast-listen-address>localhost</unicast-listen-address>
    <unicast-listen-port>8088</unicast-listen-port>
    <unicast-port-auto-adjust>true</unicast-port-auto-adjust>
    <coherence-server-start>
		<java-vendor>Sun</java-vendor>
		<java-home>/home/weblogic/jdk1.7.0_11</java-home>
		<class-path>/home/weblogic/weblogic12.1.1/installation/coherence_3.7/lib/coherence.jar:/home/weblogic/weblogic12.1.1/installation/modules/features/weblogic.server.modules.coherence.server_12.1.1.0.jar:/home/weblogic/weblogic12.1.1/configuration/applications/base_domain/security/test.jar</class-path>
		<arguments>-server -Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:NewRatio=2 -XX:SurvivorRatio=128 -XX:MaxTenuringThreshold=0 -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:+UseParallelOldGC -XX:+UseTLAB -XX:LargePageSizeInBytes=2048k -XX:+UseLargePages -Dtangosol.coherence.mode=prod -Dtangosol.coherence.cacheconfig=security-cache-config.xml -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true</arguments>
    </coherence-server-start>
  </coherence-server>
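
To see what the heap flags used in both configurations imply, the nominal generation sizes can be worked out from -Xmx, -XX:NewRatio and -XX:SurvivorRatio. The following is a minimal sketch in Python; the function name `nominal_generation_sizes` is our own, and the numbers are nominal only, since HotSpot aligns the sizes and the parallel collector's adaptive sizing may change them at runtime.

```python
def nominal_generation_sizes(heap_mb, new_ratio, survivor_ratio):
    """Nominal generation sizes in MB, ignoring alignment and adaptive sizing.

    NewRatio is the old:young ratio, so young = heap / (NewRatio + 1).
    SurvivorRatio is the eden:survivor ratio and there are two survivor
    spaces, so each survivor = young / (SurvivorRatio + 2).
    """
    young = heap_mb / (new_ratio + 1)
    old = heap_mb - young
    survivor = young / (survivor_ratio + 2)
    eden = young - 2 * survivor
    return {"young": young, "old": old, "eden": eden, "survivor": survivor}


# Values taken from the arguments above:
# -Xmx1024m -XX:NewRatio=2 -XX:SurvivorRatio=128
sizes = nominal_generation_sizes(heap_mb=1024, new_ratio=2, survivor_ratio=128)
for name, size_mb in sizes.items():
    print(f"{name:>8}: {size_mb:7.2f} MB")
```

With these settings the young generation is nominally a third of the heap (about 341 MB) and each survivor space is only a few megabytes. That fits -XX:MaxTenuringThreshold=0, which promotes objects that survive a young collection directly to the old generation, so the survivor spaces are hardly used.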

References

[1] Getting Started with the G1 Garbage Collector.
[2] Java Garbage Collection Basics.
[3] jHiccup: Open Source Tool to Measure Variations in Java Performance.