Hi All,
Multicasting is the technology of delivering a message to a group of destination nodes (servers, JVMs, or any other network-based software). It means sending a piece of information to all of its peers in one go. The IPv4 address range for multicast communication is 224.0.0.0 to 239.255.255.255.
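A quick way to sanity-check a candidate address against the range above is to ask the JDK. The following is only a minimal sketch; the class name MulticastRangeCheck and the sample addresses are made up for illustration and are not part of WebLogic.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class MulticastRangeCheck {

    /** Returns true if the given IP literal falls in the multicast range. */
    static boolean isMulticast(String ipLiteral) {
        try {
            // For an IP literal, getByName only parses the text; no DNS lookup is made.
            return InetAddress.getByName(ipLiteral).isMulticastAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample addresses around the edges of 224.0.0.0 - 239.255.255.255.
        String[] samples = {"223.255.255.255", "224.0.0.0", "239.192.0.1", "239.255.255.255", "240.0.0.0"};
        for (String ip : samples) {
            System.out.println(ip + " multicast? " + isMulticast(ip));
        }
    }
}
```

Only the three middle sample addresses should be reported as multicast.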
.
Application server clustering is one of the best-known uses of multicasting technology. In the middleware world, a cluster is a logical entity in which many members (servers) work together to provide load balancing (load sharing), failover (reliability), and scalability to our applications.
.
Cluster members usually communicate with each other in the following two ways:
1). IP Multicasting (One to Many):
In this technique every node of a cluster sends some piece of data/information to all the other members of the same cluster. Using this technique, servers achieve two main goals:
a). Each node sends heartbeat messages to the other nodes of the cluster. This makes the other nodes aware that the member sending the messages is alive. The heartbeat messages help the cluster master to maintain the “Dynamic Server List” (a list of the servers that are alive).
.
b). IP multicasting is also used for replicating JNDI objects among all the members of the cluster. An object bound in the JNDI tree of a clustered node (server) is sent to the rest of the members of the cluster. It means the JNDI tree of a clustered server will be identical to that of the other members of the same cluster.
2). IP Socketing (One to One):
This technique is broadly used by middleware cluster members to access an object on another node of the cluster. It is the technique cluster members use to replicate HttpSession data or EJB session objects.
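The heartbeat idea from point 1a above can be pictured with a small in-memory sketch. This is not WebLogic's implementation: the class DynamicServerList, the server names, and the 30-second timeout are all assumptions made up for illustration. A member stays on the list only while its last heartbeat is recent enough.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class DynamicServerList {

    // Last time (in milliseconds) a heartbeat was seen from each server.
    private final Map<String, Long> lastHeartbeatMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    DynamicServerList(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record a heartbeat received from the named server. */
    void onHeartbeat(String serverName, long nowMillis) {
        lastHeartbeatMillis.put(serverName, nowMillis);
    }

    /** Servers whose last heartbeat is no older than the timeout, sorted by name. */
    Set<String> aliveServers(long nowMillis) {
        Set<String> alive = new TreeSet<>();
        for (Map.Entry<String, Long> entry : lastHeartbeatMillis.entrySet()) {
            if (nowMillis - entry.getValue() <= timeoutMillis) {
                alive.add(entry.getKey());
            }
        }
        return alive;
    }

    public static void main(String[] args) {
        DynamicServerList list = new DynamicServerList(30_000); // assumed timeout
        list.onHeartbeat("MS1", 0);
        list.onHeartbeat("MS2", 20_000);
        System.out.println(list.aliveServers(25_000)); // both heartbeats are recent
        System.out.println(list.aliveServers(40_000)); // MS1 has been silent too long
    }
}
```

When heartbeats are lost because of multicast problems, a healthy server drops off such a list in exactly the same way as a dead one, which is why multicast errors can get nodes kicked out of the cluster.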
Multicast Errors:
If we observe multicast errors in the server logs, it means our cluster is not going to work as expected; one or more nodes may be kicked out of the cluster. The errors will look something like this in the server logs:
java.io.OptionalDataException
<Error> <Cluster> <BEA-000110> <Multicast socket receive error: java.io.OptionalDataException
java.io.OptionalDataException
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1285)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:322)
    at weblogic.cluster.MulticastManager.execute(MulticastManager.java:411)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:224)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:183)
java.lang.OutOfMemoryError: PermGen space
<Error> <Cluster> <testWeb> <MS1> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <1264189488263> <BEA-000110> <Multicast socket receive error: java.lang.OutOfMemoryError: PermGen space
java.lang.OutOfMemoryError: PermGen space
    at sun.misc.Unsafe.defineClass(Native Method)
    at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:45)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:381)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:377)
    at sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:95)
    at sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:313)
    at java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1299)
    at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:52)
java.io.StreamCorruptedException
<Error> <Cluster> <testDomain> <testServer> <ExecuteThread: '14' for queue: 'weblogic.kernel.Default'> <<WLS Kernel>> <BEA-000110> <Multicast socket receive error: java.io.StreamCorruptedException
java.io.StreamCorruptedException
    at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2347)
    at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2380)
    at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2452)
    at java.io.DataInputStream.readInt(DataInputStream.java:443)
    at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2657)
    at java.io.ObjectInputStream.readInt(ObjectInputStream.java:900)
    at weblogic.cluster.MulticastManager.execute(MulticastManager.java:387)
java.io.EOFException
<Error> <Cluster> <BEA-000110> <Multicast socket receive error: java.io.EOFException
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readLong(DataInputStream.java:380)
    at java.io.ObjectInputStream$BlockDataInputStream.readLong()J(Unknown Source)
    at java.io.ObjectInputStream.readLong()J(Unknown Source)
    at weblogic.cluster.HeartbeatMessage.readExternal(HeartbeatMessage.java:55)
    at java.io.ObjectInputStream.readExternalData(Ljava.io.Externalizable;Ljava.io.ObjectStreamClass;)V(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Z)Ljava.lang.Object;(Unknown Source)
java.io.InterruptedIOException
<Error> <Cluster> <Multicast socket receive error : java.io.InterruptedIOException: Receive timed out
java.io.InterruptedIOException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:90)
    at java.net.DatagramSocket.receive(DatagramSocket.java:404)
    at weblogic.cluster.FragmentSocket.receive(FragmentSocket.java:145)
    at weblogic.cluster.MulticastManager.execute(MulticastManager.java:298)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:139)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)
Most Likely Causes of the Multicast Errors
Cause-1). Incorrect configuration of Multicast Addresses.
.
Cause-2). Too few file descriptors available.
.
Cause-3). Network fluctuation or interrupted network connectivity.
.
Cause-4). Multicast traffic blocked by firewall restrictions. Try disabling “iptables” as described at the bottom of the page.
.
Cause-5). Multicast timeouts, for example when packets are discarded because the Time-To-Live is too low for the network. Increase the MulticastTTL.
.
Cause-6). More than one cluster present in the same network using the same multicast address.
.
Cause-7). Incorrect use of operating system zoning, or multihoming issues. Multihoming means a single physical box with multiple NICs (multiple IP addresses).
.
Cause-8). A server's listen port was used as the multicast port.
What to Check While Debugging
It is always recommended to first go through the following link: http://download.oracle.com/docs/cd/E12840_01/wls/docs103/cluster/multicast_configuration.html
Point-1). Check and make sure that all the clusters present in the same network are using a unique multicast address and port.
.
Point-2). Using the “netstat” or “ping” commands, make sure that the multicast address and the port are OK for use.
.
Point-3). Opening a socket or a file requires a file descriptor, so we need to make sure that the required number of file descriptors is available.
On Unix-based operating systems we can use the “lsof” command (list open files).
Example: “lsof -p <WLS_PID> | wc -l”
Here WLS_PID is the process ID of WebLogic. To find the WebLogic process ID please refer to: http://middlewaremagic.com/weblogic/?p=2291
Example: Suppose the WebLogic Server's process ID is 4020; then run the following commands:
[jaytest@jaytest bin]$ lsof -p 4020 | wc -l
666
[jaytest@jaytest bin]$ lsof -p 4020
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 4020 jaytest cwd DIR 253,4 4096 8657973 /NotBackedUp/WLS103/user_projects/domains/base_domain
java 4020 jaytest rtd DIR 253,1 4096 2 /
java 4020 jaytest txt REG 253,3 50810 133154 /home/jaytest/MyJdks/jdk1.6.0_21/bin/java
java 4020 jaytest mem REG 253,1 150672 542805 /lib64/ld-2.12.so
java 4020 jaytest mem REG 253,1 1838296 542806 /lib64/libc-2.12.so
java 4020 jaytest mem REG 253,1 145672 542818 /lib64/libpthread-2.12.so
java 4020 jaytest mem REG 253,1 22536 542808 /lib64/libdl-2.12.so
java 4020 jaytest mem REG 253,1 598816 542807 /lib64/libm-2.12.so
java 4020 jaytest mem REG 253,1 47072 542819 /lib64/librt-2.12.so
java 4020 jaytest mem REG 253,1 113904 542814 /lib64/libresolv-2.12.so
java 4020 jaytest mem REG 253,1 116136 542840 /lib64/libnsl-2.12.so
java 4020 jaytest mem REG 253,3 6676 133795 /home/jaytest/MyJdks/jdk1.6.0_21/jre/lib/amd64/librmi.so
java 4020 jaytest mem REG 253,3 1163700 133902 /home/jaytest/MyJdks/jdk1.6.0_21/jre/lib/resources.jar
java 4020 jaytest mem REG 253,3 842216 134434 /home/jaytest/MyJdks/jdk1.6.0_21/jre/lib/ext/localedata.jar
java 4020 jaytest mem REG 253,4 282279 8653766 /NotBackedUp/WLS103/wlserver_10.3/server/lib/consoleapp/consolehelp/WEB-INF/lib/console.jar
java 4020 jaytest mem REG 253,4 293750 8653774 /NotBackedUp/WLS103/wlserver_10.3/server/lib/consoleapp/consolehelp/WEB-INF/lib/standard.jar
java 4020 jaytest mem REG 253,4 779658 8653764 /NotBackedUp/WLS103/wlserver_10.3/server/lib/consoleapp/consolehelp/WEB-INF/lib/beehive-netui-core.jar
java 4020 jaytest mem REG 253,4 57299 8653775 /NotBackedUp/WLS103/wlserver_10.3/server/lib/consoleapp/consolehelp/WEB-INF/lib/struts-adapter.jar
java 4020 jaytest mem REG 253,4 531676 8653767 /NotBackedUp/WLS103/wlserver_10.3/server/lib/consoleapp/consolehelp/WEB-INF/lib/jh.jar
java 4020 jaytest mem REG 253,4 1490143 8653770 /NotBackedUp/WLS103/wlserver_10.3/server/lib/consoleapp/consolehelp/WEB-INF/lib/netuix_servlet.jar
java 4020 jaytest mem REG 253,4 54683 8653769 /NotBackedUp/WLS103/wlserver_10.3/server/lib/consoleapp/consolehelp/WEB-INF/lib/netuix_common_web.jar
java 4020 jaytest mem REG 253,4 46008 8653772 /NotBackedUp/WLS103/wlserver_10.3/server/lib/consoleapp/consolehelp/WEB-INF/lib/render_taglib.jar
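A JVM can also report its own file descriptor usage. The sketch below is only an illustration: the class name FileDescriptorCheck is made up, it reads the counters of the JVM it runs in (not of another process such as PID 4020 above), and it relies on the com.sun.management extension, which is assumed to be present on a Unix-like Oracle/OpenJDK runtime.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

class FileDescriptorCheck {

    /**
     * Returns {open, max} file descriptor counts for the current JVM,
     * or null when the platform does not expose them (for example on Windows).
     */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fd = openAndMax();
        if (fd == null) {
            System.out.println("File descriptor counts are not available on this platform.");
        } else {
            System.out.println("Open file descriptors: " + fd[0] + " of " + fd[1] + " allowed");
        }
    }
}
```

If the open count is close to the maximum, the process may fail to open new sockets, including the multicast socket.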
.
Point-4). If we see any kind of “Multicast receive timeout” error, we need to check whether the NIC is functioning properly.
.
Point-5). There may be a multicast storm going on in the network (a storm means repeated transmission of multicast packets over the network). In this case we can try increasing the multicast buffer size using the “udp_max_buf” parameter. Please refer to: http://docs.sun.com/app/docs/doc/816-0607/6m735r5gb?a=view for more details on it.
.
Point-6). In case of a multicast storm the network may already be flooded with multicast messages. If we find this, then please disable the “igmp snooping” option on the switch. This option belongs to the Internet Group Management Protocol (IGMP) support of a managed switch and is used to prevent multicast flood problems.
Example: igmp snooping=disable
For more details on this parameter please refer to: http://documentation.netgear.com/gs108t/enu/202-10337-01/GS108T_UM-06-21.html
.
Point-7). Set the Multicast Time-To-Live to the following: MulticastTTL=32
NOTE: For larger, WAN-style networks the multicast Time-To-Live value must be kept high so that routers will not discard the multicast packets before they reach their destination.
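To see what the TTL setting means at the socket level, here is a minimal plain-JDK sketch. It is not how WebLogic applies MulticastTTL internally; the class name MulticastTtlExample is made up, and the example only sets and reads back the value on a throwaway socket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MulticastSocket;

class MulticastTtlExample {

    /** Opens a multicast socket, applies the given TTL, and returns the TTL the socket reports. */
    static int applyTtl(int ttl) {
        // The no-argument constructor binds to any free local port.
        try (MulticastSocket socket = new MulticastSocket()) {
            // Each router hop decrements the TTL; a packet whose TTL runs out is dropped.
            socket.setTimeToLive(ttl);
            return socket.getTimeToLive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TTL in effect: " + applyTtl(32));
    }
}
```

A TTL of 1 keeps packets on the local subnet, so cluster members that sit several router hops apart need a higher value.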
.
Point-8). Run the MulticastMonitor test and MulticastTest on the network, as described in the following link: http://middlewaremagic.com/weblogic/?p=980
.
Point-9). Try enabling cluster debugging to get more details:
java weblogic.Admin -url t3://localhost:7001 -username weblogic -password weblogic SET -type ServerDebug -property DebugCluster true
Point-10). If multicast still doesn't work :) then disable iptables, and then check http://kr.forums.oracle.com/forums/thread.jspa?threadID=767088
In RedHat Linux: /etc/init.d/iptables stop
.
.
Thanks
Jay SenSharma
March 21st, 2011 on 7:22 am
Hi Jay,
Saw your post on 12/21/09:
http://forums.oracle.com/forum/thread.jspa?threadID=1012561
Do we still not have a way to test cluster unicast communication?
Thanks
Sathya
March 21st, 2011 on 4:56 pm
Hi Sathya,
The MulticastTest utility checks whether the network connectivity between the cluster nodes is working (on a one-to-one basis), because in multicast mode each and every cluster member sends its heartbeat messages to all the other nodes of the same cluster via the multicast address.
So if the MulticastTest results are OK, it means that network connectivity is not an issue. That is why we don't need a separate “Unicast Messaging Test”: even if your clusters are running in unicast mode, you can still test your network with MulticastTest.
There is no utility available as of now that can be used to test unicast messaging separately.
.
.
Keep Posting 🙂
Thanks
Jay SenSharma