Tomcat's Clustering / Session Replication not replicating properly

nvioli picture nvioli · Sep 16, 2013 · Viewed 28.1k times · Source

I'm setting up clustering/replication on Tomcat 7 on my local machine, to evaluate it for use with my environment/codebase.

Setup

I have two identical tomcat servers in sibling directories running on different ports. I have httpd listening on two other ports and connecting to the two tomcat instances as VirtualHosts. I can access and interact with both environments on the configured ports; everything is working as expected.

The tomcat servers have clustering enabled like this, in server.xml:

   <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
             channelSendOptions="8">

      <Manager className="org.apache.catalina.ha.session.DeltaManager"
               expireSessionsOnShutdown="false"
               notifyListenersOnReplication="true"/>

      <Channel className="org.apache.catalina.tribes.group.GroupChannel">
        <Membership className="org.apache.catalina.tribes.membership.McastService"
                    address="228.0.0.4"
                    port="45564"
                    frequency="500"
                    dropTime="3000"/>
        <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                  address="auto"
                  port="4001"
                  autoBind="100"
                  selectorTimeout="5000"
                  maxThreads="6"/>

        <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
          <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
        </Sender>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
      </Channel>

      <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
             filter=""/>
      <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>

      <Deployer className="org.apache.catalina.ha.deploy.FarmWarDeployer"
                tempDir="/tmp/war-temp/"
                deployDir="/tmp/war-deploy/"
                watchDir="/tmp/war-listen/"
                watchEnabled="false"/>

      <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
      <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
   </Cluster>

and I added the distributable tag to the very beginning of web.xml:

<web-app xmlns="http://java.sun.com/xml/ns/javaee"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                      http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
  version="3.0">
  <distributable />

  (lots more...)

</web-app>

What's working

When the servers start, they log

Sep 16, 2013 1:44:23 PM org.apache.catalina.ha.tcp.SimpleTcpCluster startInternal
INFO: Cluster is about to start
Sep 16, 2013 1:44:23 PM org.apache.catalina.tribes.transport.ReceiverBase getBind
FINE: Starting replication listener on address:10.0.0.100
Sep 16, 2013 1:44:23 PM org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:/10.0.0.100:4001
Sep 16, 2013 1:44:23 PM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to 500
Sep 16, 2013 1:44:23 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4
Sep 16, 2013 1:44:24 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:4
Sep 16, 2013 1:44:24 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:8
Sep 16, 2013 1:44:25 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:8

When the second server starts up, the first one logs

Sep 16, 2013 2:17:30 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector messageReceived
FINE: Received a failure detector packet:ClusterData[src=org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 0, 100}:4000,{10, 0, 0, 100},4000, alive=112208, securePort=-1, UDP Port=-1, id={118 6 107 -67 88 98 72 95 -73 41 4 -108 58 -5 -127 -41 }, payload={}, command={}, domain={}, ]; id={25 110 120 -2 -25 6 78 -97 -84 -34 2 -11 49 -62 -8 -56 }; sent=2013-09-16 14:17:30.139]
Sep 16, 2013 2:17:30 PM org.apache.catalina.tribes.transport.nio.NioReplicationTask remoteEof
FINE: Channel closed on the remote end, disconnecting
Sep 16, 2013 2:17:30 PM org.apache.catalina.tribes.membership.McastServiceImpl memberDataReceived
FINE: Mcast add member org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 0, 100}:4001,{10, 0, 0, 100},4001, alive=1010, securePort=-1, UDP Port=-1, id={82 -45 -109 -56 -110 -5 78 -10 -103 61 -40 -59 -36 -79 104 120 }, payload={}, command={}, domain={}, ]
Sep 16, 2013 2:17:30 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 0, 100}:4001,{10, 0, 0, 100},4001, alive=1011, securePort=-1, UDP Port=-1, id={82 -45 -109 -56 -110 -5 78 -10 -103 61 -40 -59 -36 -79 104 120 }, payload={}, command={}, domain={}, ]

and when one is shutdown, the other one logs

Sep 16, 2013 2:28:05 PM org.apache.catalina.tribes.membership.McastServiceImpl memberDataReceived
FINE: Member has shutdown:org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 0, 100}:4001,{10, 0, 0, 100},4001, alive=422279, securePort=-1, UDP Port=-1, id={54 43 17 -9 13 -11 72 -63 -107 -78 -8 65 -21 -77 115 88 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}, ]
Sep 16, 2013 2:28:05 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 0, 100}:4001,{10, 0, 0, 100},4001, alive=422279, securePort=-1, UDP Port=-1, id={54 43 17 -9 13 -11 72 -63 -107 -78 -8 65 -21 -77 115 88 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}, ]]
Sep 16, 2013 2:28:05 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberDisappeared
INFO: Received member disappeared:org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 0, 100}:4001,{10, 0, 0, 100},4001, alive=422279, securePort=-1, UDP Port=-1, id={54 43 17 -9 13 -11 72 -63 -107 -78 -8 65 -21 -77 115 88 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}, ]

so I know they're aware of each other.

Finally, when I use the Cluster/Operations MBean in jconsole to try to set property "foo" to "bar", jconsole reports "method successfully invoked", and the server logs

Sep 16, 2013 2:30:18 PM org.apache.catalina.ha.tcp.SimpleTcpCluster setProperty
WARNING: Dynamic setProperty(foo,value) has been disabled, please use explicit properties for the element you are trying to identify

I'm not too worried about that error; mostly included to demonstrate that setProperty creates a log statement.

What's not working

As far as I can tell, no session information is being replicated in my app.

The tomcat manager only lists sessions started on the server it's monitoring, and not the other one in the cluster.

I'm under the impression that whenever the app calls HttpSession.setAttribute, that attribute should be replicated to the other cluster nodes, and I would expect that some record of that would be logged. My app includes this line:

   public static void saveBillingInfo(IPageContext pageContext, BillingInfo billingInfo)
   {    
      pageContext.getSession().setAttribute("billingInfo", billingInfo);
      //etc...
   }

where BillingInfo is a Serializable class containing only one field, a HashMap of information about the billing info.

No log statements are written when this or any other line processes, and I don't see any evidence that session information is actually being shared.

Any suggestions or further questions are welcome.

Answer

Jason picture Jason · Oct 15, 2013

We had this identical issue. Although not documented anywhere, what solved it for me was to simply move the <Manager> tag from server.xml to the global context.xml (bringing it out of the <Server>...<Cluster>... group and into the <Context> group). As soon as we did this, everything "magically" began working. This only applied to Tomcat 7...Tomcat 6 worked perfectly with the setup you describe above (and as the documentation describes).

<Context>
    <Manager className="org.apache.catalina.ha.session.DeltaManager"
             expireSessionsOnShutdown="false"
             notifyListenersOnReplication="true" />
</Context>

Now just remove the <Manager> tag from your Cluster group in server.xml and you're done.