Apache, mod_jk and JBoss Load Balancing issue

pugmarx picture pugmarx · Sep 5, 2012 · Viewed 9.4k times · Source

OK, I have been whacking my head in vain since the last three days over this, so time to take it up with the gurus!

Background
I'm trying to get an apparently simple setup working -- I have a CentOS box, on which I want to set-up two JBoss nodes (i.e. Two Nodes on a Single, Non-Multihomed, Server), and try out a fail-over scenario. I read that mod_jk module of Apache would help me achieve it.

I have diligently tried to follow JBoss 5.1 Clustering Guide, but still, somehow, somewhere I seem to have fumbled.

Issue
I'm never able to connect to node2 via Apache, and always get a 503 Service Unavailable message.

Configuration
Following are the details.

CentOS 6
Apache 2.2.15
mod_jk 1.2.35
JBoss 5.1.0 GA

httpd.conf has the following line in the end:

#Include mod_jk's specific configuration file
Include conf/mod_jk.conf

mod_jk.conf contains the following line (amongst other things):

JkMountFile conf/uriworkermap.properties

uriworkermap.properites looks like this:

# Simple worker configuration file
# Mount the Servlet context to the ajp13 worker
/jmx-console=loadbalancer
/jmx-console/*=loadbalancer
/web-console=loadbalancer
/web-console/*=loadbalancer
/jspHello=loadbalancer
/jspHello/*=loadbalancer

workers.properties:

# Define list of workers that will be used
# for mapping requests
worker.list=loadbalancer,status

# Define Node1
# modify the host as your host IP or DNS name.
worker.node1.port=8009
worker.node1.host=127.0.0.1
worker.node1.type=ajp13
worker.node1.lbfactor=1

# Define Node2
# modify the host as your host IP or DNS name.
worker.node2.port=8109
worker.node2.host=127.0.0.1
worker.node2.type=ajp13
worker.node2.lbfactor=1


# Load-balancing behaviour
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=node1,node2
worker.loadbalancer.sticky_session=1

# Status worker for managing load balancer
worker.status.type=status

I use the following commands to invoke node1 and node2:

./run.sh -c node1 -g DocsPartition -Djboss.messaging.ServerPeerID=1 -Djboss.service.binding.set=ports-default -b 0.0.0.0

./run.sh -c node2 -g DocsPartition -Djboss.messaging.ServerPeerID=2 -Djboss.service.binding.set=ports-01 -b 0.0.0.0

Also, I have modified server.xml files within each of the node directories and added the corresponding jvmRoute parameter in Engine tag.

Observation
Following are the mod_jk logs:
Initially when I bring up node1, node2 and httpd

...
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] do_shm_open::jk_shm.c (561): Truncated shared memory to 1856
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] do_shm_open::jk_shm.c (606): Initialized shared memory /var/run/httpd/mod_jk.shm.14695 size=1856 free=1728 addr=0x7fe7cb7c3000
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] do_shm_open_lock::jk_shm.c (476): Opened shared memory lock /var/run/httpd/mod_jk.shm.14695.lock
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'ServerRoot' -> '/etc/httpd'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.list' -> 'loadbalancer,status'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.node1.port' -> '8009'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.node1.host' -> '127.0.0.1'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.node1.type' -> 'ajp13'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.node1.lbfactor' -> '1'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.node2.port' -> '8109'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.node2.host' -> '127.0.0.1'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.node2.type' -> 'ajp13'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.node2.lbfactor' -> '1'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.loadbalancer.type' -> 'lb'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.loadbalancer.balance_workers' -> 'node1,node2'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.loadbalancer.sticky_session' -> '0'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_map_dump::jk_map.c (589): Dump of map: 'worker.status.type' -> 'status'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] build_worker_map::jk_worker.c (241): creating worker loadbalancer
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] wc_create_worker::jk_worker.c (145): about to create instance loadbalancer of lb
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] wc_create_worker::jk_worker.c (158): about to validate and init loadbalancer
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] wc_create_worker::jk_worker.c (145): about to create instance node1 of ajp13
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] wc_create_worker::jk_worker.c (158): about to validate and init node1
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_validate::jk_ajp_common.c (2679): worker node1 contact is '127.0.0.1:8009'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2877): setting endpoint options:
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2880): keepalive:              0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2884): socket timeout:         0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2888): socket connect timeout: 0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2892): buffer size:            0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2896): pool timeout:           0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2900): ping timeout:           10000
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2904): connect timeout:        0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2908): reply timeout:          0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2912): prepost timeout:        0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2916): recovery options:       0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2920): retries:                2
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2924): max packet size:        8192
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2928): retry interval:         100
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_create_endpoint_cache::jk_ajp_common.c (2737): setting connection pool size to 1 with min 1 and acquire timeout 200
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] wc_create_worker::jk_worker.c (145): about to create instance node2 of ajp13
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] wc_create_worker::jk_worker.c (158): about to validate and init node2
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_validate::jk_ajp_common.c (2679): worker node2 contact is '127.0.0.1:8109'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2877): setting endpoint options:
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2880): keepalive:              0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2884): socket timeout:         0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2888): socket connect timeout: 0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2892): buffer size:            0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2896): pool timeout:           0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2900): ping timeout:           10000
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2904): connect timeout:        0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2908): reply timeout:          0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2912): prepost timeout:        0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2916): recovery options:       0
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2920): retries:                2
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2924): max packet size:        8192
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_init::jk_ajp_common.c (2928): retry interval:         100
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] ajp_create_endpoint_cache::jk_ajp_common.c (2737): setting connection pool size to 1 with min 1 and acquire timeout 200
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] validate::jk_lb_worker.c (1702): Balanced worker 0 has name node1 and route node1 in domain 
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] validate::jk_lb_worker.c (1702): Balanced worker 1 has name node2 and route node2 in domain 
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] update_mult::jk_lb_worker.c (262): worker node1 gets multiplicity 1
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] update_mult::jk_lb_worker.c (262): worker node2 gets multiplicity 1
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] jk_lb_push::jk_lb_worker.c (353): syncing shm for lb 'loadbalancer' from mem (0->1)
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] build_worker_map::jk_worker.c (241): creating worker status
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] wc_create_worker::jk_worker.c (145): about to create instance status of status
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] wc_create_worker::jk_worker.c (158): about to validate and init status
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] init::jk_status.c (5078): Status worker 'status' is read/write and has css '(null)', prefix 'worker', name space 'jk:', xml name space 'xmlns:jk="http://tomcat.apache.org"', document type '(null)'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [debug] init::jk_status.c (5129): Status worker 'status' has good rating for '0000000f' and bad rating for '00ff1010'
[Thu Sep 06 02:21:54 2012][14695:140633527957472] [info] init_jk::mod_jk.c (3365): mod_jk/1.2.35 initialized
...

And when I try to access the application

...
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] find_match::jk_uri_worker_map.c (949): Found a wildchar match '/jspHello/*=loadbalancer'
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] jk_handler::mod_jk.c (2621): Into handler jakarta-servlet worker=loadbalancer r->proxyreq=0
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] wc_get_worker_for_name::jk_worker.c (115): found a worker loadbalancer
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] wc_maintain::jk_worker.c (338): Maintaining worker loadbalancer
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] wc_get_name_for_type::jk_worker.c (292): Found worker type 'lb'
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] init_ws_service::mod_jk.c (1097): Service protocol=HTTP/1.1 method=GET ssl=false host=(null) addr=::1 name=localhost port=80 auth=(null) user=(null) laddr=::1 raddr=::1 uri=/jspHello/
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] service::jk_lb_worker.c (1192): service sticky_session=0 id='empty'
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] get_most_suitable_worker::jk_lb_worker.c (1061): found best worker node2 (node2) using method 'Request'
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] service::jk_lb_worker.c (1237): service worker=node2 route=node2
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_get_endpoint::jk_ajp_common.c (3146): acquired connection pool slot=0 after 0 retries
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_marshal_into_msgb::jk_ajp_common.c (626): ajp marshaling done
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_service::jk_ajp_common.c (2441): processing node2 with 2 retries
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_send_request::jk_ajp_common.c (1615): (node2) all endpoints are disconnected.
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] jk_open_socket::jk_connect.c (485): socket TCP_NODELAY set to On
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] jk_open_socket::jk_connect.c (609): trying to connect socket 14 to 127.0.0.1:8109
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [info] jk_open_socket::jk_connect.c (627): connect to 127.0.0.1:8109 failed (errno=13)
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [info] ajp_connect_to_endpoint::jk_ajp_common.c (992): Failed opening socket to (127.0.0.1:8109) (errno=13)
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [error] ajp_send_request::jk_ajp_common.c (1621): (node2) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=13)
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [info] ajp_service::jk_ajp_common.c (2614): (node2) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_service::jk_ajp_common.c (2463): retry 1, sleeping for 100 ms before retrying
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_send_request::jk_ajp_common.c (1615): (node2) all endpoints are disconnected.
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] jk_open_socket::jk_connect.c (485): socket TCP_NODELAY set to On
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] jk_open_socket::jk_connect.c (609): trying to connect socket 14 to 127.0.0.1:8109
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [info] jk_open_socket::jk_connect.c (627): connect to 127.0.0.1:8109 failed (errno=13)
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [info] ajp_connect_to_endpoint::jk_ajp_common.c (992): Failed opening socket to (127.0.0.1:8109) (errno=13)
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [error] ajp_send_request::jk_ajp_common.c (1621): (node2) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=13)
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [info] ajp_service::jk_ajp_common.c (2614): (node2) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [error] ajp_service::jk_ajp_common.c (2634): (node2) connecting to tomcat failed.
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_reset_endpoint::jk_ajp_common.c (786): (node2) resetting endpoint with socket -1 (socket shutdown)
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_done::jk_ajp_common.c (3077): recycling connection pool for worker node2 and socket -1
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] service::jk_lb_worker.c (1455): worker node2 escalating local error to global error
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [info] service::jk_lb_worker.c (1469): service failed, worker node2 is in error state
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] service::jk_lb_worker.c (1480): recoverable error... will try to recover on other worker
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] get_most_suitable_worker::jk_lb_worker.c (1061): found best worker node1 (node1) using method 'Request'
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] service::jk_lb_worker.c (1237): service worker=node1 route=node2
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_get_endpoint::jk_ajp_common.c (3146): acquired connection pool slot=0 after 0 retries
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_marshal_into_msgb::jk_ajp_common.c (626): ajp marshaling done
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_service::jk_ajp_common.c (2441): processing node1 with 2 retries
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_send_request::jk_ajp_common.c (1615): (node1) all endpoints are disconnected.
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] jk_open_socket::jk_connect.c (485): socket TCP_NODELAY set to On
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] jk_open_socket::jk_connect.c (609): trying to connect socket 14 to 127.0.0.1:8009
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] jk_open_socket::jk_connect.c (635): socket 14 [127.0.0.1:38909 -> 127.0.0.1:8009] connected
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1175): sending to ajp13 pos=4 len=570 max=8192
[Thu Sep 06 02:24:42 2012][14700:140633527957472] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1175): 0000    12 34 02 36 02 02 00 08 48 54 54 50 2F 31 2E 31  - .4.6....HTTP/1.1
...

As you can see, it tries to connect to node2 twice before servicing the request via node1. This is so till the time node1 is up. As soon as I take it down, I start getting the 503 error when I try to access.

I am not sure whether its a mod_jk issue or am I missing something.

Any suggestions?

Update: It's working with two separate (physical) nodes, but I still could not get it to work on the same box with separate ports (8009 and 8109).

Answer

Tony K. picture Tony K. · Sep 5, 2012

Sounds like the JBoss isn't using the port you think it is. A good diagnostic tool is lsof (which is installable via yum);

lsof -i -n -P

and look to see what ports the JBoss process is listening to. Correct your configs from there.