Docker Swarm host cannot resolve hosts on other nodes

simbro · Oct 5, 2018 · Viewed 7.2k times

I am following this very excellent tutorial: https://github.com/binblee/springcloud-swarm

When I deploy a stack to a Docker swarm that contains a single node (just the manager node), it works perfectly.

docker stack deploy -c all-in-one.yml springcloud-demo

I have four docker containers, one of them is Eureka service discovery, which all the other three containers register with successfully.

The problem is when I add a worker node to the swarm, then two of the containers will be deployed to the worker, and two to the manager, and the services deployed to the worker node cannot find the Eureka server.

java.net.UnknownHostException: eureka: Name does not resolve

This is my compose file:

version: '3'
services:
  eureka:
    image: demo-eurekaserver
    ports:
      - "8761:8761"

  web:
    image: demo-web
    environment:
      - EUREKA_SERVER_ADDRESS=http://eureka:8761/eureka

  zuul:
    image: demo-zuul
    environment:
      - EUREKA_SERVER_ADDRESS=http://eureka:8761/eureka
    ports:
      - "8762:8762"

  bookservice:
    image: demo-bookservice
    environment:
      - EUREKA_SERVER_ADDRESS=http://eureka:8761/eureka

Also, I can only access the Eureka service discovery server on the host on which it is deployed.

I thought that using "docker stack deploy" automatically creates an overlay network, in which all exposed ports will be routed to a host on which the respective service is running:

From https://docs.docker.com/engine/swarm/ingress/ :

All nodes participate in an ingress routing mesh. The routing mesh enables each node in the swarm to accept connections on published ports for any service running in the swarm, even if there’s no task running on the node.

This is the output of docker service ls:

manager:~/springcloud-swarm/compose$ docker service ls

ID                  NAME                           MODE                REPLICAS            IMAGE                                                  PORTS
rirdysi0j4vk        springcloud-demo_bookservice   replicated          1/1                 demo-bookservice:latest
936ewzxwg82l        springcloud-demo_eureka        replicated          1/1                 demo-eurekaserver:latest   *:8761->8761/tcp
lb1p8nwshnvz        springcloud-demo_web           replicated          1/1                 demo-web:latest
0s52zecjk05q        springcloud-demo_zuul          replicated          1/1                 demo-zuul:latest           *:8762->8762/tcp

and of docker stack ps springcloud-demo:

manager:$ docker stack ps springcloud-demo
ID                  NAME                             IMAGE                      NODE            DESIRED STATE       CURRENT STATE        
o8aed04qcysy        springcloud-demo_web.1           demo-web:latest            workernode      Running             Running 2 minutes ago
yzwmx3l01b94        springcloud-demo_eureka.1        demo-eurekaserver:latest   managernode     Running             Running 2 minutes ago
rwe9y6uj3c73        springcloud-demo_bookservice.1   demo-bookservice:latest    workernode      Running             Running 2 minutes ago
iy5e237ca29o        springcloud-demo_zuul.1          demo-zuul:latest           managernode     Running             Running 2 minutes ago

UPDATE:

I successfully added another host, but now I can't add a third. I tried a couple of times, following the same steps (installing docker, opening the requisite ports, joining the swarm), but the node cannot find the Eureka server with the container host name.

UPDATE 2:

In testing that the ports were opened, I examined the firewall config:

workernode:~$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
8080                       ALLOW       Anywhere
4789                       ALLOW       Anywhere
7946                       ALLOW       Anywhere
2377                       ALLOW       Anywhere
8762                       ALLOW       Anywhere
8761                       ALLOW       Anywhere
22                         ALLOW       Anywhere

However - when I try to hit port 2377 on the worker node from the manager node, I can't:

managernode:~$ telnet xx.xx.xx.xx 2377

Trying xx.xx.xx.xx...
telnet: Unable to connect to remote host: Connection refused

Answer

Mani · Oct 11, 2018

Let us break the solution into parts. Each part covers one piece of the picture, and the parts are interconnected.

Docker container network

Whenever we create a container without specifying a network, docker attaches it to the default bridge network. According to the Docker networking documentation, service discovery is unavailable on the default bridge network. Hence, to make service discovery work properly, we are supposed to create a user-defined network, as it provides isolation, DNS resolution and many more features. All of this applies when we use the docker run command.

When docker-compose is used to run a container and no network is specified, it creates its own bridge network, which has all the properties of a user-defined network.

These bridge networks are not attachable by default, but they allow docker containers on the local machine to connect to them.

Docker swarm network

In Docker swarm mode, whenever we deploy a service without specifying an external network, its published ports are served through the ingress network (the swarm mode routing mesh).

When you specify an external overlay network, you will notice that the created overlay network is available only on the manager, and not on a worker node until a task of a service that uses the network is scheduled there. Overlay networks are also not attachable by default and do not allow containers outside swarm services to connect to them. So you don't need to declare a network as attachable unless you want to connect a container to it from outside the swarm.

Docker Swarm

As there is no predefined or official limit on the number of worker or manager nodes, you should be able to connect from the third node. One possibility is that the node joined as a worker and you are trying to start a container directly on it; that is refused if the overlay network is not attachable.

Moreover, you can't deploy a service directly on a worker node. All services are deployed through a manager node, which takes care of replicating and scaling them based on the config and mode provided.

Firewall

As mentioned in Getting started with swarm mode:

  • TCP port 2377 for cluster management communications
  • TCP and UDP port 7946 for communication among nodes
  • UDP port 4789 for overlay network traffic
  • IP protocol 50 (ESP) for encrypted overlay networks

These ports should be whitelisted for communication between nodes. Most firewalls need to be reloaded after you make changes. This is done by passing a reload option to the firewall, and the exact command varies between Linux distributions. ufw doesn't need to be reloaded for rules added on the command line, but it does need a commit if rules are added in its rules files.
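As a rough sketch of the port list above for ufw (the firewall shown in the question), the script below prints one `ufw allow` command per port so they can be reviewed before being applied. The function and variable names are made up for illustration, and the ESP rule for encrypted overlays is deliberately left out.

```shell
#!/bin/sh
# Swarm ports from the list above, written as ufw "port/protocol" specs.
# IP protocol 50 (ESP) is not covered here; it only matters for encrypted overlays.
SWARM_RULES="2377/tcp 7946/tcp 7946/udp 4789/udp"

# Print one ufw command per rule so they can be reviewed before running.
# (swarm_ufw_commands is a hypothetical helper name.)
swarm_ufw_commands() {
  for rule in $SWARM_RULES; do
    echo "ufw allow $rule"
  done
}

# Preview only. To apply on a node, pipe the output to a root shell:
#   swarm_ufw_commands | sudo sh
swarm_ufw_commands
```

The same rules would need to be present on every node in the swarm, managers and workers alike.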

Extra steps to be followed in firewall

Apart from whitelisting the above ports, you may need to whitelist the IP ranges of docker0, docker_gwbridge and br-123456 with a netmask of 16. Otherwise service discovery will not work within the same host machine. That is, if you try to connect to eureka at 192.168.0.12 while the eureka service is running on that same 192.168.0.12, the connection will fail because the firewall blocks the traffic. See "NO ROUTE TO HOST network request from container to host-ip:port published from other container".
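One way to find the ranges to whitelist is to ask Docker for the subnet of each bridge network rather than guessing it. The sketch below assumes ufw and the network names `bridge` (the network behind docker0) and `docker_gwbridge`; the helper name is hypothetical, and it only prints the commands.

```shell
#!/bin/sh
# Print a ufw command allowing traffic from the subnet(s) of each named
# Docker network. Subnets are read from `docker network inspect`.
# (bridge_ufw_commands is a hypothetical helper name.)
bridge_ufw_commands() {
  for net in "$@"; do
    subnets=$(docker network inspect "$net" \
      --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}') || continue
    for subnet in $subnets; do
      echo "ufw allow from $subnet"
    done
  done
}

# Example, run on each node and review before applying:
#   bridge_ufw_commands bridge docker_gwbridge
```

Compose-created bridges (the br-123456 style interfaces) can be passed by their Docker network name in the same way.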

Java

Java sometimes behaves oddly and throws java.net.MalformedURLException or similar exceptions even when the name resolves. I ran into such a case myself: ping resolved the host properly, but Java RMI threw an error. As a workaround, you can define your own custom alias when you attach to a user-defined network.

Docker service discovery

By default, you can resolve a service by its container name. Apart from that, you can also resolve a service as <container_name>.<network_name>. Of course, you can define an alias as well, and you can even resolve it as <alias_name>.<network_name>.
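To check whether a name actually resolves from inside a container, something like the following could be run on the node where the failing task is placed (workernode in the question). It is only a sketch: `check_dns` is a hypothetical helper, and it assumes the image ships `nslookup`, which BusyBox-based images usually do.

```shell
#!/bin/sh
# Resolve a hostname from inside a running container on this node.
# Usage: check_dns <container-name-filter> <hostname>
# (check_dns is a hypothetical helper name.)
check_dns() {
  # Pick the first running container whose name contains the filter.
  cid=$(docker ps --quiet --filter "name=$1" | head -n 1)
  if [ -z "$cid" ]; then
    echo "no running container matches '$1' on this node" >&2
    return 1
  fi
  # Assumption: nslookup is available inside the image.
  docker exec "$cid" nslookup "$2"
}

# Example:
#   check_dns springcloud-demo_web eureka
```

If the lookup fails on the worker but succeeds on the manager, the overlay traffic between the nodes is the first thing to suspect.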

Solution

So you should create a user-defined overlay network after joining the swarm, and only then deploy the services. In the compose file, each service should reference that network as an external network, and you should also make the firewall changes described above.

If you want to allow external containers to connect to the network, you should make the network attachable.

Since you haven't provided enough details on what's happening with the third server, I assume that you are trying to deploy a container there and that it is denied by the overlay network because the network is not attachable.
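A minimal sketch of what that could look like for the compose file in the question. The network name `springcloud-net` and the file name `all-in-one-overlay.yml` are made up for illustration; the script only writes the file, and the docker commands to run on a manager node are left as comments.

```shell
#!/bin/sh
# Hypothetical names, not taken from the tutorial.
NETWORK=springcloud-net
COMPOSE_FILE=all-in-one-overlay.yml

# Write a compose file that attaches every service to the pre-created
# overlay network instead of relying on a default one.
cat > "$COMPOSE_FILE" <<EOF
version: '3'
services:
  eureka:
    image: demo-eurekaserver
    ports:
      - "8761:8761"
    networks:
      - $NETWORK

  web:
    image: demo-web
    environment:
      - EUREKA_SERVER_ADDRESS=http://eureka:8761/eureka
    networks:
      - $NETWORK

  zuul:
    image: demo-zuul
    environment:
      - EUREKA_SERVER_ADDRESS=http://eureka:8761/eureka
    ports:
      - "8762:8762"
    networks:
      - $NETWORK

  bookservice:
    image: demo-bookservice
    environment:
      - EUREKA_SERVER_ADDRESS=http://eureka:8761/eureka
    networks:
      - $NETWORK

networks:
  $NETWORK:
    external: true
EOF

# Then, on a manager node:
#   docker network create --driver overlay --attachable "$NETWORK"
#   docker stack deploy -c "$COMPOSE_FILE" springcloud-demo
```

The `--attachable` flag is only needed if standalone containers outside the stack should join the network.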