Python + ZMQ: Operation cannot be accomplished in current state

QuikProBroNa picture QuikProBroNa · Dec 7, 2016 · Viewed 14.8k times · Source

I am trying to get a python program to communicate with another python program via zeromq by using the request-reply pattern. The client program should send a request to the server program which replies.

I have two servers such that when one server fails the other takes over. Communication works perfect when the first server works, however, when the first server fails and when I make a request to the second server, I see the error:

zmp.error.ZMQError: Operation cannot be accomplished in current state

Code of the server 1:

# Run the server
while True:

    # Define the socket using the "Context"
    sock = context.socket(zmq.REP)
    sock.bind("tcp://127.0.0.1:5677")
    data = sock.recv().decode("utf-8")
    res = "Recvd"
    sock.send(res.encode('utf-8'))

Code of the server 2:

# Run the server
while True:

    # Define the socket using the "Context"
    sock = context.socket(zmq.REP)
    sock.bind("tcp://127.0.0.1:5877")
    data = sock.recv().decode("utf-8")
    res = "Recvd"
    sock.send(res.encode('utf-8'))

Code of client:

# ZeroMQ Context For distributed Message amogst processes
context = zmq.Context()
sock_1 = context.socket(zmq.REQ)
sock_2 = context.socket(zmq.REQ)
sock_1.connect("tcp://127.0.0.1:5677")
sock_2.connect("tcp://127.0.0.1:5877")

try:
    sock_1.send(data.encode('utf-8'), zmq.NOBLOCK)
    socks_1.setsockopt(zmq.RCVTIMEO, 1000)
    socks_1.setsockopt(zmq.LINGER, 0)
    data = socks_1.recv().decode('utf-8') #receive data from the main node  

except:
    try:
        #when server one fails
        sock_2.send(data.encode('utf-8'), zmq.NOBLOCK)
        socks_2.setsockopt(zmq.RCVTIMEO, 1000)
        socks_2.setsockopt(zmq.LINGER, 0)
        data = socks_2.recv().decode('utf-8')
    except Exception as e:
         print(str(e))

What is the problem with this approach? How can I resolve this?

Answer

user3666197 picture user3666197 · Dec 7, 2016

Q: How can I resolve this?
A: Avoid the known risk of REQ/REP deadlocking!

While the ZeroMQ is a powerful framework, understanding its internal composition is necessary for robust and reliable distributed systems design and prototyping.

After a closer look, using a common REQ/REP Formal Communication Pattern may leave ( and does leave ) counter-parties in a mutual dead-lock: where one is expecting the other to do a step, which will be never accomplished, and there is no way to escape from the deadlocked state.

For more illustrated details and FSA-schematic diagram, see this post

Next, a fail-over system has to survive any collisions of its own components. Thus, one has to design well the distributed system state-signalling and avoid as many dependencies on element-FSA-design/stepping/blocking as possible, otherwise, the fail-safe behaviour remains just an illusion.

Always handle resources with care, do not consider components of the ZeroMQ smart-signalling/messaging as any kind of "expendable disposables", doing so might be tolerated in scholar examples, not in production system environments. You still have to pay the costs ( time, resources allocations / de-allocations / garbage-collection(s) ). As noted in comments, never let resources creation/allocation without a due control. while True: .socket(); .bind(); .send(); is brutally wrong in principle and deteriorating the rest of the design.