I can't understand polling/select in python

jokoon picture jokoon · Sep 18, 2011 · Viewed 11.2k times · Source

I'm doing some threaded asynchronous networking experiment in python, using UDP.

I'd like to understand polling and the select python module, I've never used them in C/C++.

What are those for ? I kind of understand a little select, but does it block while watching a resource ? What is the purpose of polling ?

Answer

cizixs picture cizixs · Mar 28, 2015

Okay, one question a time.

What are those for?

Here is a simple socket server skeleton:

s_sock = socket.socket()
s_sock.bind()
s_sock.listen()

while True:
    c_sock, c_addr = s_sock.accept()
    process_client_sock(c_sock, c_addr)

Server will loop and accept connection from a client, then call its process function to communicate with client socket. There is a problem here: process_client_sock might takes a long time, or even contains a loop(which is often the case).

def process_client_sock(c_sock, c_addr):
    while True:
        receive_or_send_data(c_sock)

In which case, the server is unable to accept any more connections.

A simple solution would be using multi-process or multi-thread, just create a new thread to deal with request, while the main loop keeps listening on new connections.

s_sock = socket.socket()
s_sock.bind()
s_sock.listen()

while True:
    c_sock, c_addr = s_sock.accept()
    thread = Thread(target=process_client_sock, args=(c_sock, c_addr))
    thread.start()

This works of course, but not well enough considering performance. Because new process/thread takes extra CPU and memory, not idle for servers might get thousands connections.

So select and poll system calls tries to solve this problem. You give select a set of file descriptors and tell it to notify you if any fd is ready to read/write/ or exception happens.

does it(select) block while watching a resource?

Yes, or no depends on the parameter you passed to it.

As select man page says, it will get struct timeval parameter

int select(int nfds, fd_set *readfds, fd_set *writefds,
       fd_set *exceptfds, struct timeval *timeout);

struct timeval {
long    tv_sec;         /* seconds */
long    tv_usec;        /* microseconds */
};

There are three cases:

  1. timeout.tv_sec == 0 and timeout.tv_usec = 0

    No-blocking, return immediately

  2. timeout == NULL

    block forever until a file descriptor is ready.

  3. timeout is normal

    wait for certain time, if still no file descriptor is available, timeout and return.

What is the purpose of polling ?

Put it into simple words: polling frees CPU for other works when waiting for IO.

This is based on the simple facts that

  1. CPU is way more faster than IO
  2. waiting for IO is a waste of time, because for the most time, CPU will be idle

Hope it helps.