Why is select() using so much CPU?

user180574 · Nov 2, 2013 · Viewed 7k times

I am writing a network communication program using non-blocking sockets (C/C++) and select(). The program is pretty big, so I cannot upload the source code. In a very aggressive testing session, my test code opens and closes both TCP and UDP sockets frequently. Every time, one end eventually stops responding and sits at 98 or 99% CPU usage. When I attach gdb, "bt" shows the following:

0x00007f1b71b59ac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
82  ../sysdeps/unix/syscall-template.S: No such file or directory.
    in ../sysdeps/unix/syscall-template.S

What type of error could it be?

$ uname -a
Linux kiosk2 2.6.32-34-generic #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011 x86_64 GNU/Linux

Answer

Jeremy Friesner · Nov 2, 2013

It's impossible to say without looking at the code, but when a select()-based loop starts spinning at ~100% CPU usage, it's often because one or more of the sockets you told select() to watch are ready-for-read (and/or ready-for-write), so select() returns right away instead of blocking, but then the code neglects to actually recv() (or send()) any data on that socket. Having read or written nothing, your event loop tries to go back to sleep by calling select() again. Of course the socket's data (or buffer space, in the ready-for-write case) is still there waiting to be handled, so select() returns immediately again, the buggy code skips the recv() (or send()) again, and around and around we go at top speed :)
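To make the failure mode concrete, here is a minimal sketch in C, assuming a non-blocking stream socket. The helper names (make_nonblocking, is_readable_now, drain_readable) are hypothetical and not taken from the asker's program. It shows that select() keeps reporting the socket as readable until the pending data has actually been consumed with recv().

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode. Returns 0 on success. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: ask select() whether fd is readable right now.
 * The zero timeout is deliberate here: this is a one-off probe for the
 * demonstration, not something to call in a loop. */
static int is_readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv = {0, 0};

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv) > 0 && FD_ISSET(fd, &rfds);
}

/* Hypothetical helper: read everything currently buffered on a
 * non-blocking socket. Returns the number of bytes consumed, or -1 on a
 * real error. *peer_closed is set to 1 if recv() reported end-of-stream,
 * in which case the caller should close fd and stop watching it. */
static long drain_readable(int fd, int *peer_closed)
{
    char buf[4096];
    long total = 0;

    *peer_closed = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;          /* a real program would process buf here */
            continue;
        }
        if (n == 0) {            /* orderly shutdown by the peer */
            *peer_closed = 1;
            return total;
        }
        if (errno == EINTR)
            continue;            /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;        /* nothing left to read for now */
        return -1;
    }
}
```

Note the end-of-stream case: a socket whose peer has closed also stays readable forever, so a loop that ignores recv() returning 0 and keeps the descriptor in its read set will spin in the same way.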

Another possibility is that you are passing select() a timeout value that is either zero or near-zero, causing select() to return very quickly even when no sockets are ready for anything. That often happens when people forget to re-initialize the timeval struct before each call to select(). You need to re-initialize the timeval struct each time because some implementations of select() modify it before returning; on Linux, select() updates it to reflect the time that was not slept, so after one full timeout the struct is left at zero.

My suggestion is to put some printf calls (or your favorite equivalent) immediately before and immediately after your call to select(), and watch that output as you reproduce the fault. That will show you whether the spinning is happening inside a single call to select(), or whether something is causing select() to return immediately over and over again.
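One way to add that instrumentation without touching every call site is a thin wrapper. This is only a sketch: traced_select is a hypothetical name, and the log format is arbitrary. It prints to stderr on entry and on return, with the return value and the time spent inside select().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical drop-in wrapper around select() that logs each call. */
static int traced_select(int nfds, fd_set *rfds, fd_set *wfds,
                         fd_set *efds, struct timeval *timeout)
{
    struct timeval before, after;

    gettimeofday(&before, NULL);
    if (timeout != NULL)
        fprintf(stderr, "select: enter nfds=%d timeout=%ld.%06lds\n",
                nfds, (long)timeout->tv_sec, (long)timeout->tv_usec);
    else
        fprintf(stderr, "select: enter nfds=%d timeout=none\n", nfds);

    int rc = select(nfds, rfds, wfds, efds, timeout);
    int saved_errno = errno;     /* fprintf may clobber errno */

    gettimeofday(&after, NULL);
    long elapsed_us = (long)(after.tv_sec - before.tv_sec) * 1000000L
                    + (long)(after.tv_usec - before.tv_usec);

    if (rc < 0)
        fprintf(stderr, "select: return -1 (%s) after %ld us\n",
                strerror(saved_errno), elapsed_us);
    else
        fprintf(stderr, "select: return %d after %ld us\n", rc, elapsed_us);

    errno = saved_errno;         /* preserve errno for the caller */
    return rc;
}
```

If the log shows a flood of enter/return pairs that each take a few microseconds, select() is returning immediately: a return value greater than 0 points at a ready descriptor that is never serviced, while a return value of 0 with a zero timeout on entry points at a stale timeval.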