How to monitor Linux UDP buffer available space?

Yoni Roit picture Yoni Roit · Feb 18, 2010 · Viewed 89.2k times · Source

I have a java app on linux which opens UDP socket and waits for messages.

After couple of hours under heavy load, there is a packet loss, i.e. the packets are received by kernel but not by my app (we see the lost packets in sniffer, we see UDP packets lost in netstat, we don't see those packets in our app logs).

We tried enlarging socket buffers but this didnt help - we started losing packets later then before, but that's it.

For debugging, I want to know how full the OS udp buffer is, at any given moment. Googled, but didn't find anything. Can you help me?

P.S. Guys, I'm aware that UDP is unreliable. However - my computer receives all UDP messages, while my app is unable to consume some of them. I want to optimize my app to the max, that's the reason for the question. Thanks.

Answer

RickS picture RickS · Aug 19, 2013

UDP is a perfectly viable protocol. It is the same old case of the right tool for the right job!

If you have a program that waits for UDP datagrams, and then goes off to process them before returning to wait for another, then your elapsed processing time needs to always be faster than the worst case arrival rate of datagrams. If it is not, then the UDP socket receive queue will begin to fill.

This can be tolerated for short bursts. The queue does exactly what it is supposed to do – queue datagrams until you are ready. But if the average arrival rate regularly causes a backlog in the queue, it is time to redesign your program. There are two main choices here: reduce the elapsed processing time via crafty programming techniques, and/or multi-thread your program. Load balancing across multiple instances of your program may also be employed.

As mentioned, on Linux you can examine the proc filesystem to get status about what UDP is up to. For example, if I cat the /proc/net/udp node, I get something like this:

$ cat /proc/net/udp   
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops             
  40: 00000000:0202 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 3466 2 ffff88013abc8340 0           
  67: 00000000:231D 00000000:0000 07 00000000:0001E4C8 00:00000000 00000000  1006        0 16940862 2 ffff88013abc9040 2237    
 122: 00000000:30D4 00000000:0000 07 00000000:00000000 00:00000000 00000000  1006        0 912865 2 ffff88013abc8d00 0         

From this, I can see that a socket owned by user id 1006, is listening on port 0x231D (8989) and that the receive queue is at about 128KB. As 128KB is the max size on my system, this tells me my program is woefully weak at keeping up with the arriving datagrams. There have been 2237 drops so far, meaning the UDP layer cannot put any more datagrams into the socket queue, and must drop them.

You could watch your program's behaviour over time e.g. using:

watch -d 'cat /proc/net/udp|grep 00000000:231D'

Note also that the netstat command does about the same thing: netstat -c --udp -an

My solution for my weenie program, will be to multi-thread.

Cheers!