pcap_dispatch - callback processing questions

user270398 · Oct 14, 2011 · Viewed 8.4k times

I am writing a fairly simple pcap "live" capture engine; however, my packet-processing callback for pcap_dispatch is expected to take a relatively long time per packet. Does pcap run every "pcap_handler" callback in a separate thread? If so, is "pcap_handler" thread-safe, or must I protect it with critical sections? Alternatively, does the pcap_dispatch callback run serially, i.e. is "pcap_handler" for packet 2 called only after "pcap_handler" for packet 1 has returned? If so, is there a way to avoid accumulating latency? Thanks, -V

Answer

Raphael R. · Oct 14, 2011

Pcap basically works like this: a kernel-mode driver captures the packets and places them in a buffer of size B. The user-mode application may request any number of packets at any time using pcap_loop, pcap_dispatch, or pcap_next (the last is basically pcap_dispatch with a count of one packet).

Therefore, when you use pcap_dispatch to request some packets, libpcap goes to the kernel and asks for the next packet in the buffer (if there isn't one, the timeout handling kicks in, but that is irrelevant to this discussion), transfers it into userland, and removes it from the buffer. After that, pcap_dispatch calls your handler, decrements its packets-to-do counter, and starts again from the beginning. Your handler therefore runs serially, on the thread that called pcap_dispatch: the handler for packet 2 is not called until the handler for packet 1 has returned. As a result, pcap_dispatch returns only when the requested number of packets has been processed, an error has occurred, or a timeout has happened.
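
To make the serial behaviour concrete, here is a minimal toy model of such a dispatch loop. It is not libpcap's actual source: toy_packet, toy_handler, toy_dispatch, seq_log, and record_seq are hypothetical names invented for this sketch, and the "kernel buffer" is just an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a captured packet (not a libpcap type). */
struct toy_packet {
    int seq;
};

/* Hypothetical callback type, loosely modelled on pcap_handler. */
typedef void (*toy_handler)(void *user, const struct toy_packet *pkt);

/*
 * Toy model of a dispatch loop: take the next packet, call the handler,
 * count it, repeat. A cnt <= 0 means "everything currently buffered".
 * The handler runs on the caller's thread, so calls never overlap.
 * Returns the number of packets handled.
 */
int toy_dispatch(const struct toy_packet *buf, int available, int cnt,
                 toy_handler cb, void *user)
{
    int handled = 0;

    assert(cb != NULL);
    while (handled < available && (cnt <= 0 || handled < cnt)) {
        cb(user, &buf[handled]);   /* blocks until the handler returns */
        handled++;
    }
    return handled;
}

/* Example handler state: a small log of sequence numbers. */
struct seq_log {
    int seen[16];
    int n;
};

/* Example handler: appends each sequence number to the log. */
void record_seq(void *user, const struct toy_packet *pkt)
{
    struct seq_log *log = user;

    if (log->n < 16)
        log->seen[log->n++] = pkt->seq;
}
```

Because each handler call must return before the next packet is fetched, a slow handler delays every packet queued behind it.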

As you can see, libpcap itself is completely non-threaded, as most C APIs are. The kernel-mode driver, however, is evidently able to deliver packets to multiple consumers (otherwise you would not be able to capture from more than one process) and is completely thread-safe (there is a separate buffer for each user-mode handle).

This implies that you must implement all parallelism yourself. You would want to do something like this:

pcap_dispatch(P, count, handler, data);
.
.
.
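/* Requires <pcap.h>, <stdlib.h>, and <string.h>. */
struct pcap_work_item {
    struct pcap_pkthdr header;
    u_char data[];              /* flexible array member: caplen bytes */
};

/* The signature must match pcap_handler (note the const qualifiers). */
void handler(u_char *user, const struct pcap_pkthdr *header, const u_char *data)
{
    struct pcap_work_item *item = malloc(sizeof(*item) + header->caplen);
    if (item == NULL)
        return;                 /* out of memory: drop this packet */
    item->header = *header;
    memcpy(item->data, data, header->caplen);
    queue_work_item(item);      /* whoever processes the item must free() it */
}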

Note that we have to copy the packet onto the heap, because the header and data pointers are not guaranteed to remain valid after the callback returns.

The function queue_work_item should find a worker thread and assign it the task of handling the packet. Since you said that your callback takes a relatively long time, you will likely need a large number of worker threads. Finding a suitable number of workers is a matter of tuning.
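
One possible way to back queue_work_item is a bounded queue protected by a mutex and a condition variable, with worker threads blocking on it. The following is only a sketch using POSIX threads: work_queue, QUEUE_CAPACITY, and the work_queue_* functions are hypothetical names, not part of libpcap, and the capacity is an arbitrary assumption. The push side never waits on a full queue, so the capture callback stays fast and simply drops the packet instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 1024     /* assumed bound; tune for your traffic */

/* Hypothetical bounded FIFO of opaque, non-NULL item pointers. */
struct work_queue {
    void *items[QUEUE_CAPACITY];
    size_t head, tail, count;
    int closed;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
};

void work_queue_init(struct work_queue *q)
{
    int rc;

    q->head = q->tail = q->count = 0;
    q->closed = 0;
    rc = pthread_mutex_init(&q->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&q->not_empty, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Producer side (capture callback). Returns 0 on success, or -1 if the
 * queue is full or closed; it never waits for space to become free. */
int work_queue_push(struct work_queue *q, void *item)
{
    int result = -1;

    pthread_mutex_lock(&q->lock);
    if (!q->closed && q->count < QUEUE_CAPACITY) {
        q->items[q->tail] = item;
        q->tail = (q->tail + 1) % QUEUE_CAPACITY;
        q->count++;
        result = 0;
        pthread_cond_signal(&q->not_empty);   /* wake one waiting worker */
    }
    pthread_mutex_unlock(&q->lock);
    return result;
}

/* Consumer side (worker thread). Blocks until an item is available;
 * returns NULL once the queue has been closed and drained. */
void *work_queue_pop(struct work_queue *q)
{
    void *item = NULL;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count > 0) {
        item = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return item;
}

/* Stops accepting new items and wakes all workers so they can exit. */
void work_queue_close(struct work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}
```

With this in place, queue_work_item could call work_queue_push and free the item when it returns -1. Each worker thread would loop on work_queue_pop, process the item, free it, and exit when it receives NULL.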

At the beginning of this post I said that the kernel-mode driver has a buffer that collects incoming packets awaiting processing. The default size of this buffer is implementation-defined. The snaplen parameter to pcap_open_live only controls how many bytes of each packet are captured; pcap_open_live gives no control over how many packets the buffer can hold. (Newer libpcap versions let you request a buffer size with pcap_set_buffer_size on a handle obtained from pcap_create, before calling pcap_activate.) The buffer might be fixed-size, or it might grow as more packets arrive. Either way, if it overflows, further packets are discarded until there is enough space for the next one. If you want to use your application in a high-traffic environment, make sure that your pcap_dispatch callback completes quickly. My sample callback simply hands the packet to a worker, so it works fine even in high-traffic environments.

I hope this answers all your questions.