HTTP packet reconstruction

mike picture mike · Oct 7, 2009 · Viewed 13.3k times · Source

If I have a large HTTP packet which has been split up into a number of TCP packets, how can I reconstruct them back into a single HTTP packet? Basically, where in the packet do I look to tell when a HTTP packet is starting/ending? I can't seem to see any flags/fields in the TCP header that denote the start or end of the HTTP packet.

EDIT: In follow up to the responses. If TCP manages the stream, how does it know when the stream starts and ends? Is that determined by the socket opening and closing? Some protocol, at some level, must be able to know when the HTTP stream/packet has started and ended. That is what I would like to know.

The situation I am in is I am using a packet sniffer in C# which reads in TCP packets, and I would like to be able to reconstruct the HTTP requests/responses/etc. going through the interface like how wireshark and various other sniffers manage to. Alternatively are there any C# libraries that let you tap into the HTTP streams at the higher level, saving me having to reconstruct the HTTP stream/packets myself?

Thanks.

Answer

mike picture mike · Oct 8, 2009

OK I worked out how to do this (dodgy but it gets the job done).

It is simple to strip away the Ethernet, IP, and TCP headers leaving you with the 'raw' data message. Looking inside the message, it is easy to detect whether it is the start of a HTTP packet by looking for the "HTTP/1.1 ..." at the start of the packet. This indicates the packet is the start of a HTTP stream/larger packet/whatever. You can also do some simple parsing to read the "Content-Length" field which is the total length of the entire HTTP packet.

You can also use the Source/Destination IP & Port numbers to form a unique ID for the link. So after receiving the header packet, take note of these 4 things (SRCIP, SRCPORT, DESTIP, DESTPORT). Next time you receive a packet matching this port/ip combo, you can check whether it's the next part of the HTTP packet. You can use the sequence numbers to do some validation and probably other stuff, but generally the packets are in order so it's OK. I think a new port is opened for each HTTP stream so you shouldn't receive random packets that aren't part of the stream, but this could be an area prone for error.

Anyway, once you received this packet, once again strip away the headers and get the raw message. Add it onto the already known part of the message. If the length of the total message received so far is equal to the length read from "Content-Length" field, the packet is complete!

This method is obviously prone to a huge amount of errors, but I am not after an extremely robust way of doing it. I thought I would answer my own question in case someone else comes across this same issue in the future! Good luck with your sniffing :D