What exactly does ntohs() in pcap do?

tabs_over_spaces · Apr 8, 2013 · Viewed 11.2k times

I read the documentation quoted in one of the answers:

The ntohs function takes a 16-bit number in TCP/IP network byte order (the AF_INET or AF_INET6 address family) and returns a 16-bit number in host byte order.

Please explain with an example: what is network byte order and what is host byte order?

Answer

user862787 · Apr 8, 2013

The number 1000 is, in binary, 1111101000.

If that's in a 16-bit binary number, that's 0000001111101000.

If that's split into two 8-bit bytes, that's two bytes with the values 00000011 and 11101000.
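To make that split concrete, here's a minimal C sketch (the variable names upper and lower are mine, just for illustration) that pulls the two bytes out of 1000 with a shift and a mask:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint16_t n = 1000;               /* 0000001111101000 = 0x03E8 */
        unsigned char upper = n >> 8;    /* 00000011 = 0x03 */
        unsigned char lower = n & 0xFF;  /* 11101000 = 0xE8 */
        printf("upper byte: 0x%02X, lower byte: 0x%02X\n", upper, lower);
        return 0;
    }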

Those two bytes can be in two different orders:

  • In "big-endian" byte order, the byte containing the upper 8 bits is first, and the byte with the lower 8 bits is second, so the first byte is 00000011 and the second byte is 11101000.
  • In "little-endian" byte order, the byte containing the lower 8 bits is first, and the byte containing the upper 8 bits is second, so the first byte is 11101000 and the second byte is 00000011.

In a byte-addressable machine, the hardware can be "big-endian" or "little-endian", depending on which byte is stored at the lower address in memory. Most personal computers are little-endian; larger computers come in both big-endian and little-endian flavors, with a number of larger computers (IBM mainframes and midrange computers and SPARC servers, for example) being big-endian.
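If you want to see which flavor your own machine is, here's a small sketch that stores the 16-bit value 1000 and checks which byte ends up at the lower address (reading an object through an unsigned char pointer is well-defined C):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint16_t value = 1000;  /* 0x03E8 */
        const unsigned char *bytes = (const unsigned char *)&value;

        /* On a big-endian machine byte 0 holds the upper 8 bits (0x03);
           on a little-endian machine it holds the lower 8 bits (0xE8). */
        if (bytes[0] == 0x03)
            printf("this machine is big-endian\n");
        else
            printf("this machine is little-endian\n");
        return 0;
    }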

Most networks are bit-serial, so the bits are transmitted one after the other. The bits of a byte might be transmitted most-significant or least-significant bit first, but the network hardware hides those details from the processor. However, bytes are transmitted in the order in which they sit in the host's memory, so if a little-endian machine transmits a multi-byte number to a big-endian machine, the number will look different on the receiving big-endian machine; that difference is not hidden by the network hardware.

Therefore, in order to allow big-endian and little-endian machines to communicate, at each protocol layer, either:

  • a "standard" byte order needs to be chosen, and machines using a different byte order need to move the bytes of multi-byte numbers around, so that they're not in the machine's standard byte order, before transmitting data, move them around, so that they are in the machine's standard byte order, after receiving data;
  • the two machines need to negotiate a particular byte order for the session (for example, for the X11 network windowing protocol, the initial message from the client to the server specifies the byte order to use);
  • the protocol messages need to specify the byte order being used (as is done with DCE RPC, for example; that's the protocol used for "Microsoft RPC");
  • the receiving machine needs to somehow correctly guess the byte order (I don't know of any currently-used protocols where that's done, but the old BSD "talk" protocol didn't use any of the techniques listed above, and the implementation on the Sun386i had to guess in order to handle both big-endian Motorola 68K machines and little-endian Intel x86 machines).

Various Internet protocols use the first strategy, specifying big-endian as the byte order; it's referred to as "network byte order" in various RFCs. (Microsoft's SMB file access protocol also uses the first strategy, but specifies little-endian.)
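That first strategy is what htons() and ntohs() implement for 16-bit values; here's a minimal sketch of the sender/receiver round trip (the value 1000 standing in for, say, a port number):

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>   /* htons()/ntohs(); use winsock2.h on Windows */

    int main(void) {
        uint16_t port = 1000;

        /* Sender side: host byte order -> network byte order
           (a no-op on big-endian hosts, a byte swap on little-endian ones). */
        uint16_t on_the_wire = htons(port);

        /* Receiver side: network byte order -> host byte order. */
        uint16_t received = ntohs(on_the_wire);

        printf("host: %d, after htons(): %d, after ntohs(): %d\n",
               port, on_the_wire, received);
        return 0;
    }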

So "network byte order" is big-endian. "Host byte order" is the byte order of the machine you're using; it could be big-endian, in which case ntohs() just returns the value you gave it, or it could be little-endian, in which case ntohs() swaps the two bytes of the value you gave it and returns that value. For example, on a big-endian machine, ntohs(1000) would return 1000, and, on a little-endian machine, ntohs(1000) would swap the high-order and low-order bytes, giving 1110100000000011 in binary or 59395 in decimal.