TCP Receive window

fabrizi0 · Apr 24, 2012

I am trying to understand how the receiver window affects the throughput over a high latency connection.

I have a simple client-server pair of apps on two machines, far apart, with a 250mSec RTT latency on the connection between the two. I ran this test with both Windows (XP, 7) and Linux (Ubuntu 10.x), with the same results, so for simplicity let's assume this case:

  • Client receiving data: WinXP Pro
  • Server sending data: Win7 Pro

Again, latency is 250mSec RTT.

I run my TCP test without changing the receiver buffer size on the client (default is 8 KB), and I see on the wire (using Wireshark):

  • the client sends ACKs to the server, and those TCP packets contain RWIN=65k
  • the server sends data and reports RWIN=65k

Looking at the trace I see bursts of 3-4 packets (each with a payload of 1460 bytes), immediately followed by the ACK sent from the client machine to the server, then nothing for approx 250mSec, then a new burst of packets from the server to the client.

So, in conclusion, it appears that the server stops sending new data well before it fills up the receiver's window.

To do more tests, I also ran the same test, this time changing the receiver's buffer size on the client machine (on Windows, changing the receiver's buffer size ends up affecting the RWIN advertised by the machine). I would expect to see larger bursts of packets before blocking for an ACK... and at least a higher throughput.

In this case I set the recv buffer size to 100,000,000. The packets from the client to the server now have RWIN=99,999,744 (well, that's nice), but unfortunately the pattern of the data sent FROM the server to the client is still the same: a short burst followed by a long wait. To confirm what I see on the wire, I also measured the time to send a chunk of data from the server to the client. I don't see ANY change between using a large RWIN and using the default.
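For reference, a receive buffer request like the one described above is usually made through the SO_RCVBUF socket option. The following is a minimal Python sketch, not the actual test program; make_receiver_socket is a hypothetical helper name, and how the OS rounds, doubles, or caps the request is platform dependent, which is why the sketch reads the value back.

```python
import socket

def make_receiver_socket(rcvbuf_bytes):
    """Create a TCP socket and request a receive buffer of rcvbuf_bytes.

    Returns the socket and the size the OS reports after the request.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the size before connect(): the window scale factor is
    # negotiated in the SYN, so a later change may not be fully usable.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    # The OS may round, double, or cap the request, so read it back
    # instead of assuming the requested value took effect.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted
```

Calling make_receiver_socket(100_000_000) and comparing the returned size with the Win= value seen in Wireshark shows whether the request was honoured in full.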

Can anybody help me understand why changing the RWIN doesn't really affect the throughput?

A few notes:

  • The server sends data as fast as possible, using write() with chunks of 8 KB.
  • As I said before, I see similar effects on Linux as well: changing the receiver buffer size affects the RWIN used by a node, but the throughput remains the same.
  • I analyze the trace after several hundred packets, to give the TCP slow start mechanism enough time to enlarge the CWIN size.


As suggested, I'm adding a small snapshot of a wire trace here:

No.     Time        Source                Destination           Protocol Length Info
     21 2.005080    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=11681 Win=99999744 Len=0
     22 2.005109    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=19305 Ack=1 Win=65536 Len=1460
     23 2.005116    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=20765 Ack=1 Win=65536 Len=1460
     24 2.005121    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=22225 Ack=1 Win=65536 Len=1460
     25 2.005128    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      946    21500 > 57353 [PSH, ACK] Seq=23685 Ack=1 Win=65536 Len=892
     26 2.005154    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=14601 Win=99999744 Len=0
     27 2.007106    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=16385 Win=99999744 Len=0
     28 2.007398    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=24577 Ack=1 Win=65536 Len=1460
     29 2.007401    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=26037 Ack=1 Win=65536 Len=1460
     30 2.007403    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=27497 Ack=1 Win=65536 Len=1460
     31 2.007404    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=28957 Ack=1 Win=65536 Len=1460
     32 2.007406    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=30417 Ack=1 Win=65536 Len=1460
     33 2.007408    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      946    21500 > 57353 [PSH, ACK] Seq=31877 Ack=1 Win=65536 Len=892
     34 2.007883    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=19305 Win=99999744 Len=0
     35 2.257143    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=22225 Win=99999744 Len=0
     36 2.257160    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=24577 Win=99999744 Len=0
     37 2.257358    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=32769 Ack=1 Win=65536 Len=1460
     38 2.257362    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=34229 Ack=1 Win=65536 Len=1460
     39 2.257364    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=35689 Ack=1 Win=65536 Len=1460
     40 2.257365    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=37149 Ack=1 Win=65536 Len=1460

As you can see, the server stops sending data at packet #33.

The client sends an ACK at packet #34 for an old packet (seq=19305, sent on packet #20, not shown here). With an RWIN of 100 MB, I would expect the server NOT to block for a while.

After 20-30 packets, the congestion window on the server side should be large enough to send more packets than I see... I assume the congestion window will eventually grow up to the RWIN... but still, even after hundreds of packets, the pattern is the same: data, data, then block for 250mSec...

Answer

Michael Slade · Apr 24, 2012

I can guess two things from the sample you have provided:

  1. The server has a send buffer of approx 15k.
  2. The dump you provide was done at the server end.

For the window of a TCP connection to scale to a certain size, both the send buffer on the sender and the receive buffer on the receiver must be big enough.

The actual window used is the minimum of the receive window advertised by the receiver and the send buffer size set by the OS on the sender (the congestion window caps it as well, but that is not the limit here).
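To see what that minimum means for throughput, here is a rough back-of-the-envelope sketch. It assumes the sender can have at most one window of data in flight per round trip and ignores the congestion window and the link bandwidth. The function name is made up for illustration, and the 16384-byte send buffer is an estimate read from the trace, not a measured setting.

```python
def max_throughput_bytes_per_sec(rwin, sndbuf, rtt_seconds):
    """Upper bound on throughput when one window is sent per round trip."""
    # The smaller of the two buffers limits the data in flight.
    effective_window = min(rwin, sndbuf)
    return effective_window / rtt_seconds

# Client advertises ~100 MB, server send buffer assumed to be 16 KB,
# round-trip time 250 ms as in the question.
print(max_throughput_bytes_per_sec(99_999_744, 16_384, 0.250))  # 65536.0
```

Under these assumptions the transfer tops out around 64 KB/s no matter how large the receive window is, which matches the observation that raising RWIN alone changed nothing.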

Long story short, you need to configure the send buffer size on the server.
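As a sketch of what that could look like with plain BSD-style sockets in Python: the helper name make_server_socket is hypothetical, and relying on accepted connections inheriting the option from the listening socket is an assumption that holds on common stacks but is worth verifying on yours (setting SO_SNDBUF again on the accepted socket is the safe fallback).

```python
import socket

def make_server_socket(port, sndbuf_bytes, host=""):
    """Create a listening TCP socket that requests a larger send buffer.

    Returns the socket and the send buffer size the OS reports.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Ask for the send buffer before listen(), so that connections
    # accepted later start out with it (assumed inheritance).
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    srv.bind((host, port))
    srv.listen(1)
    # The OS may adjust the request, so report what was actually set.
    granted = srv.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return srv, granted
```

A reasonable starting size is the bandwidth you hope to reach multiplied by the 250 ms round-trip time.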

To clear things up, let's analyse your sample packet by packet.

The server sends another bunch of data:

 22 2.005109    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=19305 Ack=1 Win=65536 Len=1460
 23 2.005116    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=20765 Ack=1 Win=65536 Len=1460
 24 2.005121    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=22225 Ack=1 Win=65536 Len=1460
 25 2.005128    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      946    21500 > 57353 [PSH, ACK] Seq=23685 Ack=1 Win=65536 Len=892

Notice the PSH. That's a flag telling the receiving TCP stack that a complete chunk of data has been sent and that it should be delivered to the application without waiting for more. (A "complete" chunk being your 8kb in this case)

While the server is still sending, it gets 2 ACKS:

 26 2.005154    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=14601 Win=99999744 Len=0
 27 2.007106    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=16385 Win=99999744 Len=0

Note in particular the numbers: Ack=14601 and Ack=16385. An ACK number is the next sequence number the receiver expects, so it acknowledges every byte before it.

Ack=14601 means "I have received everything up to, but not including, seq no 14601".

Note also these are older data, not in the sample you have given.

So the server processes those ACKs and continues sending data:

 28 2.007398    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=24577 Ack=1 Win=65536 Len=1460
 29 2.007401    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=26037 Ack=1 Win=65536 Len=1460
 30 2.007403    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=27497 Ack=1 Win=65536 Len=1460
 31 2.007404    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=28957 Ack=1 Win=65536 Len=1460
 32 2.007406    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=30417 Ack=1 Win=65536 Len=1460
 33 2.007408    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      946    21500 > 57353 [PSH, ACK] Seq=31877 Ack=1 Win=65536 Len=892

Here we have a complete block of data: 1460*5+892 == 8192.
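The same arithmetic, written out as a small sketch: it splits one write into segments of at most 1460 bytes (the payload size seen in the trace) and lists the sequence number each segment starts at. The function name is made up for illustration, and real stacks may segment differently depending on offload and coalescing.

```python
def segments(start_seq, write_size, mss):
    """Split one write into (seq, length) pairs of at most mss bytes."""
    result = []
    offset = 0
    while offset < write_size:
        length = min(mss, write_size - offset)
        result.append((start_seq + offset, length))
        offset += length
    return result

# The 8192-byte block that starts at Seq=24577 (packets 28 to 33).
for seq, length in segments(24577, 8192, 1460):
    print(seq, length)
# 24577 1460
# 26037 1460
# 27497 1460
# 28957 1460
# 30417 1460
# 31877 892
```

The last segment ends at 31877 + 892 = 32769, which is exactly the Seq of packet 37, the start of the next block.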

Then, 0.475 ms after sending that last packet, it gets one more ACK:

 34 2.007883    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=19305 Win=99999744 Len=0

And then there is a delay of almost exactly 250ms, during which the server sends nothing, before it receives these:

 35 2.257143    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=22225 Win=99999744 Len=0
 36 2.257160    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=24577 Win=99999744 Len=0

And then continues sending:

 37 2.257358    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=32769 Ack=1 Win=65536 Len=1460
 38 2.257362    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=34229 Ack=1 Win=65536 Len=1460
 39 2.257364    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=35689 Ack=1 Win=65536 Len=1460
 40 2.257365    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=37149 Ack=1 Win=65536 Len=1460

There are two very interesting things to notice here.
First, how many bytes were sent by the server without waiting for an ACK. The last ACK the server received before that delay is Ack=19305, and the last packet sent by the server at that point is Seq=31877 with Len=892, so it has sent everything up to seq no 32769.

So during that pause, there are 32769-19305=13464 bytes that the server has sent that have not yet been ACKed by the client.

Second, there was one ACK (packet #34), received by the server an instant after it sent a burst of data, that didn't trigger it to send more. It's as if that ACK wasn't good enough.

The ACK received before that was Ack=16385, giving 32769-16385=16384 bytes that were sent by the server unacknowledged at that point. Only after receiving an ACK for seq no 24577, reducing that count to 32769-24577=8192, did the server start sending again.
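This bookkeeping can be replayed against the trace in a few lines. The sketch below is only a model: it counts in-flight bytes up to the end of the last segment sent (for example Seq=31877 plus Len=892, i.e. 32769), assumes a 16384-byte send buffer, and assumes the server queues a new 8192-byte write only when the whole block fits. The buffer size and the all-or-nothing rule are inferences from the capture, not documented behaviour of the server's TCP stack.

```python
SEND_BUFFER = 16384  # assumed send buffer size, inferred from the trace
WRITE_SIZE = 8192    # size of each write() on the server

def replay(next_seq, acks):
    """Replay ACKs and report when a whole write fits in the send buffer.

    Returns a list of (ack, in_flight, room, sends_block) tuples.
    """
    events = []
    for ack in acks:
        in_flight = next_seq - ack      # bytes sent but not yet ACKed
        room = SEND_BUFFER - in_flight  # free space in the send buffer
        sends_block = room >= WRITE_SIZE
        events.append((ack, in_flight, room, sends_block))
        if sends_block:
            next_seq += WRITE_SIZE      # the next 8k block goes out
    return events

# After packet 25 the server has sent everything up to seq no 24577.
for event in replay(24577, [14601, 16385, 19305, 22225, 24577]):
    print(event)
# (14601, 9976, 6408, False)
# (16385, 8192, 8192, True)
# (19305, 13464, 2920, False)
# (22225, 10544, 5840, False)
# (24577, 8192, 8192, True)
```

The two True rows line up with the two points in the capture where the server resumes sending: right after packet 27 and right after packet 36.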

So because the 8k block written by the server is large compared to the effective window size of 16k, throughput is actually reduced somewhat: the server will not send any of an 8k block until there is room for all of it.

Lastly, for those who are wondering, there is a TCP option called window scaling, which allows one end of a connection to declare that the window size is actually some power-of-two multiple of the number in the TCP header. See RFC 1323. The option is passed in the SYN packets, so it isn't visible mid-connection; there is only a hint that window scaling is in effect, because the window size field in the TCP header is smaller than the window that is being used.
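A small sketch of how the scaling arithmetic works: the header field is 16 bits, and the shift count negotiated in the SYN is at most 14. Picking the smallest shift that fits the desired window is an assumption made here for illustration (real stacks choose the shift at connection setup by their own rules), and scale_window is a hypothetical name.

```python
MAX_FIELD = 0xFFFF  # the window field in the TCP header is 16 bits
MAX_SHIFT = 14      # largest shift count the window scale option allows

def scale_window(desired_bytes):
    """Return (shift, header_field, advertised_bytes) for a desired window."""
    shift = 0
    # Use the smallest shift that lets the value fit in 16 bits.
    while (desired_bytes >> shift) > MAX_FIELD and shift < MAX_SHIFT:
        shift += 1
    field = min(desired_bytes >> shift, MAX_FIELD)
    # The peer reconstructs the window by shifting the field back up.
    return shift, field, field << shift

print(scale_window(100_000_000))  # (11, 48828, 99999744)
print(scale_window(65_535))       # (0, 65535, 65535)
```

Under that assumption, a 100,000,000-byte request becomes an advertised window of 99,999,744 bytes, the value Wireshark shows for the client, because the low 11 bits are lost when the value is squeezed into the 16-bit field.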