I am running iperf measurements between two servers connected through a 10 Gbit link. I am trying to correlate the maximum window size that I observe with the system configuration parameters.
In particular, I have observed that the maximum window size is 3 MiB. However, I cannot find the corresponding values in the system files.
By running sysctl -a I get the following values:
net.ipv4.tcp_rmem = 4096 87380 6291456
net.core.rmem_max = 212992
The first value tells us that the maximum receive buffer size is 6 MiB (6291456 bytes). However, TCP tends to allocate twice the requested size, so the maximum receiver window size should be 3 MiB, exactly as I have measured it. From man tcp:
Note that TCP actually allocates twice the size of the buffer requested in the setsockopt(2) call, and so a succeeding getsockopt(2) call will not return the same size of buffer as requested in the setsockopt(2) call. TCP uses the extra space for administrative purposes and internal kernel structures, and the /proc file values reflect the larger sizes compared to the actual TCP windows.
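The unit conversions can be checked with a short Python sketch. It parses the two sysctl lines quoted in the question and applies the halving argument exactly as the question states it. The helper name parse_sysctl is made up for illustration, and the factor of two is the question's reading of the man page, not something this snippet verifies.

```python
# Values copied from the sysctl output quoted in the question.
SYSCTL_OUTPUT = """\
net.ipv4.tcp_rmem = 4096 87380 6291456
net.core.rmem_max = 212992
"""


def parse_sysctl(text):
    """Parse 'key = v1 v2 ...' lines into {key: [int, ...]} (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = [int(v) for v in value.split()]
    return settings


settings = parse_sysctl(SYSCTL_OUTPUT)
tcp_rmem_max = settings["net.ipv4.tcp_rmem"][2]  # third field: per-socket maximum
core_rmem_max = settings["net.core.rmem_max"][0]

print(tcp_rmem_max / 2**20)      # buffer maximum in MiB
print(tcp_rmem_max / 2 / 2**20)  # half of it, following the question's reasoning
print(core_rmem_max / 2**10)     # net.core.rmem_max in KiB
```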
However, the second value, net.core.rmem_max, states that the maximum receiver window size cannot be more than 208 KiB. And this is supposed to be the hard limit, according to man tcp:
tcp_rmem max: the maximum size of the receive buffer used by each TCP socket. This value does not override the global net.core.rmem_max. This is not used to limit the size of the receive buffer declared using SO_RCVBUF on a socket.
So how come I observe a maximum window size larger than the one specified in net.core.rmem_max?
NB: I have also calculated the bandwidth-delay product, window_size = Bandwidth x RTT, which comes to about 2.4 MiB (10 Gbps at 2 ms RTT), roughly consistent with the 3 MiB seen in my traffic capture.
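For reference, the bandwidth-delay arithmetic can be written out as a small Python sketch. The function name bdp_bytes is illustrative, and the inputs are the 10 Gbps and 2 ms figures given in the question. The result is about 2.4 MiB, the same order of magnitude as the 3 MiB window observed.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes, using integer arithmetic."""
    bits_in_flight = bandwidth_bits_per_s * rtt_ms // 1000
    return bits_in_flight // 8


# Figures from the question: 10 Gbit/s link, 2 ms round-trip time.
bdp = bdp_bytes(10_000_000_000, 2)
print(f"{bdp} bytes = {bdp / 2**20:.2f} MiB")
```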
A quick search turned up the following in void tcp_select_initial_window():
    if (wscale_ok) {
        /* Set window scaling on max possible window
         * See RFC1323 for an explanation of the limit to 14
         */
        space = max_t(u32, sysctl_tcp_rmem[2], sysctl_rmem_max);
        space = min_t(u32, space, *window_clamp);
        while (space > 65535 && (*rcv_wscale) < 14) {
            space >>= 1;
            (*rcv_wscale)++;
        }
    }
max_t takes the higher value of the arguments. So the bigger value takes precedence here.
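To see what that loop does with the values from the question, here is a Python transcription of the excerpt. It is only a sketch: the function name select_rcv_wscale is made up, and window_clamp is assumed to impose no limit (NO_CLAMP), which the question does not state.

```python
def select_rcv_wscale(tcp_rmem_max, rmem_max, window_clamp):
    """Mirror the window-scale loop from the kernel excerpt above."""
    space = max(tcp_rmem_max, rmem_max)  # max_t: the larger setting wins
    space = min(space, window_clamp)     # min_t: apply the clamp
    rcv_wscale = 0
    while space > 65535 and rcv_wscale < 14:
        space >>= 1
        rcv_wscale += 1
    return rcv_wscale


NO_CLAMP = 2**32 - 1  # assumption: window_clamp does not restrict anything

# Values from the question: tcp_rmem[2] = 6291456, rmem_max = 212992.
wscale = select_rcv_wscale(6291456, 212992, NO_CLAMP)
print(wscale)           # 7
print(65535 << wscale)  # largest window advertisable with that scale, in bytes

# Hypothetical comparison: what the scale would be if only rmem_max counted.
print(select_rcv_wscale(212992, 212992, NO_CLAMP))  # 2
```

With a scale factor of 7 the 16-bit window field can advertise up to roughly 8 MiB, so a 3 MiB window fits. With a factor of 2 it would top out near 256 KiB.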
One other reference to sysctl_rmem_max is made where it is used to limit the argument to SO_RCVBUF (in net/core/sock.c).
All other TCP code refers to sysctl_tcp_rmem only.
So without looking deeper into the code you can conclude that a bigger net.ipv4.tcp_rmem will override net.core.rmem_max in all cases except when setting SO_RCVBUF explicitly (a check that a privileged process can bypass using SO_RCVBUFFORCE).
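The SO_RCVBUF exception can be observed from user space. The sketch below is a minimal Python probe whose function name effective_rcvbuf is made up for illustration. It sets SO_RCVBUF on a fresh TCP socket and reads the value back. On Linux one would expect the result to be twice the request, capped at twice net.core.rmem_max, but the exact numbers depend on the kernel and its settings, so none are shown here.

```python
import socket


def effective_rcvbuf(requested):
    """Request a receive buffer size and return what the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    # A small request, and one well above the rmem_max from the question.
    for requested in (65536, 8 * 2**20):
        print(requested, "->", effective_rcvbuf(requested))
```

Sockets that never call setsockopt with SO_RCVBUF are not subject to this cap, which is the case the iperf measurement in the question falls under if iperf was run without an explicit window size.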