How does a WebSocket server handle multiple incoming connection requests?

smwikipedia · Feb 14, 2015 · Viewed 46.9k times

According to here:

The HTTP Upgrade header requests that the server switch the application-layer protocol from HTTP to the WebSocket protocol.

The client handshake established a HTTP-on-TCP connection between IE10 and server. After the server returns its 101 response, the application-layer protocol switches from HTTP to WebSockets which uses the previously established TCP connection.

HTTP is completely out of the picture at this point. Using the lightweight WebSocket wire protocol, messages can now be sent or received by either endpoint at any time.

So my understanding is that after the first client finishes its handshake with the server, the server's port 80 will be monopolized by the WebSocket protocol, and HTTP will no longer work on port 80.

So how can a second client perform its handshake with the server? After all, the WebSocket handshake is in HTTP format.

ADD 1

Thanks for all the answers so far. They are really helpful.

Now I understand that the server's port 80 is shared by multiple TCP connections. This sharing is totally OK because each TCP connection is identified by a 5-element tuple, as Jan-Philip Gehrcke pointed out.

I'd like to add a few thoughts.

Both WebSocket and HTTP are merely application-level protocols. Usually they both rely on TCP as their transport.

Why choose port 80?

The WebSocket design intentionally uses the server's port 80 for both the handshake and the communication that follows. I think the designers wanted WebSocket communication to look like normal HTTP communication from the transport level's perspective (i.e. the server port number is still 80). But according to jfriend00's answer, this trick doesn't always fool the network infrastructure.

How does the protocol shift from HTTP to WebSocket happen?

From RFC 6455 - WebSocket protocol

Basically it is intended to be as close to just exposing raw TCP to script as possible given the constraints of the Web. It’s also designed in such a way that its servers can share a port with HTTP servers, by having its handshake be a valid HTTP Upgrade request. One could conceptually use other protocols to establish client-server messaging, but the intent of WebSockets is to provide a relatively simple protocol that can coexist with HTTP and deployed HTTP infrastructure (such as proxies) and that is as close to TCP as is safe for use with such infrastructure given security considerations, with targeted additions to simplify usage and keep simple things simple (such as the addition of message semantics).

So I think I am wrong on the following statement:

The handshake request mimics an HTTP request, but the communication that follows doesn't. The handshake request arrives at the server on port 80. Because it is port 80, the server will treat it as HTTP. And that's why the WebSocket handshake request must be in HTTP format. If so, I think the HTTP protocol MUST be modified/extended to recognize those WebSocket-specific things. Otherwise it won't realize it should yield to the WebSocket protocol.

I think it should be understood like this:

WebSocket communication starts with a valid HTTP request from client to server. So it is the server that follows the HTTP protocol to parse the handshake request and identify the request for a protocol change. And it is the server that switches the protocol. So the HTTP protocol doesn't need to change; it doesn't even need to know about WebSocket.
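To make this concrete, here is a minimal sketch of the server-side decision, assuming the HTTP headers have already been parsed into a plain dict. The function names (`is_websocket_upgrade`, `accept_value`, `build_handshake_response`) are hypothetical helpers for illustration, not part of any library; the GUID constant and the SHA-1/Base64 computation are the ones RFC 6455 prescribes for the `Sec-WebSocket-Accept` header.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 prescribes for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def is_websocket_upgrade(headers: dict) -> bool:
    """Hypothetical helper: does this ordinary HTTP request ask for WebSocket?"""
    upgrade = headers.get("Upgrade", "").lower()
    connection = headers.get("Connection", "").lower()
    return upgrade == "websocket" and "upgrade" in connection


def accept_value(sec_websocket_key: str) -> str:
    """Compute Sec-WebSocket-Accept: base64(sha1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake_response(headers: dict) -> bytes:
    """Build the 101 response; after sending it, the server stops speaking HTTP
    on this one TCP connection and starts speaking WebSocket."""
    key = headers["Sec-WebSocket-Key"]
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_value(key)}\r\n"
        "\r\n"
    ).encode("ascii")


# Headers as they might look after parsing a handshake request
# (the key is the sample value used in RFC 6455).
request_headers = {
    "Upgrade": "websocket",
    "Connection": "Upgrade",
    "Sec-WebSocket-Key": "dGhlIHNhbXBsZSBub25jZQ==",
}

if is_websocket_upgrade(request_headers):
    response = build_handshake_response(request_headers)
    print(response.decode("ascii"))
```

Note that the protocol switch applies only to the single connection on which the request arrived; other connections to port 80 keep being handled as plain HTTP.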

WebSocket and Comet

So WebSocket differs from Comet technologies in that WebSocket doesn't confine itself to the existing HTTP realm to solve the bi-directional communication problem.

ADD 2

A related question: How does a browser establish connection with a web server on 80 port? Details?

Answer

Dr. Jan-Philip Gehrcke · Feb 14, 2015

Your question is great!

I would like to try to answer it from the point of view of the system calls listen() and accept(). Understanding the behavior of these two calls is, I think, quite insightful and sufficient to answer your question.

Spoiler: we can answer your question by looking into how TCP/IP works :-)

For the core part of the question it really makes no difference whether HTTP or WebSocket is used. The common ground is TCP over IP. Sending an HTTP request requires an established TCP/IP connection between two parties (I have tried to elaborate on that a bit more here).

In the case of a simple web browser / web server scenario:

  1. first, a TCP connection is established between both (initiated by the client)
  2. then an HTTP request is sent through that TCP connection (from the client to the server)
  3. then an HTTP response is sent through the same TCP connection (in the other direction, from the server to the client)

After this exchange, the underlying TCP connection is no longer needed for it and is typically closed (or, with HTTP keep-alive, kept open for further HTTP requests). In the case of a so-called "HTTP Upgrade request" (which can be thought of as: "hey, server! Please upgrade this to a WebSocket connection!"), the underlying TCP connection just goes on living, and the WebSocket communication goes through the very same TCP connection that was created initially (step (1) above).

This hopefully clarifies that the key difference between WebSocket and HTTP is a switch in a high-level protocol (from HTTP toward WebSocket) without changing the underlying transport channel (a TCP/IP connection).

Handling multiple IP connection attempts through the same socket, how?

This is a topic I once struggled with myself and that many do not understand because it is a little counterintuitive. However, the concept is actually quite simple once one understands how the basic socket-related system calls provided by the operating system work.

First, one needs to appreciate that an IP connection is uniquely defined by five pieces of information:

IP:PORT of Machine A and IP:PORT of Machine B and the protocol (TCP or UDP)
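A small sketch of this, using Python's standard `socket` module on the loopback interface (the address `127.0.0.1` and the OS-chosen port are assumptions made purely for the demo): two clients connect to the same server IP:PORT, and the resulting connections differ only in the client-side port.

```python
import socket

# Listening socket on the loopback interface; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
server_addr = server.getsockname()

# Two clients connect to the very same server IP:PORT.
client_a = socket.create_connection(server_addr)
client_b = socket.create_connection(server_addr)

# Each accept() returns a connected socket plus the peer's (IP, PORT).
conn_1, peer_1 = server.accept()
conn_2, peer_2 = server.accept()

# Server-side endpoint of both connections: identical.
local_1 = conn_1.getsockname()
local_2 = conn_2.getsockname()
# Client-side endpoints as seen by the clients themselves.
client_endpoints = {client_a.getsockname(), client_b.getsockname()}

print("connection 1:", peer_1, "<->", local_1)
print("connection 2:", peer_2, "<->", local_2)

for s in (conn_1, conn_2, client_a, client_b, server):
    s.close()
```

Both connections share the server's IP:PORT, yet the operating system can tell them apart because each client was assigned a different source port.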

Now, socket objects are often thought to represent a connection. But that is not entirely true. They may represent different things: they can be active or passive. A socket object in passive/listen mode does something pretty special, and that is important to answer your question.

http://linux.die.net/man/2/listen says:

listen() marks the socket referred to by sockfd as a passive socket, that is, as a socket that will be used to accept incoming connection requests using accept(2).

That is, we can create a passive socket that listens for incoming connection requests. By definition, such a socket can never represent a connection. It just listens for connection requests.

Let's head over to accept() (http://linux.die.net/man/2/accept):

The accept() system call is used with connection-based socket types (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection request on the queue of pending connections for the listening socket, sockfd, creates a new connected socket, and returns a new file descriptor referring to that socket. The newly created socket is not in the listening state. The original socket sockfd is unaffected by this call.

Let's digest this carefully; I think this now really answers your question.

accept() does not change the state of the passive socket created before. It returns an active (connected) socket (such a socket then represents the five pieces of information stated above -- simple, right?).

Usually, this newly created active socket object is then handed off to another process or thread or just "entity" that takes care of the connection. After accept() has returned this connected socket object, accept() can be called again on the passive socket, and again and again -- something that is known as an accept loop.
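An accept loop of this kind might look roughly as follows. This is only a sketch using Python's `socket` and `threading` modules; `handle_connection` and `accept_loop` are hypothetical names, the echo behavior is made up for the demo, and the loop is capped at a fixed number of connections so that the example terminates (a real server would loop forever).

```python
import socket
import threading


def handle_connection(conn: socket.socket, peer) -> None:
    """Hypothetical per-connection handler: echo one message, then close."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"echo: " + data)


def accept_loop(listener: socket.socket, max_connections: int) -> None:
    """Call accept() repeatedly on the passive socket and hand each
    connected socket off to its own thread."""
    for _ in range(max_connections):
        conn, peer = listener.accept()  # the listening socket itself is unaffected
        threading.Thread(
            target=handle_connection, args=(conn, peer), daemon=True
        ).start()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # loopback, OS-chosen port (demo assumption)
listener.listen()
addr = listener.getsockname()

loop_thread = threading.Thread(target=accept_loop, args=(listener, 3), daemon=True)
loop_thread.start()

# Three clients, one after another, all talking to the same IP:PORT.
replies = []
for i in range(3):
    with socket.create_connection(addr) as client:
        client.sendall(f"client {i}".encode())
        chunks = []
        while True:  # read until the handler closes its side
            chunk = client.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
        replies.append(b"".join(chunks))

loop_thread.join()
listener.close()
print(replies)
```

The listening socket never carries application data itself; it only produces new connected sockets, one per client.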

But calling accept() takes time, right? Can't it miss incoming connection requests? There is more essential information in the just-quoted man page text: there is a queue of pending connection requests! It is handled automatically by the TCP/IP stack of your operating system.

That means that while accept() can only deal with incoming connection requests one-by-one, incoming requests are not missed even when they arrive at a high rate or (quasi-)simultaneously, as long as the queue (whose length is bounded by the backlog argument of listen()) is not full. One could say that the behavior of accept() is rate-limiting the frequency of incoming connection requests your machine can handle. However, this is a fast system call and in practice, other limitations kick in first -- usually those related to handling all the connections that have been accepted so far.
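The queueing can be observed directly. In the following sketch (Python `socket` module, loopback address and a backlog of 8 chosen arbitrarily for the demo), three clients connect and even send data before the server has called accept() a single time; the operating system completes the connections and holds them in the queue until accept() picks them up.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # loopback, OS-chosen port (demo assumption)
listener.listen(8)               # backlog: upper bound on the pending-connection queue
addr = listener.getsockname()

# Clients connect and send *before* the server has called accept() even once.
clients = []
for i in range(3):
    c = socket.create_connection(addr)
    c.sendall(f"hello {i}".encode())
    clients.append(c)

# Only now does the server start accepting; nothing was lost in the meantime.
received = []
for _ in range(3):
    conn, _peer = listener.accept()
    with conn:
        received.append(conn.recv(1024))

for c in clients:
    c.close()
listener.close()
print(sorted(received))
```

If more connection attempts pile up than the queue can hold, what happens to the excess ones depends on the operating system, so a server that accepts too slowly can still lose or delay requests.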