since I'm using WebSocket connections on more regular bases, I was interested in how things work under the hood. So I digged into the endless spec documents for a while, but so far I couldn't really find anything about chunking the transmission stream itself.
The WebSocket protocol calls it data frames (which describes the pure data stream, so its also called non-control frames). As far as I understood the spec, there is no defined max-length and no defined MTU (maximum transfer unit) value, that in turn means a single WebSocket data-frame may contain, by spec(!), an infinite amount of data (please correct me if I'm wrong here, I'm still a student on this).
After reading that, I instantly setup my little Node WebSocket server. Since I have a strong Ajax history (also on streaming and Comet), my expectations originaly were like, "there must be some kind of interactive mode for reading data while it is transfered". But I am wrong there, ain't I ?
I started out small, with 4kb of data.
server
testSocket.emit( 'data', new Array( 4096 ).join( 'X' ) );
and like expected this arrives on the client as one data-chunk
client
wsInstance.onmessage = function( data ) {
console.log( data.length ); // 4095
};
so I increased the payload and I actually was expecting again, that at some point, the client-side onmessage
handler will fire repeatly, effectivley chunking the transmission. But to my shock, it never happened (node-server, tested on firefox, chrome and safari client-side). My biggest payload was 80 MB
testSocket.emit( 'data', new Array( 1024*1024*80 ).join( 'X' ) );
and it still arrived in one big data-chunk on the client. Of course, this takes a while even if you have a pretty good connection. Questions here are
I might still look from the wrong perspective on WebSockets, probably the need for sending large data-amounts is just not there and you should chunk/split any data logically yourself before sending ?
First, you need to differentiate between the WebSocket protocol and the WebSocket API within browsers.
The WebSocket protocol has a frame-size limit of 2^63 octets, but a WebSocket message can be composed of an unlimited number of frames.
The WebSocket API within browsers does not expose a frame-based or streaming API, but only a message-based API. The payload of an incoming message is always completely buffered up (within the browser's WebSocket implementation) before providing it to JavaScript.
APIs of other WebSocket implementations may provide frame- or streaming-based access to payload transferred via the WebSocket protocol. For example, AutobahnPython does. You can read more in the examples here https://github.com/tavendo/AutobahnPython/tree/master/examples/twisted/websocket/streaming.
Disclosure: I am original author of Autobahn and work for Tavendo.
More considerations:
As long as there is no frame/streaming API in browser JS WebSocket API, you can only receive/send complete WS messages.
A single (plain) WebSocket connection cannot interleave the payload of multiple messages. So i.e. if you use large messages, those are delivered in order, and you won't be able to send small messages in between while a big message is still on the fly.
There is an upcoming WebSocket extension (extensions are a builtin mechanism to extend the protocol): WebSocket multiplexing. This allows to have multiple (logical) WebSocket connections over a single underlying TCP connection, which has multiple advantages.
Note also: you can open multiple WS connections (over different underlying TCPs) to a single target server from a single JS / HTML page today.
Note also: you can do "chunking" yourself in application layer: send your stuff in smaller WS messages a reassemble yourself.
I agree, in an ideal world, you'd have message/frame/streaming API in browser plus WebSocket multiplexing. That would give all the power and convenience.