binary protocols v. text protocols

der_grosse picture der_grosse · Apr 15, 2010 · Viewed 39.9k times · Source

does anyone have a good definition for what a binary protocol is? and what is a text protocol actually? how do these compare to each other in terms of bits sent on the wire?

here's what wikipedia says about binary protocols:

A binary protocol is a protocol which is intended or expected to be read by a machine rather than a human being (http://en.wikipedia.org/wiki/Binary_protocol)

oh come on!

to be more clear, if I have jpg file how would that be sent through a binary protocol and how through a text one? in terms of bits/bytes sent on the wire of course.

at the end of the day if you look at a string it is itself an array of bytes so the distinction between the 2 protocols should rest on what actual data is being sent on the wire. in other words, on how the initial data (jpg file) is encoded before being sent.

Answer

Tyler McHenry picture Tyler McHenry · Apr 15, 2010

Binary protocol versus text protocol isn't really about how binary blobs are encoded. The difference is really whether the protocol is oriented around data structures or around text strings. Let me give an example: HTTP. HTTP is a text protocol, even though when it sends a jpeg image, it just sends the raw bytes, not a text encoding of them.

But what makes HTTP a text protocol is that the exchange to get the jpg looks like this:

Request:

GET /files/image.jpg HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.01 [en] (Win95; I)
Host: hal.etc.com.au
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8

Response:

HTTP/1.1 200 OK
Date: Mon, 19 Jan 1998 03:52:51 GMT
Server: Apache/1.2.4
Last-Modified: Wed, 08 Oct 1997 04:15:24 GMT
ETag: "61a85-17c3-343b08dc"
Content-Length: 60830
Accept-Ranges: bytes
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/jpeg

<binary data goes here>

Note that this could very easily have been packed much more tightly into a structure that would look (in C) something like

Request:

struct request {
  int requestType;
  int protocolVersion;
  char path[1024];
  char user_agent[1024];
  char host[1024];
  long int accept_bitmask;
  long int language_bitmask;
  long int charset_bitmask;
};

Response:

struct response {
  int responseType;
  int protocolVersion;
  time_t date;
  char host[1024];
  time_t modification_date;
  char etag[1024];
  size_t content_length;
  int keepalive_timeout;
  int keepalive_max;
  int connection_type;
  char content_type[1024];
  char data[];
};

Where the field names would not have to be transmitted at all, and where, for example, the responseType in the response structure is an int with the value 200 instead of three characters '2' '0' '0'. That's what a text based protocol is: one that is designed to be communicated as a flat stream of (usually human-readable) lines of text, rather than as structured data of many different types.