What's the most efficient protocol for reliable multicast?

protocols file-transfer multicast ethernet reliable-multicast

JayMcClellan · Apr 19, 2009

When a sender needs to multicast a relatively large volume of data (say several megabytes per second) reliably over Ethernet to a modest number of receivers (say fewer than a dozen) on the same subnet, what is the most efficient protocol? By reliable I mean that if a packet is lost, the protocol ensures that it gets resent so that no receiver loses any data. The term "efficient" is much harder to define, but let's say we want to maximize throughput and minimize network bandwidth with modest CPU usage on both ends. That's still not a clear-cut definition, but it's the best I can come up with. Either a stream-oriented or a message-oriented protocol would be acceptable.

I'd appreciate real-world examples and I'll gladly accept subjective answers, i.e. what's your favorite multicast protocol, if you can explain its pros and cons.

Answer

Real-world example: TIBCO Rendezvous.

Data is sent out via multicast with a sequence number. A client that detects a missing sequence number sends a message on the multicast group saying "hey, I missed packet 12345", and the server re-multicasts that data. The server buffers a configurable amount of recently sent data in case a client requests it.
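The NAK scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TIBCO's implementation or API: `RetransmitBuffer` and `GapDetector` are hypothetical names, sequence numbers are assumed to be plain integers that never wrap, and the actual networking (sockets, the multicast group, the NAK message format) is left out.

```python
from collections import OrderedDict


class RetransmitBuffer:
    """Sender side (hypothetical): keeps the most recent `capacity` packets by sequence number."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = OrderedDict()  # seq -> payload, oldest first

    def store(self, seq, payload):
        self.packets[seq] = payload
        if len(self.packets) > self.capacity:
            self.packets.popitem(last=False)  # evict the oldest packet

    def lookup(self, seq):
        # None means the packet has aged out and can no longer be resent
        return self.packets.get(seq)


class GapDetector:
    """Receiver side (hypothetical): tracks the next expected sequence number and reports gaps."""

    def __init__(self, first_seq=0):
        self.expected = first_seq
        self.missing = set()

    def on_packet(self, seq):
        """Return the sequence numbers to NAK after receiving `seq`."""
        if seq in self.missing:  # a retransmission filled a hole
            self.missing.discard(seq)
            return []
        if seq < self.expected:  # duplicate of a packet already seen
            return []
        gap = list(range(self.expected, seq))
        self.missing.update(gap)
        self.expected = seq + 1
        return gap
```

For example, a receiver that sees packets 0, 1 and then 4 would NAK 2 and 3; once the retransmitted 2 arrives, only 3 remains outstanding. If the sender's buffer has already evicted a requested packet, `lookup` returns `None` and that data cannot be recovered, which is the practical meaning of the configurable buffer.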

The problem:

Imagine a single client that drops half of its packets, and 100 healthy clients. The faulty client sends a retransmission request for every other packet. The resulting retransmissions put enough extra load on one of the healthy clients that it starts dropping packets and requesting retransmissions of its own. The extra load from that pushes another healthy client into requesting retransmissions, and so on, until congestion collapse results.

Tibco provides one workaround: cutting off a subscriber that sends too many retransmission requests. This makes it harder for a single subscriber to cause a congestion collapse.
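A per-subscriber cutoff like this can be sketched as a sliding-window counter. The threshold (`max_naks` requests per `window` seconds), the class name `NakPolicer`, and the policy of permanently ignoring a cut-off subscriber are all assumptions for illustration; they are not TIBCO's actual rules. Timestamps are passed in explicitly (for example from `time.monotonic()`) so the logic is easy to test.

```python
from collections import defaultdict, deque


class NakPolicer:
    """Hypothetical policy: cut off subscribers sending more than max_naks NAKs per window seconds."""

    def __init__(self, max_naks, window):
        self.max_naks = max_naks
        self.window = window
        self.history = defaultdict(deque)  # subscriber -> timestamps of recent NAKs
        self.cut_off = set()

    def allow(self, subscriber, now):
        """Record a NAK at time `now`; return True if it should be serviced."""
        if subscriber in self.cut_off:
            return False
        times = self.history[subscriber]
        times.append(now)
        # forget NAKs that have slid out of the window
        while times and times[0] <= now - self.window:
            times.popleft()
        if len(times) > self.max_naks:
            # the subscriber must recover some other way (e.g. a fresh snapshot)
            self.cut_off.add(subscriber)
            del self.history[subscriber]
            return False
        return True
```

Permanently cutting off a subscriber trades that subscriber's completeness for everyone else's stability; a softer variant could ignore it only for a cooldown period.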

Another way to limit the risk of congestion collapse is to cap the amount of data the server is willing to retransmit.
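One way to cap how much data the server is willing to retransmit is a token bucket measured in bytes. This is a hypothetical sketch: `RetransmitBudget` is an illustrative name, the rate and burst values are tuning knobs you would have to choose, and a retransmission that does not fit the budget is simply dropped, leaving that receiver with a gap.

```python
class RetransmitBudget:
    """Token bucket that caps retransmitted bytes per second (hypothetical parameters)."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s  # sustained retransmission budget
        self.capacity = burst_bytes   # largest burst allowed at once
        self.tokens = burst_bytes     # start with a full bucket
        self.last = None              # timestamp of the previous call

    def try_spend(self, nbytes, now):
        """Return True if nbytes may be retransmitted at time `now`, else False."""
        if self.last is not None:
            # refill for the elapsed time, never beyond the burst size
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # over budget: this retransmission is not sent
```

Unlike the per-subscriber cutoff, this bounds the total retransmission load regardless of how many receivers are misbehaving.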

Tibco should also provide heuristics in the client and server for deciding whether to multicast or unicast the retransmission request, and the retransmission itself. They don't. The server could unicast a retransmission if only one client requested it within a certain time window. The client could unicast its retransmission requests if the server has told it, in the retransmitted packet, that it is the only one requesting retransmissions and should unicast future requests.
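The server-side half of this heuristic could look like the following: collect NAKs over a short aggregation window, then multicast a packet only if several receivers asked for it and unicast it otherwise. The function name `plan_retransmissions`, the `(subscriber, seq)` input format and the default threshold of two requesters are assumptions for illustration. The unicast path is where a server could also set the "please unicast your future requests" hint for the client.

```python
from collections import defaultdict


def plan_retransmissions(naks, multicast_threshold=2):
    """Turn NAKs collected during one aggregation window into send decisions (hypothetical).

    naks: iterable of (subscriber, seq) pairs received in the window.
    Returns a list of (seq, mode, targets) with mode "multicast" or "unicast".
    """
    requesters = defaultdict(set)
    for subscriber, seq in naks:
        requesters[seq].add(subscriber)  # duplicate NAKs from one subscriber count once
    plan = []
    for seq in sorted(requesters):
        who = requesters[seq]
        if len(who) >= multicast_threshold:
            plan.append((seq, "multicast", None))  # one send reaches every requester
        else:
            # few receivers lost it: resend point-to-point and spare the others
            plan.append((seq, "unicast", sorted(who)))
    return plan
```

For example, if subscribers `a` and `b` both NAK packet 5 and only `a` NAKs packet 7, packet 5 is multicast once and packet 7 is unicast to `a` alone.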

Fundamentally, you have to weigh how strongly you want to guarantee that clients receive data against the risk of congestion collapse. You will have to guess where a packet was dropped and whether the retransmission is most efficiently sent unicast or multicast. If the server understands the data and can decide not to send a retransmission when updated data that makes it irrelevant will be sent anyway, you are in a much better position than a framework such as Tibco RV.
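A server that understands the data can skip retransmitting an update that a newer update for the same key has already superseded. This is a minimal sketch, assuming each message carries a key (for example an instrument symbol) and that only the latest value per key matters; `ConflatingSender` is a hypothetical name, and the unbounded `sent` dict stands in for the bounded retransmission buffer a real server would use.

```python
class ConflatingSender:
    """Hypothetical sender that skips retransmitting superseded updates."""

    def __init__(self):
        self.latest_seq = {}  # key -> sequence number of the newest update for that key
        self.sent = {}        # seq -> (key, value); a real server would bound this

    def send(self, seq, key, value):
        self.latest_seq[key] = seq
        self.sent[seq] = (key, value)

    def should_retransmit(self, seq):
        entry = self.sent.get(seq)
        if entry is None:
            return False  # unknown or aged out: nothing to resend
        key, _ = entry
        # a newer update for the same key has been sent, so this one is irrelevant
        return self.latest_seq[key] == seq
```

The "only the latest value matters" assumption baked into `should_retransmit` is exactly what can break, as the next paragraph points out.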

Sometimes understanding the data can lead to wrong assumptions. Take market data: at first it may seem fine not to retransmit a quote when an updated quote exists. Later, though, you may find that a subscriber was keeping a quote history rather than just tracking the current quote. You may also have different requirements for different subscribers, and some clients will prefer unicast TCP over multicast.

At some point you will need to make arbitrary decisions on the server about how much data to buffer for retransmissions or slow clients.
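The buffer decision is at least easy to quantify: the sender must hold every packet sent during the longest delay it is willing to tolerate before a NAK arrives. The numbers below (5 MB/s, a 2-second window and 1400-byte payloads) are illustrative assumptions, not recommendations, and `retransmit_buffer_packets` is a hypothetical helper.

```python
import math


def retransmit_buffer_packets(rate_bytes_per_s, window_s, packet_bytes):
    """Packets to keep so a NAK arriving up to window_s seconds late can still be served."""
    return math.ceil(rate_bytes_per_s * window_s / packet_bytes)


# 5 MB/s for 2 s is 10,000,000 bytes, i.e. roughly 10 MB of buffer.
print(retransmit_buffer_packets(5_000_000, 2, 1400))  # 7143
```

Slow clients stretch the window you need to cover, so any fixed value is a trade-off between memory on the server and how far behind a receiver may fall before its gaps become unrecoverable.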
