ARP Timeouts. Why fixed periodic?

T.E.D. picture T.E.D. · Mar 15, 2013 · Viewed 8.8k times · Source

This one's been bugging me for years.

Basic question: Is there some reason ARP has to be implemented with fixed timeouts on ARP cache entries?

I do a lot of work in Real Time ciricles. We do most of our inter-system communications these days on dedicated UDP/IP links. This for the most part works reliably in Real Time, but for one nit: ARP entry timeouts.

The way typical implementations do ARP is the following:

  • When client asks to send an IP packet to an IP address with an unkown MAC address, instead of sending that IP packet, the stack sends out an ARP request. If an upper layer (TCP) does resends, that's no problem. But since we use UDP, the original message is lost. At startup time this is OK, but in the middle of operation this is a Bad Thing™.
  • (Dynamic) ARP table entries are removed from the ARP table periodicly, even if we just got a packet from that system a millisecond ago. This means the Bad Thing™ happens to our system regularly.

The obvious solution (which we use religously) is to make all the ARP entries static. However, that's a royal PITA (particularly on RTOS's where finding an interface's MAC address is not always a matter of a couple of easy GUI clicks).

Back when we wrote our own IP stack, I solved this problem by never (ever) timing out ARP table entries. That has obvious drawbacks. A more robust and perfectly reasonable solution might be to refresh the entry timeout whenever a packet from the same MAC/IP combo is seen. That way an entry would only get timed-out if it hadn't communicated with the stack in that amount of time.

But now we're using our vendor's IP stack, and we're back to the stupid ARP timeouts. We have enough leverage with this vendor that I could perhaps get them to use a less inconvienient scheme. However, the universality of this brain-dead timeout algorithm leads me to believe it might be a required part of the implementation.

So that's the question. Is this behavior somehow required?

Answer

PherricOxide picture PherricOxide · Mar 19, 2013

RFC1122 Requirements for Internet Hosts discusses this.

     2.3.2.1  ARP Cache Validation

        An implementation of the Address Resolution Protocol (ARP)
        [LINK:2] MUST provide a mechanism to flush out-of-date cache
        entries.  If this mechanism involves a timeout, it SHOULD be
        possible to configure the timeout value.

      ...

       DISCUSSION:
             The ARP specification [LINK:2] suggests but does not
             require a timeout mechanism to invalidate cache entries
             when hosts change their Ethernet addresses.  The
             prevalence of proxy ARP (see Section 2.4 of [INTRO:2])
             has significantly increased the likelihood that cache
             entries in hosts will become invalid, and therefore
             some ARP-cache invalidation mechanism is now required
             for hosts.  Even in the absence of proxy ARP, a long-
             period cache timeout is useful in order to
             automatically correct any bad ARP data that might have
             been cached.

Networks can be very dynamic; DHCP servers can assign the same IP address to different computers when old lease times expire (making current ARP data invalid), there can be IP conflicts that will never be noticed unless ARP requests are periodically made, etc.

It also provides a mechanism for checking if a host is still on the network. Imagine you're streaming a video over UDP to some IP address 192.168.0.5. If you cache the MAC address of that machine forever, you'll just keep spamming out UDP packets even if the host goes down. Doing an ARP request every now and then will stop the stream with a destination unreachable error because no one responded with a MAC for that IP.