This one's been bugging me for years.
Basic question: Is there some reason ARP has to be implemented with fixed timeouts on ARP cache entries?
I do a lot of work in Real Time circles. We do most of our inter-system communications these days on dedicated UDP/IP links. For the most part this works reliably in Real Time, with one nit: ARP entry timeouts.
The way typical implementations do ARP is the following:
The obvious solution (which we use religiously) is to make all the ARP entries static. However, that's a royal PITA (particularly on RTOSes, where finding an interface's MAC address is not always a matter of a couple of easy GUI clicks).
Back when we wrote our own IP stack, I solved this problem by never (ever) timing out ARP table entries. That has obvious drawbacks. A more robust and perfectly reasonable solution might be to refresh the entry timeout whenever a packet from the same MAC/IP combo is seen. That way an entry would only get timed out if that host hadn't communicated with the stack in that amount of time.
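To make the refresh-on-traffic idea concrete, here is a minimal sketch in Python. It is not taken from any real IP stack: the `ArpCache` class, its method names, and the injectable `clock` parameter are all made up for illustration, and a real stack would hook `packet_seen` into its receive path.

```python
import time


class ArpCache:
    """Toy ARP cache whose entries expire only after a period of silence.

    Hypothetical illustration; names and structure are assumptions,
    not any vendor's API.
    """

    def __init__(self, timeout_s=60.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # configurable timeout
        self.clock = clock          # injectable so the logic can be tested
        self._entries = {}          # ip -> (mac, time the pair was last seen)

    def learn(self, ip, mac):
        """Record a mapping, e.g. from an ARP reply."""
        self._entries[ip] = (mac, self.clock())

    def packet_seen(self, ip, mac):
        """Refresh the entry if an inbound packet matches the cached pair.

        Returns True when a refresh happened. A mismatched MAC does not
        refresh, so a changed host still ages out and gets re-resolved.
        """
        entry = self._entries.get(ip)
        if entry is not None and entry[0] == mac:
            self._entries[ip] = (mac, self.clock())
            return True
        return False

    def lookup(self, ip):
        """Return the cached MAC, or None if missing or silent too long."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, last_seen = entry
        if self.clock() - last_seen >= self.timeout_s:
            del self._entries[ip]  # flush the out-of-date entry
            return None
        return mac
```

With this scheme a peer that keeps talking never forces a fresh ARP exchange, while a peer that goes quiet for `timeout_s` seconds is looked up again the next time the stack needs it.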
But now we're using our vendor's IP stack, and we're back to the stupid ARP timeouts. We have enough leverage with this vendor that I could perhaps get them to use a less inconvenient scheme. However, the universality of this brain-dead timeout algorithm leads me to believe it might be a required part of the implementation.
So that's the question. Is this behavior somehow required?
RFC 1122, "Requirements for Internet Hosts", discusses this:
2.3.2.1 ARP Cache Validation
An implementation of the Address Resolution Protocol (ARP)
[LINK:2] MUST provide a mechanism to flush out-of-date cache
entries. If this mechanism involves a timeout, it SHOULD be
possible to configure the timeout value.
...
DISCUSSION:
The ARP specification [LINK:2] suggests but does not
require a timeout mechanism to invalidate cache entries
when hosts change their Ethernet addresses. The
prevalence of proxy ARP (see Section 2.4 of [INTRO:2])
has significantly increased the likelihood that cache
entries in hosts will become invalid, and therefore
some ARP-cache invalidation mechanism is now required
for hosts. Even in the absence of proxy ARP, a long-
period cache timeout is useful in order to
automatically correct any bad ARP data that might have
been cached.
Networks can be very dynamic. A DHCP server can assign the same IP address to a different computer once an old lease expires (making the current ARP data invalid), and an IP conflict may never be noticed unless ARP requests are made periodically.
It also provides a mechanism for checking whether a host is still on the network. Imagine you're streaming a video over UDP to some IP address 192.168.0.5. If you cache the MAC address of that machine forever, you'll just keep spamming out UDP packets even after the host goes down. Repeating the ARP request every now and then will stop the stream with a destination unreachable error, because no one responds with a MAC for that IP.
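A rough sketch of that streaming scenario, under some loud assumptions: `resolve` and `send` are hypothetical callbacks standing in for the stack's ARP request and link-layer transmit, `HostUnreachable` is a made-up exception, and re-validation is triggered every few packets rather than by a timer, purely to keep the example short.

```python
class HostUnreachable(Exception):
    """Raised when nobody answers the ARP request (hypothetical name)."""


def send_stream(packets, ip, resolve, send, revalidate_every=3):
    """Send packets to ip, re-resolving its MAC every few packets.

    resolve(ip) is assumed to return a MAC string, or None when the
    ARP request goes unanswered. send(mac, payload) is assumed to
    transmit one frame. Returns the number of packets sent.
    """
    mac = None
    sent = 0
    for i, payload in enumerate(packets):
        if i % revalidate_every == 0:
            mac = resolve(ip)  # periodic re-validation of the cached MAC
            if mac is None:
                # No reply: stop instead of sending into the void.
                raise HostUnreachable(ip)
        send(mac, payload)
        sent += 1
    return sent
```

If the cached MAC were never re-validated, the loop would have no way to notice that the receiver had disappeared, which is the point the paragraph above is making.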