What is the most efficient way to match the IP addresses to huge route entries?

比尔盖子 picture 比尔盖子 · Jun 16, 2014 · Viewed 8.9k times · Source

Imagining there is a firewall, and the system administrator blocked many subnets, perhaps all subnets of a specific country.

For example:

192.168.2.0 / 255.255.255.0
223.201.0.0 / 255.255.0.0
223.202.0.0 / 255.254.0.0
223.208.0.0 / 255.252.0.0
....

To determine whether a IP address have been blocked, the firewall may use the algorithm below.

func blocked(ip)
    foreach subnet in blocked_subnets
        if in_subnet(subnet, ip)
            return true
    return false

But, the algorithm needs too much time to run, the time complexity is O(n). If the route table contains too many entries, the network will become almost unavailable.

Is there a more efficient way to match the IP addresses to huge route entries? It is based on some kinds of trees/graphs (Trie?) I guess. I have read something about Longest prefix match and Trie but didn't get the point.

Answer

Jim Mischel picture Jim Mischel · Jun 17, 2014

All you really need is a trie with four levels. Each non-leaf node contains an array of up to 256 child nodes. Each node also contains a subnet mask. So, given your example:

192.168.2.0 / 255.255.255.0
223.201.0.0 / 255.255.0.0
223.202.0.0 / 255.254.0.0
223.208.0.0 / 255.252.0.0

Your tree would look something like that below. The two numbers for each node are the IP segment followed by the subnet mask.

             root
         /           \
     192,255             223,255
       |           -------------------------
     168,255       |           |           |
       |          201,255    202,255    208,255
      2,255

When you get an IP address, you break it into segments. You search for the first segment at the root level. For speed, you'll probably want to use an array at the root level so that you can do a direct lookup.

Say the first segment of the IP address is 223. You'd grab the node from root[223], and now you're working with just that one subtree. You probably don't want a full array at the other levels, unless your data is really dense. A dictionary of some kind for the subsequent levels is probably what you'll want. If the next segment is 201, you look up 201 in the dictionary for the 223 node, and now your possible list of candidates is just 64K items (i.e. all IP addresses that are 223,201.x.x). You can do the same thing with the other two levels. The result is that you can resolve an IP address in just four lookups: one lookup in an array, and three dictionary lookups.

This structure is also very easy to maintain. Inserting a new address or range requires at most four lookups and adds. Same with deleting. Updates can be done in-place, without having to rebuild the entire tree. You just have to make sure that you're not trying to read while you're updating, and you're not trying to do concurrent updates. But any number of readers can be accessing the thing concurrently.