How to understand the Kademlia(KAD) protocol

rock_cloud picture rock_cloud · Feb 16, 2012 · Viewed 9.1k times · Source

Recently, I've read a document of the Kademlia Protocol, I tried to understand the protocol, but I still have some question: Why a node must find another node when he knows its ID but ip or port? Why he has the ID while he doesn't know the ip or port, where did he get the ID? I think the "distance" between two different nodes is not a routing distance or real distance, it's only a virtual distance that can be used the algorithm to find the node quickly, it's that right?

Maybe my English is not very clear because English is not my mother tongue, but I'll try to express myself clear if you need. Thanks very much!

Answer

Fraser picture Fraser · Feb 16, 2012

As cHao said, the distributed nature of the network means that nodes need to publish their IDs and their contact details to other nodes they talk to. There is no central place where IDs are mapped to contact info, so each node must keep this mapping for a subset of the nodes on the network in its own routing table.

Kademlia routing tables are structured so that nodes will have detailed knowledge of the network close to them, and exponentially decreasing knowledge further away.

The use of bitwise XOR as a measure of notional distance between IDs has the advantage that for a given target ID, no two IDs can have the same distance to the target.

Imagine a simple example where the IDs are in the range 00 to 63. If Kademlia used e.g. pure mathematical difference as a measure of distance, 15 and 35 would be the same distance to 25 - both would have a distance of 10. Using XOR, the distance between 15 and 25 is 22, and between 25 and 35 it's 58.

In this way, the group of k closest IDs to a target ID can be calculated unambiguously.

The constant k has a couple of uses in Kademlia, but it's primarily the replication factor. In other words, a piece of data is stored on the k closest nodes to the data's ID.

The lookup process is designed to return either a group of k nodes (before storing data on each of them) or return a single piece of data (from the first node holding it during the lookup iterations).

Because of this, pure Kademlia isn't best suited to finding just a single node, so I'm not sure that part of your question is too relevant. If you did want to use Kademlia to find a single node, it would probably be worth modifying the lookup process to finish early as soon as any node returns the target node's contact details (in the same way that the lookup finishes early if a target value is found during the process).