For the first time I used a magnet link. Curious about how it works, I looked up the specs and didn't find any answers. The wiki says xt
means "exact topic" and is followed by the format (btih
in this case) with a SHA1 hash. I saw base32 mentioned, knowing it's 5 bits per character and 32 characters, I found it holds exactly 160bits, which is exactly the size of the SHA1.
There's no room for an IP address or anything, it's just a SHA1. So how does the BitTorrent client find the actual file? I turned on URL Snooper to see if it visits a page (using TCP) or does a lookup or the like, but nothing happened. I have no idea how the client finds peers. How does this work?
Also, what is the hash of? Is it a hash of an array of all the file hashes together? Maybe it's a hash of the actual torrent file required (stripping certain information)?
In a VM, I tried a magnet link with uTorrent (which was freshly installed) and it managed to find peers. Where did the first peer come from? It was fresh and there were no other torrents.
A BitTorrent magnet link identifies a torrent using1 a SHA-1 or truncated SHA-256 hash value known as the "infohash". This is the same value that peers (clients) use to identify torrents when communicating with trackers or other peers. A traditional .torrent file contains a data structure with two top-level keys: announce
, identifying the tracker(s) to use for the download, and info
, containing the filenames and hashes for the torrent. The "infohash" is the hash of the encoded info
data.
Some magnet links include trackers or web seeds, but they often don't. Your client may know nothing about the torrent except for its infohash. The first thing it needs to is find other peers who are downloading the torrent. It does this using a separate peer-to-peer network2 operating a "distributed hash table" (DHT). A DHT is a big distributed index which maps torrents (identified by infohashes) to lists of peers (identified by IP address and ports) who are participating in a swarm for that torrent (uploading/downloading data or metadata).
The first time a client joins the DHT network it generates a random 160-bit ID from the same space as infohashes. It then bootstraps its connection to the DHT network using either hard-coded addresses of clients controlled by the client developer, or DHT-supporting clients previously encountered in a torrent swarm. When it wants to participate in a swarm for a given torrent, it searches the DHT network for several other clients whose IDs are as close3 as possible to the infohash. It notifies these clients that it would like to participate in the swarm, and asks them for the connection information of any peers they already know of who are participating in the swarm.
When peers are uploading/downloading a particular torrent, they try to tell each other about all of the other peers they know of that are participating in the same torrent swarm. This lets peers know of each other quickly, without subjecting a tracker or DHT to constant requests. Once you've learned of a few peers from the DHT, your client will be able to ask those peers for the connection information of yet more peers in the torrent swarm, until you have all of the peers you need.
Finally, we can ask these peers for the torrent's info
metadata, containing the filenames and hash list. Once we've downloaded this information and verified that it's correct using the known infohash
, we're in practically the same position as a client that started with a regular .torrent
file and got a list of peers from the included tracker.
The download may begin.
1 The infohash is typically hex-encoded, but some old clients used base 32 instead. v1 (urn:btih:
) uses the SHA-1 digest directly, while v2 (urn:bimh:
) adds a multihash prefix to identify the hash algorithm and digest length.
2 There are two primary DHT networks: the simpler "mainline" DHT, and a more complicated protocol used by Azureus.
3 The distance is measured by XOR.