Most of the specification work of BEP 52 was done by the8472. The libtorrent support for BitTorrent v2 was mostly implemented by Steven Siloti. BiglyBT also has an implementation of BitTorrent v2 to be released in the near future.
The hash function for piece data was changed to SHA-256. One consequence of this is that hashes are 32 bytes instead of 20 bytes. In BitTorrent v2, the info-hash (the hash of the info-dictionary) is also computed with SHA-256, which poses a compatibility challenge with the DHT and trackers, whose protocols expect 20-byte hashes. To handle this, DHT and tracker announces and lookups for v2 torrents use the SHA-256 info-hash truncated to 20 bytes.
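The truncation described above can be sketched in a few lines. This is a minimal illustration, assuming the raw bencoded info-dictionary bytes are already at hand; the function name `v2_info_hashes` and the sample bytes are hypothetical.

```python
import hashlib
from typing import Tuple


def v2_info_hashes(info_bytes: bytes) -> Tuple[bytes, bytes]:
    """Return the full 32-byte SHA-256 info-hash and its 20-byte truncation."""
    full = hashlib.sha256(info_bytes).digest()
    # The DHT and trackers expect 20-byte hashes, so only the first
    # 20 bytes of the SHA-256 digest are used there.
    return full, full[:20]


# Stand-in bytes for a bencoded info-dictionary (made up for illustration).
example_info = b"d4:name8:test.txt12:piece lengthi16384ee"
full_hash, truncated_hash = v2_info_hashes(example_info)
```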
This was one of the original rationales for creating a v2 protocol to begin with. It means that fundamentally a v2 torrent will be identified by a different hash than a v1 torrent, which would always create a separate swarm, even when sharing the same files. More on this later, under backwards compatibility.
In BitTorrent v1, pieces are hashed and the resulting hashes are included in the .torrent file/metadata (in the info-dictionary). In most cases, the piece hashes make up the bulk of the size of .torrent files. To keep the .torrent file size within reason for large files, the piece size can be increased, meaning each hash represents a larger portion of the file. A consequence of large piece sizes is that if a hash fails, one has to re-download a larger portion of the file before the piece passes the hash check.
An old idea to improve both of these metrics is to use merkle hash trees to represent the piece hashes (originally implemented in Tribler). This keeps .torrent files small because all you need is the root hash of the tree. BitTorrent v2 uses merkle hash trees for pieces (but a different protocol than the one Tribler implemented).
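To make the merkle tree idea concrete, here is a minimal sketch of computing a root hash over one file's data. It assumes 16 KiB leaf blocks hashed with SHA-256 and a leaf layer padded with all-zero hashes up to a power of two, as BEP 52 describes; the function name `merkle_root` is hypothetical and this is not libtorrent's implementation.

```python
import hashlib

BLOCK_SIZE = 16 * 1024   # leaf block size (assumed from BEP 52)
ZERO_HASH = bytes(32)    # padding hash for leaves past the end of the data


def merkle_root(data: bytes) -> bytes:
    """Compute a SHA-256 merkle root over non-empty data."""
    if not data:
        raise ValueError("empty data has no merkle root in this sketch")
    # Leaf layer: one hash per 16 KiB block (the last block may be short).
    layer = [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
             for i in range(0, len(data), BLOCK_SIZE)]
    # Pad the leaf layer with zero hashes up to the next power of two.
    size = 1
    while size < len(layer):
        size *= 2
    layer += [ZERO_HASH] * (size - len(layer))
    # Hash pairs of siblings until a single root remains.
    while len(layer) > 1:
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]
```

Only the returned root needs to be stored per file; the intermediate layers can be recomputed or requested from peers.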
v2 torrents address this by using a more efficient encoding for the directory structure, with less duplication. Instead of a flat list, the directory structure is stored as a tree (using bencoded dictionaries). This results in directory names only being mentioned once. For example:
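A small sketch of the difference, using plain Python dictionaries in place of bencoded ones. The helper `build_file_tree` and the file names are hypothetical; the empty-string key marking a file entry follows the BEP 52 "file tree" layout, and the "pieces root" field is left out to keep the sketch short.

```python
def build_file_tree(files):
    """Turn a flat list of (path components, length) into a nested tree."""
    tree = {}
    for path, length in files:
        node = tree
        for part in path:
            # Each directory or file name appears once, as a dictionary key.
            node = node.setdefault(part, {})
        # An empty-string key holds the properties of the file itself.
        node[""] = {"length": length}
    return tree


# Flat, v1-style listing: "dir1" is repeated for every file inside it.
flat_files = [
    (["dir1", "a.txt"], 10),
    (["dir1", "b.txt"], 20),
    (["dir2", "c.txt"], 30),
]
file_tree = build_file_tree(flat_files)
```

In the resulting tree, `dir1` is mentioned once no matter how many files it contains.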
A .torrent file is a tree structure encoded with bencoding. In bencoding there are a few cases of single values with multiple possible encodings. An integer could be encoded with or without leading zeros, and 0 could be encoded as negative 0. Those encodings have always been illegal, but parsers have historically been lenient and accepted them. Perhaps the most common example concerns dictionary keys, which are required to be sorted lexicographically. Some torrent creators have failed to sort them, and clients have accepted such torrents anyway.
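A strict decoder that rejects these non-canonical encodings could look roughly like the sketch below. It is an illustration only, not any client's parser; the names `strict_bdecode`, `_parse_int` and `_decode` are hypothetical.

```python
def _parse_int(text: bytes) -> int:
    digits = text[1:] if text.startswith(b"-") else text
    # Reject empty numbers, leading zeros and negative zero.
    if not digits.isdigit() or (len(digits) > 1 and digits[:1] == b"0") or text == b"-0":
        raise ValueError("non-canonical integer: %r" % text)
    return int(text)


def _decode(data: bytes, pos: int):
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return _parse_int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos, items = pos + 1, []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos, result, prev = pos + 1, {}, None
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            if not isinstance(key, bytes):
                raise ValueError("dictionary key is not a string")
            # Keys must be strictly increasing as raw byte strings.
            if prev is not None and key <= prev:
                raise ValueError("unsorted or duplicate key: %r" % key)
            result[key], pos = _decode(data, pos)
            prev = key
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        end = colon + 1 + _parse_int(data[pos:colon])
        if end > len(data):
            raise ValueError("truncated string")
        return data[colon + 1:end], end
    raise ValueError("unexpected byte at offset %d" % pos)


def strict_bdecode(data: bytes):
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after top-level value")
    return value
```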
A hybrid torrent has two info-hashes: a v1 SHA-1 hash and a (possibly truncated) SHA-256 hash. This forms two swarms, or a segregated swarm. libtorrent marks peers as supporting v2 or not. This information is also relayed via a new peer exchange (PEX) flag.
BitTorrent is a protocol for distributing files. It identifies content by URL and is designed to integrate seamlessly with the web. Its advantage over plain HTTP is that when multiple downloads of the same file happen concurrently, the downloaders upload to each other, making it possible for the file source to support very large numbers of downloaders with only a modest increase in its load.
The info-hash must be the hash of the encoded form as found in the .torrent file, which is identical to bdecoding the metainfo file, extracting the info dictionary and encoding it if and only if the bdecoder fully validated the input (e.g. key ordering, absence of leading zeros). Conversely that means implementations must either reject invalid metainfo files or extract the substring directly. They must not perform a decode-encode roundtrip on invalid data.
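The "extract the substring directly" option can be sketched as follows: walk the top-level dictionary without re-encoding anything and hash the exact bytes of the info value. The helper names `_value_end` and `raw_info_dict` are hypothetical, and the sample metainfo is made up; note that its info dictionary has unsorted keys, which a decode-encode roundtrip would silently reorder.

```python
import hashlib


def _value_end(data: bytes, pos: int) -> int:
    """Return the offset just past the bencoded value starting at pos."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        return data.index(b"e", pos) + 1
    if lead in (b"l", b"d"):
        pos += 1
        while data[pos:pos + 1] != b"e":
            pos = _value_end(data, pos)
        return pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        end = colon + 1 + int(data[pos:colon])
        if end > len(data):
            raise ValueError("truncated string")
        return end
    raise ValueError("malformed bencoding at offset %d" % pos)


def raw_info_dict(metainfo: bytes) -> bytes:
    """Return the info dictionary exactly as it appears in the file."""
    if metainfo[:1] != b"d":
        raise ValueError("metainfo is not a dictionary")
    pos = 1
    while metainfo[pos:pos + 1] != b"e":
        if not metainfo[pos:pos + 1].isdigit():
            raise ValueError("dictionary key is not a string")
        key_end = _value_end(metainfo, pos)
        key = metainfo[metainfo.index(b":", pos) + 1:key_end]
        value_end = _value_end(metainfo, key_end)
        if key == b"info":
            return metainfo[key_end:value_end]
        pos = value_end
    raise KeyError("info")


# Made-up metainfo whose info dictionary has unsorted keys.
sample = b"d8:announce3:foo4:infod4:name1:x6:lengthi5eee"
info_hash = hashlib.sha256(raw_info_dict(sample)).digest()
```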
Tracker responses are bencoded dictionaries. If a tracker response has a key failure reason, then that maps to a human readable string which explains why the query failed, and no other keys are required. Otherwise, it must have two keys: interval, which maps to the number of seconds the downloader should wait between regular rerequests, and peers. peers maps to a list of dictionaries corresponding to peers, each of which contains the keys peer id, ip, and port, which map to the peer's self-selected ID, IP address or dns name as a string, and port number, respectively. Note that downloaders may rerequest on nonscheduled times if an event happens or they need more peers.
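Handling such a response might look like the sketch below. It assumes the response has already been bdecoded into a Python dictionary with byte-string keys; the function name `parse_tracker_response` and the `TrackerError` exception are hypothetical.

```python
class TrackerError(Exception):
    """Raised when the tracker reports a failure reason."""


def parse_tracker_response(response: dict):
    """Return (interval, peers) from an already-bdecoded tracker response."""
    if b"failure reason" in response:
        # No other keys are required when the query failed.
        raise TrackerError(response[b"failure reason"].decode("utf-8", "replace"))
    interval = response[b"interval"]
    peers = [
        (peer[b"peer id"], peer[b"ip"].decode("ascii"), peer[b"port"])
        for peer in response[b"peers"]
    ]
    return interval, peers


# Made-up response for illustration.
example_response = {
    b"interval": 1800,
    b"peers": [{b"peer id": b"A" * 20, b"ip": b"192.0.2.1", b"port": 6881}],
}
interval, peers = parse_tracker_response(example_response)
```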
The peer protocol refers to pieces of the file by index as described in the metainfo file, starting at zero. When a peer finishes downloading a piece and checks that the hash matches, it announces that it has that piece to all of its peers.
Data transfer takes place whenever one side is interested and the other side is not choking. Interest state must be kept up to date at all times - whenever a downloader doesn't have something they currently would ask a peer for if unchoked, they must express lack of interest, despite being choked. Implementing this properly is tricky, but makes it possible for downloaders to know which peers will start downloading immediately if unchoked.
When data is being transferred, downloaders should keep several piece requests queued up at once in order to get good TCP performance (this is called 'pipelining'.) On the other side, requests which can't be written out to the TCP buffer immediately should be queued up in memory rather than kept in an application-level network buffer, so they can all be thrown out when a choke happens.
Next comes the 20 byte truncated infohash. If both sides don't send the same value, they sever the connection. The one possible exception is if a downloader wants to do multiple downloads over a single port, they may wait for incoming connections to give a download hash first, and respond with the same one if it's in their list.
After the download hash comes the 20-byte peer id which is reported in tracker requests and contained in peer lists in tracker responses. If the receiving side's peer id doesn't match the one the initiating side expects, it severs the connection.
'bitfield' is only ever sent as the first message. Its payload is a bitfield with each index that downloader has sent set to one and the rest set to zero. Downloaders which don't have anything yet may skip the 'bitfield' message. The first byte of the bitfield corresponds to indices 0 - 7 from high bit to low bit, respectively. The next one 8-15, etc. Spare bits at the end are set to zero.
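The bit layout described above (high bit first, spare bits zero) can be sketched like this. The function names `encode_bitfield` and `has_piece` are hypothetical.

```python
def encode_bitfield(have, num_pieces: int) -> bytes:
    """Pack a set of piece indices into a bitfield payload."""
    field = bytearray((num_pieces + 7) // 8)  # spare bits stay zero
    for index in have:
        if not 0 <= index < num_pieces:
            raise ValueError("piece index out of range: %d" % index)
        # Index 0 is the high bit of the first byte, index 7 its low bit.
        field[index // 8] |= 0x80 >> (index % 8)
    return bytes(field)


def has_piece(bitfield: bytes, index: int) -> bool:
    """Check whether the bit for a piece index is set."""
    return bool(bitfield[index // 8] & (0x80 >> (index % 8)))


# A peer with pieces 0, 7 and 9 out of 10.
payload = encode_bitfield({0, 7, 9}, 10)
```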
'hash request' messages contain a pieces root, base layer, index, length, and proof layers. The pieces root is the root hash of a file. The base layer defines the lowest requested layer of the hash tree. It is the number of layers above the leaf layer that the hash list should start at. A value of zero indicates that leaf hashes are requested. Clients are only required to support setting the base layer to the leaf and piece layers. Index is the offset in hashes of the first requested hash in the base layer. Index MUST be a multiple of length, this includes zero. Length is the number of hashes to include from the base layer. Length MUST be equal-to-or-greater-than two and a power of two. Length SHOULD NOT be greater than 512. Proof layers is the number of ancestor layers to include. Note that the limits imposed on index and length above mean that at-most one uncle hash is needed from each proof layer. Hash requests MUST be answered with either a 'hashes' or 'hash reject' message.
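The constraints on index and length translate into a short check. This is a sketch only: the function name `is_valid_hash_request` is hypothetical, and treating a length above 512 as invalid is a local policy assumption, since the text states that limit as SHOULD NOT rather than MUST NOT.

```python
def is_valid_hash_request(index: int, length: int, max_length: int = 512) -> bool:
    """Check the index/length rules of a 'hash request' message."""
    # Length must be at least two and a power of two.
    if length < 2 or length & (length - 1):
        return False
    # Assumed policy: refuse lengths beyond the recommended maximum.
    if length > max_length:
        return False
    # Index must be a non-negative multiple of length (zero included).
    return index >= 0 and index % length == 0
```

Because the requested range is an aligned power-of-two block, it forms a complete subtree, which is why a single uncle hash per proof layer is enough to verify it.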
'cancel' messages have the same payload as request messages. They are generally only sent towards the end of a download, during what's called 'endgame mode'. When a download is almost complete, there's a tendency for the last few pieces to all be downloaded off a single hosed modem line, taking a very long time. To make sure the last few pieces come in quickly, once requests for all pieces a given downloader doesn't have yet are currently pending, it sends requests for everything to everyone it's downloading from. To keep this from becoming horribly inefficient, it sends cancels to everyone else every time a piece arrives. 'cancel' messages do not relieve the other side from the duty of responding to a request. They must either send a piece or a reject message as response.
Choking is done for several reasons. TCP congestion control behaves very poorly when sending over many connections at once. Also, choking lets each peer use a tit-for-tat-ish algorithm to ensure that they get a consistent download rate.
There are several criteria a good choking algorithm should meet. It should cap the number of simultaneous uploads for good TCP performance. It should avoid choking and unchoking quickly, known as 'fibrillation'. It should reciprocate to peers who let it download. Finally, it should try out unused connections once in a while to find out if they might be better than the currently used ones, known as optimistic unchoking.
The currently deployed choking algorithm avoids fibrillation by only changing who's choked once every ten seconds. It does reciprocation and number of uploads capping by unchoking the four peers which it has the best download rates from and are interested. Peers which have a better upload rate but aren't interested get unchoked and if they become interested the worst uploader gets choked. If a downloader has a complete file, it uses its upload rate rather than its download rate to decide who to unchoke.
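One way to read the selection rule above is to walk the peers from best to worst rate and unchoke each one until four interested peers have been unchoked. The sketch below follows that reading; the `Peer` class and `choose_unchoked` are hypothetical names, and the ten-second timer and the seeding case (ranking by upload rate) are left out.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    download_rate: float  # rate at which we receive data from this peer
    interested: bool      # whether the peer wants data from us


def choose_unchoked(peers, slots: int = 4):
    """Return the names of peers to unchoke in this round."""
    ranked = sorted(peers, key=lambda p: p.download_rate, reverse=True)
    unchoked, interested_count = [], 0
    for peer in ranked:
        if interested_count == slots:
            break
        # Faster but uninterested peers are unchoked too; they do not
        # use up one of the slots until they become interested.
        unchoked.append(peer.name)
        if peer.interested:
            interested_count += 1
    return unchoked


swarm = [
    Peer("A", 100.0, False), Peer("B", 90.0, True), Peer("C", 80.0, True),
    Peer("D", 70.0, True), Peer("E", 60.0, True), Peer("F", 50.0, True),
]
to_unchoke = choose_unchoked(swarm)
```

If peer A later becomes interested, rerunning the selection drops E, the worst of the previously unchoked uploaders.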
For optimistic unchoking, at any one time there is a single peer which is unchoked regardless of its upload rate (if interested, it counts as one of the four allowed downloaders.) Which peer is optimistically unchoked rotates every 30 seconds. To give them a decent chance of getting a complete piece to upload, new connections are three times as likely to start as the current optimistic unchoke as anywhere else in the rotation.
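The bias towards new connections can be approximated with a weighted random choice. This is only one possible interpretation of the rotation described above, not the deployed algorithm; the function name `pick_optimistic_unchoke` and the weight of 3 for new connections are assumptions drawn from the "three times as likely" wording.

```python
import random


def pick_optimistic_unchoke(peers, rng=random):
    """Pick one peer name; new connections get three times the weight.

    peers is a list of (name, is_new_connection) tuples.
    """
    names = [name for name, _ in peers]
    weights = [3 if is_new else 1 for _, is_new in peers]
    return rng.choices(names, weights=weights, k=1)[0]


# Would be called once per 30-second rotation in this sketch.
candidates = [("old-peer", False), ("new-peer", True)]
chosen = pick_optimistic_unchoke(candidates, random.Random(1))
```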
For interoperability with BEP 3 a torrent can be created to contain the necessary data for both formats. To do so the 'pieces' field and 'files' or 'length' in the info dictionary must be generated to describe the same data in the same order. Since the old format did not align files to piece boundaries a multifile torrent must use BEP 47 padding files to achieve identical alignment.