1. Understanding BitTorrent Iqbal Mohomed
3. Simple Solution: One Big Server Make the file available on a central server
Each client downloads file from this server
Problems
Solution does not scale very well
With a large number of clients, the server’s resources get overwhelmed
4. The Brilliance of Napster: P2P In the original Napster, nodes connected to a central server and gave it a listing of all the files they had.
Nodes relay searches to the central server, which performs them locally
The actual file transfer occurs peer-to-peer
The big weakness of this approach was that the directory server was a single point of failure
5. The Gnutella Solution In Gnutella, all nodes are true peers
This gets rid of the single point of failure problem
The new problem is efficiency and scalability
Specifically, searches go across a large number of nodes, generating a massive amount of traffic
Privacy also suffers more, since peers see each other's search queries
6. The FastTrack Network aka Kazaa
Combines the Napster and Gnutella approaches
Nodes connect to super-peers that act as the directory server in Napster
The super-peers connect to each other similar to Gnutella
This solution is working very well in practice
7. Enter BitTorrent Released in the summer of 2001
Uses basic ideas from game theory to largely eliminate the free-rider problem
None of the previous systems dealt with this problem well
Makes no strong guarantees unlike DHTs
It is working extremely well in practice, unlike DHTs?
8. Basic Idea Chop file into many pieces
Replicate DIFFERENT pieces on different peers as soon as possible
As soon as a peer has a complete piece, it can trade it with other peers
Hopefully, we will be able to assemble the entire file at the end
9. Basic Components Seed
Peer that has the entire file
Leecher
Peer that has an incomplete copy of the file
A Torrent file
Passive component
Files are typically fragmented into 256KB pieces
The torrent file lists SHA1 hashes of all the pieces to allow peers to verify integrity
Typically hosted on a web server
A Tracker
Active component
Allows peers to find each other
Returns a random list of peers
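The integrity check that the torrent file enables can be sketched roughly as follows. This is a minimal illustration, not the actual client code: the function names `piece_hashes` and `verify_piece` are hypothetical, and the sketch ignores the rest of the torrent file format.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # typical piece size mentioned on the slide


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Check a downloaded piece against the hash listed for its index."""
    return hashlib.sha1(piece).digest() == hashes[index]
```

A peer would run something like `verify_piece` on each completed piece before offering it to others, so a corrupted piece is discarded instead of being replicated.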
10. Operation
11. Pieces and Sub-Pieces A piece is broken into sub-pieces ... typically 16KB in size
Policy: Until a piece is assembled, only download sub-pieces for that piece
This policy lets complete pieces assemble quickly
12. Pipelining When transferring data over TCP, it is critical to always have several requests pending at once, to avoid a delay between pieces being sent
BitTorrent breaks pieces into sub-pieces
At any point in time, some number, typically 5, are requested simultaneously
Every time a sub-piece arrives, a new request is sent
This scheme has been found to saturate most connections in practice
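The pipelining rule can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: sub-pieces arrive in the order requested, there is no real network, and the names `sub_piece_requests` and `simulate_pipeline` are made up for this example.

```python
from collections import deque

PIECE_SIZE = 256 * 1024      # typical piece size
SUB_PIECE_SIZE = 16 * 1024   # typical sub-piece size
PIPELINE_DEPTH = 5           # typical number of simultaneous requests


def sub_piece_requests(piece_index, piece_size=PIECE_SIZE, sub_size=SUB_PIECE_SIZE):
    """Return (piece_index, offset, length) for each sub-piece of one piece."""
    return [
        (piece_index, offset, min(sub_size, piece_size - offset))
        for offset in range(0, piece_size, sub_size)
    ]


def simulate_pipeline(requests, depth=PIPELINE_DEPTH):
    """Keep up to `depth` requests outstanding; send a new one per arrival.

    Returns the largest number of requests in flight and the arrival order.
    """
    pending = deque(requests)
    in_flight = deque()
    arrived = []
    max_in_flight = 0
    # Fill the pipeline before waiting for any data.
    while pending and len(in_flight) < depth:
        in_flight.append(pending.popleft())
    while in_flight:
        max_in_flight = max(max_in_flight, len(in_flight))
        arrived.append(in_flight.popleft())      # a sub-piece arrives
        if pending:
            in_flight.append(pending.popleft())  # immediately request the next one
    return max_in_flight, arrived
```

With the typical sizes above, one piece yields 16 sub-piece requests, and the pipeline never holds more than 5 of them at once.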
13. Piece Selection The order in which pieces are selected by different peers is critical for good performance
If a bad algorithm is used, we could end up in a situation where every peer has all the pieces that are currently available and none of the missing ones
If the original seed is taken down, the file cannot be completely downloaded!
14. Random First Piece Initially, a peer has nothing to trade
Important to get a complete piece ASAP
Rare pieces are typically available at fewer peers and so download more slowly, which makes a rare piece a poor choice for the first download
Policy: Select a random piece of the file and download it
15. Rarest Piece First Policy: Determine the pieces that are most rare among your peers and download those first
This ensures that the most common pieces are left till the end to download
Rarest first also ensures that a large variety of pieces are downloaded from the seed
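The two selection policies (random first piece, then rarest first) might be combined along these lines. This is a sketch, not the reference implementation: `pick_piece` and its arguments are hypothetical, and breaking ties at random is an assumption.

```python
import random
from collections import Counter


def pick_piece(have, peer_bitfields, rng=random):
    """Choose the next piece index to download, or None if nothing is available.

    have: set of piece indices we already hold.
    peer_bitfields: dict mapping peer id -> set of piece indices that peer holds.
    """
    availability = Counter()
    for pieces in peer_bitfields.values():
        availability.update(pieces - have)  # count copies of pieces we still need
    if not availability:
        return None
    candidates = sorted(availability)
    if not have:
        # Random first piece: nothing to trade yet, so get any piece quickly.
        return rng.choice(candidates)
    # Rarest first: fewest copies among our peers; ties broken at random (assumption).
    fewest = min(availability.values())
    return rng.choice([p for p in candidates if availability[p] == fewest])
```

For example, if three peers hold piece 0, two hold piece 1, and one holds piece 2, a peer that already has piece 0 would pick piece 2 next.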
16. Endgame Mode Policy: When all the sub-pieces that a peer doesn’t have are actively being requested, these are requested from EVERY peer
When the sub-piece arrives, the replicated requests are cancelled
This ensures that a download doesn’t get prevented from completion due to a single peer with a slow transfer rate
Some bandwidth is wasted, but in practice, this is not too much
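The endgame policy can be sketched as three small set operations. The function names and the `(peer, block)` tuples are illustrative assumptions, not part of the protocol.

```python
def in_endgame(missing, requested):
    """Endgame starts once every missing sub-piece has an outstanding request."""
    return bool(missing) and missing <= requested


def endgame_requests(missing, peers):
    """Request every missing sub-piece from every peer."""
    return {(peer, block) for peer in peers for block in missing}


def cancels_on_arrival(block, from_peer, outstanding):
    """Return the duplicate requests to cancel once `block` arrives from `from_peer`."""
    return {(peer, b) for (peer, b) in outstanding if b == block and peer != from_peer}
```

The duplicated requests are where the wasted bandwidth comes from: a sub-piece may already be in transit from a second peer when the cancel reaches it.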
17. Choking One of BitTorrent’s most powerful ideas is the choking mechanism
It ensures that nodes cooperate and eliminates the free-rider problem
Cooperation means uploading sub-pieces that you have to your peers
Choking is a temporary refusal to upload; downloading occurs as normal
Connection is kept open so that setup costs are not borne again and again
Based on game-theoretic concepts
Tit-for-tat strategy in Repeated Games
18. Prisoner’s Dilemma
19. Repeated Games Over time, more complex strategies can evolve
For instance, Tit-for-tat
Do unto others as they do unto you
If someone cheats, you must retaliate
Have a recovery mechanism to ensure eventual cooperation
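A tiny repeated-game simulation shows how tit-for-tat behaves. The payoff numbers below are illustrative values commonly used for the Prisoner's Dilemma and are not taken from these slides; the strategy and function names are likewise just for this sketch.

```python
COOPERATE, DEFECT = "C", "D"

# Illustrative payoffs: (my_move, their_move) -> my score. Assumed values.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else COOPERATE


def always_defect(opponent_history):
    """A free rider: never cooperates."""
    return DEFECT


def play(strategy_a, strategy_b, rounds):
    """Play a repeated game and return the total scores of both players."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each player sees only the other's past moves
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b
```

Two tit-for-tat players cooperate throughout, while a defector gains only in the first round before being punished. Plain tit-for-tat has no recovery mechanism of its own; in this sketch, two players who fall into mutual defection stay there.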
20. Choking Algorithm Goal is to have several bidirectional connections running continuously
Upload to peers who have uploaded to you recently
Unused connections are unchoked on a trial basis to see whether they offer better transfer rates
21. Choking Specifics A peer always unchokes a fixed number of its peers (default of 4)
The decision to choke or unchoke is based on current download rates, which are evaluated as a rolling 20-second average
Evaluation on who to choke/unchoke is performed every 10 seconds
This avoids wasting resources through rapid choking and unchoking of peers
Ten seconds is supposedly long enough for TCP to ramp new transfers up to full capacity
Which peer is the optimistic unchoke is rotated every 30 seconds
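The unchoke decision described above might look like the following. This is a sketch under stated assumptions: one of the four slots is assumed to be reserved for the optimistic unchoke, the download rates are assumed to be already-computed rolling averages, and all function names are hypothetical.

```python
import random

UNCHOKE_SLOTS = 4         # default number of unchoked peers
RECHOKE_INTERVAL = 10     # seconds between choke/unchoke evaluations
OPTIMISTIC_INTERVAL = 30  # seconds between optimistic-unchoke rotations


def choose_unchoked(download_rates, optimistic=None, slots=UNCHOKE_SLOTS):
    """Return the set of peers to unchoke.

    download_rates: dict peer -> rolling-average download rate from that peer.
    optimistic: peer holding the optimistic slot (assumed to use one slot), or None.
    """
    regular_slots = slots - 1 if optimistic is not None else slots
    ranked = sorted(
        (p for p in download_rates if p != optimistic),
        key=lambda p: download_rates[p],
        reverse=True,
    )
    unchoked = set(ranked[:regular_slots])
    if optimistic is not None:
        unchoked.add(optimistic)
    return unchoked


def next_optimistic(peers, current, rng=random):
    """Pick a different peer for the optimistic slot, if one exists."""
    others = sorted(p for p in peers if p != current)
    return rng.choice(others) if others else current


def actions_at(second):
    """List the periodic actions due at a given whole second."""
    actions = []
    if second % RECHOKE_INTERVAL == 0:
        actions.append("rechoke")
    if second % OPTIMISTIC_INTERVAL == 0:
        actions.append("rotate_optimistic")
    return actions
```

The optimistic slot is what lets a newcomer with nothing to offer receive its first pieces, and what lets an established peer discover faster partners.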
22. Anti-Snubbing Policy: When over a minute has gone by without receiving a single sub-piece from a particular peer, do not upload to it except as an optimistic unchoke
A peer might find itself being simultaneously choked by all its peers that it was just downloading from
Download will lag until optimistic unchoke finds better peers
Policy: If choked by everyone, increase the number of simultaneous optimistic unchokes to more than one
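The two anti-snubbing policies can be sketched as follows. The 60-second timeout reflects "over a minute" from the slide; the number of extra optimistic unchokes is an assumption, since the slide only says "more than one", and the function names are hypothetical.

```python
SNUB_TIMEOUT = 60  # seconds without a sub-piece before a peer counts as snubbing us


def snubbed_by(last_received, now):
    """Return peers that have sent us nothing for more than SNUB_TIMEOUT seconds.

    last_received: dict peer -> time (seconds) the last sub-piece arrived from it.
    """
    return {p for p, t in last_received.items() if now - t > SNUB_TIMEOUT}


def optimistic_slot_count(choked_by, peers, extra=1):
    """One optimistic unchoke normally; more when every peer is choking us.

    `extra` is an assumed value chosen for illustration.
    """
    all_choking = bool(peers) and all(p in choked_by for p in peers)
    return 1 + extra if all_choking else 1
```

Peers returned by `snubbed_by` would be excluded from the regular unchoke slots and could only be served through an optimistic unchoke.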
23. Upload-Only mode Once download is complete, a peer has no download rates to use for comparison nor has any need to use them
The question is, which nodes to upload to?
Policy: Upload to the peers to which you get the best upload rates.
This ensures that pieces get replicated faster
Also, peers to which you get good upload rates are probably not being served by others
24. References "BitTorrent Economics Paper" , Bram Cohen
"BitTorrent protocol specification" , Bram Cohen
"BitTorrent Resource Availability Analysis" , Brian Greinke and James Hsia. (Rice)
"Dissecting BitTorrent: Five Months in a Torrent's Lifetime" , M. Izal, G. Urvoy-Keller, E.W. Biersack, P.A. Felber, A. Al Hamra, and L. Garcés-Erice. (Institut Eurecom, France)