CS217 Advanced Topics in Internet Research Guest Lecture Nikitas Liogkas, 5/11/2006 The BitTorrent content distribution system
Motivation • flash crowd (aka slashdot) effect • many clients, few servers • Problem: servers cannot handle load • Solution: swarming • clients download pieces of the file from each other • has been proven to have good scaling and performance properties
Presentation outline • Joining the system • Encoding / metadata file • Tracker protocol • Peer wire protocol • Piece selection • Peer selection • Client implementations • Resources
Joining a torrent • Peers divided into: • seeds: have the entire file • leechers: still downloading • Steps: 1. obtain the metadata file (out of band) 2. contact the tracker 3. obtain a peer list (contains seeds & leechers) 4. contact peers from that list for data • [Figure: a new leecher (1) fetches the metadata file from a website, (2) joins via the tracker, (3) receives the peer list, (4) sends data requests to a seed/leecher]
Exchanging data • verify pieces using hashes • download sub-pieces (blocks) in parallel • advertise received pieces to the entire peer list • interested: need pieces that a given peer has • [Figure: a seed and leechers A, B, C; a peer announces "I have!" to the others]
Bencoding • encoding format of all exchanged messages • four types • byte strings • integers • lists • dictionaries (mapping keys to values) • examples • 4:spam represents the string “spam” • i10e represents the integer 10
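The four types above can be sketched as a small encoder. This is a minimal illustration, not a reference implementation; the function name `bencode` is chosen here for illustration, and Python `str` values are assumed to be UTF-8 encoded into byte strings.

```python
def bencode(value):
    """Encode ints, byte strings, lists and dicts as bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        # Integers: "i" + decimal digits + "e", e.g. i10e
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is sent as UTF-8
    if isinstance(value, bytes):
        # Byte strings: length prefix, colon, raw bytes, e.g. 4:spam
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and are emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


print(bencode("spam"))  # the string example from the slide
print(bencode(10))      # the integer example from the slide
```

Lists and dictionaries simply wrap the encodings of their elements between a type letter and a closing `e`, which is why the encoder is naturally recursive.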
Metadata file structure • contains information necessary to contact the tracker and describes the files in the torrent • announce URL of tracker • file name • file length • piece length (typically 256KB) • SHA-1 hashes of pieces for verification • also creation date, comment, creator, …
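How the per-piece SHA-1 hashes support verification can be sketched as follows. The helper names `piece_hashes` and `verify_piece` are hypothetical, and the whole file is assumed to fit in memory, which a real client would avoid.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # typical piece length quoted on the slide


def piece_hashes(data, piece_length=PIECE_LENGTH):
    """Return the 20-byte SHA-1 digest of each fixed-size piece of data."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def verify_piece(piece, index, hashes):
    """Check a downloaded piece against the hash listed in the metadata."""
    return hashlib.sha1(piece).digest() == hashes[index]


# Usage: the last piece is usually shorter than piece_length.
data = b"a" * (PIECE_LENGTH + 10)
hashes = piece_hashes(data)
print(len(hashes), verify_piece(data[:PIECE_LENGTH], 0, hashes))
```

A piece that fails verification is discarded and requested again, so a peer never advertises data it could not check.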
Tracker protocol • communicates with clients via HTTP/HTTPS • client GET request • info_hash: uniquely identifies the file • peer_id: chosen by and uniquely identifies the client • client IP and port • numwant: how many peers to return (defaults to 50) • stats: bytes uploaded, downloaded, left • tracker GET response • interval: how often to contact the tracker • list of peers, containing peer id, IP and port • stats: complete, incomplete • tracker-less mode; based on the Kademlia DHT
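The client request above might be assembled roughly like this. It is a sketch under stated assumptions: `build_announce_url` is a hypothetical helper, the tracker URL in the usage lines is a placeholder, and the announce URL is assumed to carry no query string of its own. The query keys follow the commonly documented tracker parameters (`uploaded`, `downloaded`, `left` for the stats).

```python
from urllib.parse import urlencode


def build_announce_url(announce, info_hash, peer_id, port,
                       uploaded=0, downloaded=0, left=0, numwant=50):
    """Build the tracker GET request URL for one announce."""
    params = {
        "info_hash": info_hash,  # raw 20-byte hash; urlencode percent-encodes bytes
        "peer_id": peer_id,      # 20 bytes chosen by the client
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
        "numwant": numwant,      # how many peers we would like back
    }
    return announce + "?" + urlencode(params)


# Usage with placeholder values (not a real tracker or torrent).
url = build_announce_url("http://tracker.example.org/announce",
                         info_hash=b"\x12" * 20,
                         peer_id=b"-XX0001-abcdefghijkl",
                         port=6881, left=1024)
print(url)
```

The tracker's reply is a bencoded dictionary, so the same decoder used for the metadata file can parse it.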
Presentation outline • Joining the system • Encoding / metadata file • Tracker protocol • Peer wire protocol • Piece selection • Peer selection • Client implementations • Resources
Peer wire protocol • implemented directly on top of TCP • messages • handshake (optionally followed by a bitfield) • keep-alive • choke / unchoke • interested / not interested • have (advertisement of a newly acquired piece) • request / piece • cancel (only used in “endgame mode”) • port (used in tracker-less mode)
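The slide lists the message types but not their layout. As a sketch based on the public BitTorrent specification rather than on this lecture, every message after the handshake is a 4-byte big-endian length prefix, a 1-byte message id, and an optional payload. The helper names below are illustrative.

```python
import struct

# Message ids as given in the public protocol specification.
MESSAGE_IDS = {
    "choke": 0, "unchoke": 1, "interested": 2, "not_interested": 3,
    "have": 4, "bitfield": 5, "request": 6, "piece": 7, "cancel": 8, "port": 9,
}

# A keep-alive is a length prefix of zero with no id and no payload.
KEEP_ALIVE = struct.pack(">I", 0)


def encode_message(name, payload=b""):
    """Frame one message: length prefix, message id, payload."""
    body = bytes([MESSAGE_IDS[name]]) + payload
    return struct.pack(">I", len(body)) + body


def encode_have(piece_index):
    """Advertise a newly acquired piece."""
    return encode_message("have", struct.pack(">I", piece_index))


def encode_request(index, begin, length):
    """Ask for one sub-piece (block) inside a piece."""
    return encode_message("request", struct.pack(">III", index, begin, length))


print(encode_have(5).hex())
```

A cancel message carries the same three fields as the request it revokes, which is what makes endgame mode cheap to implement.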
Piece selection • when downloading starts: choose at random • get complete pieces as quickly as possible • obtain something to offer to others • after we have 4 pieces: pick (local) rarest first • achieves the fastest replication of rare pieces • obtain something of value • only get unique pieces from the seed • endgame mode • defense against the “last-block problem” • send requests for missing sub-pieces to all peers in our peer list • send cancel messages upon receipt of a sub-piece
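The switch from random selection to local rarest first can be sketched as below. The data structures are assumptions made for illustration: `have` is the set of piece indexes already held, and `peer_bitfields` maps each known peer to the set of pieces it has advertised. Endgame mode is left out.

```python
import random
from collections import Counter

RANDOM_FIRST_PIECES = 4  # threshold quoted on the slide


def pick_piece(have, peer_bitfields, peer, rng=random):
    """Choose the next piece to request from `peer`, or None if it has nothing new."""
    wanted = peer_bitfields[peer] - have
    if not wanted:
        return None  # we are not interested in this peer
    if len(have) < RANDOM_FIRST_PIECES:
        # Bootstrap: any piece will do, so pick at random.
        return rng.choice(sorted(wanted))
    # Count how many known peers hold each piece (the "local" view).
    counts = Counter(p for pieces in peer_bitfields.values() for p in pieces)
    rarest = min(counts[p] for p in wanted)
    # Break ties between equally rare pieces at random.
    return rng.choice(sorted(p for p in wanted if counts[p] == rarest))


peers = {"a": {4, 5}, "b": {5}, "c": {5, 6}}
print(pick_piece({0, 1, 2, 3}, peers, "a"))  # piece 4 is held by one peer only
```

Rarity is computed only over the peers in the local peer list, so two clients in the same torrent may disagree about which piece is rarest.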
Last-block problem • at the end of the download, a peer may have trouble finding the few missing pieces • based on anecdotal evidence • other proposals • network coding [Gkantsidis et al., Infocom’05] • prefer to upload to peers with similar file completeness; unfair for the peers having most of the pieces [Tian et al., Infocom’06]
Last-block problem – a myth? • is it a problem after all? • figure from [Legout et al., INRIA-TR-2006], with permission
Peer selection - unchoking • periodically (typically every 10 seconds) calculate data-receiving rates • upload to (unchoke) the fastest • constant number of unchoking slots • based on the “tit-for-tat” strategy • [Figure: a seed and leechers A, B, C]
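One unchoking round can be sketched as a ranking by receiving rate. The function name and the default of 4 slots are assumptions; the slide only says the number of slots is constant. A real client would also restrict the ranking to peers that are interested in its pieces.

```python
def choose_unchoked(download_rates, slots=4):
    """Return the peers to unchoke for the next round.

    download_rates maps each peer to the rate (bytes/s) at which we
    received data from it during the last measurement period.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    return set(ranked[:slots])


rates = {"a": 10, "b": 50, "c": 30}
print(choose_unchoked(rates, slots=2))
```

Every peer not in the returned set is choked, which is the tit-for-tat pressure: peers that upload slowly to us lose their slot.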
Optimistic unchoking • periodically select a peer at random and upload to it • typically every 3 unchoking rounds (30 seconds) • multi-purpose mechanism • allow bootstrapping of new clients • continuously look for the fastest partners • robustness: every peer has a non-zero chance of interacting with any other peer
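The rotation of the optimistic slot can be sketched as follows, assuming one call per 10-second unchoking round. The function and parameter names are hypothetical.

```python
import random

OPTIMISTIC_EVERY = 3  # unchoking rounds, i.e. about 30 seconds per the slide


def optimistic_unchoke(round_number, choked_peers, current, rng=random):
    """Return the peer holding the optimistic slot after this round."""
    if round_number % OPTIMISTIC_EVERY != 0 or not choked_peers:
        return current  # keep the current optimistic peer
    # Pick uniformly at random, regardless of observed rates.
    return rng.choice(sorted(choked_peers))


print(optimistic_unchoke(3, {"x", "y"}, current="z"))
```

Because the choice ignores rates, a newly joined peer with nothing to offer still gets a chance to receive its first pieces.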
Seed unchoking • old algorithm • unchoke the fastest leechers • problem: fastest peers may monopolize seeds • new algorithm • periodically sort all leechers according to their last unchoke time • prefer the most recently unchoked leechers; on a tie, prefer the fastest • (presumably) achieves equal spread of seed bandwidth
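The ordering used by the new seed algorithm can be sketched as a two-key sort. The tuple layout is an assumption for illustration, and "fastest" is taken here to mean the rate at which the seed uploads to that leecher.

```python
def seed_unchoke_order(leechers):
    """Order leechers for a seed: most recently unchoked first, ties by rate.

    leechers is a list of (peer, last_unchoke_time, upload_rate) tuples.
    """
    ranked = sorted(leechers, key=lambda entry: (entry[1], entry[2]), reverse=True)
    return [peer for peer, _, _ in ranked]


leechers = [("a", 100, 5), ("b", 200, 1), ("c", 200, 9)]
print(seed_unchoke_order(leechers))
```

Since the primary key is the unchoke time rather than the rate, a fast leecher cannot hold on to a seed slot indefinitely.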
Downloading only from seeds • repeatedly query the tracker for peer lists • distinguish the seeds, and receive data from them • violates fairness model; may be harmful to honest peers • [Figure: tracker, seed, and leechers A, B, C; labels: new list request, peer list]
Rate- vs. volume-based selection • Proponents of rate-based decisions: [Cohen, P2PECON’03], and [INRIA TR’2006] • Proponents of volume-based decisions: [Bharambe et al., MSR-TR-2005], [Gkantsidis et al., Infocom’05], [Jun et al., P2PECON’05], and eDonkey file-sharing system • No clear winner yet!
Client implementations • mainline: written in Python; right now, the only one employing the new seed unchoking algorithm • Azureus: the most popular, written in Java; implements a special protocol between clients (e.g. peers can exchange peer lists) • other popular clients: ABC, BitComet, BitLord, BitTornado, μTorrent, Opera browser • various non-standard extensions • retaliation mode: detect compromised/malicious peers • anti-snubbing: ignore a peer who ignores us • super seeding: seed masquerading as a leecher
Resources #1 • Basic BitTorrent mechanisms [Cohen, P2PECON’03] • BitTorrent specification Wiki: http://wiki.theory.org/BitTorrentSpecification • Measurement studies [Izal et al., PAM’04], [Pouwelse et al., Delft TR 2004 and IPTPS’05], [Guo et al., IMC’05], and [Legout et al., INRIA-TR-2006]
Resources #2 • Theoretical analysis and modeling [Qiu et al., SIGCOMM’04], and [Tian et al., Infocom’06] • Simulations [Bharambe et al., MSR-TR-2005] • Sharing incentives and exploiting them [Shneidman et al., PINS’04], [Jun et al., P2PECON’05], and [Liogkas et al., IPTPS’06]
Conclusion and food for thought • BitTorrent is fast and robust • Yet, many parameters are arbitrarily set • number of unchoking slots • unchoking round duration • size of pieces / sub-pieces • What can we learn from BitTorrent for the design of future P2P content distribution protocols?