Alexey Zagalsky. Introduction to BitTorrent. Based on data from Wikipedia, slides from “Introduction to BitTorrent” by Arvid Norberg, and slides from “BitTorrent Background” by Hilel.
What is BitTorrent • BitTorrent is a peer-to-peer file sharing protocol used for distributing large amounts of data • It has been estimated that it accounted for roughly 27% to 55% of all Internet traffic (depending on geographical location) as of February 2009. • Programmer Bram Cohen designed the protocol in April 2001
Reasons for Adoption • Better performance through “pull-based” transfer • Slow nodes don’t bog down other nodes • Allows uploading from hosts that have downloaded only parts of a file • Practical reasons (perhaps more important!) • Working implementation with simple, well-defined interfaces for plugging in new content • Many recent competitors got sued or shut down (e.g. Napster, Kazaa)
How It Works • The file to be distributed is split into pieces, and a SHA-1 hash is calculated for each piece
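The splitting and hashing step could be sketched roughly as follows. This is a minimal illustration, not any particular client's implementation; the names `piece_hashes` and `verify_piece` are made up for the example, and the 256 KiB piece size is an assumed default.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # assumed piece size (256 KiB)


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each.

    The last piece may be shorter than piece_length.
    """
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(piece: bytes, expected_digest: bytes) -> bool:
    """Check a received piece against the digest published for it."""
    return hashlib.sha1(piece).digest() == expected_digest
```

A downloader would call something like `verify_piece` on every piece it receives and discard any piece whose digest does not match.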
How It Works • A metadata file (.torrent) is distributed to all peers • Usually via the Web, email, etc. • The metadata contains: • SHA-1 hashes of all pieces • Tracker reference (URL) • Piece length: usually 256 KB • Are smaller or bigger pieces better?
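One way to reason about the piece-size question is to count how many hashes the metadata has to carry. Each SHA-1 digest is 20 bytes, so smaller pieces mean a larger metadata file, while larger pieces mean more data is thrown away when a single piece fails its hash check. The sketch below only does the arithmetic; `hash_overhead` and the 700 MiB file size are illustrative assumptions.

```python
SHA1_DIGEST_SIZE = 20  # bytes per SHA-1 digest


def hash_overhead(file_size: int, piece_length: int) -> tuple[int, int]:
    """Return (number of pieces, bytes of piece hashes in the metadata)."""
    pieces = (file_size + piece_length - 1) // piece_length  # ceiling division
    return pieces, pieces * SHA1_DIGEST_SIZE


KIB = 1024
MIB = 1024 * KIB
file_size = 700 * MIB  # hypothetical file

for piece_length in (16 * KIB, 256 * KIB, 4 * MIB):
    pieces, overhead = hash_overhead(file_size, piece_length)
    print(f"{piece_length // KIB:>5} KiB pieces: {pieces:>6} pieces, {overhead:>7} hash bytes")
```

For the assumed 700 MiB file, 256 KiB pieces give 2800 pieces and 56,000 bytes of hashes, 16 KiB pieces give 44,800 pieces and 896,000 bytes, and 4 MiB pieces give 175 pieces and 3,500 bytes.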
How It Works • The user makes the file itself available through a BitTorrent node acting as a seed • The tracker is a central server that keeps a list of all peers participating in the swarm • A swarm is the set of peers that are participating in distributing the same file • A peer joins a swarm by asking the tracker for a peer list and connecting to those peers
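The tracker's bookkeeping role can be illustrated with a toy in-memory model. This is only a sketch with no networking: the `Tracker` class, its `announce` method, and the string torrent identifier are hypothetical names chosen for the example, not the real tracker protocol.

```python
from collections import defaultdict


class Tracker:
    """Toy tracker: keeps one set of peers per swarm (no networking)."""

    def __init__(self) -> None:
        # torrent identifier -> set of (host, port) peers in that swarm
        self.swarms: dict[str, set[tuple[str, int]]] = defaultdict(set)

    def announce(self, torrent_id: str, peer: tuple[str, int]) -> list[tuple[str, int]]:
        """Register the peer in the swarm and return the other known peers."""
        others = sorted(self.swarms[torrent_id] - {peer})
        self.swarms[torrent_id].add(peer)
        return others


tracker = Tracker()
tracker.announce("example-file", ("10.0.0.1", 6881))  # the seed joins first
peer_list = tracker.announce("example-file", ("10.0.0.2", 6881))
# The second peer now knows about the seed and can connect to it.
```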
Terminology • A downloader is any peer that does not yet have the entire file and is downloading it • A leecher is either: • a peer who has a negative effect on the swarm by having a very poor share ratio, or • simply a downloader • A seeder is a peer that has an entire copy of the torrent and offers it for upload
Goals • Efficiency • Ability to download from many peers yields fast downloads • Minimize piece overlap among peers • Download random pieces • Rarest First algorithm • Reliability • Tolerant to dropping peers • Ability to verify data integrity (SHA-1 hashes)
Rarest First • The piece-picking algorithm used in BitTorrent is called Rarest First • To maximize the number of distributed copies, maximize the availability of the rarest pieces • Picks a random piece from the set of rarest pieces • No peer has global knowledge of piece availability; it is approximated by the availability among neighbors
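A minimal sketch of this piece picker, assuming each neighbor's advertised pieces are available as a set of piece indices. The function name `pick_rarest_first` and its parameters are illustrative, not taken from any client.

```python
import random
from collections import Counter
from typing import Optional


def pick_rarest_first(
    needed: set[int],
    neighbor_pieces: list[set[int]],
    rng: random.Random = random.Random(),
) -> Optional[int]:
    """Pick a random piece among the rarest needed pieces held by neighbors.

    Availability is counted only over the neighbors we know about, which
    approximates the (unknown) global availability.
    """
    availability: Counter = Counter()
    for pieces in neighbor_pieces:
        availability.update(pieces & needed)  # count only pieces we still need
    if not availability:
        return None  # no neighbor has anything we need
    rarest_count = min(availability.values())
    candidates = sorted(p for p, count in availability.items() if count == rarest_count)
    return rng.choice(candidates)
```

With neighbors holding `{0, 1, 2}`, `{0, 1}` and `{0, 3}`, pieces 2 and 3 each have one copy, so one of them is picked at random.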
Incentive to Share • Policies to determine to whom to send data: • Tit-for-Tat • Upload to whoever uploads the most to you • “Survival of the fittest” • Theoretically increases performance by encouraging fast peers to upload to you and giving them even more pieces to upload to others • May result in suboptimal situations • Optimistic Unchoking • In the hope of discovering better partners • To ensure that newcomers get a chance to join the swarm
Tit-for-tat as Incentive to Upload • Want to encourage all peers to contribute • Peer A is said to choke peer B if A decides not to upload to B • Each peer (say A) unchokes at most 4 interested peers at any time • The three with the largest upload rates to A (this is where the tit-for-tat comes in) • One more chosen at random (the optimistic unchoke), to periodically look for better choices
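The unchoke decision described above might look roughly like this. It is a sketch under the stated "three best plus one random" rule; `choose_unchoked`, its parameters, and the rate units are assumptions made for the example, and real clients re-run such a decision periodically.

```python
import random


def choose_unchoked(
    upload_rates: dict[str, float],
    interested: set[str],
    regular_slots: int = 3,
    rng: random.Random = random.Random(),
) -> set[str]:
    """Return the peers to unchoke: the best uploaders plus one optimistic pick.

    upload_rates maps a peer to the rate at which it uploads to us;
    peers with no recorded rate count as 0.
    """
    # Tit-for-tat: rank interested peers by how much they upload to us.
    ranked = sorted(interested, key=lambda p: (-upload_rates.get(p, 0.0), p))
    unchoked = set(ranked[:regular_slots])
    # Optimistic unchoke: one random peer from the rest, regardless of rate.
    remaining = ranked[regular_slots:]
    if remaining:
        unchoked.add(rng.choice(remaining))
    return unchoked


rates = {"a": 50.0, "b": 40.0, "c": 30.0, "d": 5.0, "e": 0.0}
unchoked = choose_unchoked(rates, {"a", "b", "c", "d", "e"})
# "a", "b" and "c" are always unchoked; the fourth slot goes to "d" or "e".
```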
Limitations • Content unavailability • Although swarming scales well enough to tolerate flash crowds for popular content, it is less useful for unpopular content • The leech problem • A user may often choose to leave the swarm as soon as they have a complete copy of the file they are downloading • Pieces are not downloaded in sequential order, which hinders streaming (think VOD)
Trackerless Torrents • Common problem with trackers: • Single point of failure • Solutions: • Multiple trackers (but this splits swarms) • DHT
Distributed Hash Table • Works as a hash table with SHA-1 hashes as keys • The key is the info-hash, the SHA-1 hash of the info section of the metadata • It uniquely identifies a torrent • The value is the list of peers in the swarm
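The key-to-peer-list mapping can be sketched as a toy table spread over a few nodes. Everything here is an illustrative assumption: the `ToyDHT` class, its method names, and the rule of storing each key on the node whose ID is closest by XOR distance (a Kademlia-style choice, which the slides do not specify). There is no routing or networking.

```python
import hashlib


class ToyDHT:
    """Toy DHT: each key is stored on the node with the closest ID (XOR metric)."""

    def __init__(self, node_names: list[str]) -> None:
        # Node IDs live in the same 160-bit space as the SHA-1 keys.
        self.nodes: dict[int, dict[bytes, set]] = {
            int.from_bytes(hashlib.sha1(name.encode()).digest(), "big"): {}
            for name in node_names
        }

    def _responsible_store(self, info_hash: bytes) -> dict:
        key = int.from_bytes(info_hash, "big")
        closest_id = min(self.nodes, key=lambda node_id: node_id ^ key)
        return self.nodes[closest_id]

    def announce(self, info_hash: bytes, peer: tuple[str, int]) -> None:
        """Add a peer to the peer list stored under this info-hash."""
        self._responsible_store(info_hash).setdefault(info_hash, set()).add(peer)

    def get_peers(self, info_hash: bytes) -> set:
        """Look up the peer list for a torrent; empty if nobody announced it."""
        return set(self._responsible_store(info_hash).get(info_hash, set()))


dht = ToyDHT(["node-1", "node-2", "node-3"])
# Stand-in for a real info-hash: the SHA-1 of some placeholder bytes.
info_hash = hashlib.sha1(b"placeholder info section").digest()
dht.announce(info_hash, ("10.0.0.1", 6881))
dht.announce(info_hash, ("10.0.0.2", 6881))
```

A peer joining without a tracker would then call something like `get_peers(info_hash)` to obtain its initial peer list.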