Explore the architecture, algorithms, and benefits of trading data deeds for replicating digital collections, ensuring reliability and autonomy in a decentralized network. Discover the challenges and solutions in securing and optimizing data replication in a peer-to-peer trading system.
Peer-to-peer archival data trading
Brian Cooper, Stanford University, cooperb@stanford.edu
Joint work with Hector Garcia-Molina (and others)
www-diglib.stanford.edu
www-db.stanford.edu/peers/
Motivation
Data: easy to create, hard to preserve
• Broken tapes
• Human deletions
• Going out of business
Goal
• Reliable replication of digital collections
• Given that: resources are limited, sites are autonomous, not all sites are equal
Metric
• Reliability
• Not necessarily “efficiency”
Deeds
[Figure: a sample deed for space, reading “For use by: Library of Congress, or for transfer. 623 gigabytes.”]
• A right to use space at another site
• Bookkeeping mechanism for trades
• Used, saved, split, or transferred
[Figure: sites A and B negotiate a trade (“623 GB?”, “Okay”, “Thanks”), exchange 623 GB deeds, and hold copies of Collection 1.]
Trading algorithm:
1. Sites trade deeds
2. Sites exercise deeds to replicate collections
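The deed operations and the two-step trading algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Deed` class, the `trade` and `exercise` helpers, and the choice of a symmetric trade (each site gives the other a deed of the same size, as the 623 GB figure suggests) are hypothetical names and simplifications, not the system's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Deed:
    """A right for `holder` to use `size_gb` of space at site `issuer`."""
    issuer: str
    holder: str
    size_gb: int

    def split(self, size_gb):
        """Carve a smaller deed out of this one; the remainder stays in place."""
        if not 0 < size_gb < self.size_gb:
            raise ValueError("split size must be between 0 and the deed size")
        self.size_gb -= size_gb
        return Deed(self.issuer, self.holder, size_gb)

    def transfer(self, new_holder):
        """Hand the deed to another site."""
        self.holder = new_holder


def trade(site_a, site_b, size_gb):
    """Step 1 (assumed symmetric): each site issues the other an equal deed."""
    deed_for_a = Deed(issuer=site_b, holder=site_a, size_gb=size_gb)
    deed_for_b = Deed(issuer=site_a, holder=site_b, size_gb=size_gb)
    return deed_for_a, deed_for_b


def exercise(deed, collection_gb, replicas):
    """Step 2: use space from a deed to store a copy at the issuing site.

    `replicas` maps site name -> GB of the holder's data stored there.
    Any unused space remains in the deed (it is "saved").
    """
    if collection_gb > deed.size_gb:
        raise ValueError("collection does not fit in this deed")
    deed.size_gb -= collection_gb
    replicas[deed.issuer] = replicas.get(deed.issuer, 0) + collection_gb
```

For example, after `deed_a, deed_b = trade("A", "B", 623)`, site A could split off part of `deed_a` to transfer to a third site and exercise the rest to replicate a collection at B.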
Replication architecture
[Figure: a local archive and a remote archive, each with users, connected over the Internet. Labels: Service layer, Reliability layer, Data trading, SAV Archive, InfoMonitor, Archived data, Filesystem.]
Architecture developed with Arturo Crespo
• Data trading is the replication component of a digital archive
• Geographic dispersal of data protects from a variety of failures
Benefits of trading
• High reliability
  • Framework for replication
• Site autonomy
  • Make local decisions
• Fairness
  • Contribute more = more reliability
  • Must contribute resources
• Adapts to dynamic situation
  • Just make new trades
Other solutions
• Central control
  • Loses autonomy
  • Still must adapt to dynamicity
• Client based
  • Rare collections not protected
• Random
  • We can do better
  • Especially with limited resources
Decisions facing an archive
• Who to trade with
• How much to trade
• When to ask for a trade
• Providing space
• Advertising space
• Picking a number of copies
• Coping with varying site reliabilities
• What to do with acquired resources
• How to deliver other services
Many, many degrees of freedom!
How to find the best decisions? Trading simulator
1. Generate scenarios
2. Simulate trading with different policies
3. Compare reliability
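The three simulator steps can be illustrated with a toy harness. This is a heavily simplified sketch, not the simulator described in the talk: it skips deed negotiation and space limits entirely and only models where copies end up, it assumes sites fail independently, and every name in it (`generate_scenario`, `random_policy`, `most_reliable_policy`, `reliability`, `compare`) is hypothetical.

```python
import random
from math import prod


def generate_scenario(rng, num_sites=10):
    """Step 1: one scenario = a survival probability for each site."""
    return [rng.uniform(0.5, 0.99) for _ in range(num_sites)]


def random_policy(rng, survival, copies):
    """Place copies at randomly chosen sites."""
    return rng.sample(range(len(survival)), copies)


def most_reliable_policy(rng, survival, copies):
    """Place copies at the sites most likely to survive."""
    ranked = sorted(range(len(survival)), key=lambda i: survival[i], reverse=True)
    return ranked[:copies]


def reliability(survival, holders):
    """Probability that at least one copy survives (independent failures)."""
    return 1.0 - prod(1.0 - survival[i] for i in holders)


def compare(policies, trials=200, copies=3, seed=0):
    """Steps 2 and 3: run every policy on the same scenarios, average reliability."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in policies}
    for _ in range(trials):
        survival = generate_scenario(rng)
        for name, policy in policies.items():
            totals[name] += reliability(survival, policy(rng, survival, copies))
    return {name: total / trials for name, total in totals.items()}
```

Calling `compare({"random": random_policy, "most_reliable": most_reliable_policy})` returns the mean reliability per policy over the same set of generated scenarios, which is the kind of side-by-side comparison step 3 calls for.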
Example: Advertising policy
[Figure: site A advertises space to site B in three cases.]
• A to B: “I have 120 GB” (120 GB)
• Space fractional policy, A to B: “I have 60 GB” (60 GB)
• Data proportional policy, A holding 40 GB of data, A to B: “I have 40 GB” (40 GB)
Data proportional is best policy
• Reserves space for future
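The numbers in this example can be reproduced with a small sketch. The slide does not define the policies formally, so the readings below are assumptions: the first case is taken to advertise all free space, "space fractional" to advertise a fixed fraction of free space (one half matches the 60 GB figure), and "data proportional" to advertise a multiple of the site's own data (a factor of 1 matches the 40 GB figure). The function and parameter names are hypothetical.

```python
def advertise_all(free_gb):
    """Advertise every free gigabyte (assumed reading of the first case)."""
    return free_gb


def space_fractional(free_gb, fraction=0.5):
    """Advertise a fixed fraction of free space (fraction is an assumption)."""
    return free_gb * fraction


def data_proportional(free_gb, own_data_gb, factor=1.0):
    """Advertise space proportional to the site's own data, capped at free space.

    The factor of 1.0 and the cap are assumptions made for this sketch.
    """
    return min(free_gb, own_data_gb * factor)
```

With 120 GB free and 40 GB of local data, the three functions yield 120, 60, and 40 GB respectively, so the data proportional reading holds back the most space for future trades.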
Extensions
Some sites > others
• More reliable
• Better reputation
• “Good friends”
• Clustering: trade with trusted partners
• ClosestReliability: trade with other sites that are as reliable as you
• MostReliable: trade with the most reliable site
Freedom to negotiate
• “How much do I pay for 100 GB of your space?”
• “Bid trading”
• Choose bid based on local situation
[Figure: site A with bids of “120 GB”, “80 GB”, and “95 GB”.]
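The two reliability-based partner choices and the bid comparison can be sketched as follows. These are illustrative assumptions only: reliabilities are taken to be known numbers per site, a bid is read as the amount of the requester's own space (in GB) a site asks for in exchange, and the requester is assumed to prefer the lowest bid. The function names are hypothetical.

```python
def most_reliable(my_name, reliabilities):
    """MostReliable: pick the other site with the highest reliability."""
    others = {s: r for s, r in reliabilities.items() if s != my_name}
    return max(others, key=others.get)


def closest_reliability(my_name, reliabilities):
    """ClosestReliability: pick the other site whose reliability is nearest ours."""
    mine = reliabilities[my_name]
    others = {s: r for s, r in reliabilities.items() if s != my_name}
    return min(others, key=lambda s: abs(others[s] - mine))


def cheapest_bid(bids):
    """Bid trading (assumed): accept the site asking for the least space in return."""
    return min(bids, key=bids.get)
```

For instance, with reliabilities `{"A": 0.80, "B": 0.95, "C": 0.82, "D": 0.60}`, site A would pick B under MostReliable and C under ClosestReliability; given bids of 120, 80, and 95 GB, the 80 GB bid would win under the lowest-price assumption.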
Malicious sites
• Secure services
  • Publish: Makes copies to survive failures
  • Search: Find documents
  • Retrieve: Get a copy of a document
• Challenges
  • Attacker may delete copy
  • Attacker may provide fake search results
  • Attacker may provide altered document
  • Attacker may disrupt message routing
  • …
• Joint work with Mayank Bawa and Neil Daswani