Peer-to-peer archival data trading Brian Cooper Joint work with Hector Garcia-Molina (and others) Stanford University
Problem: Fragile Data • Data: easy to create, hard to preserve • Broken tapes • Human deletions • Going out of business
Motivation • Several systems use replication • Preserve digital collections • SAV, others • Archival part of digital library • Individual organizations cooperate • Not a lot of money to spend
Goal • Reliable replication of digital collections • Given that • Resources are limited • Sites are autonomous • Not all sites are equal • Traditional methods • Central control • Random • Replicate popular • Metric • Reliability • Not necessarily “efficiency”
Our solution • Data trading • “I’ll store a copy of your collection if you’ll store a copy of mine” • Sites make local decisions • Who to trade with • How many copies to make • How much space to provide • Etc.
Trading network [Figure: trading network among sites A through H] • A series of binary, peer-to-peer trading links
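A trading network can be pictured as an undirected graph whose edges are binary trades. Below is a minimal sketch, assuming that every site owns one collection and keeps it locally, and that each link is a one-for-one trade ("I'll store a copy of your collection if you'll store a copy of mine"). The function name and the three-site network are hypothetical. They do not reproduce the topology in the figure.

```python
def copies_by_collection(sites, links):
    """Map each site's collection to the set of sites holding a copy.

    Assumes each site owns one collection (named after the site) and keeps
    a local copy; each (a, b) link is a binary trade in which a stores b's
    collection and b stores a's collection.
    """
    holders = {site: {site} for site in sites}  # local copy at the owner
    for a, b in links:
        holders[a].add(b)  # b stores a copy of a's collection
        holders[b].add(a)  # a stores a copy of b's collection
    return holders


# Hypothetical network: A trades with B and with C; B and C do not trade.
sites = ["A", "B", "C"]
links = [("A", "B"), ("A", "C")]
for site, where in copies_by_collection(sites, links).items():
    print(site, sorted(where))
```

Under this assumption, the number of copies of a site's collection is one more than the number of trading links it has.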
Architecture [Figure: a local archive and a remote archive connected over the Internet; components shown include users, a service layer, reliability layers, SAV archives, archived data, InfoMonitor, and a filesystem] • This architecture was developed with Arturo Crespo
Overview • Trading model • Trading algorithm • Optimizing (and simulating) trading • Some results • Some stuff we are still working on
Trading model • Archive site: an autonomous archiving provider • Digital collection: a set of related digital materials • Archival storage: stores locally and remotely owned digital collections • Archiving client: deposit and retrieve materials • Data reliability: probability that data is not lost
Deeds • A right to use space at another site • Bookkeeping mechanism for trades • Used, saved, split, or transferred • Trading algorithm • Sites trade deeds • Sites exercise deeds to replicate collections • Example deed: “Deed for space: 623 gigabytes, Stanford University. For use by: Library of Congress, or for transfer”
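A deed can be modeled as a small record supporting the four operations on the slide. The following is a minimal sketch: the `Deed` class, its fields, and the rule that an exercised deed cannot be split or transferred are hypothetical, not the project's actual data structure. "Saved" simply means holding a deed without exercising it.

```python
from dataclasses import dataclass


@dataclass
class Deed:
    """Hypothetical deed: the right to use space at another site."""
    issuer: str       # site providing the space
    holder: str       # site entitled to use (or pass on) the space
    gigabytes: int    # amount of space covered
    used: bool = False

    def _check_unused(self) -> None:
        if self.used:
            raise ValueError("deed has already been exercised")

    def split(self, amount: int) -> "tuple[Deed, Deed]":
        """Divide into two smaller deeds; the caller should discard the original."""
        self._check_unused()
        if not 0 < amount < self.gigabytes:
            raise ValueError("split amount must be between 0 and the deed size")
        return (Deed(self.issuer, self.holder, amount),
                Deed(self.issuer, self.holder, self.gigabytes - amount))

    def transfer(self, new_holder: str) -> "Deed":
        """Hand the same space to another site."""
        self._check_unused()
        return Deed(self.issuer, new_holder, self.gigabytes)

    def use(self) -> None:
        """Exercise the deed, e.g. to store a replica at the issuing site."""
        self._check_unused()
        self.used = True


# The deed from the slide: 623 GB at Stanford University, held by the Library of Congress.
deed = Deed("Stanford University", "Library of Congress", 623)
keep, spare = deed.split(300)
spare = spare.transfer("Site X")  # hypothetical third site
keep.use()                         # store a replica in 300 GB at Stanford
print(keep, spare)
```

Keeping the issuer fixed through splits and transfers records where the space actually lives, no matter how many times the deed changes hands.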
Deed trading [Figure: sites A, B, and C exchanging copies of Collections 1, 2, and 3]
The challenge [Figure: sites A, B, and C holding copies of Collections 1, 2, and 3; a further copy of Collection 3 is then added]
Alternative solutions • Are there other ways besides trading?
Other solutions: central control [Figure: placement of Collections 1, 2, and 3 across sites A, B, and C]
Other solutions: client-based [Figure: placement of Collections 1, 2, and 3 across sites A, B, and C]
Other solutions: random [Figure: placement of Collections 1, 2, and 3 across sites A, B, and C]
Why is trading good? [Figure: trading network among sites A through H] • High reliability • Framework for replication • Site autonomy • Make local decisions • No submission to external authority • Fairness • Contribute more = more reliability • Must contribute resources
Decisions facing an archive • Who to trade with • How much to trade • When to ask for a trade • Providing space • Advertising space • Picking a number of copies • Coping with varying site reliabilities • What to do with acquired resources • How to deliver other services Many, many degrees of freedom!
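One of these decisions, picking a number of copies, has a simple closed form under a strong assumption: every copy sits at a site with the same reliability r, and sites fail independently. In that case the data survives unless all n copies are lost, so its reliability is 1 - (1 - r)^n, and a site can search for the smallest n that meets a target. The function name and the 99.5% target below are hypothetical.

```python
def copies_needed(site_reliability: float, target: float, max_copies: int = 100) -> int:
    """Smallest number of copies n with 1 - (1 - r)**n >= target.

    Assumes every copy sits at a site with the same reliability r and that
    site failures are independent.
    """
    if not 0 < site_reliability <= 1:
        raise ValueError("site reliability must be in (0, 1]")
    for n in range(1, max_copies + 1):
        if 1 - (1 - site_reliability) ** n >= target:
            return n
    raise ValueError("target not reachable within max_copies")


# Sites with reliability 0.9 (a 10% chance of failure), hypothetical 99.5% target.
print(copies_needed(0.9, 0.995))  # prints 3
```

Real sites differ in reliability, which is why "coping with varying site reliabilities" appears as a separate decision.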
Our approach • Define a basic trading protocol • Deed trading • Assume all sites follow same rules • Basic system for trading • Extend: not all sites are equal • Some are more reliable or trusted • Extend: sites have freedom to negotiate • Bid trading • Extend: some sites are malicious • Ensure documents survive despite evildoers • For each model, what policies are best?
How do we evaluate policies? • Trading simulator • Generate a scenario • Simulate trading with different policies • Evaluate reliability for each policy • Compare the policies
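A toy version of this generate, simulate, evaluate, and compare loop is sketched below. It does not model the trading protocol itself. Here a "policy" only chooses which partner sites receive copies of one site's collection, and reliability is computed assuming independent site failures. The site count, copy count, reliability range, and both policies are hypothetical.

```python
import random
from statistics import mean


def data_reliability(reliabilities):
    """Probability that at least one copy survives, assuming independent failures."""
    p_all_lost = 1.0
    for r in reliabilities:
        p_all_lost *= 1.0 - r
    return 1.0 - p_all_lost


def pick_random(partners, k, rng):
    """Hypothetical baseline: k partners chosen uniformly at random."""
    return rng.sample(partners, k)


def pick_most_reliable(partners, k, rng):
    """Hypothetical policy: the k partners with the highest reliability."""
    return sorted(partners, reverse=True)[:k]


def simulate(policies, num_sites=8, copies=2, trials=200, seed=0):
    """Average data reliability of one site's collection under each policy."""
    rng = random.Random(seed)
    scores = {name: [] for name in policies}
    for _ in range(trials):
        # Generate a scenario: site reliabilities drawn uniformly from [0.5, 0.99].
        sites = [rng.uniform(0.5, 0.99) for _ in range(num_sites)]
        owner, partners = sites[0], sites[1:]
        # Evaluate every policy on the same scenario so the comparison is fair.
        for name, policy in policies.items():
            chosen = policy(partners, copies, rng)
            scores[name].append(data_reliability([owner, *chosen]))
    return {name: mean(values) for name, values in scores.items()}


results = simulate({"random": pick_random, "most reliable": pick_most_reliable})
for name, score in results.items():
    print(f"{name}: {score:.4f}")
```

Because every policy runs against the same generated sites in each trial, differences in the averages reflect the policies rather than luck in scenario generation.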
Reliability • Site reliability • Will a site fail? • Example: 0.9 = 10% chance of failure • Data reliability • How safe is the data? • Despite site failures • Example: 320-year MTTF (mean time to failure)
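If site failures are assumed to be independent over the same period, data reliability follows from site reliability: the data is lost only if every site holding a copy fails. Let S be the set of sites holding a copy and r_i the reliability of site i. The following sketch rests on that independence assumption and does not reproduce how the 320-year MTTF figure above was computed.

```latex
R_{\text{data}} = 1 - \prod_{i \in S} \left(1 - r_i\right)
```

For example, two copies at sites with reliability 0.9 and 0.8 give R = 1 - (0.1)(0.2) = 0.98, and three copies at sites with reliability 0.9 give R = 1 - (0.1)^3 = 0.999.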
Basic trading approach • How does trading work? • Assuming all sites follow “the rules” • Example: advertising policy between sites A and B • “Let’s trade. How much space do you have?”
Advertising policy (sites A and B) • “I have 120 GB” (120 GB) • Space fractional policy • “I have 60 GB” (60 GB) • Data proportional policy • 40 GB of data • “I have 40 GB” (40 GB)
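The three answers can be read as three functions from a site's state to the space it advertises. The sketch below rests on several assumptions: the responding site has 120 GB free in all three cases, and the space fractional policy uses a fraction of 0.5, which is what makes 60 GB out of 120 GB. It also assumes the data proportional policy uses a factor of 1.0 and is capped at free space. The function names are hypothetical.

```python
def advertise_free_space(free_gb: float, data_gb: float) -> float:
    """Advertise all free space."""
    return free_gb


def advertise_space_fraction(free_gb: float, data_gb: float, fraction: float = 0.5) -> float:
    """Advertise a fixed fraction of free space (assumed fraction 0.5)."""
    return fraction * free_gb


def advertise_data_proportional(free_gb: float, data_gb: float, factor: float = 1.0) -> float:
    """Advertise space proportional to the site's own data, capped at free space."""
    return min(free_gb, factor * data_gb)


# Assumed site state: 120 GB free and 40 GB of its own data to replicate.
for policy in (advertise_free_space, advertise_space_fraction, advertise_data_proportional):
    print(policy.__name__, policy(120, 40))
```

A data proportional policy ties what a site offers to what it wants replicated, which is one way to express the "must contribute resources" idea in a single rule.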
Extend: some sites are better than others • May prefer certain sites • More reliable • Better reputation • Part of the same system • Example: who should A trade with?
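A site could fold such preferences into its choice of trading partner. The sketch below assumes each candidate's reliability estimate and advertised space are known. The candidate names and numbers are hypothetical, and so is the selection rule, which picks the most reliable site among those advertising enough space. Reputation or membership in the same system could be weighed in the same way.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    reliability: float    # estimated probability the site does not fail
    advertised_gb: float  # space the site says it can offer


def choose_partner(candidates, needed_gb) -> Optional[Candidate]:
    """Pick the most reliable candidate that advertises enough space, or None."""
    eligible = [c for c in candidates if c.advertised_gb >= needed_gb]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.reliability)


candidates = [
    Candidate("B", 0.95, 50),
    Candidate("C", 0.99, 20),  # most reliable, but not enough space
    Candidate("D", 0.90, 80),
]
print(choose_partner(candidates, 40).name)  # prints B
```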
Extend: freedom to negotiate • Bid for trades • Site A: “How much do I pay for 100 GB of your space?” • Bids: “120 GB”, “80 GB”, “95 GB”
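One reading of this slide is that site A collects quotes for 100 GB of remote space, each priced in gigabytes of A's own space, and accepts the cheapest. The sketch below implements only that final choice. The bidder names are hypothetical, and breaking ties by name is an arbitrary choice. When to call an auction and how much to bid remain open questions.

```python
def best_bid(bids):
    """Return (bidder, price_gb) with the lowest price; ties broken by bidder name."""
    if not bids:
        raise ValueError("no bids received")
    bidder = min(bids, key=lambda name: (bids[name], name))
    return bidder, bids[bidder]


# Quotes for 100 GB of remote space, priced in GB of A's space (hypothetical bidders).
bids = {"B": 120, "C": 80, "D": 95}
print(best_bid(bids))  # prints ('C', 80)
```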
Bid trading • Questions • When do I call auctions? • How much do I bid? • Can I take advantage of the system by being clever?
Extend: some sites are malicious • Secure services • Publish: Makes copies to survive failures • Search: Find documents • Retrieve: Get a copy of a document • Challenges • Attacker may delete copy • Attacker may provide fake search results • Attacker may provide altered document • Attacker may disrupt message routing • … Joint work with Mayank Bawa and Neil Daswani
Current and future work • Access • Support searching over collections • Distribute indexes via trading • Prototype implementation • Basic SAV architecture implemented • Trading protocol/policies must be added • Develop security techniques further
Current and future work • Other topics of interest • Designing peer-to-peer primitives • Building other p2p services • Other ways of acquiring data • How to archive active systems • Semantic archiving • Managing “format obsolescence” • Finding data once it is archived
Other parts of SAV project • SAV data model • Write-once objects • Signature-based naming • How to get objects into SAV • InfoMonitor – filesystem • Other inputs (Web, DBMS, etc.) • Modeling archival repositories • Arturo Crespo • Choose best components and design
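Signature-based naming fits write-once objects: if an object's name is derived from its contents, the object cannot change without its name changing. A reader can then detect an altered copy, which is one of the attacks listed for malicious sites. This is a minimal sketch, assuming the "signature" is a SHA-256 hash of the object's bytes. The SAV project's actual signature scheme is not described here, and the function names are hypothetical.

```python
import hashlib


def object_name(data: bytes) -> str:
    """Name a write-once object by the SHA-256 hash of its contents."""
    return hashlib.sha256(data).hexdigest()


def verify(name: str, data: bytes) -> bool:
    """Check that retrieved bytes match the name they were requested under."""
    return object_name(data) == name


original = b"Digital collection item"
name = object_name(original)
print(verify(name, original))                     # prints True
print(verify(name, b"Tampered collection item"))  # prints False
```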
Related work • Peer-to-peer replication • SAV, Intermemory, LOCKSS, OceanStore… • Fault-tolerant systems • RAID, mirrored disks, replicated databases • Caching systems (Andrew, Coda) • Deep storage (Tivoli) • Barter/auction-based systems • ContractNet • Distributed resource allocation • File Allocation Problem
Conclusion [Figure: trading network among sites A through H] • Important, exciting area • Preservation critical • Difficult to accomplish • Many decisions are ad hoc today • An effective framework is needed • Scientific evaluation of decisions • Trading networks replicate data • Model for trading networks • Trading algorithm • Simulation results
For more information • cooperb@stanford.edu • http://www-diglib.stanford.edu/