
Peer-to-peer archival data trading

Peer-to-peer archival data trading. Brian Cooper, joint work with Hector Garcia-Molina (and others), Stanford University. Problem: fragile data. Data is easy to create but hard to preserve: broken tapes, human deletions, organizations going out of business. Replication-based preservation.


Presentation Transcript


  1. Peer-to-peer archival data trading Brian Cooper Joint work with Hector Garcia-Molina (and others) Stanford University

  2. Problem: Fragile Data • Data: easy to create, hard to preserve • Broken tapes • Human deletions • Going out of business

  3. Replication-based preservation

  4. Replication-based preservation

  5. Motivation • Several systems use replication • Preserve digital collections • SAV, others • Archival part of digital library • Individual organizations cooperate • Not a lot of money to spend

  6. Goal • Reliable replication of digital collections • Given that • Resources are limited • Sites are autonomous • Not all sites are equal • Traditional methods • Central control • Random • Replicate popular • Metric • Reliability • Not necessarily “efficiency”

  7. Our solution • Data trading • “I’ll store a copy of your collection if you’ll store a copy of mine” • Sites make local decisions • Who to trade with • How many copies to make • How much space to provide • Etc.

  8. Trading network • A series of binary, peer-to-peer trading links [Figure: sites A, B, C, D, E, F, G, and H connected by trading links]

  9. Architecture [Figure labels: Internet; Local archive; Remote archive; Users; Service layer; Reliability layer; SAV Archive; Archived data; InfoMonitor; Filesystem] • This architecture developed with Arturo Crespo

  10. Overview • Trading model • Trading algorithm • Optimizing (and simulating) trading • Some results • Some stuff we are still working on

  11. Trading model

  12. Trading model • Archive site: an autonomous archiving provider

  13. Trading model • Archive site: an autonomous archiving provider • Digital collection: a set of related digital materials

  14. Trading model • Archive site: an autonomous archiving provider • Digital collection: a set of related digital materials • Archival storage: stores locally and remotely owned digital collections

  15. Trading model • Archive site: an autonomous archiving provider • Digital collection: a set of related digital materials • Archival storage: stores locally and remotely owned digital collections • Archiving client: deposit and retrieve materials

  16. Trading model • Archive site: an autonomous archiving provider • Digital collection: a set of related digital materials • Archival storage: stores locally and remotely owned digital collections • Archiving client: deposit and retrieve materials • Data reliability: probability that data is not lost

  17. Deeds • A right to use space at another site • Bookkeeping mechanism for trades • Used, saved, split, or transferred • Trading algorithm • Sites trade deeds • Sites exercise deeds to replicate collections [Figure: sample deed reading “Deed for space. For use by: Library of Congress or for transfer. 623 gigabytes. Stanford University”]
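
The deed operations listed on this slide (used, saved, split, transferred) can be sketched as a small data structure. This is a minimal illustration only, not the SAV implementation: the `Deed` class, its field names, and the method names are all hypothetical, and the sample values are taken from the deed shown on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Deed:
    """Hypothetical record: a right to use `gigabytes` of space at `issuer`."""
    issuer: str      # site that provides the space
    holder: str      # site currently entitled to use the space
    gigabytes: int

    def split(self, gigabytes: int) -> tuple[Deed, Deed]:
        """Split into two deeds on the same issuer whose sizes sum to the original."""
        if not 0 < gigabytes < self.gigabytes:
            raise ValueError("split size must be strictly inside the deed size")
        return (Deed(self.issuer, self.holder, gigabytes),
                Deed(self.issuer, self.holder, self.gigabytes - gigabytes))

    def transfer(self, new_holder: str) -> Deed:
        """Hand the deed to another site; the issuing site does not change."""
        return Deed(self.issuer, new_holder, self.gigabytes)

    def exercise(self, collection_gb: int) -> Deed | None:
        """Use the deed to store a collection; return the unused remainder, if any."""
        if not 0 < collection_gb <= self.gigabytes:
            raise ValueError("collection does not fit in this deed")
        remaining = self.gigabytes - collection_gb
        return Deed(self.issuer, self.holder, remaining) if remaining else None


# Values from the sample deed on the slide.
deed = Deed(issuer="Stanford University", holder="Library of Congress", gigabytes=623)
part, rest = deed.split(200)       # keep 200 GB, set aside 423 GB
leftover = part.exercise(150)      # replicate a 150 GB collection, 50 GB left over
```

"Saved" corresponds to simply holding on to a `Deed` object without exercising it.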

  18. Deed trading [Figure: sites A, B, and C; two copies each of Collection 1, Collection 2, and Collection 3]

  19. The challenge [Figure: sites A, B, and C; two copies each of Collection 1, Collection 2, and Collection 3]

  20. The challenge [Figure: sites A, B, and C; two copies each of Collection 1 and Collection 2, three copies of Collection 3]

  21. Alternative solutions Are there other ways besides trading?

  22. Other solutions: central control [Figure: sites A, B, and C; two copies each of Collection 1 and Collection 2, three copies of Collection 3]

  23. Other solutions: client-based [Figure: sites A, B, and C; two copies each of Collection 1 and Collection 2, three copies of Collection 3]

  24. Other solutions: random [Figure: sites A, B, and C; two copies each of Collection 1 and Collection 2, three copies of Collection 3]

  25. Why is trading good? • High reliability • Framework for replication • Site autonomy • Make local decisions • No submission to external authority • Fairness • Contribute more = more reliability • Must contribute resources [Figure: trading network of sites A through H]

  26. Decisions facing an archive • Who to trade with • How much to trade • When to ask for a trade • Providing space • Advertising space • Picking a number of copies • Coping with varying site reliabilities • What to do with acquired resources • How to deliver other services Many many degrees of freedom!

  27. Our approach • Define a basic trading protocol • Deed trading • Assume all sites follow same rules • Basic system for trading • Extend: not all sites are equal • Some are more reliable or trusted • Extend: sites have freedom to negotiate • Bid trading • Extend: some sites are malicious • Ensure documents survive despite evildoers • For each model, what policies are best?

  28. How do we evaluate policies? • Trading simulator • Generate scenario • Simulate trading with different policies • Evaluate reliability for each policy • Compare each policy
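
The evaluation loop on this slide (generate a scenario, simulate, measure reliability, compare) can be illustrated with a toy Monte Carlo sketch. This is not the authors' trading simulator: here a "policy" is reduced to the number of copies placed on randomly chosen sites, site failures are assumed independent, and the names `simulate` and `compare` are hypothetical.

```python
import random


def simulate(site_reliabilities, copies, trials=10_000, seed=0):
    """Estimate the probability that at least one of `copies` replicas survives.

    Each trial places the replicas on distinct, randomly chosen sites; a site
    survives the trial with probability equal to its reliability.
    """
    rng = random.Random(seed)
    sites = list(range(len(site_reliabilities)))
    survived = 0
    for _ in range(trials):
        holders = rng.sample(sites, copies)
        if any(rng.random() < site_reliabilities[s] for s in holders):
            survived += 1
    return survived / trials


def compare(site_reliabilities, policies):
    """Run the same scenario under each policy and collect the estimates."""
    return {name: simulate(site_reliabilities, copies)
            for name, copies in policies.items()}


# Scenario: eight sites, each with reliability 0.9 (an illustrative value).
scenario = [0.9] * 8
results = compare(scenario, {"two copies": 2, "three copies": 3})
```

A fuller simulator would replace the random placement with the trading protocol itself, so that the policies being compared are the sites' local trading decisions.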

  29. Simulation parameters

  30. Reliability • Site reliability • Will a site fail? • Example: 0.9 = 10% chance of failure • Data reliability • How safe is the data? • Despite site failures • Example: 320 year MTTF
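
To connect the two definitions: if a collection is replicated at several sites and site failures are assumed independent (an assumption made here for illustration, not stated on the slide), the data is lost only when every holding site fails. The helper name `data_reliability` is hypothetical.

```python
def data_reliability(site_reliabilities):
    """Probability that at least one copy survives, assuming independent failures."""
    p_all_fail = 1.0
    for r in site_reliabilities:
        p_all_fail *= (1.0 - r)   # every holding site must fail for data loss
    return 1.0 - p_all_fail


# One copy at a 0.9 site: 10% chance of loss.
single = data_reliability([0.9])
# Three copies at 0.9 sites: loss requires all three to fail (about 0.1 ** 3).
triple = data_reliability([0.9, 0.9, 0.9])
```

Under this assumption each extra copy multiplies the loss probability by the failure probability of the site that holds it.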

  31. Basic trading approach • How does trading work? • Assuming all sites follow “the rules” • Example: advertising policy [Figure: sites A and B; message: “Let’s trade. How much space do you have?”]

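  32. Advertising policy [Figure: sites A and B; message: “I have 120 GB”; label: 120 GB] • Space fractional policy [Figure: sites A and B; message: “I have 60 GB”; label: 60 GB] • Data proportional policy [Figure: sites A and B; message: “I have 40 GB”; labels: 40 GB data, 40 GB]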
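
The three answers on this slide are consistent with three simple rules for how much space a site advertises. The sketch below is one possible reading, not the authors' definitions: it assumes the site has 120 GB free, that the fractional policy advertises half of it, and that the data-proportional policy advertises space equal to the 40 GB of data in question. The function names, the `fraction` default, and the `factor` default are all hypothetical.

```python
def advertise_all(free_gb):
    """Advertise all free space."""
    return free_gb


def advertise_space_fraction(free_gb, fraction=0.5):
    """Advertise a fixed fraction of free space (fraction is an assumed value)."""
    return free_gb * fraction


def advertise_data_proportional(free_gb, data_gb, factor=1.0):
    """Advertise space proportional to an amount of data, capped at free space."""
    return min(free_gb, data_gb * factor)


free_gb = 120
offers = {
    "all": advertise_all(free_gb),
    "space fractional": advertise_space_fraction(free_gb),
    "data proportional": advertise_data_proportional(free_gb, data_gb=40),
}
```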

  33. Result

  34. Extend: some sites > others • May prefer certain sites • More reliable • Better reputation • Part of same system • Example: who to trade with? [Figure: site A and three candidate sites marked “?”]

  35. Who to trade with?

  36. Extend: freedom to negotiate • Bid for trades [Figure: site A asks “How much do I pay for 100 GB of your space?”; three sites answer “120 GB”, “80 GB”, and “95 GB”]
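
The slide does not say how site A chooses among the bids. A minimal sketch, assuming A simply accepts the bid that costs it the least space; the bidder names B, C, and D and the function `pick_cheapest_bid` are hypothetical.

```python
def pick_cheapest_bid(bids):
    """Return (bidder, price_gb) for the bid that asks for the least space.

    `bids` maps each bidding site to the gigabytes it asks for in return.
    """
    if not bids:
        raise ValueError("no bids received")
    bidder = min(bids, key=bids.get)
    return bidder, bids[bidder]


# Bids from the slide for 100 GB of space; site names are made up.
bids = {"B": 120, "C": 80, "D": 95}
winner, price_gb = pick_cheapest_bid(bids)
```

A site that prefers more reliable or better-trusted partners could weigh each bid by the bidder's reliability instead of comparing price alone.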

  37. Bid trading • Questions • When do I call auctions? • How much do I bid? • Can I take advantage of the system by being clever?

  38. Extend: some sites are malicious • Secure services • Publish: Makes copies to survive failures • Search: Find documents • Retrieve: Get a copy of a document • Challenges • Attacker may delete copy • Attacker may provide fake search results • Attacker may provide altered document • Attacker may disrupt message routing • … Joint work with Mayank Bawa and Neil Daswani

  39. Current and future work • Access • Support searching over collections • Distribute indexes via trading • Prototype implementation • Basic SAV architecture implemented • Trading protocol/policies must be added • Develop security techniques further

  40. Current and future work • Other topics of interest • Designing peer-to-peer primitives • Building other p2p services • Other ways of acquiring data • How to archive active systems • Semantic archiving • Managing “format obsolescence” • Finding data once it is archived

  41. Other parts of SAV project • SAV data model • Write-once objects • Signature-based naming • How to get objects into SAV • InfoMonitor – filesystem • Other inputs (Web, DBMS, etc.) • Modeling archival repositories • Arturo Crespo • Choose best components and design

  42. Related work • Peer-to-peer replication • SAV, Intermemory, LOCKSS, OceanStore… • Fault tolerant systems • RAID, mirrored disks, replicated databases • Caching systems (Andrew, Coda) • Deep storage (Tivoli) • Barter/auction based systems • ContractNet • Distributed resource allocation • File Allocation Problem

  43. Conclusion • Important, exciting area • Preservation critical • Difficult to accomplish • Many decisions are ad hoc today • An effective framework is needed • Scientific evaluation of decisions • Trading networks replicate data • Model for trading networks • Trading algorithm • Simulation results [Figure: trading network of sites A through H]

  44. For more information • cooperb@stanford.edu • http://www-diglib.stanford.edu/
