Outline for Today’s Lecture

Explore mechanisms, threats, and solutions in security environments with a focus on peer-to-peer file systems. Learn about data loss, cryptography basics, digital signatures, and the challenges of Byzantine Generals. Delve into the issues and benefits of P2P systems.

Presentation Transcript


  1. Outline for Today’s Lecture Administrative: Objective: • Peer-to-peer file systems • Mechanisms employed • Issues • Some examples

  2. The Security Environment: Threats (figure: security goals and threats)

  3. Intruders Common Categories • Casual prying by nontechnical users • Snooping by insiders • Determined attempt to make trouble (or personal gain) • Commercial or military espionage

  4. Accidental Data Loss Common Causes • Acts of God • fires, floods, wars • Hardware or software errors • CPU malfunction, bad disk, program bugs • Human errors • data entry, wrong tape mounted, rm *

  5. Reliability Mechanisms (Redundancy) • Replication of data, geographically distributed • As simple as backups • First-class replication (Coda) • Voting schemes • Error detection/correction • Erasure codes (encode n blocks into m > n blocks such that any r of them suffice to recover the original n) • Parity bits, checksums
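
  The erasure-code bullet is easiest to see in its simplest form: a single XOR parity block turns n data blocks into n+1 stored blocks, and any one lost block can be rebuilt from the survivors. A minimal Python sketch (all names hypothetical, not from any real storage system):

    # Simplest erasure code: n data blocks + 1 XOR parity block;
    # any single lost block can be rebuilt from the survivors.
    from functools import reduce

    def xor_blocks(blocks):
        # byte-wise XOR of equal-length blocks
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    def encode(data_blocks):
        return data_blocks + [xor_blocks(data_blocks)]      # n -> n+1 blocks

    def recover(stored_blocks, missing_index):
        survivors = [b for i, b in enumerate(stored_blocks) if i != missing_index]
        return xor_blocks(survivors)

    data = [b"AAAA", b"BBBB", b"CCCC"]
    stored = encode(data)
    assert recover(stored, 1) == b"BBBB"                     # lost block rebuilt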

  6. Basics of Cryptography Relationship between the plaintext and the ciphertext

  7. Secret-Key Cryptography • Secret-key crypto is also called symmetric-key crypto • Good algorithms exist if the keys are long enough • The secret key must be shared by both parties
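
  A toy illustration of the symmetric property (a throwaway XOR cipher, not a real algorithm): the same shared key performs both encryption and decryption, which is exactly why it must be known to both parties and kept secret from everyone else.

    # Toy XOR "cipher" for illustration only -- NOT secure.
    import itertools

    def xor_with_key(data: bytes, key: bytes) -> bytes:
        return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

    shared_key = b"secret"                              # known to both parties
    ciphertext = xor_with_key(b"attack at dawn", shared_key)
    plaintext = xor_with_key(ciphertext, shared_key)    # same key, same operation
    assert plaintext == b"attack at dawn"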

  8. Public-Key Cryptography • All users pick a public key/private key pair • publish the public key • private key not published • Public key is (usually*) the encryption key • Private key is (usually*) the decryption key • RSA
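
  A textbook-RSA sketch with deliberately tiny primes (insecure, purely to show the key roles): the published exponent encrypts, the private exponent decrypts.

    # Textbook RSA with toy numbers (far too small to be secure).
    p, q = 61, 53
    n = p * q                          # public modulus (3233)
    phi = (p - 1) * (q - 1)            # 3120
    e = 17                             # public (encryption) exponent -- published
    d = pow(e, -1, phi)                # private (decryption) exponent (2753) -- kept secret

    message = 65                                   # must be smaller than n
    ciphertext = pow(message, e, n)                # anyone can encrypt with the public key
    assert pow(ciphertext, d, n) == message        # only the private key decrypts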

  9. One-Way Functions • Function such that given formula for f(x) • easy to evaluate y = f(x) • But given y • computationally infeasible to find x • Example: Hash functions – produce fixed size result • MD5 • SHA
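
  Both hash families named above (MD5 and SHA) are available in Python's standard hashlib module; the fixed-size output is easy to see:

    import hashlib

    digest = hashlib.sha256(b"the quick brown fox").hexdigest()
    print(digest, len(digest))         # 64 hex characters (256 bits) regardless of input size
    print(hashlib.md5(b"the quick brown fox").hexdigest())   # 128-bit MD5 (legacy, collision-prone)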

  10. Digital Signatures • Computing a signature block • The hash is fixed length – apply the private key as the encryption key* • What the receiver gets • Use the public key as the decryption key* on the signature block to get the hash back • Compute the hash of the document part • Do these match? • Assumes E(D(x)) = x, when we usually want D(E(x)) = x • Public key must be known by the receiver somehow – certificate
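
  A sketch of the whole flow, reusing the toy RSA numbers from slide 8 and SHA-256 (illustration only; the hash is truncated to fit the tiny modulus): the sender applies the private key to the hash, the receiver applies the public key and compares.

    import hashlib

    n, e, d = 3233, 17, 2753                  # toy RSA key pair from the slide 8 sketch

    def doc_hash(doc: bytes) -> int:
        # real signatures use the full hash; we shrink it to fit the toy modulus
        return int.from_bytes(hashlib.sha256(doc).digest(), "big") % n

    def sign(doc: bytes) -> int:
        return pow(doc_hash(doc), d, n)       # "encrypt" the hash with the PRIVATE key

    def verify(doc: bytes, signature: int) -> bool:
        return pow(signature, e, n) == doc_hash(doc)   # "decrypt" with the PUBLIC key

    sig = sign(b"pay Alice $10")
    assert verify(b"pay Alice $10", sig)
    print(verify(b"pay Alice $1000", sig))    # almost surely False: tampering changes the hash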

  11. Distributing Public Keys • Certificate authority (CA) • Trusted 3rd party whose public key is well known • A certificate is a name and public key, digitally signed by the CA
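
  A sketch of what the certificate carries, using the toy RSA numbers above as the CA's key pair (all values illustrative): the CA signs the (name, public key) binding, and anyone who already knows the CA's public key can check it.

    import hashlib

    CA_N, CA_E, CA_D = 3233, 17, 2753            # toy CA key pair (illustrative only)

    def h(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N

    cert = {"name": "alice", "public_key": "<alice's public key>"}
    payload = repr(sorted(cert.items())).encode()
    cert["ca_signature"] = pow(h(payload), CA_D, CA_N)     # CA signs the name/key binding

    # A client that already knows (CA_N, CA_E) can validate the binding:
    assert pow(cert["ca_signature"], CA_E, CA_N) == h(payload)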

  12. Byzantine Generals Problem Reaching consensus among geographically separated (distributed) players when some of them are compromised. • Generals of army units need to agree on a common plan of attack (consensus) • Traitorous generals will lie (faulty or malicious) • Generals communicate by sending messages directly general-to-general through runners between units (they won’t all see the same intel) • A solution lets all loyal generals reach consensus in spite of the liars (up to some fraction of generals being bad)

  13. Solution with Digital Sigs • Iteratively execute “rounds” of message exchanges • As each message passes by, the receiving general digitally signs it and forwards it on. • Each General maintains the set of orders received • Inconsistent orders indicate traitor
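
  A highly simplified sketch of the detection idea (not the full signed-messages algorithm): because orders are signed, a traitor cannot forge a different order in a loyal general's name; it can only send conflicting orders, and a receiver whose set of orders grows beyond one knows someone lied.

    # Simplified illustration of "maintain the set of orders received".
    class General:
        def __init__(self, name):
            self.name = name
            self.orders_seen = set()

        def receive(self, order, signatures):
            # in the real protocol the signature chain is verified here
            self.orders_seen.add(order)
            return signatures + [self.name]     # sign and forward in the next round

        def decide(self, default="retreat"):
            if len(self.orders_seen) == 1:
                return next(iter(self.orders_seen))
            return default                      # conflicting orders: traitor detected

    g1 = General("g1")
    g1.receive("attack", ["commander"])         # direct from the commander
    g1.receive("retreat", ["commander", "g2"])  # relayed by g2 -- conflicts!
    print(g1.decide())                          # loyal generals fall back to the default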

  14. Peer-to-peer File Systems

  15. Problems with Centralized Storage Server Farms • Weak availability: • Susceptible to point failures and DoS attacks • Management overhead • Data often manually partitioned to obtain scale • Management and maintenance large fraction of cost • Per-application design (e.g., GoogleOS) • High hurdle for new applications • Don’t leverage the advent of powerful clients • Limits scalability and availability Slides from Shenker and Stoica, UCB

  16. What is a P2P system? (diagram: nodes connected to one another across the Internet) • A distributed system architecture: • No centralized control • Nodes are symmetric in function • Large number of (perhaps) server-quality nodes • Enabled by technology improvements Slides from Shenker and Stoica, UCB

  17. P2P as Design Style • Resistant to DoS and failures • Safety in numbers, no single point of attack or failure • Self-organizing • Nodes insert themselves into structure • Need no manual configuration or oversight • Flexible: nodes can be • Widely distributed or co-located • Powerful hosts or low-end PCs • Trusted or unknown peers Slides from Shenker and Stoica, UCB

  18. Issues • Goal is to have no centralized server and to utilize desktop-level idle resources. • Trust – privacy, security, data integrity • Using untrusted hosts • Availability – • Using lower “quality” resources • Using machines that may regularly go off-line • Fairness – freeloaders who just use and don’t contribute any resources • Using voluntarily contributed resources

  19. Issues • Goal is to have no centralized server and to utilize desktop-level idle resources. • Trust – privacy, security, data integrity • Using untrusted hosts -- crypto solutions • Availability – • Using lower “quality” resources -- replication • Using machines that may regularly go off-line • Fairness – freeloaders who just use and don’t contribute any resources • Using voluntarily contributed resources – use economic incentives

  20. What Interface? • Challenge for P2P systems: finding content • Many machines, must find one that holds file • Essential task: Lookup(key) • Given key, find host (IP) that has file with that key • Higher-level interface: Put()/Get() • Easy to layer on top of lookup() • Allows application to ignore details of storage • System looks like one hard disk • Good for some apps, not for others Slides from Shenker and Stoica, UCB
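
  A sketch of the layering (the node list, storage dict, and helper names are made up for illustration): Put()/Get() need nothing from the overlay except Lookup(key).

    import hashlib

    NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]     # stand-in for the overlay's hosts
    STORAGE = {ip: {} for ip in NODES}               # stand-in for each host's disk

    def lookup(key: str) -> str:
        # return the IP of the host responsible for this key (stubbed out here)
        return NODES[int(hashlib.sha1(key.encode()).hexdigest(), 16) % len(NODES)]

    def put(key, value):
        STORAGE[lookup(key)][key] = value            # store at the responsible host

    def get(key):
        return STORAGE[lookup(key)].get(key)         # the same lookup finds it again

    put("song.mp3", b"...bytes...")
    assert get("song.mp3") == b"...bytes..."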

  21. Distributed Hash Tables vs Unstructured P2P • DHTs good at: • exact match for “rare” items • DHTs bad at: • keyword search, etc. [can’t construct DHT-based Google] • tolerating extreme churn • Gnutella etc. good at: • general search • finding common objects • very dynamic environments • Gnutella etc. bad at: • finding “rare” items Slides from Shenker and Stoica, UCB

  22. DHT Layering (diagram: a distributed application calls put(key, data) / get(key) on a distributed hash table, which uses the lookup service's lookup(key) to map a key to a node's IP address) • Application may be distributed over many nodes • DHT distributes data storage over many nodes Slides from Shenker and Stoica, UCB

  23. Two Crucial Design Decisions • Technology for infrastructure: P2P • Take advantage of powerful clients • Decentralized • Nodes can be desktop machines or server quality • Choice of interface: Lookup and Hash Table • Lookup(key) returns IP of host that “owns” key • Put()/Get() standard HT interface • Some flexibility in interface (no strict layers) Slides from Shenker and Stoica, UCB

  24. A DHT in Operation: Overlay (diagram: nodes in the overlay, each holding key/value pairs) Slides from Shenker and Stoica, UCB

  25. A DHT in Operation: put() (diagram: a node issues put(K1,V1)) Slides from Shenker and Stoica, UCB

  26. A DHT in Operation: put() (diagram: put(K1,V1) is routed across the overlay) Slides from Shenker and Stoica, UCB

  27. A DHT in Operation: put() (diagram: (K1,V1) is stored at the responsible node) Slides from Shenker and Stoica, UCB

  28. A DHT in Operation: get() (diagram: a node issues get(K1)) Slides from Shenker and Stoica, UCB

  29. A DHT in Operation: get() (diagram: get(K1) is routed to the node holding (K1,V1)) Slides from Shenker and Stoica, UCB

  30. Key Requirement • All puts and gets for a particular key must end up at the same machine • Even in the presence of failures and new nodes (churn) • This depends on the DHT routing algorithm • Must be robust and scalable Slides from Shenker and Stoica, UCB

  31. DHTs • Examples: CAN, Chord, Pastry, Tapestry; used in BitTorrent and the Coral CDN • Keyspace partitioning – ownership of keys is split among the participating nodes • Each node has an ID and owns the keys “close” to its ID by some distance function • Hash the filename to get the key • Routing in the overlay: forward to a node with a closer ID, or else “it’s mine”
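
  A sketch of the ownership rule just described, with a made-up 16-bit ID space and numeric distance as the closeness metric (real DHTs differ in both):

    import hashlib

    NODE_IDS = [512, 9000, 21000, 40000, 60000]      # node IDs drawn from a 16-bit space

    def key_id(filename: str) -> int:
        return int(hashlib.sha1(filename.encode()).hexdigest(), 16) % 2**16

    def owner(filename: str) -> int:
        k = key_id(filename)
        return min(NODE_IDS, key=lambda node_id: abs(node_id - k))   # "closest" node owns it

    print(owner("lecture-notes.pdf"))   # the same filename always maps to the same node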

  32. PASTRY Overlay Network • Nodes assigned 1-dimensional IDs in hash space at random (e.g., hash on IP address) • Each node has log n neighbors & maintains a routing table • A lookup with fileID k is routed to the live node with nodeID closest to k (diagram: route toward k)
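
  An illustrative routing step (numeric distance rather than Pastry's real prefix-based routing tables): each hop hands the lookup to a neighbor whose ID is closer to k, or delivers it locally when no neighbor improves on its own distance.

    def route(k, node_id, neighbors):
        best = min(neighbors, key=lambda nid: abs(nid - k), default=node_id)
        if abs(best - k) < abs(node_id - k):
            return ("forward to", best)
        return ("deliver here", node_id)

    print(route(k=21017, node_id=512, neighbors=[9000, 40000]))   # ('forward to', 9000)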

  33. PAST • Rice Univ. and MSR Cambridge UK • Built on the Pastry Internet overlay • Not traditional file system semantics • A file is associated with a fileID upon insertion into PAST and can have k replicas • fileID is a secure hash of the filename, the owner’s public key, and a random salt • Replicas live on the k nodes whose nodeIDs are “closest” to the most significant bits of the fileID • Instead of directory lookup, retrieve by knowing the fileID
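
  A sketch of the fileID derivation and replica placement described above (the hash inputs and the "k closest nodes" rule are from the slide; the toy ID space and helper names are made up):

    import hashlib, os

    def file_id(filename: str, owner_public_key: bytes, salt: bytes) -> int:
        h = hashlib.sha256(filename.encode() + owner_public_key + salt)
        return int.from_bytes(h.digest(), "big") % 2**16          # toy 16-bit ID space

    def replica_nodes(fid: int, node_ids, k: int):
        # the k nodes whose nodeIDs are closest to the fileID hold the replicas
        return sorted(node_ids, key=lambda nid: abs(nid - fid))[:k]

    fid = file_id("paper.pdf", b"<owner public key>", os.urandom(8))
    print(replica_nodes(fid, [512, 9000, 21000, 40000, 60000], k=3))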

  34. Data Availability via Replication • DHash replicates each key/value pair at the nodes after it on the circle • It’s easy to find replicas • Put(k,v) to all • Get(k) from closest (diagram: key K19 replicated at nodes N20, N32, and N40 on the ring) Slides from Shenker and Stoica, UCB
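
  A sketch of that placement rule on a toy ring (node IDs taken from the diagram; everything else is made up): Put(k,v) writes to the next few nodes after the key on the circle, so Get(k) can read from whichever of them is reachable first.

    RING = [5, 10, 20, 32, 40, 60, 80, 99, 110]      # sorted node IDs on the circle
    STORE = {node: {} for node in RING}
    R = 3                                            # replication factor

    def successors(key, r=R):
        idx = next((i for i, node in enumerate(RING) if node >= key), 0)
        return [RING[(idx + i) % len(RING)] for i in range(r)]

    def put(key, value):
        for node in successors(key):                 # "Put(k,v) to all" replicas
            STORE[node][key] = value

    def get(key):
        for node in successors(key):                 # "Get(k) from closest" reachable replica
            if key in STORE[node]:
                return STORE[node][key]

    put(19, "block 19")
    print(successors(19))                            # [20, 32, 40], as in the diagram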

  35. First Live Successor Manages Replicas (diagram: block 19 is held by the first live successor on the ring, with a copy at the next node) Slides from Shenker and Stoica, UCB

  36. Other P2P FS examples

  37. Farsite • Microsoft Research – intended to look like NTFS • Desktops on LAN (not Internet-scale) • 3 roles: client, member of directory group, file host • Directory metadata managed by Byzantine replication • File hosts store encrypted replicated file data • Directory group stores secure hash of content to validate authenticity of file • Multiple namespace tree roots with namespace certificate provided by CA • File performance by local caching under leasing system

  38. LOCKSS • Lots of Copies Keeps Stuff Safe (HP Labs, Stanford, Harvard, Intel) • Library application for L-O-N-G term archival of digital library content (deals with bit rot, obsolescence of formats, malicious users) • Continuous audit and repair of replicas based on taking polls of sites with copies of the content (comparing digests of the content and repairing my copy if it differs from the consensus) • Rate-limited polling and churn of the voter lists deter attackers from compromising enough copies to force a malicious “repair”

  39. Sampled Poll • Each peer holds, for every preserved Archival Unit – a reference list of peers it has discovered – a friends list of peers its operator knows externally – a history of interactions with others (balance of contributions) • Periodically (faster than the rate of storage failures) – the poller takes a sample of the peers in its reference list – and invites them to vote: each sends a hash of its replica • The poller compares the votes with its local copy – Overwhelming agreement (> 70%): sleep blissfully – Overwhelming disagreement (< 30%): repair – Too close to call: raise an alarm • To repair, the peer fetches the copy of somebody who disagreed and then re-evaluates the same votes
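
  A sketch of just the vote-evaluation rule (the 70%/30% thresholds are from the slide; the peer sampling, rate limiting, and repair mechanics are omitted):

    import hashlib

    def evaluate_poll(my_copy: bytes, votes):
        my_digest = hashlib.sha1(my_copy).digest()
        agreeing = sum(1 for v in votes if v == my_digest) / len(votes)
        if agreeing > 0.7:
            return "overwhelming agreement: sleep blissfully"
        if agreeing < 0.3:
            return "overwhelming disagreement: repair, then re-evaluate the same votes"
        return "too close to call: raise an alarm"

    good = hashlib.sha1(b"archival unit v1").digest()
    bad = hashlib.sha1(b"bit-rotted copy").digest()
    print(evaluate_poll(b"archival unit v1", [good] * 8 + [bad] * 2))   # agreement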

  40. Churn of Voter Lists • Reference List – Take out voters, so that the next poll is based on different group – Replenish with some “strangers” and some “friends” • Strangers: Accepted nominees proposed by voters who agree with poll outcome • Friends: From the friends list • The measure of favoring friends is called friend bias • History – Poller owes its voters a vote (for their future polls) – Detected misbehavior penalized in victim’s history
