Freenet: A Distributed Anonymous Information Storage and Retrieval System
Ian Clarke, Oskar Sandberg, Brandon Wiley and Theodore Hong
Freenet • P2P network for anonymous publishing and retrieval of data • Decentralized • Nodes collaborate in storage and routing • Data-centric routing • Adapts to demand • Addresses privacy and availability concerns
Motivation • Problem: querying the network • Source: requestor • Destination: provider • It is a distributed search problem • Approximating global knowledge with local knowledge • Other systems: Chord, Tapestry, Pastry • Privacy and availability • Protect authorship, prevent denial attacks
Goals of Freenet • Anonymity for producers and consumers • Deniability for information storers • Resistance to denial attacks • Efficient storing and routing • Does NOT provide: • Permanent file storage • Load balancing • Anonymity for general network usage
Architecture • Request fields: key, hops-to-live, ID, depth • Each node: local data store + routing table • Files are requested through location-independent keys • Routing: a chain of proxy requests; each forwarding decision is local • Graph structure actively evolves over time
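Key Based Searching • Keyword-signed key (KSK) • Easy retrieval: only the descriptive string ‘D’ is needed • Minimal protection against tampering • Key generation: ‘D’ deterministically yields a key pair (Pb, Pr) • KSK = SHA(Pb) • Signature: FILE signed with Pr • Stored file: E(FILE, D), the FILE encrypted with ‘D’ as the key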
Keys and Searching… • Problems with KSK: flat namespace (collisions), key squatting, dictionary attacks • Signed-subspace key (SSK) • A randomly generated key pair identifies the namespace • SSK = SHA(‘D’) XOR SHA(Pb) • (-) To advertise a file, the owner must publish the subspace Pb together with ‘D’ • (+) Owner can construct a hierarchical space of arbitrary depth using indirect files • (+) Greatly reduces collisions
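A minimal sketch of the SSK formula on this slide, using SHA-1 from the standard library. Assumptions: the public key is passed as raw bytes (no real key pair is generated), and the `rehash` flag is a hypothetical addition for the variant in which the XOR result is hashed once more, as the original Freenet paper describes.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of data."""
    return hashlib.sha1(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def ssk(description: str, namespace_public_key: bytes, rehash: bool = False) -> str:
    """Compute SHA('D') XOR SHA(Pb) as a hex string.

    rehash=True hashes the XOR result once more (hypothetical flag).
    """
    mixed = xor_bytes(sha1(description.encode("utf-8")), sha1(namespace_public_key))
    if rehash:
        mixed = sha1(mixed)
    return mixed.hex()


# The same description in two different subspaces yields two different keys,
# which is why SSKs reduce collisions compared with a flat KSK namespace.
key_alice = ssk("text/philosophy/sun-tzu", b"alice-public-key-bytes")
key_bob = ssk("text/philosophy/sun-tzu", b"bob-public-key-bytes")
```

The byte strings standing in for public keys are placeholders; a real deployment would serialize an actual public key.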
Keys and Searching… • Problems with SSK: updating, versioning • Content-hash key (CHK): hash of the file contents • File encrypted with a random encryption key • Publish CHK + decryption key • CHK + SSK gives easily updatable files • Two-step process: publish the file, then publish a pointer to it • The pointer leads to the newest version • Older versions remain accessible through their CHKs • Can also be used for splitting files
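The two-step update scheme can be sketched with a dictionary standing in for the whole network. This is an illustration under stated assumptions: `network`, `publish` and `fetch` are hypothetical names, encryption and signature checks are omitted, and overwriting the pointer is modelled as a plain dictionary assignment.

```python
import hashlib

# Hypothetical stand-in for the network: key -> stored bytes.
network = {}


def chk(content: bytes) -> str:
    """Content-hash key: the SHA-1 hash of the file contents."""
    return hashlib.sha1(content).hexdigest()


def publish(content: bytes, pointer_key: str) -> str:
    """Step 1: store the file under its CHK. Step 2: store a pointer to it."""
    content_key = chk(content)
    network[content_key] = content
    network[pointer_key] = content_key.encode("ascii")
    return content_key


def fetch(pointer_key: str) -> bytes:
    """Follow the pointer (an indirect file) to the current version."""
    content_key = network[pointer_key].decode("ascii")
    return network[content_key]


old_key = publish(b"version 1", "ssk:my-document")
new_key = publish(b"version 2", "ssk:my-document")
latest = fetch("ssk:my-document")   # follows the pointer to the newest version
previous = network[old_key]         # the old version is still reachable by CHK
```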
Retrieving Files • How do you locate the keys? • Hypertext spider • Indirect files published under KSKs of search words • Published bookmarks • File retrieval • Request is forwarded to the node in the routing table (RT) with the closest lexicographic match for the binary key • Request routing follows steepest-ascent hill climbing: if the first choice fails, backtrack and try the second choice
Still Retrieving… [Figure: nodes a, b, c, d, e, f] • Timers and hop limits curtail request threads • Files are cached all along the retrieval path • Self-reinforcing cycle: nodes build up expertise in particular keys
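A small sketch of the retrieval procedure from the last two slides: forward to the closest known key, backtrack on failure, cache on the way back. Assumptions: keys are plain integers and closeness is numeric distance (the slides speak of a lexicographic match on binary keys), hops-to-live is treated as a simple depth limit, loops are detected with a shared set of visited nodes, and the true data holder is recorded as the source. The topology is made up for the demonstration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> data held locally
        self.routes = {}   # key -> neighbouring Node believed to hold that key

    def request(self, key, htl, seen=None):
        """Return (data, source_node), or None if the search fails."""
        seen = set() if seen is None else seen
        if htl <= 0 or self.name in seen:
            return None              # out of hops, or the request looped back
        seen.add(self.name)
        if key in self.store:
            return self.store[key], self
        # Try neighbours in order of key closeness; moving on to the next
        # entry after a failure is the backtracking step.
        for known_key in sorted(self.routes, key=lambda k: abs(k - key)):
            result = self.routes[known_key].request(key, htl - 1, seen)
            if result is not None:
                data, source = result
                self.store[key] = data       # cache along the return path
                self.routes[key] = source    # remember who held the key
                return result
        return None


a, b, c, d, e, f = (Node(n) for n in "abcdef")
d.store[42] = "file"
a.routes = {40: b}
b.routes = {41: c, 50: e}   # c looks closest but is a dead end
e.routes = {43: f, 60: d}   # f looks closest but only points back to b
f.routes = {42: b}

result = a.request(42, htl=10)
```

After the call, nodes a, b and e each hold a cached copy and a routing entry for key 42, while the dead ends c and f hold neither.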
Ring Topology • 1000 nodes in a ring topology • Datastore = 50 items • RT = 250 entries • Keys associated with links are hashes of the destination IPs
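One way the initial ring could be wired for such a simulation, as a hedged sketch. Assumptions: each node starts with links to `reach` neighbours on either side (the slide does not give this number), addresses are made-up strings of the form `node-<i>`, and SHA-1 is the hash.

```python
import hashlib


def ring_routing_tables(node_count, reach=2):
    """Build initial routing tables: {node index: {key: neighbour index}}.

    The key stored for each link is the hash of the destination's address.
    """
    tables = {}
    for i in range(node_count):
        table = {}
        for offset in range(1, reach + 1):
            for j in ((i - offset) % node_count, (i + offset) % node_count):
                address = f"node-{j}"  # hypothetical address format
                table[hashlib.sha1(address.encode("ascii")).hexdigest()] = j
        tables[i] = table
    return tables


tables = ring_routing_tables(1000)
neighbours_of_first = sorted(tables[0].values())
```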
Self Reinforced Routing • Snapshots taken using 300 requests with hops-to-live = 500 • As the network converges, the median request path length drops to 6: “six degrees of separation”
Retrieval Discussion • No controlled replication, no persistence • No correlation between keys and content • (+) Documents related to a subject are scattered • Geographical fault resilience • (-) No spatial locality: search latencies can suffer • Indexes have to be built by other means
Publishing • Similar to retrieval, but a two-step process • Step 1: detect collisions; ‘all clear’ if no collision • Step 2: publish to the node in the RT with the closest key match • Are the collision-detection and publish paths the same? • A collision can still occur during the publish step • Inserts allow new nodes to advertise themselves • (+) Key squatting is not effective
Data Management • Finite data stores: nodes resort to LRU eviction • Routing table entries linger after data eviction • Outdated (or unpopular) documents disappear automatically • Bipartite eviction: short-term policy • New files replace the most recent files • Prevents established files from being evicted by attacks
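A minimal sketch of an LRU data store of the kind the first bullet describes, built on `collections.OrderedDict`. The class name and capacity are illustrative; the bipartite policy and the lingering routing table entries are not modelled.

```python
from collections import OrderedDict


class DataStore:
    """Fixed-capacity store that evicts the least recently used item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a hit makes the item most recently used
        return self.items[key]

    def put(self, key, data):
        self.items[key] = data
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used item


ds = DataStore(capacity=2)
ds.put("popular", b"A")
ds.put("unpopular", b"B")
ds.get("popular")          # refreshes "popular"
ds.put("new", b"C")        # evicts "unpopular", the least recently used
```

Files that keep getting requested stay near the fresh end of the ordering, which is how unpopular documents drop out without any explicit delete operation.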
Network Growth • A new node must know one or more existing nodes • Problem: how to consistently decide which key the new node specializes in? • Must be a consensus decision, otherwise denial attacks are possible • Announcement: IP + H(random seed s0) • Commitment: H(H(H(s0) XOR H(s1)) XOR H(s2)) … • Key for new node = XOR of all seeds • Each node adds an RT entry for the new node
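The commitment chain and final key can be sketched as follows. This follows the formula as written on the slide; SHA-1 as H, 20-byte seeds, and the function names are assumptions made for the illustration. The fixed demo seeds would be random in practice.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def commitment_chain(seeds):
    """c0 = H(s0); c_i = H(c_(i-1) XOR H(s_i)) for each later node's seed."""
    chain = [H(seeds[0])]
    for seed in seeds[1:]:
        chain.append(H(xor(chain[-1], H(seed))))
    return chain


def node_key(seeds):
    """Key for the new node: XOR of all revealed seeds."""
    key = bytes(len(seeds[0]))
    for seed in seeds:
        key = xor(key, seed)
    return key


def verify(seeds, published_chain):
    """Once seeds are revealed, anyone can recompute and check the chain."""
    return commitment_chain(seeds) == published_chain


demo_seeds = [bytes([i]) * 20 for i in (1, 2, 3)]  # fixed for the demo only
demo_chain = commitment_chain(demo_seeds)
demo_key = node_key(demo_seeds)
```

Because every participant commits before any seed is revealed, no single node can steer the resulting key; swapping a seed afterwards makes `verify` fail.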
Network Growth • Key assigned to new nodes = H(IP) • Path length scales as log(n) until n ≈ 40,000 • At 40,000 nodes, RTs are full
Protocol • Nodes with frequently changing IPs use address-resolution keys (ARKs) • Return address specified in requests: a threat? • Messages do not always terminate when hops-to-live reaches 1 • Depth is initialized by the original requestor to an arbitrary small value • Request state is maintained at each node and expired by timers and LRU
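The hops-to-live rule in the third bullet can be sketched as a per-hop update. Assumptions: the probability `keep_alive` of forwarding a message whose hops-to-live is already 1 is a made-up parameter (the slide gives no value), and depth is simply incremented at every hop.

```python
import random


def forward(htl, depth, keep_alive=0.5, rng=random):
    """Return (htl, depth) for the next hop, or None if the message stops here."""
    if htl > 1:
        return htl - 1, depth + 1
    # At hops-to-live 1 the message is still forwarded with some probability,
    # so an observer cannot conclude that the next node is the last one.
    if rng.random() < keep_alive:
        return 1, depth + 1
    return None


normal_hop = forward(5, 2)
always_forwarded = forward(1, 7, keep_alive=1.0)
always_dropped = forward(1, 7, keep_alive=0.0)
```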
Fault Resilience • Median path length < 20 at 30% node failures? • Network becomes ineffective at 40% failures?
Small World • Most nodes form local clusters • A few highly connected nodes link the clusters • Power-law distribution of links provides a high degree of fault tolerance
Security Concerns • Pre-routing: message is encrypted with a succession of public keys, which determine the pre-routing path • Protecting the data source: random and probabilistic methods
Security • File integrity: KSKs are vulnerable to dictionary attacks • DoS attacks: Hashcash to slow them down • Attempts to displace valid files are constrained by the insert procedure
Conclusion • Provides a network to anonymously store and request files • Adaptive routing whose efficiency increases with experience • Deals with privacy and data integrity in various scenarios • Applications? • Freedom of speech • Unaccountable, decentralized Napster