FreeNet: A Distributed Anonymous Information Storage and Retrieval System

FreeNet: A Distributed Anonymous Information Storage and Retrieval System Ian Clark, Oskar Sandberg, Brandon Wiley and Theodore Hong

FreeNet • P2P network for anonymous publishing and retrieval of data • Decentralized • Nodes collaborate in storage and routing • Data centric routing • Adapts to demands • Addresses privacy & availability concerns

Motivation • Problem - Querying the network • Source - Requestor • Destination – Provider • It’s a distributed search problem • Approximating global knowledge with local knowledge • Other systems – Chord, Tapestry, Pastry • Privacy and availability • Protect authorship, prevent denial attacks

Goals of Freenet • Anonymity for producers and consumers • Deniability for information storers • Resistance to denial attacks • Efficient storing and routing • Does NOT provide • Permanent file storage • Load balancing • Anonymity for general n/w usage

Architecture • Request: • key • Hops to live • ID • Depth • Each node – local data store + routing table • Request file through location independent keys • Routing - chain of proxy requests - decision is local • Graph structure actively evolves over time

Key Based Searching • Keyword signed key(KSK) • Easy for retrieval – only need ‘D’ • Minimal protection against tampering D ‘D’– key generation Pb + Pr ; SHA(Pb) FILE + Pr E(FILE, D) Signature KSK Encrypted FILE

Keys and Searching….. • Problems with KSK – flat namespace (collisions), key squatting, dictionary attacks • Signed Subspace Key (SSK) • Randomly generated key pair  namespace ID • SSK = SHA(‘D’) ^ SHA(Pb) • (-)Advertisement – subspace Pb + ‘D’ • (+)Owner can construct hierarchical space of arbitrary depth - using indirect files • (+)Reduces collision greatly

Keys and Searching… • Problems with SSK - updating, versioning • Content Hash Keys (CHK) • Encrypted by a random encryption key • Publish CHK + decryption key • CHK + SSK  easily updateable files • 2 step process – publish file, publish pointer • Results in pointers to newer version • Older versions accessed thru CHK • Can be used for splitting files

Retrieving Files • How do u locate the keys? • Hypertext spider • Indirect files – published with KSK of search words • Publish bookmarks • File retrieval • Request forwarded to node in RT with closest lexicographic match for the binary key • Request routing follows steepest-ascent hill climbing: first choice  failure  backtrack  second choice

c a b f e d Still Retrieving…. • Timers, hops - curtail request threads • Files cached all along the retrieval path • Self-reinforcing cycle – results in key expertise

Ring Topology • 1000 nodes in ring topology • Datastore = 50 items • RT = 250 items • Keys associated with links are hash of destn IPs

Self Reinforced Routing • Snapshots using 300 requests with hops = 500 • As network converges it drops to 6 - “six degrees of separation”

Retrieval Discussion • No controlled replication  no persistence • No correlation between keys and content • (+) Documents related to a subject are scattered • Geographical fault resilience • (-) No spatial locality – search latencies can suffer • Building indexes by other means

Publishing • Similar to retrieval but, 2 step process • Detect collisions – ‘all clear’ if no collision • Publish to node in RT with closest key match • Are CD and publish paths same? • Can result in collision during publish step • Inserts allow new nodes to advertise themselves • (+) Key-squatting is not effective

Data Management • Finite data stores - nodes resort to LRU • Routing table entries linger after data eviction • Outdated (or unpopular) docs disappear automatically • Bipartite eviction – short term policy • New files replace most recent files • Prevents established files being evicted by attacks

Network Growth • New nodes have to know one or more guys • Problem: How to consistently decide on what key the new node specializes in? • Needs to be consensus decision – else denial attacks • Advertisement  IP + H(random seed s0) • Commitment - H(H(H(s0) ^ H(s1)) ^ H(s2))……. • Key for new node = XOR of all seeds • Each node adds a RT entry for the new node

Network Growth • Key assigned to new nodes = H(IP) • Scales as log(n) until n ~ 40000 • At 40000, RTs are full

Protocol • Nodes with frequently changing IPs use ARKs • Return address specified in requests – threat? • Messages do not always terminate when hops-to-live reaches 1 • Depth is initialized by original requestor to arbitrarily small value • Request state maintained at each node – timers - LRU

Fault Resilience • Median path length < 20 at 30% node failures? • N/w becomes ineffective at 40% failures ???

Small World • Most nodes form local clusters • Few high link connecting nodes • Power law distribution provides high degree of fault tolerance

Security Concerns • Pre- routing – mesg. encrypted by public keys which determine path of pre-routing • Protecting data source – using random and probabilistic methods

Security • File integrity - KSK vulnerable to dictionary attacks • DOS attacks – Hash Cash to slow down • Attempts to displace valid files are constrained by the insert procedure

Conclusion • Provides a n/w to anonymously store and request files • Adaptive routing who’s efficiency increases with experience • Deals with privacy and data integrity in various scenarios • Applications? • Freedom of speech • Unaccountable, decentralized Napster

FreeNet: A Distributed Anonymous Information Storage and Retrieval System