Overview of LOCKSS
Session Learning Objectives • Provide an overview of the LOCKSS architecture. • Describe the LOCKSS polling process. • Describe how LOCKSS private networks differ. • Provide a vocabulary of technical terms used frequently with LOCKSS networks.
Architectural Components • Provider Sites (digital collections) • LOCKSS nodes (aka “peers”) • Plugins / Plugin Repository • Cache Manager • Title Database / Conspectus Database
Provider Sites • Prepare a digital collection so that it is web-accessible to the preservation nodes • Expose a “manifest” web page for each collection, according to LOCKSS specifications; the manifest grants permission for LOCKSS to crawl and gives the starting point for the crawl • Provide enough information to create a LOCKSS plugin for the collection (or else create the plugin themselves and deposit it with the LOCKSS network)
LOCKSS Peer Nodes • Data caches for harvested content • Caches organized into archival units (AUs) • Nodes can select which AUs to crawl and preserve • There must be >= 6 copies of an AU in order for the polling process to work properly
Plugins / Plugin Repository • Tell LOCKSS where, how, and how often to crawl a provider site for AUs • Plugins are Java-based • Distinct from the core LOCKSS software
Cache Manager • Distributed separately from LOCKSS • Can remotely inspect and manage the caches on the various peer nodes
Title / Conspectus Databases • The Title database on each node describes and manages which AUs to preserve on that node • The Conspectus database, designed for the MetaArchive Project, provides more extensive metadata about the preserved digital collections and feeds the Title database with entries
[Diagram: Plugin Repository; two provider sites, Digital Collection 1 (DC1) and Digital Collection 2 (DC2), each with a web site and manifest page and divided into archival units (labels include AU 1, AU 2, AU 3, Source Code, and SQL Dump); and a Private LOCKSS Network of Nodes 1 to 9, each caching DC1 and/or DC2 content.]
Polling Process Resulting in “Landslide Loss” and AU Repair • Node 5 calls a poll on AU 1 of Digital Collection 2 (DC2-AU1) • Node 5 invites some recently encountered peers to vote (each node maintains a reference list of recently encountered peers); those invited are the “inner circle” for this opinion poll • An affirmative PollChallenge response to the invitation allows that inner circle node to participate in the poll • A poll effort proof (PollProof) is cryptographically derived and sent in answer to each affirmative voter’s challenge • Invited nodes create a fresh SHA1 digest of the AU • Node 9 nominates Nodes 7 and 8; Node 5 discovers new peers through this nomination process • Nominated Nodes 7 and 8 belong to the “outer circle” and can be invited to subsequent voting rounds by Node 5 • There is a “landslide” of valid, disagreeing votes against Node 5’s SHA1 digest of DC2-AU1 (in the diagram, three valid votes disagree and one agrees) • Since agreeing votes are below threshold, Node 5 picks a random disagreeing voter from the inner circle and sends it an encrypted RepairRequest message; the repair is made • Once the repair is completed, Node 5 immediately calls a new poll, which effectively verifies, or invalidates and corrects, the repair
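The digest comparison and repair steps of the poll can be illustrated with a small sketch. This is a simplification under stated assumptions, not the LOCKSS implementation: an AU is modeled as a hypothetical dictionary of path to bytes, each node hashes the whole AU in one pass, and the names `au_digest`, `tally`, and `repair_from` are invented for this example. The actual protocol is more involved (see the paper cited under Further Details).

```python
import hashlib
import random

def au_digest(files):
    """SHA-1 digest of an AU modeled as {path: bytes}; paths are hashed in sorted order."""
    h = hashlib.sha1()
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")  # separator between path and content
        h.update(files[path])
    return h.hexdigest()

def tally(poller_files, voter_files_by_node):
    """Split voters into those whose digest matches the poller's and those whose does not."""
    mine = au_digest(poller_files)
    agree = [n for n, f in voter_files_by_node.items() if au_digest(f) == mine]
    disagree = [n for n in voter_files_by_node if n not in agree]
    return agree, disagree

def repair_from(disagree, voter_files_by_node, rng=random):
    """Pick a random disagreeing voter and copy its AU (stands in for RepairRequest)."""
    source = rng.choice(disagree)
    return dict(voter_files_by_node[source])

# Illustrative data only: the poller (Node 5) holds a damaged copy.
good = {"index.html": b"original content"}
damaged = {"index.html": b"corrupted content"}
voters = {1: good, 2: good, 4: good, 9: damaged}

agree, disagree = tally(damaged, voters)
repaired = repair_from(disagree, voters)
agree_after, disagree_after = tally(repaired, voters)
```

After the repair, the follow-up tally plays the role of the new poll that Node 5 calls immediately: the repaired copy now matches the voters that previously disagreed.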
Polling Refresh Timer • A peer sets a refresh timer for a given AU to determine the interval between successive polls • System parameter R is the mean for the possible random values generated for the refresh timer
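A refresh timer with mean R could be drawn as in the sketch below. The slide only states that R is the mean of the random values; the uniform distribution, the `spread` parameter, the function name, and the example value of 90 days are all assumptions made for illustration.

```python
import random

def next_poll_delay(mean_r_days, spread=0.5, rng=random):
    """Draw a random interval whose mean is R.

    Assumption: uniform on [R * (1 - spread), R * (1 + spread)];
    the distribution LOCKSS actually uses is not given in the slides.
    """
    low = mean_r_days * (1 - spread)
    high = mean_r_days * (1 + spread)
    return rng.uniform(low, high)

# Hypothetical value of R, chosen only for the example.
rng = random.Random(0)
samples = [next_poll_delay(90, rng=rng) for _ in range(10000)]
average = sum(samples) / len(samples)
```

Randomizing the interval means peers do not all poll on the same AU at predictable, synchronized times.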
System Parameter – ‘Quorum’ • Q = # of valid inner circle votes required to conclude a poll successfully • Q = 6 is the thoroughly tested value in use • If votes < Q, poller invites additional peers, or else aborts the opinion poll
Polling Outcome – ‘Landslide Win’ • The poller considers its current copy to have integrity • This is the only scenario in which an opinion poll concludes successfully • The poller updates its reference list and then waits until the next polling period (determined by the refresh timer)
Reference List Update • Happens only after a successful poll • Poller removes the inner circle peers who had valid votes in the last opinion poll • Culls peers it has not been able to contact for some time • Adds outer circle peers whose votes were valid and eventually agreeing
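The three update rules can be sketched as a single function. The function and parameter names are hypothetical, and peers are represented by plain node numbers for illustration.

```python
def update_reference_list(reference_list, inner_valid_voters, unreachable, outer_agreeing):
    """Apply the reference list update that follows a successful poll.

    - drop inner circle peers whose votes were valid in the last poll
    - cull peers that could not be contacted for some time
    - add outer circle peers whose votes were valid and eventually agreeing
    """
    updated = [
        p for p in reference_list
        if p not in inner_valid_voters and p not in unreachable
    ]
    for p in outer_agreeing:
        if p not in updated:
            updated.append(p)
    return updated

# Illustrative node numbers only.
new_list = update_reference_list(
    reference_list=[1, 2, 3, 4, 6, 9],
    inner_valid_voters={1, 2, 4, 9},
    unreachable={3},
    outer_agreeing=[7, 8],
)
```

In this example the list becomes `[6, 7, 8]`: the recent voters and the unreachable peer are dropped, and the two nominated outer circle peers are added.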
Polling Outcome – ‘Inconclusive’ • D = maximum allowed “minority” votes • If Agreeing Votes > D, and • Agreeing Votes < Total Valid Votes – D, • Then the poll is inconclusive and raises an alarm • Human intervention is needed to determine whether nodes have been compromised • A peer that voted in agreement with a known bad copy is blacklisted if that peer cannot be identified or will not cooperate
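The outcome rules can be summarized in a short sketch. The inconclusive condition and Q = 6 come from the slides; treating the two cases outside the inconclusive band as “landslide loss” (at most D agreeing votes) and “landslide win” (the rest) is an inference from that condition, and the function name and the value D = 3 are hypothetical.

```python
def classify_poll(agreeing, total_valid, d, quorum=6):
    """Classify a poll from the count of agreeing votes among valid votes.

    d is the maximum allowed number of "minority" votes (D).
    """
    if total_valid < quorum:
        return "no quorum"  # poller invites more peers or aborts
    if agreeing > d and agreeing < total_valid - d:
        return "inconclusive"  # raises an alarm for human review
    if agreeing <= d:
        return "landslide loss"  # poller's copy is repaired
    return "landslide win"  # poller's copy is considered intact

# Illustrative values: 10 valid votes, D = 3.
outcomes = {a: classify_poll(a, 10, 3) for a in (1, 3, 5, 7, 9)}
```

With 10 valid votes and D = 3, up to 3 agreeing votes is a landslide loss, 4 to 6 is inconclusive, and 7 or more is a landslide win.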
Further Details on Polling Process • Petros Maniatis, Mema Roussopoulos, TJ Giuli, David S. H. Rosenthal, Mary Baker, and Yanto Muliadi, "LOCKSS: A Peer-to-Peer Digital Preservation System", ACM Transactions on Computer Systems (TOCS). http://www.eecs.harvard.edu/~mema/publications/TOCS2005.pdf • See also LOCKSS related publications at http://www.lockss.org/lockss/Publications
The LOCKSS Private Network Difference • More flexible (not appliance-based) • Can run on any operating system that supports Java • The LOCKSS Team maintains RPM packages for Linux installations • Peer node administrators have greater discretion in configuring access and customizing functionality, e.g. altering system parameters
The LOCKSS Private Network Difference (cont.) • Can extend LOCKSS core functionality with supplemental tools and methods to fit new use cases • E.g. the MetaArchive Conspectus database
Vocabulary • (Please refer to the workshop binder for terminology and definitions)