
Cache Administration

Bill Robbins, MetaArchive Annual Membership Meeting, Houston, Texas, Friday, October 23, 2009. Overview of LOCKSS Cache Administration.


Presentation Transcript


  1. Cache Administration Bill Robbins MetaArchive Annual Membership Meeting Houston, Texas Friday October 23, 2009

  2. Overview of LOCKSS Cache Administration

  3. Overview of LOCKSS Cache Administration • Perspective on a Lockss Cache vs. Other Servers • Lockss Cache Installation • Ingesting AUs – Putting Digital Data Into Preservation • Status/Monitoring of Your Cache • Ongoing Cache Operations • Troubleshooting Caches • Central Administration • Resources & Links • Future Plans

  4. Section One Administration of a Cache vs. Other Servers

  5. Administration of a Cache vs. Other Servers • Cache = Linux + Lockss + Minor Add-ons • NOT a multi-purpose server • Preserve --> MUST BE Secure!! • NOT online retrievable • NOT a backup site • NOT mission critical • Lockss - Turtle - Cheap H/W, 2nd Tier vendors - Iron systems, Capricorn, etc

  6. Administration of a Cache vs. Other Servers • How does the purpose of the server change administration policies? (Users, Backups, monitoring…) • Selecting the O/S (No fee CentOS) • O/S upgrades (Security, LOCKSS & JDK) • Lockss caches file system (RAID?) • Appliance - Very Little Classic UNIX administration • Repair contracts – lowest level Questions / Discussion

  7. Section Two Lockss Cache Initial Installation

  8. Lockss Cache – Initial Installation • Three Sequential Steps • Creating the Linux Server • Turn the Linux Server into a MetaArchive Node • Installation & Configuration of Lockss Software • Follow the Kickstart Instructions • Often servers will use Kickstart • Otherwise, follow latest Kickstart instructions

  9. Lockss Cache – Initial Installation 1) Creating the Linux Server • O/S – CentOS 5.3 • Hard Disks – file system (Multi TB + for AUs) • Kickstart-Complete Automated Installation is Possible • Network Configuration – CRITICAL INFORMATION • Hostname FQDN – Fully Qualified Domain Name • IP Must be registered, fixed IP address • Netmask • Gateway • DNS Local Nameserver
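
The network values listed on this slide can be sanity-checked before they are typed into an installer. The following is only a minimal sketch using Python's standard `ipaddress` module; the function name `check_network_settings` and the sample addresses (documentation-range placeholders) are assumptions made for illustration and are not part of the MetaArchive Kickstart procedure.

```python
import ipaddress
import re

# One DNS label: letters, digits, hyphens, no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def check_network_settings(fqdn, ip, netmask, gateway):
    """Return a list of problems; an empty list means the values look consistent.

    Hypothetical helper for illustration only.
    """
    problems = []

    # A fully qualified name needs at least a host label and a domain label.
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 2 or not all(_LABEL.match(label) for label in labels):
        problems.append("hostname is not a fully qualified domain name")

    try:
        # strict=False lets us pass a host address together with its netmask.
        network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
        if ipaddress.ip_address(gateway) not in network:
            problems.append("gateway is outside the host's subnet")
    except ValueError as exc:
        problems.append(f"invalid address or netmask: {exc}")

    return problems


# Placeholder values, not a real cache.
print(check_network_settings(
    "cache1.library.example.edu", "192.0.2.43", "255.255.255.0", "192.0.2.1"))
```

A check like this does not confirm that the IP address is registered or that DNS resolves; it only catches values that contradict each other.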

  10. Lockss Cache – Initial Installation • Security Configuration • IPTABLES - Internal Allow List • IPTABLES is not a Firewall in classic sense • Lockss - V3 Protocol (TCP:9729) • User Interface via http - Needed for Local AND Central Administration (TCP:8081) • Audit Proxy (TCP:8080) • Standard Linux – ssh (TCP:22) • SE Linux – Security Enhanced Linux is enabled.
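
To make the allow-list idea concrete, the sketch below builds one `iptables` ACCEPT rule per permitted source address for a given TCP port. The port numbers come from the slide; the peer addresses are placeholders, the helper name `allow_rules` is hypothetical, and the actual rules installed by the MetaArchive Kickstart are not shown in this presentation, so treat this as an assumption about what such rules could look like.

```python
# Ports as listed on the slide.
LOCKSS_PORTS = {"lcap_v3": 9729, "web_ui": 8081, "audit_proxy": 8080, "ssh": 22}


def allow_rules(sources, port):
    """Build one iptables ACCEPT rule per allowed source address for a TCP port.

    Hypothetical helper: it only formats rule text and does not run iptables.
    """
    return [
        f"iptables -A INPUT -p tcp -s {src} --dport {port} -j ACCEPT"
        for src in sources
    ]


# Placeholder peer caches (documentation address ranges).
peers = ["192.0.2.10", "198.51.100.20"]
rules = allow_rules(peers, LOCKSS_PORTS["lcap_v3"])
for rule in rules:
    print(rule)
```

Generating the rule text from a single list of peers keeps the allow list in one place, which matters because the list changes whenever a cache joins or leaves the network.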

  11. Lockss Cache – Initial Installation • Software Packages Selection • Development software if needed in the future • Office Software, just in case • Not a Central Server, Not a shared resource • NO Email • NO FTP • NO Windows Share • NO games • The node is now a Basic Linux Server Questions / Discussion

  12. Lockss Cache – Initial Installation 2) Creating the MetaArchive Node • Post Install of the Kickstart == One Shell Script • Packages at http://~~.metaarchive.org/~~/kickstart (RPM) JDK, Lockss & Denyhosts • RPM – Redhat Package Manager • Package Retrieve and install • WGET • rpm install • In some cases we get a Config File (.conf) • Set the software to run on normal startup, (run levels) • Future – Software Repository Questions / Discussion

  13. LOCKSS Cache – Initial Installation 3) Lockss Installation & Configuration • Create the lockss user and directories • Need parameters to create lockss startup • SMTP server • Admin Email • User ID & Roles -- Known User ID & Password are needed for the Cache Manager and Troubleshooting • The LOCKSS “hostconfig” is set up • Local Administrators can set up a UID/Pass • LOCKSS Will Start on Reboot • CRITICAL INPUT – /etc/lockss/hostconfig

  14. /etc/lockss/hostconfig Critical Parts
      Fully qualified hostname (FQDN) of this machine: [devcache1.library.emory.edu]
      IP address of this machine: [170.140.208.43]
      Path to java: [/usr/java/jdk1.5.0_15/bin/java]
      Configuration URL: [http://some-path/config/lockss.xml]
      Preservation group(s): [metaarchive]
      User name for web UI administration: [denisg]
      Password for web UI administration user denisg: []
      Password for web UI administration (again): []
      -------- Done---- lines removed
      LOCKSS will start automatically at next reboot, or you may start it now by running /etc/init.d/lockss start
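
When reviewing a saved copy of a hostconfig session like the one on this slide, it can help to pull the `prompt: [value]` pairs into a dictionary. This is a sketch that parses only the prompt text as displayed here; the function name `parse_prompts` is hypothetical, and nothing below describes the format of any file that hostconfig itself writes.

```python
import re

# Matches lines of the form "Some prompt text: [value]".
PROMPT = re.compile(r"^(?P<prompt>.+?):\s*\[(?P<value>.*)\]\s*$")


def parse_prompts(transcript):
    """Map each 'prompt: [value]' line of a saved session to its value."""
    answers = {}
    for line in transcript.splitlines():
        match = PROMPT.match(line.strip())
        if match:
            answers[match.group("prompt")] = match.group("value")
    return answers


sample = """\
Fully qualified hostname (FQDN) of this machine: [devcache1.library.emory.edu]
IP address of this machine: [170.140.208.43]
Configuration URL: [http://some-path/config/lockss.xml]
Preservation group(s): [metaarchive]
"""
answers = parse_prompts(sample)
for prompt, value in answers.items():
    print(f"{prompt} -> {value}")
```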

  15. Instructions for Configuring and Running LOCKSS (from /etc/lockss/README) • Public (routable) IP address: nslookup must work • To further administer the daemon, go to http://<hostname>:8081/ • Configuration URL – THIS IS THE TITLE DATABASE. The URL (or local file name) from which the LOCKSS daemon will load extended configuration and tuning parameters. Use the default value to participate in the global LOCKSS preservation community. Use a private config file to create your own preservation community. • MetaArchive is a PRIVATE LOCKSS Network (PLN).

  16. Instructions for Configuring and Running LOCKSS - Preservation group(s) Used to select group-specific options in the config file. Use the default value to participate in the global LOCKSS preservation community. Multiple groups may be entered, separated by semicolon. - User name/Password for web UI administration • Follow Kickstart Instructions and then ….. • Your Cache is Now Ready to Go! Questions / Discussion

  17. Section Three Lockss Cache Ingesting Archival Units

  18. Lockss Cache Ingesting Archival Units The initial ingestion of an Archival Unit is a short and simple process, but it is only one part of the ongoing process needed for long-term digital data preservation. Verifying preservation requires several ongoing activities, both on the caches AND on the sites that are preserved. After the initial ingestion there will be ongoing preservation operations. • Ingesting Data - CAUTION • CAUTION – IS THE SITE PROTECTED?? IP ALLOW LIST AVAILABLE! • CAUTION – IS THIS CACHE SUPPOSED TO INGEST THIS DATA??

  19. Lockss Cache Ingesting Archival Units Pre-Ingestion Checklist / Review • Conspectus Entry • Provider Site is prepared • Plugin is completed • Manifest Page is in place • The Site is accessible to the Caches • The Plugin is accessible to the Caches • The AU is registered in the Title Database • The Cooperative has been notified
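
The checklist above lends itself to a simple gate: ingestion goes ahead only when nothing is outstanding. The sketch below is one illustrative way to track it; the item wording is taken from the slide, while the function names `outstanding` and `ready_to_ingest` are hypothetical and not part of any MetaArchive tool.

```python
# Checklist items as worded on the slide, in slide order.
CHECKLIST = [
    "Conspectus Entry",
    "Provider Site is prepared",
    "Plugin is completed",
    "Manifest Page is in place",
    "The Site is accessible to the Caches",
    "The Plugin is accessible to the Caches",
    "The AU is registered in the Title Database",
    "The Cooperative has been notified",
]


def outstanding(completed):
    """Return checklist items not yet marked complete, in checklist order."""
    done = set(completed)
    return [item for item in CHECKLIST if item not in done]


def ready_to_ingest(completed):
    """True only when every checklist item has been completed."""
    return not outstanding(completed)


# Example: everything done except the last two items.
for item in outstanding(CHECKLIST[:6]):
    print("still to do:", item)
```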

  20. Lockss Cache Ingesting Archival Units • CAUTION – IS THE SITE PROTECTED?? • Access to UI (User Interface) • Browser http://cacheName.university.edu:8081 • Access Control via IP Address • Ingesting • Journal Configuration • Add Titles • No wholesale ingesting – Please! Questions / Discussion

  21. Section Four Lockss Cache Preservation Operations

  22. Full Cycle of Preservation

  23. Lockss Cache Preservation Operations • How Do You Know The AU is accurate and safe? • Crawl / Poll / Vote / Repair (if needed) • Repeat • Status Monitoring & Verification • Using the Daemon • Using The Cache Manager • Audit Proxy
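
The Crawl / Poll / Vote / Repair cycle can be illustrated with a toy model: each peer's copy is hashed, the hashes are tallied as votes, and a local copy that disagrees with a clear majority is replaced. This is a deliberately simplified sketch under that assumption; it is not the LCAP protocol, which is considerably more involved, and the function names are invented for the example.

```python
import hashlib
from collections import Counter


def digest(content):
    """Hash a copy of the content so copies can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(local, peer_copies):
    """Toy poll: adopt the majority peer copy if the local copy disagrees.

    Returns (content, repaired). Illustration only, not LOCKSS code.
    """
    votes = Counter(digest(copy) for copy in peer_copies)
    if not votes:
        return local, False

    winner, count = votes.most_common(1)[0]
    no_majority = count <= len(peer_copies) // 2
    if no_majority or winner == digest(local):
        return local, False  # nothing to repair, or no agreement to repair from

    good_copy = next(copy for copy in peer_copies if digest(copy) == winner)
    return good_copy, True


print(poll_and_repair(b"damaged", [b"good", b"good", b"good"]))
```

The "Repeat" bullet matters as much as the vote itself: damage that appears after one poll is only caught by the next one.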

  24. Lockss Cache Preservation Operations • What is that Daemon Doing? - Live Demo (Maybe – due to security) • Crawling (Ingesting) • Polling • Voting • Establish Peers / Become a Peer • Auditing / Verifying Contents • Restoral • Logging • The LCAP protocol contains most of these functions

  25. Lockss Cache Preservation Operations The MetaArchive Cache Manager • Centralized Network Data Gathering • Caches, Collections, Archival Units, Disk Space • What is in my cache? What do the partners have? • Where are AUs replicated? What are the sizes? • Daily Snapshots are taken. Data is not live. • “Problems” are flagged. • Ruby On Rails Technology • Almost Exclusive to MetaArchive

  26. Lockss Cache Preservation Operations When is an error not a problem? The entire cooperative network is always very full of activity. Glitches, short-term outages, planned downtime, etc. are expected. An error should persist for more than a day or two before it is worth pursuing as a problem. Longstanding errors will be reviewed on the weekly call.
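
The "more than a day or two" rule amounts to filtering errors by how long they have been present. A minimal sketch, assuming each error is recorded with the date it was first seen in a daily snapshot; the error labels, the two-day default, and the function name `longstanding` are illustrative assumptions, not Cache Manager behavior.

```python
from datetime import date, timedelta


def longstanding(first_seen, today, min_age_days=2):
    """Return the errors first seen more than min_age_days before today."""
    cutoff = timedelta(days=min_age_days)
    return sorted(key for key, seen in first_seen.items() if today - seen > cutoff)


# Hypothetical errors keyed by description, with the date first observed.
first_seen = {
    "cache-a: AU crawl failed": date(2009, 10, 19),
    "cache-b: peer unreachable": date(2009, 10, 22),
}
flagged = longstanding(first_seen, today=date(2009, 10, 23))
print(flagged)
```

With these sample dates only the four-day-old error is flagged; the one-day-old error is treated as a probable glitch.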

  27. The MetaArchive Cache Manager- Live Demo

  28. Audit Proxy How Can You See What is Preserved? This is the function of the Audit Proxy. This can be useful as well during the testing phase of the plugin. • Settings on the UI for access • Settings on the UI for the proxy • Settings on Your Browser

  29. Audit Proxy • Instructions are on the Wiki • It makes a difference whether Firefox or I.E. is your browser Questions / Discussions

  30. Section Five Lockss Cache Ongoing Operations

  31. Lockss Cache Ongoing Operations • MA Ongoing Operations: Changes to the Site Content, Changes to MetaArchive Network • Server Ongoing Operations: Backup, Restore; Monitor for Problems; Typical Server Admin Issues • SOME VERY SPECIAL CONSIDERATIONS ARE NEEDED IN ONGOING CACHE OPERATIONS! • THERE ARE TWO SIDES TO ONGOING OPERATIONS!!!

  32. Lockss Cache Ongoing Operations Ongoing Operations of the Server • Journal Configuration files are sent to central backup • File system – monitored to some extent by the Cache Manager, but not completely

  33. Lockss Cache Ongoing Operations Changes to a Preservation Site Such As … • Architecture, Rate of new content Mean you might need … • New plugin, Change the Conspectus • New AU, New Manifest Page • Disable / Remove AU

  34. Lockss Cache Ongoing Operations Changes to the Cooperative Network • New Servers Online • More Space Available Mean Changes to • Firewall Allow Lists • AU Distributions Questions / Discussions

  35. Section Six Lockss Cache Troubleshooting

  36. Lockss Cache Troubleshooting Common Problems not related to ingesting • Disk Space Issues • The pollstate directory • Needing to remove an AU Problems with ingesting the AUs are usually non-existent provided the precautions already discussed have been taken.

  37. Lockss Cache Troubleshooting Tools Provided by the Daemon • Logging from the Daemon • /var/log/lockss/stdout and /var/log/lockss/daemon (available via the UI) • Debug Panel • http://your-cache.univ.edu:8081/DebugPanel • Force re-crawl of site or plugin • Force Poll • AU Troubleshooting • AU removal • AU restore

  38. Lockss Cache Troubleshooting Tools From Unix Shell • Restarting Lockss Daemon: /etc/init.d/lockss {stop | start} • Standard Unix Tools • Disk space issues • Other bottlenecks: CPU, memory, network
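
Disk space is the first item named under standard tools, and a basic check is easy to script. The sketch below uses Python's standard `shutil.disk_usage`; the 90% warning threshold, the helper names, and the path checked are assumptions chosen for illustration, so substitute the file system that actually holds the AUs on your cache.

```python
import shutil


def nearly_full(used, total, warn_fraction=0.90):
    """True when used space is at or above the warning fraction of the total."""
    return total > 0 and used / total >= warn_fraction


def check_path(path, warn_fraction=0.90):
    """Check the file system containing `path` against the warning threshold."""
    usage = shutil.disk_usage(path)
    return nearly_full(usage.used, usage.total, warn_fraction)


# "." is a placeholder; point this at the cache's content file system.
if check_path("."):
    print("warning: file system is nearly full")
else:
    print("disk space looks fine")
```

Because a LOCKSS cache does not delete data on its own, a warning here usually means adding space or removing an AU by hand rather than waiting for usage to fall.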

  39. Completely Removing an Archival Unit See the Wiki for more on Removing an AU • This is not a simple problem • A LOCKSS cache will NOT delete data • Data must be deleted from Unix

  40. Restoring an Archival Unit to a Cache The Daemon will restore lost or corrupt data to a cache from the originating site, if possible. If not possible to reach the originating site, the Daemon will restore data from a Peer Cache. If the Cache is having a really bad day, you may need to restore the Journal Configuration, and then allow the site to be re-crawled. Questions / Discussion
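
The restore order described on this slide, originating site first and peer caches second, is a plain fallback chain. The sketch below models only that ordering; the fetch functions, the example URL, and the name `restore` are hypothetical stand-ins and do not represent how the daemon actually transfers data.

```python
def restore(url, fetch_from_origin, fetch_from_peers):
    """Try the originating site first, then each peer fetcher in turn.

    Returns (source, content), or (None, None) if every source fails.
    Illustration only.
    """
    sources = [("origin", fetch_from_origin)]
    sources += [("peer", fetch) for fetch in fetch_from_peers]
    for source, fetch in sources:
        try:
            content = fetch(url)
        except OSError:
            continue  # this source is unreachable; try the next one
        if content is not None:
            return source, content
    return None, None


def origin_down(url):
    """Stand-in for an originating site that cannot be reached."""
    raise ConnectionError("origin unreachable")


def peer_copy(url):
    """Stand-in for a peer cache that still holds the content."""
    return b"peer copy"


result = restore("http://example.org/au/file1", origin_down, [peer_copy])
print(result)
```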

  41. Section Seven Central Administration & Resources for the MetaArchive Cooperative

  42. Resources & Links LINKS • Use this RSS Link to keep up to date!! http://metaarchive.org/public/resources/rss_feeds/metaarchive_links.rss • The MetaWiki == Resource Central • https://metaarchive.org/metawiki/ • Login Needed? Request it! Email: brobbi2@emory.edu • The IP allow list • Many Others on the RSS Feed • The Cache Manager • The Conspectus • The Lockss Wiki • Mailing Lists

  43. Resources & Links LINKS • The Subversion System (SVN) • Version Control / Software Releases • NOTE: Two Systems – One for plugins one for code. • All plugins need to be stored in SVN • Clear path to develop, test, sign & jar plugins from one common source • Papers on Lockss Technology • LOCKSS itself • LCAP protocol Questions / Comments

  44. Coming Sooner or Later New Members

  45. Coming Sooner or Later • Encryption System for Mailing files. • Test Network – EZ Access • Software Repository • Use YUM for automated updates • Increase Pro-Active Monitoring • Cache management topics, “How To” online videos Questions / Comments

  46. Summary • Mature Technology, but still a lot to do • Growth in Members & Data being preserved leads to better understanding of how we need to operate the network • Contacts • Bill Robbins: bill.robbins@metaarchive.org • Monika Mevenkamp: momeven@gmail.com • (404) 712-2851
