NLM Digital Repository Server Architecture

January 18, 2011

Design Considerations
• Consistency with NLM architecture and processes
• Remove single points of failure
• Data redundancy for preservation
• Availability
• Scalability
• Ingest ease, speed

Single Server Architecture
• Servers: File Server, Application Server, Database Server
• Components shown in the diagram: NWU BookViewer, Flash Video Player with Search, Fedora Managed Storage, Muradora 1.4b, MySQL 5.0, Resource Index, Fedora 3.2.1, Djatoka, Solr, GSearch, Solr Index, Tomcat, External Storage
• OS: CentOS
• HW: virtual server, 3 CPU, 24 GB RAM

Content and code
• Fedora managed content
• Fedora database
• Fedora Resource Index
• Solr Index
• External content
• Application code
Can and should these items be shared across Fedora servers?

Data Center Environment
• Two locations with two virtual servers each
  • Primary: NLM data center
  • Backup: Contingency operations data center
• Active/Active – both locations always in use
• Each virtual server has 3 CPU, 24 GB RAM
• System tools
  • 3DNS – wide-area load balancing
  • BIG-IP – local load balancing
  • Server monitoring, automatic failover
  • SnapMirror – NetApp filesystem replication
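The active/active arrangement with automatic failover can be sketched as a small routing model. This is an illustration only, not the 3DNS or BIG-IP configuration: the server names, the `healthy` flag, and the `servers_for_request` function are hypothetical and stand in for whatever the real load balancers and monitors do.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str              # hypothetical name, for illustration only
    healthy: bool = True   # stands in for the server monitoring result


@dataclass
class DataCenter:
    name: str
    servers: list = field(default_factory=list)

    def live_servers(self):
        # Local load balancing (the BIG-IP role) only sees healthy servers.
        return [s for s in self.servers if s.healthy]


def servers_for_request(data_centers):
    """Return every server that may receive traffic.

    Active/active: every data center with at least one healthy server
    takes part (the 3DNS role); unhealthy servers are skipped.
    """
    eligible = []
    for dc in data_centers:
        eligible.extend(dc.live_servers())
    return eligible


primary = DataCenter("primary", [Server("primary-1"), Server("primary-2")])
backup = DataCenter("backup", [Server("backup-1"), Server("backup-2")])
```

With all four servers healthy, both locations take traffic. Marking both primary servers unhealthy leaves only the backup servers eligible, which is the failover case.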

System Architecture
[Diagram: browsers reach the Primary Data Center and the Backup Data Center through 3DNS, with a BIG-IP in front of each data center. The primary data center runs Fedora Primary #1 and Fedora Primary #2; the backup data center runs Fedora Backup #1 and Fedora Backup #2. Each of the four Fedora servers is shown with its own Managed Storage, Solr Index, Resource Index, and Fedora DB. External Storage appears twice in the diagram.]

Ingest considerations
• Our Fedora system is read-only, with controlled periodic batch content updates
• System is available during updates – use one data center while updating the other
• Code and content should be identical across servers
• Reduce time to ingest to all servers in system. Approx. 10 hours for full re-ingest.
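One way to check the requirement that code and content be identical across servers is to compare per-file checksums of two directory trees. The sketch below is an assumption about how such a check could be done, not a description of NLM's tooling; `tree_manifest` and `trees_identical` are hypothetical helper names, and the directories passed in would be whatever paths hold the managed content or application code on each server.

```python
import hashlib
import os


def tree_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes.

    Each file is read fully into memory, which is fine for a sketch but
    would need chunked reads for large repository objects.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            rel_path = os.path.relpath(full_path, root)
            with open(full_path, "rb") as handle:
                manifest[rel_path] = hashlib.sha256(handle.read()).hexdigest()
    return manifest


def trees_identical(root_a, root_b):
    """True when both trees hold the same files with the same contents."""
    return tree_manifest(root_a) == tree_manifest(root_b)
```

Comparing manifests rather than timestamps means a copied or replicated tree still counts as identical when its files were written at a different time.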

Content replication
• Content replication strategies
  • Fedora journaling (ingest to master, master-slave, messaging)
  • Ingest to master, copy managed content to slave, rebuild slave DB and resource index from managed content (rebuild is faster than full ingest)
  • Ingest to master, use system tools (NetApp SnapMirror) to copy all resources to slaves
  • Ingest to each server independently
• Our approach
  • Turn off primary data center, use backup data center to serve public
  • Ingest to primary 1, copy managed content to primary 2, rebuild primary 2 ...
  • Turn off backup data center, use primary data center to serve public
  • Use SnapMirror to copy all resources from primary 1,2 to backup 1,2
  • Turn on backup data center, both data centers available to serve public
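The update sequence under "Our approach" can be written down as data and checked for the property the deck cares about: at least one data center serves the public at every step. This is a minimal sketch, not NLM's procedure as code. The step descriptions paraphrase the slide, the part the slide elides with "..." is left out rather than guessed, and `always_available` and `serving_during` are hypothetical helpers.

```python
PRIMARY = "primary"
BACKUP = "backup"

# Each step: (description, data centers serving the public while it runs).
UPDATE_STEPS = [
    ("Turn off primary; backup serves the public", {BACKUP}),
    ("Ingest to primary 1", {BACKUP}),
    ("Copy managed content to primary 2 and rebuild primary 2", {BACKUP}),
    ("Turn off backup; primary serves the public", {PRIMARY}),
    ("SnapMirror primary 1,2 to backup 1,2", {PRIMARY}),
    ("Turn on backup; both data centers serve the public", {PRIMARY, BACKUP}),
]


def always_available(steps):
    """True when every step leaves at least one data center serving."""
    return all(len(serving) > 0 for _description, serving in steps)


def serving_during(steps, keyword):
    """Return the serving set of the first step whose description has keyword."""
    for description, serving in steps:
        if keyword in description:
            return serving
    raise KeyError(keyword)
```

For example, `serving_during(UPDATE_STEPS, "Ingest")` shows that only the backup data center faces the public while primary 1 is being ingested, and `always_available(UPDATE_STEPS)` confirms that no step takes both locations offline.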

NLM Content Replication
[Diagram: the System Architecture layout annotated with the replication flow, labeled Ingest, Rebuild, and SnapMirror. The four servers (Fedora Primary #1, Fedora Primary #2, Fedora Backup #1, Fedora Backup #2) are each shown with Managed Storage, Solr Index, Resource Index, and Fedora DB; External Storage appears twice.]
