The Atlas Petabyte Datastore. A grid enabled, networked data storage system: CrystalGrid Workshop 15th Sept 2004 David Corney. d.r.corney@rl.ac.uk. Data Store Overview. General purpose, multi user, data archive. In use over 20 years. Four major upgrades.
The Atlas Petabyte Datastore
A grid enabled, networked data storage system
CrystalGrid Workshop, 15th Sept 2004
David Corney, d.r.corney@rl.ac.uk
Data Store Overview
• General purpose, multi-user data archive.
• In use for over 20 years, with four major upgrades.
• Current capacity 1PB: largest (non-dedicated) multi-user system in UK academia?
• Grid interfaces:
  • SE (Storage Element), which will be SRM compliant
  • SRB (Storage Resource Broker) interface
[Diagram: STK 9310 “Powderhorn” robot with eight 9940B tape drives attached through two switches (Switch_1, Switch_2) to four RS6000 data servers (adapters fsc0 and fsc1; tape devices rmt1 to rmt4, plus rmt5-8), each labelled 1.2TB, on a Gbit network.]
[Diagram: Atlas Datastore Architecture (28 Feb 03, B Strong). Servers: Robot Server (buxton), Catalogue Server (brian), Pathtape Server (rusty), I/E Server (dylan), Farm Server, User Node. Components shown include flfsys (+libflf), flfstk, flfaio, flfscan, flfdoback, flfdoexp, flfqryoff (copy of flfsys code), servesys, tapeserv, pathtape, vtp (libvtp), importexport, cellmgr, ACSLS (API, CSI, SSI, LMU), cache disk, the catalogue with backup copies A, B and C, and IBM and STK tape drives. Links carry user, admin, farm, tape and import/export commands (sysreq), robot control info (mount/dismount), and data transfer.]
Strategy
• De-couple user and application from storage media.
• Upgrades and media migration occur “behind the scenes”.
• High resilience: very few single points of failure.
• High reliability, high availability (99.9986% in 2003).
• Constant environmental monitoring linked to alarm/call out.
• Lifetime data integrity checks in hardware and software.
• Fire safe and off-site backups; tested disaster recovery procedures; media migration, recycling.
• Easy to exploit (endless) new technology.
• Technology watch to monitor future technology path.
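To put the availability figure in perspective, the short sketch below converts a percentage into minutes of downtime per year. It assumes the 99.9986% figure was measured over the whole of calendar year 2003 (365 days); the function name is illustrative only and not part of any datastore tooling.

```python
# Rough conversion of an availability percentage into downtime.
# Assumption: the figure covers a full, non-leap calendar year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes(availability_percent, minutes=MINUTES_PER_YEAR):
    """Return the minutes of unavailability implied by the percentage."""
    return (1 - availability_percent / 100) * minutes


if __name__ == "__main__":
    print(f"{downtime_minutes(99.9986):.1f} minutes of downtime in the year")
```

Under that assumption the quoted availability corresponds to a little over seven minutes of unavailability across the year.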
Robot History
• M860: 110GB
• STK 4400: 1.2Tbytes
• IBM 3494: 30Tbytes
• STK 9310: 1Pbyte
• STK 9310 “Powderhorn” with 6000 slots (1.2Pbytes)
• 4 IBM 3590 B drives, now phased out
  • 10 Gbyte native
  • 10 Mbyte/s transfer
• 8 new STK 9940B drives
  • 200 Gbyte native
  • 30 Mbyte/s transfer per drive
  • 240 Mbyte/s theoretical maximum bandwidth
• 4 RS6000 data servers (+ 4 “others”)
• 1Gbit networking (expected to become 10Gbit by 2005)
• Data migration to new media completed ~Feb 2004
• Hardware upgrade completed Jun 2003
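The capacity and bandwidth figures on this slide follow from simple multiplication, which the sketch below reproduces. It assumes decimal units (1 Pbyte = 1,000,000 Gbyte, 1 Gbyte = 1000 Mbyte) and that every slot holds a 200 Gbyte 9940B cartridge; the final function, which estimates how long it would take to stream the whole store once at the theoretical maximum, is an extra illustration and not a figure from the talk.

```python
# Figures taken from the slide; unit conventions are assumptions.
SLOTS = 6000
CARTRIDGE_GB = 200      # native capacity of one 9940B cartridge
DRIVES = 8
DRIVE_MB_PER_S = 30     # native transfer rate per 9940B drive


def total_capacity_pb(slots=SLOTS, cartridge_gb=CARTRIDGE_GB):
    """Total native capacity in Pbytes (decimal units)."""
    return slots * cartridge_gb / 1_000_000


def aggregate_bandwidth_mb_s(drives=DRIVES, per_drive=DRIVE_MB_PER_S):
    """Theoretical maximum bandwidth with all drives streaming."""
    return drives * per_drive


def days_to_stream_all():
    """Days needed to read or write every cartridge once at full rate."""
    total_mb = SLOTS * CARTRIDGE_GB * 1000
    seconds = total_mb / aggregate_bandwidth_mb_s()
    return seconds / 86_400


if __name__ == "__main__":
    print(total_capacity_pb(), "Pbytes")
    print(aggregate_bandwidth_mb_s(), "Mbyte/s")
    print(round(days_to_stream_all(), 1), "days to stream the full store")
```

The first two results match the 1.2 Pbyte and 240 Mbyte/s figures on the slide; the third is an idealised lower bound that ignores mount, seek and network overheads.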
Users
• Particle Physics Community (LHC: CMS, ATLAS, LHCb, …)
• ISIS, British Atmospheric Data Centre
• EISCAT (Radar research)
• National Earth Observation Data Centre
• World Data Centre, BITD
• Central Laser Facility
• Diamond…
• National Crystallography Service, Southampton University
• WASP, VIRGO Consortium
• Integrative Biology
• Others…
Interfaces
• “Light weight” interfaces:
  • Client server configuration:
  • “tape” command for many platforms
  • Virtual Tape Protocol (VTP)
  • Fortran and C callable library
• “Heavy weight” interfaces:
  • SRB interface
  • SE interface developed for EDG/GRIDPP/GRIDPP2…
[Diagram: SRB-ADS architecture. Components: SRB Client, SRB MCAT Server, SRB MCAT Database, SRB Disk Server (Local Server), SRB ADS Server, Atlas Data Store. The SRB ADS Server runs three instances: port 5600, SRB-ISIS server instance; port 5601, SRB-BADC server instance; port 5602, SRB-CCLRC server instance.]
[Diagram: Adding Interfaces for ADS. Three routes reach the ADS Central Catalog Server (flfsys) through ADS Farm Servers: the SRB Interface (SRB Users via the SRB01 and SRB02 Servers), the SE Interface (SE Users via LCG Servers), and the VTP Interface (VTP Users via the “tape” command).]
Logical Resource for Containers
[Diagram: ADS-logical-resource, ADS-cache resource, ADS-tape resource.]
• Ssyncont: copies container from cache to tape
• Sput -c <container-name> <data-file>
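The two commands on this slide suggest a simple two-step workflow: write a file into a container held on the cache resource, then synchronise the container to tape. The sketch below only builds the command lines and does not run them. The `Sput -c` form is taken from the slide; passing the container name as the sole argument to `Ssyncont` is an assumption, as is the helper name `container_commands`.

```python
import shlex


def container_commands(container, data_file):
    """Return the command lines for storing a file and syncing to tape.

    Nothing is executed; the strings are only assembled for inspection.
    """
    # Syntax as shown on the slide: Sput -c <container-name> <data-file>
    put = ["Sput", "-c", container, data_file]
    # Assumption: Ssyncont takes the container name as its argument.
    sync = ["Ssyncont", container]
    return [shlex.join(put), shlex.join(sync)]


if __name__ == "__main__":
    for line in container_commands("mycontainer", "run01.dat"):
        print(line)
```

Building the commands as argument lists and joining them with `shlex.join` keeps file names containing spaces safely quoted if the lines are later pasted into a shell.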
The Storage Element (SE)
• A component of European Data Grid (EDG/EGEE) middleware, developed by CCLRC’s e-Science & PPD departments.
• Uniform Grid interface which enables a standard protocol for mass data transfer across the grid, between the many diverse Mass Storage Systems, including:
  • Atlas Petabyte Data Store
  • CASTOR
  • Enstore
  • HPSS
  • Others…
SE Deployment
• CERN – Castor and disk
• UAB Barcelona – Castor
• RAL – Atlas DataStore and disk
• ESA/ESRIN – disk
• CC-IN2P3 – HPSS
• INFN / CNAF – disk
• FZK Karlsruhe – disk
Digital Curation Centre
• Collaboration between CCLRC, UKOLN, and Edinburgh and Glasgow Universities.
• Provides advice, support, and research and development into aspects of digital curation for the UK HE community.
• Funded jointly by JISC and EPSRC: £1m/year for three years initially, Feb 2004 to 2007.
• Establish collaboration with industrial partners…
Objectives
• Vibrant research programme
  • addressing the wider issues of digital curation
• Collaborative Associates Network of Data Organisations
  • strong links across existing community of practice
  • engagement with curators (individuals & organisations)
• Services
  • to evaluate tools, methods, standards and policies
  • a repository of tools and technical information
• ‘Virtuous circle’
  • expertise, experience & requirement feed into the DCC research programme
[Diagram: Digital Curation Centre - Organisation. Management & governance: JISC & Research Councils, Management Board, Advisory Group, Steering & Policy Committee, Service Operations Group, Research Co-ordination Committee. Partners: UKOLN (Bath), U. of Glasgow, U. of Edinburgh, CCLRC, with NDCC/NeSC as focus & physical presence. External links: curation organisations (e.g. DPC), users (communities of practice), the Collaborative Associates Network of Data Organisations, research collaborators, industry, and standards bodies.]
CCLRC’s role within the DCC
• Standards watch
• Standards definition and publication
• Tools watch
• Tools selection and certification
• Registry of metadata standards
• Metadata research
DCC role in Certification
• DCC will help to create:
  • Standards against which to perform audit and certification
    • OAIS Reference Model and follow-on work
  • Processes for accreditation and certification
    • Work in Digital Repository Certification Task Force
  • Organisation(s) to perform accreditation and certification
[Diagram: ADS SRB Interface for CMS. External to RAL: CMS node with SRB Client, SRB File Server and File System, plus an external SRB Client. At RAL: SRB MCAT server backed by an Oracle Database Cluster; CSF disk server with SRB Client, SRB File Server and File System; SRB ADS Server; ADS Pathtape server; ADS Server Farm; ADS Tape Robot System.]