The Data and Storage Services Group and CASTOR
Alberto Pace
DSS group mandate
• Ensure coherent development and operation of storage services at CERN for all aspects of physics data
• The technologies currently used to deliver these services are:
  • CASTOR
  • AFS
  • TSM
• We are responsible for continually understanding and evaluating alternatives to these solutions
• This is a very complex cost/benefit assessment
• The cost and risk of a change are high, so the expected benefits must be high as well
DSS organization: 3 sections
• TAB – Tape Archive and Backup
  • Designs, operates and supports the archive and backup services
  • This includes the tape-based software back-end for CASTOR, tape robotics, drives and media for physics, and the infrastructure for backup and restore of file servers and databases
  • 7 staff members
• FDO – File and Disk Operations
  • Operates and supports the storage and file system services for physics
  • This includes the CASTOR and AFS services
  • 7 staff members
• DT – Design and Transition
  • Designs and develops central storage services and their evolution
  • This includes CASTOR and XROOT components as well as protocols for optimal access to physics data
  • 6 staff members
CASTOR data growth: 12 million files/month
Source: Miguel Marques Coelho Dos Santos
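The monthly file count can be turned into an average ingest rate. This is a back-of-the-envelope sketch that assumes a 30-day month and a uniform arrival rate; neither assumption comes from the slide, and real ingest is bursty.

```python
# Convert the CASTOR growth figure (12 million files/month) into
# average daily and per-second rates.
# Assumptions (not from the slide): a 30-day month, uniform arrivals.
FILES_PER_MONTH = 12_000_000
DAYS_PER_MONTH = 30  # assumption
SECONDS_PER_DAY = 24 * 60 * 60

files_per_day = FILES_PER_MONTH / DAYS_PER_MONTH
files_per_second = files_per_day / SECONDS_PER_DAY

print(f"{files_per_day:,.0f} files/day")   # 400,000 files/day
print(f"{files_per_second:.2f} files/s")   # 4.63 files/s
```

In other words, the growth figure corresponds to roughly four to five new files every second, around the clock.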
Tier-0 export
Source: Miguel Marques Coelho Dos Santos
CASTOR usage (last 2 months)
[Charts: disk server throughput (GB/s); data written to tape (GB/s)]
• 45K tape cartridges, 29K of which are full
• 26 PB of data, 130 drives, 7 libraries
Source: Miguel Marques Coelho Dos Santos, German Cancio Melia
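The tape figures above can be combined into a few simple averages. This sketch assumes decimal units (1 PB = 1000 TB) and that the 26 PB total is spread over the counted cartridges; actual per-cartridge capacity depends on the media generation, so these are rough averages only.

```python
# Simple averages derived from the tape figures on the slide.
# Assumptions: decimal units (1 PB = 1000 TB); the 26 PB total is held
# on the counted cartridges. Per-cartridge capacity varies by media type.
CARTRIDGES = 45_000
FULL_CARTRIDGES = 29_000
DATA_TB = 26 * 1000
DRIVES = 130
LIBRARIES = 7

not_full = CARTRIDGES - FULL_CARTRIDGES
full_fraction = FULL_CARTRIDGES / CARTRIDGES
tb_per_cartridge = DATA_TB / CARTRIDGES
# Hypothetical case: all data sat on the full cartridges only.
tb_per_full_cartridge = DATA_TB / FULL_CARTRIDGES
drives_per_library = DRIVES / LIBRARIES

print(f"cartridges not full: {not_full}")                      # 16000
print(f"fraction full: {full_fraction:.0%}")                   # 64%
print(f"avg TB per cartridge: {tb_per_cartridge:.2f}")         # 0.58
print(f"avg TB per full cartridge: {tb_per_full_cartridge:.2f}")  # 0.90
print(f"drives per library: {drives_per_library:.1f}")         # 18.6
```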
CASTOR role
[Diagram labels: LHC Experiments; Analysis (area of concern): analysis CPU clusters, data reprocessing, end-user analysis; CASTOR: disk pools, tape servers; Tier-1s data replication: ASGC, BNL, FNAL, FZK, IN2P3, CNAF, NDGF, NIKHEF, PIC, RAL, TRIUMF]
Areas of research and development
[Diagram labels: LHC Experiments; Analysis; CASTOR: disk pools, tape servers; Tier-1s data replication: ASGC, BNL, FNAL, FZK, IN2P3, CNAF, NDGF, NIKHEF, PIC, RAL, TRIUMF; Areas of R&D]
Managed on-demand replication:
• Scalable
• Secure
• Accountable
• Globally accessible
• Manageable
• Multiple levels of service
  • Arbitrary availability
  • Arbitrary reliability
  • Arbitrary performance
• Decoupled from hardware
Current strategy
• Stability of service is required during LHC operation
• Use CASTOR for what it was designed for and what it does well
• Limit developments to consolidation; continue improving tape reliability and efficiency for reads and writes (tape scrubbing, minimising tape recalls, developments for buffered tape marks)
• We are responsible for continually understanding and evaluating alternatives
• This is a very complex cost/benefit assessment
• The cost and risk of a change are high, so the expected benefits must be high as well
• Investigations ("demonstrators") are carried out independently of the CASTOR production service
Areas of development
• In CASTOR:
  • Consolidation of the Stager, Scheduler and SRM
  • Monitoring
  • Tape subsystem: improved efficiency for reads and writes, tape scrubbing, minimising tape recalls, buffered tape marks
• "Demonstrator" requirements:
  • Scalable
  • Secure
  • Accountable
  • Globally accessible
  • Manageable
  • Multiple levels of service: arbitrary availability, arbitrary reliability, arbitrary performance
  • Decoupled from hardware
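One general way to minimise tape recalls is to batch pending read requests so that each cartridge is mounted once and its files are read in on-tape order. The sketch below only illustrates that general idea; `RecallRequest`, `plan_recalls`, `mounts_in_arrival_order` and the `tape_id`/`position` fields are hypothetical and do not describe CASTOR's actual interfaces or scheduling logic.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class RecallRequest(NamedTuple):
    file_id: str
    tape_id: str   # hypothetical: cartridge holding the file
    position: int  # hypothetical: file's sequence number on that tape


def plan_recalls(requests: List[RecallRequest]) -> Dict[str, List[RecallRequest]]:
    """Group pending recalls by tape and sort each group by on-tape position."""
    by_tape: Dict[str, List[RecallRequest]] = defaultdict(list)
    for req in requests:
        by_tape[req.tape_id].append(req)
    return {tape: sorted(reqs, key=lambda r: r.position)
            for tape, reqs in by_tape.items()}


def mounts_in_arrival_order(requests: List[RecallRequest]) -> int:
    """Count mounts if requests were served strictly in arrival order."""
    mounts, current = 0, None
    for req in requests:
        if req.tape_id != current:
            mounts, current = mounts + 1, req.tape_id
    return mounts


pending = [
    RecallRequest("f1", "T002", 40),
    RecallRequest("f2", "T001", 7),
    RecallRequest("f3", "T002", 3),
    RecallRequest("f4", "T001", 12),
    RecallRequest("f5", "T002", 18),
]

plan = plan_recalls(pending)
for tape in sorted(plan):
    print(tape, [r.file_id for r in plan[tape]])
print("mounts in arrival order:", mounts_in_arrival_order(pending))
print("mounts when batched:", len(plan))
```

With this input, serving requests in arrival order needs five mounts, while batching needs two (one per distinct tape). A production scheduler would also have to weigh request latency and drive availability against mount savings.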
The CASTOR review agenda
• Presentations
  • The April 2010 incident (German)
  • Change and release management (Sebastien)
  • Operation, deployment and upgrade processes (Miguel)
  • Tape operation (Vlado)
  • Monitoring (Dirk)
• Reviewer discussion