Repack at CERN: Usage and Outlook
Tim Bell, Gordon Lee
Agenda
• Use Cases
• Bulk Repack Outlook
Use #1: Data Recovery
• When a media error is reported for a tape, a repack is arranged to recover as many files as possible onto fresh media.
• Volumes of around 5 tapes per week.
• Run by the tape operator with a small script that reserves a tape drive and logs the repack history (see the sketch below).
• Reliability improved substantially with 2.1.6 and 2.1.7. More fixes are in the pipeline for dual-copy tapes, along with enhancements for output tape pool management.
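A minimal sketch of what such an operator wrapper could look like; the command names (drive-reserve, repack-submit, drive-release) and the log path are hypothetical placeholders, not the actual CERN script, and only the reserve / repack / log flow is taken from the slide.

```python
#!/usr/bin/env python
"""Sketch of an operator wrapper for one data-recovery repack.
CLI names and the log path are hypothetical placeholders; only the
reserve -> repack -> log flow comes from the slide."""
import datetime
import subprocess
import sys

HISTORY_LOG = "/var/log/repack-history.log"  # assumed location

def log_history(vid, status):
    # Append a timestamped record so past repacks can be traced.
    with open(HISTORY_LOG, "a") as f:
        f.write("%s %s %s\n" % (datetime.datetime.now().isoformat(), vid, status))

def main(vid):
    subprocess.check_call(["drive-reserve", vid])      # placeholder command
    try:
        subprocess.check_call(["repack-submit", vid])  # placeholder command
        log_history(vid, "submitted")
    except subprocess.CalledProcessError:
        log_history(vid, "failed")
        raise
    finally:
        subprocess.check_call(["drive-release", vid])  # placeholder command

if __name__ == "__main__":
    main(sys.argv[1])  # e.g. ./recover.py T12345
```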
Use #2: Defragmentation
• Recovers space from tapes after users have deleted files from Castor. Files cannot be deleted from tape, so all remaining files are copied off and the tape is reclaimed.
• Looking to automate the detection and copying of badly fragmented tapes. Detection is currently run manually by comparing the number of segments on the tape with the count of files in the name server (see the sketch below).
• Around 20-50 tapes per week expected.
• Works well.
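A minimal sketch of the comparison described above; the 50% threshold and the counts passed in are illustrative, with the real numbers coming from the tape catalogue and the name server.

```python
def is_badly_fragmented(segments_on_tape, live_files_in_ns, threshold=0.5):
    """Flag a tape for repack when fewer than `threshold` of the segments
    written on it still correspond to live files in the name server.
    The 0.5 threshold is an illustrative choice, not a CERN setting."""
    if segments_on_tape == 0:
        return False
    return float(live_files_in_ns) / segments_on_tape < threshold

# Example: 10000 segments on tape but only 3000 files left in the name
# server means 70% of the tape is dead space -> repack candidate.
print(is_badly_fragmented(10000, 3000))  # True
```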
Use #3: Bulk Media Change
• Higher-density tapes are becoming available, which require copying the old tapes to new media.
• Limited time and resources: robot slots must be freed up without tying up too many drives.
• Requires data rates of 600+ MB/s for CERN's data volume (20 PB) to complete within a year (see the arithmetic below).
• Between 16 drives (at 80 MB/s each) and 50 drives (at 25 MB/s each).
• Equivalent to a large LHC experiment.
• A legacy of small files to preserve:
  • Average file size is 142 MB
  • 9 seconds of overhead per file on T10K drives
  • 100 million files to repack
  • Up to 154,000 files per tape
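The drive-count range follows from the numbers on this slide; a back-of-the-envelope check is below. The factor of two for separate input and output drives is my reading of how repack consumes drives, not something stated explicitly.

```python
import math

DATA_VOLUME_BYTES = 20e15           # 20 PB to migrate
SECONDS_PER_YEAR = 365 * 86400

# Sustained aggregate read rate needed to finish within a year.
required_rate = DATA_VOLUME_BYTES / SECONDS_PER_YEAR
print("required: %.0f MB/s" % (required_rate / 1e6))       # ~634 MB/s

def drives_needed(per_drive_mb_s):
    # Assumes each repack stream needs one input and one output drive.
    reading = int(math.ceil(required_rate / (per_drive_mb_s * 1e6)))
    return 2 * reading

print(drives_needed(80))   # 16 drives at 80 MB/s each
print(drives_needed(25))   # 52 drives at 25 MB/s, close to the 50 quoted

# The small-file problem in one number: 100 million files at 9 s of
# per-file overhead is ~28.5 drive-years spent on overhead alone.
print("%.1f drive-years" % (100e6 * 9 / SECONDS_PER_YEAR))
```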
Approach to get to 80 MB/s
• Scale repack2 to meet the performance targets
• Tune read/write policies to improve disk server I/O rates
• Recall order sorting for large files per tape (see the sketch below)
• Investigate more direct copies, such as tape-to-tape
• Reduce stager overheads
• Avoid tape server ethernet bottlenecks
• Review the performance of tape marks: unlabelled tapes? embedded labels?
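Recall-order sorting keeps the tape streaming by reading files in their physical order rather than request order; a generic sketch follows (the `vid`/`fseq` field names are illustrative, not Castor's actual schema).

```python
from collections import namedtuple

# Illustrative recall request: tape volume, position on tape, file size.
Recall = namedtuple("Recall", ["vid", "fseq", "size_bytes"])

def order_recalls(requests):
    """Group recalls by tape, then sort each group by physical position
    so the drive reads in one forward pass instead of seeking around."""
    by_tape = {}
    for r in requests:
        by_tape.setdefault(r.vid, []).append(r)
    for batch in by_tape.values():
        batch.sort(key=lambda r: r.fseq)
    return by_tape

reqs = [Recall("T00001", 42, 2 * 10**9), Recall("T00001", 7, 5 * 10**8),
        Recall("T00002", 3, 10**9), Recall("T00001", 19, 2 * 10**9)]
for vid, batch in order_recalls(reqs).items():
    print(vid, [r.fseq for r in batch])  # T00001 -> [7, 19, 42]
```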
Operational issues
• Development underway for:
  • Mapping between the repack service class and the tape pool to which the repacked data should be sent
  • Enhanced error reporting on failures
  • Dual-copy files
• Tickets opened for:
  • Repacking disabled tapes
  • Skipping over bad files without unmounting
• Error analysis often requires a developer: the root causes of failed or blocked repacks are not easy for tape operations to find.
Summary
• Castor 2.1.7 repack provides a solution for data recovery and defragmentation, and is used in production at CERN.
• Bulk repacking performance is a major concern and requires a solution before year-end 2008, when the new drives are expected to be ready.
Tape-to-Tape repack?
[Diagram: repack data path from tape through the stager and disk server]
• Tape-to-tape copying, rather than copying through the stager, avoids the network bottleneck.
• Initial tests indicate that the tape-writing overheads are larger for our typical files.
Tests to scale repack 2
• Test 1: 3 disk servers, 3 tape drives in, 3 tape drives out, file sizes of 2 GB+
• Test 2: 6 disk servers, 3 tape drives in, 3 tape drives out, file sizes of 500 MB+
Tapes from the Dark Side
• 9940 tapes with 20,000+ files
• Use IBM NVC tapes for small-file handling
• Repacking can take up to weeks due to label overhead (see the illustration below)
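To illustrate how label overhead stretches a single repack into weeks: with an assumed per-file cost of tens of seconds on a 9940-class drive (an assumption; the slide gives no figure), 20,000 files adds up quickly.

```python
FILES_ON_TAPE = 20000
OVERHEAD_PER_FILE_S = 45.0  # assumed label/tape-mark cost per file on a
                            # 9940-class drive; the slide gives no figure

total_days = FILES_ON_TAPE * OVERHEAD_PER_FILE_S / 86400
print("%.1f days of label overhead alone" % total_days)  # ~10.4 days
```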