The Tape Service at CERN – Vladimír Bahyl, IT-FIO-TSI, June 2009
Agenda • Environment • Hardware • Software • Future
Environment
Nature of our data • Accumulation of independent, similar events • One event is much the same as any other, but no deduplication • Loss of an event, or even of an entire tape, is not ‘serious’ • One copy of any piece of data will normally do • Considerable reduction in costs at once! • Users also rely on the grid for data availability • Keep another copy somewhere else on the planet… • Price to pay • On a good day, retrieving a 1 GB file may take ~5 minutes • On a bad day, come back tomorrow…
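A rough sense of where those minutes go can be sketched with a toy latency model; all timing constants below are illustrative assumptions, not CERN measurements:

```python
# Back-of-envelope model of a single-file tape recall.
# All timing constants are illustrative assumptions, not CERN measurements.

def recall_time_seconds(file_gb, queue_wait=60, robot_mount=90,
                        load_and_seek=60, drive_mb_per_s=120):
    """Rough recall time: queue wait, robot mount, positioning, then streaming."""
    transfer = file_gb * 1024 / drive_mb_per_s
    return queue_wait + robot_mount + load_and_seek + transfer

# A 1 GB file on a quiet day: a few minutes, dominated by mechanics
# rather than by the data transfer itself.
print(f"{recall_time_seconds(1) / 60:.1f} minutes")   # ~3.6 minutes
```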
CASTOR – high-level overview • ~120 million files in total • Total file size: ~18 PB • Average file size: ~150 MB • Management software: name server, stager catalogue, request handler, tape migrator/recaller • Tape layer: ~44,000 tapes, ~34 PB capacity, ~10,000 mounts/day • Disk layer: ~900 servers, ~5 PB, ~10 GB/s
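The headline numbers are self-consistent, as a quick check shows:

```python
# Quick consistency check of the headline CASTOR numbers.
total_bytes = 18e15          # ~18 PB of data on tape
n_files = 120e6              # ~120 million files
avg_mb = total_bytes / n_files / 1e6
print(f"average file size ~ {avg_mb:.0f} MB")   # ~150 MB, matching the slide
```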
Access patterns • Data recording • Mostly WRITE • Well managed and understood • Postpone writing until there is enough data to fill a tape (sketched below) • High transfer rates (though the drives are not yet driven at their maximum speed) • Analysis • Mostly READ • Random access • Few files accessed per mount • Drives busy for a short time • High turnaround of mounted tapes • Low transfer rates • Repack (migration to higher-capacity media) • Ongoing background activity • Requires significant resources (equipment and people) • Takes years
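A minimal sketch of the deferred-writing idea, assuming a simple fill threshold; the real CASTOR migrator policy is more involved:

```python
# Minimal sketch of "postpone writing until enough to fill a tape".
# The threshold and the counters are illustrative, not the CASTOR migrator logic.

TAPE_CAPACITY_GB = 1000        # 1 TB cartridges (T10000B / TS1130)
FILL_THRESHOLD = 0.9           # start migrating once ~90% of a tape is queued

pending_gb = 0.0               # data sitting on the disk cache, waiting for tape

def start_tape_migration(amount_gb):
    print(f"mount a tape and stream {amount_gb:.0f} GB in one go")

def file_written(size_gb):
    """Called whenever a new file lands on the disk cache."""
    global pending_gb
    pending_gb += size_gb
    if pending_gb >= FILL_THRESHOLD * TAPE_CAPACITY_GB:
        start_tape_migration(pending_gb)
        pending_gb = 0.0

for _ in range(500):           # 500 x 2 GB of new data triggers one migration
    file_written(2.0)
```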
Connectivity • Disk servers connect to tape servers over 1 Gbps Ethernet • Tape servers connect to tape drives over 4 Gbps Fibre Channel
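Converting the link speeds into comparable units shows why one 1 Gbps Ethernet leg per tape server roughly matches one drive; the drive speed used below is a nominal figure and an assumption here:

```python
# Rough bandwidth matching between the Ethernet and Fibre Channel legs.
# The drive speed is a nominal T10000B/TS1130 figure (~120-160 MB/s);
# the exact values here are assumptions for illustration.

ethernet_gbps = 1          # disk server to tape server
fc_gbps = 4                # tape server to tape drive
drive_mb_s = 120           # assumed native drive speed

ethernet_mb_s = ethernet_gbps * 1000 / 8     # ~125 MB/s
fc_mb_s = fc_gbps * 1000 / 8                 # ~500 MB/s

print(f"Ethernet leg: {ethernet_mb_s:.0f} MB/s, FC leg: {fc_mb_s:.0f} MB/s")
# The 1 Gbps Ethernet link, not the FC link or the drive,
# is the first bottleneck for a single tape server.
```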
Hardware
Equipment – Sun • 4 Sun SL8500 libraries • In buildings 513 and 613 • Equipped with dual handbots • ~30 000 populated slots out of ~36 000 total • ~22 PB in total
Hardware changes: Sun • Tested T10000B drive (supports 1 TB cartridge) • Only 2 – “try and buy”
T10000B testing (results from Giuseppe Lo Re – Sun T10000B tape drive evaluation)
Hardware changes: Sun • Tested T10000B drive (supports 1 TB cartridge) • Only 2 – “try and buy” • Bought and installed 70 T10000B drives • Still keep ~20 T10000A drives – used for reading during repack • Removed ~40 T10000A drives • 1 TB per cartridge • Testing the “Maximum Capacity” feature • Writing until the physical end of media • ~5% capacity increase
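What a ~5% gain is worth at this scale, assuming for illustration that the whole Sun fleet were already on 1 TB cartridges:

```python
# What the ~5% "Maximum Capacity" gain is worth across a library complex.
# Assumes, for illustration, that every populated Sun slot holds a 1 TB cartridge.

cartridges = 30_000          # populated Sun SL8500 slots
capacity_tb = 1.0            # T10000B cartridge
gain = 0.05                  # writing to the physical end of media

extra_pb = cartridges * capacity_tb * gain / 1000
print(f"extra capacity ~ {extra_pb:.1f} PB")   # ~1.5 PB without buying new media
```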
Equipment – IBM • 4 IBM TS3500 libraries (+ 1 for TSM) • CASTOR only in 513, TSM in both buildings • ~15 000 populated slots out of ~35 000 total • ~12 PB in total
Hardware changes: IBM • Tested TS1130 drives (supports 1 TB cartridge) • 2 upgrades + 2 new • Tested IBM TS3500 S24 high density frame • Can contain 4 3592 cartridges in a row
S24 high density frame • Max. 4 cartridge tiers in a high density slot • S24: 1000 slots for 3592 media • D23: 400 slots for 3592 media • Capacity: 2.5 x D23 • Price: 2 x D23 • Footprint: equal to D23 • Total library storage capacity increase: from 6,200 slots to over 15,200 • LED lighting • [Slot-map diagram: a column of 40 rows, each with a front cache slot and high-density tiers T00–T04]
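Putting the quoted ratios together, the high-density frame comes out cheaper per slot as well as denser (absolute prices are not given on the slide):

```python
# Cost per slot of the S24 high-density frame relative to a D23 frame,
# using only the ratios quoted on the slide.

d23_slots, s24_slots = 400, 1000       # 3592 slots per frame
d23_price = 1.0                        # normalised D23 frame price
s24_price = 2.0 * d23_price            # "Price: 2 x D23"

relative_cost_per_slot = (s24_price / s24_slots) / (d23_price / d23_slots)
print(f"S24 cost per slot ~ {relative_cost_per_slot:.0%} of D23")   # ~80%
```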
IBM TS3500 vs. Sun SL8500 • IBM configuration considered: 14 cartridge frames + 2 drive frames + 2 high availability frames, 3592 media
Time to Mount test – results • Drive 1 used = the first one from the top • Column’s initial temporary storage area = the 2 yellow slots at the top; once they are full, reshuffling takes time • There were physically more cartridges in this area in the neighbouring columns • Shows the difference in access time between columns • Clear penalty – the gripper can hold 2 cartridges at any one time, so accessing anything deeper takes longer • Almost no penalty to move a tape from the bottom = very fast vertical movements
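A toy model of the tier penalty, assuming blocking cartridges are parked two at a time; the per-move time and the trip count are illustrative assumptions, not measured values:

```python
import math

# Toy model of the extra time needed to reach deeper tiers in a high-density
# slot: the gripper holds 2 cartridges, so beyond the first blocking cartridge
# the ones in front must be parked elsewhere, two per trip.
# MOVE_SECONDS and the trip count are assumptions, not measured values.

MOVE_SECONDS = 10

def extra_trips(tier):
    """Parking trips needed before the cartridge at `tier` can be fetched."""
    blocking = max(tier - 1, 0)
    return math.ceil(max(blocking - 1, 0) / 2)

for tier in range(1, 5):
    print(f"tier {tier}: +{extra_trips(tier) * MOVE_SECONDS} s of reshuffling")
```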
Hardware changes: IBM • Tested TS1130 drives (supports 1 TB cartridge) • 2 upgrades + 2 new • Tested IBM TS3500 S24 high density frame • Can contain 4 3592 cartridges in a row • Extended IBMLIB3 with 5 S24 HD frames • Bought and installed 46 TS1130 drives • Upgraded 44 TS1120 drives to TS1130 • Bought 6,000 3592 JB media • Looking at a full-size TS3500 with 14 S24 frames • Trying to use IBM drives for smaller files
Tools
Management of the hardware • Configuration is managed by Quattor • Developed a management interface • Reused the CDB SQL Web interface • Change management simplified • e.g. re-connecting a tape drive to a working tape server
Tape Pool Management • Tape pool = a logical group of tapes • CASTOR stores the information in databases • Management interface • Create/modify/delete tape pools • Graphical overview of the pool state • Written in Oracle APEX
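A minimal sketch of the kind of per-pool summary such an overview presents; the record fields and pool names below are hypothetical, not the actual CASTOR schema:

```python
# Sketch of a per-pool overview; field names and pool names are hypothetical.
from collections import defaultdict

tapes = [
    {"pool": "lhcb_raw",  "capacity_gb": 1000, "used_gb": 930, "status": "FULL"},
    {"pool": "lhcb_raw",  "capacity_gb": 1000, "used_gb": 120, "status": "FREE"},
    {"pool": "atlas_esd", "capacity_gb": 1000, "used_gb": 540, "status": "FREE"},
]

summary = defaultdict(lambda: {"tapes": 0, "used_gb": 0, "writable": 0})
for t in tapes:
    s = summary[t["pool"]]
    s["tapes"] += 1
    s["used_gb"] += t["used_gb"]
    s["writable"] += t["status"] == "FREE"

for pool, s in summary.items():
    print(f"{pool}: {s['tapes']} tapes, {s['used_gb']} GB used, {s['writable']} writable")
```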
Problem Resolution Automation • Failures occur with drives, media and libraries • Before: • Reporting was e-mail based • No tracking • High involvement from service managers • Now: • Ticket tracking system (Remedy Problem Management workflow) • Goal: no involvement from our side (ideal case)! • Workflow: tape drive I/O error on a tape server → tape operator opens a ticket → vendor engineer FIXes the tape drive → TEST the tape drive → put the tape server back into production → CLOSE the ticket
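The workflow can be read as a simple linear state machine; the sketch below only illustrates the states on the slide, not the actual Remedy integration:

```python
# Drive-incident workflow as a simple linear state machine.
# The states mirror the slide; the code is illustrative, not the real ticketing system.

WORKFLOW = [
    "I/O error detected on tape server",   # raised automatically
    "ticket opened by tape operator",      # tracked in Remedy
    "drive fixed by vendor engineer",
    "drive tested",
    "tape server back in production",
    "ticket closed",
]

def advance(state):
    """Return the next state of an incident, or None once it is closed."""
    i = WORKFLOW.index(state)
    return WORKFLOW[i + 1] if i + 1 < len(WORKFLOW) else None

state = WORKFLOW[0]
while state:
    print(state)
    state = advance(state)
```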
Monitoring – performance • [Plots: disk layer traffic, tape request queue, tape layer traffic] • Annotations: reading dominates; robot failure; recovering – writing has higher priority
Monitoring – service
Tape Log Database
Future
Projected growth • 2009 • Build confidence in the ability to operate a full-size IBM robot with high-density frames • 14 x S24 cartridge frames + 2 drive frames + 2 high availability service frames • Continue repacking existing tapes to 1 TB media • No new media purchases • Until 2012 • Stay ahead of the LHC demands – capacity and rates • Expecting 15 PB/year and data rates up to Some Large Number • 2012 – next acquisition plan • New drives, new media, same libraries • 5-year contracts end; need to establish new ones
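Translating 15 PB/year into shelf space, assuming everything lands on 1 TB cartridges:

```python
# How much shelf space 15 PB/year translates into, assuming all new data
# lands on 1 TB cartridges (the repack target at the time).

yearly_pb = 15
cartridge_tb = 1.0

new_cartridges = yearly_pb * 1000 / cartridge_tb
print(f"~ {new_cartridges:.0f} new cartridges per year")   # ~15,000

# With ~45,000 of ~71,000 slots populated across both vendors' libraries,
# that is roughly 1.5-2 years of growth in the currently free slots alone.
```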
CERN • Will stay with enterprise drives = still worth reusing the media • Will continue following the technology trend, with aggressive migrations every ~2 years • Is working on adapting the architecture and usage patterns to exploit the new technology effectively • Tape will stay at CERN at least until 2020