The GSI Mass Storage for Experiment Data
DVEE-Palaver, GSI Darmstadt, Feb. 15, 2005
Horst Göringer, GSI Darmstadt
H.Goeringer@gsi.de
Overview
• different views
• current status
• recent enhancements:
  - write cache
  - on-line connection to DAQ
• future plans
• conclusions
GSI Mass Storage System
Gsi mass STORagE system = gstore
gstore: storage view (diagram)
gstore: hardware view
3 automatic tape libraries (ATL):
(1) IBM 3494 (AIX)
• 8 tape drives IBM 3590 (14 MByte/s)
• ca. 2300 volumes (47 TByte, of which 13 TByte backup)
• 1 data mover (adsmsv1)
• access via adsmcli, RFIO (read)
• read cache 1.1 TByte: StagePool, RetrievePool
gstore: hardware view
(2) StorageTek L700 (Windows 2000)
• 8 tape drives LTO2 ULTRIUM (35 MByte/s)
• ca. 170 volumes (32 TByte)
• 8 data movers (gsidmxx), connected via SAN
• access via tsmcli, RFIO
• read cache 2.5 TByte: StagePool, RetrievePool
• write cache: ArchivePool 0.28 TByte, DAQPool 0.28 TByte
gstore: hardware view
(3) StorageTek L700 (Windows 2000)
• 4 tape drives LTO1 ULTRIUM (15 MByte/s)
• ca. 80 volumes (10 TByte): backup copy of 'irrecoverable' archives (...raw)
• mainly for backup of user data (~30 TByte)
gstore: software view
2 major components:
• TSM (Tivoli Storage Manager), commercial:
  - handles tape drives and robots
  - database
• GSI software (~80,000 lines of code), C, sockets, threads:
  - interface to user (tsmcli / adsmcli, RFIO)
  - interface to TSM (TSM API client)
  - cache administration
gstore user view: tsmcli
tsmcli subcommands:
  archive     file*   archive   path
  retrieve    file*   archive   path
  query       file*   archive   path*
  stage       file*   archive   path
  delete      file    archive   path
  ws_query    file*   archive   path
  pool_query  pool*
*: any combination of wildcard characters (*, ?) allowed
soon: file may contain a list of files (with wildcard characters)
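For illustration, a few hedged tsmcli calls matching the syntax above; the archive name 'hadesraw', the run-file pattern, and the paths are invented examples, not real GSI archives:

    # archive all list-mode data files of one run (hypothetical names)
    tsmcli archive "run042_*.lmd" hadesraw /d/hades/feb05

    # check what arrived (wildcards allowed in file and path)
    tsmcli query "run042_*" hadesraw "/d/hades/*"

    # stage the files to the read cache before the batch analysis starts
    tsmcli stage "run042_*.lmd" hadesraw /d/hades/feb05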
gstore user view: RFIO
• rfio_[f]open
• rfio_[f]read
• rfio_[f]write
• rfio_[f]close
• rfio_[f]stat
• rfio_lseek
GSI extensions (for on-line DAQ connection):
• rfio_[f]endfile
• rfio_[f]newfile
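To make the client side concrete, a minimal C sketch that reads one file via RFIO; the prototypes are declared inline and the rfio:// file name is invented, since the exact GSI RFIO client header and naming scheme are assumptions here:

    /* Minimal RFIO read sketch -- prototypes and file name are
     * assumptions; the GSI RFIO client library may differ in detail. */
    #include <stdio.h>
    #include <fcntl.h>

    extern int rfio_open(const char *path, int flags, int mode);
    extern int rfio_read(int fd, char *buf, int len);
    extern int rfio_close(int fd);

    int main(void)
    {
        char buf[65536];
        int  nread;
        /* hypothetical gstore file: server, archive, path are examples */
        int fd = rfio_open("rfio://adsmsv1/hadesraw/run042_0.lmd",
                           O_RDONLY, 0);
        if (fd < 0) {
            perror("rfio_open");
            return 1;
        }
        while ((nread = rfio_read(fd, buf, sizeof(buf))) > 0) {
            /* process nread bytes of event data here */
        }
        rfio_close(fd);
        return nread < 0 ? 1 : 0;
    }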
gstore server view: query (diagram)
gstore server view: archive to cache (diagram)
gstore server view: archive from cache (diagram)
gstore server view: retrieve from tape (diagram)
gstore server view: retrieve from write cache (diagram)
gstore: overall server view (diagram)
server view: gstore design concepts
• strict separation of control and data flow
• no bottleneck for data
• scalable in capacity (tape and disk) and I/O bandwidth
• hardware independent (as long as TSM supports the hardware)
• platform independent
• unique name space
server view: cache administration
• multithreaded servers for read and write cache, each with its own metadata DB
• main tasks:
  - lock/unlock files (see the sketch below)
  - select data movers and file systems
  - collect current info on disk space
    (soon: data mover and disk load -> load balancing)
  - trigger asynchronous archiving
  - disk cleaning
• several disk pools with different attributes: StagePool, RetrievePool, ArchivePool, DAQPool, ...
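As an illustration of the lock/unlock task, a minimal sketch of a mutex-protected lock table such as a multithreaded cache server might keep; this is invented example code, not the actual GSI implementation:

    /* Illustrative only: serializing lock/unlock of cached files. */
    #include <pthread.h>
    #include <string.h>

    #define MAX_LOCKS 1024

    static pthread_mutex_t tab_mtx = PTHREAD_MUTEX_INITIALIZER;
    static char locked[MAX_LOCKS][256];  /* names of locked files */

    /* lock a cache file for one client; 0 = ok, -1 = busy/full */
    int cache_lock(const char *file)
    {
        int i, free_slot = -1;
        pthread_mutex_lock(&tab_mtx);
        for (i = 0; i < MAX_LOCKS; i++) {
            if (locked[i][0] == '\0') {
                if (free_slot < 0) free_slot = i;
            } else if (strcmp(locked[i], file) == 0) {
                pthread_mutex_unlock(&tab_mtx);
                return -1;               /* file already locked */
            }
        }
        if (free_slot >= 0)
            strncpy(locked[free_slot], file, sizeof(locked[0]) - 1);
        pthread_mutex_unlock(&tab_mtx);
        return free_slot >= 0 ? 0 : -1;
    }

    void cache_unlock(const char *file)
    {
        int i;
        pthread_mutex_lock(&tab_mtx);
        for (i = 0; i < MAX_LOCKS; i++)
            if (strcmp(locked[i], file) == 0) locked[i][0] = '\0';
        pthread_mutex_unlock(&tab_mtx);
    }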
usage profile: batch farm
batch farm: ~120 dual-processor nodes
=> highly parallel mass storage access (read and write)
• read requests:
  - 'good' user: stages all files before use (wildcard characters)
  - 'bad' user: reads lots of single files from tape
  - 'bad' system: stage disk / data mover crashes during analysis
• write requests: via write cache, distributed as uniformly as possible
usage profile: experiment DAQ
• several continuous data streams from DAQ
• same data mover kept during the lifetime of a data stream
• access only via RFIO
• GSI extensions necessary: rfio_[f]endfile, rfio_[f]newfile (see the sketch below)
• disks are emptied faster than filled:
  - network -> disk: ~10 MByte/s
  - disk -> tape: ~30 MByte/s
  => time to stage for on-line analysis
• enough disk buffer necessary in case of problems (robot, TSM, ...)
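A sketch of how a DAQ writer might use the GSI extensions to roll over to a new file while keeping the same data mover connection; the signatures of rfio_endfile/rfio_newfile, the helper daq_next_event, and all names are assumptions:

    /* DAQ event-stream writer sketch; rfio_endfile/rfio_newfile are
     * the GSI extensions named above, their signatures assumed here. */
    #include <stdio.h>
    #include <fcntl.h>

    extern int rfio_open(const char *path, int flags, int mode);
    extern int rfio_write(int fd, const char *buf, int len);
    extern int rfio_close(int fd);
    extern int rfio_endfile(int fd);                    /* assumed */
    extern int rfio_newfile(int fd, const char *path);  /* assumed */

    /* hypothetical helper delivering the next event from the DAQ */
    extern int daq_next_event(char *buf, int maxlen);

    int main(void)
    {
        char event[32768];
        long written = 0;
        int  run = 0, len;
        /* one connection to the same data mover for the whole stream */
        int fd = rfio_open("rfio://gsidm01/daq/run042_0000.lmd",
                           O_WRONLY, 0);
        if (fd < 0) return 1;
        while ((len = daq_next_event(event, sizeof(event))) > 0) {
            rfio_write(fd, event, len);
            written += len;
            if (written > 1L << 30) {   /* ~1 GByte: start a new file */
                char next[128];
                rfio_endfile(fd);       /* close file, keep connection */
                sprintf(next, "rfio://gsidm01/daq/run042_%04d.lmd",
                        ++run);
                rfio_newfile(fd, next);
                written = 0;
            }
        }
        rfio_close(fd);
        return 0;
    }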
current plans: new hardware
more and safer disks:
• write cache: all RAID, 4 TByte (ArchivePool, DAQPool)
• read cache: +7.5 TByte new RAID
  (StagePool, RetrievePool, new pools, e.g. with longer file lifetime)
• 5 new data movers
new fail-safe entry server:
• hosts query server and cache administration servers -> query performance!
• takeover in case of host failure
• metadata DBs mirrored on a 2nd host
current plans: merge tsmcli / adsmcli
new command gstore:
• replaces tsmcli and adsmcli
• unique name space (already available)
• users need not care in which robot the data reside
• placement of new archives: policy of the computing center
brief excursion: future of IBM 3494?
• still heavily used
• rather full
• hardware highly reliable
• should be decided this year!
usage of the IBM 3494 (AIX) (chart)
brief excursion: future of IBM 3494?
2 extreme options (and more in between):
• no further investment:
  - use as long as possible
  - in a few years: move data to another robot
• upgrade tape drives and connect to SAN:
  - 3590 (~30 GByte, 14 MByte/s) -> 3592 (300 GByte, 40 MByte/s)
  - new media => ~700 TByte capacity (ca. 2300 volumes x 300 GByte)
  - access with the available data movers via SAN
  - new fail-safe TSM server (Linux?)
current plans: load balancing
• acquire current info on the number of read/write processes for each disk, data mover, pool
• new write request: select the resource with the lowest load (see the sketch below)
• new read request: avoid 'hot spots' -> create additional instances of staged files
• new option '-randomize' for stage/retrieve:
  - distribute equally over different data movers / disks
  - split into n (parallel) jobs
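A toy illustration of the "lowest load" rule for new write requests; the data structure, host names, and load numbers are invented for the example:

    /* Pick the data mover with the fewest active I/O processes. */
    #include <stdio.h>

    struct mover {
        const char *name;
        int readers;       /* current read processes  */
        int writers;       /* current write processes */
    };

    const struct mover *pick_mover(const struct mover *dm, int n)
    {
        const struct mover *best = &dm[0];
        int i;
        for (i = 1; i < n; i++)
            if (dm[i].readers + dm[i].writers
                    < best->readers + best->writers)
                best = &dm[i];
        return best;
    }

    int main(void)
    {
        struct mover dm[] = {
            { "gsidm01", 4, 2 },
            { "gsidm02", 1, 1 },
            { "gsidm03", 3, 0 },
        };
        printf("write to %s\n", pick_mover(dm, 3)->name); /* gsidm02 */
        return 0;
    }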
current plans: new organization of DMs
• Linux platform:
  - more familiar environment (shell scripts, Unix commands, ...)
  - case-sensitive file names
  - current mainstream OS for experiment data processing
• '2nd level' data movers:
  - no SAN connection
  - disks filled via ('1st level') DMs with SAN connection
  - for stage pools with guaranteed lifetime of files
current plans: new organization of DMs (2)
• integration of selected groups' file servers as '2nd level' data movers:
  - disk space (logically) reserved for the owners
  - pool policy according to the owners
• many advantages:
  - no NFS => much faster I/O
  - files physically distributed over several servers
  - load balancing of gstore
  - disk cleaning
• disadvantages: only for experiment data, access via gstore interface
current plans: user interface
• a large number of user requests:
  - longer file names
  - option to rename files
  - more specific return codes
  - ...
• program code consolidation
• improved error recovery after HW failures
• support for the successor of AliEn
• GRID support:
  - gstore as Storage Element (SE)
  - Storage Resource Manager (SRM) -> new functionalities, e.g. reserving resources
Conclusions
• GSI concept for mass storage successfully verified
• hardware and platform independent
• scalable in capacity and bandwidth, to keep up with:
  - requirements of future batch farm(s)
  - data rates of future experiments
• gstore is able to manage very different usage profiles
• but still a lot of work ... to fully realize all discussed plans