LTSM for FAIR Phase 0 Data Acquisition: Storage System and Experiences from HADES Beamtime

This presentation discusses the LTSM software for data storage in the FAIR Phase 0 data acquisition, focusing on the HADES experiment. It covers LTSM capabilities, integration with DABC and MBS DAQ, and experiences from the HADES beamtime. The talk also explores the future outlook for LTSM with the FSD server.

Presentation Transcript


  1. Jörn Adamczewski-Musch, GSI / EEL, HADES; Thomas Stibor, GSI / HPC. Mass storage interface LTSM for FAIR Phase 0 data acquisition. CHEP 2019, Adelaide

  2. Overview • Motivation: DAQ storage use cases • LTSM software • LTSM with DABC and MBS DAQ • HADES beamtime 2019 experiences • Outlook: LTSM with FSD server J. Adamczewski-Musch, CHEP 2019

  3. DAQ storage use cases [Diagram labels: detectors, DAQ event builders, storage system]

  4. DAQ storage use cases (1) 1.) Long-term archive [Diagram labels: detectors, DAQ event builders, storage system, tape]

  5. DAQ storage use cases (2) 1.) Long-term archive 2.) "Near online" analysis: adjustments during beam time (alignment, trigger, thresholds, ...) [Diagram labels: detectors, DAQ event builders, storage system, tape, /filespace/, HPC cluster, "synchronized!?"]

  6. DAQ storage with LTSM [Diagram labels: detectors, DAQ event builders, /data.local, LTSM socket, IBM TSM server, SAN, tape, HSM copytool, /lustre, HPC cluster, ltsmc on any node, "synchronized!"]

  7. The LTSM software (Lightweight Tivoli Storage Management). Development by Thomas Stibor, GSI-HPC (https://github.com/tstibor/ltsm) • Based on the IBM TSM client API for x86_64 Linux • Write from client to tape archive via TCP/IP sockets (C API) • Transparent synchronization between tape and the Lustre file system via the Lustre HSM (Hierarchical Storage Management) copytool: the archive file path is the same as the Lustre file path (e.g. /lustre/hades/mar19/*.hld); a file can be retrieved from tape on first user read access to Lustre; a file can be automatically archived to tape from Lustre • Command line tool ltsmc: query, retrieve, archive, and delete operations from any TSM client; for interactive console or scripting use

  8. LTSM with FAIR-0 DAQ software • Data Acquisition Backbone Core (DABC, http://dabc.gsi.de): DAQ for the HADES experiment (http://hades.gsi.de); new: LTSM plug-in with POSIX-like file API for all data formats • Multi Branch System (MBS, http://daq.gsi.de): DAQ for the NUSTAR and APPA research pillars of FAIR; new: interface server between the old gstore protocol and LTSM

  9. LTSM for HADES with DABC plug-in [Diagram labels: HADES event builders lxhadebXY, DABC, /localdata/ file I/O, 10 GbE IP, 10 GbE TCP/IP, TSM server lxltsm01, TSM node hades, SAN, tape robot, HSM copytool node lxcopytool01 (hsm/copytool), /lustre/.../raw/.../mar19/*.hld file I/O, ltsmsync.sh, steps (1.) to (4.)]

  10. LTSM for MBS DAQ with RFIO gateway • The MBS node does not need the TSM client library (which is not available for LynxOS, PPC VME Linux, etc.) • It uses the legacy gstore/RFIO protocol [Diagram labels: event builder, VMEbus / PCIe DAQ, RFIO client, 10 GbE TCP/IP, gateway server x86L-96, RFIO server, RFIO dispatch, fork, rawDispLTSM64, rawServLTSM64, LTSM API, request, TSM server lxltsm01, TSM node tasca, /localdata/ file I/O, SAN, tape robot, HSM copytool node lxcopytool01 (hsm/copytool), /lustre/.../*.lmd file I/O, ltsmsync.sh, steps (1.) to (4.)]

  11. HADES (High Acceptance Di-Electron Spectrometer) experiment at FAIR-0 in March 2019 (Ag-Ag @ 1.58 AGeV), https://www-hades.gsi.de [Detector figure labels: START, Ag target, RICH, MDC 1-2, MDC 3-4, MDC 1-4, SC magnet, RPC/TOF, ECAL, FWALL]

  12. HADES DAQ March 2019 with LTSM [Diagram labels: HADES detectors, TRB, 35 trbnet hubs, event building network (BNET), sub event, full event, 10 GbE, UDP, TCP, DABC on server 1, server 2, server 3, 4..5 nodes, 1...16 nodes writing parallel files, .hld, LTSM storage interface, 10 GbE TCP/IP, /lustre, ltsmc, local file I/O, /data01, local RAID6, control, DABC MASTER, web server]

  13. HADES beamtime Ag-Ag @ 1.58 AGeV: data taking March 2019 • Total: 360 TB beam data (380 TB with calibration, cosmics, etc.) • At spill maximum: 390 Mbyte/s (16 kEvents/s) • Net run rates: 208.6 Mbyte/s (8.7 kEvents/s) => 72% beam time usage [Plot: event rate (Hz) versus time; annotations: 6.2 kEvent/s, 18 s]
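
As a rough cross-check of the figures on this slide, the average event size implied by each pair of data rate and event rate can be computed. This is a minimal sketch that assumes 1 Mbyte = 10^6 bytes and 1 kEvent = 1000 events; the variable and function names are illustrative and not part of any HADES software.

```python
# Rates quoted on the slide; the unit conversions are assumptions
# (1 Mbyte = 1e6 bytes, 1 kEvent = 1000 events).
MBYTE = 1e6
KEVENT = 1e3

peak_bytes_per_s = 390 * MBYTE      # data rate at spill maximum
peak_events_per_s = 16 * KEVENT     # event rate at spill maximum
net_bytes_per_s = 208.6 * MBYTE     # net data rate over a run
net_events_per_s = 8.7 * KEVENT     # net event rate over a run


def mean_event_size(bytes_per_s, events_per_s):
    """Average event size in bytes implied by a data rate and an event rate."""
    return bytes_per_s / events_per_s


peak_event_size = mean_event_size(peak_bytes_per_s, peak_events_per_s)
net_event_size = mean_event_size(net_bytes_per_s, net_events_per_s)

print(f"event size at spill maximum: {peak_event_size / 1e3:.1f} kB")
print(f"event size from net rates:   {net_event_size / 1e3:.1f} kB")
```

Under these assumptions both pairs of numbers give roughly 24 kB per event, so the quoted peak and net rates are consistent with each other.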

  14. Experiences from MAR19 beamtime (1) Event builder LTSM socket to TSM server: mostly OK! But: • TSM server timeout after 1 minute without writing (beam pause during open file) • Event builder process terminates intentionally; operator restart required! • Started files were initially lost for the archive: about 50 files/day • Files could be archived later with an ltsmc script from the duplicates on the local event builder disk • Reason: wrong TSM server configuration, which was easily fixed

  15. Experiences from MAR19 beamtime (2) Raw data transfer to /lustre "on time" (within 1 hour?): • Provided LTSM methods: HSM copytool + requesting script ltsmsync.sh • These have proved to work for smaller experiments like TASCA with MBS • Not fast enough for HADES! • Conflict on the TSM server with concurrent tape migration jobs! • HADES workaround: rsync cron job for files from the event builder server disks • OK, but it sometimes affected the event builder processes (CPU I/O wait!) -> lost events! => Further LTSM developments required!

  16. Under development: LTSM File System Daemon (FSD) • Files are on Lustre first and are archived with the copytool subsequently [Diagram labels: TSM server 1 and TSM server 2 (lxltsm01, lxltsm02) with autoreplication, TSM node hades, SAN, tape robot 1, tape robot 2, event builders lxhadebXY, HADES DABC, 10 GbE IP, 10 GbE TCP/IP, HSM/FSD node lxltsmFSD, FSD server, LTSM/FSD socket API, /localdata/ RAID6 file I/O, "buffer for possible Lustre latencies!", HSM copytool, /lustre/.../raw/.../*.hld file I/O, steps (1.) to (5.)]

  17. Summary and outlook • LTSM provides interfaces and tools to access IBM TSM storage in connection with the Lustre file system • LTSM has been integrated into the GSI DAQ frameworks DABC and MBS • The experiments HADES, TASCA, R3B, ... have successfully used LTSM for raw data storage during FAIR-0 beam times in 2019 • The LTSM/FSD server is under development for 2020 to provide faster availability of DAQ files on Lustre. Thank you!

  18. Bonus slides

  19. LTSM C API functions • int tsm_fconnect(struct login_t *login, struct session_t *session); • void tsm_fdisconnect(struct session_t *session); • int tsm_fopen(const char *fs, const char *fpath, const char *desc, struct session_t *session); • ssize_t tsm_fwrite(const void *ptr, size_t size, size_t nmemb, struct session_t *session); • int tsm_fclose(struct session_t *session); Purpose, in order: connect to / disconnect from the TSM server, open a file, write a buffer to the file, close the file

  20. LTSM Lustre HSM copytool [Diagram labels: Lustre user node, MetaData Service, Object Storage Service, Copytool Server, IBM TSM Server, tape, steps (1.) to (4.)]

  21. Command line tool ltsmc • Query data with ltsmc --query • Retrieve files with ltsmc --retrieve • Manually archive local files with ltsmc --archive • Allows higher-level scripts (examples from HADES): archive local files with a different path on Lustre (archive_ltsm_jul18.sh); compare local and archived files (archived_data_ltsm.pl); archive a set of missing files (data2tape_ltsm.pl)
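
The compare-and-rearchive scripts named on this slide are not shown in the presentation. As a purely illustrative sketch of the underlying idea, the following compares a listing of local files with a listing of archived files and reports those still missing from the archive. The function name, the example file names, and the rule of matching by base name are hypothetical; this is not the logic of archived_data_ltsm.pl or data2tape_ltsm.pl.

```python
from pathlib import PurePosixPath


def missing_from_archive(local_paths, archived_paths):
    """Return local files whose base name does not appear in the archive listing.

    Files are matched by base name only, because local and archived
    directories may differ. This matching rule is an assumption.
    """
    archived_names = {PurePosixPath(p).name for p in archived_paths}
    return sorted(p for p in local_paths
                  if PurePosixPath(p).name not in archived_names)


# Hypothetical example listings (file names invented for illustration).
local_files = [
    "/data01/data/run_a.hld",
    "/data01/data/run_b.hld",
    "/data01/data/run_c.hld",
]
archived_files = [
    "/lustre/hades/mar19/run_a.hld",
    "/lustre/hades/mar19/run_c.hld",
]

to_archive = missing_from_archive(local_files, archived_files)
print(to_archive)
```

In practice the two listings would come from the local event builder disk and from an archive query, and the resulting list would be handed to a separate archive step.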

  22. HADES raw data to Lustre in MAR19: rsync workaround [Diagram labels: event builders lxhadebXY, HADES DABC, /localdata/ file I/O, RAID6 file I/O, /data01/data/*.hld, 10 GbE IP, 10 GbE TCP/IP, lxltsm01, TSM node hades, SAN, tape robot, lxcopytool01, HSM copytool, /lustre/.../raw/.../mar19/*.hld, ltsmsync.sh, NFS, lxhadeb13, lxbk199, rsync, /lustre/.../raw2/.../mar19/*.hld file I/O]

  23. FSD status and tests • API implemented in DABC • 12 x 80 MB/s socket write test (2 days, random data): lxhadeb08, 09, 10, 11 -> lxltsm02 (local RAID) • Transfer-to-Lustre mechanism: TO DO • HSM sync Lustre -> TSM: TO DO • Setup of production server: TO DO • Test of HADES storage chain (EB -> Lustre -> TSM): TO DO • Expected full implementation: January 2020

  24. LTSM with FSD API test (10.-12.09.19, 42 h, 150 TB) [Plot labels: ~ 80 MB/s, lxhadeb08, lxhadeb09, lxhadeb10, lxhadeb11, FSD server shutdown]
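
The test figures on slides 23 and 24 can be cross-checked with simple arithmetic: twelve parallel streams at about 80 MB/s sustained for 42 hours. The sketch below assumes decimal units (1 TB = 10^6 MB) and a constant rate over the whole test; the variable names are illustrative.

```python
# Figures quoted on the slides; decimal units and a constant rate are assumptions.
streams = 12                 # parallel socket writers
rate_mb_per_s = 80           # approximate rate per stream in MB/s
duration_h = 42              # test duration in hours

aggregate_mb_per_s = streams * rate_mb_per_s
total_mb = aggregate_mb_per_s * duration_h * 3600   # 3600 seconds per hour
total_tb = total_mb / 1e6                           # 1 TB = 1e6 MB (decimal)

print(f"aggregate rate: {aggregate_mb_per_s} MB/s")
print(f"total volume:   {total_tb:.0f} TB")
```

Under these assumptions the total comes to about 145 TB, in line with the roughly 150 TB quoted for the test.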
