
iRODS usage at CC-IN2P3

iRODS usage at CC-IN2P3. Jean-Yves Nief. Talk overview: What is CC-IN2P3? Who is using iRODS? iRODS administration: hardware setup. iRODS interaction with other services: Mass Storage System, backup system, Fedora Commons, etc. iRODS clients usage.


Presentation Transcript


  1. iRODS usage at CC-IN2P3 Jean-Yves Nief

  2. Talk overview • What is CC-IN2P3? • Who is using iRODS? • iRODS administration: • Hardware setup. • iRODS interaction with other services: • Mass Storage System, backup system, Fedora Commons, etc. • iRODS clients usage. • Architecture examples with collaborating sites. • Rules examples. • SRB to iRODS migration. • To-do list and prospects. iRODS at CC-IN2P3

  3. CC-IN2P3 activities • Federate computing needs of the French scientific community in: • Nuclear and particle physics. • Astrophysics and astroparticles. • Computing services to international collaborations: • CERN (LHC), Fermilab, SLAC, … • Now open to biology, Arts & Humanities. iRODS at CC-IN2P3

  4. iRODS setup @ CC-IN2P3 • In production since early 2008. • 14 servers: • 2 iCAT servers (metacatalog): Linux SL4, Linux SL5. • 12 data servers (520 TB): Sun Thor x454 with Solaris 10, DELL v510 with Linux SL5. • Metacatalog on a dedicated Oracle 11g cluster. • Monitoring and restart of the services fully automated (crontab + Nagios; see the sketch below). • Automatic weekly reindexing of the iCAT databases. • Accounting: daily report on our web site. iRODS at CC-IN2P3
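
The restart automation itself is not shown in the talk; below is a minimal cron-style sketch of the idea, assuming a probe with ils and an irodsctl control script at a made-up path (the real CC-IN2P3 scripts are not described here).

#!/usr/bin/env python
# Hypothetical cron-driven health check, a rough stand-in for the
# CC-IN2P3 automation: probe the local iRODS server with 'ils' and
# restart it if the probe fails (Nagios raises the alert separately).
# The irodsctl path and the probe command are assumptions.
import subprocess

IRODSCTL = "/opt/irods/irodsctl"   # assumed install path of the control script

def irods_alive(timeout=30):
    """Return True if listing the root collection succeeds within 'timeout' seconds."""
    try:
        subprocess.run(["ils", "/"], check=True, timeout=timeout,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False

if __name__ == "__main__":
    if not irods_alive():
        subprocess.run([IRODSCTL, "restart"], check=False)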

  5. iRODS setup @ CC-IN2P3 • [Diagram: several data servers and iCAT servers sitting behind a single DNS alias, ccirods.] • DNS alias (see the client-side sketch below): • load balanced. • redundancy improved. • avoids single point of failure. • scalability improved. iRODS at CC-IN2P3
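
As a client-side illustration of the alias, the sketch below writes an iRODS client environment that talks to the alias rather than to one specific iCAT server. It uses the current (4.x) irods_environment.json layout, and the fully qualified alias, zone and user names are assumptions; the setup shown in the talk predates this file format.

#!/usr/bin/env python
# Point an iRODS client at the load-balanced DNS alias instead of one
# specific iCAT server, so any server behind the alias can answer.
# Uses the iRODS 4.x irods_environment.json layout; the FQDN, zone and
# user names are assumptions made for this sketch.
import json
import os

env = {
    "irods_host": "ccirods.in2p3.fr",   # the DNS alias (assumed FQDN)
    "irods_port": 1247,
    "irods_zone_name": "ccin2p3",       # hypothetical zone name
    "irods_user_name": "jdoe",          # hypothetical user
}

path = os.path.expanduser("~/.irods/irods_environment.json")
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
    json.dump(env, f, indent=4)
print("client now goes through the alias; run 'iinit' to authenticate")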

  6. iRODS monitoring: Nagios iRODS at CC-IN2P3

  7. iRODS interaction with other services • Mass storage system: HPSS. • Using compound resources (see the sketch below). • Interfaced using the universal MSS driver (RFIO protocol used). • Staging requests ordered by tape using Treqs. • Backup system: TSM. • Used for projects that do not have the possibility to replicate precious data to other sites. • Fedora Commons: • Storage backend based on iRODS using FUSE. • Rules to register iRODS files into Fedora. • External databases: • Rules using RDA. iRODS at CC-IN2P3
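
To make the compound-resource path a bit more concrete, here is a small Python-driven icommands sketch: write a file onto the disk cache, then push a copy onto the HPSS-backed resource. The resource names and iRODS paths are hypothetical, and in production this movement is done by server-side rules and Treqs rather than by a client script.

#!/usr/bin/env python
# Sketch of the disk-cache -> HPSS path using plain icommands driven
# from Python. Resource names ('disk-cache', 'hpss-tape') and the iRODS
# path are hypothetical; in production this is handled by rules + Treqs.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

local_file = "run001.dat"
irods_path = "/ccin2p3/home/experiment/rawdata/run001.dat"

# 1. Write the file onto the disk cache resource.
run(["iput", "-R", "disk-cache", local_file, irods_path])

# 2. Replicate the cached copy onto the HPSS-backed (tape) resource.
run(["irepl", "-R", "hpss-tape", irods_path])

# 3. Check where the replicas ended up.
run(["ils", "-l", irods_path])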

  8. iRODS clients • Clients: from laptops to batch farms (see the sketch below). • Authentication: password or X509 certificates. • iCommands: most popular. • From any platform: Windows, Mac OS X, Linux (RH, CentOS, Debian…), Solaris 10. • Java APIs: interaction with iRODS within workflows. • C APIs: direct access to files (open, read, write) for "random access". Drivers for viewers such as OsiriX (biomedical apps). • FUSE for legacy web sites and Fedora Commons. • Windows explorer and iDrop. iRODS at CC-IN2P3
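
As an illustration of icommands used from a batch farm, the sketch below wraps a job's stage-in / stage-out around its processing step; paths and filenames are made up, and authentication (iinit with a password or an X509 proxy) is assumed to have been done beforehand on the worker node.

#!/usr/bin/env python
# Hypothetical batch-job wrapper around the icommands: stage the input
# in with iget, process it, push the result back with iput. Paths and
# filenames are made up; 'iinit' (password or X509) is assumed done.
import subprocess

def icmd(*args):
    subprocess.run(list(args), check=True)

input_obj = "/ccin2p3/home/experiment/rawdata/event_1234.dat"
output_obj = "/ccin2p3/home/experiment/results/event_1234.out"

icmd("iget", "-f", input_obj, "input.dat")      # stage in (-f: overwrite)

with open("input.dat", "rb") as src, open("output.dat", "wb") as dst:
    dst.write(src.read())                       # placeholder for the real processing

icmd("iput", "-K", "output.dat", output_obj)    # stage out, verify checksum (-K)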

  9. Who is using iRODS? • High energy and nuclear physics: • BaBar: data management of the entire data set between SLAC and CC-IN2P3: 2 PB foreseen in total. • dChooz: neutrino experiment (France, USA, Japan, etc.): 500 TB. • Astroparticle and astrophysics: • AMS: cosmic ray experiment on the International Space Station (500 TB). • TREND, BAOradio: radio astronomy (170 TB). • Biology and biomedical applications: phylogenetics, neuroscience, cardiology (50 TB). • Arts and Humanities: Adonis (62 TB). iRODS at CC-IN2P3

  10. iRODS use cases • Data sharing and transfers for widespread communities. • Online data access from any kind of front-end app (web, home-grown clients, batch farm…), allowing data policies to be run on the data underneath. • Data archive. • Not intended for massive I/O ops from a batch farm! (not a parallel file system) iRODS at CC-IN2P3

  11. Who is using iRODS? iRODS at CC-IN2P3

  12. Architecture example: BaBar • Archival in Lyon of the entire BaBar data set (2 PB in total). • Automatic transfer from tape to tape: 3 TB/day (no limitation). • Automatic recovery of faulty transfers. • Ability for a SLAC admin to recover files directly from the CC-IN2P3 zone if data is lost at SLAC (see the sketch below). iRODS at CC-IN2P3
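
The "recover files directly from the CC-IN2P3 zone" point can be pictured as below: an admin pulls lost files back out of the remote zone with iget. The zone name, collection layout and file list are hypothetical.

#!/usr/bin/env python
# Sketch of the 'recover lost files from the remote zone' idea: pull
# copies back out of the CC-IN2P3 zone with iget. The zone name,
# collection layout and file list are hypothetical.
import os
import subprocess

LOST_FILES = ["AllEvents/run42/file_000.root", "AllEvents/run42/file_001.root"]
REMOTE_ZONE_PREFIX = "/ccin2p3Zone/babar/archive"   # assumed remote-zone path
LOCAL_RESTORE_DIR = "/data/babar/restore"

for rel in LOST_FILES:
    src = f"{REMOTE_ZONE_PREFIX}/{rel}"
    dst = os.path.join(LOCAL_RESTORE_DIR, rel)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # -K: verify the checksum of the transferred copy before trusting it.
    subprocess.run(["iget", "-K", src, dst], check=True)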

  13. Architecture example: dChooz iRODS at CC-IN2P3

  14. Architecture example: embryogenesis and neuroscience iRODS at CC-IN2P3

  15. Rules examples (I) • Delayed replication to the MSS: • Data on the disk cache is replicated into the MSS asynchronously (1 h later) using a delayExec rule. • Recovery mechanism: retries until success; the delay between retries is doubled at each round (see the sketch below). • ACL management: • Rules needed for fine-grained access rights management. • E.g.: • 3 groups of users (admins, experts, users). • ACLs on /<zone-name>/*/rawdata => admins: r/w, experts + users: r • ACLs on all other subcollections => admins + experts: r/w, users: r iRODS at CC-IN2P3
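
The real implementations are server-side rules (a delayExec rule for the replication, and rules driving the ACLs); the Python sketch below only mirrors the two behaviours described above, under hypothetical resource, group and path names: replication retried with a doubling delay, and the rawdata ACL scheme applied with ichmod.

#!/usr/bin/env python
# Plain-Python mirror of the two rule behaviours described above; the
# real implementations are server-side iRODS rules. Resource, group and
# path names are hypothetical.
import subprocess
import time

def replicate_with_backoff(obj_path, mss_resource="hpss-tape",
                           first_delay=3600, max_rounds=8):
    """Retry replication to the MSS resource, doubling the delay each round."""
    delay = first_delay
    for attempt in range(1, max_rounds + 1):
        if subprocess.run(["irepl", "-R", mss_resource, obj_path]).returncode == 0:
            return True                  # replica landed in the MSS
        print(f"attempt {attempt} failed, retrying in {delay} s")
        time.sleep(delay)
        delay *= 2                       # delay doubled at each round
    return False

def apply_rawdata_acls(collection):
    """admins: r/w, experts + users: read-only on a rawdata collection."""
    subprocess.run(["ichmod", "-r", "write", "admins", collection], check=True)
    for group in ("experts", "users"):
        subprocess.run(["ichmod", "-r", "read", group, collection], check=True)

if __name__ == "__main__":
    replicate_with_backoff("/ccin2p3/home/experiment/rawdata/run001.dat")
    apply_rawdata_acls("/ccin2p3/experiment/rawdata")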

  16. Rules examples (II) • Fedora Commons: • The contents of tarballs stored in iRODS are automatically registered into Fedora Commons. • Automatic untar of the files + checksum on the iRODS side: msiTarFileExtract. • Automatic registration in Fedora Commons (delayed rule): msiExecCmd of a Java application. • Automatic metadata extraction from DICOM files (neuroscience…): • A predefined list of metadata is extracted from the files using DCMTK (thanks to Yonny), then user metadata are created for each file (see the sketch below). iRODS at CC-IN2P3
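
A rough sketch of the DICOM metadata step: read a predefined list of tags and attach them to the corresponding iRODS object with imeta. It uses pydicom as a stand-in for DCMTK (which the talk actually uses), and the tag list, attribute names and paths are assumptions.

#!/usr/bin/env python
# Rough sketch of the DICOM metadata-extraction step: read a predefined
# list of tags and attach them to the matching iRODS data object with
# 'imeta add'. pydicom stands in for DCMTK purely for brevity; the tag
# list, attribute names and paths are assumptions.
import subprocess
import pydicom

TAGS = ["PatientID", "StudyDate", "Modality", "SeriesDescription"]

def register_dicom_metadata(local_file, irods_obj):
    ds = pydicom.dcmread(local_file)
    for tag in TAGS:
        value = getattr(ds, tag, None)
        if value is None:
            continue
        # -d: the target is a data object (a file) in iRODS.
        subprocess.run(["imeta", "add", "-d", irods_obj, tag, str(value)],
                       check=True)

if __name__ == "__main__":
    register_dicom_metadata("image_0001.dcm",
                            "/ccin2p3/home/neuro/rawdata/image_0001.dcm")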

  17. SRB to iRODS migration • SRB still used: 1.7 PB still there. • Migration to iRODS already done for BioEmergence (embryogenesis) in 2010: • The data workflow was using Jargon: transparent. • Migration from Scommands to icommands was needed. • 2 hours of downtime to complete the migration (scripts were needed). • Need to migrate all the other projects by the end of 2012 / beginning of 2013: • SRB is deeply embedded in data management workflows and projects can't live without it. • => Main issue: the migration should be as "transparent" as possible in order to keep up with the data activity. iRODS at CC-IN2P3

  18. To-do list • Complete the SRB to iRODS migration. • Connection control: • Connections can come from anywhere, especially batch farms on the data grid. • Servers can be overwhelmed (network and disk activity for hundreds of connections in parallel). • Causes clients to exit with an error: not good. • An improved version of CCMS (connection control) is needed. • Conversion to rules of the scripts used to manage cache space on compound resources. • Dealing with filenames with accented characters for iCommands on Windows. • Provide a lightweight transfer tool for individual users (ship files between CC-IN2P3 and a distant site). • Centralized administration through a GUI (15 instances of iRODS running so far). iRODS at CC-IN2P3

  19. Prospects • 4 PB in iRODS as of Sep 2012 (should be 5 PB by the end of this year). • Future projects: • Biomedical field: research in cardiology, MS (anonymization) with data from > 10 hospitals. • Private companies (data encryption needed?). • Astrophysics. • Grid: iRODS officially promoted by the French NGI. iRODS at CC-IN2P3

  20. Acknowledgement • Thanks to: • Pascal Calvat. • Yonny Cardenas. • Rachid Lemrani. • Thomas Kachelhoffer. • Pierre-Yves Jallud. iRODS at CC-IN2P3
