Generic policy rules and principles
Jean-Yves Nief

Talk overview
• An introduction to CC-IN2P3 activity.
• iRODS in production:
  • Why are we using it?
  • Who is using it?
  • Prospects.
• iRODS rule-based policies through examples:
  • Resource Monitoring System.
  • Biomedical applications:
    • Human data.
    • Animal data.
  • Arts and Humanities.
• Other rules: mass storage system interface, access rights.
• Pitfalls.
• Future usages.
Repository workshop - Garching

CC-IN2P3 activities
• Federate the computing needs of the French scientific community in:
  • Nuclear and particle physics.
  • Astrophysics and astroparticles.
• Computing services to international collaborations:
  • CERN (LHC), Fermilab, SLAC, …
• Now opened to biology, Arts & Humanities.

iRODS @ CC-IN2P3: why use it?
• National and international collaborations.
• Users spread geographically (Europe, America, Australia…).
• Need for storage virtualization:
  • federation of heterogeneous storage (disks, tapes) and data access systems (MSS, databases…).
  • transparent data access for end users.
  • middleware working on heterogeneous OSes.
  • common logical name space.
  • virtual organization (access rights, groups, etc.).
  • metadata search.
• Easy interface with any kind of client application (APIs, drivers).

iRODS @ CC-IN2P3: why use it?
• SRB in use since 2003:
  • 3 PB handled for 10 different experiments (HEP, astro, biology).
  • Decommissioning: end of 2012?
• Limitations:
  • no centralized data management (DM).
  • no enforcement of DM policy.
• iRODS rule-based policy:
  • adequate solution.
  • from the user's point of view: virtualization of the data management policy.

iRODS @ CC-IN2P3: who is using it?
• Arts and Humanities (Adonis):
  • Long-term data preservation.
  • Web and batch job access.
• Biology (phylogenetics), fluid mechanics:
  • grid jobs.
• Biomedical applications:
  • Human and animal imagery.
• High energy physics:
  • Neutrino experiment.

iRODS @ CC-IN2P3: who is going to use it?
• Astrophysics experiments:
  • LSST …
• Other biomedical and physics projects.
• iRODS will be part of the French NGI.
• All the SRB instances are to be moved to iRODS.
• 1 PB should be reached soon.

Rule examples: Arts and Humanities
• Ex: archival and data publication of audio files (CRDO).
  • Data transfer: CRDO → CINES (Montpellier).
  • Archived at CINES.
  • iRODS transfer to CC-IN2P3: iput file.tar
  • Automatic untar at Lyon + checksum.
  • Automatic registration in Fedora-Commons (delayed rule).
[Diagram labels: CRDO, CC-IN2P3, Fedora, CINES, Archive]
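
The "automatic untar + checksum" step could look roughly like the Python sketch below. It is only an illustration of the idea, not the micro-service used at CC-IN2P3: the function name `untar_with_checksums`, the use of SHA-256, and the returned name-to-digest mapping are all assumptions made for this example (the slide does not say which checksum algorithm is used).

```python
import hashlib
import tarfile
from pathlib import Path


def untar_with_checksums(archive: Path, destination: Path,
                         chunk_size: int = 1 << 20) -> dict[str, str]:
    """Unpack the regular files of a tar archive and checksum each one.

    Hypothetical helper: returns {member name: SHA-256 hex digest}.
    """
    root = destination.resolve()
    root.mkdir(parents=True, exist_ok=True)
    checksums: dict[str, str] = {}

    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, devices

            # Refuse members that would be written outside the destination.
            target = (root / member.name).resolve()
            if root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")

            target.parent.mkdir(parents=True, exist_ok=True)
            digest = hashlib.sha256()
            source = tar.extractfile(member)
            # Write the file and update the digest in a single pass.
            with source, target.open("wb") as out:
                for chunk in iter(lambda: source.read(chunk_size), b""):
                    digest.update(chunk)
                    out.write(chunk)
            checksums[member.name] = digest.hexdigest()

    return checksums
```

Computing the digest while writing avoids reading each file twice; the resulting mapping is what a later (delayed) registration step could attach to each object.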

Rule examples: biomedical data
• Human and animal data (fMRI, PET, MEG, etc.).
• Usually in DICOM format.
• Main issue for human data:
  • Need to be anonymized!
• Need to do metadata search on DICOM files.
• Rule:
  • Check that the file is anonymized: send a warning if it is not.
  • Extract a subset of metadata (based on a list stored in iRODS) from DICOM files.
  • Add these metadata as user-defined metadata in iRODS.
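
The logic of this rule can be sketched in Python as follows. The sketch assumes the DICOM header has already been parsed into a plain dictionary keyed by DICOM keyword; the list of identifying keywords, the placeholder values treated as "anonymous", and the function names are hypothetical choices for illustration, not the criteria actually used in production.

```python
from typing import Optional

# Hypothetical choice of attributes that must not identify a patient.
IDENTIFYING_KEYWORDS = ("PatientName", "PatientBirthDate", "PatientAddress")
# Hypothetical values accepted as "already anonymized".
ANONYMOUS_VALUES = {"", "ANONYMOUS", "ANONYMIZED"}


def identifying_fields(header: dict[str, str]) -> list[str]:
    """Return the identifying keywords that still carry a real value."""
    return [
        keyword
        for keyword in IDENTIFYING_KEYWORDS
        if str(header.get(keyword, "")).strip().upper() not in ANONYMOUS_VALUES
    ]


def select_metadata(header: dict[str, str],
                    wanted: list[str]) -> list[tuple[str, str]]:
    """Keep only the wanted keywords, as (attribute, value) pairs."""
    return [(key, str(header[key])) for key in wanted if key in header]


def process_dicom_header(
    header: dict[str, str], wanted: list[str]
) -> tuple[Optional[str], list[tuple[str, str]]]:
    """Return (warning or None, metadata pairs to attach to the file)."""
    leaked = identifying_fields(header)
    warning = "not anonymized: " + ", ".join(leaked) if leaked else None
    return warning, select_metadata(header, wanted)
```

For example, a header containing `"PatientName": "DOE^JOHN"` yields the warning `"not anonymized: PatientName"`, while the `wanted` list (standing in for the list stored in iRODS) decides which pairs become user-defined metadata.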

Rule examples: resource monitoring system
1. Ask each server for its metrics: rule engine cron task (msi).
2. Performance script launched on each server.
3. Results sent back to the iCAT.
4. Store metrics into the iCAT.
5. Compute a "quality factor" for each server, stored in another table: rule engine cron task (msi).
[Diagram: iRODS iCAT server with its DB; four iRODS data servers, each running a perf script]
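
The slide does not say which metrics are collected or how the quality factor is computed, so the Python sketch below is purely illustrative: the three metrics, the weights, and the weighted-sum formula are all assumptions chosen to show the shape of step 5 (turning per-server metrics into a single score that can be stored and compared).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerMetrics:
    """Hypothetical metrics reported by a performance script (fractions in 0..1)."""
    host: str
    cpu_used: float
    mem_used: float
    disk_used: float


def quality_factor(metrics: ServerMetrics,
                   weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted share of free capacity: close to 1.0 is idle, close to 0.0 is saturated.

    The formula and the default weights are assumptions for this example.
    """
    used = (metrics.cpu_used, metrics.mem_used, metrics.disk_used)
    return sum(weight * (1.0 - value) for weight, value in zip(weights, used))


def rank_servers(all_metrics: list[ServerMetrics]) -> list[tuple[str, float]]:
    """Return (host, quality factor) pairs, best server first."""
    scored = [(m.host, quality_factor(m)) for m in all_metrics]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A ranking of this kind is one way a stored quality factor could later be used, for instance to prefer the least loaded data server.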

Other rules
• Mass Storage System integration:
  • Using compound resources: iRODS disk cache + tapes.
  • Data on the disk cache are replicated into the MSS asynchronously (1h later) using a delayExec rule.
  • Recovery mechanism: retries until success; the delay between retries is doubled at each round.
• ACL management:
  • Rules needed for fine-grained access rights management.
  • E.g.:
    • 3 groups of users (admins, experts, users).
    • ACLs on /<zone-name>/*/rawdata => admins: r/w, experts + users: r
    • ACLs on all other subcollections => admins + experts: r/w, users: r
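
The recovery mechanism described for the MSS replication (retry until success, doubling the delay at each round) can be sketched in Python as below. This is not the delayExec rule itself: the function name, the 60-second initial delay, the optional `max_rounds` cap, and the injectable `sleep` parameter are assumptions added for the example.

```python
import time
from typing import Callable, Optional


def replicate_with_retries(
    action: Callable[[], bool],
    initial_delay: float = 60.0,          # assumed starting delay, in seconds
    max_rounds: Optional[int] = None,     # None means "retry until success"
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `action` until it returns True; return the number of attempts.

    After each failure the function waits, then doubles the waiting time
    for the next round.
    """
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        if action():
            return attempts
        if max_rounds is not None and attempts >= max_rounds:
            raise RuntimeError(f"gave up after {attempts} attempts")
        sleep(delay)
        delay *= 2  # delay is doubled at each round
```

Passing a custom `sleep` makes the schedule easy to inspect: with an action that fails twice and then succeeds, the recorded waits are 60 s and 120 s.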

Developments needed
• Scripts/binaries:
  • Metadata extraction from DICOM files.
  • Registration of files into Fedora-Commons.
  • …
  • Needed whatever storage system is used underneath.
• Micro-services:
  • ACLs, tar/untar of archive files, …
  • APIs already available; did not require a large amount of work (part of the iRODS distribution).
  • Resource Monitoring System: bigger development, includes modification of the iCAT schema.
• Rules:
  • Most of them are simple.
  • Some require more work (Adonis project), with a more complex workflow.

Pitfalls and bugs
• Writing complex rules:
  • Avoid writing them directly using the .irb syntax.
  • Becomes difficult to debug, especially with nested actions.
  • Solution: use ruleGen to generate rules in a more user-friendly manner.
• Some memory leaks found with irodsReServer with Oracle as a backend: fixed in 2.4.
• delayExec syntax bugs:
  • Fixed in 2.4 and 2.4.1.
• Rules are in a configuration file at the moment:
  • Must be consistent on all the iRODS servers.
  • Will be in the iCAT database in the future.

Prospects
• Rules for database interaction (in progress):
  • Will be used by DTM (developed at CC-IN2P3):
    • DTM manages a list of tasks to be processed by a batch cluster.
    • DTM requires a database to manage the tasks.
  • A rule launched by the client will interact with the DTM database through iRODS:
    • More security: iRODS used as a proxy server (database behind a firewall, use of iRODS authentication).
    • Database schema upgrades are transparent for the client (no SQL code launched on the client side).
• Xmessaging system (part of iRODS):
  • Allows messages to be exchanged between different iRODS processes or clients.
  • E.g.: could be used to monitor job status in a distributed computing environment.

Acknowledgement
• Thanks to:
  • Pascal Calvat.
  • Yonny Cardenas.
  • Thomas Kachelhoffer.
  • Pierre-Yves Jallud.
iRODS at CC-IN2P3