DPHEP Update Data Sharing – In Time and Space

DPHEP Update Data Sharing – In Time and Space. International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics. Jamie.Shiers@cern.ch WLCG Grid Deployment Board, June 2014. Mid-Term Status Update. DPHEP “2020 vision” & implementation

Presentation Transcript


  1. DPHEP Update Data Sharing – In Time and Space International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics Jamie.Shiers@cern.ch WLCG Grid Deployment Board, June 2014

  2. Mid-Term Status Update • DPHEP “2020 vision” & implementation • DPHEP Collaboration Agreement • H2020 (and beyond…) • (DPHEP @ CHEP 2015?) • Summary

  3. 2020 Vision for LT DP in HEP • Long-term – e.g. FCC timescales: disruptive change • By 2020, all archived data – e.g. that described in DPHEP Blueprint, including LHC data – easily findable, fully usable by designated communities with clear (Open) access policies and possibilities to annotate further • Best practices, tools and services well run-in, fully documented and sustainable; built in common with other disciplines, based on standards • DPHEP portal, through which data / tools accessed • Agree with Funding Agencies clear targets & metrics

  4. DPHEP Portal • Since January, a TECH has been working on a prototype implementation of the DPHEP portal – http://dphep-demo.cern.ch • Currently not visible (easily) outside CERN • Based on INVENIO technology, plus information from DPHEP.org website • Clearly a prototype – comments, suggestions needed before going further

  5. Populating the portal • A lot of information that can be used to populate the portal already exists • Experiment websites, CERN Grey Book etc. • An attempt to “load” the basic information – in as common a way as possible – will be made this summer using a CERN CHILD • Information largely static but good to have a common look & feel and agreed content • Support from experiments (and sites) needed

  6. “Information packages” • OAIS defines a number of “information packages” that are used to assess data in the archive and how to use it • A FELL will help define these and help populate – starting January 2015 (TBC) • In any case, FELL can help with other tasks, e.g. preserving analysis code / workflows – priority? • All above part of “best practices, tools & services…” • Coordination with others through joint workshops, RDA IG/WGs (projects?)
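
OAIS distinguishes three package roles: the Submission Information Package (SIP, producer to archive), the Archival Information Package (AIP, held in the archive) and the Dissemination Information Package (DIP, archive to consumer). As a rough illustration of what "defining and populating" a package for an HEP dataset might involve, the sketch below models a package with simplified Preservation Description Information. The class names, field names, the completeness check and all example values are hypothetical assumptions, not a DPHEP or OAIS-mandated schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class PackageType(Enum):
    # The three OAIS information package roles
    SIP = "Submission Information Package"     # producer -> archive
    AIP = "Archival Information Package"       # held inside the archive
    DIP = "Dissemination Information Package"  # archive -> consumer


@dataclass
class PreservationDescription:
    # Simplified OAIS Preservation Description Information categories
    reference: str      # persistent identifier of the content
    provenance: str     # how the data were produced
    context: str        # relation to other data and documentation
    fixity: str         # e.g. a checksum of the content
    access_rights: str  # who may use the data, under which policy


@dataclass
class InformationPackage:
    kind: PackageType
    content_files: list[str]
    pdi: PreservationDescription
    # Software, formats and documentation needed to interpret the data
    representation_info: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        # Hypothetical completeness check: names of PDI fields left empty
        return [name for name, value in vars(self.pdi).items() if not value]


# Illustrative package; identifiers and values are made up
aip = InformationPackage(
    kind=PackageType.AIP,
    content_files=["run1234.root"],
    pdi=PreservationDescription(
        reference="hypothetical-id-0001",
        provenance="reconstruction pass 2",
        context="",
        fixity="adler32:0a1b2c3d",
        access_rights="collaboration only",
    ),
)
print(aip.missing_fields())  # ['context']
```

A check of this kind is one possible way to track which descriptive information still has to be collected before a package is considered complete.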

  7. Bit Preservation • For LHC data (and other “bits” in the CERN repository) this is under control and funded (till?) • A 2nd media migration is currently underway with an outlook to the next • See German Cancio’s slides at HEPiX for more info • We need to consider 2nd (and 3rd?) copies of the data, plus book-keeping to know now – and in 30 years – where the “right” data is • (Production versus long-term archiving) • Processing / re-processing / re-re-processing…
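
The book-keeping point can be made concrete with a small replica catalogue. This minimal sketch assumes that each logical file records a reference checksum at ingest and that each copy (site, media generation) records the checksum from its last verification. A copy counts as "right" only if the two match. Adler-32 is used purely as an example checksum, and the class names, site labels, file path and the two-copy threshold are all hypothetical.

```python
import zlib
from dataclasses import dataclass, field


def adler32_hex(data: bytes) -> str:
    # Adler-32 checksum formatted as an 8-digit hex string
    return f"{zlib.adler32(data) & 0xFFFFFFFF:08x}"


@dataclass
class Replica:
    site: str      # hypothetical storage location label, e.g. "T0-tape"
    media: str     # media generation, tracked across migrations
    checksum: str  # checksum found when this copy was last verified


@dataclass
class CatalogueEntry:
    lfn: str                 # logical file name
    reference_checksum: str  # checksum recorded at ingest
    replicas: list[Replica] = field(default_factory=list)

    def good_replicas(self) -> list[Replica]:
        # A copy counts only if its verified checksum matches the reference
        return [r for r in self.replicas if r.checksum == self.reference_checksum]

    def needs_new_copy(self, min_copies: int = 2) -> bool:
        # True when fewer than min_copies verified copies exist
        return len(self.good_replicas()) < min_copies


payload = b"example event data"
ref = adler32_hex(payload)
entry = CatalogueEntry(
    lfn="/archive/hypothetical/run1234.raw",
    reference_checksum=ref,
    replicas=[
        Replica(site="T0-tape", media="gen2", checksum=ref),
        Replica(site="T1-tape", media="gen1", checksum="00000000"),  # failed verification
    ],
)
print(entry.needs_new_copy())  # True: only one verified copy
```

Recording the media generation per replica is one way to keep the catalogue meaningful across successive media migrations.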

  8. C-RSG & RRB • Data preservation: distinguish ability to read/analyse old data from requirements for open/public access (both tech + effort) • Former should be included in cost of computing • Latter is additional cost? • How to interpret this? IMHO, this is: • Good news – essential for long-term preservation; • Unsurprising; • How much “Open Access” data do we really need to provide? • We should provide what we can afford – even if this is only a few TB (surely affordable) with a rolling time window(?) • Continue dialog with funding agencies (who are in some cases, e.g. EU, US, asking for Open Access)
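
One way to read "what we can afford … with a rolling time window" is as a fixed storage budget filled with the most recent releasable datasets, newest first, stopping at the first dataset that no longer fits, so the exposed window stays contiguous in time. This is a minimal sketch of that interpretation only. The dataset names, dates, sizes and the 4 TB budget are illustrative assumptions, not DPHEP figures.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Dataset:
    name: str
    released: date  # when the dataset became eligible for open release
    size_tb: float


def open_access_window(datasets: list[Dataset], budget_tb: float) -> list[Dataset]:
    """Return the newest datasets that fit in budget_tb, newest first.

    Selection stops at the first dataset that would exceed the budget,
    so the result is a contiguous window of the most recent releases.
    """
    selected: list[Dataset] = []
    used = 0.0
    for ds in sorted(datasets, key=lambda d: d.released, reverse=True):
        if used + ds.size_tb > budget_tb:
            break  # older data rolls out of the window
        selected.append(ds)
        used += ds.size_tb
    return selected


# Illustrative numbers only
catalogue = [
    Dataset("sample-A", date(2011, 1, 1), 1.0),
    Dataset("sample-B", date(2012, 1, 1), 1.5),
    Dataset("sample-C", date(2013, 1, 1), 2.0),
]
print([d.name for d in open_access_window(catalogue, 4.0)])
# ['sample-C', 'sample-B']

# Adding a newer release pushes the oldest selected sample out of the window
catalogue.append(Dataset("sample-D", date(2014, 1, 1), 1.0))
print([d.name for d in open_access_window(catalogue, 4.0)])
# ['sample-D', 'sample-C']
```

Stopping at the first dataset that does not fit, rather than skipping it and back-filling with older data, is the design choice that keeps the exposed data a true time window.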

  9. Collaboration Agreement • After much discussion, a version acceptable to up to 9 parties has been produced • Now: CERN, DESY, HIP-FI, IHEP, IN2P3, IPNS (KEK), MPP • Later: BNL, CSC-FI, FNAL, INFN, SLAC, STFC • Hope for signatures from all 9 parties this month (3 are required for CA to come into effect) • CERN (Sergio) has signed 10 (+1) times • Again, valuable for long-term • Collaboration Board formed from partners, who elect chair and project manager (from partners) • 3 year terms • First CB will hopefully be held later this year…

  10. H2020 and beyond • 3 calls of particular relevance to LTDP are open • EINFRA-1 – Managing, Preserving and Computing with Big Research Data • INFRADEV-4 – … services and solutions for clusters of ESFRI and other RIs • EINFRA-9 – e-Infrastructures Virtual Research Environments (VREs, used by VRCs such as WLCG) • These are COMPLEMENTARY calls and COMPLEMENTARY proposals are expected • We should use our long experience in generic vs specific (vs shared) solutions to target these calls • Closing date: September 2 for EINFRA-1 and INFRADEV-4, January 14 for EINFRA-9 • MORE CALLS WILL APPEAR FOR 2016/2017 (2018/2019, …) • AND WE ARE BEING ASKED FOR OUR INPUT TO THESE CALLS NOW

  11. EINFRA-1 sub-topic 7 • Proof of concept and prototypes of data infrastructure-enabling software (e.g. for databases and data mining) for extremely large or highly heterogeneous data sets scaling to zettabytes and trillions of objects. Clean slate approaches to data management targeting 2020+ 'data factory' requirements of research communities and large scale facilities (e.g. ESFRI projects) are encouraged. • A project addressing this topic is being prepared, including “bit preservation”, as well as hooks for “VRE-level” services

  12. Targets & Metrics • Something like the Data Seal of Approval gives us a clear framework for defining what we are doing at the level of the Archive (+ entrance / exit “doors”) • “Self-assessment” proposed at WLCG OB for Tier0 + small number of Tier1s • Typically considered “Entry Level” → NESTOR (DIN) → ISO 16363 • Can form part of DP plan – see also DMPonline • But it is only (E)INFRA level – need also metrics for “VRE”-oriented services: • Open Access data for Educational Outreach; • Analysis Reproducibility; • (Full scientific potential of data)

  13. DPHEP@CHEP 2015 • Only available slot is weekend prior to CHEP, sharing with WLCG, e.g. 11-12 April • IMHO this is not a good option, neither for WLCG, nor for DPHEP • Too short for DPHEP, too constraining for WLCG • It is also close in time – but not in space – to the RDA plenary (presumed end March in SFO)

  14. Summary • Good progress on infrastructure issues required for LTDP • Technical solutions + funding discussions / strategies • Opportunities in H2020 to address INFRA & also the (much) more complex “VRE” issues • Increased clarity on “how”, … Clarification (consensus) still needed on “what”