#DPHEP: Status and Outlook • Sustainable Strategies for Long-Term DP at the Exa-scale • International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics • Jamie.Shiers@cern.ch • LHCC Referees Meeting
Overview • Sustainable Strategy • Collaboration Agreement • Research Data Alliance • H2020 (NSF?) Prospects
2020 Vision for LT DP in HEP • Long-term – e.g. LC timescales: disruptive change • By 2020, all archived data – e.g. that described in Blueprint, including LHC data – easily findable, fully usable by designated communities with clear (Open) access policies and possibilities to annotate further • Best practices, tools and services well run-in, fully documented and sustainable; built in common with other disciplines, based on standards • Vision achievable, but we are far from this today
Data Preservation Maturity Model • Scale (complexity) is probably “exponential”
Software Preservation Maturity Model REPRODUCIBLE RESULTS AFTER “PORTING” TO NEW ENVIRONMENT!
Sustainable Strategy • A document on a sustainable strategy for LTDP is available – discussed at DPHEP IB today • This version focuses on CERN (IT) – presented yesterday (attached to agenda: doc, ppt) • Some comments received (DESY, INFN) • DESY: comments included in current draft • INFN: stress need for standards, e.g. for outreach activities based on data from multiple experiments • Intent is to update document to reflect activities of other “Collaboration Members”
ICFA Statement on LTDP • The International Committee for Future Accelerators (ICFA) supports the efforts of the Data Preservation in High Energy Physics (DPHEP) study group on long-term data preservation and welcomes its transition to an active international collaboration with a full-time project manager. It encourages laboratories, institutes and experiments to review the draft DPHEP Collaboration Agreement with a view to joining by mid- to late-2013. • ICFA notes the lack of effort available to pursue these activities in the short-term and the possible consequences on data preservation in the medium to long-term. We further note the opportunities in this area for international collaboration with other disciplines and encourage the DPHEP Collaboration to vigorously pursue its activities. In particular, the effort required to prepare project proposals must be prioritized, in addition to supporting on-going data preservation activities. • ICFA notes the important benefits of long-term data preservation to exploit the full scientific potential of the, often unique, datasets. This potential includes not only future scientific publications but also educational outreach purposes, and the Open Access policies emerging from the funding agencies. • 15 March 2013
DPHEP Collaboration Agreement • A draft has been prepared by the CERN legal service, sent to ICFA, and made available to DPHEP since early 2013 • Some comments have been received and integrated • AFAIK CERN, DESY, FNAL and SLAC are “ready” to sign • Target: prior to CHEP 2013 (RDA-2 might be better!) • Next steps: get legal services in touch with each other and complete process • CERN & DESY: defining activities as part of Collaboration
RDA Preservation WG • The RDA – strongly supported by EU, NSF, AU – seen as an element of implementing HLEG 2030 vision • A WG on DP was approved in May • Chair: David Giaretta (APA, SCIDIP-ES, author of “Advanced DP”, ex-DCC, ex-STFC) • Co-chair: JDS • The intent is to show progress by each RDA plenary (March, September) and co-ordinate international activities, identify candidate services for standardization, lobby for funding…
Component Breakdown • Can break this down into three distinct areas • (OAIS reference model is somewhat more complex: this is a zeroth iteration) • “Archive issues” • Digital Libraries & “Adding Value” to data • “Knowledge retention” – the Crux of the Matter
Archive Issues • We (HEP) have significant experience of 100PB+ distributed data stores • Plan is to coordinate long-term “bit preservation” issues via HEPiX • And with other disciplines e.g. via IEEE MSST • Sustainable models for long-term multi-disciplinary data archives still to be solved • H2020 funding targeted for this
Digital Libraries • Significant investment in this space, including multiple EU (and other) funded projects • No reason to believe that the issues will not be solved, nor that funding models will not exist, e.g. adapted from “traditional” libraries • Related topics: “linked data”, “adding value to data” – again with projects / communities • Should work closely with these projects / communities, not start new initiatives
Where to Invest – Summary • Tools and Services, e.g. Invenio: could be solved (2-3 years?) • Archival Storage Functionality: should be solved (i.e. “now”) • Support to the Experiments for DPHEP Levels 3-4: must be solved – but how?
Who Can Help? • Mobilize resources through existing structures: • Research Data Alliance: • Funding / strong interest from EU, US, AU, others • Part of roadmap to “Riding the Wave” 2030 Vision • STFC and DCC personnel strongly involved in setup • WLCG: • Efforts on “software re-design” for new architectures • Experiment efforts on Software Validation (to be coordinated via DPHEP), building on DESY & others • DPHEP: • Coordination within HEP and with other projects / disciplines • National & International Projects • H2020 / NSF funding lines • National projects also play an important role
[Figure: Collaborative Data Infrastructure, from the “Riding the Wave” HLEG report. Layers: Data Generators and Users (user functionalities, data capture & transfer, virtual research environments); Community Support Services (data discovery & navigation, workflow generation, annotation, interoperability); Common Data Services (persistent storage, identification, authenticity, workflow execution, mining). Side labels: Data Curation, Trust.]
H2020 Prospects • According to Kostas Glinos (e-IRG meeting, Dublin) first calls: December 11 2013 • “Framework for action” (part of open consultation) has a “fiche” targeting DP • DPHEP ICFA report (2020 vision) sent to Carlos MP • Reply: “References to RDA are appreciated and I really hope that you take a leading role in bringing people and key players together around a global initiative to tackle the issue of ‘highly reliable and highly trusted infrastructures for research data preservation’.” • IMHO: need to prepare now (collaboration, WP, tasks) – likely to discuss this at RDA Plenary, CHEP 2013, PV …
A Strategy for H2020? • Front-end: collaborate with on-going efforts in Digital Libraries, Linked Data, PV etc. • Significant effort (also HEP expertise): very high probability of further funding in H2020 (+RDA) • DP(HEP) is already part of these projects: feed in requirements & collaborate (PRELIDA WS??) • Back-end: collaborate through HEPiX & IEEE MSST • Seek specific H2020 funding for CDIs, including TCO, long-term, sustainable inter-disciplinary archives • Middle: • Collaborative effort on Validation Frameworks, Virtualization, Training, Outreach etc. • Includes institute / national funding • Work for “Concurrency Framework” and other efforts so that future migrations are less painful and more repeatable • [ CERNLIB consortium ] • Seek further funds (H2020, RDA) to further develop and generalize • Several (all?) relevant “fiches” in “Call for Action” document • fiche 01: community support data services • fiche 02: infrastructure for Open Access • fiche 03: storing, managing and preserving research data • fiche 04: discovery and provenance of research data • fiche 05: towards global data e-infrastructures • fiche 06: global A&A e-infrastructures • fiche 07: skills and new professions for research data
Other Activities • Various project proposals in preparation / review • On-going activities in the experiments: “DPHEP classic” as well as LHC • Discussions with CMS on validation system – other LHC experiments expected to join • DPHEP session at CHEP 2013 – outlook for CHEP 2015? (tighter integration into programme) • Presentations accepted at numerous conferences / workshops – building more links with other disciplines • DPHEP IB (modeled on WLCG) monthly call
Summary • Making good progress on multiple fronts • “Sustainable strategy” being discussed (and then put in place) • Good inter-disciplinary collaboration • Optimistic regarding H2020 and also NSF(+) – but needs work! • #DPHEP for news!