
MaRDI-Gross research data guidance for big science

Juan Bicarregui, Norman Gray, Roger Jones, Simon Lambert and Brian Matthews (STFC e-science, Glasgow, & Lancaster). JISC MRD wrap-up, London, 2012 March 23.


Presentation Transcript


  1. MaRDI-Gross: research data guidance for big science • Juan Bicarregui, Norman Gray, Roger Jones, Simon Lambert and Brian Matthews (STFC e-science, Glasgow, & Lancaster) • JISC MRD wrap-up, London, 2012 March 23

  2. The goal: to provide high-level guidance for the strategic and engineering development of Data Management and Preservation plans for ‘Big Science’ data. Following: http://purl.org/nxg/projects/mrd-gw

  3. context

  4. big science • big money: ~20 year history, and millions of $/€/£ (LHC budget is €3bn + detectors, hardware and people) • big author lists: collaborations of 100s of people (LIGO is 800 authors, ATLAS 3000) • big data: petabytes per year (1 LHC = 10 PB/yr) • big admin: MOUs, councils, workshop series • big careers: PhD to tenure on a single project

  5. lots of data • ATLAS/CMS at LHC: 10 PB/yr • LIGO: ~1 PB/yr • SKA (by 2020): 1 TB/min or 0.5 EB/yr intercontinentally (this is 0.05% of 1 ZB/yr total worldwide 2015 IP traffic) • Not a problem: kilo ➛ mega ➛ giga ➛ tera ➛ peta ➛ exa ➛ zetta ➛ yotta
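The SKA figures on slide 5 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal (SI) prefixes and a 365-day year; the constant and variable names are illustrative and do not come from the talk.

```python
# Decimal (SI) prefixes, following the kilo -> yotta ladder on the slide.
TB = 10**12  # terabyte
PB = 10**15  # petabyte
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte

# Assumption: a 365-day year.
MINUTES_PER_YEAR = 60 * 24 * 365

# Figures quoted on the slide.
ska_rate_per_minute = 1 * TB      # SKA: 1 TB/min
lhc_per_year = 10 * PB            # ATLAS/CMS at LHC: 10 PB/yr
world_ip_traffic_2015 = 1 * ZB    # 1 ZB/yr total worldwide IP traffic

ska_per_year = ska_rate_per_minute * MINUTES_PER_YEAR
ska_eb_per_year = ska_per_year / EB
ska_share_percent = 100 * ska_per_year / world_ip_traffic_2015
ska_in_lhc_units = ska_per_year / lhc_per_year

print(f"SKA volume: {ska_eb_per_year:.2f} EB/yr")
print(f"Share of 1 ZB/yr IP traffic: {ska_share_percent:.2f}%")
print(f"SKA expressed in LHC-years of data: {ska_in_lhc_units:.1f}")
```

Under these assumptions the yearly SKA volume comes out slightly above the rounded 0.5 EB/yr quoted on the slide, and the share of worldwide IP traffic rounds to the quoted 0.05%.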

  6. software • Very large custom data-analysis software suites • ...which are hard to use • ...and require lots of tacit knowledge (i.e. gained from officemates, and maybe written into wikis) • A major software preservation challenge

  7. data longevity • Astronomy data lasts for 1000 years • Particle physics data becomes unintelligible about 30 times faster than astronomy data

  8. things that make it easy • Big science projects are often well-resourced, with IT experience, engineering management and clear collaboration infrastructure • Historical experience of ‘large’ data volumes means everyone knows ad hoc doesn’t work • Always shared facilities, so documented interfaces and SLAs are natural • Confidentiality concerns are well understood (professional priority rather than family secrets)

  9. MaRDI-Gross

  10. target reader • Those (senior and/or über-techie) with the responsibility (voluntary or not) for developing a DMP plan for a large collaboration • ...or other many-person, multi-institutional or multi-national project • ...or funders evaluating such plans

  11. the advice • “Here is a copy of CCSDS 650; be creative” • but Do The Right Thing

  12. backing that up... • OAIS rationale (what and why) • Policy background: RCUK and STFC data policies; why share data?; issues about openness • Technical background: OAIS as terminology, the DCC model, CASPAR • Planning: turning OAIS into practice; release planning; validation and assessment; modelling costs • Case studies

  13. put another way • A framework for approaching the problem exists, in OAIS • ...which is not just waffle • ...so read X, Y and Z to become the local expert • ...so X’, Y’ and Z’ are the questions to ask, or critical approaches to take, if you’re a funder

  14. the document • http://purl.org/nxg/projects/mardi-gross/report • Comments on v0.1 by 13 April would be most appreciated • v0.2 and possibly a v0.3 in the spring, then a final version after collaboration meetings over the summer
