MaRDI-Gross research data guidance for big science. Juan Bicarregui, Norman Gray, Roger Jones, Simon Lambert and Brian Matthews (STFC e-science, Glasgow, & Lancaster) JISC MRD wrap-up, London, 2012 March 23.
The goal: to provide high-level guidance for the strategic and engineering development of Data Management and Preservation plans for ‘Big Science’ data. Following: http://purl.org/nxg/projects/mrd-gw
big science • big money: ~20-year history, and millions of $/€/£ (LHC budget is €3bn + detectors, hardware and people) • big author lists: collaborations of 100s of people (LIGO is 800 authors, ATLAS 3000) • big data: petabytes per year (1 LHC = 10 PB/yr) • big admin: MOUs, councils, workshop series • big careers: PhD to tenure on a single project
lots of data • ATLAS/CMS at LHC: 10 PB/yr • LIGO: ~1 PB/yr • SKA (by 2020): 1 TB/min or 0.5 EB/yr intercontinentally (this is 0.05% of 1 ZB/yr total worldwide 2015 IP traffic) • Not a problem • kilo ➛ mega ➛ giga ➛ tera ➛ peta ➛ exa ➛ zetta ➛ yotta
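The SKA figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) prefixes and a 365-day year. The variable and function names are illustrative, not taken from the report.

```python
# Decimal (SI) prefixes, matching the kilo -> yotta ladder on the slide.
# Assumption: 1 TB = 10**12 bytes, not 2**40.
TB = 10**12
PB = 10**15
EB = 10**18
ZB = 10**21

# Assumption: a 365-day year, ignoring leap years.
MINUTES_PER_YEAR = 60 * 24 * 365


def per_minute_to_per_year(bytes_per_minute):
    """Convert a sustained data rate from bytes/minute to bytes/year."""
    return bytes_per_minute * MINUTES_PER_YEAR


# SKA rate quoted on the slide: 1 TB/min.
ska_bytes_per_year = per_minute_to_per_year(1 * TB)

# The same volume expressed in EB/yr.
ska_eb_per_year = ska_bytes_per_year / EB

# Fraction of the 1 ZB/yr worldwide IP traffic figure quoted on the slide.
ska_share_of_ip_traffic = ska_bytes_per_year / (1 * ZB)

# The same volume in units of "1 LHC = 10 PB/yr" from the previous slide.
ska_in_lhc_units = ska_bytes_per_year / (10 * PB)

print(f"SKA volume: {ska_eb_per_year:.2f} EB/yr")
print(f"Share of 1 ZB/yr IP traffic: {ska_share_of_ip_traffic:.2%}")
print(f"Equivalent LHC-years of data per year: {ska_in_lhc_units:.0f}")
```

Under these assumptions the result is consistent with the slide's rounded figures of 0.5 EB/yr and 0.05%.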
software • Very large custom data-analysis software suites • ...which are hard to use • ...and require lots of tacit knowledge (i.e. gained from officemates, and maybe written into wikis) • A major software preservation challenge
data longevity • Astronomy data lasts for 1000 years • Particle physics data becomes unintelligible about 30 times faster than astronomy data
things that make it easy • Big science projects are often well-resourced, with IT experience, engineering management and clear collaboration infrastructure • Historical experience of ‘large’ data volumes means everyone knows ad hoc doesn’t work • Always shared facilities, so documented interfaces and SLAs are natural • Confidentiality concerns are well understood (professional priority rather than family secrets)
target reader • Those (senior and/or über-techie) with the responsibility (voluntary or not) for developing a DMP plan for a large collaboration • ...or other many-person, multi-institutional or multi-national project • ...or funders evaluating such plans
the advice • “Here is a copy of CCSDS 650; be creative” • ...but Do The Right Thing
backing that up... • OAIS rationale (what and why) • Policy background: RCUK and STFC data policies; why share data?; issues about openness • Technical background: OAIS as terminology, the DCC model, CASPAR • Planning: turning OAIS into practice; release planning; validation and assessment; modelling costs • Case studies
put another way • A framework for approaching the problem exists, in OAIS • ...which is not just waffle • ...so read X, Y and Z to become the local expert • ...so X’, Y’ and Z’ are the questions to ask, or critical approaches to take, if you’re a funder
the document • http://purl.org/nxg/projects/mardi-gross/report • Comments on v0.1 by 13 April would be most appreciated • v0.2 and possibly a v0.3 in the spring, then a final version after collaboration meetings over the summer