Building Capability for Facilities Supported Structural Science
Brian Matthews
Scientific Information Group, E-Science Centre
STFC Rutherford Appleton Laboratory
brian.matthews@stfc.ac.uk
Facilities Support
• ISIS (neutron and muon source)
• CLF (Central Laser Facility)
• DLS (Diamond Light Source)
Big Facilities for Small Science
The Science we do: Structure of materials
• ~30,000 user visitors each year in Europe, from:
  • physics, chemistry, biology, medicine
  • energy, environmental, materials, culture
  • pharmaceuticals, petrochemicals, microelectronics
• Billions of € of investment
  • c. £400M for DLS
  • plus running costs
• Over 5,000 high-impact publications per year in Europe
• But so far no integrated data repositories
  • lacking sustainability and traceability
[Figure: the experimental process: visit facility on research campus; place sample in beam; diffraction pattern from sample; fitting experimental data to model]
[Figure: example applications: magnetic moments in electronic storage; hydrogen storage for zero-emission vehicles; bioactive glass for bone growth; longitudinal strain in aircraft wing; structure of cholesterol in crude oil]
I2S2 - Infrastructure for Integration in Structural Sciences
Bridging the gap between raw and derived data
• EPSRC National Crystallography Service
  • service provision function
  • operates across institutions
  • moderate infrastructure
• Diamond & ISIS
  • operate on behalf of multiple institutions
  • processes for experiments
  • large infrastructure engineered to manage raw data
  • derived data taken off site on laptops / removable drives
• "Lone" researcher scenario
  • data sharing with colleagues via email
  • little or no infrastructure
  • little management of raw or derived data
Facilities Lifecycle
• Proposal: scientist submits application for beamtime
• Approval: facility committee approves application
• Scheduling: facility registers, trains, and schedules the scientist's visit
• Experiment: scientist visits, facility runs the experiment
• Data cleansing: raw data filtered and cleansed
• Data analysis: tools for processing made available
• Publication: subsequent publication registered with facility
• Record: metadata repository
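The lifecycle above is essentially an ordered sequence of stages, each of which leaves a record behind. A minimal Python sketch, assuming a strictly linear order and using a plain list as a stand-in for the metadata repository; the names `Stage`, `next_stage` and `run_lifecycle` are hypothetical and not part of ICAT or any facility system:

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Lifecycle stages named on the slide, in the order assumed here."""
    PROPOSAL = 1
    APPROVAL = 2
    SCHEDULING = 3
    EXPERIMENT = 4
    DATA_CLEANSING = 5
    DATA_ANALYSIS = 6
    PUBLICATION = 7


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the following stage, or None once publication is reached."""
    members = list(Stage)  # iteration follows definition order
    idx = members.index(stage)
    return members[idx + 1] if idx + 1 < len(members) else None


def run_lifecycle(proposal_id: str) -> list[tuple[str, str]]:
    """Walk one proposal through every stage, recording each step in a
    simple in-memory 'metadata repository' (a list of records)."""
    repository: list[tuple[str, str]] = []
    stage: Optional[Stage] = Stage.PROPOSAL
    while stage is not None:
        repository.append((proposal_id, stage.name))
        stage = next_stage(stage)
    return repository
```

In a real facility the stages are not strictly linear (proposals can be rejected, analysis can be repeated), so the list here only illustrates the idea of capturing every step against one identifier.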
Core Scientific Metadata Model (CSMD)
• A common general format for metadata on scientific studies and data holdings
• Cataloguing data holdings
  • related to the experiment
  • providing access for the data owner
  • easing citation, sharing, collaboration and integration
• Allows easy federation of distributed metadata into a homogeneous platform
• The core metadata model forms the information model for ICAT
[Diagram: CSMD entities: Investigation, Investigator, Topic, Keyword, Publication, Authorisation, Sample, Sample Parameter, Dataset, Dataset Parameter, Datafile, Datafile Parameter, Related Datafile, Parameter]
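To make the CSMD hierarchy concrete, here is a simplified sketch in Python dataclasses covering only the core chain Investigation, Dataset, Datafile, plus samples and parameters. Class and field names are illustrative assumptions; this is not the actual CSMD or ICAT schema, and entities such as Topic, Authorisation and Related Datafile are left out:

```python
from dataclasses import dataclass, field


@dataclass
class Parameter:
    """A named measurement or setting (hypothetical simplification)."""
    name: str
    value: str
    units: str = ""


@dataclass
class Datafile:
    name: str
    location: str
    parameters: list[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    datafiles: list[Datafile] = field(default_factory=list)
    parameters: list[Parameter] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    parameters: list[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    """Top-level record: one study, e.g. one beamtime proposal."""
    title: str
    investigators: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    samples: list[Sample] = field(default_factory=list)
    datasets: list[Dataset] = field(default_factory=list)

    def all_datafiles(self) -> list[Datafile]:
        """Flatten every datafile held under this investigation's datasets."""
        return [f for ds in self.datasets for f in ds.datafiles]
```

Keeping raw and derived data as separate datasets under one investigation is one way such a model can record both in the same catalogue.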
Interactions between research processes
[Diagram: facility processes (proposal, approval, scheduling, experiment, data storage, data cleansing, analysis tools, publication, record) interacting with the wider research process (literature review, grant proposal, sample preparation, local experiments, simulation, data analysis, publication)]
ORE-CHEM
• An abstract model for planning and enacting chemistry experiments
Earth Sciences: typical workflow
• Processing dependent on specialised software
  • sustainability issues
• Context not routinely captured
  • main analysis relies on the scientist's knowledge and experience in selecting parameters and interpreting data
  • recorded in a lab notebook
  • actual workflow not recorded
• Distributed data, little shared infrastructure
  • raw and reduced data stored at ISIS
  • other data on his/her laptop or WebDAV
(Martin Dove & Erica Yang)
Interoperability with Publishers
• IUCr journal policy: "data" either
  • must be supplied in CIF format as an integral part of article submission, and are then freely available for download, or
  • must be deposited with the Protein Data Bank before or in concert with article publication; the article will link to the PDB deposition using the PDB reference code
• Thanks to Brian McMahon, IUCr
[Diagram: publication flow in IUCr journals: experiment, raw data, data reduction, structure solution, validation, peer review and editing, publication in IUCr journals, with links to bibliographic and chemistry databases]
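The IUCr rule is an either/or condition on a submission, which a submission system could check automatically. A minimal sketch, assuming CIF files are recognised by their `.cif` extension and PDB codes follow the classic four-character form (a digit 1-9 followed by three alphanumerics); the function name `meets_data_policy` is hypothetical and this is not the IUCr's actual validation (which uses checkCIF and much richer tests):

```python
import re
from typing import Optional

# Classic four-character PDB ID: a digit 1-9 followed by three alphanumerics.
# Assumption: extended-format PDB IDs are not handled here.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def meets_data_policy(attachments: list[str], pdb_code: Optional[str] = None) -> bool:
    """Hypothetical check of the rule summarised above: the article must
    either carry its data as a CIF file or cite a PDB deposition."""
    has_cif = any(name.lower().endswith(".cif") for name in attachments)
    has_pdb = pdb_code is not None and bool(PDB_ID.match(pdb_code))
    return has_cif or has_pdb
```

Only the presence of the data is checked here; whether the CIF content is valid is a separate validation step.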
Research Activity Model
• A notion of a research activity: a step in the lifecycle model
• Different types of activity can be defined
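One way to read the activity model is that each step has a type and links input data to output data; recording those links is what would let derived data be traced back to the raw data it came from. A minimal sketch, assuming free-text activity types and string identifiers for data objects; `ResearchActivity` and `provenance` are hypothetical names, not taken from the model itself:

```python
from dataclasses import dataclass, field


@dataclass
class ResearchActivity:
    """One step in the research lifecycle. activity_type is free text here
    (e.g. 'experiment', 'data analysis'); inputs and outputs are identifiers
    of data objects. All field names are illustrative assumptions."""
    name: str
    activity_type: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def provenance(target: str, activities: list[ResearchActivity]) -> list[str]:
    """Return the names of activities that (transitively) produced `target`,
    walking back from outputs to inputs. The result is reversed discovery
    order, which for a linear chain means earliest activity first."""
    producers = {out: act for act in activities for out in act.outputs}
    chain: list[str] = []
    pending = [target]
    seen: set[str] = set()
    while pending:
        item = pending.pop()
        act = producers.get(item)
        if act is None or act.name in seen:
            continue  # raw input, unknown item, or already visited
        seen.add(act.name)
        chain.append(act.name)
        pending.extend(act.inputs)
    return list(reversed(chain))
```

For example, a chain of experiment, reduction, analysis and publication activities yields those four names in order when asked for the provenance of the paper.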
Capabilities
• Good established formats
  • raw data, e.g. NeXus
  • analysed data, e.g. CIF
• Well-supported processes for data collection, especially at facilities
• ICAT and similar tools as a unifying medium
• Simple metadata models for experiments
• Areas needing work
  • upstream planning and synthesis
  • downstream analysed data
  • sharing and integration
• Drivers
  • drive from facilities (large and small)
  • drive from publishers
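Part of what makes CIF a good exchange format for analysed data is that simple items are plain `_tag value` lines that tools can pick out easily. A deliberately minimal sketch of reading such single-line items in Python; it is not a conforming CIF parser (loops, semicolon-delimited text fields, inline comments and multiple data blocks are ignored), and the function name `read_simple_cif_items` is hypothetical:

```python
def read_simple_cif_items(text: str) -> dict[str, str]:
    """Collect single-line `_tag value` pairs from CIF text.

    Minimal by design: only lines starting with a data name and carrying
    a value on the same line are kept, so loop_ headers (bare tags),
    loop value rows, comments and multi-line text fields are skipped."""
    items: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("_"):
            continue  # data_ headers, loop_ rows, comments, text fields
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # bare tag, e.g. a loop_ column header
        tag, value = parts[0], parts[1].strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        items[tag] = value
    return items
```

For anything beyond quick inspection, a proper CIF library that implements the full syntax would be the safer choice.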
Thank You
Questions?
brian.matthews@stfc.ac.uk
www.e-science.stfc.ac.uk/