Towards an information model for I2S2 Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory brian.matthews@stfc.ac.uk
Facilities Process
• Proposal: scientist submits application for beamtime
• Approval: facility committee approves application
• Scheduling: facility registers, trains, and schedules scientist's visit
• Experiment: scientist visits, facility runs experiment
• Data storage: raw data filtered, cleansed and stored
• Data analysis: tools for processing made available
• Record Publication: subsequent publication registered with facility
Characteristics:
  - formal application
  - set processes
  - central infrastructure
  - standard tools
  - hierarchical control
  - dedicated staff
    • user office
    • instrument scientists
    • library and IT support
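The facility process can be read as an ordered sequence of stages. The following is a minimal sketch of that reading, not part of the original slides. It assumes the stages run in the linear order Proposal, Approval, Scheduling, Experiment, Data storage, Data analysis, Record Publication; the names `Stage` and `next_stage` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Stages of the facility process (linear ordering is an assumption)."""
    PROPOSAL = 1
    APPROVAL = 2
    SCHEDULING = 3
    EXPERIMENT = 4
    DATA_STORAGE = 5
    DATA_ANALYSIS = 6
    RECORD_PUBLICATION = 7


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the last one."""
    members = list(Stage)  # Enum members iterate in definition order
    index = members.index(stage)
    return members[index + 1] if index + 1 < len(members) else None
```

For example, `next_stage(Stage.PROPOSAL)` returns `Stage.APPROVAL`, and `next_stage(Stage.RECORD_PUBLICATION)` returns `None`.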
Requirements
• Secure access to user's data
• Flexible data searching
• Scalable architecture
• Extensible architecture
• Integration with analysis tools
• Access to high-performance resources
• Linking to other scientific outputs
• Data policy aware
Principles
• The ICAT software suite
• Catalogues all experiment-related information
• Metadata gathered via integration with existing IT systems
  - proposal systems
  - data acquisition
• Provides a well-defined API for easy embedding into any application
• Access data anywhere via the web
• Annotate and search for data
• Share data with colleagues
• Access data via user's own programs
• Utilise integrated e-Science resources
• Link to data from your publications
[Diagram labels: User Office System (User Database, Scheduling, Health and Safety, Proposal Management); Online Proposal System; Single Sign On; Account Creation and Management; Data Acquisition System; Metadata Catalogue; Storage Management System; Data Access Portal]
ICAT Software Suite, providing the crucial integration of key functions.
ICAT Deployment
[Diagram labels: ICAT API; User Database System; Single Sign On; Data Storage / Delivery System; Proposal System; Publication System; e-Science Services; RDBMS; Software Repository; Web Services API; Command Line Tools; Fortran, C++, Java; Glassfish / JBOSS]
Methodology The Singapore Framework for Dublin Core Application Profiles. Mikael Nilsson, Tom Baker, Pete Johnston http://dublincore.org/documents/singapore-framework/
A Metadata Model for Facilities Science
• No common general format or standard existed for metadata about scientific studies and data holdings
• Proposed: a model that
  - specifies the types of metadata to capture about scientific studies
  - catalogues data holdings and provides access for the data owner
  - eases citation, sharing, collaboration, and integration
  - allows easy federation of distributed heterogeneous metadata systems into a homogeneous (virtual) platform
• Result: the Core Scientific Metadata Model (CSMD) was developed
Core Scientific Metadata Model
Damian Flannery
• Investigation: Reference / Proposal Id, Previous Reference, Facility, Instrument, Title, Abstract, etc.
• Investigator: User Id, Role
• Topic: Name, Parent Id, Topic Level
• Keyword: Name
• Publication: Full Reference, URL, Repository
• Sample: Name, Chemical Formula, Safety Information
• Sample Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
• Dataset: Name, Sample Id, Description
• Dataset Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
• Datafile: Name, Description, Version, Location, Format, Format Version, Create Time, Modify Time, Size, Checksum
• Datafile Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
• Related Datafile: Source Datafile Id, Destination Datafile Id, Relation, S/W Application, S/W Version
• Parameter: Name / Units / Value etc., Searchable, Is Sample Parameter, Is Dataset Parameter, Is Datafile Parameter, Verified
• Authorisation: User Id, Role (e.g. Admin, Deleter, Updater, Reader, Creator, Downloader etc.), Element Type, Element Id
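The core entities of the model can be sketched as plain data classes. This is an illustrative sketch only, not the ICAT schema or API: the class layout, the snake_case field names, the containment of datafiles inside datasets inside an investigation, and the helper `total_size` are assumptions, and only a subset of the attributes named on the slide is included.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """A named value attached to a sample, dataset, or datafile."""
    name: str
    units: str
    string_value: Optional[str] = None
    numeric_value: Optional[float] = None
    range_top: Optional[float] = None
    range_bottom: Optional[float] = None
    error: Optional[float] = None


@dataclass
class Datafile:
    name: str
    location: str
    format: str = ""
    size: int = 0
    checksum: str = ""
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    chemical_formula: str = ""
    safety_information: str = ""
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    description: str = ""
    sample: Optional[Sample] = None
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    proposal_id: str
    title: str
    facility: str
    instrument: str
    abstract: str = ""
    datasets: List[Dataset] = field(default_factory=list)


def total_size(investigation: Investigation) -> int:
    """Sum the sizes of all datafiles in all datasets of an investigation."""
    return sum(f.size for ds in investigation.datasets for f in ds.datafiles)


# Example with made-up values (identifiers and paths are hypothetical).
inv = Investigation(proposal_id="RB-0001", title="Example study",
                    facility="ISIS", instrument="EXAMPLE")
run = Dataset(name="run-1")
run.datafiles.append(Datafile(name="a.raw", location="/data/a.raw", size=100))
run.datafiles.append(Datafile(name="b.raw", location="/data/b.raw", size=250))
inv.datasets.append(run)
print(total_size(inv))  # prints 350
```

Keeping one `Parameter` shape for samples, datasets, and datafiles mirrors the slide, where the three parameter entities list the same attributes.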
Metadata Granule
• Topic: keywords providing an index on what the study is about.
• Study Description: provenance about what the study is, who did it and when.
• Access Conditions: conditions of use providing information on who can access the data and how.
• Data Description: detailed description of the organisation of the data into datasets and files.
• Data Location: locations providing a navigational aid to where the data on the study can be found.
• Related Material: references into the literature and community providing context about the study.
• Legal Note: copyright, patents and conditions of use etc. relating to the study and the data in the study.
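As a small illustration of the granule structure, the sketch below checks which of the seven sections a metadata record leaves empty. The dictionary representation, the snake_case section keys, and the function name `missing_sections` are assumptions made for this example, not part of CSMD or ICAT.

```python
# The seven granule sections, in the order given on the slide.
GRANULE_SECTIONS = (
    "topic",
    "study_description",
    "access_conditions",
    "data_description",
    "data_location",
    "related_material",
    "legal_note",
)


def missing_sections(record: dict) -> list:
    """Return the granule sections that are absent or empty in `record`."""
    return [name for name in GRANULE_SECTIONS if not record.get(name)]


# Example record with made-up content.
record = {
    "topic": ["crystallography"],
    "study_description": "Example study",
    "data_location": "/data/example",
}
print(missing_sections(record))
```

For this record the function reports `access_conditions`, `data_description`, `related_material`, and `legal_note` as missing.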
CSMD History
• First pilot of the model developed in 2001
• Now in ICAT 3.3
• Serving data from STFC facilities (ISIS, DLS)
• Model proven robust: simple yet expressive
• http://code.google.com/p/icatproject/
I2S2 - Infrastructure for Integration in Structural Sciences
Bridging the gap between raw and derived data
• EPSRC National Crystallography Service
  - service provision function
  - operates across institutions
  - moderate infrastructure
• Diamond & ISIS
  - operates on behalf of multiple institutions
  - processes for experiments
  - large infrastructure engineered to manage raw data
  - derived data taken off site on laptops / removable drives
• "Lone" researcher scenario
  - data sharing with colleagues via email
  - little or no infrastructure
  - little management of raw or derived data
Interactions between research processes
Extend CSMD:
• To laboratory based science
• To secondary analysis data
• To preservation information
• To publication data
• To domain specific vocabularies
By being:
  - standardised
  - modular
  - extensible
Cover the scientist's research lifecycle as well as the facility's.
[Diagram labels: Literature Review; Grant Proposal; Sample Preparation; Local experiments; Simulation; Facilities Proposal; Approval; Scheduling; Facilities Experiment; Data storage; Data cleansing; Data analysis; Analysis Tools; Publication; Record Publication]
Issues
• Metadata model
  - Framework for developing the metadata model
  - Modularisation mechanisms and extensions
  - Formats
• Model supporting laboratory tools
  - How does the model fit?
  - Flexibility to handle local processes: ad hoc, partial, unordered
  - What needs changing in the model?
  - What needs changing in tools?
• Data input and maintenance?
  - Simple ways of inputting the data
  - Lab books?
Extension areas:
• Secondary analysis data
• Preservation data
• Publication data
• Topic data
  - chemistry
• Controlled lists (ontologies) for
  - Instruments
  - Facilities
  - Methods
• Access control
• Safety data
• Blogs and notebooks
Part of ISIS study
[Diagram labels: ISIS - ICAT; Correction data; Sample data; Calibration data; User inputs; Control file; Gudrun; Scattering function data]
Derived Data
• Generalised model
• Managing the links between data
• Inputs of data sets
• Associated with a software item with a set of parameters
• Managing this?
  - lab-books?
  - simple tools?
  - VRE?
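One way to picture the generalised model is as a set of derivation records, each linking input data sets to an output through a software item and its parameters. The sketch below is an assumption-laden illustration, not the I2S2 design: the classes `DataItem` and `Derivation`, the function `lineage`, and all data names, software names, and parameter values in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataItem:
    """A raw or derived data set, identified here only by name."""
    name: str


@dataclass
class Derivation:
    """One processing step: inputs plus software and parameters give an output."""
    software: str
    parameters: Dict[str, str]
    inputs: List[DataItem]
    output: DataItem


def lineage(item: DataItem, derivations: List[Derivation]) -> List[DataItem]:
    """Return every data item that `item` was derived from, directly or indirectly."""
    found: List[DataItem] = []
    stack = [item]
    while stack:
        current = stack.pop()
        for step in derivations:
            if step.output == current:
                for source in step.inputs:
                    if source not in found:  # visit each source only once
                        found.append(source)
                        stack.append(source)
    return found


# Hypothetical two-step chain: raw and calibration data are corrected,
# then the corrected data is analysed to give a result.
raw = DataItem("raw data")
calibration = DataItem("calibration data")
corrected = DataItem("corrected data")
result = DataItem("analysis result")

steps = [
    Derivation("correction-tool", {"mode": "default"}, [raw, calibration], corrected),
    Derivation("analysis-tool", {"bins": "100"}, [corrected], result),
]

print([d.name for d in lineage(result, steps)])
```

Recording the software and parameters on each link is what lets a derived result be traced back to the raw data it came from, whichever tool (lab book, simple tool, or VRE) ends up capturing the records.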