eBank UK: linking research data, scholarly communication and learning
• Dr Liz Lyon, UKOLN, University of Bath
• Dr Simon Coles, School of Chemistry, University of Southampton
AHM, Nottingham, September 2004
Overview
• In context: scholarly communications
• Open Access
• Data, information, workflows and provenance
• The data publication bottleneck
• e-Science and crystallography
• Comb-e-chem Project
• eBank UK
• Information architecture and data flow
• Interoperability issues
• Challenges for the future
Current chemistry publishing protocols
[Diagram labels: Ideas and interpretations; Hooks into the literature; Raw data!; Results & derived data]
The government line
“It is envisaged that the sharing of primary data would prevent unnecessary repetition of experiments and enable scientists to build directly on each others’ work, creating greater efficiencies and productivity in the research process.”
The scholarly knowledge cycle (Liz Lyon, eBank UK article, Ariadne, July 2003)
[Diagram, built up over four slides. Slide 1, Research & e-Science workflows: data creation / capture / gathering (laboratory experiments, Grids, fieldwork, surveys, media); data analysis, transformation, mining, modelling; deposit / self-archiving into repositories (institutional, e-prints, subject, data, learning objects); validation; publication in peer-reviewed publications (journals, conference proceedings); data curation (databases & databanks); harvesting of metadata by aggregator services (national, commercial); presentation services (subject, media-specific, data, commercial portals); searching, harvesting, linking, embedding and resource discovery.]
[Slide 2, Learning & Teaching workflows: learning object creation and re-use; institutional presentation services (portals, Learning Management Systems, u/g and p/g courses, modules); quality assurance bodies; the same repositories, aggregator services and presentation services.]
[Slide 3: the research and learning workflows combined in one diagram.]
[Slide 4: the combined diagram with eBank UK in the aggregator services position.]
The data deluge
[Slide labels: Data Overload!; EPSRC National Crystallography Service; How do we disseminate?]
CombeChem: An EPSRC pilot project
[Diagram labels: Simulation; Video; Properties; Analysis; Structures Database; Diffractometer; Properties e-Lab; X-Ray e-Lab; Grid Middleware]
Entire E-Science Cycle: encompassing experimentation, analysis, publication, research, learning
[Diagram labels: Virtual Learning Environment; Undergraduate Students; Graduate Students; E-Scientists; Digital Library; Grid; E-Experimentation; Reprints; Peer-Reviewed Journal & Conference Papers; Technical Reports; Local Web; Preprints & Metadata; Institutional Archive; Publisher Holdings; Certified Experimental Results & Analyses; Data, Metadata & Ontologies]
eBank UK project
• JISC-funded for 1 year from September 2003
• UKOLN at the University of Bath (lead), University of Southampton, University of Manchester
• “Building the links between research data, scholarly communication and learning”
• Exemplar: e-Science testbed ‘Combechem’
• Grid-enabled combinatorial chemistry
• Crystallography, laser and surface chemistry examples
• Development of an e-Lab using pervasive computing technology
• National Crystallography Service
• Resource Discovery Network / PSIgate physical sciences portal
• http://www.ukoln.ac.uk/projects/ebank-uk/
The project team
• UKOLN: Michael Day, Monica Duke, Rachel Heery, Liz Lyon + Andy Powell
• Southampton: Les Carr, Simon Coles, Jeremy Frey, Chris Gutteridge, Mike Hursthouse
• Manchester: John Blunden-Ellis
First steps: establishing common ground…
• Understand the data creation process
• Terminology and definitions: data, metadata, datafile, dataset, data holding
• Different views: digital library researchers, computer scientists, chemists
• Generic vs specific
• Modeller vs practitioner
• Aim for a common ontology
• Modelling the domain
• Creating a metadata schema
Progress update
• Version 2.0 eBank metadata schema
• Enhanced ePrints.org software
• Pilot institutional e-data repository for harvesting (raw, derived, results data)
• Exports records as ebank_dc and oai_dc
• Validation of schema
• Pilot eBank UK aggregator service
• Developing search interface Version 1.0
• Testing with PSIgate physical sciences portal – embedding eBank UK
Crystallography workflow
[Diagram labels: RAW DATA; DERIVED DATA; RESULTS DATA]
• Initialisation: mount new sample on diffractometer & set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File format)
• Report: generate Crystal Structure Report
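The seven workflow stages above can be sketched as an ordered enumeration. This is only an illustrative model: the names `Stage` and `next_stage` are hypothetical and not part of the eBank UK software, and the sketch deliberately does not assign stages to the raw / derived / results categories, since the slide text does not say where those boundaries fall.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Crystallography workflow stages, in the order listed on the slide."""
    INITIALISATION = "mount new sample on diffractometer and set up data collection"
    COLLECTION = "collect data"
    PROCESSING = "process and correct images"
    SOLUTION = "solve structures"
    REFINEMENT = "refine structure"
    CIF = "produce CIF (Crystallographic Information File format)"
    REPORT = "generate Crystal Structure Report"


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the final stage."""
    stages = list(Stage)  # Enum members iterate in definition order
    position = stages.index(stage)
    if position + 1 < len(stages):
        return stages[position + 1]
    return None


if __name__ == "__main__":
    for stage in Stage:
        print(f"{stage.name.title()}: {stage.value}")
```

Keeping the stages in a single ordered type makes it straightforward to attach each deposited dataset to the stage that produced it.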
Deposition into the archive
An Archive entry
ecrystals.chem.soton.ac.uk
For a demo come to the JISC booth! Today @ 13:00 & during tea
All the way back to the underlying data…
Some metadata issues
• Using simple and qualified Dublin Core
• Additional chemical information in schema for harvesting e.g. empirical formula
• Schema contains International Chemical Identifier (InChI)
• Links to all datasets associated with an experiment
• Links to individual datasets within an experiment
• Links to eprints (and other published literature) derived from the data
• Using vocabularies specific to crystallography
• Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge
Data flow in eBank (model input: Andy Powell, UKOLN)
[Diagram components: Dataset (three shown); Crystal structure (data holding); ebank_dc record (XML), dc:type=“CrystalStructure” and/or “Collection”; Crystal structure report (HTML); Eprint oai_dc record (XML), dc:type=“Eprint” and/or “Text”; Eprint “jump-off” page (HTML); Eprint manifestation (e.g. PDF); Institutional repository]
[Relation labels: dcterms:references; dcterms:isReferencedBy; dc:identifier (on both records); Deposit; Linking]
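A minimal sketch of what a record carrying the properties named in the diagram might look like, built with Python's standard library. Several things here are assumptions rather than the project's actual schema: the wrapper namespace `http://example.org/ebank_dc/`, the `record` element name, the function `build_record`, and the reading that `dc:identifier` points at the crystal structure report, `dcterms:references` at the datasets and `dcterms:isReferencedBy` at an eprint. Only the Dublin Core namespace URIs are standard.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

DC = "http://purl.org/dc/elements/1.1/"   # simple Dublin Core namespace
DCTERMS = "http://purl.org/dc/terms/"     # qualified Dublin Core terms namespace
EBANK = "http://example.org/ebank_dc/"    # hypothetical wrapper namespace

ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("ebank_dc", EBANK)


def build_record(report_url: str,
                 dataset_urls: Iterable[str],
                 eprint_url: Optional[str] = None) -> ET.Element:
    """Build an illustrative crystal-structure record (not the real schema)."""
    record = ET.Element(f"{{{EBANK}}}record")
    ET.SubElement(record, f"{{{DC}}}type").text = "CrystalStructure"
    # Assumed reading: the identifier resolves to the HTML structure report.
    ET.SubElement(record, f"{{{DC}}}identifier").text = report_url
    # One dcterms:references element per dataset behind the structure.
    for url in dataset_urls:
        ET.SubElement(record, f"{{{DCTERMS}}}references").text = url
    # Optional link to a publication derived from the data.
    if eprint_url is not None:
        ET.SubElement(record, f"{{{DCTERMS}}}isReferencedBy").text = eprint_url
    return record


if __name__ == "__main__":
    example = build_record(
        "https://repository.example.org/structure/1",      # placeholder URLs
        ["https://repository.example.org/structure/1/raw",
         "https://repository.example.org/structure/1/cif"],
        "https://eprints.example.org/42",
    )
    print(ET.tostring(example, encoding="unicode"))
```

Repeating `dcterms:references` once per dataset is one simple way to keep links to individual datasets within an experiment separately addressable.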
Data flow in eBank: harvesting and services (model input: Andy Powell, UKOLN)
[Diagram: the components and relation labels of the previous slide (datasets, crystal structure data holding, ebank_dc record, crystal structure report, eprint oai_dc record, eprint “jump-off” page, eprint manifestation, institutional repository), extended with harvesting and service layers]
[Harvesting labels: OAI-PMH ebank_dc, shown with the eBank UK aggregator service; OAI-PMH oai_dc, shown with the ePrint UK aggregator service and with a subject service]
[Service labels: PSIgate portal; Searching, linking and embedding]
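The harvesting arrows in this diagram correspond to OAI-PMH requests. As a rough sketch, a harvester asks a repository for records in a given metadata format with the `ListRecords` verb and a `metadataPrefix` argument, both of which are defined by the OAI-PMH specification. The base URL below is a placeholder and the helper name `list_records_url` is hypothetical; whether a given repository actually offers the `ebank_dc` prefix depends on its configuration.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    When a resumption token is supplied it is sent as the only argument
    besides the verb, as the protocol requires for follow-up requests.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    base = "https://repository.example.org/oai"  # placeholder endpoint
    print(list_records_url(base, "oai_dc"))      # generic Dublin Core view
    print(list_records_url(base, "ebank_dc"))    # richer eBank view
```

Exposing the same items under two prefixes lets a generic harvester use `oai_dc` while a domain-aware aggregator requests the richer `ebank_dc` records.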
Harvesting: OAIster
Linking and aggregating: Search & discover
For a demo come to the JISC booth! Today @ 13:00 & during tea or the buffet
Linking and aggregating: Hit browsing
And finally…eBank embedded in a science portal
Currently we are…
• Assessing outcomes of a Consultation Workshop held in August, e.g. cost-benefit issues for researchers? RAE / assessment impact? Disciplinary differences?
• Presenting a demonstrator
• Completing supporting studies on (1) Provenance and (2) Data models and schema
• Promoting Open Access and Open eData Archives to international crystallographic organisations, publishers, learned societies
• Phase 2 proposal funding sought for further 12 months
Phase 2 plan… (1)
• Continue to progress towards generic metadata schemas
• Validation against other schema
• CLRC Scientific Metadata Model
• Modify Eprints.org software to allow for more generic scientific data and schemas
• Metadata enhancement: subject keyword additions based on knowledge of keywords in related publications
• Investigate identifiers e.g. International Chemical Identifier (InChI code)
• Explore context-sensitive linking: find me datasets by this person; journal articles by this person; datasets related to this subject; journal articles on this subject; learning objects by this person; learning objects on this subject
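The "find me" queries in the last bullet can be pictured as filters over aggregated records by resource type plus either creator or subject. The sketch below is purely illustrative: the `Record` fields, the `find_related` helper and the sample records are invented for the example and do not describe how the eBank UK aggregator stores or queries metadata.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Record:
    """A simplified aggregated metadata record (hypothetical fields)."""
    title: str
    kind: str                      # "Dataset", "Journal article" or "Learning object"
    creators: Tuple[str, ...]
    subjects: Tuple[str, ...]


def find_related(records: Sequence[Record],
                 kind: str,
                 creator: Optional[str] = None,
                 subject: Optional[str] = None) -> List[Record]:
    """Return records of one kind, optionally narrowed by creator and/or subject."""
    return [
        r for r in records
        if r.kind == kind
        and (creator is None or creator in r.creators)
        and (subject is None or subject in r.subjects)
    ]


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        Record("Structure A", "Dataset", ("A. Researcher",), ("crystallography",)),
        Record("Paper on structure A", "Journal article", ("A. Researcher",), ("crystallography",)),
        Record("Intro to diffraction", "Learning object", ("B. Teacher",), ("crystallography",)),
    ]
    for hit in find_related(sample, "Dataset", creator="A. Researcher"):
        print(hit.title)
```

All six queries on the slide reduce to this one function with a different `kind` and either `creator` or `subject` set.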
Phase 2… (2)
• Full embedding into the crystallographic research and publishing communities
• Chemistry workflow embedding
• SMART TEA e synthesis Lab
• Other analytical techniques in chemistry
• e-Learning embedding and pedagogic evaluation
• Undergraduate chemical informatics courses
• Introduction to visiting schools
• Expand into other physical, mathematical, geological and engineering sciences
• Feasibility study in related domains – bio and medical sciences
• Feasibility study in unrelated domains – arts and humanities