
Exploiting Diverse Sources of Scientific Data: the vision, what has been achieved and what next…


Presentation Transcript


  1. Exploiting Diverse Sources of Scientific Data: the vision, what has been achieved and what next… • Prof. Jessie Kennedy

  2. Science & Scientific Data • Science and Scientific Data are Complex…

  3. Climatology, Hydrology, Meteorology, Geography, Oceanography, Geology, Ecology, Paleontology, Genomics, Taxonomy, Proteomics, Morphology, Nomenclature, Biochemistry

  4. Climatology, Hydrology, Meteorology, Geography, Oceanography, Temperature, Geology, Depth, Location, Organism, Ecology, Paleontology, Taxon concept, Gene sequence, Genomics, Taxonomy, Proteomics, Name, Protein, Morphology, Nomenclature, Pathway, Biochemistry

  5. Scientific Community: complex • Small Scientific Community • Individual Scientist • Large Scientific Community • Scientific Laboratory

  6. [Diagram repeated four times] Climatology, Hydrology, Meteorology, Geography, Oceanography, Temperature, Geology, Depth, Location, Organism, Ecology, Paleontology, Taxon concept, Gene sequence, Genomics, Taxonomy, Proteomics, Name, Protein, Morphology, Nomenclature, Pathway, Biochemistry

  7. Science & Scientific Data [Diagram labels: conclusion, observation, experiment, hypothesis] • Are continually changing • Conclusions become foundations for new hypotheses • New experiments invalidate existing knowledge • Knowledge is open to interpretation • Different opinions • World continually changing

  8. Exploiting Diverse Sources of Scientific Data: the vision • To provide scientists with technological solutions to exploit the wealth and diversity of Scientific Data • Discovery • Access • Sharing • Integration/Linking • Analysis • Which would thereby improve the potential for new scientific discovery

  9. ESG Projects in most sciences

  10. SEEK (Scientific Environment for Ecological Knowledge): Vision • Research, develop, and capitalize upon advances in information technology to radically improve the type and scale of ecological science that can be addressed • Scalable synthesis (Michener)

  11. Data Dispersion Challenges • Data are massively dispersed • Ecological field stations and research centers (100s) • Natural history museums and biocollection facilities (100s) • Agency data collections (10s to 100s) • Individual scientists (1000s) • Maintenance must be local (Michener)

  12. Data Integration Challenges • Data are heterogeneous • Syntax (format) • Schema (model) • Semantics (meaning) (Jones)

  13. Ecological Modeling Challenges • Analysis and modeling tools are: • Specialized • Disconnected • Proprietary • It is: • Difficult to revise analyses • Hard to document analyses • Impossible to reliably publish models to share with colleagues • Hard to re-use models and analyses from colleagues • Difficult to use grid-computing for demanding computations • Labor-intensive to manage data in popular analysis software (Michener)

  14. Exploiting Diverse Sources of Scientific Data: the approaches • Data Discovery/Access • Metadata • To describe the data sets • Ontologies • To define the terminology used • Standardisation of formats • For the exchange of data • Life Science Identifiers (LSIDs) • To uniquely identify and resolve data objects • Provenance of data • To record where the data has come from • And what has happened to it en route. • GRID/Web technology • Distributed data management
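As a rough illustration of the LSID bullet, the sketch below splits an identifier of the form `urn:lsid:authority:namespace:object[:revision]` into its parts. The function name `parse_lsid`, the `LSID` tuple and the `example.org` identifiers are hypothetical, and treating the `urn:lsid:` prefix and the authority as case-insensitive is an assumption of this sketch. It only parses the identifier; resolving it to data or metadata is not shown.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of a Life Science Identifier (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    # Assumption: the authority is compared case-insensitively, so store it lowercased.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority.lower(), namespace, object_id, revision)


if __name__ == "__main__":
    # The identifier below is made up for illustration.
    print(parse_lsid("urn:lsid:example.org:names:12345"))
```

Two identifiers that parse to equal tuples can then be treated as naming the same data object.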

  15. Exploiting Diverse Sources of Scientific Data: the approaches • Data Integration/Linking • Metadata • To know how to interpret the data sets • Ontologies • To know how data in the data sets might be related • To aid automatic transformation of the data • Standardisation of formats • To ease integration • Life Science Identifiers (LSIDs) • To know when 2 things are the same • Workflows • To enable refinement and repetition of integration

  16. Exploiting Diverse Sources of Scientific Data: the approaches • Data Analysis • Metadata • To know how to interpret the data sets • Ontologies • To know analytical/transformation processes appropriate • Workflow Tools • To ease analytical processes • Recording/reuse of analytical processes • Provenance • Recording life history of data • To enable validation
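The workflow and provenance bullets can be pictured with a minimal sketch, assuming a workflow is just an ordered list of named Python functions. The names `run_workflow` and `fingerprint`, and the two example steps, are hypothetical and do not come from any particular workflow tool. Each step is logged with a short hash of its input and output so the life history of the data can be checked afterwards.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def fingerprint(data: Any) -> str:
    """Short, repeatable hash of JSON-serialisable data."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_workflow(data: Any, steps: List[Step]) -> Tuple[Any, List[Dict[str, str]]]:
    """Apply each named step in order and record a provenance entry per step."""
    provenance = [{"step": "source", "output": fingerprint(data)}]
    for name, func in steps:
        result = func(data)
        provenance.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(result)}
        )
        data = result
    return data, provenance


if __name__ == "__main__":
    # Made-up readings and steps, for illustration only.
    readings = [12.5, None, 14.0]
    steps = [
        ("drop_missing", lambda rows: [r for r in rows if r is not None]),
        ("round", lambda rows: [round(r) for r in rows]),
    ]
    cleaned, log = run_workflow(readings, steps)
    print(cleaned)
    for entry in log:
        print(entry)
```

Because the hashes are deterministic, re-running the same steps on the same input yields the same log, which is one simple way to support the validation the slide mentions.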

  17. Exploiting Diverse Sources of Scientific Data: the technologies • Standardisation of formats • Metadata • Ontologies • Life Science Identifiers (LSIDs) • Provenance • Workflow Tools • GRID/Web technology

  19. Meta Data: the vision • Meta data - "data about data" • keywords, title, creator… • If scientists marked up their data with the agreed meta data it would be trivial to find highly relevant data (sub-)sets for analysis… • Meta-utopia…

  20. Meta-utopia • A world of complete, reliable metadata. • In meta-utopia, • Everyone uses the same language and means the same thing… • The guardians of epistemology have rationally mapped out a schema or hierarchy of ideas that everyone adheres to… • Scientists accurately describe their methods, processes and results, so anyone can do anything with it in the future… (Cory Doctorow)

  21. Meta Data: the approach • Common language • XML Schemas to describe data/meta data • Domain specific exchange schemas • Explosion of these in every domain • Exchanging data • Archiving data

  22. Ecological Metadata Language: a look inside the meta-utopia of ecology

  23. Identification: dataset elements

  24. Identification: resource elements

  25. Identification: party elements

  26. Discovery: coverage elements • Geographic • Temporal • Taxonomic
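To give a feel for the three coverage types, here is a simplified sketch that builds an EML-style `coverage` fragment with Python's standard `xml.etree.ElementTree`. The helper `build_coverage` and all values are hypothetical. The fragment follows EML element naming but is not validated against the EML schema and omits namespaces and many optional elements.

```python
import xml.etree.ElementTree as ET


def build_coverage(description, west, east, north, south, begin, end, rank, taxon):
    """Build a simplified coverage element: geographic, temporal, taxonomic."""
    coverage = ET.Element("coverage")

    # Geographic coverage: a free-text description plus a bounding box.
    geo = ET.SubElement(coverage, "geographicCoverage")
    ET.SubElement(geo, "geographicDescription").text = description
    box = ET.SubElement(geo, "boundingCoordinates")
    for tag, value in (
        ("westBoundingCoordinate", west),
        ("eastBoundingCoordinate", east),
        ("northBoundingCoordinate", north),
        ("southBoundingCoordinate", south),
    ):
        ET.SubElement(box, tag).text = str(value)

    # Temporal coverage: a date range.
    temporal = ET.SubElement(coverage, "temporalCoverage")
    dates = ET.SubElement(temporal, "rangeOfDates")
    for tag, value in (("beginDate", begin), ("endDate", end)):
        ET.SubElement(ET.SubElement(dates, tag), "calendarDate").text = value

    # Taxonomic coverage: one rank/value pair.
    taxonomic = ET.SubElement(coverage, "taxonomicCoverage")
    classification = ET.SubElement(taxonomic, "taxonomicClassification")
    ET.SubElement(classification, "taxonRankName").text = rank
    ET.SubElement(classification, "taxonRankValue").text = taxon
    return coverage


if __name__ == "__main__":
    # All values below are invented for illustration.
    element = build_coverage(
        "Example field site", -3.3, -3.1, 56.0, 55.9,
        "2001-01-01", "2003-12-31", "Genus", "Quercus",
    )
    print(ET.tostring(element, encoding="unicode"))
```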

  27. Evaluation Level Information

  28. Evaluation: Method Information

  29. Evaluation: Project Information L3

  30. Access: Permissions Information L4

  31. Access: Physical Information

  32. Access: Physical formatting details

  33. Access: Distribution Information L4

  34. Integration Level Information

  35. Integration Level: Attribute structure

  36. Integration Level: attribute domains

  37. Integration Level: attribute domains

  38. Integration Level: measurementScale

  39. Meta Data: the approach • Common language • XML Schemas to describe data/meta data • Domain specific exchange schemas • Explosion of these in every domain • Exchanging data • Archiving data • Turned into extensive specifications • Difficult to know where to stop…

  40. But even this wasn’t enough… • It’s not good enough to have meta-data; we need to know what the terms in the meta-data (schema or data values) mean.

  41. Ontologies – the vision • If we understood the meaning of the schema and the terms used in the meta-data or databases we would be able to: • find things more reliably, • integrate things more easily, • reason about what things are comparable…. • because we have support for automatic inference
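A toy version of the automatic inference mentioned above: given subclass links between terms, everything a term "is a kind of" can be derived by following the links transitively. The mini-ontology `SUBCLASS_OF` and the functions `ancestors` and `is_a` are invented for this sketch; a real project would express this in RDF/OWL and use a reasoner rather than a Python dictionary.

```python
from typing import Dict, Set

# Hypothetical mini-ontology: each term maps to its direct superclasses.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "TreeHeight": {"Height"},
    "Height": {"Length"},
    "StemDiameter": {"Length"},
    "Length": {"Measurement"},
    "AirTemperature": {"Temperature"},
    "Temperature": {"Measurement"},
}


def ancestors(term: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Return every direct or inferred superclass of a term."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(term: str, cls: str, subclass_of: Dict[str, Set[str]]) -> bool:
    """True if term equals cls or is inferred to be a subclass of it."""
    return term == cls or cls in ancestors(term, subclass_of)


if __name__ == "__main__":
    # Both are lengths, so they are candidates for comparison.
    print(is_a("TreeHeight", "Length", SUBCLASS_OF))
    print(is_a("StemDiameter", "Length", SUBCLASS_OF))
    # A temperature is not a length.
    print(is_a("AirTemperature", "Length", SUBCLASS_OF))
```

Here the fact that a tree height is a length is never stated directly; it is inferred from the chain of subclass links, which is the kind of support for finding and comparing data that the slide describes.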

  42. Ontologies – the approach • Common Language… • OWL? • RDF, OWL Lite, OWL DL, OWL Full… • Domain specific ontologies • or project specific? • Map different ontologies • Modularise the ontologies • Reuse… • Build upper ontologies to which domain ontologies extend/link

  43. Biodiversity Base Ontology

  44. Core Layer

  45. BDI Core Taxon Name

  46. BDI Core Taxon Concept

  47. BDI Core BioSpecimen

  48. BDI Core BioObservation • Similar to…

  49. SEEK Observation ontology (Josh Madin)

  50. entity: an extension point for domain-specific terms (Josh Madin)
