Exploiting Diverse Sources of Scientific Data: the vision, what has been achieved and what next…
Prof. Jessie Kennedy
Science & Scientific Data • Science and Scientific Data are Complex…
Climatology Hydrology Meteorology Geography Oceanography Geology Ecology Paleontology Genomics Taxonomy Proteomics Morphology Nomenclature Biochemistry
Figure: the same disciplines shown alongside common data concepts (Temperature, Depth, Location, Organism, Taxon concept, Gene sequence, Name, Protein, Pathway)
Scientific Community: complex • Small Scientific Community • Individual Scientist • Large Scientific Community • Scientific Laboratory
Figure: the disciplines and data concepts above, repeated four times (once per type of community)
Figure: the cycle of observation, hypothesis, experiment and conclusion
Science & Scientific Data • Are continually changing • Conclusions become foundations for new hypotheses • New experiments can invalidate existing knowledge • Knowledge is open to interpretation • Different opinions • The world is continually changing
Exploiting Diverse Sources of Scientific Data: the vision • To provide scientists with technological solutions to exploit the wealth and diversity of scientific data for: • Discovery • Access • Sharing • Integration/Linking • Analysis • Thereby improving the potential for new scientific discovery
ESG Projects in most sciences:
SEEK (Scientific Environment for Ecological Knowledge): Vision • Research, develop, and capitalize upon advances in information technology to radically improve the type and scale of ecological science that can be addressed • Scalable synthesis Michener
Data Dispersion Challenges • Data are massively dispersed • Ecological field stations and research centers (hundreds) • Natural history museums and biocollection facilities (hundreds) • Agency data collections (tens to hundreds) • Individual scientists (thousands) • Maintenance must be local Michener
Data Integration Challenges • Data are heterogeneous • Syntax (format) • Schema (model) • Semantics (meaning) Jones
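A minimal sketch of the three kinds of heterogeneity, assuming two hypothetical sources (the site names, column names and values are invented for illustration): source A arrives as CSV text with a `temp_F` column in Fahrenheit, source B as parsed records with a `water_temperature` column in Celsius. Parsing resolves syntax, renaming columns resolves schema, and the unit conversion is only possible once the meaning of each column is known.

```python
import csv
import io

# Hypothetical source A: CSV text, column "temp_F", values in Fahrenheit.
SOURCE_A = "site,temp_F\nLake1,68.0\nLake2,50.0\n"
# Hypothetical source B: already-parsed records, column "water_temperature", Celsius.
SOURCE_B = [{"station": "Lake3", "water_temperature": 12.5}]

def read_source_a(text):
    """Syntax: parse the CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def harmonise_a(record):
    """Schema and semantics: rename fields and convert Fahrenheit to Celsius."""
    fahrenheit = float(record["temp_F"])
    return {"site": record["site"], "temp_c": round((fahrenheit - 32) * 5 / 9, 2)}

def harmonise_b(record):
    """Schema: rename fields; the values are assumed to be Celsius already."""
    return {"site": record["station"], "temp_c": float(record["water_temperature"])}

def integrate():
    """Merge both sources into one common schema."""
    merged = [harmonise_a(r) for r in read_source_a(SOURCE_A)]
    merged += [harmonise_b(r) for r in SOURCE_B]
    return merged

for row in integrate():
    print(row)
```

Every mapping rule here is written by hand for these two sources; this is the cost that shared metadata, ontologies and standard formats aim to reduce.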
Ecological Modeling Challenges • Analysis and modeling tools are: • Specialized • Disconnected • Proprietary • It is: • Difficult to revise analyses • Hard to document analyses • Impossible to reliably publish models to share with colleagues • Hard to re-use models and analyses from colleagues • Difficult to use grid-computing for demanding computations • Labor-intensive to manage data in popular analysis software Michener
Exploiting Diverse Sources of Scientific Data: the approaches • Data Discovery/Access • Metadata • To describe the data sets • Ontologies • To define the terminology used • Standardisation of formats • For the exchange of data • Life Science Identifiers (LSIDs) • To uniquely identify and resolve data objects • Provenance of data • To record where the data has come from and what has happened to it en route • GRID/Web technology • Distributed data management
Exploiting Diverse Sources of Scientific Data: the approaches • Data Integration/Linking • Metadata • To know how to interpret the data sets • Ontologies • To know how data in the data sets might be related • To aid automatic transformation of the data • Standardisation of formats • To ease integration • Life Science Identifiers (LSIDs) • To know when two things are the same • Workflows • To enable refinement and repetition of integration
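How an LSID lets software decide that two references denote the same thing can be sketched in Python. This assumes the `urn:lsid:authority:namespace:objectID[:revision]` layout of LSIDs; the example identifiers are invented, and the rule that identifiers differing only in revision name versions of one object is an assumption for illustration. Resolving an LSID to its data or metadata requires a resolution service and is not shown.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:objectID[:revision].
    Assumes none of the parts themselves contain a colon."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

def same_object(a: str, b: str) -> bool:
    """Assumption: identifiers that differ only in revision refer to one object."""
    x, y = parse_lsid(a), parse_lsid(b)
    return x[:3] == y[:3]  # compare authority, namespace, object ID

a = "urn:lsid:example.org:specimens:1234"
b = "URN:LSID:example.org:specimens:1234:2"
print(same_object(a, b))
print(same_object(a, "urn:lsid:example.org:specimens:9999"))
```

Two data sets that cite the same LSID can then be linked without comparing names or descriptions.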
Exploiting Diverse Sources of Scientific Data: the approaches • Data Analysis • Metadata • To know how to interpret the data sets • Ontologies • To know which analytical/transformation processes are appropriate • Workflow Tools • To ease analytical processes • To record and reuse analytical processes • Provenance • To record the life history of data • To enable validation
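A minimal sketch of a workflow that records provenance as it runs, assuming a hypothetical `Workflow` class (not the API of any particular workflow tool). Each step logs a fingerprint of its input and output, so the life history of a result can be checked later: re-running the same steps on the same input should reproduce the same fingerprints.

```python
import hashlib
import json
from datetime import datetime, timezone

def fingerprint(value):
    """Short, stable hash of a JSON-serialisable value."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()[:12]

class Workflow:
    """Hypothetical: runs named steps in order and logs one provenance entry per step."""

    def __init__(self):
        self.steps = []
        self.provenance = []

    def add_step(self, name, func):
        self.steps.append((name, func))
        return self  # allow chaining

    def run(self, data):
        for name, func in self.steps:
            result = func(data)
            self.provenance.append({
                "step": name,
                "input": fingerprint(data),
                "output": fingerprint(result),
                "time": datetime.now(timezone.utc).isoformat(),
            })
            data = result
        return data

def make_workflow():
    """A recorded, reusable analysis: drop missing values, then convert F to C."""
    return (Workflow()
            .add_step("drop_missing", lambda xs: [x for x in xs if x is not None])
            .add_step("to_celsius", lambda xs: [round((x - 32) * 5 / 9, 1) for x in xs]))

wf = make_workflow()
result = wf.run([68.0, None, 50.0])
print(result)
for entry in wf.provenance:
    print(entry["step"], entry["input"], "->", entry["output"])
```

Because the steps are defined once in `make_workflow`, the same analysis can be repeated or refined, and its provenance log compared against an earlier run.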
Exploiting Diverse Sources of Scientific Data: the technologies • Standardisation of formats • Metadata • Ontologies • Life Science Identifiers (LSIDs) • Provenance • Workflow Tools • GRID/Web technology
Meta Data: the vision • Meta data: "data about data" • e.g. keywords, title, creator… • If scientists marked up their data with the agreed meta data, it would be trivial to find highly relevant data (sub-)sets for analysis… • Meta-utopia…
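A minimal sketch of the metadata vision, assuming a hypothetical in-memory catalogue whose records carry a title, a creator and a set of keywords (all values invented). Finding data then reduces to matching keywords, which works only as long as everyone uses the same terms.

```python
from dataclasses import dataclass, field

@dataclass
class MetadataRecord:
    """Hypothetical record: a few descriptive fields about one data set."""
    title: str
    creator: str
    keywords: set[str] = field(default_factory=set)

CATALOGUE = [
    MetadataRecord("Lake temperature survey", "Example Lab A", {"temperature", "lake", "hydrology"}),
    MetadataRecord("Grassland plant census", "Example Lab B", {"vegetation", "ecology"}),
]

def find(catalogue, *terms):
    """Return records whose keywords include every requested term (case-insensitive)."""
    wanted = {t.lower() for t in terms}
    return [r for r in catalogue if wanted <= {k.lower() for k in r.keywords}]

print([r.title for r in find(CATALOGUE, "Temperature")])
print(find(CATALOGUE, "water temp"))  # a different wording finds nothing
```

The second query fails even though the lake survey is relevant: exact keyword matching depends entirely on everyone agreeing on the vocabulary.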
Meta-utopia • A world of complete, reliable metadata. • In meta-utopia, • Everyone uses the same language • and means the same thing… • The guardians of epistemology have rationally mapped out a schema or hierarchy of ideas • that everyone adheres to… • Scientists accurately describe their methods, processes and results • so anyone can do anything with them in the future… Cory Doctorow
Meta Data: the approach • Common language • XML Schemas to describe data/meta data • Domain specific exchange schemas • Explosion of these in every domain • Exchanging data • Archiving data
Ecological Metadata Language (EML): a look inside the meta-utopia of ecology
Discovery: coverage elements • Geographic • Temporal • Taxonomic
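The coverage elements are what make discovery queries possible. The fragment below is a simplified sketch modelled on EML's `geographicCoverage`, `temporalCoverage` and `taxonomicCoverage` elements; it is not validated against the EML schema, and the coordinates, dates and taxon are invented. The hypothetical `matches` function answers "does this data set cover this place, date and taxon?"

```python
import xml.etree.ElementTree as ET
from datetime import date

# Simplified, hypothetical fragment modelled on EML coverage (not schema-validated).
COVERAGE_XML = """<coverage>
  <geographicCoverage>
    <geographicDescription>Example study area</geographicDescription>
    <boundingCoordinates>
      <westBoundingCoordinate>-3.5</westBoundingCoordinate>
      <eastBoundingCoordinate>-3.0</eastBoundingCoordinate>
      <northBoundingCoordinate>56.0</northBoundingCoordinate>
      <southBoundingCoordinate>55.8</southBoundingCoordinate>
    </boundingCoordinates>
  </geographicCoverage>
  <temporalCoverage>
    <rangeOfDates>
      <beginDate><calendarDate>2001-01-01</calendarDate></beginDate>
      <endDate><calendarDate>2003-12-31</calendarDate></endDate>
    </rangeOfDates>
  </temporalCoverage>
  <taxonomicCoverage>
    <taxonomicClassification>
      <taxonRankName>Genus</taxonRankName>
      <taxonRankValue>Quercus</taxonRankValue>
    </taxonomicClassification>
  </taxonomicCoverage>
</coverage>"""

def parse_coverage(xml_text):
    """Extract the bounding box, date range and taxon names from a coverage fragment."""
    root = ET.fromstring(xml_text)
    box = root.find("geographicCoverage/boundingCoordinates")
    bounds = {side: float(box.findtext(f"{side}BoundingCoordinate"))
              for side in ("west", "east", "north", "south")}
    dates = root.find("temporalCoverage/rangeOfDates")
    begin = date.fromisoformat(dates.findtext("beginDate/calendarDate"))
    end = date.fromisoformat(dates.findtext("endDate/calendarDate"))
    # iter() also reaches nested classifications (e.g. a species inside a genus).
    taxa = {c.findtext("taxonRankValue") for c in root.iter("taxonomicClassification")}
    return bounds, begin, end, taxa

def matches(xml_text, lon, lat, when, taxon):
    """True if the point, date and taxon all fall within the declared coverage.
    Simplification: boxes crossing the 180th meridian are not handled."""
    bounds, begin, end, taxa = parse_coverage(xml_text)
    in_box = (bounds["west"] <= lon <= bounds["east"]
              and bounds["south"] <= lat <= bounds["north"])
    return in_box and begin <= when <= end and taxon in taxa

print(matches(COVERAGE_XML, -3.2, 55.9, date(2002, 6, 1), "Quercus"))
```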
Meta Data: the approach • Common language • XML Schemas to describe data/meta data • Domain specific exchange schemas • Explosion of these in every domain • Exchanging data • Archiving data • Turned into extensive specifications • Difficult to know where to stop…
But even this wasn’t enough… • It is not enough to have meta data; we need to know what the terms in the meta data (schema or data values) mean.
Ontologies: the vision • If we understood the meaning of the schema and of the terms used in the meta data or databases, we would be able to: • find things more reliably, • integrate things more easily, • reason about which things are comparable… • because we would have support for automatic inference
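The automatic-inference point can be sketched with the simplest reasoning an ontology supports: following subclass links transitively. The mini-ontology and data set annotations below are hypothetical; in practice an RDFS or OWL reasoner derives the same conclusions from `rdfs:subClassOf` statements, along with much more.

```python
# Hypothetical mini-ontology: term -> its direct superclasses.
SUBCLASS_OF = {
    "LakeTemperature": {"WaterTemperature"},
    "SeaSurfaceTemperature": {"WaterTemperature"},
    "WaterTemperature": {"Temperature"},
    "AirTemperature": {"Temperature"},
    "Temperature": {"PhysicalQuantity"},
    "Depth": {"PhysicalQuantity"},
}

def ancestors(term):
    """All superclasses of a term, inferred by following subclass links transitively."""
    seen = set()
    stack = list(SUBCLASS_OF.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return seen

def is_a(term, concept):
    return term == concept or concept in ancestors(term)

# Hypothetical data sets, each annotated with the most specific term that applies.
DATASETS = {"survey-1": "LakeTemperature", "survey-2": "AirTemperature", "survey-3": "Depth"}

def find_datasets(concept):
    """Find data sets annotated with the concept or any of its inferred subclasses."""
    return sorted(name for name, term in DATASETS.items() if is_a(term, concept))

print(find_datasets("Temperature"))
```

A search for `Temperature` finds the lake survey even though no one annotated it with that word; the link is inferred rather than stored.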
Ontologies: the approach • Common language… • OWL? • RDF, OWL Lite, OWL DL, OWL Full… • Domain-specific ontologies • or project-specific? • Map different ontologies • Modularise the ontologies • Reuse… • Build upper ontologies which domain ontologies extend or link to
BDI Core BioObservation Similar to…
SEEK Observation ontology Josh Madin
Entity: an extension point for domain-specific terms Josh Madin
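A minimal sketch of the extension-point idea, loosely modelled on the observation ontology above. `Observation` and `Entity` follow the slides; `Measurement`, its fields and the domain class `Tree` are hypothetical, as are the example values. Generic code written against `Entity` keeps working when a domain adds its own subclass.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    """Generic thing being observed; domain ontologies extend it by subclassing."""
    label: str

@dataclass(frozen=True)
class Tree(Entity):
    """Hypothetical domain-specific extension (e.g. from a forestry ontology)."""
    species: str = ""

@dataclass(frozen=True)
class Measurement:
    """Hypothetical: what was measured, the value, and its unit."""
    characteristic: str
    value: float
    unit: str

@dataclass(frozen=True)
class Observation:
    entity: Entity
    measurements: tuple = ()

def values(observations, entity_type, characteristic):
    """Generic query: written against Entity, it also matches domain subclasses."""
    return [m.value
            for o in observations if isinstance(o.entity, entity_type)
            for m in o.measurements if m.characteristic == characteristic]

OBSERVATIONS = (
    Observation(Tree("oak-17", species="Quercus robur"),
                (Measurement("Height", 12.3, "metre"),)),
    Observation(Entity("plot-A"), (Measurement("Area", 100.0, "square metre"),)),
)

print(values(OBSERVATIONS, Entity, "Height"))  # the Tree counts as an Entity
```

The core model never mentions trees; domain terms plug in underneath `Entity`, which is what keeps the observation ontology reusable across disciplines.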