RDFizing the EBI Gene Expression Atlas. James Malone, Electra Tapanari (malone@ebi.ac.uk)
Motivation • The initial motivation is exploratory: • Can we ask new questions? • Do we get new answers? • Can we integrate this data with other related data? • Is there a sufficient user community to justify an RDF Atlas resource?
SESL Project • Semantic Enrichment of Scientific Literature (SESL) working group • Includes EBI (Dietrich Rebholz-Schumann) and the Pistoia Alliance • 2010 pilot project developing knowledge-brokering standards for semantically integrating gene and Type II diabetes data, using the Gene Expression Atlas, OMIM, UniProt and the literature
Gene Expression: Archive to Atlas • [Figure: the ArrayExpress archive (AE/GEO acquisition, curation) feeding the Atlas (re-annotation and summarization); figures shown: >250,000 assays, >10,000 experiments]
Experimental Factor Ontology • We consume parts of reference ontologies from the domain • We construct new classes and relations to answer our use cases • The aim is reuse of existing resources and shared frameworks, and mapping of equivalences where they exist • [Figure: ontologies and sources feeding EFO: Chemical Entities of Biological Interest (ChEBI), Relation Ontology, Ontology for Biomedical Investigations, text mining, various species anatomy ontologies, Anatomy Reference Ontology, Disease Ontology] 10/4/2014
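As a minimal sketch of recording equivalences where they exist, the Java below emits one `owl:equivalentClass` triple (N-Triples syntax) per EFO-to-external class mapping. Whether EFO itself records equivalences this way, rather than as cross-reference annotations, is an assumption; the `EquivalenceMapper` class and the example.org IRIs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns a table of EFO class -> equivalent external class
// into owl:equivalentClass triples (N-Triples syntax). Not EFO's actual build code.
class EquivalenceMapper {
    static final String OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass";

    // mappings: EFO class IRI -> IRI of the equivalent class in a reference ontology.
    // Classes with no known equivalent are simply absent from the map.
    static List<String> equivalenceTriples(Map<String, String> mappings) {
        List<String> triples = new ArrayList<>();
        // TreeMap sorts by EFO IRI so the output order is deterministic
        for (Map.Entry<String, String> e : new TreeMap<>(mappings).entrySet()) {
            triples.add("<" + e.getKey() + "> <" + OWL_EQUIVALENT_CLASS + "> <" + e.getValue() + "> .");
        }
        return triples;
    }
}
```

Leaving unmapped classes out of the map mirrors the "where they exist" qualification: no equivalence is asserted for them.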
Gene Expression Atlas @ www.ebi.ac.uk/gxa • Example: query for cell adhesion genes in all ‘organism parts’ • ‘View on EFO’
Input XML
Mapping XML Results to RDF (1) • Gene linked to related transcripts, sequence and gene functions • EFO ontology classes are also rendered in RDF (the example shows a label-to-IRI triple) • The id here is an Ensembl gene ID, e.g. RUNX1 (ENSG00000159216)
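The gene-centric triples described above might be produced along these lines. This is a sketch, not the Atlas code: the example.org namespaces, the `hasTranscript` property and the `GeneTriples` class are hypothetical placeholders; only `rdfs:label` is a standard term. The Ensembl gene ID becomes the last segment of the gene IRI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of gene-centric triples in N-Triples syntax.
// GENE_NS, TRANSCRIPT_NS and HAS_TRANSCRIPT are placeholders, not the Atlas's real IRIs;
// rdfs:label is the standard RDF Schema property.
class GeneTriples {
    static final String GENE_NS = "http://example.org/atlas/gene/";
    static final String TRANSCRIPT_NS = "http://example.org/atlas/transcript/";
    static final String HAS_TRANSCRIPT = "http://example.org/atlas/vocab#hasTranscript";
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    static String iri(String s) {
        return "<" + s + ">";
    }

    // Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines.
    static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Gene IRI built from the Ensembl gene ID; one label triple plus one triple per transcript.
    static List<String> geneTriples(String ensemblGeneId, String symbol, List<String> transcriptIds) {
        String gene = iri(GENE_NS + ensemblGeneId);
        List<String> triples = new ArrayList<>();
        triples.add(gene + " " + iri(RDFS_LABEL) + " " + literal(symbol) + " .");
        for (String t : transcriptIds) {
            triples.add(gene + " " + iri(HAS_TRANSCRIPT) + " " + iri(TRANSCRIPT_NS + t) + " .");
        }
        return triples;
    }

    // Label-to-IRI triple for an EFO class: <class IRI> rdfs:label "label" .
    static String efoLabelTriple(String efoClassIri, String label) {
        return iri(efoClassIri) + " " + iri(RDFS_LABEL) + " " + literal(label) + " .";
    }
}
```

For example, `geneTriples("ENSG00000159216", "RUNX1", transcripts)` yields a label triple for RUNX1 followed by one transcript link per (placeholder) transcript ID.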
Mapping XML Results to RDF (2) • Connecting the gene and the ontology ID with the experimental metrics
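One way to connect a gene, an ontology ID and the experimental metrics is to hang them all off one blank node per result, as sketched below. The property names, the example.org namespace and the choice of an up/down direction plus a p-value as the metrics are assumptions for illustration; the slide does not say which metrics are emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one blank node per (gene, EFO class) result, carrying the metrics.
// The vocabulary IRIs and the choice of metrics (direction, p-value) are assumptions.
class ExpressionResultTriples {
    static final String V = "http://example.org/atlas/vocab#";
    static final String XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double";

    static List<String> resultTriples(String nodeId, String geneIri, String efoIri,
                                      String direction, double pValue) {
        if (!direction.equals("up") && !direction.equals("down")) {
            throw new IllegalArgumentException("direction must be up or down: " + direction);
        }
        String node = "_:" + nodeId; // blank node ties the gene, the class and the numbers together
        List<String> triples = new ArrayList<>();
        triples.add(node + " <" + V + "gene> <" + geneIri + "> .");
        triples.add(node + " <" + V + "efoClass> <" + efoIri + "> .");
        triples.add(node + " <" + V + "direction> \"" + direction + "\" .");
        // For finite values, Double.toString gives a valid xsd:double lexical form (e.g. 0.001, 1.0E-5)
        triples.add(node + " <" + V + "pValue> \"" + pValue + "\"^^<" + XSD_DOUBLE + "> .");
        return triples;
    }
}
```

A blank node is used because the result itself needs no global identifier; it exists only to group the gene, the class and the numbers.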
Mapping XML Results to RDF (3) • Connecting the gene with the experimental metadata
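A similar sketch for attaching experimental metadata: the gene is linked to an experiment resource, and the experiment carries its metadata as literals. The `studiedIn` property, the example.org IRIs and the metadata keys are hypothetical, and the accession used in the test is a placeholder rather than a real ArrayExpress entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: link a gene to an experiment and describe the experiment with literals.
// The vocabulary, namespaces and metadata keys are placeholders, not the Atlas's schema.
class ExperimentMetadataTriples {
    static final String V = "http://example.org/atlas/vocab#";
    static final String EXP_NS = "http://example.org/atlas/experiment/";

    // Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines.
    static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Metadata keys are assumed to be simple names that are safe to append to V.
    static List<String> metadataTriples(String geneIri, String accession, Map<String, String> metadata) {
        String experiment = "<" + EXP_NS + accession + ">";
        List<String> triples = new ArrayList<>();
        triples.add("<" + geneIri + "> <" + V + "studiedIn> " + experiment + " .");
        // TreeMap sorts keys so the output order is deterministic
        for (Map.Entry<String, String> e : new TreeMap<>(metadata).entrySet()) {
            triples.add(experiment + " <" + V + e.getKey() + "> " + literal(e.getValue()) + " .");
        }
        return triples;
    }
}
```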
Relationship Issues • EFO attempts to follow OBO Foundry guidance and uses the OBO Relation Ontology (RO) • The OBI model is more complex; e.g. the relation between a sample and a measure is indirect* • Relationships between some entities are still not well represented across the community, even protein product to gene (see my post to the OBO list) • The is_about relation is very generic and largely meaningless • We will use RO where possible, subclass RO relations otherwise, and continue to monitor OBO *See Brinkman et al. (2010) Modeling biomedical experimental processes with OBI, JBMS 1(Suppl 1):S7
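The policy in the last bullet (reuse an RO relation where one fits, otherwise define a local relation under an RO parent) can be sketched as a lookup. For properties, "subclassing" corresponds to `rdfs:subPropertyOf`. The `RelationPolicy` class, the relation names, the local namespace and the parent IRI used in the test are all hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "use RO where possible, subclass RO otherwise".
class RelationPolicy {
    static final String SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf";

    // Returns the property IRI to use for a relation name.
    // roRelations maps relation names to existing RO property IRIs. When there is no entry,
    // a local property is minted and declared as a sub-property of roParent; the
    // declaration triple is appended to declarations.
    static String resolve(String name, Map<String, String> roRelations, String localNs,
                          String roParent, List<String> declarations) {
        String existing = roRelations.get(name);
        if (existing != null) {
            return existing; // reuse the RO relation directly
        }
        String local = localNs + name;
        declarations.add("<" + local + "> <" + SUB_PROPERTY_OF + "> <" + roParent + "> .");
        return local;
    }
}
```

Declaring the local relation as a sub-property keeps it queryable through its RO parent, so a later switch to an official RO relation need not break existing queries.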
Display of query results in the Gene Expression Atlas DB • Already available: 1) JSON format, 2) XML format • New: 3) RDF format
RDF pipeline • Pipeline for generating RDF from the XML input • Note: this works with any XML input • Input: the XML result doc from the Atlas and an XML doc with triple patterns; Process: Java code; Output: RDF triples
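The pipeline can be sketched in Java with the JDK's DOM parser. The document formats here are assumptions, since the slides do not show them: results are taken to be `<row>` elements with attributes, and each triple pattern is a `<triple s="..." p="..." o="..."/>` element whose `{name}` placeholders are filled from the row's attributes (`{row}` is the 0-based row index). Values starting with `_:` are emitted as blank nodes, and `literal="true"` marks a literal object. `XmlToRdf` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the pipeline: results XML + triple-pattern XML -> N-Triples lines.
class XmlToRdf {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Replace each {name} with the row's attribute "name"; {row} becomes the 0-based row index.
    static String fill(String template, Element row, int index) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String value = key.equals("row") ? Integer.toString(index) : row.getAttribute(key);
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Blank nodes ("_:...") pass through unchanged; anything else is treated as an IRI.
    static String resource(String value) {
        return value.startsWith("_:") ? value : "<" + value + ">";
    }

    // Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines.
    static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Apply every <triple> pattern to every <row> element, in document order.
    static List<String> convert(String resultsXml, String patternsXml) throws Exception {
        NodeList rows = parse(resultsXml).getElementsByTagName("row");
        NodeList patterns = parse(patternsXml).getElementsByTagName("triple");
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            for (int j = 0; j < patterns.getLength(); j++) {
                Element p = (Element) patterns.item(j);
                String s = resource(fill(p.getAttribute("s"), row, i));
                String pred = "<" + fill(p.getAttribute("p"), row, i) + ">";
                String o = fill(p.getAttribute("o"), row, i);
                String obj = "true".equals(p.getAttribute("literal")) ? literal(o) : resource(o);
                triples.add(s + " " + pred + " " + obj + " .");
            }
        }
        return triples;
    }
}
```

Nothing in the Java is specific to the Atlas: the mapping lives entirely in the pattern document, so the same code accepts any XML of this shape.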
Triple Pattern specification
Example RDF
First row (n1_0): 7 triples • Blank node connections
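A quick way to check a row's output, such as the seven triples for the first row's blank node, is to count triples per blank-node subject. The sketch below assumes one N-Triples statement per line and inspects only the subject token, so literals containing spaces do not affect it; `BlankNodeSummary` is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: count triples per blank-node subject in N-Triples lines.
class BlankNodeSummary {
    static Map<String, Integer> countBySubject(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String line : lines) {
            // The subject is the first whitespace-separated token of an N-Triples statement
            String subject = line.trim().split("\\s+", 2)[0];
            if (subject.startsWith("_:")) {
                counts.merge(subject, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Blank nodes that appear only as objects are not counted, since the check is about what each row's node asserts.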
Discussion • Is there a community that warrants directing resources towards this? • Can we answer new questions? • Can we integrate with other data sources? • Can we consolidate complex, non-interoperable ontologies? • EFO represents one view on this, but it is a scoped, pragmatic choice; will that always be the case?
Acknowledgements • Electra Tapanari (intern who did the bulk of the implementation) • Dietrich Rebholz-Schumann (funding the internship) • Christoph Grabmuller • Misha Kapushesky • Helen Parkinson • Contact: James Malone, malone@ebi.ac.uk