Requirements for Semantic Biobanks. André Q ANDRADE a,b, Markus KREUZTHALER b, Janna HASTINGS d,e, Maria KRESTYANINOVA f,g, Stefan SCHULZ b,c. a School of Information Science, Federal University of Minas Gerais, Brazil
Requirements for Semantic Biobanks
André Q ANDRADE a,b, Markus KREUZTHALER b, Janna HASTINGS d,e, Maria KRESTYANINOVA f,g, Stefan SCHULZ b,c
a School of Information Science, Federal University of Minas Gerais, Brazil
b Medical University of Graz, Austria
c University Medical Center Freiburg, Germany
d European Bioinformatics Institute, Hinxton, UK
e University of Geneva, Switzerland
f Helsinki University, Finland
g Uniquer, Lausanne, Switzerland
Semantic Biobanks
• Semantic interoperability: systems exchange data + meaning
• Formal ontologies provide unambiguous descriptions of what is universally true for all objects of a certain type
• An increasing number of biomedical vocabularies are ontology-based (OBO Foundry, SNOMED CT…)
• Blood and tissue sampling for research
• Samples from several biobanks are needed to retrieve data for a specific research question
• Comprehensive annotations with lab data and clinical data
(Figure labels: Model of Meaning; Data)
(Generalized) Biomedical Retrieval Scenario
• Retrieval:
  • Distribution of heterogeneous resources of interest
  • Most retrieval scenarios are recall-oriented
  • Resources are used by multiple researchers around the world for multiple purposes
• Effective retrieval depends on querying resource metadata:
  • Provenance information
  • Content-based semantic annotations (structured vocabulary)
  • Access regulations
Does this sound familiar?
Analogy
• Global bibliographic database
• Resources: publications from different publishers
• Annotations:
  • Bibliographic data
  • Abstract
  • Semantic representation (MeSH) of paper content
• Local access conditions to the full resource apply
Analogy: Biobank “Broker”
• Global biobank sample database
• Resources: biological specimens (blood, tissue,…)
• Annotations:
  • Sample information (staining etc.)
  • Semantic representation of both lab and selected patient-related information (information models / ontologies)
• Local access conditions to the full resource apply
Data resources for biobanking
• Sample-related information:
  • Type of sample
  • Preparation of sample
  • Time
  • Storage information
  • Physical location
  • Associated information, lab data, genotype,…
• Donor-related information:
  • Demographic data
  • Phenotype data
  • Time-indexed clinical data (EHR extracts)
• Relevant donor-related information keeps accumulating after samples are taken
(Figure: timeline from 1960 to 2010)
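The two groups of information listed above can be pictured as a small data model. The following is only a minimal sketch under stated assumptions: the class names, field names, and example values are hypothetical and are not taken from the slides or from any biobank system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ClinicalEvent:
    """One time-indexed clinical fact, e.g. from an EHR extract."""
    code: str            # hypothetical terminology label
    recorded_on: date


@dataclass
class Donor:
    """Donor-related information: demographics plus clinical events."""
    donor_id: str
    birth_year: int
    sex: str
    events: list[ClinicalEvent] = field(default_factory=list)

    def events_after(self, day: date) -> list[ClinicalEvent]:
        """Clinical events recorded strictly after the given day."""
        return [e for e in self.events if e.recorded_on > day]


@dataclass
class Sample:
    """Sample-related information, linked to its donor."""
    sample_id: str
    sample_type: str     # e.g. "blood", "gastric mucosa"
    preparation: str     # e.g. staining or fixation method
    collected_on: date
    storage: str         # storage conditions
    location: str        # physical location inside the biobank
    donor: Donor


donor = Donor("D1", 1950, "F")
sample = Sample("S1", "gastric mucosa", "snap-frozen", date(2002, 5, 14),
                "-80 C", "freezer 3, rack B", donor)

# Donor information can keep growing long after the sample was taken.
donor.events.append(ClinicalEvent("cancer of stomach", date(2009, 3, 1)))
later = sample.donor.events_after(sample.collected_on)
print([e.code for e in later])
```

Keeping the donor as a separate object that the sample points to is one way to let clinical data accumulate without touching the sample record.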
Centralized broker for biobanking information
(Figure: several biobanks, each paired with an EHR source, and the centralized broker)
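As a rough illustration of the broker idea, the sketch below keeps only sample metadata in one central index and answers queries with the matching records and their local access conditions. Everything here is an assumption made for illustration: the `Broker` and `SampleRecord` names, the annotation strings, and the access texts are hypothetical, and plain string matching stands in for ontology-based annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """Metadata that a biobank publishes to the broker (not the sample itself)."""
    biobank: str
    sample_id: str
    annotations: frozenset[str]
    access_conditions: str


class Broker:
    """Central index over sample metadata from many biobanks."""

    def __init__(self) -> None:
        self._records: list[SampleRecord] = []

    def register(self, record: SampleRecord) -> None:
        self._records.append(record)

    def find(self, *required: str) -> list[SampleRecord]:
        """Records that carry every required annotation."""
        wanted = set(required)
        return [r for r in self._records if wanted <= r.annotations]


broker = Broker()
broker.register(SampleRecord("Biobank A", "A-17",
                             frozenset({"gastric mucosa", "HE staining"}),
                             "local ethics approval required"))
broker.register(SampleRecord("Biobank B", "B-03",
                             frozenset({"blood"}),
                             "material transfer agreement required"))
broker.register(SampleRecord("Biobank B", "B-09",
                             frozenset({"gastric mucosa"}),
                             "material transfer agreement required"))

for hit in broker.find("gastric mucosa"):
    print(hit.biobank, hit.sample_id, "|", hit.access_conditions)
```

The broker only tells the researcher where matching samples are and under which conditions; the full resource stays with the local biobank.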
Language for semantic annotations of biobank data
• Formal ontologies
  • Precise, logical descriptions of annotations and queries
  • High expressiveness through compositionality
  • OWL-DL: Semantic Web standard for description logics; allows formulating axioms about what is universally true of all instances of a kind
• Specific components
  • Ground axioms provided by an upper-level ontology (BioTop)
  • Set of disjoint upper-level categories and relations, together with related constraints
  • Ontological description of the domain: SNOMED CT, OBO Foundry…
Description logics representation and retrieval
• Example query: “retrieve all gastric mucosa samples from before 2003 of patients who had cancer of stomach after 2008”
• Representation language: OWL DL
• Editor: Protégé 4.2
• Reasoner: HermiT retrieves the matching samples
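To make the example query concrete, here is a minimal sketch of what it asks for, written as a plain Python filter. It is not OWL DL and does not use Protégé or HermiT: a tiny hand-written is-a table stands in for the ontology, and all terms, identifiers, and years are hypothetical. It is meant only to show why subsumption matters, since a sample whose donor has a more specific diagnosis should still be found.

```python
from dataclasses import dataclass

# Toy is-a hierarchy (child -> parent); a stand-in for a real terminology.
IS_A = {
    "adenocarcinoma of stomach": "cancer of stomach",
    "cancer of stomach": "neoplasm",
}


def subsumed_by(term, ancestor):
    """True if term equals ancestor or reaches it by following is-a links."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


@dataclass
class Diagnosis:
    term: str
    year: int


@dataclass
class Sample:
    sample_id: str
    tissue: str
    year: int                  # year the sample was taken
    diagnoses: list            # diagnoses of the donor, any time


def query(samples):
    """Gastric mucosa samples from before 2003 whose donor had
    cancer of stomach (or a subtype) after 2008."""
    return [
        s.sample_id
        for s in samples
        if subsumed_by(s.tissue, "gastric mucosa")
        and s.year < 2003
        and any(subsumed_by(d.term, "cancer of stomach") and d.year > 2008
                for d in s.diagnoses)
    ]


samples = [
    Sample("S1", "gastric mucosa", 2001, [Diagnosis("adenocarcinoma of stomach", 2010)]),
    Sample("S2", "gastric mucosa", 2005, [Diagnosis("cancer of stomach", 2009)]),
    Sample("S3", "gastric mucosa", 2000, [Diagnosis("cancer of stomach", 2006)]),
    Sample("S4", "colon mucosa", 1999, [Diagnosis("cancer of stomach", 2010)]),
]
print(query(samples))  # ['S1']
```

Only S1 matches: S2 was sampled too late, S3 has the diagnosis too early, and S4 is the wrong tissue. S1 is found through the is-a link even though its diagnosis is recorded with a more specific term.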
Requirements
• Formal representations
  • Ontological representation of information models and terminologies
  • Ontological representation of data about specimens
• Joint, universally used clinical terminology
• Expressive and stable upper-level ontologies (+ ontological relations)
• Scope and granularity of the EHR extract of interest for biobank-related queries
• Specification of structure and function of the central repository
• Steps for information translation from legacy systems
  • Mappings
  • Interfaces
  • Update policies
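The translation step from legacy systems can be sketched as a lookup from local codes to a shared terminology. This is an illustrative assumption only: the local codes, the shared terms, and the `translate_batch` helper are hypothetical, and a real mapping would target a clinical terminology rather than plain strings.

```python
# Hypothetical local codes of one legacy biobank system, mapped to shared terms.
LOCAL_TO_SHARED = {
    "MAG-SH": "gastric mucosa",
    "BLUT": "blood",
}


def translate_batch(records, mapping):
    """Return (translated records, set of local codes that have no mapping)."""
    translated, unmapped = [], set()
    for record in records:
        local_code = record["sample_type"]
        if local_code in mapping:
            # Copy the record and replace only the coded field.
            translated.append({**record, "sample_type": mapping[local_code]})
        else:
            unmapped.add(local_code)
    return translated, unmapped


legacy_records = [
    {"sample_id": "S1", "sample_type": "MAG-SH"},
    {"sample_id": "S2", "sample_type": "BLUT"},
    {"sample_id": "S3", "sample_type": "XYZ"},
]
translated, unmapped = translate_batch(legacy_records, LOCAL_TO_SHARED)
print(translated)
print(unmapped)
```

Collecting unmapped codes instead of failing on the first one gives the local site a list of mappings it still has to provide, which ties in with the update policies mentioned above.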
Challenges
• Prototypical status of DL reasoners and editors
• Performance problems with expressive ontologies
• Modularization of large clinical terminologies in response to the data and query under scrutiny
• Organization of
  • Central repository
  • Local mappings / translations
  • Logistics (samples)
• Privacy and IP issues
• Business model
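One simple way to picture modularization is to keep only the terms that the data and the query actually mention, together with their ancestors. The sketch below does exactly that on a toy hierarchy; it is a deliberately simplified stand-in, not a description-logic module extraction method, and the terms and the `extract_module` function are hypothetical.

```python
# Toy terminology: each term lists its direct parents (hypothetical content).
PARENTS = {
    "adenocarcinoma of stomach": ["cancer of stomach"],
    "cancer of stomach": ["neoplasm", "disorder of stomach"],
    "disorder of stomach": ["disorder"],
    "neoplasm": ["disorder"],
    "fracture of femur": ["fracture"],
    "fracture": ["disorder"],
    "disorder": [],
}


def extract_module(parents, signature):
    """Keep the signature terms and everything reachable upward from them."""
    module = {}
    stack = list(signature)
    while stack:
        term = stack.pop()
        if term in module:
            continue  # already visited
        term_parents = parents.get(term, [])
        module[term] = list(term_parents)
        stack.extend(term_parents)
    return module


# Terms used by the data and query at hand.
module = extract_module(PARENTS, {"adenocarcinoma of stomach"})
print(sorted(module))
print(len(module), "of", len(PARENTS), "terms kept")
```

On a terminology with hundreds of thousands of concepts, reasoning over such a reduced part rather than the whole is the kind of saving that modularization aims for.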
Thanks
Andrade et al.: Requirements for Semantic Biobanks
• CAPES (Brazil) – Programa de Doutorado no País com Estágio no Exterior
• FP7 – NoE SemanticHealthNet