
Biomedical Informatics Group (UPM)



Presentation Transcript


  1. Biomedical Informatics Group (UPM) • WP4: Data Interoperability and Management • David Pérez-Rey – UPM • Miguel García-Remesal – UPM

  2. Agenda • SoA • Ontology-based Data Integration • INFORMA pilot

  3. SoA • Contents • Data characteristics and ontologies • Integration approaches • Ethics & Confidentiality issues • Support for pilots

  4. Ontology-based Data Integration • Schema-level integration • ONTOFUSION update for INFOBIOMED (Web Services and OWL) • Instance-level integration • ONTODATACLEAN for INFOBIOMED • Combination into a single system

  5. Schema-level Integration - ONTOFUSION • Each information source is represented by a “virtual schema” – that is an ontology describing the conceptual structure of the information • “Virtual schemas” are obtained from a mapping process between the physical structure and the ontology: • Top-Down methodology: Domain ontology already created • Bottom-Up methodology: Creating a new domain ontology • Hybrid methodology

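  6. Schema-level Integration (diagram) • Physical DBs hold the local data, e.g. SNPs stored in fields named SNP_Code and Id_SNP • Mapping produces Virtual Schemas as Ontologies (concept: SNP) • Unification produces Unified Virtual Schemas • The USER runs a SEARCH against the unified virtual schemas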
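The mapping and unification steps on slides 5 and 6 can be illustrated with a minimal Python sketch. It is not ONTOFUSION code: the `VirtualSchema` class, the `unified_query` function, and the sample identifiers are hypothetical, and only the names `SNP`, `SNP_Code`, and `Id_SNP` come from the slide. The sketch assumes each source maps one shared concept to its own physical column name.

```python
from dataclasses import dataclass


@dataclass
class VirtualSchema:
    """Hypothetical stand-in for a virtual schema over one physical database."""
    source: str
    mapping: dict  # ontology concept -> physical column name
    rows: list     # in-memory stand-in for the physical database

    def query(self, concept):
        # Translate the concept into this source's own column name.
        column = self.mapping[concept]
        return [row[column] for row in self.rows]


def unified_query(schemas, concept):
    """Collect the values of one concept from every source that maps it."""
    results = []
    for vs in schemas:
        if concept in vs.mapping:
            results.extend((vs.source, value) for value in vs.query(concept))
    return results


# Sample values are made up for illustration only.
db1 = VirtualSchema("DB1", {"SNP": "SNP_Code"},
                    [{"SNP_Code": "snp-a"}, {"SNP_Code": "snp-b"}])
db2 = VirtualSchema("DB2", {"SNP": "Id_SNP"}, [{"Id_SNP": "snp-c"}])

print(unified_query([db1, db2], "SNP"))
```

The user searches by concept only; the per-source mapping hides the fact that the two databases name the same field differently.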

  7. Instance-level Integration - ONTODATACLEAN • An ontology as a framework to identify inconsistencies • Terminology • Scale • Format • Patterns • Missing Values • … • Afterwards, automatic preprocessing

  8. Instance-level Integration (example transformations)
     DB1 | DB2 | Transformation
     C>T | 12 | ‘C’ -> ‘1’, ‘>’ -> ‘’, ‘T’ -> ‘2’
     1.0 | 100 | DB1 x 100
     Fever | High temperature | High temperature -> Fever
     Male | 1 | Male -> 1
     … | … | …
     16/11/05 | 16-11-2005 | …
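The transformations listed on slide 8 can be sketched in Python as below. This is an illustration only, not ONTODATACLEAN code: the function names are hypothetical, and each rule simply restates one row of the slide. The date row is left out because the slide does not specify its transformation.

```python
# Pattern rule from the slide: 'C' -> '1', '>' -> '', 'T' -> '2'
ALLELE_TABLE = str.maketrans({"C": "1", ">": "", "T": "2"})

# Terminology rule from the slide: High temperature -> Fever
TERM_SYNONYMS = {"High temperature": "Fever"}

# Coding rule from the slide: Male -> 1
SEX_CODES = {"Male": 1}


def encode_alleles(value):
    """Rewrite a DB1 pattern such as 'C>T' into the DB2 coding."""
    return value.translate(ALLELE_TABLE)


def rescale(value):
    """Scale rule from the slide: DB1 x 100."""
    return value * 100


def normalize_term(term):
    """Replace a known synonym; leave unknown terms unchanged."""
    return TERM_SYNONYMS.get(term, term)


def encode_sex(value):
    """Map a label to its numeric code; leave unknown labels unchanged."""
    return SEX_CODES.get(value, value)


print(encode_alleles("C>T"))
print(rescale(1.0))
print(normalize_term("High temperature"))
print(encode_sex("Male"))
```

Keeping each rule as data (a translation table or a dictionary) is one way to reflect the idea that the preprocessing is driven by declared knowledge and then applied automatically.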

  9. System Architecture (diagram labels) • User • Web Client • HTTP • Web Server • Web Services Platform • VS Service (three instances) • Instance Homogenization • Service • Results

  10. Experiments • Testing with Public Databases • Reactome • Gepas – Fibroblast • BioMérieux • Contents of the databases can be downloaded

  11. Reactome • A knowledge base of biological pathways • Terminological inconsistencies (UMLS) and missing values http://www.reactome.org

  12. GEPAS - Fibroblast • The Gene Expression Pattern Analysis Suite • Integrated web-based pipeline for the analysis of gene expression patterns • Scale Transformations http://www.gepas.org

  13. BioMérieux • Biochemical characterization of bacteriological agents • Pattern transformation http://www.biomerieux.com

  14. Data Mining Experiments • Public data sets for data mining • Preprocessing ontology development • Result comparison after preprocessing

  15. Data Mining Experiments • Breast cancer • Clinical data (Ljubljana Oncology Institute, Wisconsin, others) • Gene expression (Duke University and Kent Ridge Biomedical Data Set) • Thyroid – hyper- and hypothyroidism • 6 databases from the Garavan Institute (Sydney) • Others

  16. INFORMA Pilot • Document sent to the consortium by INFORMA • Subtopic 1: HIV subtyping and URF repository • Subtopic 2: HIV in vitro drug susceptibility predictor • Subtopic 3: HIV treatment response repository • Subtopic 4: HIV treatment database integration

  17. Integration Approaches: Centralized vs Distributed (diagram) • Centralized: Hospital 1, Hospital 2, Hospital 3, Hospital 4, … Hospital n connected to a Centralized Repository • Distributed: Hospital 1, Hospital 2, Hospital 3, Hospital 4, … Hospital n, each with its own database (DB)

  18. INFORMA Pilot • Objectives • Develop a web-based tool to facilitate export or access of data between a user’s database and the internal database used to implement the functions available on the HIV pol analysis portal • Define a standard (possibly based on ARCA) • Type of data to be handled • HIV pol sequences likely to be recombinants • HIV pol sequences matched with in vitro drug susceptibility • HIV pol sequences coupled with treatment used and follow-up data

  19. INFORMA Pilot • Pilot Challenges • Heterogeneity of data sources • Different schemas, technology… • Heterogeneous data • Need to maintain local autonomy and preferences • Ethical and security issues - Custodix • Privacy, security • Anonymization of sensitive data • Features - Semi-automatically handled: • Heterogeneity conflicts • Semantic conflicts • Descriptive conflicts • Structural conflicts • Implementation status • To be discussed and defined in the Madrid meeting (end of May)

  20. Future actions • Deliverable D25 – “First report on Data Interoperability and Management” – Month 39 • INFORMA Mini-Pilot meeting (End of May in Madrid) – Other partners are welcome • Other collaborations
