
Protein Data Integration through Ontologies



Presentation Transcript


  1. Protein Data Integration through Ontologies
     Digital Ecosystems & Business Intelligence Institute, Curtin University of Technology, Perth, Australia
     http://www.debii.curtin.edu.au/

  2. Outline
     • Existing Interoperability of Biological Data
     • Inconsistency in Protein Data Sources
     • Need for Protein Ontology
     • Protein Ontology Project
     • Protein Ontology (PO)
     • PO Algebra
     • PO Instance Store
     • Selected References

  3. Existing Interoperability of Biological Data
     • Biological data must be described in context rather than in isolation.
     • Databases provide multiple links to other resources, but using these links efficiently requires intelligent retrieval systems.
     • Attempts have been made to create inter-database links automatically, but they have been restricted to a few selected data resources and have had limited accuracy.
     • An alternative approach is the warehouse: a centralized data resource that manages a variety of data collections translated into a common format.
     • The more recent ‘middleware’ approach makes it possible to decouple data access from data management and to support remote retrieval beyond simple scripts that fetch data from external databases.
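The difference between the warehouse and middleware approaches can be sketched in a few lines. This is a minimal illustration under stated assumptions, not part of the Protein Ontology project: the two sources, their field names (`acc`, `prot_name`, `id`, `title`), the common format (`accession`, `name`) and the wrapper functions are all hypothetical. The warehouse translates and copies everything once; the mediator consults each source's wrapper only when a query arrives.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical raw records: each source uses its own field names.
SOURCE_A = [{"acc": "P001", "prot_name": "Major prion protein"}]
SOURCE_B = [{"id": "P002", "title": "Haemoglobin subunit alpha"}]


def wrap_a() -> List[Record]:
    """Hypothetical wrapper translating source A's schema into the common format."""
    return [{"accession": r["acc"], "name": r["prot_name"]} for r in SOURCE_A]


def wrap_b() -> List[Record]:
    """Hypothetical wrapper translating source B's schema into the common format."""
    return [{"accession": r["id"], "name": r["title"]} for r in SOURCE_B]


WRAPPERS: List[Callable[[], List[Record]]] = [wrap_a, wrap_b]


def build_warehouse() -> List[Record]:
    """Warehouse: translate and copy every record once, before any query runs."""
    return [rec for wrap in WRAPPERS for rec in wrap()]


def mediator_query(predicate: Callable[[Record], bool]) -> List[Record]:
    """Middleware: consult each wrapper at query time and filter the results."""
    return [rec for wrap in WRAPPERS for rec in wrap() if predicate(rec)]
```

For example, `mediator_query(lambda r: "prion" in r["name"].lower())` returns only the record from source A. In a real deployment the wrappers would call remote services, so the mediator sees current data while the warehouse holds a snapshot that has to be rebuilt when the sources change.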

  4. Inconsistency in Protein Data Sources
     • Problem of synonyms: in many cases, creators of data sources use different data descriptors to refer to the same real-world protein data.
     • Difference of scope: authors of protein data sources often use the same term to denote multiple meanings. Even when the meanings are not entirely different, the scope of the intended meaning of a term differs between sources.
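Both problems can be illustrated with a small mapping from source-specific descriptors to shared concepts. This is a sketch under assumptions, not the PO mapping itself: the source names (`db_a`, `db_b`) and concept names (`SourceOrganism`, `ProteinName`, `GeneName`) are hypothetical. Keying the mapping on the pair (source, descriptor) rather than on the descriptor alone is what lets the same term map to different concepts in different sources.

```python
from typing import Dict, Tuple

# Hypothetical mapping from (source, local descriptor) to a shared concept name.
# Keying on the source as well as the descriptor lets one term map differently
# per source, which is how a difference of scope is handled in this sketch.
MAPPING: Dict[Tuple[str, str], str] = {
    ("db_a", "organism"): "SourceOrganism",
    ("db_b", "species"): "SourceOrganism",  # synonym: different descriptor, same concept
    ("db_a", "name"): "ProteinName",
    ("db_b", "name"): "GeneName",           # scope: same descriptor, different concept
}


def to_shared(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename a record's fields to shared concept names; unmapped fields are dropped."""
    shared: Dict[str, str] = {}
    for descriptor, value in record.items():
        concept = MAPPING.get((source, descriptor))
        if concept is not None:
            shared[concept] = value
    return shared
```

After translation, records from both sources carry the organism under the same `SourceOrganism` key, while the two sources' `name` fields no longer collide.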

  5. Need for Protein Ontology
     • We need to develop:
        • a shared representation of the semantics of protein information that can serve as the basis for interoperability between heterogeneous protein databases; and
        • a query methodology that allows this semantic representation to be used for querying heterogeneous databases.

  6. Protein Ontology Project
     • The scope of the Protein Ontology Project comprises the following components:
        • Develop a generic methodology for designing an ontology to integrate protein data and information sources.
        • Develop an ontological model for representing data and knowledge about proteins.
        • Develop a query algebra, based on this ontological model, for intelligent and dynamic information retrieval from protein data sources.
        • Evaluate the resulting protein ontology framework using data analysis techniques to demonstrate the strengths of the approach.

  7. Protein Ontology (PO)
     • We are building the Protein Ontology to integrate protein data formats and to provide a structured, unified vocabulary for representing protein synthesis concepts.
     • PO consists of concepts, which are data descriptors for proteomics data, and the relationships among these concepts.
     • PO has:
        • a hierarchical classification of concepts, represented as classes, from general to specific;
        • a list of attributes related to each concept, for each class;
        • a set of relationships between classes that link concepts in more complex ways than the hierarchy implies, to promote reuse of concepts in the ontology; and
        • a set of algebraic operators for querying protein ontology instances.
     • More details about the Protein Ontology are at: http://www.proteinontology.info/
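The three structural ingredients listed above (a general-to-specific class hierarchy, attributes per class, and relationships beyond the hierarchy) can be modelled with a small data structure. This is a minimal sketch, not the PO implementation, which stores its data as OWL: the `Concept` and `Ontology` types are hypothetical, and the example class names (`BiologicalEntity`, `Protein`, `Enzyme`, `Reaction`) and the relation `catalyses` are illustrative assumptions, not actual PO concepts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Concept:
    """A class in the ontology with its own attributes and an optional is-a parent."""
    name: str
    parent: Optional[str] = None
    attributes: List[str] = field(default_factory=list)


class Ontology:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        # Relationships beyond the hierarchy, stored as (subject, relation, object) triples.
        self.relations: List[Tuple[str, str, str]] = []

    def add(self, concept: Concept) -> None:
        self.concepts[concept.name] = concept

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.relations.append((subject, relation, obj))

    def ancestors(self, name: str) -> List[str]:
        """Walk the is-a hierarchy upward, from the most specific to the most general."""
        chain: List[str] = []
        parent = self.concepts[name].parent
        while parent is not None:
            chain.append(parent)
            parent = self.concepts[parent].parent
        return chain

    def all_attributes(self, name: str) -> List[str]:
        """A concept's own attributes followed by those inherited from its ancestors."""
        attrs = list(self.concepts[name].attributes)
        for ancestor in self.ancestors(name):
            attrs.extend(self.concepts[ancestor].attributes)
        return attrs


# Hypothetical example concepts; these are not the actual PO class names.
onto = Ontology()
onto.add(Concept("BiologicalEntity", attributes=["identifier"]))
onto.add(Concept("Protein", parent="BiologicalEntity", attributes=["sequence"]))
onto.add(Concept("Enzyme", parent="Protein", attributes=["ec_number"]))
onto.add(Concept("Reaction", parent="BiologicalEntity"))
onto.relate("Enzyme", "catalyses", "Reaction")
```

Here `onto.all_attributes("Enzyme")` yields `["ec_number", "sequence", "identifier"]`: inheritance down the hierarchy gives reuse of attributes, while the `relations` list carries links that the hierarchy alone cannot express.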

  8. Protein Ontology (PO)

  9. PO Algebra
     • We defined rules, referred to as PO Algebra, that allow composition of multiple levels of information stored in the ontology for information retrieval:
        • Unary operator: SELECT
        • Binary operators: UNION, INTERSECTION, DIFFERENCE
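The operator set above can be illustrated with a conventional set-based reading over ontology instances. This is a sketch of that generic interpretation, not the PO Algebra as formally defined by Sidhu, Dillon and Chang (2006), which may differ in detail. Assumptions: instances are identified by a key (`P1`, `P2`, ...), attribute values are plain strings, and the instance data and the clash rule in `union` are hypothetical.

```python
from typing import Callable, Dict

Attributes = Dict[str, str]
Instances = Dict[str, Attributes]  # instance identifier -> attribute values


def select(instances: Instances, predicate: Callable[[Attributes], bool]) -> Instances:
    """Unary SELECT: keep the instances whose attributes satisfy the predicate."""
    return {key: attrs for key, attrs in instances.items() if predicate(attrs)}


def union(a: Instances, b: Instances) -> Instances:
    """Binary UNION: instances in either operand (a's attributes win on a clash)."""
    return {**b, **a}


def intersection(a: Instances, b: Instances) -> Instances:
    """Binary INTERSECTION: instances present in both operands."""
    return {key: attrs for key, attrs in a.items() if key in b}


def difference(a: Instances, b: Instances) -> Instances:
    """Binary DIFFERENCE: instances in a that are not in b."""
    return {key: attrs for key, attrs in a.items() if key not in b}


# Hypothetical instance data.
STORE: Instances = {
    "P1": {"family": "prion", "organism": "Homo sapiens"},
    "P2": {"family": "prion", "organism": "Mus musculus"},
    "P3": {"family": "globin", "organism": "Homo sapiens"},
}
prions = select(STORE, lambda attrs: attrs["family"] == "prion")
human = select(STORE, lambda attrs: attrs["organism"] == "Homo sapiens")
```

With this data, `intersection(prions, human)` keeps only `P1` and `difference(prions, human)` keeps only `P2`. Because every operator returns the same `Instances` type, results can be fed into further operators, which is the kind of composition the slide describes.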

  10. PO Instance Store
     • Stores protein data as OWL files.
     • Currently contains instances of 7,424 protein families: http://www.proteinontology.info/proteins.php
     • We carried out a preliminary investigation of the Prion dataset of PO using standard hierarchical mining algorithms (Tan et al., 2006):
        • Our group's work: MB3-Miner, X3-Miner, IMB3-Miner
        • Other work: VTreeMiner, PatternMatcher, FREQT
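The hierarchical miners listed above search tree-structured data for subtrees whose support (here, the number of trees containing them) reaches a minimum threshold. The sketch below illustrates only that support-counting idea on the smallest possible subtrees, single parent-child label pairs; it is not MB3-Miner or any of the other named algorithms, which enumerate much larger induced or embedded subtrees with efficient candidate generation. The tree labels (`Entry`, `Chain`, `Residue`, `Atom`, `Source`) and the database are hypothetical, not taken from the Prion dataset.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A labelled tree as (label, list of child trees).
Tree = Tuple[str, list]
Edge = Tuple[str, str]


def edges(tree: Tree) -> Set[Edge]:
    """Distinct (parent label, child label) pairs occurring anywhere in one tree."""
    label, children = tree
    found: Set[Edge] = set()
    for child in children:
        found.add((label, child[0]))
        found |= edges(child)
    return found


def frequent_edges(database: List[Tree], minsup: int) -> Dict[Edge, int]:
    """Two-node subtrees whose transaction-based support is at least minsup."""
    support: Counter = Counter()
    for tree in database:
        support.update(edges(tree))  # a set, so each tree counts at most once per pair
    return {pair: n for pair, n in support.items() if n >= minsup}


# Hypothetical tree-structured instances with illustrative labels.
DATABASE: List[Tree] = [
    ("Entry", [("Chain", [("Residue", [])]), ("Source", [])]),
    ("Entry", [("Chain", [("Residue", []), ("Atom", [])])]),
    ("Entry", [("Source", [])]),
]
```

With `minsup=2`, `frequent_edges(DATABASE, 2)` keeps `("Entry", "Chain")`, `("Chain", "Residue")` and `("Entry", "Source")`, each supported by two trees, and discards `("Chain", "Atom")`, which occurs in only one.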

  11. Mining PO Instance Store

  12. Selected References
     • Protein Ontology
        • Sidhu, A.S., Dillon, T.S. and Chang, E. (2007) Protein Ontology. In Chen, J. and Sidhu, A.S. (eds), Biological Database Modeling. Artech House, New York, 63-80.
        • Sidhu, A.S., Dillon, T.S. and Chang, E. (2005) An Ontology for Protein Data Models. In Zhang, Y.T., Roux, C. and Zhuang, T.G. (eds), 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2005). Shanghai, IEEE Engineering in Medicine and Biology Society.
     • PO Algebra
        • Sidhu, A.S., Dillon, T.S. and Chang, E. (2006) Towards Semantic Interoperability of Protein Data Sources. 2nd IFIP WG 2.12 & WG 12.4 International Workshop on Web Semantics (SWWS 2006), in conjunction with OTM 2006. France, Springer-Verlag.
     • Mining PO Instance Store
        • Hadzic, F., Dillon, T.S., Sidhu, A.S., Chang, E. and Tan, H. (2006) Mining Substructures in Protein Data. 2006 IEEE Workshop on Data Mining in Bioinformatics (DMB 2006), in conjunction with 6th IEEE ICDM 2006. Hong Kong, IEEE Computer Society.
