An Ontological Approach for Describing Phospho-proteins in Rhodococcus. Dennis Wang, Gavin Ha, Jennifer Chen, Nancy Wang. Dept. of Computer Science, University of British Columbia. CPSC 445, April 5, 2007.
What is an ontology? • Purpose: • Knowledge representation and reasoning • Facilitates knowledge sharing and reuse • Definition: • A data model that represents a set of concepts within a domain and the relationships between those concepts • Used to reason about the objects within that domain • Describes individuals (instances), classes (concepts), attributes, relations, and axioms • Uses: • AI, information architecture, the semantic web, software engineering
Problems in biology • Biology is knowledge based • Prior knowledge is used to infer new knowledge • Biology is data rich • Biologists need extensive prior knowledge to analyze the data they obtain • The pace of data production exceeds any one person's ability to acquire knowledge • An automated system is needed to apply domain experts' knowledge to biological data
Solution: ontology & bioinformatics • Joint effort of biologists and computer scientists • Build ontologies using domain knowledge • Rapid classification of large datasets • Allows queries that find the instances of a class • Creates controlled vocabularies for shared use across different biological and medical domains • In bioinformatics, ontologies make knowledge available to the community and its applications
Example: Gene Ontology (GO) “provides structured, controlled vocabularies and classifications that cover several domains of molecular biology” • Uses: • Annotation of large data sets • Grouping gene products under a high-level term • Computational (putative) assignment of molecular function based on sequence similarity to annotated genes or sequences [Figure: inferring gene function from electronic annotation. An unknown gene product is matched by sequence similarity to a sequence in SWISS-PROT with a known function, and that function is inferred for the unknown product.]
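The similarity-based assignment described on this slide can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the two annotated sequences are made-up toy strings rather than real SWISS-PROT entries, `difflib.SequenceMatcher` stands in for a real alignment tool, and the 0.8 threshold is arbitrary. The only GO-specific detail used is the evidence code IEA ("Inferred from Electronic Annotation").

```python
from difflib import SequenceMatcher

# Hypothetical annotated entries: made-up toy sequences, not real SWISS-PROT records.
ANNOTATED = {
    "TOY_KINASE": ("MKTAYIAKQRQISFVKSHFSRQLEERLGL", "protein kinase activity"),
    "TOY_PHOSPHATASE": ("MSDNGELEDKPPAPPVRMSSTIFSTGGKD", "protein phosphatase activity"),
}


def similarity(a, b):
    """Crude string similarity in [0, 1]; a stand-in for a real sequence alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def infer_function(query, threshold=0.8):
    """Transfer the annotation of the most similar known sequence, if it is similar enough."""
    best_id, best_score = None, 0.0
    for entry_id, (sequence, _function) in ANNOTATED.items():
        score = similarity(query, sequence)
        if score > best_score:
            best_id, best_score = entry_id, score
    if best_id is None or best_score < threshold:
        return None  # no sufficiently similar annotated sequence: leave the product unannotated
    return {
        "function": ANNOTATED[best_id][1],
        "source": best_id,
        "evidence": "IEA",  # putative, electronically inferred annotation
        "score": best_score,
    }
```

Returning `None` below the threshold keeps the assignment putative: a weak match leaves the gene product unannotated instead of guessing.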
How are ontologies built? • There is no standardized methodology • But there are efforts to develop more comprehensive guidelines • In general: • Informal stage • Natural language • Formal stage • Formal knowledge representation language
Ontology-building life cycle (inspired by software engineering). User Model (Biologist): #1) Identification of the purpose and scope of the ontology #2) Acquisition of domain knowledge [Diagram labels: Identify purpose and scope; Knowledge acquisition]
Ontology-building life cycle. Conceptualization Model (Bioinformatician/Biologist): #3) Identifying key concepts in the domain #4) Integration by using and incorporating other existing ontologies [Diagram labels: Identify purpose and scope; Knowledge acquisition; Building; Conceptualization; Integrating existing ontologies]
Ontology-building life cycle. Implementation Model (Bioinformatician): #5) Representing concepts with a formal language #6) Documenting informal and formal definitions #7) Evaluation of the appropriateness of the ontology for its intended application [Diagram labels: Identify purpose and scope; Available development tools; Knowledge acquisition; Language & representation; Building; Conceptualization; Integrating existing ontologies; Encoding; Evaluation]
Describing Phospho-Proteins using the Phosphabase Ontology [Diagram: biologists provide proteomic experimental data; signal protein experts provide phosphatase and kinase background knowledge. The bioinformatician uses both: the data become instances (individuals) and the background knowledge becomes the ontology (classes), built using OWL-DL. The Pellet reasoner produces the results.] • Can we use the Phosphabase ontology to describe phospho-proteins discovered by the Rhodococcus Genome Project?
Web Ontology Language (OWL) [Diagram: class Professor is a subClassOf the superclass FacultyMember; the individuals Jennifer Chen and Anne Condon are connected through InstanceOf and teaches relations] • XML syntax • OWL-DL (Description Logic): imposes certain restrictions, based on description logic, to guarantee decidability • OWL builds on the Resource Description Framework (RDF) • Statements are Subject-Predicate-Object triples • Basic components in OWL: • Classes • Individuals • Properties
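The Subject-Predicate-Object idea can be made concrete with plain Python tuples. This is a sketch, not OWL or RDF syntax: the predicate names are simplified labels (RDF itself uses `rdfs:subClassOf` and `rdf:type`), and the direction of the `instanceOf` and `teaches` edges is one assumed reading of the slide's diagram.

```python
# Each statement is a (subject, predicate, object) triple.
# Assumed reading of the diagram: Anne Condon is a Professor and teaches Jennifer Chen.
TRIPLES = {
    ("Professor", "subClassOf", "FacultyMember"),
    ("Anne Condon", "instanceOf", "Professor"),
    ("Anne Condon", "teaches", "Jennifer Chen"),
}


def superclasses(cls, triples):
    """All classes reachable from cls by following subClassOf edges."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "subClassOf" and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


def types_of(individual, triples):
    """Asserted classes of an individual plus the superclasses inferred from them."""
    asserted = {o for s, p, o in triples if s == individual and p == "instanceOf"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred


def instances_of(cls, triples):
    """Individuals that belong to cls, either directly or through a subclass."""
    individuals = {s for s, p, _ in triples if p == "instanceOf"}
    return {i for i in individuals if cls in types_of(i, triples)}
```

Only the `Professor` membership is asserted; `FacultyMember` membership is inferred from the `subClassOf` triple, which is a small-scale version of the reasoning an OWL reasoner performs.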
Phosphabase Ontology • Wolstencroft et al., 2006 • Biological motivation • Driven by protein domain architecture to describe signalling protein families • Background knowledge required for construction: • Signal protein domains • Presence of protein domains within signal proteins • OWL ontology • The ontology uses OWL-DL • Description logic reasoners can be applied to classify proteins • There are many different ways to represent this knowledge in OWL
Phosphabase.owl [Class hierarchy figure; visible classes: Domain_Entity, Macromolecule, Protein_Phosphatase, Protein_Kinase]
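Classifying a protein from its domain architecture can be sketched as a set-containment check. The class names Protein_Phosphatase and Protein_Kinase come from the hierarchy above; everything else here is a hypothetical illustration, including the class `Receptor_Protein_Phosphatase`, all domain names, and the definitions themselves, none of which are taken from Phosphabase.

```python
# Hypothetical class definitions: each class lists the domains a protein must contain.
CLASS_DEFINITIONS = {
    "Protein_Kinase": {"kinase_catalytic"},
    "Protein_Phosphatase": {"phosphatase_catalytic"},
    "Receptor_Protein_Phosphatase": {"phosphatase_catalytic", "transmembrane"},
}


def classify(domains):
    """Names of every class whose required domains are all present in the protein."""
    found = set(domains)
    return sorted(
        name for name, required in CLASS_DEFINITIONS.items() if required <= found
    )


def most_specific(domains):
    """Matching classes whose requirements are not strictly contained in another match's."""
    matches = classify(domains)
    return [
        name
        for name in matches
        if not any(CLASS_DEFINITIONS[name] < CLASS_DEFINITIONS[other] for other in matches)
    ]
```

For example, `most_specific({"phosphatase_catalytic", "transmembrane"})` selects the hypothetical receptor class over the more general `Protein_Phosphatase`. A description logic reasoner works from richer axioms and under an open-world assumption, so this closed set check is only a rough analogue.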
OWL DL Reasoners: Pellet • Input • Ontology: OWL-DL format • Axioms about classes go into the TBox • Type and property assertions about individuals go into the ABox • Query: RDQL (SPARQL) format • Instance data (individuals) • Tableau reasoner • Checks satisfiability of an ABox with respect to a TBox • Tests for knowledge base consistency [Parsia and Sirin, ISWC 2004]
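The split between TBox and ABox, and what a consistency test means, can be illustrated with a toy check in Python. This is not Pellet's tableau algorithm and does not use Pellet's API. The subclass chain and the disjointness axiom below are assumptions made for the example, and the individual names are hypothetical.

```python
# TBox: axioms about classes (assumed for illustration, not read from Phosphabase).
TBOX = {
    "subclass_of": {
        "Protein_Kinase": "Macromolecule",
        "Protein_Phosphatase": "Macromolecule",
        "Macromolecule": "Domain_Entity",
    },
    "disjoint": [("Protein_Kinase", "Protein_Phosphatase")],
}


def ancestors(cls, tbox):
    """Superclasses of cls, found by walking the subclass chain upward."""
    chain = []
    while cls in tbox["subclass_of"]:
        cls = tbox["subclass_of"][cls]
        chain.append(cls)
    return chain


def inferred_types(individual, abox, tbox):
    """Asserted types of an individual plus all of their superclasses."""
    types = set()
    for cls in abox.get(individual, ()):
        types.add(cls)
        types.update(ancestors(cls, tbox))
    return types


def is_consistent(abox, tbox):
    """False if any individual ends up in two classes declared disjoint."""
    for individual in abox:
        types = inferred_types(individual, abox, tbox)
        for first, second in tbox["disjoint"]:
            if first in types and second in types:
                return False
    return True


# ABox: type assertions about hypothetical individuals.
ABOX_OK = {"protein_A": {"Protein_Kinase"}}
ABOX_CLASH = {"protein_B": {"Protein_Kinase", "Protein_Phosphatase"}}
```

Here `is_consistent(ABOX_OK, TBOX)` holds, while `ABOX_CLASH` fails because one individual is asserted into two classes that the assumed TBox declares disjoint. A real reasoner handles far more expressive axioms than subclass and disjointness.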
Instance Data No Result
Conclusions • Ontologies can be used as a standard model for the exchange of biological information • Building ontologies can get very complicated • Biologists have little description logic training • Computer scientists have little knowledge of biology • More bioinformaticians are needed • Ontologies can facilitate automated annotation of genes and gene products • Ontologies are difficult to read and to infer from • Ontologies can get very big (Phosphabase is only a small example) • Reasoners are sometimes slow and inaccurate
Acknowledgements • Rhodococcus sp. RHA1 data • Eltis Lab: Dr. Lindsay Eltis, Dept. Microbiology & Biochemistry • Phosphabase Ontology • Wolstencroft Lab, University of Manchester, UK • Bioinformatics paper: Wolstencroft et al., 2006 • Phosphabase Ontology processing • Benjamin Good, iCAPTURE Centre, Vancouver