
An Ontology for Protein-Protein Interaction Data

An Ontology for Protein-Protein Interaction Data. Karen Jantz CIS Honors Project December 7, 2006. Overview. Problem Statement Objectives Approach Background Methodology Evaluation Demonstration Conclusion. Problem Statement. Several sources for protein-protein interaction data


Presentation Transcript


  1. An Ontology for Protein-Protein Interaction Data Karen Jantz CIS Honors Project December 7, 2006

  2. Overview • Problem Statement • Objectives • Approach • Background • Methodology • Evaluation • Demonstration • Conclusion

  3. Problem Statement • Several sources for protein-protein interaction data • Different schemata • Different purposes • Different strengths/weaknesses

  4. Objectives • Unify the data • Enable data mining • Evaluate reliability of data across data sources • Gain new information about the entire data set • Enable others to easily add other data sources to the set

  5. Approach: ontology • ontology – n. • that which exists (philosophy) • that which is represented (artificial intelligence) • A descriptive data model • Defines the entities and relationships within a domain • Based upon data • Human-readable

  6. Approach: ontology • Data integration • Enables simultaneous querying across multiple databases • Data transformation • Enables interchange between database formats • Data mining • Enables reasoning and learning over the entire data set

  7. Background: Data Sources • DIP (Jing Xia) • Database of Interacting Proteins • Most reliable data set • BIND (Abhijit Erande, Aaron Schoenhofer) • Biomolecular Interactions Network Databank • Very large data set • Contains interactions, molecular complexes, and pathways

  8. Background: Data Sources • MINT • Molecular INTeraction database • Experimentally verified protein interactions • Evaluates confidence level • IntAct • Not limited to binary interactions • Allows user submissions • MIPS CYGD • Munich Information Center for Protein Sequences: Comprehensive Yeast Genome Database • Limited to yeast • Focuses on sequencing

  9. Background: Tools • Protégé • Open-Source Project • Graphical ontology editor • Interacts with OWL Reasoner • Detailed API for modifying ontologies programmatically

  10. Background: Tools • Prompt • A Protégé Plugin • Enables ontology mapping • Enables ontology comparison

  11. Background: Related Work • PSI-MI • Controlled vocabulary for PPI data • Not a proposed database structure • Decreases the strength of information • Helpful in defining relationships and keys

  12. Methodology: Overview (diagram) • Data sources: DIP, BIND, MIPS, MINT, IntAct • Transformation into the Unified Data Set, described by the Unified Ontology • Web Interface on top of the unified data • Q: What interactions have been observed with protein A? • Q: What experiments give evidence for a given interaction?

  13. Methodology: Design • Review the singular database schemata and determine strengths/weaknesses • View data files • Native formats • PSI-MI formats • Create a unified schema of the data sources • Create the unified ontology in Protégé • Create each singular database as a subset of the unified ontology

  14. Protégé Screenshot

  15. Methodology: Data Import • DOMParser • Load data from XML • Protégé-OWL API • Insert entities into singular databases
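The import step on this slide can be sketched roughly as follows. This is only an illustration: Python's `xml.dom.minidom` stands in for the DOMParser named on the slide, a plain list of dicts stands in for the entities that the Protégé-OWL API would create, and the XML layout, element names, and protein identifiers (`protA`, `protB`, `protC`) are invented for the example and are much simpler than real native or PSI-MI files.

```python
from xml.dom import minidom

# Hypothetical, simplified input. Real source files are far richer.
SAMPLE_XML = """<interactions source="DIP">
  <interaction id="I1" method="two hybrid">
    <participant>protA</participant>
    <participant>protB</participant>
  </interaction>
  <interaction id="I2" method="coimmunoprecipitation">
    <participant>protA</participant>
    <participant>protC</participant>
  </interaction>
</interactions>"""


def load_interactions(xml_text):
    """Parse the XML into a DOM tree and return one dict per interaction."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    source = root.getAttribute("source")
    records = []
    for node in root.getElementsByTagName("interaction"):
        # Each <participant> element holds a single text node.
        participants = [
            p.firstChild.data.strip()
            for p in node.getElementsByTagName("participant")
        ]
        records.append({
            "source": source,
            "id": node.getAttribute("id"),
            "method": node.getAttribute("method"),
            "participants": participants,
        })
    return records


# Stand-in for inserting entities into the single-source database.
records = load_interactions(SAMPLE_XML)
```

In the project itself, each record would presumably become an individual in the source-specific ontology rather than a dict.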

  16. Methodology: Transformation • Use Prompt to create a mapping for each specific data source to the unified ontology • Use Prompt mappings to insert individuals from each singular ontology into the unified model
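The idea of a per-source mapping can be illustrated with a small sketch. It does not use Prompt; it only shows the concept of renaming source-specific fields to unified ones. All field names below (`interactor_a`, `molA`, `technique`, and so on) are hypothetical and are not taken from the actual DIP or BIND schemata.

```python
# Hypothetical field mappings: source-specific name -> unified name.
MAPPINGS = {
    "DIP": {
        "interactor_a": "protein_a",
        "interactor_b": "protein_b",
        "exp_method": "method",
    },
    "BIND": {
        "molA": "protein_a",
        "molB": "protein_b",
        "technique": "method",
    },
}


def to_unified(source, record):
    """Rename the fields of one source record to the unified vocabulary.

    Fields without a mapping are dropped in this sketch.
    """
    mapping = MAPPINGS[source]
    unified = {"source": source}
    for field, value in record.items():
        if field in mapping:
            unified[mapping[field]] = value
    return unified


unified = to_unified(
    "BIND",
    {"molA": "protA", "molB": "protB", "technique": "two hybrid", "extra": 1},
)
```

Keeping the mapping as data, separate from the code that applies it, is what would let a new data source be added by writing one more mapping.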

  17. Methodology: Transformation • Duplicate Data • Need to fill in attributes on existing records • Write ‘Algorithm Plugin’ for Prompt to determine when individuals are the same
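A minimal sketch of the duplicate-handling idea follows. The slide does not say how the plugin decides that two individuals are the same, so the rule used here is an assumption: two records describe the same interaction when they name the same unordered pair of proteins. Missing attributes on the existing record are then filled in from the duplicate. Field names and protein identifiers are hypothetical.

```python
def interaction_key(record):
    """Assumed identity rule: same unordered protein pair = same interaction."""
    return frozenset((record["protein_a"], record["protein_b"]))


def merge_records(records):
    """Merge duplicates, filling in attributes the first record lacked."""
    merged = {}
    for record in records:
        key = interaction_key(record)
        existing = merged.get(key)
        if existing is None:
            # First time this interaction is seen: start a new entry.
            entry = {k: v for k, v in record.items() if k != "source"}
            entry["sources"] = [record["source"]]
            merged[key] = entry
            continue
        # Duplicate: remember the extra source and fill only empty fields.
        if record["source"] not in existing["sources"]:
            existing["sources"].append(record["source"])
        for field, value in record.items():
            if field == "source":
                continue
            if existing.get(field) is None and value is not None:
                existing[field] = value
    return list(merged.values())


merged = merge_records([
    {"source": "DIP", "protein_a": "protA", "protein_b": "protB", "method": None},
    {"source": "BIND", "protein_a": "protB", "protein_b": "protA", "method": "two hybrid"},
    {"source": "MINT", "protein_a": "protA", "protein_b": "protC", "method": "two hybrid"},
])
```

Existing values are never overwritten here; how to resolve conflicting values between sources is left open.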

  18. Prompt Screenshot - Mapping

  19. Methodology: Query Interface • Export Protégé data into MySQL • Web interface for collecting data • Working with domain experts to determine useful views, queries
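The two example questions from the overview slide could be answered against a relational export along these lines. This is a sketch under assumptions: SQLite from Python's standard library stands in for MySQL so the example is self-contained, and the two-table layout, column names, and sample rows are invented for illustration rather than taken from the project's export.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE interaction (
    id INTEGER PRIMARY KEY,
    protein_a TEXT NOT NULL,
    protein_b TEXT NOT NULL
);
CREATE TABLE evidence (
    interaction_id INTEGER NOT NULL REFERENCES interaction(id),
    source TEXT NOT NULL,
    method TEXT NOT NULL
);
""")
# Made-up sample rows.
conn.executemany(
    "INSERT INTO interaction VALUES (?, ?, ?)",
    [(1, "protA", "protB"), (2, "protC", "protA"), (3, "protB", "protC")],
)
conn.executemany(
    "INSERT INTO evidence VALUES (?, ?, ?)",
    [
        (1, "DIP", "two hybrid"),
        (1, "BIND", "coimmunoprecipitation"),
        (2, "MINT", "two hybrid"),
    ],
)


def partners_of(conn, protein):
    """Q: What interactions have been observed with a given protein?"""
    rows = conn.execute(
        "SELECT protein_b FROM interaction WHERE protein_a = ? "
        "UNION "
        "SELECT protein_a FROM interaction WHERE protein_b = ?",
        (protein, protein),
    )
    return sorted(row[0] for row in rows)


def evidence_for(conn, interaction_id):
    """Q: What experiments give evidence for a given interaction?"""
    rows = conn.execute(
        "SELECT source, method FROM evidence "
        "WHERE interaction_id = ? ORDER BY source",
        (interaction_id,),
    )
    return rows.fetchall()
```

With the sample rows, `partners_of(conn, "protA")` looks on both sides of each pair, because an interaction is stored only once regardless of which protein is listed first.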

  20. Evaluation • Performance • Transformation Time in Protégé • Query Time for Web Interface • Size • Minimize redundancy in data model • Minimize duplicate data

  21. Evaluation • Correctness • Domain Experts • Dr. Brown, Dr. Wang • Maintain proper data relationships • Utility • Enrich data

  22. Evaluation

  23. Demonstration

  24. Future Work • Complete transformations • Import data • Evaluate ontology • Add other databases to model

  25. Conclusions • Adequate start • Needs improvement, evolution, more data sources • As the project matures, the ontology will be ready for use in the biological domain • Will be able to more easily gain information about protein-protein interactions

  26. References • AAAI.org - AITopics: “Ontology” • http://www.aaai.org/AITopics/html/ontol.html • Protégé • http://protege.stanford.edu/overview/protege-owl.html • Prompt • http://protege.cim3.net/cgi-bin/wiki.pl?Prompt • PSI-MI • http://psidev.sourceforge.net/mi/xml/doc/user

  27. References • BIND • http://www.bind.ca • DIP • http://www.dip.doe-mbi.ucla.edu • IntAct • http://www.ebi.ac.uk/intact/site/ • MINT • http://mint.bio.uniroma2.it/mint/Welcome.do • MIPS • http://mips.gsf.de/genre/proj/yeast

  28. Q & A
