
RDF based on Integration of Pathway Database and Gene Ontology

RDF based on Integration of Pathway Database and Gene Ontology. SNU OOPSLA LAB. 2005 DongHyuk Im. Contents. Introduction Pathway Database Enzyme Database Gene Ontology Related Works Our Approach Supporting Function Data Transformation Integration of KEGG, Enzyme, Gene Ontology


Presentation Transcript


  1. RDF based on Integration of Pathway Database and Gene Ontology SNU OOPSLA LAB. 2005 DongHyuk Im

  2. Contents • Introduction • Pathway Database • Enzyme Database • Gene Ontology • Related Works • Our Approach • Supporting Function • Data Transformation • Integration of KEGG, Enzyme, Gene Ontology • Querying using SeRQL

  3. Pathway? • In most chemical reaction mechanisms, a compound (substrate) is converted into a compound (product) by the action of an enzyme • Importance • Comparing and analyzing pathways helps us understand how compounds are created and the evolutionary relationships between organisms • Drug discovery

  4. Pathway Map : Glycolysis / Gluconeogenesis Map : Aquifex aeolicus

  5. Enzyme Database • EC number • Recommended name • Alternative names (if any) • Catalytic activity • Cofactors (if any) • Pointers to the SWISS-PROT entries that correspond to the enzyme (if any) • Pointers to diseases associated with a deficiency of the enzyme (if any)

  6. Enzyme Hierarchy [*] • Four levels • EC number • Ex) 1.1.1.1 is a member of the top-level group [1] • The leftmost number identifies the highest level • [2.4.2.3] – [2.4.2.4] (siblings) : similar reactions in a pathway • Example tree, one level per group : [1] [2] [3] / [2.1] [2.2] [2.3] / [2.2.1] [2.2.2] [2.2.3] / [2.2.2.1] [2.2.2.2] [2.2.2.3]
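
The four-level EC hierarchy on this slide can be illustrated with a small sketch. The helper names (`ec_ancestors`, `are_siblings`) are hypothetical and are not part of the presented system; the sketch only assumes that an EC number is a dot-separated string whose leftmost field is the highest level.

```python
from typing import List


def ec_ancestors(ec: str) -> List[str]:
    """Return the ancestors of an EC number, highest level first.

    For a four-level number this yields the three enclosing groups.
    """
    parts = ec.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def are_siblings(a: str, b: str) -> bool:
    """Two distinct EC numbers are siblings if they share the same parent,
    i.e. they agree on every field except the last one."""
    pa, pb = a.split("."), b.split(".")
    return a != b and len(pa) == len(pb) and pa[:-1] == pb[:-1]


ancestors = ec_ancestors("1.1.1.1")
same_parent = are_siblings("2.4.2.3", "2.4.2.4")
different_parent = are_siblings("2.4.2.3", "2.4.1.3")
```

Here `ancestors` is `["1", "1.1", "1.1.1"]`, and only the first pair is reported as siblings, matching the [2.4.2.3] – [2.4.2.4] example on the slide.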

  7. Gene Ontology

  8. KEGG

  9. KEGG • To computerize all aspects of cellular functions in terms of the pathway of interacting molecules or genes • To maintain gene catalogs for all organisms and link each gene product to a pathway component • To organize a database of all chemical compounds in the cell and link each compound to a pathway component • To develop computational technologies for pathway comparison, reconstruction, and analysis

  10. Why RDF Integration? • Pathway data model : DAG • RDF data model : DAG, so RDF is a good model for representing pathways • Biologists need to integrate multiple knowledge sources available on the Internet; this is one of their major problems • RDF provides a common standard for such integration • Enzyme, GO : hierarchical structure • RDF is a good model for representing hierarchical structures • GO annotation is important • Enzymes (proteins) in a given pathway need GO annotation

  11. Related Works • KEGG: Kyoto Encyclopedia of Genes and Genomes, 1999, Nucleic Acids Res. • YeastHub: a semantic web use case for integrating data in the life sciences domain, 2005, Bioinformatics • LIGAND: database of chemical compounds and reactions in biological pathways, 2002, Nucleic Acids Res. • Gene Ontology: tool for the unification of biology, the Gene Ontology Consortium, 2000, Nature Genetics.

  12. Functions Supported by Our System • KEGG • Search compound • Path prediction • Search enzyme • Functions our system adds • Integration query (pathway + enzyme + GO) • Relaxation query using the GO hierarchy • Searching pathways using enzyme information

  13. Search Compounds • Target compound : C00668

  14. Pathway Prediction Tool • Compound • Relaxation query using enzyme hierarchy

  15. Search Enzyme • Enzyme : 5.3.1.9

  16. From Pathway to Gene Ontology • Select enzyme

  17. Data Translation for Integration • [Figure labels: KGML Data, XSLT, KEGG RDF Data, Adding GO ID, Enzyme RDF Data, GO RDF Data, GENOS Storage] • XSLT : http://www.w3.org/2005/02/13-KEGG/

  18. KEGG RDF Data (1/2)
      Gene entry
      <k:entry>
        <Gene rdf:nodeID="_1">
          <k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/aae#aq_186"/>
          <k:reaction rdf:resource="http://www.w3.org/2005/02/13-KEGG/rn#R00710"/>
          <k:link rdf:resource="http://www.genome.jp/dbget-bin/www_bget?aae+aq_186"/>
          <k:graphics>
            <Rectangle k:name="aldH1" k:fgcolor="#000000" k:bgcolor="#BFFFBF" k:x="170" k:y="1018" k:width="45" k:height="17"/>
          </k:graphics>
        </Gene>
      </k:entry>
      Enzyme entry
      <k:entry>
        <Enzyme rdf:nodeID="_3">
          <k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/ec#1.2.1.5"/>
          <k:graphics>
            <Rectangle k:name="1.2.1.5" k:fgcolor="#000000" k:bgcolor="#FFFFFF" k:x="170" k:y="1039" k:width="45" k:height="17"/>
          </k:graphics>
        </Enzyme>
      </k:entry>
      No information
      Compound entry
      <k:entry>
        <Compound rdf:nodeID="_4">
          <k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00033"/>
          <k:link rdf:resource="http://www.genome.jp/dbget-bin/www_bget?compound+C00033"/>
          <k:graphics>
            <Circle k:name="C00033" k:fgcolor="#000000" k:bgcolor="#FFFFFF" k:x="102" k:y="971" k:width="8" k:height="8"/>
          </k:graphics>
        </Compound>
      </k:entry>

  19. KEGG RDF Data (2/2)
      Relation
      <k:relation>
        <ECrel>
          <k:entry1 rdf:resource="_42"/>
          <k:entry2 rdf:resource="_48"/>
          <compound rdf:resource="_88"/>
        </ECrel>
      </k:relation>
      Reaction
      <k:reaction reversible="" rdf:about="http://www.w3.org/2005/02/13-KEGG/rn#R00710">
        <k:substrate rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00084"/>
        <k:product rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00033"/>
      </k:reaction>
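
As a rough illustration of how the reaction element above maps to triples, the following sketch reads it with Python's standard XML parser. It is not the GENOS loader. The wrapping `rdf:RDF` element and the binding of the `k:` prefix to `http://www.w3.org/2005/02/13-KEGG/` are assumptions (the slides do not show the namespace declarations), and the empty `reversible` attribute is left out of the sample.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the k: prefix is bound to the KEGG URL shown on the slides.
K_NS = "http://www.w3.org/2005/02/13-KEGG/"

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:k="{K_NS}">
  <k:reaction rdf:about="http://www.w3.org/2005/02/13-KEGG/rn#R00710">
    <k:substrate rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00084"/>
    <k:product rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00033"/>
  </k:reaction>
</rdf:RDF>"""


def reaction_triples(xml_text):
    """Return (reaction, role, compound) triples for each k:reaction element."""
    root = ET.fromstring(xml_text)
    triples = []
    for reaction in root.iter(f"{{{K_NS}}}reaction"):
        subject = reaction.get(f"{{{RDF_NS}}}about")
        for child in reaction:
            target = child.get(f"{{{RDF_NS}}}resource")
            if target is not None:
                role = child.tag.split("}", 1)[1]  # strip the namespace part
                triples.append((subject, role, target))
    return triples


triples = reaction_triples(SAMPLE)
```

Each tuple pairs the reaction URI with a role (`substrate` or `product`) and a compound URI, which is the shape a triple store would index.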

  20. How to Process KEGG Pathways • Problem • GENOS (Sesame) does not support multiple graphs • KEGG data consists of multiple documents • Ex) map00010.rdf, aae00010.rdf … • Solution • Using namespaces, we can distinguish maps • When storing pathway data, the pathway's map name is added as a namespace in the resources table of GENOS
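
The idea of using the map name as a namespace can be sketched as follows. This is only an illustration under stated assumptions: the actual layout of the GENOS resources table is not shown in the slides, the `map_name#node_id` format is invented here, and the triple for `map00010` is a made-up example (the `aae00010` triple follows the Gene entry shown earlier).

```python
def qualify(map_name, node_id):
    """Prefix a document-local node ID with its map name (hypothetical format)."""
    return f"{map_name}#{node_id}"


def load_maps(documents):
    """Merge triples from several pathway documents into one store.

    Local node IDs (those starting with '_') are qualified by map name so
    that '_1' in one map does not collide with '_1' in another.
    """
    store = set()
    for map_name, doc_triples in documents.items():
        for s, p, o in doc_triples:
            s = qualify(map_name, s) if s.startswith("_") else s
            o = qualify(map_name, o) if o.startswith("_") else o
            store.add((s, p, o))
    return store


documents = {
    "map00010": [("_1", "k:name", "ec#1.2.1.5")],   # hypothetical triple
    "aae00010": [("_1", "k:name", "aae#aq_186")],
}
store = load_maps(documents)
subjects = sorted(s for s, _, _ in store)
```

After loading, the two `_1` nodes are stored as `aae00010#_1` and `map00010#_1`, so each keeps its own name.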

  21. Processing Pathway Data
      <k:Pathway k:org="aae" k:number="00010" k:title="Glycolysis / Gluconeogenesis">
      …. ….
      <k:entry>
        <Gene rdf:nodeID="_1">
          <k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/aae#aq_186"/>
          <k:reaction rdf:resource="http://www.w3.org/2005/02/13-KEGG/rn#R00710"/>
          <k:link rdf:resource="http://www.genome.jp/dbget-bin/www_bget?aae+aq_186"/>
          <k:graphics>
            <Rectangle k:name="aldH1" k:fgcolor="#000000" k:bgcolor="#BFFFBF" k:x="170" k:y="1018" k:width="45" k:height="17"/>
          </k:graphics>
        </Gene>
      </k:entry>
      [Figure labels: conflict; triples table of GENOS; resources table of GENOS]

  22. Integrating Databases • Enzyme number • GO ID
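
One plausible reading of this slide is that enzyme numbers found in a pathway are joined with GO IDs. The sketch below shows such a join in its simplest form. The function name and the mapping are hypothetical, and the GO identifier is a placeholder, not a real annotation.

```python
def annotate_enzymes(pathway_enzymes, ec_to_go):
    """Attach the known GO IDs to each enzyme number in a pathway.

    Enzymes without a mapping get an empty list, so missing annotation
    stays visible instead of being dropped.
    """
    return {ec: sorted(ec_to_go.get(ec, [])) for ec in pathway_enzymes}


# Enzyme numbers taken from the slides; the GO ID is a placeholder value.
pathway_enzymes = ["5.3.1.9", "1.2.1.5"]
ec_to_go = {"5.3.1.9": {"GO:placeholder-1"}}

annotated = annotate_enzymes(pathway_enzymes, ec_to_go)
```

Keeping unmapped enzymes in the result makes it easy to see which pathway entries still need GO annotation.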

  23. Relaxation Querying using SeRQL • [Figure labels: C1, C2, E1, E1.* (subclassof E1)] • SeRQL : SELECT C1,C2 FROM Path_EXP WHERE E1 LIKE "1.*" • Dewey order • Ex. 1.1 and 1.2 are children of 1 • Use prefix matching
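
The prefix-based relaxation can be mimicked outside SeRQL with a short sketch. The path rows and compound names below are hypothetical, and the function names are not part of the presented system. The sketch matches on `prefix + "."` rather than on the bare prefix, so that `1.1` does not accidentally match `1.11.1.6`.

```python
def matches_relaxed(ec, prefix):
    """True if ec equals prefix or lies below it in Dewey order."""
    return ec == prefix or ec.startswith(prefix + ".")


def relaxed_query(path_rows, prefix):
    """Return (C1, C2) pairs whose enzyme E1 falls under the given prefix."""
    return [(c1, c2) for c1, e1, c2 in path_rows if matches_relaxed(e1, prefix)]


# Hypothetical (substrate, enzyme, product) rows for illustration only.
path_rows = [
    ("C_a", "1.1.1.1", "C_b"),
    ("C_b", "1.11.1.6", "C_c"),
    ("C_c", "5.3.1.9", "C_d"),
]

under_1 = relaxed_query(path_rows, "1")
under_1_1 = relaxed_query(path_rows, "1.1")
```

Querying with prefix `1` returns the first two rows, while prefix `1.1` returns only the first, which is the behaviour a Dewey-order prefix test is meant to give.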

  24. Considering Performance • KEGG : Pathway List (gene, pathway)
      aae:aq_018 path:aae03010
      aae:aq_020 path:aae03010
      aae:aq_021 path:aae00400
      …. ….
      eco:b1236 path:eco00052
      eco:b1236 path:eco00500
      eco:b1236 path:eco00520
      ….
      • Using genes_index : Genes → Map
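
A gene-to-map index like the one suggested by the slide could be built as sketched below. The rows are the ones listed on the slide; the two-column whitespace-separated format and the function name are assumptions, and the real `genes_index` structure may differ.

```python
from collections import defaultdict

PATHWAY_LIST = """\
aae:aq_018 path:aae03010
aae:aq_020 path:aae03010
aae:aq_021 path:aae00400
eco:b1236 path:eco00052
eco:b1236 path:eco00500
eco:b1236 path:eco00520
"""


def build_genes_index(text):
    """Map each gene ID to the list of pathway maps it appears in."""
    index = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gene, pathway = fields
        index[gene].append(pathway)
    return dict(index)


genes_index = build_genes_index(PATHWAY_LIST)
maps_for_gene = genes_index["eco:b1236"]
```

With the index in memory, finding the maps that contain a gene is a single dictionary lookup instead of a scan over every pathway document.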

  25. Schedule • Implementation (~11/30) • Integrated Databases • Query Processor for pathway • Simple UI (Web :JSP) • Complete Paper (~12/10)
