Semantic Similarity Search
IDB Lab.
Kisung Kim, Cheolhan Kim
OASIS Environment
• GOA Team: investigate relationships between proteins from the point of view of GO annotation
• Storage: RDF storage, RDBMS
• Databases: GO Annotation DB (UniProt), PubMed, Blast DB, Subcellular Localization DB, PPI DB, KEGG pathway
• Information: GO annotation, biomedical literature, sequence matching
• GO ontologies: molecular function, cellular component, biological process
Introduction
• Finding similar gene products is crucial for bioinformatics
  • Semantic similarity between gene products has recently become a focus of attention
• Semantic similarity
  • Assessment of the semantic relatedness between two objects
• GO annotation
  • Most biological databases provide information on proteins annotated with GO terms
  • GO annotation provides semantic information about gene products
• Semantic similarity over GO
  • Measures the similarity between gene products using the information encoded in the GO
GORank System
• Example query: which gene products have functions similar to those of PI4KB_HUMAN?
• The similarity between gene products (GPs) is calculated from the similarity between their annotation terms
• Gene Ontology: similarity of ontology terms
• Gene products DB (annotation): similarity of gene products
• Output: ranked top-k results (similar gene products)
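The slide above says that gene-product similarity is derived from the similarity between annotation terms, but it does not say how the term scores are combined. As a minimal sketch, the Python below assumes a best-match average (each term is paired with its most similar term on the other side, and the best scores are averaged) and ranks candidates with a top-k selection. The term names, gene-product names, `TERM_SIM` table, and the functions `term_sim`, `gp_sim`, and `top_k` are all hypothetical and are not taken from GORank.

```python
import heapq

# Hypothetical pairwise term similarities (symmetric); unlisted pairs score 0.
TERM_SIM = {frozenset(("t1", "t2")): 0.5}

# Hypothetical annotation database: gene product -> set of annotation terms.
DB = {
    "GP_Q": {"t1", "t2"},
    "GP_1": {"t1", "t2"},
    "GP_2": {"t1"},
    "GP_3": {"t3"},
}


def term_sim(a, b):
    """Similarity between two terms; identical terms score 1.0."""
    if a == b:
        return 1.0
    return TERM_SIM.get(frozenset((a, b)), 0.0)


def gp_sim(terms_q, terms_c):
    """Best-match average over both directions (an assumed aggregation)."""
    if not terms_q or not terms_c:
        return 0.0
    q_best = sum(max(term_sim(q, c) for c in terms_c) for q in terms_q)
    c_best = sum(max(term_sim(q, c) for q in terms_q) for c in terms_c)
    return (q_best + c_best) / (len(terms_q) + len(terms_c))


def top_k(query, db, k):
    """Return the k gene products most similar to `query` as (name, score)."""
    scored = (
        (gp_sim(db[query], terms), name)
        for name, terms in db.items()
        if name != query
    )
    return [(name, score) for score, name in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    for name, score in top_k("GP_Q", DB, 2):
        print(f"{name}\t{score:.3f}")
```

`heapq.nlargest` avoids sorting the whole database when k is small. Other aggregations (maximum or plain average over all term pairs) would slot into `gp_sim` without changing the ranking code.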
GORank System
• Query input
  • Configuration
    • Method for calculating shared IC
      • Most informative common ancestor
      • Disjunctive common ancestor (GraSM)
    • Method for calculating term similarity
      • Lin
      • Jiang-Conrath
    • Ontology
      • Molecular function
      • Biological process
      • Cellular component
    • Result size: k
  • Symbol of the query gene product
  • Names of terms with annotation weight
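The configuration options above can be illustrated with a short sketch. The slides give no formulas, so the code uses the standard textbook definitions: information content IC(t) = -log p(t), shared IC as the IC of the most informative common ancestor, Lin similarity 2·shared / (IC(a) + IC(b)), and Jiang-Conrath distance IC(a) + IC(b) - 2·shared. Converting that distance to a similarity with 1 / (1 + distance) is an assumption, as are the toy ontology, the probabilities, and every function name. The disjunctive common ancestor (GraSM) variant is not shown.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (a DAG).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalytic": {"root"},
    "hydrolase": {"catalytic"},
    "esterase": {"hydrolase"},
    "thioesterase": {"hydrolase"},
}

# Hypothetical annotation probabilities p(t): fraction of gene products
# annotated with t or any of its descendants.
PROB = {
    "root": 1.0,
    "binding": 0.5,
    "catalytic": 0.5,
    "hydrolase": 0.25,
    "esterase": 0.125,
    "thioesterase": 0.125,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: rarer terms are more informative."""
    return math.log(1.0 / PROB[term])


def shared_ic_mica(a, b):
    """Shared IC via the most informative common ancestor."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def lin(a, b):
    """Lin similarity, in [0, 1]."""
    denom = ic(a) + ic(b)
    return 2.0 * shared_ic_mica(a, b) / denom if denom > 0 else 1.0


def jiang_conrath(a, b):
    """Jiang-Conrath distance mapped to (0, 1] via 1 / (1 + d) (assumed)."""
    dist = ic(a) + ic(b) - 2.0 * shared_ic_mica(a, b)
    return 1.0 / (1.0 + dist)


if __name__ == "__main__":
    print(lin("esterase", "thioesterase"))
    print(jiang_conrath("esterase", "thioesterase"))
```

In this toy ontology, two terms that meet only at the root share an IC of zero, so their Lin similarity is 0, while sibling terms under `hydrolase` score higher because their common ancestor is more specific.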
GORank: Ranked similarity search for proteins over Gene Ontology
• Configuration
  • Method for calculating shared IC: Most informative common ancestor
  • Method for calculating term similarity: Lin
  • Ontology: Molecular function
  • Result size
• Symbol of the query gene product [Search]
• Names of terms with annotation weight [Search]
  • Term name
  • Weight (0~1)
GORank: Ranked similarity search for proteins over Gene Ontology
Query
• Ontology: Molecular Function
• GeneProductSymbol: BACH_HUMAN
• Annotation Terms: acyl-CoA binding (TAS), serine esterase activity (IEA), palmitoyl-CoA hydrolase activity (IEA), hydrolase activity (IEA)
Results
• TRI34_HUMAN (Similarity: 1.0)
  • Splice Isoform 2 of Tripartite motif protein 34
  • Annotation Terms: acyl-CoA binding (TAS), serine esterase activity (IEA), palmitoyl-CoA hydrolase activity (IEA), hydrolase activity (IEA)
  • Type: protein
  • Source: MGI
• TRI35_HUMAN (Similarity: 0.9)
  • Splice Isoform 2 of Tripartite motif protein 35
  • Annotation Terms: acyl-CoA binding (TAS), serine esterase activity (IEA), hydrolase activity (IEA)
  • Type: protein
  • Source: MGI
• TRI36_HUMAN (Similarity: 0.8)
  • Splice Isoform 2 of Tripartite motif protein 36
  • Annotation Terms: acyl-CoA binding (TAS), palmitoyl-CoA hydrolase activity (IEA), hydrolase activity (IEA)
  • Type: protein
  • Source: MGI