6th InCoB 2007 GS2PATH: Linking Gene Ontology and Pathways Jin Ok Yang Korean BioInformation Center
KOBIC (Korean BioInformation Center) • The national bioinformatics center of Korea • Integration of diverse biological information • Genome information • Biodiversity information • Bioresource information • Bioinformatics training • International exchange program • Collaborative development of bioinformatics tools • Bioportal (Biowiki) • Biopipeline (Bioworkflow engine)
BioWiki • Wiki • A web technology that enables anyone to create and update website content • Suited to developing online knowledge bases (e.g., Wikipedia) • BioWiki • Adopts the wiki paradigm in biology • Collaborative development of biological knowledge bases • BioWiki Contest (http://biowiki.net)
BioPipe (http://www.biopipe.net) • BioWorkFlow Engine • No installation required • Drag & drop, then connect • BioPipe Contest: Aug 15th to Sep 20th • Open and free • Web 2.0 • [Screenshot: Toolbar, Ontology View, Design View, and Monitoring View; drag a module from the list and drop it into the Design View]
Background: GO & Pathways • How do you interpret a gene set? • Efforts to analyze functional relationships among gene sets using GO terms and pathways • Gene Ontology (GO) term-based analysis: focused on function • GO term-related pathways: provide more useful information
Gene set enrichment • Enrichment test • Tests which specific GO terms are enriched in the given gene set • The p-value for each GO term is calculated from the hypergeometric probability • Gene set enrichment • Derives its power by focusing on gene sets, that is, groups of genes that share a common biological function, chromosomal location, or regulation • Evaluates microarray data at the level of gene sets, which are defined based on prior biological knowledge
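As an illustration of the hypergeometric enrichment test, the sketch below computes the upper-tail probability P(X ≥ k) of seeing at least k annotated genes in the user's set by chance. The variable names (N, K, n, k) are our own labels, and the choice of an upper-tail test is an assumption about how "enrichment" is scored; it is not taken from the GS2PATH source.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k).

    N: number of genes in the organism's background profile
    K: number of background genes annotated to the GO term
    n: number of genes in the user's gene set
    k: number of genes in the user's set annotated to the GO term
    """
    total = comb(N, n)
    upper = min(K, n)  # X cannot exceed the term size or the set size
    # Sum the probabilities of observing exactly i annotated genes, for i = k..upper.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

For example, with a 10-gene background, a term annotating 4 genes, and a 3-gene query of which 2 carry the term, `hypergeom_pvalue(2, 3, 4, 10)` gives (6·6 + 4·1) / 120 = 1/3.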
Introduction: GO • GO databases and tools • GO terms have mostly been used to analyze data sets and identify significant biological changes • Pathways can also be exploited to find functional relationships among genes
GS2PATH • A system that tests gene set enrichment for each Gene Ontology (GO) term and maps the part of the gene set annotated to each GO term onto biological pathways (KEGG and BioCarta) • An integrated search tool for analyzing functional relationships in gene sets and providing comprehensive results
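The core linking step, mapping the part of a gene set that falls on a GO term onto pathways, can be pictured as a pair of set intersections. The sketch below assumes a hypothetical in-memory layout (GO term to gene IDs, pathway to gene IDs); the real system uses its own mapping database, whose schema is not described here.

```python
from typing import Dict, Iterable, List, Set


def map_term_genes_to_pathways(
    gene_set: Iterable[str],
    go_annotations: Dict[str, Set[str]],   # hypothetical: GO term ID -> gene IDs
    pathway_members: Dict[str, Set[str]],  # hypothetical: pathway ID -> gene IDs
) -> Dict[str, Dict[str, List[str]]]:
    """For each GO term hit by the gene set, list the pathways that contain those genes."""
    genes = set(gene_set)
    result: Dict[str, Dict[str, List[str]]] = {}
    for term, term_genes in go_annotations.items():
        hits = genes & term_genes  # the part of the gene set annotated to this term
        if not hits:
            continue
        by_pathway = {}
        for pathway, members in pathway_members.items():
            shared = hits & members
            if shared:
                by_pathway[pathway] = sorted(shared)
        result[term] = by_pathway
    return result
```

With toy, simplified annotations such as `{"GO:0006915": {"TP53", "MDM2", "BAX"}}` and pathways `{"hsa04115": {"TP53", "MDM2", "CDKN1A"}}`, the query `["TP53", "MDM2", "EGFR"]` links GO:0006915 to hsa04115 through MDM2 and TP53.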
Features • Functional relationships between GO terms and pathways • Hypergeometric test for gene set enrichment • Dual search for up- and down-regulated gene sets • Various filtering options for GO terms • The number of descendant nodes, the evidence for GO terms, and statistical values for the gene set mapped to each GO term • User-specified coloring of genes on pathways
Implementation (1/3) • GS2Path consists of • one internal database (mapping database) • four components • Query Processor, GO Accessor, KEGG Accessor, and BioCarta Accessor
Implementation (2/3) • Query Processor • Receives a user query • Converts the query into gene-related information • Distributes it to the other components and waits for their results • GO Accessor • Retrieves statistical values for the part of the gene set mapped to each GO term, which is then mapped to KEGG and BioCarta pathways • Calculates p-values using the cumulative hypergeometric distribution
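The Query Processor's fan-out/fan-in role can be sketched with a thread pool. This is a minimal sketch, not the GS2PATH implementation: the accessors are stand-in callables, and the ID normalization (split on commas and whitespace, drop duplicates) is an assumption about what "converting the query" involves.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical accessor signature: takes a list of gene IDs, returns any result object.
Accessor = Callable[[List[str]], object]


def normalize_ids(raw_ids: str) -> List[str]:
    """Split a pasted gene ID list on commas/whitespace and drop duplicates, keeping order."""
    seen = set()
    ids = []
    for token in raw_ids.replace(",", " ").split():
        if token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


def process_query(raw_ids: str, accessors: Dict[str, Accessor]) -> Dict[str, object]:
    """Send the normalized gene list to every accessor in parallel and collect their results."""
    genes = normalize_ids(raw_ids)
    with ThreadPoolExecutor(max_workers=len(accessors) or 1) as pool:
        futures = {name: pool.submit(fn, genes) for name, fn in accessors.items()}
        # .result() blocks until each accessor finishes, i.e. the processor waits for all of them.
        return {name: fut.result() for name, fut in futures.items()}
```

Running the accessors concurrently is a design choice for the sketch: the GO, KEGG, and BioCarta lookups are independent, so the total wait is bounded by the slowest one rather than their sum.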
Implementation (3/3) • BioCarta and KEGG Accessors • Retrieve results from the BioCarta and KEGG databases, respectively • To support user-specified coloring: • For KEGG, the KEGG web service API (SOAP/WSDL) is used • For BioCarta, there is no API for user-defined coloring, so after retrieving a pathway image from the BioCarta database, GS2PATH colors the genes in the image on the fly.
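On-the-fly coloring of a static pathway image amounts to painting a region for each user-colored gene. The sketch below is illustrative only: it represents the image as a plain grid of RGB tuples and assumes a hypothetical table of per-gene bounding boxes (for example, one derived from the pathway page's image map). A real service would use an imaging library instead of nested lists.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def color_genes(
    pixels: List[List[RGB]],
    boxes: Dict[str, Box],   # hypothetical: gene ID -> bounding box on the image
    colors: Dict[str, RGB],  # user-specified: gene ID -> fill color
) -> List[List[RGB]]:
    """Return a copy of the image with each user-colored gene's box filled."""
    out = [row[:] for row in pixels]  # copy so the retrieved image stays untouched
    height = len(out)
    width = len(out[0]) if height else 0
    for gene, color in colors.items():
        box = boxes.get(gene)
        if box is None:
            continue  # gene is not drawn on this pathway
        x0, y0, x1, y1 = box
        # Clamp the box to the image bounds before painting.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                out[y][x] = color
    return out
```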
Search • Gene set enrichment test against the organism's total profile: GO, KEGG, and BioCarta • Single- or two-part analysis (up- and down-regulation) • Pathway viewer for KEGG and BioCarta
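A two-part (up/down) analysis can be framed as running the same enrichment bookkeeping twice against one background. The sketch below assumes that each part is tested independently against the organism's total profile and only prepares the (k, n, K, N) counts that a hypergeometric test consumes. The dictionary layout is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Counts = Tuple[int, int, int, int]  # (k, n, K, N) as used by a hypergeometric test


def dual_counts(
    up: Iterable[str],
    down: Iterable[str],
    annotations: Dict[str, Set[str]],  # hypothetical: term or pathway ID -> gene IDs
    background: Set[str],              # the organism's total gene profile
) -> Dict[str, Dict[str, Counts]]:
    """Per part ("up"/"down"), collect enrichment counts for every term the part hits."""
    N = len(background)
    result: Dict[str, Dict[str, Counts]] = {}
    for part, genes in (("up", set(up)), ("down", set(down))):
        genes &= background  # only genes present in the profile are counted
        n = len(genes)
        counts = {}
        for term, members in annotations.items():
            members_bg = members & background
            k = len(genes & members_bg)
            if k:
                counts[term] = (k, n, len(members_bg), N)
        result[part] = counts
    return result
```

Keeping the two parts separate lets a term that is enriched only among up-regulated genes stand out instead of being diluted by the down-regulated set.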
Input • Database • GO category • Biological Process • Molecular Function • Cellular Component • Pathways: KEGG and BioCarta • Organism • Human, Mouse, Rat, and Yeast • Gene ID list
Test • Enrichment test • P-value: hypergeometric probability • FDR (false discovery rate) • Adjustment of p-values
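The slides do not name the FDR procedure. The Benjamini-Hochberg step-up adjustment is a common choice for GO enrichment and is shown here as an assumed stand-in: each sorted p-value is scaled by m / rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the raw p-values `[0.01, 0.04, 0.03, 0.20]`, this yields approximately `[0.04, 0.0533, 0.0533, 0.20]`.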
Filtering • GO terms • Evidence • Slim • Number of genes in term • P-value • Pathways: KEGG and BioCarta • Number of genes in term • P-value
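The filtering options listed above can be pictured as predicates over per-term results. The `TermResult` record and its field names are hypothetical, chosen for this sketch. The evidence check keeps a term if any of its evidence codes is allowed, which is an assumption about how that filter combines codes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class TermResult:
    term_id: str
    gene_count: int   # number of genes in the term
    p_value: float
    evidence: FrozenSet[str] = frozenset()  # GO evidence codes, e.g. "IDA", "IEA"
    in_slim: bool = False                   # whether the term belongs to a GO slim


def filter_terms(
    results: Iterable[TermResult],
    min_genes: int = 1,
    max_p: float = 0.05,
    allowed_evidence: Optional[FrozenSet[str]] = None,
    slim_only: bool = False,
) -> List[TermResult]:
    """Apply gene-count, p-value, evidence, and slim filters; sort survivors by p-value."""
    kept = []
    for r in results:
        if r.gene_count < min_genes or r.p_value > max_p:
            continue
        if allowed_evidence is not None and not (r.evidence & allowed_evidence):
            continue
        if slim_only and not r.in_slim:
            continue
        kept.append(r)
    return sorted(kept, key=lambda r: r.p_value)
```

Pathway results (KEGG and BioCarta) need only the gene-count and p-value checks, so the same function applies with the evidence and slim options left at their defaults.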
Example: microarray clustering data • [Figure: Part A and Part B]
Interface • Select GO category or pathways • Select organism • Enter the gene set
Conclusion • Using GS2PATH, users can • Get integrated Gene Ontology term and pathway information together • Filter the results with various conditions • Capture relationships between Gene Ontology terms and pathways • Available at http://array.kobic.re.kr:8080/arrayport/gs2path/