Babelomics Functional interpretation of genome-scale experiments Valencia, March 2008. David Montaner dmontaner@cipf.es http://bioinfo.cipf.es Bioinformatics Department Centro de Investigacion Principe Felipe (Valencia).
Babelomics: A systems biology web resource for the functional interpretation of genome-scale experiments. http://www.babelomics.org
Genome-scale experiment output Functional Interpretation
Functional interpretation
To “interpret” experimental results is to use current knowledge to rearrange them in a meaningful way.
Experimental results: observed in the lab (not always a wet lab). Recorded to:
• Test a hypothesis.
• Get a first insight into a biological process.
DB information: already tested and stored.
Babelomics imported databases ENSEMBL www.ensembl.org GO KEGG InterPro Transcription Factors CisRed Bioentities Literature Gene expression Homo sapiens HGNC symbol EMBL acc UniProt/Swiss-Prot UniProtKB/TrEMBL Ensembl IDs RefSeq EntrezGene Affymetrix Agilent PDB Protein Id IPI…. Mus musculus Rattus norvegicus Ensembl ID Gallus gallus Drosophila melanogaster Caenorhabditis elegans Saccharomyces cerevisiae Arabidopsis thaliana
Babelomics tools
• FatiGO: finds differential distributions of Gene Ontology terms between two groups of genes.
• FatiGOplus: an extension of FatiGO for InterPro motifs, pathways, SwissProt keywords (KW), transcription factors (TF), gene expression in tissues, bioentities from scientific literature, and cis-regulatory elements (CisRed).
• Tissues Mining Tool: compares reference values of gene expression in tissues to your results.
• MARMITE: finds differential distributions of bioentities extracted from PubMed between two groups of genes.
• FatiScan: detects significant functions with Gene Ontology, InterPro motifs, SwissProt KW and KEGG pathways in lists of genes ordered according to different characteristics.
• MarmiteScan: uses chemical and disease-related information to detect related blocks of genes in a gene list with associated values.
• GSEA: detects blocks of functionally related genes with significant coordinate over- or under-expression using Gene Set Enrichment Analysis.
Babelomics tools Information about genes which can be coded in binary variables, tags, labels. Information about genes which has to be coded into continuous numerical values.
FatiGO
• Compare two lists of genes.
• Compare one list of genes against the rest of the genome.
• One statistical test (Fisher’s exact) for each block of annotation.
• Multiple testing context.
• Filtering of annotation is convenient: we test fewer terms, and more interesting ones.
Testing the distribution of functional terms among two groups of genes (remember, we have to test hundreds of blocks)

                   A   B
Biosynthesis       6   2
No biosynthesis    4   8

One gene list (A) versus the other list (B): are these two groups of genes carrying out different biological roles?
List A: Biosynthesis 60%, Sporulation 20%. List B: Biosynthesis 20%, Sporulation 20%.
Genes in group A have significantly to do with biosynthesis, but not with sporulation.
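The test behind a table like this can be sketched in a few lines of Python. This is a minimal illustration of a two-sided Fisher's exact test written with the standard library only; the function name `fisher_exact_two_sided` and the "sum all tables no more likely than the observed one" definition of two-sidedness are assumptions made here, not a description of the Babelomics implementation.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: term present / term absent. Columns: list A / list B.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)  # number of ways to fill column A

    def weight(x):
        # Unnormalised hypergeometric probability of x annotated genes in A.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Add up every table that is no more likely than the observed one.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / total

p_small = fisher_exact_two_sided(6, 2, 4, 8)      # list A vs. list B
p_genome = fisher_exact_two_sided(6, 2, 4, 8000)  # list A vs. a large background
print(p_small, p_genome)
```

With the small illustrative counts (6, 2, 4, 8) this two-sided p-value comes out near 0.17, while the same 6-versus-2 split against a background of thousands of genes gives a far smaller value. In practice one such test is run per annotation block, which is why the multiple testing context matters.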
Testing the distribution of functional terms among two groups of genes (remember, we have to test hundreds of blocks)

                   A   B
Biosynthesis       6   2
No biosynthesis    4   8000

One gene list (A) versus the other list (B): are these two groups of genes carrying out different biological roles?
List A: Biosynthesis 60%, Sporulation 20%. List B: Biosynthesis 20%, Sporulation 20%.
Genes in group A have significantly to do with biosynthesis, but not with sporulation.
FatiGO - Babelomics
• Deal with duplicated genes:
  • Exclude them.
  • Include them.
• Try to reduce the background space of genes of interest:
  • Use just genes with some annotation.
  • Use just genes annotated at a certain level of the GO ontology.
Organism FatiGO – selecting the database
Database FatiGO – selecting the database
Your own Database FatiGO – your own database
Gene List2 Rest of genome Gene List1 FatiGO – introducing your data
FatiGO Results
For each functional block the output shows the percentages in each list, the p-value and the corrected p-value, and marks whether gene group 1 or gene group 2 is enriched in that block.
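Because one test is run per annotation block, the raw p-values need a multiple testing correction before they are read. The slides do not say which correction produces the corrected p-values, so the sketch below assumes the Benjamini-Hochberg false discovery rate procedure as one common choice; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is used here only as an example of a correction method.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.5]   # made-up p-values for four annotation blocks
corrected = benjamini_hochberg(raw)
print(corrected)
```

Filtering the annotation beforehand reduces the number of tests, so each remaining term is penalised less by the correction.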
The FatiGO approach may not be very powerful
If a threshold based on the experimental values is applied, and the resulting selection of genes is compared for enrichment of a functional term, the enrichment might not be found: very few genes are selected to arrive at a significant conclusion on GO1 and GO2.
Figure: genes ranked by a two-tailed t-test statistic (p<0.05), from significantly over-expressed in A (+) to significantly over-expressed in B (-). The functional classes GO1 and GO2 are expressed as blocks in A and B.
FatiScan
• Interpret a ranked list of genes.
• There is no need to choose a cut-off. All information is included.
• One statistical test for each block of annotation.
• Multiple testing context.
• Filtering of annotation is convenient: we test fewer terms, and more interesting ones.
Organism Databases Gene list ordered according to the experimental value FatiScan
Testing along an ordered list
• Index ranking genes according to some biological aspect under study.
• Database that stores gene class membership information.
• FatiScan searches over the whole ordered list, trying to find runs of functionally related genes.
Figure: list of genes ordered from + to -, with annotation labels A, B and C. One block of genes is enriched in annotation A, another block is enriched in annotation B, and annotation C is homogeneously distributed along the list.
FatiScan results B C A List of genes + -
FatiScan results
Genes are ordered by the gene ranking index, from + (A) to - (B). One panel lists the functional blocks over-represented among genes over-expressed in A, the other the functional blocks over-represented among genes over-expressed in B.
FatiScan
List of genes ranked by biological criteria (+ to -), then Fisher's test, then significant functional terms.
Al-Shahrour et al., 2005 Bioinformatics; 2007 BMC Bioinformatics
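The idea of applying Fisher's test along a ranked list can be sketched as follows. This is only an illustration under stated assumptions: the list is cut at a few evenly spaced positions, and at each cut the genes above are compared with the genes below for one annotation term. The function names, the even spacing and the default of four partitions are hypothetical and not taken from the FatiScan papers.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    weights = [comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)]
    observed = weights[a - lo]
    return sum(w for w in weights if w <= observed) / comb(row1 + row2, col1)

def scan_partitions(ranked_genes, annotated, n_partitions=4):
    """Test one annotation term at each cut of a ranked gene list.

    Returns a list of (cut position, p-value) pairs.
    """
    n = len(ranked_genes)
    results = []
    for k in range(1, n_partitions):
        cut = k * n // n_partitions
        top, bottom = ranked_genes[:cut], ranked_genes[cut:]
        a = sum(g in annotated for g in top)     # annotated, upper part
        b = sum(g in annotated for g in bottom)  # annotated, lower part
        p = fisher_two_sided(a, b, len(top) - a, len(bottom) - b)
        results.append((cut, p))
    return results

# Hypothetical data: 20 ranked genes, the term annotates the top six.
genes = [f"g{i}" for i in range(20)]
term = {f"g{i}" for i in range(6)}
scan = scan_partitions(genes, term)
best_cut, best_p = min(scan, key=lambda r: r[1])
print(scan, best_cut)
```

No single threshold has to be chosen in advance: a term concentrated at one end of the ranking shows up at whichever cut separates it best. Testing several cuts for every term adds further tests, so the multiple testing correction applies here as well.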
FatiScan Example – two classes (Tumor vs. Control)
All genes in the array are ranked by a statistic t, from + to -, where t ~ Tumor mean expression – Control mean expression.
If Proliferation is more associated with the genes at the top of the list, it is more associated with the genes that show higher expression in Tumors.
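A ranking of this kind could be produced as in the sketch below. The slide only states that t follows the difference between the tumor and control means; scaling that difference by a Welch-style standard error is an assumption made here, and the gene names and expression values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(tumor, control):
    """t statistic: difference of means over its standard error (Welch form)."""
    se = sqrt(variance(tumor) / len(tumor) + variance(control) / len(control))
    return (mean(tumor) - mean(control)) / se

def rank_genes(expression):
    """Order gene names from the most positive t (higher in tumors) down."""
    scores = {g: welch_t(t, c) for g, (t, c) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical expression values: gene -> (tumor samples, control samples)
expression = {
    "geneA": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "geneB": ([2.0, 3.0, 4.0], [2.0, 3.0, 4.5]),
    "geneC": ([1.0, 2.0, 3.0], [5.0, 6.0, 7.0]),
}
ranking = rank_genes(expression)
print(ranking)
```

The ordered gene names are what a tool working on ranked lists takes as input; the sign convention decides which end of the list corresponds to tumors.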
FatiScan Example - Survival Analysis
• Cromer et al. Identification of genes associated with tumorigenesis and metastatic potential of hypopharyngeal cancer by microarray analysis. Oncogene 2004, 23(14): 2484-2498.
• 34 hypopharyngeal cancer samples taken from patients undergoing surgery.
• Analyzed using Affymetrix HG-U95A microarrays (~12650 distinct transcription features).
• Disease-free survival time after intervention was recorded.
Cox proportional hazards model: h(t) = h0(t) * exp(b * gene expression)
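In the Cox model above, the baseline hazard h0(t) cancels when two expression levels are compared, so the coefficient b alone determines how the hazard changes. A minimal sketch of that arithmetic follows; the function name and the example coefficients are hypothetical.

```python
from math import exp

def hazard_ratio(b, delta=1.0):
    """Ratio h(t | x + delta) / h(t | x) under h(t) = h0(t) * exp(b * x).

    h0(t) cancels out, leaving exp(b * delta).
    """
    return exp(b * delta)

# Made-up coefficients for three genes.
for b in (0.5, 0.0, -0.5):
    print(b, hazard_ratio(b))
```

A positive b gives a ratio above 1 (hazard increases with expression), a negative b gives a ratio below 1, and b = 0 leaves the hazard unchanged. Ranking genes by b therefore orders them from those whose expression raises the hazard most to those whose expression lowers it most.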
Gene Ontology: biological process
Figure: genes ranked by the Cox coefficient b, from + b (hazard increased with expression) to - b (hazard decreased with expression). Lowest p-value = 0.96.