Chapter 8: Biological Knowledge Assembly and Interpretation • Ju Han Kim, Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea • Presenter: Zhen Gao
Outline • Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.
Input: Microarray / RNA-Seq • DEG: Differentially Expressed Genes • Co-expression / clustering • Gene Set-Wise Differential Expression Analysis • Differential Co-Expression Analysis → gene of interest, gene list, gene pair, or gene list pair • FAA: Functional Annotation Analysis: Gene Ontology (GO) or pathway analysis → gene list with annotations • Visualization, semantic assembly and knowledge learning: concept lattice analysis (BioLattice)
Glossary • FAA: Functional Annotation Analysis • GO: Gene Ontology • Pathway • DEG: Differentially Expressed Genes • GSEA: Gene Set Enrichment Analysis • Biological Interpretation and Biological Semantics • Concept lattice analysis
Pathway and Ontology-Based Analysis • GO and biological pathway-based analysis: • one of the most powerful methods for inferring the biological meaning of expression changes • takes as input a list of genes obtained by: • differential expression analysis • co-expression analysis (or clustering)
Pathway and Ontology-Based Analysis • Attributes that can be used for FAA: • transcription factor binding • clinical phenotypes such as disease associations • MeSH (Medical Subject Headings) terms • microRNA binding sites • protein family memberships • chromosomal bands, etc. • GO terms • biological pathways
Pathway and Ontology-Based Analysis • Features may have their own ontological structures • GO is structured as a DAG (Directed Acyclic Graph)
Pathway and Ontology-Based Analysis • DEGs: • Three statistical tests commonly used to obtain DEGs: • t-test • Wilcoxon’s rank sum test • ANOVA • Note: because thousands of genes are tested at once, the multiple-hypothesis-testing problem must be properly managed (e.g., by controlling the false discovery rate)
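The DEG step above can be sketched as follows. This is a minimal illustration on simulated data, assuming a two-group design, a per-gene t-test and Benjamini-Hochberg adjustment as one possible way to handle multiple testing; the helper name `benjamini_hochberg`, the group sizes and the 0.05 cutoff are choices made here, not taken from the slides.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the i-th smallest p-value
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, starting from the largest p-value
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

# Simulated expression matrix (hypothetical): genes x samples per group
rng = np.random.default_rng(0)
n_genes, n_per_group = 1000, 10
control = rng.normal(0.0, 1.0, size=(n_genes, n_per_group))
treated = rng.normal(0.0, 1.0, size=(n_genes, n_per_group))
treated[:50] += 3.0  # the first 50 genes are truly shifted

# One t-test per gene (row), then correct for testing 1000 genes at once
t_stat, p_raw = stats.ttest_ind(treated, control, axis=1)
p_adj = benjamini_hochberg(p_raw)
deg_idx = np.flatnonzero(p_adj < 0.05)
print(f"{deg_idx.size} genes called differentially expressed")
```

Swapping `stats.ttest_ind` for `stats.ranksums` (two groups) or `stats.f_oneway` (more than two groups) gives the other two tests listed on the slide.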
Pathway and Ontology-Based Analysis • Co-expression analysis • groups similar expression profiles together and separates dissimilar ones • returns groups of genes that are assumed to be co-regulated • Clustering algorithms: • hierarchical-tree clustering • partitional clustering
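As a rough sketch of the hierarchical-tree variant, the snippet below clusters simulated expression profiles with SciPy. The correlation distance, average linkage and the choice of two clusters are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Simulated data (hypothetical): 40 genes x 12 samples, two opposite patterns
rng = np.random.default_rng(1)
n_samples = 12
pattern_a = np.sin(np.linspace(0, 3, n_samples))
pattern_b = -pattern_a
profiles = np.vstack([
    pattern_a + rng.normal(0, 0.1, size=(20, n_samples)),
    pattern_b + rng.normal(0, 0.1, size=(20, n_samples)),
])

# Distance = 1 - Pearson correlation, so similarly shaped profiles are close
dist = pdist(profiles, metric="correlation")
tree = linkage(dist, method="average")

# Cut the tree into two clusters of putatively co-regulated genes
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

A partitional method such as k-means would instead assign genes to a fixed number of clusters directly, without building a tree.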
Pathway and Ontology-Based Analysis • Pathways are powerful resources for understanding shared biological processes • E.g.: KEGG, MetaCyc and BioCarta (signaling pathways)
Pathway and Ontology-Based Analysis • MetaCyc: • a non-redundant database of experimentally determined metabolic pathways • the largest such collection, containing over 1,400 metabolic pathways
Pathway and Ontology-Based Analysis • Ontology / GO: • provides a shared understanding of a certain domain of information • uses controlled vocabularies • GO comprises three vocabularies, each structured as a DAG: • Molecular Function (MF) • Cellular Component (CC) • Biological Process (BP)
Pathway and Ontology-Based Analysis • Other common ontologies and knowledge resources: • MIPS: an integrated source of protein properties across a variety of complete genomes • MeSH: clinical vocabulary, including disease names • OMIM (Online Mendelian Inheritance in Man) • UMLS (Unified Medical Language System)
Pathway and Ontology-Based Analysis • GO enrichment test: • For example: • if 20% of the genes in a gene list are annotated with the GO term ‘apoptosis’ • while only 1% of the genes in the whole human genome fall into this functional category • then ‘apoptosis’ is over-represented (enriched) in the gene list
Pathway and Ontology-Based Analysis • Common statistical tests: • chi-square test • binomial test • hypergeometric test
Pathway and Ontology-Based Analysis • Hypergeometric test:
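The formula on this slide is not present in the text. For orientation, the standard upper-tail form of the hypergeometric test is given below; the symbols are chosen here and are not necessarily the presenter's notation. N is the number of background genes, K the number of background genes annotated with the term, n the size of the gene list, and k the number of list genes annotated with the term.

```latex
P(X \ge k) \;=\; \sum_{i=k}^{\min(n,\,K)}
  \frac{\binom{K}{i}\,\binom{N-K}{\,n-i\,}}{\binom{N}{n}}
```

A small value means that drawing k or more annotated genes by chance, when sampling n genes without replacement from the background, would be unlikely.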
Pathway and Ontology-Based Analysis • Pitfalls to avoid when using the hypergeometric test: • The choice of background has a substantial impact on the result. Candidate backgrounds: • all genes having at least one GO annotation • all genes ever known in genome databases • all genes on the microarray • GO has a hierarchical (graph) structure, while the hypergeometric test assumes that categories are independent
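To make the background pitfall concrete, the sketch below runs the same enrichment test against two backgrounds using `scipy.stats.hypergeom`. All counts are hypothetical: the first call mirrors the 20% versus 1% ‘apoptosis’ example, the second assumes a smaller array-only background in which the term is more common.

```python
from scipy.stats import hypergeom

def enrichment_pvalue(n_background, n_annotated, n_list, n_hits):
    """Upper-tail hypergeometric p-value P(X >= n_hits).

    n_background: genes in the chosen background
    n_annotated:  background genes annotated with the term
    n_list:       genes in the list of interest
    n_hits:       list genes annotated with the term
    """
    # sf(k) is P(X > k), so shift by one to include n_hits itself
    return hypergeom.sf(n_hits - 1, n_background, n_annotated, n_list)

# Hypothetical counts: 20 of 100 list genes (20%) carry the term,
# versus 200 of 20,000 genes (1%) in a whole-genome background
p_genome = enrichment_pvalue(20000, 200, 100, 20)

# Same list, but against an assumed 5,000-gene array background
# in which 200 genes (4%) carry the term
p_array = enrichment_pvalue(5000, 200, 100, 20)

print(f"whole-genome background: p = {p_genome:.3g}")
print(f"array-only background:   p = {p_array:.3g}")
```

The list is identical in both calls; only the background changes, and the p-value changes with it. This does not address the second pitfall, the dependence between GO categories.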
Pathway and Ontology-Based Analysis • Common tools: • DAVID • ArrayXPath • Pathway Miner • EASE • GOFish • GOTree, etc.
Gene Set-Wise Differential Expression Analysis • Evaluates coordinated differential expression of gene groups • Gene Set Enrichment Analysis (GSEA): • the first method developed in this category • evaluates, for each pre-defined gene set, whether it is significantly associated with phenotypic classes
Gene Set-Wise Differential Expression Analysis • Difference between FAA and GSEA: • FAA: starts from a gene list of interest and finds the GO terms over-represented in it • GSEA: starts from pre-defined gene sets and tests whether their expression changes across conditions
Gene Set-Wise Differential Expression Analysis • Advantages of gene set-wise differential expression analysis: • identifies modest but coordinated changes in gene expression that might be missed by conventional ‘individual gene-wise’ differential expression analysis (many tiny expression changes can collectively create a big change) • straightforward biological interpretation, because the gene sets are defined by biological knowledge
Gene Set-Wise Differential Expression Analysis • The Enrichment Score (ES) is calculated by walking down the ranked gene list L, in which the N genes are ordered by their correlation with the phenotype, and evaluating, up to each position i, the fraction of genes in S ("hits"), weighted by their correlation, and the fraction of genes not in S ("misses").
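The running-sum idea behind the ES can be sketched as below. This is a simplified illustration under stated assumptions: the ES is taken as the maximum deviation from zero of (weighted hit fraction minus miss fraction), the weighting exponent `p` defaults to 1, and no permutation-based significance or normalization is computed. The function name and toy gene names are hypothetical.

```python
import numpy as np

def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set S.

    ranked_genes: gene names ordered by correlation with the phenotype
    correlations: the correlation of each gene, in the same order
    gene_set:     the pre-defined set S
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    weights = np.abs(np.asarray(correlations, dtype=float)) ** p
    n_total, n_hits = len(ranked_genes), int(in_set.sum())
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Each hit adds its share of the total hit weight; each miss subtracts 1/(N - N_hits)
    hit_step = np.where(in_set, weights, 0.0) / weights[in_set].sum()
    miss_step = np.where(in_set, 0.0, 1.0) / (n_total - n_hits)
    running = np.cumsum(hit_step - miss_step)
    # ES is the running-sum value farthest from zero
    return float(running[np.argmax(np.abs(running))])

# Toy ranked list (hypothetical genes and correlations)
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
corr = [0.9, 0.7, 0.3, -0.1, -0.5, -0.8]
es_top = enrichment_score(genes, corr, {"g1", "g2"})     # set at the top of the list
es_bottom = enrichment_score(genes, corr, {"g5", "g6"})  # set at the bottom of the list
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking gives a positive ES and one concentrated at the bottom a negative ES. Sets scattered through the list stay close to zero.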
Gene Set-Wise Differential Expression Analysis • Typical gene sets: • regulatory-motif sets • function-related sets • disease-related sets • Database: • MSigDB: • 6,769 gene sets • classified into five different collections • Has some interesting extensions
Differential Co-Expression Analysis • Co-expression analysis: • determines the degree of co-expression of a cluster of genes under a certain condition • Differential co-expression analysis: • determines the degree of co-expression difference of a gene pair or a gene cluster across different conditions
Differential Co-Expression Analysis • Three major types: • (a) differential co-expression of gene cluster(s) • (b) gene pair-wise differential co-expression • (c) differential co-expression of paired gene sets
Differential Co-Expression Analysis • Type (a): identifies gene cluster(s) that are differentially co-expressed between two conditions • Let conditions and genes be denoted by J and I, respectively. The mean squared residual of the model is a measure of the co-expression of the genes.
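The formula itself is not present in the text. The sketch below assumes that the slide refers to the residual of an additive model (gene effect plus condition effect), where each residual is a_ij - a_iJ - a_Ij + a_IJ (entry, minus its gene mean, minus its condition mean, plus the overall mean) and the score is the mean of the squared residuals over all genes in I and conditions in J. The function name and simulated data are hypothetical.

```python
import numpy as np

def mean_squared_residual(expr):
    """Mean squared residual of an additive model for a genes x conditions matrix.

    A value near zero means the genes rise and fall together across the
    conditions, i.e. they are tightly co-expressed.
    """
    a = np.asarray(expr, dtype=float)
    gene_mean = a.mean(axis=1, keepdims=True)   # a_iJ
    cond_mean = a.mean(axis=0, keepdims=True)   # a_Ij
    overall = a.mean()                          # a_IJ
    residual = a - gene_mean - cond_mean + overall
    return float((residual ** 2).mean())

# Simulated cluster of 4 genes measured on 5 samples in each of two groups
rng = np.random.default_rng(2)
gene_effect = np.array([[0.0], [1.0], [2.0], [3.0]])
sample_effect = np.array([[0.0, 0.5, 1.5, 1.0, 2.0]])
additive = gene_effect + sample_effect            # perfectly co-expressed

group_1 = additive + rng.normal(0, 0.05, additive.shape)  # co-expression kept
group_2 = rng.normal(0, 1.0, additive.shape)              # co-expression lost

msr_1 = mean_squared_residual(group_1)
msr_2 = mean_squared_residual(group_2)
print(f"group 1: {msr_1:.4f}, group 2: {msr_2:.4f}")
```

A cluster whose score is low in one group and high in the other is a candidate for differential co-expression. How the two scores are compared statistically is not covered here.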
Differential Co-Expression Analysis • Type (b): identifies differentially co-expressed gene pairs • Techniques: • F-statistic • a meta-analytic approach
Differential Co-Expression Analysis • Note that the identification of differentially co-expressed gene clusters or gene pairs usually does not use pre-defined gene sets or pairs. • Thus, their interpretation can also be improved by ontology- and pathway-based annotation analysis.
Differential Co-Expression Analysis • Type (c): the dCoxS (differential co-expression of gene sets) algorithm identifies gene set pairs that are differentially co-expressed across conditions • Biological pathways can be used as the pre-defined gene sets, and the differential co-expression of pathway pairs between conditions is then analyzed.
Differential Co-Expression Analysis • Type (c) cont. • To measure the expression similarity between paired gene sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between the sample-wise entropies of the two sets. Because the IS uses only sample-wise distances, it can always be computed, even when the two pathways contain different numbers of genes.
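The point about unequal pathway sizes can be illustrated with a simplified stand-in. This is not the dCoxS implementation: plain Euclidean distance between sample pairs replaces the entropy-based measure named above, and the function name and simulated data are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist

def interaction_score(expr_set_a, expr_set_b):
    """Correlation between the sample-pair distance vectors of two gene sets.

    Both inputs are genes x samples matrices over the SAME samples in the
    same order. Each set is reduced to one distance per sample pair, so
    the two sets may contain different numbers of genes.
    """
    d_a = pdist(np.asarray(expr_set_a, dtype=float).T)  # one value per sample pair
    d_b = pdist(np.asarray(expr_set_b, dtype=float).T)
    return float(np.corrcoef(d_a, d_b)[0, 1])

rng = np.random.default_rng(3)
n_samples = 8
shared_factor = np.linspace(-2.0, 2.0, n_samples)  # drives both pathways

# Pathway A has 5 genes, pathway B has 12 genes
pathway_a = np.outer(rng.uniform(0.5, 1.5, 5), shared_factor) + rng.normal(0, 0.05, (5, n_samples))
pathway_b = np.outer(rng.uniform(0.5, 1.5, 12), shared_factor) + rng.normal(0, 0.05, (12, n_samples))
pathway_b_unlinked = rng.normal(0, 1.0, (12, n_samples))  # no shared factor

is_linked = interaction_score(pathway_a, pathway_b)
is_unlinked = interaction_score(pathway_a, pathway_b_unlinked)
print(f"linked: {is_linked:.2f}, unlinked: {is_unlinked:.2f}")
```

Computing such a score separately under each condition and comparing the two values is the idea behind testing a pathway pair for differential co-expression.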
Biological Interpretation and Biological Semantics • Biomedical semantics provides rich descriptions for biomedical domain knowledge. • Motivation for biological semantics: • Conventional GO analysis has limitations: • the result is typically a long, unordered list of annotations • most analysis tools evaluate only one cluster at a time • the massive annotation lists are time-consuming to read • and hard to assemble manually • many annotations are redundant
Biological Interpretation and Biological Semantics • Introducing BioLattice: • a mathematical framework • based on concept lattice analysis • organizes traditional clusters and their associated annotations into a lattice of concepts • provides a graphical summary • treats gene expression clusters as objects and annotations as attributes • Thus, complex relations among clusters and annotations are clarified, ordered and visualized.
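The objects-and-attributes view can be sketched with a brute-force enumeration of formal concepts. This is a generic concept lattice illustration, not BioLattice's own code: the cluster names, annotations and function name are hypothetical, and the exhaustive search is only practical for a handful of clusters.

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate all formal concepts of a small context.

    context: dict mapping each object (cluster) to its set of attributes
             (annotations).
    Returns a set of (extent, intent) pairs: a maximal group of clusters
    together with all annotations shared by every cluster in the group.
    """
    objects = list(context)
    all_attrs = set().union(*context.values())

    def shared_attrs(objs):
        sets = [context[o] for o in objs]
        # By convention, the empty group of objects shares every attribute
        return set.intersection(*sets) if sets else set(all_attrs)

    def objects_having(attrs):
        return {o for o in objects if attrs <= context[o]}

    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = shared_attrs(subset)
            extent = objects_having(intent)  # closure of the subset
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

# Hypothetical clusters and their enriched annotations
context = {
    "c1": {"apoptosis", "cell cycle"},
    "c2": {"apoptosis", "DNA repair"},
    "c3": {"cell cycle"},
}
concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "share", sorted(intent))
```

Ordering the concepts by inclusion of their extents gives the lattice. The concept pairing clusters c1 and c2 with ‘apoptosis’, for example, sits directly above the two single-cluster concepts for c1 and c2.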
Biological Interpretation and Biological Semantics • Another advantage of BioLattice is that heterogeneous biological knowledge resources can be added