Chapter 8: Biological Knowledge Assembly and Interpretation

Chapter 8: Biological Knowledge Assembly and Interpretation . Ju Han Kim Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea, Presenter: Zhen Gao . Outline. Review of major computational approaches to facilitate biological interpretation of


Presentation Transcript


  1. Chapter 8: Biological Knowledge Assembly and Interpretation • Ju Han Kim, Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea • Presenter: Zhen Gao

  2. Outline • Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.

  3. Input: Microarray / RNA-Seq • DEG (Differentially Expressed Genes) analysis • Co-expression / clustering • Gene Set-Wise Differential Expression Analysis • Differential Co-Expression Analysis • Gene of interest, gene list, gene pair, or gene list pair • FAA (Functional Annotation Analysis): Gene Ontology (GO) or pathway analysis • Gene list with annotations • Visualization, semantic assembly, and knowledge learning: • Concept lattice analysis: BioLattice

  4. Glossary • FAA: Functional Annotation Analysis • GO: Gene Ontology • Pathway • DEG: Differentially Expressed Genes • GSEA: Gene Set Enrichment Analysis • Biological Interpretation and Biological Semantics • Concept lattice analysis

  5. Pathway and Ontology-Based Analysis • GO and biological pathway-based analysis: • one of the most powerful methods for inferring the biological meaning of expression changes • applied to a list of genes obtained by: • differential expression analysis • co-expression analysis (or clustering)

  6. Pathway and Ontology-Based Analysis

  7. Pathway and Ontology-Based Analysis • Attributes that can be used for FAA: • transcription factor binding • clinical phenotypes such as disease associations • MeSH (Medical Subject Headings) terms • microRNA binding sites • protein family memberships • chromosomal bands, etc. • GO terms • biological pathways

  8. Pathway and Ontology-Based Analysis • Features may have their own ontological structures • GO is structured as a DAG (Directed Acyclic Graph)
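
The DAG structure can be illustrated with a minimal sketch. The term names and parent links below are hypothetical placeholders, not real GO terms; the point is only that a term may have several parents, so collecting its ancestors means walking a graph rather than a single path up a tree.

```python
# Hypothetical mini-ontology: each term maps to its direct parents.
# A term may have more than one parent, which makes this a DAG, not a tree.
PARENTS = {
    "root": [],
    "term_A": ["root"],
    "term_B": ["root"],
    "term_C": ["term_A", "term_B"],  # two parents
    "term_D": ["term_C"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("term_D")))
```

A gene annotated with a specific term is usually also counted under that term's ancestors, which is one reason annotation categories are not independent of each other.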

  9. Pathway and Ontology-Based Analysis • DEGs:

  11. Pathway and Ontology-Based Analysis • DEGs: • Three statistical tests commonly used to obtain DEGs: • t-test • Wilcoxon's rank-sum test • ANOVA • Note: the multiple-hypothesis-testing problem must be properly managed, because one test is performed per gene
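
One common way to manage the multiple-testing problem is the Benjamini-Hochberg false discovery rate adjustment. The slide does not name a specific correction, so this choice is an assumption, and the p-values below are made-up illustration values rather than results from any experiment.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a t-test, rank-sum test, or ANOVA.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, significant)
```

With these inputs, four genes pass an unadjusted 0.05 cutoff but only the first two remain after adjustment.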

  12. Pathway and Ontology-Based Analysis • Co-expression analysis

  13. Pathway and Ontology-Based Analysis • Co-expression analysis • groups similar expression profiles together and keeps dissimilar ones apart • returns genes that are assumed to be co-regulated • Clustering algorithms: • hierarchical-tree clustering • partitional clustering
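
A minimal sketch of hierarchical (agglomerative) clustering follows. It assumes average linkage and a distance of 1 minus the Pearson correlation, neither of which is specified on the slide, and the four expression profiles are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average pairwise distance (1 - correlation) between two clusters.
        return mean(1 - pearson(profiles[a], profiles[b])
                    for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(dist(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical profiles: two rising genes and two falling genes.
profiles = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.1, 6.2, 7.9],
    "gene3": [4.0, 3.0, 2.0, 1.0],
    "gene4": [8.0, 6.1, 3.9, 2.0],
}
clusters = average_linkage(profiles, 2)
print(clusters)
```

The rising and falling genes end up in separate clusters; each cluster is then a candidate gene list for annotation analysis.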

  14. Pathway and Ontology-Based Analysis • Pathways are powerful resources for understanding shared biological processes • E.g.: KEGG, MetaCyc, and BioCarta (signaling pathways)

  15. Pathway and Ontology-Based Analysis • MetaCyc: • an experimentally determined non-redundant metabolic pathway database • It is the largest collection • containing over 1400 metabolic pathways

  16. Pathway and Ontology-Based Analysis • Ontology / GO: • provides a shared understanding of a certain domain of information • controlled vocabularies • GO is organized as DAG structures with 3 vocabularies: • Molecular Function (MF) • Cellular Component (CC) • Biological Process (BP)

  17. Pathway and Ontology-Based Analysis • Other commonly used ontologies and vocabularies: • MIPS: an integrated source of protein properties for a variety of complete genomes • MeSH: clinical terms, including disease names • OMIM (Online Mendelian Inheritance in Man) • UMLS (Unified Medical Language System)

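  18. Pathway and Ontology-Based Analysis • GO enrichment test: • For example, if 20% of the genes in a gene list are annotated with the GO term ‘apoptosis’ • but only 1% of the genes in the whole human genome fall into this functional category • then the term is over-represented (enriched) in the list, and a statistical test assesses whether the excess is larger than expected by chance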

  19. Pathway and Ontology-Based Analysis • Common statistical tests: • Chi-square • binomial • hypergeometric tests

  20. Pathway and Ontology-Based Analysis • Hypergeometric test:
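
The formula on this slide did not survive extraction. The standard one-sided form of the test is given below; the symbols are assumptions chosen here, not the slide's own notation: N is the number of genes in the background, K the number of background genes annotated with the term, n the size of the gene list, and k the number of list genes annotated with the term.

```latex
P(X \ge k) \;=\; \sum_{i=k}^{\min(n,\,K)}
  \frac{\binom{K}{i}\,\binom{N-K}{\,n-i\,}}{\binom{N}{n}}
```

A small value means that observing k or more annotated genes in a list of size n would be unlikely if the list had been drawn at random from the background.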

  21. Pathway and Ontology-Based Analysis • Pitfalls to avoid when using the hypergeometric test: • The choice of background has a substantial impact on the result. Options include: • all genes having at least one GO annotation • all genes ever known in genome databases • all genes on the microarray • GO has a hierarchical tree (or graph) structure, while the hypergeometric test assumes independence of categories
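
The effect of the background choice can be sketched with the upper-tail hypergeometric probability. All counts below are hypothetical, and the number of annotated genes is held fixed across backgrounds for simplicity; in practice it would change with the background as well.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(n, K) + 1))
    return favorable / total


# Hypothetical counts: 50 genes in the list, 5 of them annotated with the
# term, and 100 annotated genes in the background.
list_size, list_hits, annotated = 50, 5, 100

# Two hypothetical background sizes, e.g. all known genes vs. array genes.
p_large_background = hypergeom_upper_tail(list_hits, list_size, annotated, 20000)
p_small_background = hypergeom_upper_tail(list_hits, list_size, annotated, 5000)
print(p_large_background, p_small_background)
```

The same gene list looks far more strongly enriched against the larger background, which is why the background should match the set of genes that could actually have been selected.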

  22. Pathway and Ontology-Based Analysis • Common tools: • DAVID • ArrayXPath • Pathway Miner • EASE • GOFish • GOTree, etc.

  23. Gene Set-Wise Differential Expression Analysis

  25. Gene Set-Wise Differential Expression Analysis • Evaluates coordinated differential expression of gene groups • Gene Set Enrichment Analysis (GSEA): • the first method developed in this category • evaluates, for each pre-defined gene set, whether it is significantly associated with phenotypic classes

  26. Gene Set-Wise Differential Expression Analysis • Difference between FAA and GSEA: • FAA: starts from a gene list of interest and finds over-represented GO terms in it • GSEA: starts from pre-defined gene sets and tests whether their expression changes across conditions

  27. Gene Set-Wise Differential Expression Analysis • Advantages of gene set-wise differential expression analysis: • identifies modest but coordinated changes in gene expression that conventional ‘individual gene-wise’ differential expression analysis might miss (many small expression changes can collectively produce a large effect) • straightforward biological interpretation, because the gene sets are defined by biological knowledge

  28. Gene Set-Wise Differential Expression Analysis • The Enrichment Score (ES) is calculated by evaluating the fraction of genes in S ("hits"), weighted by their correlation, and the fraction of genes not in S ("misses") present up to a given position i in the ranked gene list L, where the N genes are ordered according to the correlation.
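
The running-sum calculation described above can be sketched as follows. The sketch assumes that ES is taken as the maximum deviation from zero of the hit fraction minus the miss fraction, and that each hit is weighted by |r|^p; the function name, the parameter p, and the toy data are illustrative assumptions. Significance estimation by permutation is not shown.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes: genes ordered by correlation with the phenotype.
    correlations: the matching correlation values, in the same order.
    gene_set:     the pre-defined gene set S.
    p:            exponent; each hit contributes |r| ** p (p=0 is unweighted).
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene_set must contain some, but not all, genes")
    hit_norm = sum(abs(r) ** p for r, hit in zip(correlations, in_set) if hit)

    running, best = 0.0, 0.0
    for r, hit in zip(correlations, in_set):
        if hit:
            running += abs(r) ** p / hit_norm   # step up on a hit
        else:
            running -= 1.0 / n_miss             # step down on a miss
        if abs(running) > abs(best):
            best = running                      # maximum deviation from zero
    return best


# Toy ranked list (hypothetical genes and correlations).
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
corr = [0.9, 0.7, 0.3, -0.1, -0.5, -0.8]
es_top = enrichment_score(ranked, corr, {"g1", "g2"})
es_bottom = enrichment_score(ranked, corr, {"g5", "g6"})
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking yields a positive score and a set concentrated at the bottom a negative one.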

  29. Gene Set-Wise Differential Expression Analysis • Typical gene sets: • regulatory-motif • function-related • disease-related sets • Database: • MSigDB: • 6769 gene sets • classified into five different collections • Has some interesting extensions

  30. Differential Co-Expression Analysis

  32. Differential Co-Expression Analysis • Co-expression analysis: • determines the degree of co-expression of a cluster of genes under a certain condition • Differential co-expression analysis: • determines the degree of co-expression difference of a gene pair or a gene cluster across different conditions

  33. Differential Co-Expression Analysis • 3 major types: • (a) differential co-expression of gene cluster(s) • (b) gene pair-wise differential co-expression • (c) differential co-expression of paired gene sets

  34. Differential Co-Expression Analysis • Type (a): identify gene cluster(s) that are differentially co-expressed between two conditions • Let conditions and genes be denoted by J and I, respectively. The mean squared residual of the model is a measure of the co-expression of the genes.
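
The formula on this slide was lost in extraction. A minimal sketch of a mean squared residual follows, assuming the usual additive (gene effect plus condition effect) model from biclustering, in which the residual of entry a_ij is a_ij − a_iJ − a_Ij + a_IJ (row mean, column mean, and overall mean). Whether the slide used exactly this normalization is an assumption, and the matrices are toy data.

```python
def mean_squared_residual(matrix):
    """matrix[i][j]: expression of gene i (in I) under condition j (in J)."""
    n_genes, n_conds = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_conds for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_genes)) / n_genes
                 for j in range(n_conds)]
    overall = sum(row_means) / n_genes

    total = 0.0
    for i in range(n_genes):
        for j in range(n_conds):
            # Part of a_ij not explained by gene and condition effects.
            residual = matrix[i][j] - row_means[i] - col_means[j] + overall
            total += residual ** 2
    return total / (n_genes * n_conds)


# Toy data: genes that move in parallel vs. genes that move in opposition.
coherent = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
opposed = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
print(mean_squared_residual(coherent), mean_squared_residual(opposed))
```

A score near zero indicates tight co-expression under that condition; comparing the score of the same gene cluster between two conditions gives a measure of differential co-expression.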

  35. Differential Co-Expression Analysis Type (a) cont.

  36. Differential Co-Expression Analysis • Type (b)

  37. Differential Co-Expression Analysis • Type (b), identify differentially co-expressed gene pairs • Techniques: • F-statistic • A meta-analytic approach

  38. Differential Co-Expression Analysis • Note that the identification of differentially co-expressed gene clusters or gene pairs usually does not use pre-defined gene sets or pairs. • Thus, the interpretation of the results may be improved by ontology and pathway-based annotation analysis.

  39. Differential Co-Expression Analysis • Type (c): the dCoxS (differential co-expression of gene sets) algorithm identifies gene set pairs that are differentially co-expressed across different conditions • Biological pathways can be used as pre-defined gene sets, and the differential co-expression of biological pathway pairs between conditions is analyzed.

  40. Differential Co-Expression Analysis • Type (c) cont. • To measure the expression similarity between paired gene sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between the sample-wise entropies. • Because IS uses only sample-wise distances, it can always be computed, even when the two pathways contain different numbers of genes.
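
The idea of correlating sample-wise distances can be sketched as below. This is not the dCoxS implementation: plain Euclidean distance between samples is swapped in for the entropy-based measure, purely to show why the two gene sets may differ in size. The function names and expression values are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def sample_distances(gene_set_expr):
    """gene_set_expr[g][s]: expression of gene g in sample s.

    Returns the distances between every pair of samples in a fixed order.
    Euclidean distance is a stand-in for the entropy-based measure.
    """
    n_samples = len(gene_set_expr[0])
    samples = [[row[s] for row in gene_set_expr] for s in range(n_samples)]
    return [dist(a, b) for a, b in combinations(samples, 2)]


def interaction_score(set_a, set_b):
    """Correlation between the sample-wise distance vectors of two sets."""
    return pearson(sample_distances(set_a), sample_distances(set_b))


# Hypothetical data: four samples; the sets hold 2, 2, and 3 genes.
set_a = [[1.0, 2.0, 4.0, 7.0], [0.0, 1.0, 3.0, 3.0]]
set_b = [[2.0, 4.0, 8.0, 14.0], [0.0, 2.0, 6.0, 6.0]]   # set_a scaled by 2
set_c = [[5.0, 1.0, 2.0, 8.0], [3.0, 3.0, 0.0, 1.0], [2.0, 9.0, 4.0, 4.0]]
print(interaction_score(set_a, set_b), interaction_score(set_a, set_c))
```

Each gene set is reduced to one distance per sample pair, so both vectors have the same length regardless of how many genes each set contains. Computing the score separately under each condition and comparing the two values is the step that would indicate differential co-expression; that comparison is not shown here.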

  41. Differential Co-Expression Analysis • Type (c) cont.

  42. Biological Interpretation and Biological Semantics

  44. Biological Interpretation and Biological Semantics • Biomedical semantics provides rich descriptions of biomedical domain knowledge. • Motivation for biological semantics: • GO-based analysis has limitations: • the result is typically a long, unordered list of annotations • most analysis tools evaluate only one cluster at a time • the massive annotation lists are time-consuming to read • they are hard to assemble manually • many annotations are redundant

  45. Biological Interpretation and Biological Semantics • Introducing BioLattice: • a mathematical framework • based on concept lattice analysis • organize traditional clusters and associated annotations into a lattice of concepts • A graphical summary • considers gene expression clusters as objects and annotations as attributes • Thus, complex relations among clusters and annotations are clarified, ordered and visualized.
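
The object/attribute view can be made concrete with a small formal concept analysis sketch. This is not the BioLattice software: the cluster names and annotations are hypothetical, and the brute-force enumeration over all subsets of clusters is only practical for tiny examples.

```python
from itertools import combinations

# Hypothetical context: clusters are objects, annotations are attributes.
CONTEXT = {
    "cluster1": {"apoptosis", "cell cycle"},
    "cluster2": {"apoptosis"},
    "cluster3": {"cell cycle", "DNA repair"},
}


def objects_having(attributes, context):
    """All clusters whose annotations include every given attribute."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate (extent, intent) pairs: maximal cluster/annotation blocks."""
    all_attributes = set().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            if subset:
                # Annotations shared by every cluster in the subset.
                intent = set.intersection(*(context[o] for o in subset))
            else:
                intent = all_attributes
            # Every cluster that carries all of those annotations.
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(CONTEXT)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Each concept pairs a set of clusters with exactly the annotations they all share. Ordering the concepts by inclusion of their cluster sets gives the lattice, which is what turns a long, flat annotation list into an ordered structure.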

  46. Biological Interpretation and Biological Semantics • Another advantage of BioLattice is that heterogeneous biological knowledge resources can be added
