Manolis Kellis: Research synopsis
• Why biology in a computer science group?
• Big biological questions:
  • Interpreting the human genome
  • Revealing the logic of gene regulation
  • Principles of evolutionary change
• Underlying computational techniques:
  • Comparative genomics: evolutionary signatures
  • Regulatory genomics: motifs, networks, models
  • Epigenomics: chromatin states, dynamics, disease
  • Phylogenomics: evolution at the genome scale
• Defining characteristics of research program:
  • Genome-wide rules, exploit nature of problems, interdisciplinary collaborations, biology impact
[Slide annotations: brief overview, 1 slide each; vignette]
(1) Comparative genomics: evolutionary signatures
• Protein-coding signatures
  • 1000s new coding exons
  • Translational readthrough
  • Overlapping constraints
• Non-coding RNA signatures
  • Novel structural families
  • Targeting, editing, stability
  • Structures in coding exons
• microRNA signatures
  • Novel/expanded miR families
  • miR/miR* arm cooperation
  • Sense/anti-sense switches
• Regulatory motif signatures
  • Systematic motif discovery
  • Regulatory motif instances
  • TF/miRNA target networks
  • Single binding-site resolution
(2) Regulatory genomics: circuits, predictive models
• ENCODE/modENCODE
  • 4-year effort, dozens of experimental labs
  • Integrative analysis
  • Systematic genome annotation
  • Flagship NIH project
  • Initial annotation of the non-coding genome, from 20% to 70%
  • Systems biology for an animal genome for the first time possible
  • Students and postdocs are co-first authors, leadership roles
• Predictive models of gene regulation
  • Infer networks
  • Predict function
  • Predict regulators
  • Predict gene expression
(3) Phylogenomics: Bayesian gene-tree reconstruction
• Generative model: two components of gene evolution
  • 1. Family rate: Fj ~ gamma(α, β)
  • 2. Species-specific rates: Si ~ normal(μi, σi)
• Bayesian formulation over length l, topology T, reconciliation R, given alignment data D and species-level parameters θ
  • Sequence likelihood: HKY model (traditional)
  • Branch length prior: learned Fj, Si distributions
  • Topology prior: birth-death process
• New phylogenomic pipeline
[Figure annotations: selective pressures on gene function; population dynamics of the species]
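The three terms named on this slide read naturally as a factorization of a posterior over branch lengths, topology, and reconciliation. The block below is a sketch of that reading, not a formula taken from the slide: the exact conditioning of each term is an assumption, with l the branch lengths, T the topology, R the reconciliation, D the alignment data, and θ the species-level parameters.

```latex
% Assumed factorization; the slide names the three terms but not their arguments.
P(l, T, R \mid D, \theta) \;\propto\;
  \underbrace{P(D \mid T, l)}_{\text{sequence likelihood (HKY)}}
  \;\cdot\;
  \underbrace{P(l \mid T, R, \theta)}_{\text{branch length prior}}
  \;\cdot\;
  \underbrace{P(T, R \mid \theta)}_{\text{topology prior (birth-death)}}

% Rate components as stated on the slide.
F_j \sim \mathrm{Gamma}(\alpha, \beta), \qquad
S_i \sim \mathrm{Normal}(\mu_i, \sigma_i)
```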
Vignette: Epigenomics
Jason Ernst, Pouya Kheradpour
• Ernst and Kellis, Nature Biotech, 2010
• Ernst, Kheradpour et al., Nature, 2011 (in press)
Epigenomics and ‘chromatin state’ signatures
• Learn de novo combinations of chromatin marks
• Reveal functional elements
• Use for genome annotation
• Use for studying dynamics across many cell types
[Figure: DNA, histone tails, and chromatin ‘marks’, with example promoter, transcribed, active intergenic, and repressed states]
ChromHMM: learning ‘hidden’ chromatin states
• Observed chromatin marks, called based on a Poisson distribution, in 200bp intervals
• Most likely hidden state inferred for each interval
• Each state: vector of emissions, vector of transitions
• All probabilities are learned de novo from chromatin data alone (Baum-Welch, a.k.a. EM)
[Figure: a DNA region spanning an enhancer, a transcription start site, and a transcribed region; observed marks per interval (K4me1, K4me3, K27ac, K36me3); most likely hidden state path 5 2 1 3 5 5 6 6 6 6 4 6; states 1 to 6 each listed with their high-probability marks, at probabilities between 0.7 and 0.9]
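The two steps named on this slide, calling marks against a Poisson background and finding the most likely hidden state per interval, can be sketched in a few lines. This is a minimal illustration and not the ChromHMM implementation: the function names, the `alpha` cutoff, the independent-Bernoulli emission model, and all toy probabilities are assumptions, and states are numbered from 0 rather than 1. Learning the probabilities (Baum-Welch) is not shown.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def binarize(counts, lam, alpha=1e-4):
    """Call a mark present in a bin when its read count is unlikely under a
    Poisson background with mean lam. alpha is an assumed cutoff."""
    return [1 if poisson_sf(c, lam) < alpha else 0 for c in counts]

def viterbi(obs, init, trans, emit):
    """Most likely state path for a sequence of binary mark vectors.

    obs[t][m]   -- 1 if mark m was called in bin t, else 0
    init[s]     -- P(first state is s)
    trans[s][r] -- P(next state is r | current state is s)
    emit[s][m]  -- P(mark m present | state s), marks treated as independent
    All probabilities must lie strictly between 0 and 1.
    """
    def log_emit(s, o):
        return sum(math.log(p if x else 1.0 - p) for p, x in zip(emit[s], o))

    n = len(init)
    score = [math.log(init[s]) + log_emit(s, obs[0]) for s in range(n)]
    back = []  # back[t][r] = best predecessor of state r at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for r in range(n):
            best = max(range(n), key=lambda s: prev[s] + math.log(trans[s][r]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][r]) + log_emit(r, o))
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Toy example: two marks (K4me3, K36me3) and two hypothetical states,
# 0 = promoter-like, 1 = transcribed-like.
emit = [[0.9, 0.1], [0.1, 0.9]]
trans = [[0.8, 0.2], [0.2, 0.8]]
init = [0.5, 0.5]
obs = [[1, 0], [1, 0], [0, 1], [0, 1]]
path = viterbi(obs, init, trans, emit)  # [0, 0, 1, 1]
```

Working in log space keeps the scores from underflowing on long chromosomes, which matters once the sequence covers millions of 200bp bins.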
Chromatin state dynamics across nine cell types
• State definitions are cell-type invariant
  • Same combinations consistently found
• State locations are cell-type specific
  • Can study pair-wise or multi-way changes
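Because the state definitions are shared across cell types, a pair-wise comparison reduces to comparing state labels bin by bin. The sketch below assumes two state tracks over the same genomic bins; the function name and the toy labels are hypothetical.

```python
from collections import Counter

def state_changes(states_a, states_b):
    """Count (state in cell type A, state in cell type B) pairs over aligned
    genomic bins, keeping only bins whose state differs."""
    if len(states_a) != len(states_b):
        raise ValueError("state tracks must cover the same bins")
    return Counter((a, b) for a, b in zip(states_a, states_b) if a != b)

# Hypothetical tracks for two cell types over four bins.
cell_a = ["enhancer", "promoter", "repressed", "enhancer"]
cell_b = ["repressed", "promoter", "repressed", "transcribed"]
changes = state_changes(cell_a, cell_b)
```

A multi-way comparison would follow the same pattern with tuples of more than two labels per bin.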
Multi-cell activity profiles and their correlations
• Chromatin state & gene expression link enhancers and target genes
• TF motif enrichment & TF expression reveal activators / repressors
[Figure panels: gene expression (on vs. off); chromatin states (active enhancer vs. repressed); active TF motif enrichment (motif enrichment vs. motif depletion); TF regulator expression (TF on vs. TF off); dip-aligned motif biases (motif aligned vs. flat profile)]
Coordinated activity reveals enhancer links
• Enhancer networks: regulator → enhancer → target gene
• Ex1: Oct4 predicted activator of embryonic stem (ES) cells
• Ex2: Ets activator of GM/HUVEC (but not either one alone)
[Figure: predicted regulators, enhancer activity, gene activity; activity signatures for each TF]
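One simple way to read "coordinated activity" is as correlation between an enhancer's activity profile and a gene's expression profile across cell types. The sketch below uses Pearson correlation with a fixed cutoff; the slide does not specify the linking method, so the function names, the `min_r` threshold, and the toy profiles are all assumptions. A realistic version would also restrict candidate pairs, for example by genomic distance.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def link_enhancers(enhancer_activity, gene_expression, min_r=0.8):
    """Return (enhancer, gene, r) for pairs whose profiles across cell types
    correlate at or above min_r. min_r is an assumed cutoff."""
    links = []
    for enh, act in enhancer_activity.items():
        for gene, expr in gene_expression.items():
            r = pearson(act, expr)
            if r >= min_r:
                links.append((enh, gene, r))
    return links

# Hypothetical profiles over five cell types.
enhancers = {"e1": [1, 0, 0, 1, 1]}
genes = {"geneA": [5, 1, 1, 6, 5], "geneB": [1, 5, 5, 1, 1]}
links = link_enhancers(enhancers, genes)  # e1 links to geneA only
```

The same correlation, computed between a TF's expression and the enrichment of its motif in active enhancers, would separate candidate activators (positive) from candidate repressors (negative).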
Revisiting disease-associated variants
• Disease-associated SNPs enriched for enhancers in relevant cell types
• E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator
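An enrichment claim of this kind is usually backed by comparing the fraction of SNPs that fall in enhancers with the fraction of the genome those enhancers cover. The sketch below shows one such calculation with a binomial tail probability; the slide does not say which test was used, and the function names and all numbers are hypothetical.

```python
import math

def fold_enrichment(hits, n_snps, enhancer_bp, genome_bp):
    """Observed fraction of SNPs in enhancers divided by the fraction of the
    genome that enhancers cover."""
    observed = hits / n_snps
    expected = enhancer_bp / genome_bp
    return observed / expected

def binomial_tail(hits, n_snps, p):
    """P(X >= hits) for X ~ Binomial(n_snps, p), where p is the chance that a
    randomly placed SNP lands in an enhancer."""
    return sum(
        math.comb(n_snps, k) * p**k * (1 - p) ** (n_snps - k)
        for k in range(hits, n_snps + 1)
    )

# Hypothetical numbers: 10 of 100 SNPs in enhancers covering 2% of the genome.
fold = fold_enrichment(hits=10, n_snps=100, enhancer_bp=2, genome_bp=100)
pval = binomial_tail(hits=10, n_snps=100, p=0.02)
```

Repeating the calculation per cell type is what allows a statement like "enriched in relevant cell types": the same SNP set is scored against each cell type's enhancer annotation.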
Contributions
We aim to further our understanding of the human genome by computational integration of large-scale functional and comparative genomics datasets.
• We use comparative genomics of multiple related species to recognize evolutionary signatures of protein-coding genes, RNA structures, microRNAs, regulatory motifs, and individual regulatory elements.
• We use combinations of epigenetic modifications to define chromatin states associated with distinct functions, including promoter, enhancer, transcribed, and repressed regions, each with distinct functional properties.
• We develop phylogenomic methods to study differences between species and to uncover evolutionary mechanisms for the emergence of new gene functions.
Our methods have led to numerous new insights on diverse regulatory mechanisms, uncovered evolutionary principles, and provided mechanistic insights for previously uncharacterized disease-associated SNPs.
[Figure: publication thumbnails labeled by venue: Science, Nature, Nature Biotech, Nature Gen, PLoS Genetics, Genes & Development, MBE, Genome Research, WBpress, PLoS Comp. Bio., PNAS, BioChem, BMC Evo. Bio., ACM TKDD, RECOMB, J. Comp. Bio.; one labeled "In review"]