A High-Resolution Map of Human Evolutionary Constraint Using 29 Mammals. Kerstin Lindblad-Toh et al., 2011. Jimmy Ao, Gordon Man.
Intro • ~1.5% of the human genome encodes protein sequences • Previous comparative analyses with the mouse, rat, and dog genomes showed that at least 5% is under purifying selection • Purifying selection removes deleterious alleles • Previous studies could estimate the percentage of the genome under constraint • But they had difficulty locating the constrained elements themselves
Intro • Sequence mammalian genomes to identify functional elements in the human genome • Used 29 eutherian (placental) genomes • Chosen to maximize divergence across the four major mammalian clades • Identified 4.2% of the human genome as constrained • ~60% of these bases have a candidate function
Figure 1. A phylogenetic tree of all 29 mammals used in this analysis, based on the substitution rates in the MultiZ alignments. Species: blue = finished genomes; green = high-quality drafts; black = 2x assemblies. Branches: red = ≥10 substitutions/100 bp; blue = <10 substitutions/100 bp.
Detection of Constrained Sequences • Compared genome-wide conservation with that of ancestral repeats using multiple statistical tests • SiPhy-ω statistic measures substitution rates • SiPhy-π is like ω but also measures biased substitution patterns • PhastCons statistics • All statistical tests showed similar results • Estimated that 5.36% of the human genome is under evolutionary constraint • Ensembl and PAML • SiPhy-π detected an additional 1.3% of the human genome in constrained elements
Figure S7. Biased nucleotide substitution patterns identify positions where two bases appear equally constrained; these patterns correlate with SNPs in the human population.
Constraint within the human population • Mammalian constraint correlated with constraint in the human population • Mammalian constrained elements had a lower density of single nucleotide polymorphisms (SNPs) • More strongly constrained elements had even lower SNP density • Biased substitution patterns seen across mammals were similar to the ones observed in human SNPs • Same allele preference in both mammalian and human evolution
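The SNP comparison on this slide can be illustrated with a small sketch. This is not the authors' pipeline; the function name `snp_density`, the half-open interval convention, and all coordinates below are made up for illustration, and the intervals are assumed not to overlap.

```python
def snp_density(snp_positions, intervals):
    """Return SNPs per base inside a set of half-open intervals [start, end).

    Assumes the intervals do not overlap each other (illustrative only).
    """
    total_bases = sum(end - start for start, end in intervals)
    if total_bases <= 0:
        raise ValueError("intervals must cover at least one base")
    hits = sum(
        1
        for pos in snp_positions
        if any(start <= pos < end for start, end in intervals)
    )
    return hits / total_bases


# Toy data (hypothetical coordinates on a 100 bp region).
snps = [3, 12, 25, 33, 47, 70, 88, 95]
constrained = [(10, 20), (50, 60)]
background = [(0, 10), (20, 50), (60, 100)]

constrained_density = snp_density(snps, constrained)
background_density = snp_density(snps, background)
print(constrained_density, background_density)
```

In this toy region the constrained intervals carry fewer SNPs per base than the background, which is the direction of the effect the slide describes.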
RNA Structures and Families of Structural Elements • Used evolutionary signatures characteristic of conserved RNA secondary structures to reveal 37,381 candidate structural elements • Found using EvoFold and RNAz 2.0
Conservation Patterns in Promoters • Categorized promoters into three categories • High constraint promoters • ~66% high constraint promoters associated with CpG islands • Involved with development • Low constraint • ~41% associated with CpG islands • Immunity, reproduction and perception • Intermittent constraint • Basic cellular functions • ~66% intermittent constraint promoters associated with CpG islands
Identifying Motifs • More genomes did not help discover new motifs • Rat, mouse, and dog genomes had already found the majority of motifs • The additional genomes were better at detecting individual motif instances and predicting their target sites • Created a network linking 375 motifs to predicted targets • Median of 21 predicted regulators per target gene • Compared results to chromatin immunoprecipitation (ChIP) experiments • ChIP determines whether specific proteins are associated with specific genomic regions • Strong agreement between motif-based targets and ChIP
Codon-Specific Selection • The ratio dN/dS of non-synonymous to synonymous codon substitution rates is used as evidence of positive selection (>1) or negative selection (<1) • Positively selected genes are involved in immune response and taste perception • Also unexpected functions such as meiotic chromosome segregation and DNA-dependent regulation of transcription • Localized positive selection was enriched in core biochemical processes, including microtubule-based movement, DNA topological change and telomere maintenance
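A minimal sketch of the dN/dS idea, assuming raw substitution and site counts are already available. It applies no correction for multiple hits and is not the codon model the paper used; the function names and the `tolerance` band around 1 are hypothetical choices for illustration.

```python
def dn_ds_ratio(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Return dN/dS from raw counts (no multiple-hit correction)."""
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_subs / nonsyn_sites  # non-synonymous substitutions per site
    ds = syn_subs / syn_sites        # synonymous substitutions per site
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds


def classify_selection(ratio, tolerance=0.05):
    """Label a dN/dS ratio; `tolerance` is an arbitrary illustrative band."""
    if ratio > 1 + tolerance:
        return "positive"
    if ratio < 1 - tolerance:
        return "negative"
    return "neutral"


# Hypothetical gene: 5 non-synonymous changes over 300 sites,
# 20 synonymous changes over 100 sites.
ratio = dn_ds_ratio(5, 300, 20, 100)
print(round(ratio, 3), classify_selection(ratio))
```

Most protein-coding genes fall well below 1, so a ratio above 1 at a gene or codon is the unusual signal the slide refers to.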
Mobile Elements • DNA that can move around the genome • Exaptation = a retained element or trait takes on a new function (e.g. feathers) • Found over 280,000 mobile element exaptations common to mammalian genomes • Compared with ~10,000 previously recognized cases • Often only a small fraction (median ~11%) of each mobile element is constrained • Recent exaptations are generally found near ancestral regulatory elements
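The "median fraction constrained" figure can be read as an interval-overlap calculation. The sketch below is only an illustration of that reading, not the authors' method; the function name `covered_fraction` and every coordinate are invented, intervals are half-open [start, end), and the constrained intervals are assumed not to overlap one another.

```python
from statistics import median


def covered_fraction(element, constrained):
    """Fraction of one element's bases that fall inside constrained intervals.

    `element` is a (start, end) pair; `constrained` is a list of such pairs,
    assumed non-overlapping (illustrative only).
    """
    start, end = element
    if end <= start:
        raise ValueError("element must have positive length")
    covered = 0
    for c_start, c_end in constrained:
        overlap = min(end, c_end) - max(start, c_start)
        if overlap > 0:
            covered += overlap
    return covered / (end - start)


# Hypothetical mobile elements and constrained intervals.
mobile_elements = [(0, 100), (200, 400), (500, 600)]
constrained = [(90, 110), (250, 270), (300, 310), (580, 700)]

fractions = [covered_fraction(e, constrained) for e in mobile_elements]
print(fractions, median(fractions))
```

Taking the median rather than the mean keeps a few almost fully constrained elements from dominating the summary.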
Applications • Genome 10K effort • Genomic database of 10,000 vertebrates • New model organisms • Molecular evolutionary genetics • Encyclopedia of DNA Elements (ENCODE) • International effort to build a database of functional elements in the human genome • NIH Roadmap Epigenomics Project • Human epigenomic data • Human biology, health, and disease • Overlap with known disease elements
Summary • Using a wide array of bioinformatics methods, they compared 29 mammalian genomes • ~5.5% of the human genome has undergone purifying selection • Purifying selection decreases the frequency of deleterious alleles • Found constrained elements that make up 4.2% of the genome • Suggest possible functions for ~60% of constrained bases • New coding exons, stop codons, >10,000 regions of overlapping synonymous constraint within protein-coding exons • 220 candidate RNA structural families • Nearly a million elements overlapping potential promoter, enhancer and insulator regions • >280,000 mobile element exaptations • 1,000 human and primate accelerated elements • Highly conserved species-specific regions