The tangled genome

Presentation Transcript


  1. The tangled genome Gil McVean

  2. The real heroes

  3. PanMap – Genome sequencing of 10 Western chimpanzees • Patterns of small insertions and deletions are quite different and reveal details of DNA repair pathways • Patterns of recombination in humans and chimpanzees are highly diverged at fine scales, but largely conserved at broad scales • There are a surprising number (6+ now ‘confirmed’) of trans-specific polymorphisms, probably maintained through host-pathogen interactions

  4. A tangle of sequence

  5. Difficulties of working with an incomplete reference

  6. Using de novo assembly to find variants

  7. Entire population

  8. Sample 1

  9. Sample 2

  10. Chromosome 1

  11. Using Cortex leads to a high quality set of variants
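Cortex finds variants by de novo assembly on a colored de Bruijn graph, in which the k-mers of each sample (and of the reference) carry their own color; slides 7 to 10 appear to step through such a graph for a population, two samples and a chromosome. The Python sketch below shows only the core idea, under simplifying assumptions: k-mers are indexed on the forward strand only, and a candidate variant is signalled by k-mers present in a sample but absent from the reference. The function names and the `"reference"` color label are hypothetical and are not Cortex's interface.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer (length-k substring) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_kmers(samples, k):
    """Map each k-mer to the set of colors (sample names) that contain it.

    samples: dict mapping a color name to a list of sequences (reads or contigs).
    Forward strand only; real assemblers usually also merge each k-mer with
    its reverse complement.
    """
    colors = defaultdict(set)
    for name, seqs in samples.items():
        for seq in seqs:
            for km in kmers(seq, k):
                colors[km].add(name)
    return colors


def novel_kmers(colors, sample, reference="reference"):
    """Sorted k-mers seen in `sample` but not in the reference color.

    Around a SNP these form the non-reference branch of a 'bubble'.
    """
    return sorted(km for km, cs in colors.items()
                  if sample in cs and reference not in cs)


# Toy example: a single C->A substitution at 0-based position 3.
graph = build_colored_kmers({"reference": ["AACCGGTT"], "sample1": ["AACAGGTT"]}, k=3)
print(novel_kmers(graph, "sample1"))  # ['ACA', 'AGG', 'CAG']
```

A SNP away from the sequence ends changes k overlapping k-mers, which is why the toy substitution yields three novel 3-mers. A real caller would go on to walk those k-mers into a path and compare it with the reference branch of the bubble.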

  12. Diversity in Western chimpanzees • Diversity similar to that of humans of European origin (0.06%-0.08%) • Excess of common variants • 1% of variants shared with humans
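Diversity figures such as 0.06%-0.08% are commonly reported as nucleotide diversity (π): the average number of differences per base between two randomly chosen haplotypes. Assuming that is the statistic on the slide (the talk does not name the estimator), a minimal Python sketch on aligned toy haplotypes is:

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average pairwise differences per site (nucleotide diversity, pi).

    haplotypes: aligned, equal-length sequences with no gaps or missing data.
    """
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")
    pairs = list(combinations(haplotypes, 2))
    # count mismatching positions for every pair of haplotypes
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / (len(pairs) * n_sites)


toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACGTAA"]
print(nucleotide_diversity(toy))  # 0.1 (6 differences over 6 pairs x 10 sites)
```

On this scale, π of 0.0006 to 0.0008 means two haplotypes differ at roughly one base in every 1,250 to 1,700. Real data also require handling of missing calls and of how many sites are actually accessible, which this sketch ignores.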

  13. Non-slippage indels are strongly biased towards deletions • 13:1 ratio of deletions to insertions • Unexpected peak at 4 bp
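The talk does not define ‘non-slippage’ precisely. One simple working assumption is that slippage indels add or remove a copy of an adjacent repeat unit, and everything else counts as non-slippage. Under that assumption, the deletion:insertion ratio and the deletion-size spectrum (where a 4 bp peak would show up) could be tallied as below; the `Indel` record, the heuristic and the toy events are illustrative, not PanMap data.

```python
from collections import Counter
from typing import NamedTuple


class Indel(NamedTuple):
    kind: str   # "ins" or "del"
    left: str   # reference bases immediately 5' of the event
    seq: str    # inserted or deleted bases
    right: str  # reference bases immediately 3' of the event


def is_slippage(ev):
    """Heuristic: the event adds or removes one copy of an adjacent repeat unit."""
    return ev.right.startswith(ev.seq) or ev.left.endswith(ev.seq)


def summarise_non_slippage(events):
    """Counts by kind, deletion-size spectrum and deletion:insertion ratio."""
    kept = [ev for ev in events if not is_slippage(ev)]
    kinds = Counter(ev.kind for ev in kept)
    del_sizes = Counter(len(ev.seq) for ev in kept if ev.kind == "del")
    ratio = kinds["del"] / kinds["ins"] if kinds["ins"] else float("inf")
    return kinds, del_sizes, ratio


toy = [
    Indel("del", "TGAC", "GAAC", "TTAT"),  # non-slippage 4 bp deletion
    Indel("del", "CCAT", "AT", "ATAT"),    # AT-repeat contraction: slippage
    Indel("ins", "GGCA", "T", "CCGA"),     # non-slippage 1 bp insertion
    Indel("del", "ACGT", "TTAG", "CAGT"),  # non-slippage 4 bp deletion
]
kinds, del_sizes, ratio = summarise_non_slippage(toy)
print(kinds["del"], kinds["ins"], ratio)  # 2 1 2.0
```

Checking a single adjacent copy is a deliberate simplification; an annotation of tandem repeats would separate slippage from non-slippage events more robustly.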

  14. Indels as indicators of DNA repair processes [Figure: separate panels for insertions and deletions, plotting longest word agreement against indel size]

  15. [Diagram: the duplex TGACGAACTTAT / ACTGCTTGAATA loses GAAC, leaving TGACTTAT]
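The diagram's sequences can be checked directly: the lower strand is the base-by-base complement of TGACGAACTTAT, and removing the 4 bp GAAC leaves TGACTTAT. A widely used repair signature is flanking microhomology, meaning bases at the edge of the deleted segment that are also present in the adjacent flank. Whether slide 14's ‘longest word agreement’ measures exactly this is not stated, so the helper below, with hypothetical names, is an illustration under that assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement (not reversed), as the two strands are drawn."""
    return seq.translate(COMPLEMENT)


def delete(seq, start, length):
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def flanking_microhomology(seq, start, length):
    """Longest stretch at either end of the deleted segment that is repeated
    in the adjacent flank (0 means no microhomology)."""
    deleted = seq[start:start + length]
    left, right = seq[:start], seq[start + length:]
    best = 0
    for m in range(1, length + 1):
        # prefix of the deletion repeated 3' of it, or suffix repeated 5' of it
        if right.startswith(deleted[:m]) or left.endswith(deleted[-m:]):
            best = m
    return best


top = "TGACGAACTTAT"
print(complement(top))                    # ACTGCTTGAATA
print(delete(top, 4, 4))                  # TGACTTAT
print(flanking_microhomology(top, 4, 4))  # 2 (the "AC" shared with the left flank)
```

Because of the 2 bp AC microhomology, deleting ACGA at index 2 (`delete(top, 2, 4)`) gives the same product, so the exact breakpoint of such a deletion cannot be placed uniquely.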

  16. A tangle of trees

  17. Myers et al. 2005

  18. The zinc-finger protein PRDM9 determines hotspot location (Myers et al. 2010)

  19. PRDM9 zinc fingers are radically different between humans and chimps • Perhaps the most diverged gene between humans and chimpanzees • Repeatedly hit by adaptive evolution across mammals • Only known ‘speciation gene’ in mammals • Polymorphic in humans, leading to variation in hotspots and genome instability

  20. Questions • We know from previous work in a few regions that hotspot locations tend not to be shared between humans and chimpanzees • Calculations suggested that only 40% of human hotspots were driven by PRDM9 binding • But... • Is there any hotspot sharing? • Do we see conservation of recombination rates at any scale? • What features determine hotspot location in chimpanzees?

  21. The first genome-wide fine-scale map of recombination for a non-reference organism (Auton et al. 2012)

  22. Chimpanzee recombination is dominated by hotspots in a manner similar to humans

  23. But the hotspots are not in the same locations
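The slides do not say how hotspots were called or how sharing was measured. As a hedged sketch, assume recombination rates in fixed-width bins, call a bin a hotspot when it is several-fold above the median of its neighbours, and count a human hotspot as shared when a chimpanzee hotspot lies within some distance. The thresholds (`flank`, `fold`, `max_dist`), the function names and the toy rates are all hypothetical.

```python
from bisect import bisect_left
from statistics import median


def call_hotspots(positions, rates, flank=2, fold=5.0):
    """Hypothetical caller: a bin is a hotspot if its rate is positive and at
    least `fold` times the median rate of up to `flank` bins on each side."""
    hotspots = []
    for i, r in enumerate(rates):
        background = rates[max(0, i - flank):i] + rates[i + 1:i + 1 + flank]
        if background and r > 0 and r >= fold * median(background):
            hotspots.append(positions[i])
    return hotspots


def fraction_shared(hot_a, hot_b, max_dist):
    """Fraction of hotspots in hot_a lying within max_dist bp of one in hot_b."""
    if not hot_a:
        return 0.0
    hot_b = sorted(hot_b)
    shared = 0
    for p in hot_a:
        j = bisect_left(hot_b, p)
        # the nearest hotspot in hot_b is either just before or just after p
        neighbours = [hot_b[k] for k in (j - 1, j) if 0 <= k < len(hot_b)]
        if any(abs(p - q) <= max_dist for q in neighbours):
            shared += 1
    return shared / len(hot_a)


positions = list(range(0, 10_000, 1_000))
human = [1, 1, 1, 10, 1, 1, 1, 1, 1, 1]
chimp = [1, 1, 1, 1, 1, 1, 1, 10, 1, 1]
h, c = call_hotspots(positions, human), call_hotspots(positions, chimp)
print(h, c, fraction_shared(h, c, max_dist=2_000))  # [3000] [7000] 0.0
```

The answer from a comparison like this depends strongly on the thresholds and on map resolution, so checking it against a null expectation (for example, randomly shifting one species' hotspots) is a natural safeguard.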

  24. Fine-scale profiles around genes are similar

  25. As is rate variation around CpG islands

  26. Substantial PRDM9 diversity, but overlap in predicted binding sequences

  27. No signal for predicted binding sequences

  28. Similarities at the 1 Mb scale

  29. Human and chimp recombination rates are correlated at the chromosomal scale

  30. Human and chimp recombination rates are only correlated at broad scales
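Slides 28 to 30 contrast correlation at fine and broad scales. A minimal way to see how this pattern can arise: average both species' rate tracks into progressively larger windows and recompute the Pearson correlation. The toy tracks below are invented so that they share a broad trend but place their hotspots in different bins; they are not the PanMap maps, and the function names are hypothetical.

```python
import math


def bin_rates(rates, window):
    """Mean rate in consecutive non-overlapping windows of `window` bins
    (a trailing partial window is dropped)."""
    n = len(rates) // window
    return [sum(rates[i * window:(i + 1) * window]) / window for i in range(n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_scale(rates_a, rates_b, windows):
    """Pearson r between the two tracks after binning at each window size."""
    return {w: pearson(bin_rates(rates_a, w), bin_rates(rates_b, w)) for w in windows}


# Toy tracks: same broad trend (low, then high), hotspots in different bins.
human = [1, 5, 1, 1, 1, 1, 1, 1, 3, 3, 3, 7, 3, 3, 3, 3]
chimp = [1, 1, 1, 1, 1, 5, 1, 1, 3, 3, 3, 3, 3, 7, 3, 3]
for w, r in correlation_by_scale(human, chimp, [1, 4, 8]).items():
    print(w, round(r, 2))  # 1 0.27, then 4 0.6, then 8 1.0
```

Averaging over larger windows smooths away the unshared hotspots and leaves the shared broad trend to dominate. With real maps the windows would be in physical units (for example 1 Mb) rather than bin counts.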

  31. Lower correlation in regions with structural rearrangements • All but one of the inverted regions are pericentric, so a change in position with respect to the centromere does not contribute • Change in proximity to the telomere is important

  32. A natural experiment: chromosomal fusion [Diagram: chimpanzee chromosomes 2a and 2b alongside human chromosome 2, which arose by their fusion]

  33. Fusion region shows 3-fold decrease in recombination rate

  34. Fusion region shows 3-fold decrease in recombination rate

  35. A tangle of histories

  36. Distribution of the sickle allele • Distribution of malaria

  37. How many variants are shared through descent?

  38. Human polymorphism: 9.4 million autosomal and 261,000 X-chromosome SNPs from 1000 Genomes Pilot 1, YRI (59 individuals) • Chimpanzee polymorphism: 3.8 million autosomal and 102,000 X-chromosome SNPs from PanMap, Pan troglodytes verus (10 individuals) • SNPs shared by humans and chimpanzees: 33,906 autosomal and 527 X-chromosome • Reduce artifactual sharing due to known or cryptic paralogs by filtering out SNPs with low 50 bp mappability, with high read depth, or not found in 1000 Genomes Phase 1 • Human-chimpanzee shared haplotypes (to reduce recurrent mutation): at least two shared SNPs in 4 kb with the same LD -> 130 regions with shared haplotypes outside the MHC -> 8 with more than two pairs in LD -> 7 resequenced using Sanger sequencing • Human-chimpanzee shared coding SNPs (to identify potentially functional coding variants): 135 shared non-synonymous SNPs, 1 shared premature stop SNP, 200 shared synonymous SNPs outside the MHC
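The first two steps of this pipeline (intersect the two SNP sets, then look for at least two shared SNPs within 4 kb) can be sketched as below. Assumptions, all loudly simplifying: SNPs are already placed on common orthologous coordinates, alleles are compared as unordered pairs, and the mappability, read-depth and 1000 Genomes Phase 1 filters as well as the ‘same LD’ requirement are left out. All names are hypothetical.

```python
from collections import defaultdict


def shared_snps(human, chimp):
    """SNPs found in both species at the same position with the same two alleles.

    Each SNP is (chrom, pos, frozenset_of_alleles); using a set for the alleles
    ignores which allele happens to be the reference in each species.
    """
    return sorted(set(human) & set(chimp), key=lambda s: (s[0], s[1]))


def candidate_regions(snps, max_gap=4000, min_snps=2):
    """Runs of shared SNPs whose neighbours are at most max_gap bp apart,
    kept if they contain at least min_snps SNPs, as (chrom, start, end, n)."""
    by_chrom = defaultdict(list)
    for chrom, pos, _alleles in snps:
        by_chrom[chrom].append(pos)
    regions = []
    for chrom, positions in sorted(by_chrom.items()):
        positions.sort()
        run = [positions[0]]
        for pos in positions[1:]:
            if pos - run[-1] <= max_gap:
                run.append(pos)
            else:
                if len(run) >= min_snps:
                    regions.append((chrom, run[0], run[-1], len(run)))
                run = [pos]
        if len(run) >= min_snps:  # flush the last run on this chromosome
            regions.append((chrom, run[0], run[-1], len(run)))
    return regions


human = [("chr1", 100, frozenset("AG")), ("chr1", 2500, frozenset("CT")),
         ("chr1", 50000, frozenset("AC")), ("chr2", 10, frozenset("GT")),
         ("chr2", 900, frozenset("AG"))]
chimp = [("chr1", 100, frozenset("GA")), ("chr1", 2500, frozenset("CT")),
         ("chr1", 50000, frozenset("AC")), ("chr2", 10, frozenset("GT")),
         ("chr2", 900, frozenset("AT"))]
shared = shared_snps(human, chimp)
print(len(shared), candidate_regions(shared))  # 4 [('chr1', 100, 2500, 2)]
```

Chaining neighbouring SNPs means a candidate region can extend beyond 4 kb while still guaranteeing at least one pair of shared SNPs within 4 kb; the LD check and Sanger validation would then be applied to each candidate.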

  39. Outside the MHC, six clear-cut cases of trans-species polymorphism • Examples: FREM3/GYPE, MTRR, IGFBP7 • All non-coding and putatively regulatory

  40. In an intron of IGFBP7 [Figure: IGFBP7 gene structure at 4 kb and 20 kb scales, showing human-chimpanzee shared SNPs alongside tracks for chromatin state segmentation by HMM (weak and strong enhancers), DNaseI hypersensitive sites, open chromatin by FAIRE, TFBS conserved in human/mouse/rat and TFBS identified by ChIP-seq (labels include SRF, CUTL1, RelA, Bach1, ISGF-3, GATA-2 and STAT3), primate phastCons score, regulatory regions in HUVEC, NHEK and HMEC, and average pairwise differences]

  41. In total, 130 regions with shared human-chimpanzee haplotypes • Six clear-cut cases of ancient balanced polymorphism • None are protein-coding • Eleven occur in non-coding genes (e.g., 7 in lincRNAs) • Eleven compelling cases of regulatory regions • What do these regions have in common?

  42. SNPs shared by humans and chimpanzees • Shared haplotypes: the closest genes within 20 kb of a human-chimp shared haplotype are enriched for glycoproteins (n = 26, p = 2×10⁻⁵, FDR = 0.03) • Shared coding SNPs: genes with a human-chimp shared coding SNP are enriched for glycoproteins (n = 99, p = 0.017, FDR = 0.20) • Enrichment of membrane glycoproteins points to host-pathogen interactions
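Slide 42 reports enrichment p-values with FDRs but not the test behind them. A one-sided hypergeometric test followed by Benjamini-Hochberg adjustment is one common combination for gene-category enrichment; the sketch below implements both under that assumption, with hypothetical function names, and does not reproduce the analysis behind the slide.

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(hypergeom_sf(2, 2, 2, 4))  # 1/6: both draws land in a 2-gene category out of 4
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

The adjusted values for the toy input are 0.04, 0.0533, 0.0533 and 0.2: the p-value of 0.03 inherits the smaller adjusted value from the rank above it, which is the monotonicity step in the loop.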

  43. Project Participants • University of Oxford: Adam Auton, Rory Bowden, Peter Humburg, Zam Iqbal, Gerton Lunter, Julian Maller, Simon Myers, Susanne Pfeifer, Isaac Turner, Oliver Venn, Peter Donnelly (PI), Gil McVean (PI) • Biomedical Primate Research Centre: Ronald Bontrop • University of Chicago: Adi Fledel-Alon, Ryan Hernandez (UCSF), Ellen Leffler, Cord Melton, Laure Segurel, Molly Przeworski (PI) • Funders: Howard Hughes Medical Institute, National Institutes of Health, Royal Society, Wellcome Trust

  44. Where next?

  45. Remarkable structural and sequence diversity in chimp PRDM9

  46. Variation greater than in human populations
