Lessons learnt from the 1000 Genomes Project about sequencing in populations

Gil McVean, Wellcome Trust Centre for Human Genetics and Department of Statistics, University of Oxford.

Presentation Transcript


  1. Lessons learnt from the 1000 Genomes Project about sequencing in populations. Gil McVean, Wellcome Trust Centre for Human Genetics and Department of Statistics, University of Oxford

  2. Some questions • What has the 1000 Genomes Project told us about how to sequence (in) populations? • What has the 1000 Genomes Project told us about populations?

  3. Samples for the 1000 Genomes Project: CEU, FIN, GBR, CHB, TSI, JPT, IBS, CDX, CHS, YRI, GWB, KHV, LWK, GHN, MAB, ASW, AJM, ACB, MXL, PUR, CLM, PEL; samples from South Asia • Major population groups composed of subpopulations of c. 100 each

  4. The role of the 1000G Project in medical genetics • A catalogue of variants: 95% of variants at 1% frequency in populations of interest • A representation of ‘normal’ variation • A set of haplotypes for imputation into GWAS • A training ground for sequencing/statistical/computational technologies
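The 95%-at-1% target is partly a sample-size question. A minimal sketch of how often an allele at 1% frequency is present at all in a sample of n individuals, assuming random sampling of diploid individuals and perfect variant detection (neither holds exactly in practice); the function names are illustrative:

```python
import math


def prob_variant_sampled(freq: float, n_individuals: int) -> float:
    """Probability that at least one of the 2 * n_individuals sampled
    chromosomes carries an allele with population frequency `freq`.
    Assumes random sampling of diploid individuals and perfect detection."""
    return 1.0 - (1.0 - freq) ** (2 * n_individuals)


def individuals_needed(freq: float, target: float) -> int:
    """Smallest n with prob_variant_sampled(freq, n) >= target,
    obtained by solving 1 - (1 - freq)^(2n) >= target for n."""
    return math.ceil(math.log(1.0 - target) / (2.0 * math.log(1.0 - freq)))


for n in (50, 100, 500):
    print(f"n={n}: P(1% allele present in sample) = {prob_variant_sampled(0.01, n):.3f}")
print("individuals needed for 95%:", individuals_needed(0.01, 0.95))
```

Under these assumptions a sample of 100 individuals contains at least one copy of a 1% allele roughly 87% of the time, and about 150 individuals are needed to reach 95%; imperfect sensitivity at low coverage would push these numbers further.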

  5. Samples for the 1000 Genomes Project: Pilot • CEU, CHB, TSI*, JPT, CHS*, YRI, LWK* (*exon pilot only)

  6. Population-scale genome sequencing (figure labels: Haplotypes, 2x, 10x)
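The slide's labels mention 2x and 10x coverage. A rough sketch of why a single low-coverage genome cannot be genotyped reliably on its own, assuming read depth at each site is Poisson-distributed with mean equal to the nominal coverage and each read comes from either haplotype with probability 1/2 (real coverage is more uneven); the function names are illustrative:

```python
import math


def prob_site_uncovered(mean_depth: float) -> float:
    """P(depth = 0) when depth ~ Poisson(mean_depth)."""
    return math.exp(-mean_depth)


def prob_het_both_alleles_seen(mean_depth: float) -> float:
    """P(a heterozygous site has >= 1 read from each haplotype).
    Splitting a Poisson(λ) read count with a fair coin gives two independent
    Poisson(λ/2) counts, one per haplotype; both must be at least 1."""
    return (1.0 - math.exp(-mean_depth / 2.0)) ** 2


for depth in (2, 10):
    print(f"{depth}x: uncovered={prob_site_uncovered(depth):.3f}, "
          f"het with both alleles seen={prob_het_both_alleles_seen(depth):.3f}")
```

At 2x roughly 13.5% of sites receive no reads and only about 40% of heterozygous sites show both alleles, versus about 99% at 10x; population-scale designs make up for this by combining information across many individuals.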

  7. What has the project generated?

  8. >15 million SNPs, >50% of them novel • dbSNP entries increased by 70%

  9. A huge increase in the set of structural variants

  10. A robust and modular pipeline for analysis of population-scale sequence data

  11. An efficient format for storing aligned reads and a set of tools to manipulate and view the files • SAM/BAM format for storing (aligned) reads, Bioinformatics (2009) • http://samtools.sourceforge.net
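SAM is a tab-delimited text format whose alignment lines carry 11 mandatory fields followed by optional tags; BAM is its compressed binary counterpart. A minimal pure-Python sketch of reading one SAM line, for illustration only (the class and function names are hypothetical; real pipelines use samtools or a library such as pysam):

```python
import re
from dataclasses import dataclass, field
from typing import List

# CIGAR operations that consume reference bases, per the SAM specification
_REF_CONSUMING = set("MDN=X")


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int          # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: List[str] = field(default_factory=list)

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)   # bit 0x4: segment unmapped

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)  # bit 0x10: SEQ is reverse complemented

    def reference_span(self) -> int:
        """Number of reference bases covered by the alignment ('*' gives 0)."""
        ops = re.findall(r"(\d+)([MIDNSHP=X])", self.cigar)
        return sum(int(n) for n, op in ops if op in _REF_CONSUMING)


def parse_sam_line(line: str) -> SamRecord:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(
        qname=cols[0], flag=int(cols[1]), rname=cols[2], pos=int(cols[3]),
        mapq=int(cols[4]), cigar=cols[5], rnext=cols[6], pnext=int(cols[7]),
        tlen=int(cols[8]), seq=cols[9], qual=cols[10], tags=cols[11:],
    )


rec = parse_sam_line(
    "read1\t16\tchr20\t100\t60\t3M1I2M1D4M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:2")
print(rec.rname, rec.pos, rec.is_reverse, rec.reference_span())
```

The toy read is reverse-strand, starts at position 100 of chr20 and spans 10 reference bases: the 1-base insertion does not consume reference, while the 1-base deletion does.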

  12. An information-rich format (the Variant Call Format, VCF) for storing generic haplotype/genotype data and tools for manipulating the files • http://vcftools.sourceforge.net
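The vcftools URL refers to tools for the Variant Call Format (VCF): a tab-delimited format with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), a FORMAT column and one column per sample. A minimal sketch that reads the GT field from one data line, where `|` marks phased haplotypes and `/` unphased genotypes; the function names, example line and sample names are made up, and real analyses would use vcftools, bcftools or a library such as pysam:

```python
import re
from typing import Dict, List, Optional, Tuple

# (allele indices, phased?) ; None stands for a missing allele '.'
Genotype = Tuple[Tuple[Optional[int], ...], bool]


def parse_vcf_genotypes(line: str, samples: List[str]) -> Dict[str, Genotype]:
    """Extract GT for each sample from one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    gt_idx = cols[8].split(":").index("GT")  # position of GT within FORMAT
    out: Dict[str, Genotype] = {}
    for name, value in zip(samples, cols[9:]):
        gt = value.split(":")[gt_idx]
        phased = "|" in gt  # '|' = phased haplotypes, '/' = unphased
        alleles = tuple(None if a == "." else int(a) for a in re.split(r"[|/]", gt))
        out[name] = (alleles, phased)
    return out


def alt_allele_frequency(genotypes: Dict[str, Genotype]) -> float:
    """Fraction of called alleles that are non-reference."""
    called = [a for alleles, _ in genotypes.values() for a in alleles if a is not None]
    return sum(1 for a in called if a != 0) / len(called)


line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3\tGT:GQ\t0|0:48\t1|0:48\t1/1:43"
gts = parse_vcf_genotypes(line, ["NA00001", "NA00002", "NA00003"])
print(gts["NA00002"], alt_allele_frequency(gts))
```

Keeping phase in the genotype representation is what lets the same format carry haplotypes for imputation as well as unphased genotype calls.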

  13. An understanding of the ‘rare functional variant load’ carried by individuals • c. 250 loss-of-function (LOF) variants per person • c. 75 variants per person classed in HGMD as ‘disease-causing mutations’ (DM)

  14. USH2A • Mutations cause Usher syndrome • 66 missense variants in dbSNP • 2/3 detected in the 1000 Genomes Pilot • One HGMD ‘disease-causing’ variant is homozygous in 3 YRI individuals • Other reports indicate this is not a true disease-causing variant

  15. Samples for the 1000 Genomes Project: Phase 1 • CEU, FIN, GBR, CHB, ASW, TSI, JPT, CHS, YRI, MXL, PUR, LWK, CLM

  16. Lessons learnt about sequencing in populations

  17. Lesson 1. The low-coverage model works for variant discovery
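A variant carried by several individuals accumulates supporting reads across all of them, even when each genome has only a few reads at the site. A back-of-envelope sketch under simplifying assumptions (Poisson depth, heterozygous carriers, no sequencing or mapping errors); the 4x depth and the `min_alt_reads` threshold are hypothetical illustrative choices, not the project's calling rules:

```python
import math


def prob_at_least(m: int, mean: float) -> float:
    """P(X >= m) for X ~ Poisson(mean), via 1 - P(X <= m - 1)."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for k in range(m):
        cdf += term
        term *= mean / (k + 1)  # P(X = k + 1) from P(X = k)
    return 1.0 - cdf


def discovery_power(n_carriers: int, depth: float, min_alt_reads: int = 2) -> float:
    """Each heterozygous carrier contributes Poisson(depth / 2) alt reads;
    summing over carriers gives Poisson(n_carriers * depth / 2)."""
    return prob_at_least(min_alt_reads, n_carriers * depth / 2.0)


for carriers in (1, 2, 5, 10):
    print(f"{carriers} carrier(s) at 4x: P(>= 2 alt reads) = {discovery_power(carriers, 4.0):.3f}")
```

Under these assumptions a single heterozygous carrier at 4x yields two or more alt reads only about 59% of the time, whereas ten carriers at the same depth almost always do; that is the intuition behind discovering shared variants from many low-coverage genomes.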

  18. A near-complete record of common variants (CEU)

  19. Lesson 2. The low-coverage model works for SNP genotyping
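A common ingredient of low-coverage genotyping is to compute genotype likelihoods from the reads and combine them with population-level information. A minimal sketch, assuming independent reads, a single per-base error rate and a Hardy-Weinberg prior built from an allele-frequency estimate that is simply given here; the function names are illustrative, and real low-coverage pipelines also borrow strength across individuals via linkage disequilibrium, which this omits:

```python
from typing import List


def genotype_likelihoods(n_ref: int, n_alt: int, error: float = 0.01) -> List[float]:
    """P(read data | g) for g = 0, 1, 2 copies of the alt allele, up to a
    constant (the binomial coefficient, identical for every g).
    Assumes independent reads, each base miscalled with probability `error`."""
    likes = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error  # P(a read shows alt)
        likes.append((1 - p_alt) ** n_ref * p_alt ** n_alt)
    return likes


def genotype_posteriors(n_ref: int, n_alt: int, alt_freq: float,
                        error: float = 0.01) -> List[float]:
    """Combine likelihoods with a Hardy-Weinberg prior from `alt_freq`."""
    f = alt_freq
    priors = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
    joint = [lk * p for lk, p in zip(genotype_likelihoods(n_ref, n_alt, error), priors)]
    total = sum(joint)
    return [j / total for j in joint]


# One alt read and no ref reads: the call depends strongly on the population prior
print([round(p, 3) for p in genotype_posteriors(0, 1, alt_freq=0.3)])
print([round(p, 3) for p in genotype_posteriors(0, 1, alt_freq=0.01)])
```

With a single alt read, the posterior favours a heterozygote when the allele is common (frequency 0.3) but is split almost evenly between homozygous reference and heterozygous when it is rare (0.01).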

  20. A set of accurate genotypes/haplotypes (CEU)

  21. Lesson 3. The genome has a large grey area where variant calling is hard
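One way to make the ‘grey area’ concrete is per-site filtering on read depth and mapping quality. The SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong. The filter below is a hypothetical illustration: the function names and all thresholds are assumptions, not the criteria the project used.

```python
from typing import Sequence


def mapq_to_error_prob(mapq: int) -> float:
    """SAM MAPQ is -10 * log10 P(mapping position is wrong).
    (MAPQ 255 means 'not available'; this sketch does not special-case it.)"""
    return 10.0 ** (-mapq / 10.0)


def is_hard_to_call(read_mapqs: Sequence[int], min_depth: int = 1, max_depth: int = 100,
                    low_mapq: int = 20, max_low_frac: float = 0.2) -> bool:
    """Hypothetical per-site filter: flag sites with abnormal depth or with
    many ambiguously mapped reads. All thresholds are illustrative."""
    depth = len(read_mapqs)
    if depth < min_depth or depth > max_depth:
        return True  # no data, or excess depth (e.g. collapsed repeats)
    n_low = sum(1 for q in read_mapqs if q < low_mapq)
    return n_low / depth > max_low_frac


print(mapq_to_error_prob(20), is_hard_to_call([60] * 10), is_hard_to_call([0, 0, 60, 60]))
```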

  22. Lesson 4. Joint calling of different variant types substantially improves the quality of calls

  23. Lesson 5. Managing uncertainty is important
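One standard way to carry genotype uncertainty into downstream analyses, rather than forcing hard calls, is to use expected allele dosages computed from genotype posterior probabilities. A minimal sketch; the function names are illustrative and the posterior values in the example are made up, not project data:

```python
from typing import List, Sequence


def expected_dosage(posteriors: Sequence[float]) -> float:
    """Expected number of alt alleles, E[g] = P(g=1) + 2 * P(g=2)."""
    return posteriors[1] + 2.0 * posteriors[2]


def hard_call(posteriors: Sequence[float]) -> int:
    """Most probable genotype; discards the remaining uncertainty."""
    return max(range(3), key=lambda g: posteriors[g])


def allele_frequency_from_dosages(all_posteriors: List[Sequence[float]]) -> float:
    """Estimate alt allele frequency as mean dosage / 2 across individuals."""
    return sum(expected_dosage(p) for p in all_posteriors) / (2.0 * len(all_posteriors))


uncertain = [0.016, 0.691, 0.293]  # P(g=0), P(g=1), P(g=2)
print(hard_call(uncertain), round(expected_dosage(uncertain), 3))
```

For the example the hard call is a heterozygote (1), while the expected dosage is about 1.28, retaining the substantial probability of a homozygous alternative genotype.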

  24. Lesson 6. Data visualisation is key

  25. Lessons learnt about populations

  26. Closely related populations can have substantially different rare variants

  27. Spatial heterogeneity in non-genetic risk can differentially confound association studies for rare and common variants (Iain Mathieson)

  28. Thanks to the many... • Steering Committee • Co-chairs: Richard Durbin and David Altshuler • Samples and ELSI Committee • Co-chairs: Aravinda Chakravarti and Leena Peltonen • Data Production Group • Co-chairs: Elaine Mardis and Stacey Gabriel • Analysis Group • Co-chairs: Gil McVean and Goncalo Abecasis • Subgroups in gene-targeted sequencing (Richard Gibbs) and population genetics (Molly Przeworski) • Structural Variation Group • Co-chairs: Matt Hurles, Charles Lee and Evan Eichler • DCC • Co-chairs: Paul Flicek and Steve Sherry
