SNP Resources and Applications

SeattleSNPs PGA. Debbie Nickerson, Department of Genome Sciences. debnick@u.washington.edu. http://pga.gs.washington.edu

Presentation Transcript


  1. SNP Resources and Applications SeattleSNPs PGA Debbie Nickerson Department of Genome Sciences debnick@u.washington.edu http://pga.gs.washington.edu

  2. Strategies for Genetic Analysis. Families: linkage studies (simple inheritance, single gene, rare variants), using ~1,000 short tandem repeat markers and now 3,000 SNPs. Populations: association studies (complex inheritance, multiple genes, common variants), using > 500,000-1,000,000 single nucleotide polymorphisms (SNPs) as polymorphic markers. [Figure: C/C and C/T genotypes in a pedigree and in cases and controls; allele frequencies 40% T, 60% C and 15% T, 85% C.]
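The association-study idea on this slide (an allele is more frequent in one group than in the other) can be sketched as a chi-square test on allele counts. This is a minimal illustration, not the speaker's analysis: the function name `allelic_chi_square` and the counts (100 chromosomes per group, chosen to match the 40% T and 15% T frequencies on the slide) are assumptions.

```python
def allelic_chi_square(case_t, case_c, control_t, control_c):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table."""
    table = [[case_t, case_c], [control_t, control_c]]
    total = case_t + case_c + control_t + control_c
    row_sums = [sum(row) for row in table]
    col_sums = [case_t + control_t, case_c + control_c]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: 100 chromosomes sampled per group (an assumption).
chi2 = allelic_chi_square(case_t=40, case_c=60, control_t=15, control_c=85)
print(round(chi2, 2))  # compare against 3.84, the 5% critical value for 1 df
```

With real data the counts come from genotyped individuals, and a genome-wide study would also need correction for the number of SNPs tested.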

  3. Complex inheritance/disease. Variant gene + many other genes + environment -> disease. Examples: diabetes, heart disease, schizophrenia, obesity, multiple sclerosis, celiac disease, cancer, asthma, autism. Two hypotheses: (1) common disease/common variants; (2) common disease/many rare variants.

  4. Genetic Strategy - New Insights. [Figure: effect size (weak to strong) versus allele frequency (low to high), locating linkage, association, and genome-wide sequencing.] Ardlie, Kruglyak & Seielstad (2002) Nat. Rev. Genet. 3: 299-309; Zondervan & Cardon (2004) Nat. Rev. Genet. 5: 89-100

  5. Finding SNPs - Strategies

  6. Total sequence variation in humans. Population size: 6 x 10^9 (diploid). Mutation rate: 2 x 10^-8 per bp per generation. Expected "hits": 240 for each bp. Every variant compatible with life exists in the population, BUT most are vanishingly rare in the population! Compare 2 haploid genomes: 1 SNP per 1,331 bp.* *The International SNP Map Working Group, Nature 409: 928-933 (2001)
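The "240 hits per bp" figure follows from multiplying the slide's numbers, assuming the population size counts individuals and each individual carries two copies of the genome. A quick check (variable names are illustrative):

```python
population_size = 6e9        # individuals, from the slide
chromosomes_per_person = 2   # diploid
mutation_rate = 2e-8         # per bp per generation, from the slide

# Expected number of new mutations at any single base pair, per generation,
# summed over every genome copy in the population.
expected_hits_per_bp = population_size * chromosomes_per_person * mutation_rate
print(round(expected_hits_per_bp))
```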

  7. SNP Discovery: HapMap and others. Generate more SNPs by random shotgun sequencing: genomic DNA (multiple individuals) -> random shotgun sequence -> align to the draft human genome (reference sequence). [Figure: reads TACGCCTATA and TCAAGGAGAT aligned against the reference GTTACGCCAATACAGGATCCAGGAGATTACC.] Sources of SNPs: Perlegen SNP data; sequence chromatograms from the Celera project; HapMap. dbSNP 127: 11.8 million SNPs and 5.7 million validated SNPs.

  8. Finding SNPs: Sequence-based SNP Mining. DNA sequencing -> sequence overlap -> SNP discovery. Genomic sources: BAC library (BAC overlap), RRS library (shotgun overlap). mRNA source: cDNA library (EST overlap). Error sources: RT errors, sequencing quality. Validated SNPs: two independent discoveries. Example (G/C): GTTACGCCAATACAGGATCCAGGAGATTACC versus GTTACGCCAATACAGCATCCAGGAGATTACC
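Sequence-overlap SNP discovery comes down to comparing aligned bases from independent reads. The sketch below applies that comparison to the two sequences shown on the slide. It assumes the reads are already aligned and of equal length, and it ignores base quality, which a real pipeline would not. The function name `find_mismatches` is illustrative.

```python
def find_mismatches(seq_a, seq_b):
    """Return (1-based position, base_a, base_b) for each difference
    between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a != b]


read_1 = "GTTACGCCAATACAGGATCCAGGAGATTACC"
read_2 = "GTTACGCCAATACAGCATCCAGGAGATTACC"
candidates = find_mismatches(read_1, read_2)
print(candidates)  # [(16, 'G', 'C')]
```

A candidate found this way counts as validated, in the slide's sense, only after a second independent discovery.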

  9. SNP discovery is dependent on your sample population size. [Figure: fraction of SNPs discovered (0.0-1.0) versus minor allele frequency (MAF, 0.0-0.5), with curves for 2 and 8 chromosomes. Example pair of sequences (2 chromosomes): GTTACGCCAATACAGGATCCAGGAGATTACC / GTTACGCCAATACAGCATCCAGGAGATTACC.]

  10. Candidate Gene Resource

  11. SNP Discovery in SeattleSNPs. Complete analysis: cSNPs, linkage disequilibrium and haplotype data. • Generate SNP data from complete genomic resequencing (i.e., 5' regulatory, exon, intron, 3' regulatory sequence). [Figure: gene drawn 5' to 3' and tiled with PCR amplicons; example coding SNPs Arg-Cys and Val-Val.]

  12. Increasing Sample Size Improves SNP Discovery. [Figure: fraction of SNPs discovered (0.0-1.0) versus minor allele frequency (MAF, 0.0-0.5), with curves for 2, 8, 16, 24, 48, and 96 chromosomes; annotations: SeattleSNPs; HapMap (based on ~6 chromosomes). Example pair of sequences (2 chromosomes): GTTACGCCAATACAGGATCCAGGAGATTACC / GTTACGCCAATACAGCATCCAGGAGATTACC.]
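The shape of these discovery curves can be approximated with a simple sampling argument: a SNP is discovered when both alleles appear at least once among the sequenced chromosomes. The sketch below assumes chromosomes are drawn independently from the population and that sequencing is error-free. It is a textbook approximation, not necessarily how the plotted curves were produced.

```python
def discovery_probability(maf, n_chromosomes):
    """Probability that both alleles of a SNP with the given minor allele
    frequency appear at least once in n independently sampled chromosomes."""
    all_major = (1.0 - maf) ** n_chromosomes
    all_minor = maf ** n_chromosomes
    return 1.0 - all_major - all_minor


for n in (2, 8, 16, 24, 48, 96):
    row = [round(discovery_probability(maf, n), 2)
           for maf in (0.01, 0.05, 0.10, 0.50)]
    print(n, row)
```

Under these assumptions, two chromosomes find only about 10% of SNPs at 5% MAF, while 96 chromosomes find nearly all of them.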

  13. SNPs in the Average Gene. Average gene size: ~25 kb. Compare 2 haploid genomes: ~1 SNP in 1,000 bp. ~150 SNPs (about 1 per 200 bp): 15,000,000 SNPs. ~50 SNPs > 0.05 MAF (about 1 per 600 bp): 6,000,000 SNPs (33-40%). ~5 coding SNPs (half change the amino acid sequence). Crawford et al., Annu Rev Genomics Hum Genet 2005; 6: 287-312

  14. SeattleSNPs panel HapMap Integration (~4 million SNPs). High-density genic coverage (SeattleSNPs discovery, 1/188 bp) versus low-density genome coverage (HapMap SNPs, ~1/1,000 bp).

  15. Sequence Variation and the HapMap

  16. Summary: The Current State of SNP Resources • Random SNP discovery generates many SNPs (HapMap) • Random approaches to SNP discovery have reached limits of discovery and validation (~ 50% of the common SNPs) • Resequencing approaches continue to catalog important variants (rare and common not captured by the HapMap) • SeattleSNPs has generated SNP data across >300 key candidate genes

  17. NHLBI - Candidate Genes and Medical Resequencing http://rsng.nhlbi.nih.gov/scripts/index.cfm

  18. Typing SNPs: Approaches

  19. HapMap Project: Genotype validated SNPs in dbSNP • Genotype SNPs in four populations: initially 1 million -> now 4 million • CEPH (CEU) (Europe - n = 90, trios) • Yoruban (YRI) (Africa - n = 90, trios) • Japanese (JPT) (Asian - n = 45) • Chinese (CHB) (Asian - n = 45) • To produce a genome-wide map of common variation

  20. Genotyping Adds Value to SNPs HapMap Genotyping • Confirms a SNP as “real” and “informative” • Determines Minor Allele Frequency (MAF) - - common or rare • Determines MAF in different populations • Detection of SNP correlations - (Linkage Disequilibrium and Haplotypes)
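Determining MAF from genotype calls is a counting exercise. A minimal sketch, assuming a biallelic SNP and genotypes encoded as two-letter strings with `None` for missing calls (the encoding and the sample data are hypothetical):

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele at a biallelic SNP.
    Missing genotype calls (None) are skipped."""
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue
        counts.update(genotype)  # adds one count per allele letter
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or the site is monomorphic in this sample
    return min(counts.values()) / total


# Hypothetical calls for eight individuals plus one missing call.
sample = ["CC", "CT", "CC", "CC", "CT", "CC", "CT", "CC", None]
print(minor_allele_frequency(sample))
```

Running the same function separately on each population's genotypes gives the population-specific MAFs the slide refers to.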

  21. Genotype correlations among SNPs decrease the number of SNPs that need to be genotyped

  22. An Example of SNP Correlation in the Human IL1A Gene • IL1A in Europeans • 18.5 kb • 50 SNPs • 46 common SNPs (> 10% MAF) [Figure legend: homozygote common; heterozygote; homozygote alternative allele; missing data.] Carlson et al. (2004) Am J Hum Genet. 74: 106-120.

  23. 46 common SNPs reduce to 3 SNPs - Select one SNP per bin using LDSelect • Threshold LD: r^2 • Bin 1: 22 sites • Bin 2: 18 sites • Bin 3: 5 sites • Genotype 1 SNP from each bin • TagSNP chosen for biological intuition or ease of assay design
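The binning idea can be sketched as a greedy procedure: repeatedly pick the SNP that exceeds the r^2 threshold with the most remaining SNPs, make those SNPs a bin, and continue with what is left. This is a simplified illustration in the spirit of the slide, not the LDSelect program itself. The function names, the 0.8 threshold, the use of squared genotype-dosage correlation as r^2, and the toy data are all assumptions.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype-dosage vectors
    (0, 1, or 2 copies of an allele per individual)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic site carries no correlation information
    return cov * cov / (var_x * var_y)


def greedy_bins(dosages, threshold=0.8):
    """dosages: dict mapping SNP name -> dosage list.
    Returns a list of (tag SNP, sorted bin members)."""
    remaining = set(dosages)
    bins = []
    while remaining:
        best_tag, best_members = None, set()
        for snp in sorted(remaining):
            members = {other for other in remaining
                       if other == snp
                       or r_squared(dosages[snp], dosages[other]) >= threshold}
            if len(members) > len(best_members):
                best_tag, best_members = snp, members
        bins.append((best_tag, sorted(best_members)))
        remaining -= best_members
    return bins


# Hypothetical toy data: snp1-snp3 are perfectly correlated, snp4 is not.
dosages = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [2, 1, 0, 2, 1, 0],
    "snp4": [0, 0, 1, 2, 2, 1],
}
print(greedy_bins(dosages))
```

Here the greedy pass picks the first SNP in sorted order as the tag. In practice, as the slide notes, any SNP in the bin that correlates with all the others can serve, so the choice can follow biological intuition or ease of assay design.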

  24. Common Variants - LD (Association) Patterns - Not the same in all genes for all populations. [Figure: LD patterns for all SNPs and for SNPs > 10% MAF, African-American versus European-American.]

  25. How do I pick TagSNPs?

  26. http://gvs.gs.washington.edu/GVS/ TagSNPs for any gene - Use GVS

  27. TagSNPs in any Gene

  28. TagSNPs for a gene for typing multiple populations

  29. TagSNPs for a gene for typing multiple populations

  30. TagSNPs in a pathway of genes

  31. Human Association Studies

  32. C-Reactive Protein (CRP) • Pentamer belonging to pentraxin family • Acute-phase protein produced by the liver in response to cytokine production (IL-6, IL-1, tumor necrosis factor) • Non-specific response to inflammation, infection, tissue damage Well designed candidate gene studies have provided significant insights and these have been replicated in genome-wide association studies

  33. CRP Analysis • CRP is an independent risk factor for CVD • CRP levels are heritable (~40% in FHS) • Several reported SNPs alter CRP levels

  34. tagSNP selection for CRP: 6 "cosmopolitan" tagSNPs and 1 rare synonymous SNP. "Promoter" SNPs (790, 1440); intron SNP (1919); synonymous SNP (2667); 3' UTR SNP (3006); downstream SNPs (3872, 5237)

  35. Association between CRP SNPs and Serum CRP Levels. CARDIA: Carlson et al., Am J Hum Genet 77: 64-77, 2005; NHANES: Crawford et al., Circulation 114: 2458-65, 2006; CHS: Lange et al., JAMA 296: 2703-11, 2006; Framingham: Larson et al., Circulation 113: 1415-23, 2006; Other: Szalai et al., J Mol Med 83: 440-7, 2005

  36. High CRP Associated with SNPs in USF1 Binding Site • USF1 (Upstream Stimulating Factor) • Polymorphism at 1421 alters another USF1 binding site. Haplotype sequences (position ruler: 1420, 1430, 1440): H1-4 gcagctacCACGTGcacccagatggcCACTCGtt; H7-8 gcagctacCACGTGcacccagatggcCACTAGtt; H5 gcagctacCACGTGcacccagatggcCACTTGtt; H6 gcagctacCACATGcacccagatggcCACTTGtt. SNP alters expression in vitro; altered gel shift in vitro. Genome-wide studies lead to regional and candidate gene studies

  37. Genome-Wide Association Studies

  38. Genome-Wide Platforms. Affymetrix (random / quasi-random SNPs): 100,000 or 500,000 SNPs. Illumina (tagSNPs): 100,000, 317,000, 550,000, 650,000Y SNPs. 1 million SNP products are here!

  39. Genome-wide Tour de force Nature 447: 661-678 Read all the supplemental materials too!

  40. Applying HapMap - Will it work? YES!! • Hits: Macular Degeneration, Obesity, Cardiac Repolarization, Inflammatory Bowel Disease, Diabetes T1 and T2, Coronary Artery Disease, Rheumatoid Arthritis, Breast Cancer, Colon Cancer ... • There are misses as well; it is unclear why - Phenotype, Coverage, Environmental Contexts? • Example of a miss - Hypertension • There are lots more hits in these data sets - sample size, low proxy coverage with other SNPs ... • Analysis of associations between phenotype(s) and even individual sites is daunting, and this will just be the first stage; it does not even consider multi-site interactions

  41. How robust are the new genome-wide platforms? How well do they capture common SNPs?

  42. LD-based coverage of Sequence Variation MAF > 0.05 Bhangale et al, unpublished

  43. How can I get more information about a reference SNP (rs) identified from an association study?

  44. http://gvs.gs.washington.edu/GVS/ Searching for Genomic Information with an RS number

  45. Structural Variation

  46. Structural Variation - Large Insertion-Deletion Events • Structural Variants Identified in the HapMap • Conrad, et al. (Nature Genetics 38:75-81, 2006) • Hinds, et al. (Nature Genetics 38:82-85, 2006) • McCarroll, et al. (Nature Genetics 38:86-92, 2006) ~ 1,500 indels Lots more of them - this was only a start

  47. New Variation to Consider - Structural Variation. More than 10% of the genome sequence. Types of structural variants: insertions/deletions, inversions, duplications, translocations. Size: large-scale (>100 kb), intermediate-scale (500 bp-100 kb), fine-scale (1-500 bp). Nature 447: 161-165, 2007

  48. A Human Genome Structural Variation Project • Goal: complete characterization of normal pattern of structural variation in 62 human genomes • Genomes have dense SNP maps (HapMap) • Select most genetically diverse individuals • 62 additional human genome projects underway [Figure labels: Japanese & Chinese, Yoruba, CEPH.] Nature 447: 161-165, 2007

  49. Sequence-Based Resolution of Structural Variation. Human genomic DNA -> genomic library (1 million clones) -> sequence ends of genomic inserts and map to human genome (Build35). [Figure: fosmid end-pair orientation and spacing against Build35 for concordant fosmids, insertions, deletions, and inversions.] Dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage); 639,204 BEST fosmid pairs (8.8X genome coverage)
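The end-pair logic behind this slide can be sketched as follows. A clone whose two end reads map with the expected orientation and spacing is concordant. Ends that map too far apart suggest a deletion in the sampled genome relative to the reference, ends that map too close suggest an insertion, and ends on the same strand suggest an inversion. The span thresholds, the ~40 kb insert size, and the ~2.9 Gb genome size below are assumptions for illustration, not values stated on the slide.

```python
def classify_fosmid_pair(left_pos, right_pos, left_strand, right_strand,
                         min_span=32_000, max_span=48_000):
    """Classify one clone from where its two end reads map on the reference.
    Span thresholds are hypothetical, bracketing a ~40 kb insert."""
    if left_strand == right_strand:
        return "inversion"   # ends should face each other (+ then -)
    span = right_pos - left_pos
    if span > max_span:
        return "deletion"    # reference holds sequence the clone lacks
    if span < min_span:
        return "insertion"   # clone holds sequence the reference lacks
    return "concordant"


def physical_coverage(n_pairs, insert_size=40_000, genome_size=2.9e9):
    """Clone coverage of the genome; both defaults are assumptions."""
    return n_pairs * insert_size / genome_size


print(classify_fosmid_pair(1_000_000, 1_040_000, "+", "-"))
print(classify_fosmid_pair(1_000_000, 1_090_000, "+", "-"))
print(round(physical_coverage(1_122_408), 1), round(physical_coverage(639_204), 1))
```

Under those assumed sizes the coverage arithmetic lands close to the 15.5X and 8.8X figures quoted on the slide.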

  50. Kidd, Cooper, and Eichler - unpublished
