SNP Resources and Applications SeattleSNPs PGA Debbie Nickerson Department of Genome Sciences debnick@u.washington.edu http://pga.gs.washington.edu
Strategies for Genetic Analysis
• Families (Linkage Studies): Simple Inheritance, Single Gene, Rare Variants; ~1,000 Short Tandem Repeat Markers and now 3,000 SNPs
• Populations (Association Studies): Complex Inheritance, Multiple Genes, Common Variants; Polymorphic Markers: > 500,000-1,000,000 Single Nucleotide Polymorphisms (SNPs)
• [Figure: C/C and C/T genotypes in a pedigree and in a case-control sample. Cases: 40% T, 60% C. Controls: 15% T, 85% C]
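The case and control allele frequencies on this slide (40% T in cases, 15% T in controls) can be turned into a basic allelic association test. The sketch below is illustrative only: the sample size of 100 cases and 100 controls is a hypothetical assumption (the slide gives frequencies, not counts), and a plain 2x2 chi-square test on allele counts stands in for whatever analysis a real study would use.

```python
import math

def allelic_chi_square(case_t, case_c, control_t, control_c):
    """Pearson chi-square statistic (1 df) for a 2x2 table of allele counts."""
    table = [[case_t, case_c], [control_t, control_c]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_t + control_t, case_c + control_c]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical sample: 100 cases and 100 controls, i.e. 200 chromosomes per group.
n_chrom = 200
case_t = round(0.40 * n_chrom)      # T alleles in cases (40% T on the slide)
control_t = round(0.15 * n_chrom)   # T alleles in controls (15% T on the slide)
case_c = n_chrom - case_t
control_c = n_chrom - control_t

chi2 = allelic_chi_square(case_t, case_c, control_t, control_c)
# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
odds_ratio = (case_t * control_c) / (case_c * control_t)

print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, allelic odds ratio = {odds_ratio:.2f}")
```

With larger or smaller hypothetical samples the frequencies stay the same but the statistic scales with the number of chromosomes, which is one reason sample size matters so much in association studies.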
Complex inheritance/disease
• [Diagram: Variant Gene, Many Other Genes, Environment -> Disease]
• Examples: Diabetes, Heart Disease, Schizophrenia, Obesity, Multiple Sclerosis, Celiac Disease, Cancer, Asthma, Autism
• Two hypotheses: 1 - common disease/common variants; 2 - common disease/many rare variants
Genetic Strategy - New Insights
• [Figure: effect size (weak to strong) versus allele frequency (low to high), with regions labeled Linkage, Association, and Genome-wide Sequencing]
• Ardlie, Kruglyak & Seielstad (2002) Nat. Rev. Genet. 3: 299-309
• Zondervan & Cardon (2004) Nat. Rev. Genet. 5: 89-100
Finding SNPs - Strategies
Total sequence variation in humans
• Population size: 6 x 10^9 (diploid)
• Mutation rate: 2 x 10^-8 per bp per generation
• Expected “hits”: 240 for each bp
• Every variant compatible with life exists in the population, BUT most are vanishingly rare in the population!
• Compare 2 haploid genomes: 1 SNP per 1331 bp*
*The International SNP Map Working Group, Nature 409: 928-933 (2001)
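The "240 hits per bp" figure follows directly from the population size and mutation rate given on this slide. A small worked calculation, as a sketch: the 3 x 10^9 bp haploid genome size used in the second step is an assumption added here for illustration, not a number from the slide.

```python
# Numbers from the slide.
population_size = 6e9          # people
chromosomes_per_person = 2     # diploid
mutation_rate = 2e-8           # per bp per generation

# Each base pair is copied once per chromosome per generation, so the expected
# number of new mutations at any single site across the whole population is:
expected_hits_per_bp = population_size * chromosomes_per_person * mutation_rate
print(f"Expected hits per bp per generation: {expected_hits_per_bp:.0f}")

# Assumption (not on the slide): a haploid genome of roughly 3e9 bp.
ASSUMED_HAPLOID_GENOME_BP = 3e9
snp_spacing_bp = 1331          # 1 SNP per 1331 bp between two haploid genomes
pairwise_differences = ASSUMED_HAPLOID_GENOME_BP / snp_spacing_bp
print(f"Differences between two haploid genomes: ~{pairwise_differences / 1e6:.2f} million")
```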
SNP Discovery: HapMap and others
• Generate more SNPs: Random Shotgun Sequencing of Genomic DNA (multiple individuals)
• Sequence and align to the reference sequence (Draft Human Genome): GTTACGCCAATACAGGATCCAGGAGATTACC, with reads TACGCCTATA and TCAAGGAGAT
• Sources of SNPs: Perlegen SNP data; Sequence chromatograms from Celera project; HapMap Random Shotgun
• dbSNP 127 - 11.8 Million SNPs, of which 5.7 Million are validated
Finding SNPs: Sequence-based SNP Mining
• DNA SEQUENCING sources: Genomic (BAC Library, RRS Library) and mRNA (cDNA Library)
• Sequence Overlap - SNP Discovery: BAC Overlap, Shotgun Overlap, EST Overlap
• Error sources: Sequencing Quality, RT errors
• Example (G vs. C): GTTACGCCAATACAGGATCCAGGAGATTACC / GTTACGCCAATACAGCATCCAGGAGATTACC
• Validated SNPs - two independent discoveries
SNP discovery is dependent on your sample population size
[Figure: Fraction of SNPs Discovered (0.0-1.0) versus Minor Allele Frequency (MAF, 0.0-0.5), with curves for 2 and 8 chromosomes. The pair GTTACGCCAATACAGGATCCAGGAGATTACC / GTTACGCCAATACAGCATCCAGGAGATTACC illustrates a comparison of 2 chromosomes]
SNP Discovery in SeattleSNPs
• Generate SNP data from complete genomic resequencing (i.e., 5’ regulatory, exon, intron, 3’ regulatory sequence)
• Complete analysis: cSNPs, Linkage Disequilibrium and Haplotype Data
• [Figure: gene drawn 5’ to 3’ and covered by PCR amplicons, with example cSNPs Arg-Cys and Val-Val]
Increasing Sample Size Improves SNP Discovery
[Figure: Fraction of SNPs Discovered (0.0-1.0) versus Minor Allele Frequency (MAF, 0.0-0.5), with curves for 2, 8, 16, 24, 48, and 96 chromosomes. Labels: SeattleSNPs; HapMap (based on ~6 chromosomes). The pair GTTACGCCAATACAGGATCCAGGAGATTACC / GTTACGCCAATACAGCATCCAGGAGATTACC illustrates a comparison of 2 chromosomes]
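The shape of these curves can be approximated with a simple sampling argument: a biallelic SNP is discovered only if both alleles appear among the chromosomes sequenced. The sketch below assumes chromosomes are drawn independently at random from the population; it is an idealized model, not necessarily the exact calculation behind the figure.

```python
def discovery_probability(maf, n_chromosomes):
    """Probability that both alleles of a SNP are seen in n sampled chromosomes.

    The SNP is missed when every sampled chromosome carries the major allele
    or every one carries the minor allele.
    """
    all_major = (1.0 - maf) ** n_chromosomes
    all_minor = maf ** n_chromosomes
    return 1.0 - all_major - all_minor

sample_sizes = [2, 8, 16, 24, 48, 96]      # chromosome counts shown in the figure
mafs = [0.01, 0.05, 0.10, 0.25, 0.50]

print("n_chr " + " ".join(f"MAF={p:<5}" for p in mafs))
for n in sample_sizes:
    row = " ".join(f"{discovery_probability(p, n):9.3f}" for p in mafs)
    print(f"{n:5d} {row}")
```

Under this model, two chromosomes recover only half of the SNPs even at a MAF of 0.5, while 96 chromosomes recover nearly all SNPs with a MAF of 0.05 or more.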
SNPs in the Average Gene
• Average Gene Size: ~25 kb
• Compare 2 haploid genomes: ~1 SNP in 1,000 bp
• ~150 SNPs (1 per 200 bp) - 15,000,000 SNPs
• ~50 SNPs > 0.05 MAF (1 per 600 bp) - 6,000,000 SNPs (33-40%)
• ~5 coding SNPs (half change the amino acid sequence)
Crawford et al. Ann Rev Genomics Hum Genet 2005; 6: 287-312
SeattleSNPs panel HapMap Integration (~4 million SNPs)
• High Density Genic Coverage (SeattleSNPs): SeattleSNPs discovery (1/188 bp)
• Low Density Genome Coverage (HapMap): HapMap SNPs (~1/1000 bp)
Summary: The Current State of SNP Resources
• Random SNP discovery generates many SNPs (HapMap)
• Random approaches to SNP discovery have reached limits of discovery and validation (~50% of the common SNPs)
• Resequencing approaches continue to catalog important variants (rare and common not captured by the HapMap)
• SeattleSNPs has generated SNP data across >300 key candidate genes
NHLBI - Candidate Genes and Medical Resequencing http://rsng.nhlbi.nih.gov/scripts/index.cfm
Typing SNPs: Approaches
HapMap Project: Genotype validated SNPs in dbSNP
• Genotype SNPs in four populations: Initially 1 Million -> Now 4 Million
• CEPH (CEU) (Europe - n = 90, trios)
• Yoruban (YRI) (Africa - n = 90, trios)
• Japanese (JPT) (Asian - n = 45)
• Chinese (HCB) (Asian - n = 45)
• To produce a genome-wide map of common variation
Genotyping Adds Value to SNPs
HapMap Genotyping:
• Confirms a SNP as “real” and “informative”
• Determines Minor Allele Frequency (MAF) - common or rare
• Determines MAF in different populations
• Detection of SNP correlations (Linkage Disequilibrium and Haplotypes)
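Determining MAF from genotype calls is a simple allele count. The sketch below uses made-up genotypes for two hypothetical populations (`pop_A`, `pop_B`), written in the same C/T notation the deck uses; the 0.05 cutoff for "common" mirrors the MAF > 0.05 threshold that appears elsewhere in these slides.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """MAF of a biallelic SNP from genotype strings such as 'C/T'.

    Missing calls are passed as None and skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue
        counts.update(genotype.split("/"))
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or only one allele observed
    return min(counts.values()) / total

# Made-up genotype calls for illustration only.
genotypes_by_population = {
    "pop_A": ["C/C", "C/T", "C/C", "T/T", "C/T", None, "C/C", "C/C"],
    "pop_B": ["C/C"] * 11 + ["C/T"],
}
COMMON_MAF = 0.05

maf_by_population = {
    pop: minor_allele_frequency(calls)
    for pop, calls in genotypes_by_population.items()
}
for pop, maf in maf_by_population.items():
    label = "common" if maf > COMMON_MAF else "rare"
    print(f"{pop}: MAF = {maf:.3f} ({label})")
```

The same SNP can be common in one population and rare in another, which is why the HapMap typed each SNP in several populations.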
Genotype correlations among SNPs decrease the number of SNPs that need to be genotyped
An Example of SNP Correlation in the Human IL1A Gene
• IL1A in Europeans
• 18.5 kb
• 50 SNPs
• 46 common SNPs (> 10% MAF)
• [Figure legend: Homozygote common; Heterozygote; Homozygote alternative allele; Missing Data]
Carlson et al. (2004) Am J Hum Genet. 74: 106-120.
46 Common SNPs reduce to 3 SNPs - Select one SNP per bin using LDSelect
• Threshold LD: r^2
• Bin 1: 22 sites
• Bin 2: 18 sites
• Bin 3: 5 sites
• Genotype 1 SNP from each bin
• TagSNP chosen for biological intuition or ease of assay design
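The binning idea can be sketched as a greedy procedure: repeatedly pick the site that is correlated above the r^2 threshold with the most remaining sites, and put them all in one bin. This is a simplified illustration in the spirit of LDSelect, not its actual implementation. It assumes phased 0/1 haplotypes (real genotype data would need an r^2 estimate from unphased calls), the threshold of 0.8 is an assumed value since the slide does not state one, and the toy data are invented.

```python
def r_squared(a, b):
    """r^2 between two biallelic sites coded 0/1 across the same phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        return 0.0  # a monomorphic site carries no LD information
    p_ab = sum(x * y for x, y in zip(a, b)) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

def greedy_ld_bins(sites, threshold=0.8):
    """Group sites into bins; returns a list of (tag_site, members) tuples.

    sites: dict mapping site name -> list of 0/1 alleles (one per haplotype).
    threshold: assumed r^2 cutoff (hypothetical default).
    """
    remaining = set(sites)
    bins = []
    while remaining:
        best_tag, best_members = None, set()
        for candidate in sorted(remaining):
            members = {
                s for s in remaining
                if s == candidate or r_squared(sites[candidate], sites[s]) >= threshold
            }
            if len(members) > len(best_members):
                best_tag, best_members = candidate, members
        bins.append((best_tag, sorted(best_members)))
        remaining -= best_members
    return bins

# Invented toy data: 8 haplotypes, 5 sites forming two perfectly correlated groups.
toy_sites = {
    "snp1": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp2": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp3": [1, 1, 1, 1, 0, 0, 0, 0],
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp5": [0, 1, 0, 1, 0, 1, 0, 1],
}
bins = greedy_ld_bins(toy_sites)
for tag, members in bins:
    print(f"tag {tag}: bin of {len(members)} sites {members}")
```

In this toy case five sites collapse to two tagSNPs. Within a bin, any site that exceeds the threshold with all other members could serve as the tag, which leaves room to choose on biological grounds or for ease of assay design.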
Common Variants - LD (Association) Patterns - Not the same in all genes for all populations
[Figure: LD patterns for All SNPs and for SNPs > 10% MAF, in African-American and European-American samples]
http://gvs.gs.washington.edu/GVS/ TagSNPs for any gene - Use GVS
Human Association Studies
C-Reactive Protein (CRP)
• Pentamer belonging to the pentraxin family
• Acute-phase protein produced by the liver in response to cytokine production (IL-6, IL-1, tumor necrosis factor)
• Non-specific response to inflammation, infection, tissue damage
• Well-designed candidate gene studies have provided significant insights, and these have been replicated in genome-wide association studies
CRP Analysis
• CRP is an independent risk factor for CVD
• CRP levels are heritable (~40% in FHS)
• Several reported SNPs alter CRP levels
tagSNP selection for CRP
• 6 “cosmopolitan” tagSNPs and 1 rare synonymous SNP
• “Promoter” SNPs (790, 1440)
• Intron SNP (1919)
• Synonymous SNP (2667)
• 3’ UTR SNP (3006)
• Downstream SNPs (3872, 5237)
Association between CRP SNPs and Serum CRP Levels
• CARDIA - Carlson et al. Am J Hum Genet 77: 64-77, 2005
• NHANES - Crawford et al. Circulation 114: 2458-65, 2006
• CHS - Lange et al. JAMA 296: 2703-11, 2006
• Framingham - Larson et al. Circulation 113: 1415-23, 2006
• Other - Szalai et al. J Mol Med 83: 440-7, 2005
High CRP Associated with SNPs in USF1 Binding Site
• USF1 (Upstream Stimulating Factor)
• Polymorphism at 1421 alters another USF1 binding site
• Haplotype sequences (the original ruler marks positions 1420, 1430, 1440):
H1-4 gcagctacCACGTGcacccagatggcCACTCGtt
H7-8 gcagctacCACGTGcacccagatggcCACTAGtt
H5 gcagctacCACGTGcacccagatggcCACTTGtt
H6 gcagctacCACATGcacccagatggcCACTTGtt
• SNP Alters Expression In Vitro
• Altered Gel Shift In Vitro
• Genome-wide studies lead to regional and candidate gene studies
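The four haplotype sequences on this slide can be compared column by column to locate the polymorphic positions. The sketch below does that and also checks each haplotype for the canonical E-box motif CACGTG, which USF1 is known to bind. The window start coordinate of 1410 is an assumption, chosen because it places the two variable columns at 1421 and 1440, the positions named in the deck.

```python
haplotypes = {
    "H1-4": "gcagctacCACGTGcacccagatggcCACTCGtt",
    "H7-8": "gcagctacCACGTGcacccagatggcCACTAGtt",
    "H5":   "gcagctacCACGTGcacccagatggcCACTTGtt",
    "H6":   "gcagctacCACATGcacccagatggcCACTTGtt",
}

def variable_sites(seqs):
    """Return {offset: {name: base}} for alignment columns where sequences differ."""
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for i in range(length):
        column = {name: seqs[name][i].upper() for name in names}
        if len(set(column.values())) > 1:
            sites[i] = column
    return sites

WINDOW_START = 1410  # assumed coordinate of the first base shown (not stated on the slide)
E_BOX = "CACGTG"     # canonical E-box motif

sites = variable_sites(haplotypes)
for offset, column in sites.items():
    alleles = "/".join(sorted(set(column.values())))
    print(f"position {WINDOW_START + offset}: alleles {alleles} -> {column}")

ebox_intact = {name: E_BOX in seq.upper() for name, seq in haplotypes.items()}
print("Canonical E-box present:", ebox_intact)
```

Only H6 lacks an intact CACGTG in this window, because its G-to-A change falls inside the first motif. The second variable column carries three alleles (C, A, T) across the haplotypes.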
Genome-Wide Platforms
• Illumina: TagSNPs - 100,000, 317,000, 550,000, 650,000Y SNPs
• Affymetrix: Random (Quasi-Random) SNPs - 100,000 or 500,000
• 1 Million Products are here!
Genome-wide Tour de force Nature 447: 661-678 Read all the supplemental materials too!
Applying HapMap - Will it work? YES!!
• Hits: Macular Degeneration, Obesity, Cardiac Repolarization, Inflammatory Bowel Disease, Diabetes T1 and T2, Coronary Artery Disease, Rheumatoid Arthritis, Breast Cancer, Colon Cancer ...
• There are misses as well, and it is unclear why - Phenotype, Coverage, Environmental Contexts?
• Example of a miss - Hypertension
• There are lots more hits in these data sets - sample size, low proxy coverage with other SNPs ...
• Analysis of associations between phenotype(s) and even individual sites is daunting, and this will just be the first stage; it does not even consider multi-site interactions
How robust are the new genome-wide platforms? How well do they capture common SNPs?
LD-based coverage of Sequence Variation MAF > 0.05 Bhangale et al, unpublished
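LD-based coverage is usually expressed as the fraction of common SNPs that are either on a genotyping platform or correlated strongly enough with a SNP that is. The sketch below shows that calculation on invented data. The r^2 cutoff of 0.8 and all SNP names and r^2 values are assumptions for illustration; the slide states only the MAF > 0.05 filter.

```python
def ld_coverage(common_snps, platform_snps, pairwise_r2, threshold=0.8):
    """Fraction of common SNPs typed directly or tagged at r^2 >= threshold.

    pairwise_r2 maps frozenset({snp_a, snp_b}) -> r^2; missing pairs count as 0.
    """
    platform = set(platform_snps)
    covered = 0
    for snp in common_snps:
        if snp in platform:
            covered += 1
            continue
        best = max(
            (pairwise_r2.get(frozenset((snp, tag)), 0.0) for tag in platform),
            default=0.0,
        )
        if best >= threshold:
            covered += 1
    return covered / len(common_snps)

# Invented example: five common SNPs (MAF > 0.05), two of them on the platform.
common_snps = ["s1", "s2", "s3", "s4", "s5"]
platform_snps = ["s1", "s4"]
pairwise_r2 = {
    frozenset(("s1", "s2")): 0.95,
    frozenset(("s1", "s3")): 0.40,
    frozenset(("s4", "s3")): 0.85,
    frozenset(("s4", "s5")): 0.30,
}

coverage = ld_coverage(common_snps, platform_snps, pairwise_r2)
print(f"LD-based coverage: {coverage:.0%}")
```

Here s5 is the only uncaptured SNP, so an association driven by s5 could be missed even though the platform "covers" the region.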
How can I get more information about a reference SNP (rs) identified from an association study?
http://gvs.gs.washington.edu/GVS/ Searching for Genomic Information with an RS number
Structural Variation
Structural Variation - Large Insertion-Deletion Events
• Structural Variants Identified in the HapMap
• Conrad, et al. (Nature Genetics 38:75-81, 2006)
• Hinds, et al. (Nature Genetics 38:82-85, 2006)
• McCarroll, et al. (Nature Genetics 38:86-92, 2006)
• ~1,500 indels
• Lots more of them - this was only a start
New Variation to Consider - Structural Variation
• More than 10% of the genome sequence
• Types of Structural Variants: Insertions/Deletions, Inversions, Duplications, Translocations
• Size: Large-scale (>100 kb), Intermediate-scale (500 bp-100 kb), Fine-scale (1-500 bp)
Nature 447: 161-165, 2007
A Human Genome Structural Variation Project
• Goal: Complete characterization of normal pattern of structural variation in 62 human genomes
• Genomes have dense SNP maps (HapMap)
• Select most genetically diverse individuals
• 62 additional human genome projects underway
• [Figure labels: CEPH, Yoruba, Japanese & Chinese]
Nature 447: 161-165, 2007
Sequence-Based Resolution of Structural Variation
• Human Genomic DNA -> Genomic Library (1 million clones) -> Sequence ends of genomic inserts & map to human genome (Build35)
• [Figure: end-pair orientation (> <) and spacing against Build35 for Concordant Fosmid, Insertion, Deletion, and Inversions]
• Dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage); 639,204 fosmid BEST pairs (8.8X genome coverage)
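The end-pair logic in this figure can be sketched as a small classifier: a fosmid whose two end reads map in opposing orientation and at roughly the expected distance is concordant, while unusual spacing or orientation suggests a structural variant. Everything numeric below is an assumption for illustration: a 40 kb expected insert, a 3 kb tolerance, and a 2.9 Gb genome. With those assumed values the coverage figures on the slide are reproduced, but the project's actual parameters are not given here.

```python
from dataclasses import dataclass

@dataclass
class EndPair:
    left_pos: int       # reference coordinate of the left end read
    right_pos: int      # reference coordinate of the right end read
    left_strand: str    # '+' or '-'
    right_strand: str   # '+' or '-'

EXPECTED_INSERT = 40_000   # assumed fosmid insert size in bp (hypothetical)
TOLERANCE = 3_000          # assumed allowed deviation in bp (hypothetical)

def classify(pair):
    """Classify one end pair relative to the reference assembly."""
    if pair.left_strand == pair.right_strand:
        return "inversion"       # ends point the same way (< < in the figure)
    span = pair.right_pos - pair.left_pos
    if span > EXPECTED_INSERT + TOLERANCE:
        return "deletion"        # reference holds sequence the clone lacks
    if span < EXPECTED_INSERT - TOLERANCE:
        return "insertion"       # clone holds sequence the reference lacks
    return "concordant"

def fold_coverage(n_pairs, insert_bp=EXPECTED_INSERT, genome_bp=2.9e9):
    """Physical (clone) coverage of the genome; genome_bp is an assumed size."""
    return n_pairs * insert_bp / genome_bp

examples = [
    EndPair(1_000, 41_000, "+", "-"),
    EndPair(1_000, 61_000, "+", "-"),
    EndPair(1_000, 21_000, "+", "-"),
    EndPair(1_000, 41_000, "-", "-"),
]
for pair in examples:
    print(pair, "->", classify(pair))

print(f"All pairs:  {fold_coverage(1_122_408):.1f}X")
print(f"BEST pairs: {fold_coverage(639_204):.1f}X")
```

A single discordant pair is weak evidence on its own; approaches of this kind generally require several independent clones supporting the same event before calling a variant.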