Methods in genome wide association studies. Norú Moreno. CS374:: Algorithms in Biology Professor: Serafim Batzoglou. Agenda. GWA Polymorphisms Hap Map Project Genotyping chip Integrating CNVs and SNPs Imputation
Methods in genome wide association studies. Norú Moreno. CS374: Algorithms in Biology. Professor: Serafim Batzoglou
Agenda • GWA • Polymorphisms • HapMap Project • Genotyping chip • Integrating CNVs and SNPs • Imputation • Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
Genome-wide Association Study (GWA study or GWAS) • Completion of the Human Genome Project in 2003 • Examination of genetic variation across a given genome • Objective: identify genetic associations with observable traits
GWAS • Scan SNPs across many individuals to associate alleles with a particular disease • Use a detected association to detect, treat and prevent the disease • Pharmacogenomics
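The slides do not say which statistical test links an allele to a disease. As a minimal sketch, assuming a simple case/control design, one common choice is a Pearson chi-square test on a 2x2 table of allele counts. The function name and the counts below are hypothetical.

```python
def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square statistic (1 degree of freedom) for allele counts.

    Rows are cases/controls, columns are alleles A/B.
    Assumes every row and column total is non-zero.
    """
    table = [[case_a, case_b], [control_a, control_b]]
    total = case_a + case_b + control_a + control_b
    row_sums = [sum(row) for row in table]
    col_sums = [case_a + control_a, case_b + control_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and disease status were independent
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    # Hypothetical counts: allele A is enriched in cases
    print(allelic_chi_square(60, 40, 40, 60))
```

In a genome-wide scan a statistic like this would be computed for every SNP, so a multiple-testing correction is needed before calling any single association significant.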
Polymorphisms • A specific sequence variation that some individuals possess • Some variations are common, others are rare • Examples: • Blood types • Height • Skin Color • Etc…
Types of polymorphisms 1. Copy Number Variation (CNV) • Segments of DNA that are found in different numbers of copies among individuals • Substantial regions, not single nucleotides [Figure: example segment arrangements A B C, A C, A B B B C]
Types of polymorphisms 2. Single Nucleotide Polymorphism (SNP) (Murray 2007)
HapMap • Two unrelated people share about 99.5% of their DNA sequence. • HapMap focuses only on common SNPs: those found in at least 1% of the population. • 269 individuals, ~4M SNPs. • The project genotyped the individuals for these SNPs and published the results.
Genotyping chip [Figure: target sequence ACTGGGCTAATCGATCGACTAGCTAGCTAGTCTCGATCAAT with probes ACTGGGCTAA, TCGATCGACTA, GCTAGCTAGT, CTCGATCAAT]
Genotyping chip • (Liu 2007) • (Affymetrix)
Genotyping chip • (Affymetrix)
Genotyping chip [Figure: plot of allele A intensity against allele B intensity, with genotype clusters BB (0), AB (0.5), AA (1)]
Genotyping chip • Affymetrix 100k chip set • Entire genome with 100 000 SNPs (low density). • Affymetrix 500k chip (SNP array 5.0) • Entire genome with 500 000 SNPs (high density) • Affymetrix 1M chip (SNP array 6.0) • Entire genome with 1 000 000 SNPs (very high density)
Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs (Birdsuite) Korn, et al.
Birdsuite • Takes both CNVs and SNPs into account. • Input: raw data from the genotyping chip. • Output: integrated CNV and SNP genotype per locus. • CNVs and SNPs coexist. • Both common and rare variants are needed to understand the role of genetic variation in disease.
Birdsuite [Figure: SNPs (AA, AB, BB) combined with CNPs give new genotypes such as A-null, AAAB, BBBB]
Birdsuite – 4 Stages • Canary – ‘Genotypes’ common copy-number polymorphisms (CNPs) • Birdseed - Genotypes SNPs using the classical AA, AB, and BB genotypes. • Birdseye - Identify rare CNVs via HMMs • Fawkes - Integrates CNV information to produce mutually consistent SNP genotypes (i.e. including genotypes such as A-null and AAB)
Birdsuite - Canary • Determines the copy number of each individual at each predefined CNP locus. • CNP = copy number polymorphism: a CNV with >1% frequency in the population. [Figure: segment arrangement A B B B C]
Canary (Korn, p.1255)
Birdsuite - Birdseed • We expect only AA, AB or BB. • From Canary, use only loci with copy number 2. • No fewer or extra copies. [Figure: genotype clusters BB, AB, AA (Korn, p.1257)]
Birdsuite - Birdseye • Using the results of Canary and Birdseed: • Identify rare and de novo CNVs. • Small number of real CNVs at unknown sites. • Search for consistent evidence of copy number variation across multiple neighboring probes. • Implement an HMM-based algorithm to find strong, consistent evidence for altered copy number states.
Birdsuite - Birdseye • HMM to find regions of variable copy number in a sample. • Hidden state: The true copy number of the individual’s genome. • Observed states: The normalized intensity measurements of each probe on the array.
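The HMM described above can be sketched with a small Viterbi decoder. This is not Birdseye itself: the three copy-number states, the Gaussian emission model (mean = copy number / 2, so two copies give intensity 1.0), and the `stay_prob` and `sigma` values are all assumptions chosen for illustration.

```python
import math


def viterbi_copy_number(intensities, states=(1, 2, 3), stay_prob=0.98, sigma=0.2):
    """Most likely copy-number path for a non-empty list of probe intensities.

    Hidden state: true copy number. Observation: normalized probe intensity.
    Assumed emission: Normal(mean=state / 2, sd=sigma).
    """
    n_states = len(states)
    switch_prob = (1.0 - stay_prob) / (n_states - 1)

    def log_emit(x, state):
        mean = state / 2.0
        return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # Initialization with a uniform prior over states
    score = [math.log(1.0 / n_states) + log_emit(intensities[0], s) for s in states]
    back = []

    # Recursion: best predecessor for each state at each probe
    for x in intensities[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(x, s))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Backtracking from the best final state
    path = [max(range(n_states), key=lambda i: score[i])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [states[i] for i in path]


if __name__ == "__main__":
    probes = [1.0] * 5 + [0.5] * 5 + [1.0] * 5  # hypothetical one-copy deletion
    print(viterbi_copy_number(probes))
```

With a high `stay_prob`, a single low probe is treated as noise, while a run of low neighboring probes is called as a deletion. That matches the idea of requiring consistent evidence across multiple neighboring probes.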
Birdsuite - Fawkes • Merge all the results. • Show the CNVs within each SNP. • Utilize the imputed locations (in A/B intensity space) of copy-variable clusters. • Assign an allele-specific copy number genotype at each SNP. • (e.g. AAB, ABBB, A or B)
Fawkes (Korn, p. 1254,1257)
Imputation • Dealing with missing data points by filling in values. • In SNPs: T A G G T ? T G C C T A G C G T • Why? • Cost saving • Avoid re-genotyping • Keep effective sample size • SNP comparisons between existing platforms.
Imputation • High rate of occurrence. • ‘Direct’ imputation. • T A G G T ? T G C C T A G C G T becomes T A G G T A T G C C T A G C G T
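The slide does not define 'direct' imputation precisely. One simple reading, taken here as an assumption, is to fill each missing position with the allele that occurs most often at that position in a reference panel. The panel sequences below are hypothetical.

```python
from collections import Counter


def impute_most_common(sample, reference_panel, missing="?"):
    """Fill missing alleles with the most frequent reference allele at that position.

    Assumes all sequences are aligned and have the same length.
    """
    imputed = []
    for pos, allele in enumerate(sample):
        if allele == missing:
            observed = [ref[pos] for ref in reference_panel if ref[pos] != missing]
            # Leave the gap in place if the panel has no information either
            allele = Counter(observed).most_common(1)[0][0] if observed else missing
        imputed.append(allele)
    return "".join(imputed)


if __name__ == "__main__":
    panel = [
        "TAGGTATGCCTAGCGT",
        "TAGGTATGCCTAGCGT",
        "TAGGTCTGCCTAGCGT",
    ]
    print(impute_most_common("TAGGT?TGCCTAGCGT", panel))
```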
LD Imputation • Linkage disequilibrium • Non-random association of alleles at two or more loci. SNP of interest
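Linkage disequilibrium between two loci is commonly summarized by the coefficient D and the squared correlation r². The sketch below computes both from phased haplotypes. The allele coding ('A'/'a' at the first locus, 'B'/'b' at the second) and the example data are assumptions for illustration.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    haplotypes: sequence of (allele_at_locus_1, allele_at_locus_2) pairs,
    coded 'A'/'a' and 'B'/'b'.
    """
    n = len(haplotypes)
    p_a = sum(1 for h in haplotypes if h[0] == "A") / n
    p_b = sum(1 for h in haplotypes if h[1] == "B") / n
    p_ab = sum(1 for h in haplotypes if h[0] == "A" and h[1] == "B") / n

    # D measures how far the AB haplotype frequency is from independence
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


if __name__ == "__main__":
    # Hypothetical haplotypes in perfect LD: 'A' always travels with 'B'
    haps = [("A", "B")] * 5 + [("a", "b")] * 5
    print(linkage_disequilibrium(haps))
```

When r² is close to 1, the genotype at a typed SNP predicts the genotype at a nearby untyped SNP of interest, which is what LD-based imputation relies on.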
Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays Homer, et al.
The DNA Detective • Is an individual genome present in a DNA mixture? [Figure: query individual compared against the mixed DNA and a reference population]
DNA Detective • Existing approaches: • Different laboratories reach different conclusions. • Usually not accurate at all. • Hard and cannot be automated.
DNA Detective - Methodology • Summary: • Cumulative sum of allele shifts over all available SNPs. • The sign of the shift indicates whether the individual of interest is closer to the reference sample or to the given mixture. • First genotype a single SNP for a single person, then adapt it to all mixtures and pooled data.
DNA Detective – Single SNP, Single person • Raw preprocessed data gives the allele intensity (how much of A and how much of B we have). • Transform normalized data into a ratio. • Yi is the estimate of allele frequency: BB ~0, AB ~0.5, AA ~1
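The ratio transform can be sketched as follows, assuming the estimate is the A intensity divided by the total intensity. The scale factor `k` on the B channel is a hypothetical per-SNP correction that defaults to 1, and `call_genotype` is an illustrative helper rather than part of the method.

```python
def allele_frequency_estimate(intensity_a, intensity_b, k=1.0):
    """Ratio Y in [0, 1]: the share of signal coming from allele A.

    k is an assumed correction factor for the B channel (1.0 means none).
    """
    return intensity_a / (intensity_a + k * intensity_b)


def call_genotype(y):
    """Nearest expected cluster: BB ~0, AB ~0.5, AA ~1."""
    centers = {"BB": 0.0, "AB": 0.5, "AA": 1.0}
    return min(centers, key=lambda g: abs(y - centers[g]))


if __name__ == "__main__":
    y = allele_frequency_estimate(900.0, 100.0)  # hypothetical intensities
    print(y, call_genotype(y))
```

For a mixture or pooled sample the same ratio is read as an allele frequency rather than a genotype, so no nearest-cluster call is applied there.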
DNA Detective - Methodology • Use relative probe intensity data. • Compare the individual's allele frequency estimates (Yi) with those from the mixture (M) and from the reference population (Pop). • Assume the reference population (Pop) has ancestral components similar to those of the mixture.
DNA Detective - Methodology • Distance measure for individual Yi at SNP j (Homer, et al.): D(Yi,j) = |Yi,j - Popj| - |Yi,j - Mj|
DNA Detective - Methodology • Null hypothesis: the individual is not in the mixture, D(Yi,j) ~ 0. • Alternative hypothesis: D(Yi,j) > 0, Yi,j is more similar to M than to Pop. • D(Yi,j) < 0: Yi,j is more ancestrally similar to Pop than to M.
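A minimal sketch of the test, assuming the per-SNP distance D(Yi,j) = |Yi,j - Popj| - |Yi,j - Mj| from Homer, et al. and a one-sample t-style statistic that compares the mean of D over all SNPs with zero. The function names and the toy allele frequencies are hypothetical.

```python
import math


def distances(y, mixture, population):
    """Per-SNP distance: positive when y is closer to the mixture than to Pop."""
    return [abs(yj - pj) - abs(yj - mj) for yj, mj, pj in zip(y, mixture, population)]


def t_statistic(d):
    """Mean of d divided by its standard error (null hypothesis: mean is 0).

    Needs at least two SNPs and non-zero variance in d.
    """
    s = len(d)
    mean = sum(d) / s
    variance = sum((x - mean) ** 2 for x in d) / (s - 1)
    return mean / math.sqrt(variance / s)


if __name__ == "__main__":
    y = [1.0, 0.5, 0.0, 1.0]    # individual's allele frequency estimates
    m = [0.9, 0.5, 0.1, 0.8]    # mixture, shifted towards the individual
    pop = [0.5, 0.5, 0.5, 0.5]  # reference population
    print(t_statistic(distances(y, m, pop)))
```

A clearly positive statistic points to the individual being in the mixture, and a value near zero is consistent with the null hypothesis. Real data would use hundreds of thousands of SNPs rather than four.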
DNA Detective - Results • Accurate findings. • Determined whether a trace amount (<1%) of an individual's DNA is present in a DNA mixture. • Tested with different kinds of mixtures built from publicly available data.
DNA Detective - Implications • Forensic applications. • Traceability. • Leakage of private information. • Public data from many studies: summary statistics of allele frequency. • Political implications. • How to share the data now?
References • Korn J, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics. 2008 Oct;40(10):1253-60. • Homer N, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008 Aug 29;4(8):e1000167. • Liu Y, DPhil, Prchal F. SNP-Chip-Based Genome-Wide Analysis of Genetic Alterations in Hematologic Disorders: The Way Forward? The Hematologist. 2007. • Murray, E. IST 341 Issues in Human Genetics. http://www.science.marshall.edu/murraye/341/snps/Human%20Genetics%20MTHFR%20SNP%20Page.html