Methods in genome wide association studies. Norú Moreno



  1. Methods in genome wide association studies. Norú Moreno. CS374: Algorithms in Biology. Professor: Serafim Batzoglou

  2. Agenda • GWA • Polymorphisms • Hap Map Project • Genotyping chip • Integrating CNVs and SNPs • Imputation • Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays

  3. Genome-wide Association Study (GWA study or GWAS) • Completion of the Human Genome Project in 2003 • Examination of genetic variation across a given genome • Objective: identify genetic associations with observable traits

  4. GWAS • Scan SNPs across many individuals to associate alleles with a particular disease • Use a detected association to detect, treat and prevent the disease • Pharmacogenomics

  5. Polymorphisms • A specific sequence variation that some individuals possess • Some variations are common, others are rare • Examples: • Blood types • Height • Skin Color • Etc…

  6. Types of polymorphisms 1. Copy Number Variation (CNV) • Segments of DNA that are found in different numbers of copies among individuals • Substantial regions, not single nucleotides • Figure: A B C; A C; A B B B C (segment B present in one, zero and three copies)

  7. Types of polymorphisms • Single Nucleotide Polymorphism (SNP) (Murray 2007)

  8. HapMap • Two unrelated people share about 99.5% of their DNA sequence. • HapMap focuses only on common SNPs: those found in at least 1% of the population • 269 individuals, ~4M SNPs • Genotyped the individuals for these SNPs, and published the results

  9. Genotyping chip • Sequence: ACTGGGCTAATCGATCGACTAGCTAGCTAGTCTCGATCAAT • Probes: ACTGGGCTAA, TCGATCGACTA, GCTAGCTAGT, CTCGATCAAT

  10. Genotyping chip • (Liu 2007) • (Affymetrix)

  11. Genotyping chip • (Affymetrix)

  12. Genotyping chip • Figure: B intensity plotted against A intensity, with three genotype clusters: BB (0), AB (0.5), AA (1)
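
The cluster values on this slide (BB at 0, AB at 0.5, AA at 1) can be read as the share of signal coming from the A allele. A minimal sketch of that idea, assuming a simple A / (A + B) ratio and nearest-value calling; the function names are hypothetical, and real genotype callers fit clusters per SNP rather than using fixed targets:

```python
def allele_ratio(a_intensity: float, b_intensity: float) -> float:
    """Fraction of the total probe signal that comes from the A allele."""
    total = a_intensity + b_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return a_intensity / total


def call_genotype(ratio: float) -> str:
    """Pick the genotype whose expected ratio is nearest to the observed one."""
    # Expected ratios taken from the slide: BB (0), AB (0.5), AA (1).
    expected = {"BB": 0.0, "AB": 0.5, "AA": 1.0}
    return min(expected, key=lambda genotype: abs(expected[genotype] - ratio))


if __name__ == "__main__":
    # Hypothetical intensity pairs (A, B) for three samples at one SNP.
    for a, b in [(900.0, 100.0), (480.0, 520.0), (50.0, 950.0)]:
        print(a, b, call_genotype(allele_ratio(a, b)))
```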

  13. Genotyping chip • Affymetrix 100k chip set • Entire genome with 100 000 SNPs (low density). • Affymetrix 500k chip (SNP array 5.0) • Entire genome with 500 000 SNPs (high density) • Affymetrix 1M chip (SNP array 6.0) • Entire genome with 1 000 000 SNPs (very high density)

  14. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs (Birdsuite) Korn, et al.

  15. Birdsuite • Takes both CNVs and SNPs into account • Input: raw data from the genotyping chip • Output: integrated CNV and SNP genotype per locus • CNVs and SNPs coexist. • Covers both common and rare variants, to understand the role of genetic variation in disease.

  16. Birdsuite • SNPs (AA, AB, BB) combined with CNPs • New genotypes: A-null, AAAB, BBBB

  17. Birdsuite - 4 Stages • Canary - ‘Genotypes’ common copy-number polymorphisms (CNPs) • Birdseed - Genotypes SNPs using the classical AA, AB, and BB genotypes. • Birdseye - Identifies rare CNVs via HMMs • Fawkes - Integrates CNV information to produce mutually consistent SNP genotypes (i.e. including genotypes such as A-null and AAB)

  18. Birdsuite - Canary • Determines the copy number of each individual at each predefined CNP locus. • CNP = copy number polymorphism: a CNV with >1% frequency in the population • Figure: A B B B C (segment B present in three copies)

  19. Canary (Korn, p.1255)

  20. Birdsuite - Birdseed • We expect only AA, AB or BB. • From Canary: only loci with copy number 2 • No fewer or extra copies. • Figure: BB, AB and AA clusters (Korn, p. 1257)

  21. Birdsuite - Birdseye • Using Canary and Birdseed: • Identify rare and de novo CNVs • Small number of real CNVs at unknown sites. • Search for consistent evidence of copy number variation across multiple neighboring probes. • Implement an HMM-based algorithm to find strong, consistent evidence for altered copy number states

  22. Birdsuite - Birdseye • HMM to find regions of variable copy number in a sample. • Hidden state: The true copy number of the individual’s genome. • Observed states: The normalized intensity measurements of each probe on the array.
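
The HMM framing above (hidden state = true copy number, observation = normalized probe intensity) can be illustrated with a small Viterbi decoder. This is a sketch under stated assumptions, not Birdseye's implementation: the Gaussian emission model, the `sigma` and `stay_prob` values, and the per-state mean intensities are all hypothetical choices made for illustration.

```python
import math


def viterbi_copy_number(intensities, state_means, sigma=0.15, stay_prob=0.99):
    """Most likely copy-number path for a run of neighboring probes.

    intensities: normalized probe intensities, in genomic order.
    state_means: assumed mean intensity per copy number, e.g. {1: 0.5, 2: 1.0}.
    sigma, stay_prob: illustrative values, not Birdseye's parameters.
    """
    if not intensities or len(state_means) < 2:
        raise ValueError("need at least one probe and two copy-number states")
    states = sorted(state_means)
    switch_prob = (1.0 - stay_prob) / (len(states) - 1)

    def log_emission(x, state):
        # Gaussian log-density of intensity x around the state's mean.
        z = (x - state_means[state]) / sigma
        return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

    def log_transition(prev, cur):
        return math.log(stay_prob if prev == cur else switch_prob)

    # Initialization: uniform prior over copy-number states.
    scores = {s: -math.log(len(states)) + log_emission(intensities[0], s)
              for s in states}
    backpointers = []

    # Recursion: best predecessor for every state at every probe.
    for x in intensities[1:]:
        new_scores, pointers = {}, {}
        for cur in states:
            best_prev = max(states,
                            key=lambda p: scores[p] + log_transition(p, cur))
            pointers[cur] = best_prev
            new_scores[cur] = (scores[best_prev]
                               + log_transition(best_prev, cur)
                               + log_emission(x, cur))
        scores = new_scores
        backpointers.append(pointers)

    # Traceback from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    means = {1: 0.5, 2: 1.0, 3: 1.5}  # hypothetical intensity per copy number
    probes = [1.0, 1.02, 0.98, 0.5, 0.52, 0.47, 0.51, 1.0, 0.99, 1.01]
    print(viterbi_copy_number(probes, means))
```

With these settings a single low probe is treated as noise, while four consecutive low probes are decoded as a one-copy segment, which matches the slide's emphasis on consistent evidence across neighboring probes.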

  23. Birdsuite - Fawkes • Merge all the results. • Show the CNVs within each SNP. • Utilize the imputed locations (in A/B intensity space) of copy-variable clusters. • Assign an allele-specific copy number genotype at each SNP. • (e.g. AAB, ABBB, A or B)
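
To make the allele-specific copy number genotypes concrete, the sketch below enumerates the genotypes that are possible for a given total copy number at a biallelic SNP. It only lists the genotype labels; how Fawkes actually assigns them from A/B intensity clusters is not shown, and the function name is hypothetical.

```python
def allele_specific_genotypes(copy_number: int) -> list[str]:
    """All A/B allele combinations for a given total copy number."""
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    if copy_number == 0:
        return ["null"]  # homozygous deletion: no allele present
    # From all-A down to all-B, e.g. AAA, AAB, ABB, BBB for three copies.
    return ["A" * n_a + "B" * (copy_number - n_a)
            for n_a in range(copy_number, -1, -1)]


if __name__ == "__main__":
    for cn in range(5):
        print(cn, allele_specific_genotypes(cn))
```

Copy number 2 gives the classical AA, AB and BB, while copy numbers 1, 3 and 4 give genotypes such as A, AAB and ABBB from the slide.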

  24. Fawkes (Korn, p. 1254,1257)

  25. (Affymetrix website screenshot)

  26. Imputation • Dealing with missing data points by filling in values. • In SNPs: T A G G T ? T G C C T A G C G T • Why? • Cost-saving • Avoid re-genotyping • Keep effective sample size • SNP comparisons between existing platforms.

  27. Imputation • High rate of occurrence. • ‘Direct’ imputation. • Before: T A G G T ? T G C C T A G C G T • After: T A G G T A T G C C T A G C G T

  28. LD Imputation • Linkage disequilibrium • Non-random association of alleles at two or more loci. • Figure label: SNP of interest
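
One reading of ‘direct’ imputation by rate of occurrence is to fill a missing call with the allele seen most often at that position in other samples. The sketch below assumes that reading; the reference panel is made up for illustration and the function name is hypothetical.

```python
from collections import Counter


def impute_most_frequent(sequence: str, reference_panel: list[str],
                         missing: str = "?") -> str:
    """Fill each missing position with the most common reference allele."""
    if any(len(ref) != len(sequence) for ref in reference_panel):
        raise ValueError("reference sequences must match the query length")
    imputed = []
    for pos, allele in enumerate(sequence):
        if allele != missing:
            imputed.append(allele)
            continue
        observed = [ref[pos] for ref in reference_panel if ref[pos] != missing]
        # Leave the gap in place when no reference sample has a call here.
        imputed.append(Counter(observed).most_common(1)[0][0]
                       if observed else missing)
    return "".join(imputed)


if __name__ == "__main__":
    query = "TAGGT?TGCCTAGCGT"  # the sequence from the slide, spaces removed
    panel = [                   # hypothetical reference samples
        "TAGGTATGCCTAGCGT",
        "TAGGTATGCCTAGCGT",
        "TAGGTCTGCCTAGCGT",
    ]
    print(impute_most_frequent(query, panel))
```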
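
Non-random association between two loci is commonly quantified with D = p(AB) - p(A)p(B) and r² = D² / (p(A)(1 - p(A))p(B)(1 - p(B))). The sketch below computes both from phased two-locus haplotypes; the haplotype data are invented for illustration and the function name is hypothetical. A high r² is what makes a typed SNP informative about an untyped SNP of interest.

```python
def linkage_disequilibrium(haplotypes, allele_1, allele_2):
    """Return (D, r_squared) for allele_1 at locus 1 and allele_2 at locus 2.

    haplotypes: list of (locus_1_allele, locus_2_allele) pairs, one per
    phased chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p1 = sum(h[0] == allele_1 for h in haplotypes) / n
    p2 = sum(h[1] == allele_2 for h in haplotypes) / n
    p12 = sum(h[0] == allele_1 and h[1] == allele_2 for h in haplotypes) / n
    d = p12 - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    # r^2 is undefined when a locus is monomorphic; this sketch reports 0.0.
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


if __name__ == "__main__":
    linked = [("A", "G")] * 5 + [("T", "C")] * 5          # alleles travel together
    unlinked = [("A", "G"), ("A", "C"), ("T", "G"), ("T", "C")]
    print(linkage_disequilibrium(linked, "A", "G"))
    print(linkage_disequilibrium(unlinked, "A", "G"))
```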

  29. Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays Homer, et al.

  30. The DNA Detective • Is an individual genome present in a DNA mixture? Mixed DNA // Population Query

  31. DNA Detective • Current situation: • Different laboratories reach different conclusions. • Usually not accurate at all. • Hard, and cannot be automated.

  32. DNA Detective - Methodology • Summary: • Cumulative sum of allele shifts over all available SNPs. • The sign of the shift indicates whether the individual of interest is closer to a reference sample or to a given mixture. • First genotype a single SNP for a single person, then adapt it to all mixtures and pooled data.

  33. DNA Detective - Single SNP, Single person • Raw preprocessed data gives the allele intensity (how much of A and how much of B we have). • Transform normalized data into a ratio. • Yi is the estimate of allele frequency: BB ~0, AB ~0.5, AA ~1

  34. DNA Detective - Methodology • Use relative probe intensity data. • Compare allele frequency estimates from the mixture (M) with those of the individual and of a reference population (Pop). • Assume the reference population (Pop) and the mixture have similar, interchangeable ancestral components.

  35. DNA Detective - Methodology • Distance measure for individual Yi

  36. DNA Detective - Methodology • Null hypothesis: the individual is not in the mixture, D(Yi,j) ~ 0 • Alternative hypothesis: D(Yi,j) > 0 • Yi,j is more similar to M than to Pop • D(Yi,j) < 0 • Yi,j is ancestrally more similar to Pop than to M.
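
The distance equation itself did not survive in this transcript. In Homer et al. it is, as best recalled here, D(Yi,j) = |Yi,j - Popj| - |Yi,j - Mj|, which agrees with the sign convention above (positive when the individual is closer to M than to Pop), and the per-SNP distances are combined in a one-sample t-statistic against a mean of 0. The sketch below follows that form; the allele frequency values are invented for illustration and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def allele_shifts(individual, mixture, population):
    """Per-SNP distance D = |Y - Pop| - |Y - M| for one individual."""
    return [abs(y - pop) - abs(y - mix)
            for y, mix, pop in zip(individual, mixture, population)]


def t_statistic(distances, mu0=0.0):
    """One-sample t-statistic of the distances against the null mean mu0."""
    s = len(distances)  # number of SNPs; must be at least 2 for stdev
    return (mean(distances) - mu0) / (stdev(distances) / math.sqrt(s))


if __name__ == "__main__":
    # Hypothetical allele frequency estimates at six SNPs.
    individual = [1.0, 0.0, 0.5, 1.0, 0.0, 0.5]   # Y: ~0, ~0.5 or ~1 per SNP
    population = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]   # Pop
    mixture = [0.6, 0.4, 0.5, 0.7, 0.45, 0.5]     # M, shifted toward Y
    d = allele_shifts(individual, mixture, population)
    print(d, t_statistic(d))
```

A clearly positive statistic points toward the individual being in the mixture; in practice the sum runs over hundreds of thousands of SNPs rather than six.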

  37. (Homer, p.4)

  38. DNA Detective - Results • Accurate findings. • Can determine whether a trace amount (<1%) of an individual's DNA is present in a DNA mixture. • Tested with different kinds of mixtures from publicly available data.

  39. DNA Detective - Implications • Forensics application. • Traceability • Leakage of private information. • Public data from many studies: summary statistics of allele frequency. • Political implications. • How to share the data now?

  40. Thank You!

  41. References • Korn J, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature genetics. 2008 Oct;40(10): 1253-60 • Homer N, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008 Aug 29;4(8):e1000167 • Liu Y, DPhil, Prchal F. SNP-Chip-Based Genome-Wide Analysis of Genetic Alterations in Hematologic Disorders: The Way Forward?. The Hematologist. 2007 • Murray, E. IST 341 Issues in Human Genetics. http://www.science.marshall.edu/murraye/341/snps/Human%20Genetics%20MTHFR%20SNP%20Page.html
