Factors to Consider in Selecting a Genotyping Platform

Elizabeth Pugh, June 22, 2007



Presentation Transcript


  1. Factors to Consider in Selecting a Genotyping Platform Elizabeth Pugh June 22, 2007

  2. GWA Studies • Genotype 300,000 to 1,000,000 SNPs • 3 platforms, multiple products • Affymetrix • Illumina • Perlegen • How to choose?

  3. What I can cover • Basics of calling genotypes • Examples of good and bad data • Some things to consider

  4. Basics of how it works • Skipping chemistry… • Generate intensity data for 2 alleles • Assign genotypes based on clustering • These are ‘phenotypes’ – there is measurement error • No manual review of data – too many SNPs
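
The step from two-allele intensities to a genotype can be illustrated with a toy caller. This is only a sketch under strong assumptions: the function name, the fixed cluster centres, and the `min_intensity` and `margin` values are all hypothetical, and real calling software fits clusters across many samples per SNP rather than using fixed cut-offs.

```python
import math

def call_genotype(a, b, min_intensity=0.2, margin=0.15):
    """Toy caller: assign AA/AB/BB from the angle of an (a, b) intensity pair.

    min_intensity and margin are hypothetical values chosen for illustration.
    """
    if a + b < min_intensity:
        return None  # too little signal to make a call
    # Normalised angle: 0.0 means pure allele A signal, 1.0 means pure allele B.
    theta = math.atan2(b, a) / (math.pi / 2)
    # Fixed, assumed cluster centres; real algorithms estimate these per SNP.
    for label, centre in (("AA", 0.0), ("AB", 0.5), ("BB", 1.0)):
        if abs(theta - centre) <= margin:
            return label
    return None  # point falls between clusters: no call

print(call_genotype(1.0, 0.05))   # strong A signal only
print(call_genotype(0.5, 0.5))    # balanced signal
print(call_genotype(1.0, 0.4))    # between clusters
```

A point that lands between clusters or has almost no signal is left uncalled, which is where measurement error shows up as missing data rather than as a wrong genotype.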

  5. A good SNP

  6. Same SNP different view

  7. Same SNP different view

  8. Another Good SNP

  9. And Another Good SNP

  10. Data Quality • Most of the data is good for all platforms • Some samples, SNPs and genotypes fail • Have to find them without manual review

  11. Ways to find bad data • Use summary statistics across SNPs, samples • Include investigator and control replicates • Include control trios and, where possible, investigator trios • If you use HapMap controls, you can compare your calls to the HapMap genotypes, but with caution – there are some errors in the HapMap data
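
The replicate and trio checks listed above could be sketched as follows. This is a minimal illustration, not a production check: the function names are hypothetical, genotypes are assumed to be two-character strings such as "AB" with `None` for a no-call, and the Mendelian test only applies to autosomal SNPs.

```python
from itertools import product

def is_mendelian_consistent(father, mother, child):
    """True if the child genotype can arise from one allele of each parent.

    Assumes autosomal SNPs and two-character genotypes such as 'AA' or 'AB'.
    """
    possible = {"".join(sorted(f + m)) for f, m in product(father, mother)}
    return "".join(sorted(child)) in possible

def replicate_discordance(calls_1, calls_2):
    """Fraction of SNPs called in both replicates where the calls differ."""
    pairs = [(a, b) for a, b in zip(calls_1, calls_2)
             if a is not None and b is not None]
    if not pairs:
        return None  # nothing to compare
    return sum(a != b for a, b in pairs) / len(pairs)

print(is_mendelian_consistent("AA", "BB", "AB"))  # child is consistent
print(is_mendelian_consistent("AA", "BB", "AA"))  # child is inconsistent
print(replicate_discordance(["AA", "AB", None], ["AA", "BB", "AB"]))
```

Note that SNPs missing in either replicate are dropped from the denominator here, which is one of several reasonable conventions.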

  12. Finding Bad SNPs • Use QC checks • Call rate • Mendelian inheritance • Replicates • HWE (Hardy-Weinberg equilibrium) • Quality score, clustering • Note that some bad SNPs will pass any QC filter • Some good SNPs may fail QC
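
Two of the per-SNP checks, call rate and HWE, might be computed as in the sketch below. The function names are hypothetical, no-calls are assumed to be `None`, and the HWE statistic is the simple chi-square goodness-of-fit form; exact tests are often preferred in practice, and the threshold to apply is a study-level choice.

```python
def snp_call_rate(genotypes):
    """Fraction of samples with a genotype call for one SNP (None = no call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for departure from HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNPs).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

print(snp_call_rate(["AA", None, "AB", "BB"]))
print(hwe_chi_square(25, 50, 25))   # genotype counts match HWE exactly
print(hwe_chi_square(50, 0, 50))    # no heterozygotes at all
```

A SNP with no heterozygotes despite both alleles being common is the kind of pattern a clustering failure tends to produce, and it gives a very large statistic.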

  13. Bad SNP caught by qc filter

  14. Bad SNP caught by qc filter

  15. Bad SNP caught by qc filter

  16. Bad SNP caught by qc filter

  17. Bad SNP caught by qc filter

  18. Bad SNP caught by qc filter

  19. Bad SNP caught by qc filter

  20. Bad SNP caught by qc filter

  21. Bad SNP caught by qc filter

  22. Yikes! Some of those are awful! • Yes • We can find many, hopefully most of them but… • Use the intensity data to plot your most significant SNPs • Look at them before you publish

  23. Use a lab that will give you intensity data • If you have intensity data you can • Plot the intensities to check clustering • Cluster with a different algorithm • Recluster as algorithms get better • Recluster subsets or supersets of the data • Create your own metrics (e.g. number of samples with no or very low intensity)
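
As an example of a home-made metric, the count of samples with no or very low intensity at a SNP could be computed along these lines. The function name, the `(a, b)` pair layout, and the threshold of 0.1 are all assumptions made for illustration; intensity scales differ between platforms.

```python
def count_low_intensity(intensities, threshold=0.1):
    """Number of samples whose total (a + b) intensity is below threshold.

    intensities: one (a, b) pair per sample for a single SNP.
    threshold: hypothetical cut-off; pick one that suits the platform's scale.
    """
    return sum(1 for a, b in intensities if a + b < threshold)

snp_intensities = [(0.9, 0.05), (0.02, 0.03), (0.5, 0.45), (0.0, 0.0)]
print(count_low_intensity(snp_intensities))
```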

  24. Finding Bad Samples • Look at sample-level metrics, starting with call rate • Bad samples still produce calls – even water will have some genotypes • May want to remove possibly bad samples before clustering the data, then make final sample decisions
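
A first pass over sample call rates, before any reclustering, might look like the sketch below. The function names and the 0.95 cut-off are hypothetical; the cut-off in particular is only a placeholder for whatever preliminary threshold the lab and study agree on.

```python
def sample_call_rates(genotypes_by_sample):
    """Map each sample ID to its fraction of called genotypes (None = no call)."""
    return {
        sample: sum(g is not None for g in calls) / len(calls)
        for sample, calls in genotypes_by_sample.items()
    }

def flag_samples(call_rates, min_call_rate=0.95):
    """Sample IDs below the (hypothetical) preliminary call-rate threshold."""
    return sorted(s for s, rate in call_rates.items() if rate < min_call_rate)

data = {
    "s1": ["AA", "AB", "BB", "AA"],
    "water": ["AA", None, None, None],  # a blank can still yield a few calls
}
rates = sample_call_rates(data)
print(rates)
print(flag_samples(rates))
```

Flagged samples would be set aside for the initial clustering and revisited afterwards, rather than dropped permanently at this stage.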

  25. Sample plot: all SNPs for one sample; sample call rate 99.8%

  26. Sample plot – failed sample: low intensity; call freq 41%

  27. Failed samples tend to fall outside of clusters for many SNPs

  28. Failed samples tend to fall outside of clusters for many SNPs

  29. Can I use WGA samples? • Whole Genome Amplified DNA performance ranges from awful to very good • Even WGA samples that work very well may perform poorly for some SNPs • Extra attention needed for clustering decisions and for analysis • Make sure lab knows sample type for each sample

  30. WGA clustering with other samples

  31. WGA: lower intensity; call freq 98%

  32. WGA failure: call rate 93%

  33. Multiple sample types in study • Look at data by sample type (metrics and plots) • If they are not performing equivalently, do lots of extra QC by sample type • If you have to cluster separately, even more QC and checks are needed • If sample type is not random, it may cause more headaches (e.g. different types for cases and controls)

  34. Preventing Bad Data • Discuss sample types with the lab: what is their experience? May want to test some before starting the project • Discuss plating with the lab: may wish to place controls uniquely or arrange males and females uniquely by plate

  35. Preventing Bad Data • Differences in intensity (batch effects) are not common but possible • May only be present for a subset of SNPs • May want to mix cases and controls across plates to minimize the impact of a plate effect if one occurs
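
One simple way to mix cases and controls across plates is to shuffle each group and deal the samples out round-robin, as sketched below. The function name and arguments are hypothetical, and the sketch ignores plate capacity, control wells, and any other layout constraints the lab may have.

```python
import random

def assign_plates(cases, controls, n_plates, seed=0):
    """Spread cases and controls as evenly as possible over n_plates.

    Each group is shuffled, then dealt round-robin so that no plate ends up
    holding mostly cases or mostly controls. Plate capacity is not modelled.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    plates = [[] for _ in range(n_plates)]
    position = 0
    for group in (list(cases), list(controls)):
        rng.shuffle(group)
        for sample in group:
            plates[position % n_plates].append(sample)
            position += 1
    return plates

cases = [f"case{i}" for i in range(6)]
controls = [f"ctrl{i}" for i in range(6)]
for plate in assign_plates(cases, controls, n_plates=3):
    print(plate)
```

With case/control status balanced within each plate, a plate-level intensity shift affects both groups alike instead of mimicking an association.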

  36. Genotypes • Even for good SNPs and samples, some genotypes will fail • May not be called • May be called with low confidence or quality score • May be called wrong

  37. 1 genotype not called

  38. 1 wrong genotype

  39. Copy number • With Affymetrix and Illumina intensity information can be used to infer copy number • Works very well with small numbers of samples and manual review • Not really a high throughput system – software not sensitive or specific enough … Yet

  40. Genome viewer

  41. Female Chr X

  42. Male chrX

  43. Known Frequent CNV chr 10

  44. Known Frequent CNV chr 10

  45. Choosing a Platform and Product: Factors to Consider • Your study: population, study design, sample types, combining data with other studies, interest in CNVs • Product: coverage of the genome, how many SNPs, which SNPs (tagging, in or near genes), quality of data, performance on your sample types, information on CNVs

  46. Comparing Platforms: Make sure the numbers are comparable! • QC rates reported – denominators can differ • Mendel errors per trio or per sample • Replicate errors per pair or per sample
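
The effect of the denominator is easy to see with made-up numbers. In the sketch below the function name and all counts are hypothetical; the only point is that the same set of Mendel errors gives a rate three times smaller when divided by samples rather than by trios.

```python
def mendel_error_rates(n_errors, n_trios, n_snps):
    """Return the same error count expressed per trio-SNP and per sample-SNP."""
    per_trio = n_errors / (n_trios * n_snps)
    per_sample = n_errors / (3 * n_trios * n_snps)  # three samples per trio
    return per_trio, per_sample

# Hypothetical example: 30 Mendel errors in 10 trios typed at 100,000 SNPs.
per_trio, per_sample = mendel_error_rates(30, 10, 100_000)
print(f"per trio:   {per_trio:.1e}")
print(f"per sample: {per_sample:.1e}")
```

The same reasoning applies to replicate errors reported per pair versus per sample, where the two rates differ by a factor of two.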

  47. Comparing Platforms: Make sure the numbers are comparable! • SNPs on the chip are correlated with many others – often very strong correlation • There are multiple measures of the strength of the correlation • Lists of SNPs to use as proxy for ‘Genome’
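
Two widely used measures of correlation between SNPs are r² and D′, and they can disagree sharply for the same pair. The sketch below computes both from haplotype and allele frequencies; the function name and the example frequencies are hypothetical, and both SNPs are assumed to be polymorphic (otherwise r² is undefined).

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (r2, d_prime) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2.
    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2.
    Assumes 0 < p_a < 1 and 0 < p_b < 1.
    """
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime

# Hypothetical pair: a rare allele that always sits on the same common background.
r2, d_prime = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
print(f"r2 = {r2:.3f}, D' = {d_prime:.3f}")
```

A coverage figure quoted at a given r² threshold is therefore not comparable with one based on D′, or with one based on a different r² threshold.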

  48. Cost? • Hard to say • Changing rapidly • Generally increases with the number of SNPs on a chip • May decrease with the number of samples in a study • Reagents (the chips) are only part of the cost

  49. New Stuff!

  50. New GWA Arrays: Affymetrix and Illumina • ~ 1 million SNPs • Enhanced copy number content • Different strategies • Improved coverage in YRI population • Illumina 1M – still pre-release: same chemistry, same software, same probe designs, same lab workflow as other Infinium products • Affymetrix 6.0 – just released: same chemistry & lab workflow as 5.0; changes in probe design & software
