Copy Number Variation. Eleanor Feingold, University of Pittsburgh, March 2012.
What do we mean by “copy number variation”? [Figure labels: GCTC ATATATAT TTG; kb - Mb (gene or gene region)]
Copy number variation in a gene or gene region. [Figure panels: “normal”; deletion; duplication of one gene; duplication of several genes; duplication of part of a gene]
Classical copy number study types
Cancer genetics. What: Find chromosomal segments (usually large ones) that are duplicated and/or deleted in tumor cell lines. Why: Learn something about cancer biology, or implications for treatment and prognosis.
Clinical pediatrics. What: Detect inherited or de novo deletions in individuals. Why: “Diagnose” birth defects.
And now: Genetic association studies for CNVs. 1) Collect cases and controls. 2) “Genotype” everyone at a CNV. 3) Test genotype/phenotype association. [Figure: individuals labeled with copy numbers]
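Step 3 can be sketched in many ways; the slides do not name a specific test. The example below is one possible choice, a permutation test on the difference in mean copy number between cases and controls. The function name, the copy-number values, and the choice of test statistic are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_test(case_cn, control_cn, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in mean copy number.

    This is an illustrative choice of test, not one prescribed by the slides.
    """
    rng = random.Random(seed)
    observed = abs(mean(case_cn) - mean(control_cn))
    pooled = list(case_cn) + list(control_cn)
    n_case = len(case_cn)
    hits = 0
    for _ in range(n_perm):
        # Reassign case/control labels at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_case]) - mean(pooled[n_case:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical integer copy-number calls, made up for illustration only.
cases = [3, 4, 2, 3, 5, 3, 4, 2]
controls = [2, 1, 2, 2, 0, 1, 2, 3]
p_value = permutation_test(cases, controls)
print(f"permutation p-value: {p_value:.4f}")
```

A permutation test avoids distributional assumptions about copy-number calls, but it does nothing about the comparability of intensity measurements between cases and controls.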
Generation 1 - Array CGH. What: Microarray of clones (e.g. BACs), usually on a glass slide. Competitive hybridization of test and reference samples. Measure fluorescence ratio clone by clone. Limitations: Large clones. Sparse coverage. High noise due to spotting process.
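The clone-by-clone fluorescence ratio is commonly summarized on a log2 scale. The sketch below assumes noise-free signals that are exactly proportional to copy number and a two-copy reference; the function names and signal values are hypothetical.

```python
import math


def log2_ratios(test_signal, reference_signal):
    """Per-clone log2(test / reference) fluorescence ratios."""
    return [math.log2(t / r) for t, r in zip(test_signal, reference_signal)]


def expected_log2_ratio(test_copies, reference_copies=2):
    """Idealized log2 ratio, assuming signal is proportional to copy number."""
    return math.log2(test_copies / reference_copies)


# Hypothetical intensities for four clones (illustration only).
test_signal = [1000.0, 1500.0, 500.0, 980.0]
reference_signal = [1000.0, 1000.0, 1000.0, 1000.0]
ratios = log2_ratios(test_signal, reference_signal)

for clone, ratio in enumerate(ratios):
    print(f"clone {clone}: log2 ratio = {ratio:+.3f}")
print(f"ideal single-copy gain: {expected_log2_ratio(3):+.3f}")
print(f"ideal single-copy loss: {expected_log2_ratio(1):+.3f}")
```

Under these idealized assumptions a single-copy gain sits near +0.58 and a single-copy loss at -1, so gains are harder to separate from noise than losses.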
Generation 2 - SNP chips. What: High-throughput SNP genotyping platforms (e.g. Affymetrix, Illumina). Disadvantages: Technology was never intended for measuring copy number. SNPs on chip selected to avoid CNV regions by design. Advantage: Hundreds of thousands of points of info.
Generation 3 - SNP chips with CNV markers (Affy 6.0, Illumina 1M). Advantages: SNPs in known CNV regions are now included. Also have “non-polymorphic SNPs” (SNs?). Illumina 1M markers in 10K regions of various types and sizes. Affymetrix: 200K probes in 5K known large CNV regions; 700K probes “evenly spaced along the genome”.
Generation 4 - (Illumina 2.5M, 5M). Changes: Got rid of the non-polymorphic markers. Special coverage of CNV regions??? Are these better or worse for CNVs than the previous generation?
What data do these technologies give us, and how do we use it?
Standard genotyping. Genotype information is in the angle (relative intensity of the two alleles). Copy number information is in the distance from the origin (total intensity). [Figure: cluster plot with genotype clusters BB, AB, AA]
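The angle/distance description can be made concrete by converting the two allele intensities to polar-style coordinates. The sketch below is an assumption-laden illustration: scaling the angle to the range 0 to 1 and using A + B as the total intensity are choices made here, not something the slides specify, and the function name is hypothetical.

```python
import math


def to_polar(a_intensity, b_intensity):
    """Return (angle, total) for one SNP in one sample.

    angle: 0 for pure A signal, 1 for pure B signal (assumed scaling).
    total: A + B, used here as the "distance from the origin" (assumption;
           Euclidean distance would be another reasonable choice).
    """
    angle = (2 / math.pi) * math.atan2(b_intensity, a_intensity)
    total = a_intensity + b_intensity
    return angle, total


# Hypothetical normalized intensities for the three standard genotypes.
for name, (a, b) in {"AA": (2.0, 0.0), "AB": (1.0, 1.0), "BB": (0.0, 2.0)}.items():
    angle, total = to_polar(a, b)
    print(f"{name}: angle = {angle:.2f}, total intensity = {total:.2f}")
```

With these idealized inputs the three genotypes share the same total intensity and differ only in angle, which is the separation the slide describes.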
In theory: [Figure: cluster plot with separate clusters for null, A, B, AA, AB, BB, AAA, AAB, ABB, BBB]
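The "in theory" picture can be enumerated directly. This sketch assumes each allele copy contributes exactly one unit of signal with no noise, and reuses an assumed 0-to-1 angle scaling; the function name is hypothetical.

```python
import math


def ideal_clusters(max_copies=3):
    """Map each genotype label to its idealized (angle, total intensity).

    Assumes signal is exactly proportional to the number of A and B copies.
    The angle is undefined (None) for the null genotype.
    """
    clusters = {}
    for total in range(max_copies + 1):
        for b in range(total + 1):
            a = total - b
            name = "A" * a + "B" * b or "null"
            angle = (2 / math.pi) * math.atan2(b, a) if total else None
            clusters[name] = (angle, total)
    return clusters


for name, (angle, total) in ideal_clusters().items():
    shown = "n/a" if angle is None else f"{angle:.2f}"
    print(f"{name:>4}: angle = {shown}, total intensity = {total}")
```

Every genotype from zero to three copies gets its own position in this idealized model, ten clusters in all; real data rarely separate this cleanly.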
But when you look at the data … [Figure: cluster plot of disomic and trisomic (Down syndrome) samples; cluster labels: AAA and AA, AAB, AB, ABB, BBB and BB]
All SNPs on chromosome 21. [Figure: total intensity, disomic vs. trisomic samples]
In theory: [Figure: cluster plot with separate clusters for null, A, B, AA, AB, BB, AAA, AAB, ABB, BBB]
In practice: [Figure: cluster plot; labels: A, B, null]
So how are copy numbers called? Look for runs of SNPs that are high or low in intensity. Many algorithms are available, e.g. HMM, CBS (circular binary segmentation), change-point.
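The idea of "runs of SNPs that are high or low in intensity" can be illustrated with a deliberately naive threshold-based scan. This is not an HMM, CBS, or change-point method; it is a minimal sketch, and the thresholds, minimum run length, and function name are all hypothetical.

```python
def find_runs(intensities, baseline=1.0, high=1.25, low=0.75, min_length=3):
    """Return (start, end, label) for runs of consecutive markers whose
    intensity relative to baseline is high ("gain") or low ("loss").

    `end` is exclusive. Thresholds and min_length are illustrative only.
    """
    def label(x):
        ratio = x / baseline
        if ratio >= high:
            return "gain"
        if ratio <= low:
            return "loss"
        return "normal"

    labels = [label(x) for x in intensities]
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the data or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != "normal" and i - start >= min_length:
                runs.append((start, i, labels[start]))
            start = i
    return runs


# Hypothetical total intensities for markers ordered along a chromosome.
signal = [1.0, 1.1, 0.9, 0.5, 0.4, 0.6, 0.5, 1.0, 1.0, 1.5, 1.6, 1.4, 1.0, 1.5, 1.0]
print(find_runs(signal))
```

Requiring a minimum run length is what discards the isolated high marker near the end of the example; the published algorithms handle noise in more principled ways.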
Komura et al. Genome Research 2006
More complex examples (cancer genetics). Peiffer et al., Genome Research, 2006
[Figure: amplification. Panels: total intensity; angle (genotype info) with AA, AB, BB groups]
[Figure: two deletions]
Extra copy of whole chromosome: total intensity high over whole chromosome; 3 genotype groups.
LOH: no copy number change, but a region of homozygosity (loss of heterozygosity).
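The examples above combine two signals: total intensity and genotype (angle) information. A minimal sketch of how a called segment might be labeled from those two summaries follows; the thresholds, the function name, and the decision order are hypothetical and are not taken from Peiffer et al.

```python
def classify_segment(mean_intensity_ratio, het_fraction,
                     gain=1.25, loss=0.75, min_het=0.05):
    """Label a segment from two summaries (all thresholds are illustrative).

    mean_intensity_ratio: mean total intensity relative to a normal baseline.
    het_fraction: fraction of SNPs in the segment called heterozygous.
    """
    if mean_intensity_ratio >= gain:
        return "amplification"
    if mean_intensity_ratio <= loss:
        return "deletion"
    if het_fraction < min_het:
        # Normal intensity but no heterozygous calls.
        return "LOH (copy-neutral)"
    return "normal"


# Hypothetical segment summaries.
print(classify_segment(1.50, 0.20))  # amplification
print(classify_segment(0.50, 0.00))  # deletion
print(classify_segment(1.00, 0.00))  # LOH (copy-neutral)
print(classify_segment(1.00, 0.30))  # normal
```

Short homozygous stretches also occur by chance, so a real LOH call would need to account for segment length and local heterozygosity rates, which this sketch ignores.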
Basic picture Wang et al. Genome Research, 2007
A few statistical issues to think about … (there’s still a lot to do)
Many run-calling algorithms are oriented towards clinical applications: Many CNV detection algorithms are very conservative - aim for zero false positive rate. Most use normalization methods that assume a large reference population is not available. Many use models that make assumptions about what kinds of variation are likely (e.g. cancer).
Family data should be modeled together. CNV “calls” will be much more accurate if you use the whole family, but the model you use should depend on whether you are expecting de novo mutations or not. For some diseases you’ll expect associations with de novo changes. For others you might expect inherited variants.
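One way to see why the expected inheritance model matters is to check whether a child's copy number could have been inherited from the parents' calls. The sketch below assumes an autosomal locus, exact integer copy-number calls, and a cap on copies per haplotype; the function names and the `max_per_haplotype` parameter are hypothetical, and this is not a joint family model.

```python
from itertools import product


def haplotype_splits(total, max_per_haplotype):
    """All ways to split a total copy number across two haplotypes."""
    return [(h, total - h) for h in range(total + 1)
            if h <= max_per_haplotype and total - h <= max_per_haplotype]


def consistent_with_inheritance(father_cn, mother_cn, child_cn,
                                max_per_haplotype=2):
    """True if the child could have received one haplotype from each parent.

    A False result flags a candidate de novo change (or a calling error).
    """
    for f_split, m_split in product(haplotype_splits(father_cn, max_per_haplotype),
                                    haplotype_splits(mother_cn, max_per_haplotype)):
        for f_hap, m_hap in product(f_split, m_split):
            if f_hap + m_hap == child_cn:
                return True
    return False


# Hypothetical trios: (father, mother, child) total copy numbers.
print(consistent_with_inheritance(2, 2, 2))                       # True
print(consistent_with_inheritance(2, 2, 0))                       # True: both parents could be 0+2
print(consistent_with_inheritance(2, 2, 1, max_per_haplotype=1))  # False: candidate de novo deletion
print(consistent_with_inheritance(2, 2, 5))                       # False
```

The second and third examples show how the verdict changes with the assumed haplotype model, which is one reason the choice of model should reflect whether de novo or inherited variants are expected.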
How do we group CNVs for association testing? [Figure: four deletions and one duplication]
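The slide leaves the grouping question open. One simple option, shown here only as a sketch, is to merge CNV calls into regions whenever their coordinates overlap. The coordinates, the tuple layout, and the function name are hypothetical, and whether deletions and duplications belong in the same group is exactly the kind of decision the slide is asking about.

```python
def group_overlapping(cnvs):
    """Group CNV calls (start, end, kind) that overlap, directly or via a chain.

    Coordinates are treated as closed intervals on a single chromosome.
    """
    groups = []
    current, current_end = [], None
    for cnv in sorted(cnvs):
        start, end, _kind = cnv
        if current and start <= current_end:
            # Overlaps the region built so far: extend it.
            current.append(cnv)
            current_end = max(current_end, end)
        else:
            if current:
                groups.append(current)
            current, current_end = [cnv], end
    if current:
        groups.append(current)
    return groups


# Hypothetical calls from different individuals (illustration only).
calls = [
    (100, 200, "deletion"),
    (150, 250, "deletion"),
    (400, 500, "deletion"),
    (450, 600, "duplication"),
    (120, 180, "deletion"),
]
for region in group_overlapping(calls):
    print(region)
```

Here the second region mixes a deletion with a duplication; filtering by kind before grouping would be an alternative design.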
Separate methods for deletions? Deletions are easier to detect than other changes. Deletions are likely to have simpler biological effects.
The most important one … The technology is still NOT intended for reliably and comparably measuring total intensity! Total intensity numbers are very sensitive to DNA source, sample handling, etc., so extreme measures must be taken to ensure that cases and controls are comparable.