CGH Data

Presentation Transcript


  1. CGH Data BIOS 691-804

  2. Chromosome Re-arrangements

  3. Normal Human Variation

  4. Array CGH Technology

  5. Chromosome 8 (241 genes) in 10 cell lines and many tumor samples

  6. Pre-processing aCGH Data • QA: Same as for expression • Normalization • Are values comparable across arrays? • Can noise be reduced? • Segmentation • Where do copy number aberrations start and stop? • Better estimates of how many copies

  7. Normalization • Most copy numbers are 2 • Centering necessary • Dynamic range varies • Mixtures of tumor with normal • Saturation not usually a problem • Few instances of 10X copy • Dye bias sometimes strong • loess procedure unreliable

  8. Centering • Where is the center (log ratio 0)? • Sometimes modal copy number is 3 • Variability in labeling and tissue extraction • CGH can’t give direct measures of counts • Most researchers set modal copy to log-ratio of 0 • Does it matter? • Take 3 as equivalent to 2 for comparison?
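The centering step above can be sketched as follows. This is a minimal illustration, assuming the input is a list of log2 ratios from one array and using the most populated histogram bin as a stand-in for the modal copy-number level. The function name `center_on_mode` and the bin width are hypothetical choices, not taken from the slides.

```python
from collections import Counter


def center_on_mode(log_ratios, bin_width=0.05):
    """Shift log2 ratios so the most populated bin is centered at 0.

    Assumption: the most common level corresponds to the modal copy number.
    """
    # Assign each value to an integer bin index.
    bin_of = [round(x / bin_width) for x in log_ratios]
    counts = Counter(bin_of)
    # The most populated bin approximates the modal level.
    modal_bin = max(counts, key=counts.get)
    members = [x for x, b in zip(log_ratios, bin_of) if b == modal_bin]
    center = sum(members) / len(members)
    # Subtract the estimated center from every probe.
    return [x - center for x in log_ratios]


if __name__ == "__main__":
    ratios = [0.30, 0.31, 0.29, 0.30, 0.88, -0.70]
    print(center_on_mode(ratios))
```

Note that this only relocates the zero point. Whether the modal level truly corresponds to 2 copies or 3 cannot be decided from the log ratios alone.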

  9. Dynamic Range • Ratios of signal are often less (sometimes much less) than actual ratios of copy numbers between samples (From Bilke et al., Bioinformatics, 2005)

  10. Fractional Copy Numbers • Often samples are mixtures of tumor and normal • Many tumors have two (or more) distinct clones with distinct karyotypes • Observed copy numbers may lie in between values corresponding to whole numbers
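To see why mixtures give in-between values, here is a small worked sketch. It assumes an idealized linear signal response (no dynamic-range compression), a diploid normal component, and a diploid reference. The function name `expected_log2_ratio` and its parameters are illustrative only.

```python
import math


def expected_log2_ratio(tumor_copies, tumor_fraction, normal_copies=2):
    """Expected log2 ratio for a tumor/normal mixture vs. a diploid reference.

    Assumption: signal is proportional to the average copy number in the sample.
    """
    mixed = tumor_fraction * tumor_copies + (1 - tumor_fraction) * normal_copies
    return math.log2(mixed / normal_copies)


if __name__ == "__main__":
    # Single-copy gain (3 copies) in a pure tumor vs. a 50% tumor sample.
    print(round(expected_log2_ratio(3, 1.0), 3))  # 0.585
    print(round(expected_log2_ratio(3, 0.5), 3))  # 0.322
```

Under these assumptions, a single-copy gain in a sample that is half normal tissue looks like an average of 2.5 copies, which is why observed values need not match any whole copy number.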

  11. Probe Bias • If errors are random, then a plot of self-vs-self ratios should show no structure • Actual correlation > 60% • Clear bias! • Try to estimate it
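A minimal sketch of this check, assuming two self-vs-self hybridizations that report log ratios for the same probes in the same order. The per-probe mean used as a bias estimate is one simple option and is an assumption here, not necessarily the estimator used in the course.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of log ratios."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def probe_bias(arrays):
    """Per-probe mean log ratio across self-vs-self arrays (hypothetical estimator)."""
    return [sum(probe) / len(probe) for probe in zip(*arrays)]


if __name__ == "__main__":
    self_a = [0.10, -0.05, 0.20, -0.15, 0.02]
    self_b = [0.12, -0.02, 0.18, -0.20, 0.00]
    print(pearson(self_a, self_b))
    print(probe_bias([self_a, self_b]))
```

If the errors were purely random, the correlation between the two arrays would be close to zero. A high value suggests a reproducible, probe-specific component.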

  12. Segmentation • Individual probe values are noisy • Most aberrations are segments • Most segments have many probes • Average neighboring probe values to better estimate segment value – how far?
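The "how far?" question can be made concrete with a simple running mean. This is a sketch only: the function name `running_mean` and the `half_window` parameter are illustrative, and the window is simply truncated at the ends of the probe list.

```python
def running_mean(values, half_window):
    """Average each probe with up to `half_window` neighbors on each side."""
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)          # truncate window at the left end
        hi = min(n, i + half_window + 1)      # truncate window at the right end
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    step = [0, 0, 0, 1, 1, 1]
    print(running_mean(step, 1))
```

A wider window reduces noise within a segment but blurs the boundary: in the step example, the probes next to the breakpoint take intermediate values.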

  13. Segmentation • Issues: • How to identify where a segment starts or stops • How to find these points efficiently

  14. Noise and Signal

  15. How to Find Segments? • Could be a large copy number change over a short interval or a small change over a large one • Look for jumps in running averages • Distribution of jumps between probes • DNAcopy computes a maximum-likelihood estimate of change points, using all intervals • StepGram is an efficient computation of (a subset of) t-scores

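  16. Theory • Classical change-point test statistic • Let $X_1, \dots, X_n$ be values; let $S_i = X_1 + \dots + X_i$ be partial sums • Set $Z_B = \max_{1 \le i < n} |Z_i|$, where $Z_i = \left(\frac{1}{i} + \frac{1}{n-i}\right)^{-1/2} \left(\frac{S_i}{i} - \frac{S_n - S_i}{n-i}\right)$ • $Z_i$ are the (standardized) differences in levels before and after $i$ • Now for segments ‘in middle’ • Let $Z_C = \max_{1 \le i < j \le n} |Z_{ij}|$, where $Z_{ij} = \left(\frac{1}{j-i} + \frac{1}{n-j+i}\right)^{-1/2} \left(\frac{S_j - S_i}{j-i} - \frac{S_n - S_j + S_i}{n-j+i}\right)$ • This is “Circular Binary Segmentation” • Implemented in DNAcopy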
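The two statistics can be sketched directly from partial sums. This is an illustration under stated assumptions: unit noise variance (no scaling by an estimated standard deviation) and a brute-force search over all splits. The function names are hypothetical and this is not DNAcopy's code.

```python
import math
from itertools import accumulate


def max_change_point(values):
    """Single change point: return (i, Z_i) maximizing |Z_i|.

    Z_i compares the mean of the first i values with the mean of the rest.
    """
    n = len(values)
    partial = list(accumulate(values))  # partial[k - 1] = S_k
    total = partial[-1]
    best_i, best_z = None, 0.0
    for i in range(1, n):
        s_i = partial[i - 1]
        diff = s_i / i - (total - s_i) / (n - i)
        z = diff / math.sqrt(1 / i + 1 / (n - i))
        if abs(z) > abs(best_z):
            best_i, best_z = i, z
    return best_i, best_z


def max_circular_segment(values):
    """Segment 'in the middle': return (i, j, Z_ij) maximizing |Z_ij|.

    Z_ij compares the mean of values[i:j] with the mean of everything else.
    """
    n = len(values)
    partial = [0.0] + list(accumulate(values))  # partial[k] = S_k, S_0 = 0
    total = partial[-1]
    best = (None, None, 0.0)
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            k = j - i  # number of probes inside the candidate segment
            inside = (partial[j] - partial[i]) / k
            outside = (total - partial[j] + partial[i]) / (n - k)
            z = (inside - outside) / math.sqrt(1 / k + 1 / (n - k))
            if abs(z) > abs(best[2]):
                best = (i, j, z)
    return best


if __name__ == "__main__":
    print(max_change_point([0, 0, 0, 0, 1, 1, 1, 1]))
    print(max_circular_segment([0, 0, 1, 1, 1, 0, 0, 0]))
```

The double loop makes the circular statistic quadratic in the number of probes, which is one reason exhaustive evaluation becomes slow on dense arrays.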

  17. DNAcopy • In Bioconductor • Performs maximum-likelihood (ML) identification of segments recursively • Applies the procedure within identified segments • Double-checks points near the boundary • Uses permutation testing to estimate the null distribution • Often data are not Normal
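The idea of estimating the null distribution by permutation can be sketched as below. This is a simplified illustration for a single change point with unit-variance scaling, not DNAcopy's actual procedure. The function names, the number of permutations, and the seed are assumptions.

```python
import math
import random


def max_abs_z(values):
    """Largest |Z_i| over all single split points (unit-variance scaling)."""
    n = len(values)
    total = sum(values)
    running, best = 0.0, 0.0
    for i in range(1, n):
        running += values[i - 1]
        diff = running / i - (total - running) / (n - i)
        best = max(best, abs(diff) / math.sqrt(1 / i + 1 / (n - i)))
    return best


def permutation_p_value(values, n_perm=1000, seed=0):
    """Fraction of shuffled copies whose statistic reaches the observed one.

    Shuffling destroys any positional structure, so the shuffled statistics
    approximate the null distribution without assuming Normal data.
    """
    rng = random.Random(seed)
    observed = max_abs_z(values)
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_abs_z(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    step = [0.0] * 10 + [1.0] * 10
    print(permutation_p_value(step))
```

Because every permutation requires recomputing the statistic, the cost grows with both the number of probes and the number of permutations.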

  18. StepGram • DNAcopy is slow! • Could try to compute only a fraction of possible scores • StepGram tries to find a subset of most likely scores to compute • Much faster! • Some inaccuracies • Doesn’t handle chromosome ends well

  19. StepGram – Method 1 • Key Idea: • Don’t compute all possible t-scores • Compute only those likely to show significant change • Bound the estimated t-scores in future based on current t-scores

  20. StepGram – Algorithm 2
