CGH Data

CGH Data BIOS 691-804

Chromosome Re-arrangements

Normal Human Variation

Array CGH Technology

Chromosome 8 (241 genes) in 10 cell lines and many tumor samples

Pre-processing Array CGH Data • QA: same as for expression arrays • Normalization: Are values comparable across arrays? Can noise be reduced? • Segmentation: Where do copy number aberrations start and stop? Better estimates of how many copies

Normalization • Most copy numbers are 2 • Centering necessary • Dynamic range varies • Mixtures of tumor with normal • Saturation not usually a problem • Few instances of 10X copy • Dye bias sometimes strong • loess procedure unreliable

Centering • Where is the center (log ratio 0)? • Sometimes modal copy number is 3 • Variability in labeling and tissue extraction • CGH can’t give direct measures of counts • Most researchers set modal copy to log-ratio of 0 • Does it matter? • Take 3 as equivalent to 2 for comparison?
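A minimal sketch of the centering convention described above (modal copy number set to a log-ratio of 0). The histogram-based mode estimate, the function name `center_on_mode`, and the `bins` parameter are illustrative assumptions, not something the slides specify.

```python
import numpy as np

def center_on_mode(log_ratios, bins=50):
    """Shift log ratios so the most populated histogram bin sits at 0.

    Assumption: the mode is estimated crudely as the midpoint of the
    fullest histogram bin; a kernel density estimate would also work.
    """
    x = np.asarray(log_ratios, dtype=float)
    counts, edges = np.histogram(x, bins=bins)
    k = int(np.argmax(counts))
    mode = 0.5 * (edges[k] + edges[k + 1])  # midpoint of the fullest bin
    return x - mode, mode

# Simulated array: most probes share one level (offset 0.3), a minority are gained.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.3, 0.05, 800), rng.normal(0.9, 0.05, 200)])
centered, mode = center_on_mode(x)
```

Note that this only relocates the zero point. If the modal copy number is really 3, the centered values still cannot say so, which is the caveat the slide raises.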

Dynamic Range • Ratios of signal are often less (sometimes much less) than the actual ratios of copy numbers between samples • Figure from Bilke et al., Bioinformatics, 2005

Fractional Copy Numbers • Often samples are mixtures of tumor and normal • Many tumors have two (or more) distinct clones with distinct karyotypes • Observed copy numbers may lie in between values corresponding to whole numbers
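To see why mixtures give in-between values, here is a small sketch of the expected log2 ratio for a tumor/normal mixture. It assumes an idealized array whose signal is proportional to DNA content, which real arrays are not (see the Dynamic Range slide). The function name and parameters are hypothetical.

```python
import math

def expected_log2_ratio(tumor_copies, tumor_fraction, normal_copies=2):
    """Expected log2(test/reference) when a fraction of cells are tumor.

    Assumes signal is linear in average copy number (an idealization).
    Raises ValueError if the average copy number is 0 (pure homozygous loss).
    """
    average = tumor_fraction * tumor_copies + (1 - tumor_fraction) * normal_copies
    return math.log2(average / normal_copies)

pure_gain = expected_log2_ratio(3, 1.0)    # single-copy gain, pure tumor
mixed_gain = expected_log2_ratio(3, 0.5)   # same gain, 50% normal cells
```

Under these assumptions a single-copy gain drops from about 0.58 in a pure tumor to about 0.32 when half the cells are normal, so the observed level alone does not identify a whole copy number.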

Probe Bias • If errors are random, then a plot of self vs. self ratios should look random • Actual correlation > 60% • Clear bias! • Try to estimate it

Segmentation • Individual probe values are noisy • Most aberrations are segments • Most segments have many probes • Average neighboring probe values to better estimate segment value – how far?

Segmentation • Issues: • How to identify where a segment starts or stops • How to find these points efficiently

Noise and Signal

How to Find Segments? • Could be a large copy number change over a short interval or a small change over a long one • Look for jumps in running averages • Distribution of jumps between probes • DNAcopy gives a maximum likelihood estimate of change points, using all intervals • StepGram efficiently computes (a subset of) t-scores
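The "jumps in running averages" idea can be sketched as follows: at each probe boundary, compare the mean of the next `w` probes with the mean of the previous `w`. This is an illustration only, not the method used by DNAcopy or StepGram. The window width `w` and the function name are assumptions.

```python
import numpy as np

def window_jump_scores(x, w):
    """At each boundary i, return mean(x[i:i+w]) - mean(x[i-w:i])."""
    x = np.asarray(x, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(x)))   # c[k] = sum of x[:k]
    idx = np.arange(w, len(x) - w + 1)          # boundaries with full windows on both sides
    right = (c[idx + w] - c[idx]) / w
    left = (c[idx] - c[idx - w]) / w
    return idx, right - left

# Noise-free step at probe 20: the largest jump should be at boundary 20.
x = np.array([0.0] * 20 + [1.0] * 20)
idx, jumps = window_jump_scores(x, w=5)
best = int(idx[np.argmax(np.abs(jumps))])
```

A fixed window trades off the two cases on the slide: a short window catches large changes over short intervals, while small changes over long intervals need a long one. That is the motivation for scoring all intervals.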

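Theory • Classical change-point test statistic • Let $X_1, \dots, X_n$ be values; let $S_i = X_1 + \dots + X_i$ be partial sums • Set $Z_B = \max_{1 \le i < n} |Z_i|$, where $Z_i = \left(\frac{1}{i} + \frac{1}{n-i}\right)^{-1/2} \left(\frac{S_i}{i} - \frac{S_n - S_i}{n-i}\right)$ • The $Z_i$ are the (standardized) differences in levels before and after $i$ • Now for segments ‘in middle’ • Let $Z_C = \max_{1 \le i < j \le n} |Z_{ij}|$, where $Z_{ij} = \left(\frac{1}{j-i} + \frac{1}{n-j+i}\right)^{-1/2} \left(\frac{S_j - S_i}{j-i} - \frac{S_n - S_j + S_i}{n-j+i}\right)$ • This is “Circular Binary Segmentation” • Implemented in DNAcopy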
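A direct (quadratic-time) sketch of the two statistics, assuming unit noise variance. The binary statistic compares the mean of the first i values with the mean of the rest. The circular statistic compares the mean of values i+1..j with the mean of everything outside that arc. Each difference is scaled by the square root of the sum of reciprocal group sizes. Function names are illustrative, and this is not the DNAcopy implementation.

```python
import numpy as np

def binary_stat(x):
    """Return (i, Z_i) maximizing |Z_i|: first i values vs. the remaining n - i."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.cumsum(x)                      # s[i - 1] is the partial sum S_i
    i = np.arange(1, n)
    diff = s[i - 1] / i - (s[-1] - s[i - 1]) / (n - i)
    z = diff / np.sqrt(1.0 / i + 1.0 / (n - i))
    k = int(np.argmax(np.abs(z)))
    return int(i[k]), float(z[k])

def circular_stat(x):
    """Return (i, j, Z_ij) maximizing |Z_ij|: values i+1..j vs. everything else."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.concatenate(([0.0], np.cumsum(x)))   # s[i] = S_i, with S_0 = 0
    best_i, best_j, best_z = None, None, 0.0
    for i in range(1, n):
        j = np.arange(i + 1, n + 1)
        m = j - i                               # number of values inside the arc
        inside = (s[j] - s[i]) / m
        outside = (s[n] - s[j] + s[i]) / (n - m)
        z = (inside - outside) / np.sqrt(1.0 / m + 1.0 / (n - m))
        k = int(np.argmax(np.abs(z)))
        if abs(z[k]) > abs(best_z):
            best_i, best_j, best_z = i, int(j[k]), float(z[k])
    return best_i, best_j, best_z

# A gained segment 'in the middle': values 11..20 of 30 are raised by 1.
x = np.zeros(30)
x[10:20] = 1.0
ci, cj, cz = circular_stat(x)
bi, bz = binary_stat(np.array([0.0] * 10 + [1.0] * 10))
```

With real data the noise variance is unknown, so the statistic is either divided by a scale estimate or referred to a permutation null distribution.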

DNAcopy • In Bioconductor • Does ML identification of segments recursively • Applies the procedure within identified segments • Double-checks points near the boundary • Does permutation testing to estimate the null distribution • Often data are not Normal
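The recursive, permutation-tested scheme can be sketched as below. This is a deliberately simplified stand-in, not the DNAcopy code: it uses the one-change-point statistic instead of the circular one, omits the boundary re-check, and every name and default (`alpha`, `n_perm`, `min_len`) is an assumption made for illustration.

```python
import numpy as np

def max_abs_z(x):
    """Largest |Z_i| over all split points i, and the maximizing i."""
    n = len(x)
    s = np.cumsum(x)
    i = np.arange(1, n)
    diff = s[i - 1] / i - (s[-1] - s[i - 1]) / (n - i)
    z = diff / np.sqrt(1.0 / i + 1.0 / (n - i))
    k = int(np.argmax(np.abs(z)))
    return float(abs(z[k])), int(i[k])

def segment(x, start=0, alpha=0.01, n_perm=200, min_len=4, rng=None):
    """Recursively split x; return sorted change-point positions (global indices)."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x, dtype=float)
    if len(x) < 2 * min_len:
        return []
    observed, split = max_abs_z(x)
    # Permuting the probes destroys any segment structure but keeps the
    # (possibly non-Normal) value distribution: this gives the null.
    null = [max_abs_z(rng.permutation(x))[0] for _ in range(n_perm)]
    p_value = (1 + sum(v >= observed for v in null)) / (1 + n_perm)
    if p_value > alpha:
        return []
    left = segment(x[:split], start, alpha, n_perm, min_len, rng)
    right = segment(x[split:], start + split, alpha, n_perm, min_len, rng)
    return left + [start + split] + right

# Simulated chromosome arm with one level change at probe 40.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 0.2, 40), rng.normal(1.0, 0.2, 40)])
change_points = segment(data)
```

Because the null distribution comes from permutations of the observed values, no Normality assumption is needed. The price is the repeated recomputation of the statistic, which is one reason the full procedure is slow.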

StepGram • DNAcopy is slow! • Could try to compute only a fraction of possible scores • StepGram tries to find a subset of most likely scores to compute • Much faster! • Some inaccuracies • Doesn’t handle chromosome ends well

StepGram – Method 1 • Key idea: don’t compute all possible t-scores • Compute only those likely to show significant change • Bound the estimated t-scores in future based on current t-scores

StepGram – Algorithm 2
