
Multi-Sample analysis of microarray based copy-number aberration data

Multi-Sample analysis of microarray based copy-number aberration data. Copy Number Detection Meeting. March 6, 2006. Gregory R. Grant ggrant@pcbi.upenn.edu Mitchell Guttman mguttman@sas.upenn.edu. Motivating Framework.


Presentation Transcript


  1. Multi-Sample analysis of microarray based copy-number aberration data Copy Number Detection Meeting March 6, 2006 Gregory R. Grant ggrant@pcbi.upenn.edu Mitchell Guttman mguttman@sas.upenn.edu

  2. Motivating Framework • The ability to map the location and magnitude of aberrations is important. • Aberration regions can be small. • We are interested in regions of copy number aberrations (CNA) that are recurrent across a class of samples. • Myc-N Amplification in high risk Neuroblastoma. • ErbB2 Amplification in higher risk Breast Cancer. • Both of these are highly correlated with prognosis.

  3. Single Slide Methods • There are numerous single slide methods for determining aberration within an array. • These methods use multiple elements in a region as replicates for determining aberration in the region. • With a single slide this is the best one can do. • The resolution of detection is lower than the resolution of the array. • With multiple slides we can take a different strategy. • Be more liberal on the single slide calls. • Only believe the calls when we see them replicated across samples significantly often. • Finally, while there may be aberration present within a single array that is not present across samples, this aberration is unlikely to be due to a population effect.

  4. Multiple Sample Analysis (MSA) • By using multiple samples as replicates, we are able to characterize genomic aberrations at a higher resolution (the resolution of the array). This also allows us to identify regions of importance to the population. • Use information from multiple samples to find aberrations characteristic of the class of samples. • Rather than looking across the genome, we look across experiments at each location. • This allows us to pick up small regions of tight concordance regardless of their small size within a single experiment.

  5. STAC Statistical Algorithm • Given a set of calls, STAC finds aberrations that are significantly concordant across samples. • STAC provides two statistical tests of significance: the footprint and the frequency. • Frequency measures the number of samples that overlap a particular clone. • Footprint measures how tight the overlap is. [Figure: two sets of overlapping aberrations, one with footprint 7 and one with footprint 4; frequency = 5 in both cases.] http://www.cbil.upenn.edu/STAC/
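The two quantities on this slide can be illustrated with a small sketch. It assumes calls are stored as a binary matrix (samples by clones) and that the footprint at a clone is the number of clones covered by the union of the aberrant runs that contain it. Both the data layout and that reading of "footprint" are assumptions made for illustration. The function names are hypothetical, and the sketch does not include STAC's significance testing.

```python
import numpy as np

def frequency(calls):
    """Number of samples called aberrant at each clone (column sums)."""
    return calls.sum(axis=0)

def run_containing(row, j):
    """Return (start, end) of the contiguous run of 1s in `row` containing index j."""
    start = j
    while start > 0 and row[start - 1]:
        start -= 1
    end = j
    while end < len(row) - 1 and row[end + 1]:
        end += 1
    return start, end

def footprint(calls, j):
    """Clones covered by the union of aberrant runs that include clone j.

    A smaller footprint at the same frequency means a tighter overlap.
    """
    covered = np.zeros(calls.shape[1], dtype=bool)
    for row in calls:
        if row[j]:
            start, end = run_containing(row, j)
            covered[start:end + 1] = True
    return int(covered.sum())

# Hypothetical calls: 4 samples (rows) by 7 clones (columns), 1 = aberrant.
calls = np.array([
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1],
])

print(frequency(calls))     # per-clone frequency
print(footprint(calls, 2))  # footprint of the stack at clone index 2
```

Here clone 2 has frequency 3, and the three runs that cover it span clones 0 through 4, giving a footprint of 5.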

  6. Motivating Dataset (Mies Lab) • Formalin-Fixed Paraffin-Embedded (FFPE) sample DNA. • Challenging case: laser-capture microdissected samples from FFPE, archived (10+ years), degraded tissue, with no exact normal analog. • Samples are indirectly labeled because of the small quantity of DNA and the need for sufficient amplification. • Amplification based on human-specific degenerate oligo primers. • 2-channel BAC arrays made by the Penn Microarray Core based on the Weber library.

  7. Making Calls and Processing Data • Ratios are formed for each clone with the reference (normal) intensity in the denominator and the experimental sample in the numerator. • If a segment of DNA containing a clone is not altered, then ideally the ratio for that clone should be 1. • If (in one chromosome) a segment of DNA containing a clone is missing, then ideally the ratio should be 1/2. • If (in one chromosome) a segment of DNA containing a clone is duplicated, then ideally the ratio should be 3/2. • If the segment is tripled, then ideally the ratio should be 2. • Of course, data are noisy and subject to bias and artifacts.
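The ideal ratios listed on this slide all follow from dividing the sample copy number by the two copies present in a normal diploid reference. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def ideal_ratio(sample_copies, reference_copies=2):
    """Ideal sample/reference intensity ratio for a clone.

    Assumes a diploid reference (2 copies) and no noise, bias, or artifacts.
    """
    return sample_copies / reference_copies

# One chromosome lost, unaltered, duplicated, or tripled on one chromosome.
for label, copies in [("loss", 1), ("normal", 2), ("duplicated", 3), ("tripled", 4)]:
    print(label, ideal_ratio(copies))
```

A segment tripled on one chromosome contributes three copies plus one from the other chromosome, for four in total, which is why the ideal ratio is 2.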

  8. Processing Issues • Clone/Array quality issues • Clone mapping issues • Overlaps and inconsistencies • Unequally spaced clones • How to infer behavior at locations between clones • Tiling Paths • Clone-to-clone variation • Differing clone hybridization affinities and clone/dye interaction effects, etc… • Normalization • Removing dye-bias, etc… • Within array normalization • Between array normalization [Figure: nature of clone coverage. Spacing is inconsistent due to both technical considerations and biological reality.]

  9. First Step: Develop a parameterized protocol for single slide calls. • Make calls per clone • Use normal/normal distribution • Make calls for each nucleotide covered by at least 1 clone • How to deal with overlapping clones. • How to deal with replicate (and potentially inconsistent) clones. • Extend the calls to regions with no coverage. • Develop method for extension from neighboring clones. • Determine how to divide regions flanked by inconsistent clones. • Standardize genome spacing for analysis. • Merging continuous genome into discrete regions. • How to deal with overlapping regions

  10. Making clone-wise calls from raw data • Absolute threshold cutoffs.

  11. Using Normal Controls • Using normal samples as controls. • A distribution of normal samples, analogous to the test channel of interest, hybridized to the same reference channel as used for the experimental hybridizations. • Possible cutoff parameters using normal samples • Percentiles • Standard deviations • Z-scores • User specified • Given a fixed scheme (above), how can we find an “optimal” parameter setting?
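One of the cutoff schemes listed above, standard deviations from the normal distribution, can be sketched as follows. This is an illustrative sketch only. It assumes a per-clone mean and standard deviation estimated from the normal/reference hybridizations, and calls encoded as +1 (gain), -1 (loss), and 0 (no call). The function name and the example values are hypothetical.

```python
import numpy as np

def call_clones(sample_values, normal_values, sd_cutoff):
    """Call each clone as gain (+1), loss (-1), or unchanged (0).

    sample_values: shape (n_clones,), ratios or log ratios for one sample.
    normal_values: shape (n_normals, n_clones), same scale, from normal controls.
    A clone is called when it lies more than `sd_cutoff` standard deviations
    from the mean of the normals for that clone. Assumes nonzero spread.
    """
    mean = normal_values.mean(axis=0)
    sd = normal_values.std(axis=0, ddof=1)
    z = (sample_values - mean) / sd
    calls = np.zeros(z.shape, dtype=int)
    calls[z > sd_cutoff] = 1
    calls[z < -sd_cutoff] = -1
    return calls

# Hypothetical log ratios: 3 normal hybridizations and 1 tumor sample, 3 clones.
normals = np.array([
    [-0.1,  0.0,  0.1],
    [ 0.0,  0.1, -0.1],
    [ 0.1, -0.1,  0.0],
])
tumor = np.array([0.5, 0.05, -0.4])

print(call_clones(tumor, normals, sd_cutoff=3.0))
print(call_clones(tumor, normals, sd_cutoff=4.5))
```

Raising the cutoff from 3.0 to 4.5 drops the loss call on the third clone, which shows how sensitive single-slide calls are to the choice of parameter.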

  12. Extending calls to regions with no coverage Note: We do not extend calls over gaps of any length, only over small spans. Uncovered regions longer than a specified length are cut out.
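A minimal sketch of this kind of extension is shown below. It assumes clones are given as sorted, non-overlapping half-open intervals with a call attached, and that a short gap is split at its midpoint between the two flanking clones. The midpoint rule is an assumption chosen for illustration, since the slides do not state how gaps are divided. The function name and the `max_gap` parameter are hypothetical.

```python
def extend_calls(clones, max_gap):
    """Extend clone calls across short uncovered gaps.

    clones: sorted, non-overlapping (start, end, call) tuples, half-open.
    Gaps of at most `max_gap` bases are split at the midpoint, so each
    flanking clone's call covers its half. Longer gaps are left uncovered.
    """
    if not clones:
        return []
    out = [clones[0]]
    for start, end, call in clones[1:]:
        prev_start, prev_end, prev_call = out[-1]
        gap = start - prev_end
        if 0 < gap <= max_gap:
            mid = prev_end + gap // 2
            out[-1] = (prev_start, mid, prev_call)  # stretch left clone to midpoint
            start = mid                             # stretch right clone back to midpoint
        out.append((start, end, call))
    return out

# Hypothetical clones: a 50-base gap (extended) and a 750-base gap (left uncovered).
clones = [(0, 100, 1), (150, 250, 1), (1000, 1100, 0)]
print(extend_calls(clones, max_gap=100))
```

When the flanking calls agree, splitting at the midpoint is the same as filling the gap with that call. When they disagree, each call claims half of the gap.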

  13. Standardizing Genome Spacing

  14. Analysis • In an ideal situation we would believe every aberration call. • We would then ask the question: which aberrations occur concordantly across samples? • This is where the STAC statistic helps us out.

  15. Finding a reasonable cutoff • For cutoff SD=1, we are definitely picking up false signal. • For cutoff SD=6, we are likely missing true signal. • Looking one slide at a time, it is hard to tell what is a reasonable cutoff. [Figure: a single array with calls made at 11 different cutoff values.]

  16. 6 normals, 15 tumor samples, in parallel for 11 values of the SD cutoff: 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0

  17. [Figure panels: High Cutoff, Middle Cutoff, Low Cutoff]

  18. Methodology • Avoid making a decision on cutoffs. • Calculate significance at a range of cutoff values, using STAC at each cutoff. • Combine results using multiple testing correction. [Figure labels: Start Point, End Point, Percent Aberration, SD Cutoff Values (Less Conservative to More Conservative).]
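The combination step can be sketched as below. The slides do not say which multiple testing correction is used, so this sketch assumes a Bonferroni adjustment of the smallest p-value across cutoffs at each location. The p-values themselves would come from running STAC once per cutoff, and the numbers here are placeholders. The function name is hypothetical.

```python
import numpy as np

def combine_across_cutoffs(pvalues):
    """Combine per-cutoff p-values into one adjusted p-value per location.

    pvalues: array-like of shape (n_cutoffs, n_locations).
    Takes the minimum p-value over cutoffs and multiplies it by the number
    of cutoffs (Bonferroni, an assumed choice), capped at 1.
    """
    p = np.asarray(pvalues, dtype=float)
    n_cutoffs = p.shape[0]
    return np.minimum(p.min(axis=0) * n_cutoffs, 1.0)

# Placeholder p-values for 3 cutoffs (rows) at 2 locations (columns).
pvalues = [
    [0.001, 0.5],
    [0.010, 0.2],
    [0.200, 0.9],
]
print(combine_across_cutoffs(pvalues))
```

With this choice, a location is reported only if it is significant at some cutoff after paying for having tried every cutoff, so no single cutoff has to be chosen in advance.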

  19. Results • Chromosome 8 important in breast cancer. • Provides fine resolution of aberration. • Rather than simply providing gross changes. • Able to characterize aberration at the resolution of the array. • Able to characterize important regions. • Myc, FGFR, etc. • Other regions previously uncharacterized.

  20. ChARM: Chromosome 8

  21. CBS: Chromosome 8

  22. MSA: Chromosome 8 • Able to characterize a 1 Mb amplification of the FGFR oncogene • All single-slide methods missed this. • Able to pick up the Myc oncogene amplification • Single-slide methods missed it despite its presence in every sample. • Also characterizes other regions. • Some of these regions the single-slide methods were able to detect • Detected other smaller regions of aberration • Allows finer resolution mapping • Smaller regions are either missed or clumped together into larger regions of aberration. [Figure labels: FGFR, MYC] Note: We are working on adding an implementation of the CBS algorithm to MSA so that its single-slide approach can be used within our multiple sample approach.

  23. Discussion • To our knowledge, there are no methods that combine preprocessing and analysis harnessing the power of multiple samples. • Because most methods are single array methods, integration between experiments is difficult to define. • MSA provides statistical analysis at higher resolution. • MSA works with “difficult” data: based on the Pinkel and Albertson scale of difficulty, our method has been tested, and works well, with 5/6 criteria.

  24. Future Plans • Handle Affymetrix SNP Chip data. • Many of the ideas for leveraging multiple samples should also apply to the analysis of Affy SNP data. • We are currently working on this extension. • Release stand-alone GUI software package (CGH-MSA). • To be released this month. • www.cbil.upenn.edu/MSA • Incorporate single-slide methods. • Extend the STAC algorithm beyond binary data to account for levels of change. • Estimate bias in non-controlled experiments.
