“alexes of all nations unite!”
Epicenter Analysis in Cancer
Alex Krasnitz, CSHL
Search and knowledge building for biological datasets, UCLA, 11.26-30, 2007
• Input: segmented data from (ROMA) CGH.
• A predictive signal: a whole-genome biomarker for survival.
• Pinning algorithm.
• Pins find cancer genes.
• Pins predict tissue of origin.
• Pins and progression.
(ROMA) CGH in vitro and in silico: a method for measuring relative copy numbers of short fragments in a genome. ROMA (Representational Oligonucleotide Microarray Analysis) is a multistep process consisting of:
In vitro
• Digestion with a restriction enzyme (BglII)
• PCR → short (0.2-1.2 kb) fragments are selected
• Hybridization to an oligonucleotide (50-mer probes) microarray (85K probe format used in the present study; higher-resolution work in progress)
In silico
• Gridding
• Normalization
• Segmentation
• Thresholding
• CNP masking
• Horizontal slicing
Raw and segmented ROMA profile; FISH validation of copy number variations detected by ROMA. Segmentation algorithm (B. Lakshmi, M. Wigler): replace a raw profile by a piecewise-constant function that minimizes variance.
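The segmentation idea can be illustrated with a classic dynamic program that fits a piecewise-constant function with a fixed number of segments while minimizing the total within-segment sum of squares. This is a generic sketch, not the Lakshmi-Wigler algorithm itself; the function name `segment_min_variance` and the choice to fix the number of segments in advance are assumptions made for illustration.

```python
import math


def segment_min_variance(values, n_segments):
    """Best piecewise-constant fit of `values` with `n_segments` pieces.

    Returns (cuts, means): cuts are half-open index ranges (i, j), means are
    the segment levels. The total within-segment sum of squares is minimized.
    """
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("need 1 <= n_segments <= len(values)")
    # Prefix sums give the sum of squares of any run in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i, j):
        """Sum of squared deviations of values[i:j] from their mean."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # cost[k][j]: best fit of values[:j] with k segments; back[k][j]: last cut.
    cost = [[math.inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    # Follow the back-pointers from the end of the profile to recover the cuts.
    cuts, j = [], n
    for k in range(n_segments, 0, -1):
        i = back[k][j]
        cuts.append((i, j))
        j = i
    cuts.reverse()
    means = [(s[j] - s[i]) / (j - i) for i, j in cuts]
    return cuts, means
```

For example, `segment_min_variance([0, 0, 0, 1, 1, 1, 0, 0], 3)` returns the cuts `[(0, 3), (3, 6), (6, 8)]` with levels `[0.0, 1.0, 0.0]`. A practical segmenter also has to decide how many segments to use; that rule is not given on the slide and is left out here.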
Cancer-free female vs. cancer-free reference male: CNPs (copy number polymorphisms) and 'ROMA' SNPs are genetic markers, observed in heterozygous and homozygous form.
Typical tumor genomes are NOT normal. Still, they may contain CNPs that must be filtered out.
Genomic rearrangements in cancer (Bayani et al., Seminars in Cancer Biology 17, 5, 2007)
CNP masking: determine positions of frequent CNPs from a set of cancer-free genomes (~500 cases); excise these from cancer profiles in a minimally intrusive fashion.
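A minimal sketch of CNP masking, under assumptions not stated on the slide: events are half-open probe-index ranges on a single chromosome, a probe is masked when it is covered by an event in at least `min_freq` of the cancer-free genomes, and a cancer event is dropped when at least `max_masked_frac` of its probes are masked. Both thresholds and the function names are hypothetical; the slide's "minimally intrusive" excision may work differently, for instance by trimming events rather than dropping them.

```python
def cnp_mask(normal_events, n_probes, n_normals, min_freq=0.01):
    """Flag probes covered by an event in >= min_freq of cancer-free genomes.

    normal_events: iterable of (sample_id, start, end), half-open probe ranges.
    """
    carriers = [set() for _ in range(n_probes)]
    for sample, start, end in normal_events:
        for p in range(start, end):
            carriers[p].add(sample)  # count each normal genome once per probe
    return [len(c) / n_normals >= min_freq for c in carriers]


def mask_events(cancer_events, mask, max_masked_frac=0.5):
    """Keep only cancer events (start, end) that are not mostly CNP probes."""
    kept = []
    for start, end in cancer_events:
        masked = sum(mask[start:end])  # number of masked probes in the event
        if masked / (end - start) < max_masked_frac:
            kept.append((start, end))
    return kept
```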
Event identification: horizontal slicing
• Allow multiple events at a locus in a profile.
• Select vertically non-overlapping segments of maximal total length. These define tiers.
• Assign each remaining segment to the closest tier.
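One reading of the tier-selection step is weighted interval scheduling: among the segments of a profile, choose pairwise non-overlapping ones whose total length is maximal. The sketch below covers only that step, on half-open (start, end) coordinates; how the remaining segments are assigned to the closest tier is not specified closely enough to reproduce, and the function name is hypothetical.

```python
import bisect


def max_length_tier(segments):
    """Non-overlapping half-open segments (start, end) of maximal total length."""
    segs = sorted(segments, key=lambda seg: seg[1])  # sort by end coordinate
    ends = [end for _, end in segs]
    n = len(segs)
    best = [0] * (n + 1)  # best[j]: optimum using only the first j segments
    take = [False] * (n + 1)
    for j in range(1, n + 1):
        start, end = segs[j - 1]
        # p = number of earlier segments that end at or before this start
        p = bisect.bisect_right(ends, start, 0, j - 1)
        with_j = best[p] + (end - start)
        if with_j > best[j - 1]:
            best[j], take[j] = with_j, True
        else:
            best[j] = best[j - 1]
    # Trace back which segments were taken.
    tier, j = [], n
    while j > 0:
        if take[j]:
            start, _ = segs[j - 1]
            tier.append(segs[j - 1])
            j = bisect.bisect_right(ends, start, 0, j - 1)
        else:
            j -= 1
    return sorted(tier)
```

For instance, from `[(0, 10), (5, 7), (8, 20), (10, 12)]` the sketch picks `[(5, 7), (8, 20)]` (total length 14) rather than `[(0, 10), (10, 12)]` (total length 12).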
Breast cancer study
• 257 frozen tissue samples of Scandinavian origin (140 Swedish, 117 Norwegian).
• Accompanied by clinical documentation.
* Progesterone (PR) and estrogen (ER) receptors measured by ligand binding; positive: > 0.5 fg/mg protein.
+ ERBB2 amplification scored by ROMA as a segmented ratio more than 0.1 above baseline.
A heuristic classification of breast cancer profiles:
• Simplex: small number of events, overall and per chromosome.
• Sawtooth: multiple events, no clustering.
• Firestorm: multiple clustered events.
Fisher's exact test: the simplex/sawtooth/firestorm class shows a strong association with survival and no association with any clinical parameter except age at diagnosis. Initial observation: firestorms lead to poor survival. Quantify the presence of firestorms by an index F, the sum over inverse average lengths of adjacent segments: F = Σ_i 2/(L_i + L_{i+1}), where L_i and L_{i+1} are the lengths of adjacent segments. Is F a predictor of survival, and if so, is it independent of clinical parameters?
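One way to compute F, reading "sum over inverse average lengths of adjacent segments" as F = Σ 2/(L_i + L_{i+1}) over pairs of adjacent segments on the same chromosome. The length units (e.g. Mb), the restriction to within-chromosome pairs, and the function name are assumptions; a profile broken into many short segments yields a large F.

```python
def firestorm_index(segment_lengths_by_chrom):
    """Sum of inverse average lengths over adjacent segment pairs.

    segment_lengths_by_chrom: one list of segment lengths per chromosome,
    in genomic order. Pairs never span two chromosomes.
    """
    total = 0.0
    for lengths in segment_lengths_by_chrom:
        for a, b in zip(lengths, lengths[1:]):
            total += 1.0 / ((a + b) / 2.0)  # inverse of the pair's mean length
    return total
```

For two chromosomes with segment lengths `[1, 3]` and `[2, 2, 4]` this gives 0.5 + 0.5 + 1/3 ≈ 1.333; a chromosome with a single segment contributes nothing.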
Kaplan-Meier (KM) plots for the Swedish diploid subset (no significant change when adjusted for age at diagnosis)
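The KM curves are Kaplan-Meier survival estimates. A minimal sketch of the standard estimator follows, assuming follow-up times paired with a 0/1 indicator (1 = death observed, 0 = censored); how the patients were split into the plotted groups is not specified here, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death is observed."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = n_at_t = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= n_at_t  # deaths and censorings both leave the risk set
    return curve
```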
Search for epicenters
• Key assumption: observed amplifications and deletions are more likely than not to confer a selective advantage upon a neoplastic cell.
• If so, expect frequently amplified regions of the genome to be enriched in oncogenes.
• Require methods for detecting such regions.
• Frequency plot inadequate.
Potential Benefits
• Massive data reduction (O(10^5) probes to ~100 epicenters); a manageable set of predictors
• Disentanglement
• Target selection for functional studies (cancer gene finding)
Pinning
• Consider the smallest unit of the genome that contains all of its events (a chromosome).
• For a given N, find N positions within that unit that best explain the observed set of (amplification or deletion) events, i.e., N positions such that the highest possible number k(N) of events contain at least one of them.
• Multiple solutions occur, either due to a “fuzzy pin” or due to N being too low.
• Increment N until the increment I(N) = k(N) - k(N-1) reaches a pre-set minimal value. Note that I(N) is a non-increasing function of N.
• Pinning is convergent: it is guaranteed to recover the epicenters given enough data.
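The pinning loop can be sketched as follows, assuming events are closed intervals (start, end) on one chromosome and that candidate pin positions are the event break points. Restricting pins to break points loses nothing: a pin can always be slid to the nearest right end of an event containing it without unpinning any event. The threshold `min_gain`, the cap `max_pins`, and all function names are hypothetical; here N grows while I(N) = k(N) - k(N-1) stays at or above `min_gain`.

```python
from itertools import combinations


def pinned_count(events, pins):
    """Number of events (closed intervals) containing at least one pin."""
    return sum(any(s <= p <= e for p in pins) for s, e in events)


def best_pins(events, n):
    """Exhaustively find k(n) and every n-pin configuration achieving it."""
    candidates = sorted({x for ev in events for x in ev})  # break points
    best_k, best_sets = -1, []
    for pins in combinations(candidates, n):
        k = pinned_count(events, pins)
        if k > best_k:
            best_k, best_sets = k, [pins]
        elif k == best_k:
            best_sets.append(pins)  # ties reveal "fuzzy" pins
    return best_k, best_sets


def pin_chromosome(events, min_gain=2, max_pins=5):
    """Return (N, k(N), optimal configurations) under the stopping rule."""
    n_chosen, k_prev, sets_chosen = 0, 0, [()]
    for n in range(1, max_pins + 1):
        k, sets = best_pins(events, n)
        if k - k_prev < min_gain:  # I(N) fell below the preset minimum
            break
        n_chosen, k_prev, sets_chosen = n, k, sets
    return n_chosen, k_prev, sets_chosen
```

For example, with events `[(0, 2), (1, 3), (2, 4), (10, 12), (11, 13), (20, 21)]` and `min_gain=2`, the sketch stops at N = 2 with k(2) = 5 and returns two optimal configurations, `(2, 11)` and `(2, 12)`: the second pin is "fuzzy".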
Greedy pinning is not optimal
Figure: greedy, N=2 (5 out of 6 events pinned); non-greedy, N=2 (6 out of 6).
• Required: exhaustive enumeration of all possible N-pin configurations.
• Pin positions: a fixed grid or determined by break points in the data.
• In the present data set: up to 5 pins per chromosome, O(100) pin positions.
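The six events below are a hypothetical configuration (not the one in the slide's figure) built to reproduce the same effect: the greedy procedure first takes the single most-shared position and ends up pinning 5 of 6 events with N = 2, while exhaustive enumeration of all 2-pin configurations pins all 6. Events are closed intervals and candidate pins are break points; the function names are illustrative.

```python
from itertools import combinations


def covered(events, pins):
    """Indices of events (closed intervals) that contain at least one pin."""
    return {i for i, (s, e) in enumerate(events) if any(s <= p <= e for p in pins)}


def greedy_pins(events, n):
    """Add pins one at a time, each time maximizing the number pinned so far."""
    candidates = sorted({x for ev in events for x in ev})
    pins = []
    for _ in range(n):
        pins.append(max(candidates, key=lambda c: len(covered(events, pins + [c]))))
    return pins, len(covered(events, pins))


def exhaustive_pins(events, n):
    """Try every n-pin configuration and keep the best one."""
    candidates = sorted({x for ev in events for x in ev})
    best = max(combinations(candidates, n), key=lambda ps: len(covered(events, ps)))
    return list(best), len(covered(events, best))


events = [(0, 2), (1, 5), (2, 5), (5, 8), (5, 9), (8, 10)]
print(greedy_pins(events, 2))      # ([5, 0], 5)
print(exhaustive_pins(events, 2))  # ([2, 8], 6)
```

Position 5 is shared by four events, so greedy takes it first; the two remaining events, (0, 2) and (8, 10), are disjoint, and one more pin can rescue only one of them.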
Test of significance
• For the optimal N-pin solutions determine the event score k(N) and the gain I(N) = k(N) - k(N-1).
• Perform multiple whole-genome shuffles of the events, including those of the opposite sign. For each shuffle find its I(N). Estimate a p-value by comparison to the true I(N).
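A minimal permutation-test sketch, simplified to a single chromosome of length `length`: each event is moved to a uniformly random position with its length preserved, and the gain I(N) = k(N) - k(N-1) is recomputed by exhaustive search. Unlike the slide's procedure, it neither shuffles across the whole genome nor includes opposite-sign events; the function names, the number of shuffles, and the add-one p-value convention are assumptions.

```python
import random
from itertools import combinations


def k_opt(events, n):
    """k(n): most events (closed intervals) pinned by n pins at break points."""
    if n == 0:
        return 0
    candidates = sorted({x for ev in events for x in ev})
    if n >= len(candidates):  # enough pins to hit every event
        return len(events)
    return max(sum(any(s <= p <= e for p in pins) for s, e in events)
               for pins in combinations(candidates, n))


def gain(events, n):
    """I(n) = k(n) - k(n - 1)."""
    return k_opt(events, n) - k_opt(events, n - 1)


def shuffle_events(events, length, rng):
    """Place each event uniformly at random in [0, length], keeping its width."""
    out = []
    for s, e in events:
        width = e - s
        start = rng.randint(0, length - width)
        out.append((start, start + width))
    return out


def pin_pvalue(events, n, length, n_shuffles=200, seed=0):
    """Observed I(n) and the fraction of shuffles reaching it (add-one rule)."""
    rng = random.Random(seed)
    observed = gain(events, n)
    hits = sum(gain(shuffle_events(events, length, rng), n) >= observed
               for _ in range(n_shuffles))
    return observed, (hits + 1) / (n_shuffles + 1)
```

Exhaustive k(N) grows combinatorially with N, which is one reason to cap the number of pins per chromosome and the number of candidate positions, as the previous slide does.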
Interpretation of results: consider only the top-scoring pin configurations. Then, for pin #i in a top-scoring configuration, compute at each coordinate x the score S_i(x) = Σ_e 1/L_e, where the sum runs over the events e that are pinned by #i and contain x, and L_e is the length of event e. Example: 17q, 5 pins
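A sketch of the per-pin profile S_i(x) = Σ 1/L_e, summed over events e pinned by pin #i that contain x, with L_e the length of e. It assumes events are closed intervals of integer probe indices (so L_e = end - start + 1) and that an event counts as pinned by any pin it contains; both conventions and the function name are hypothetical. Short events contribute most, so the profile peaks where narrow events overlap.

```python
def pin_profile(events, pin, positions):
    """S(x) = sum of 1/length over events containing both `pin` and x."""
    pinned = [(s, e) for s, e in events if s <= pin <= e]
    return [sum(1.0 / (e - s + 1) for s, e in pinned if s <= x <= e)
            for x in positions]
```

With events `[(0, 9), (4, 5), (20, 25)]` and a pin at 5, the profile is 0.1 over positions 0-3 and 6-9, rises to 0.6 at positions 4-5, and the event at 20-25 is ignored because it is not pinned.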
Lung cancer deletions: known tumor suppressors and novel elements (213 cases, courtesy S. Powers)
Estimates of utility
• Goal: select the most promising 10% of the genome to focus functional studies on.
• Is pinning useful in this sense?
• A test: how enriched is the top-scoring 10% quantile in known genetic elements implicated in breast cancer?
• We hit major known oncogenes, so we can expect good results. More formally, perform a database search (top 10%, 17q).
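A sketch of the enrichment test, assuming the genome is divided into bins with one pinning score each and that known breast cancer elements are given as bin indices: take the top 10% of bins by score, compare the fraction of known elements falling there with the 10% expected by chance, and compute a one-sided hypergeometric p-value. The binning, the function name, and the choice of test are assumptions; the slide's own evaluation was a database search.

```python
from math import comb


def top_quantile_enrichment(scores, known_bins, q=0.10):
    """Return (hits, enrichment ratio, one-sided hypergeometric p-value)."""
    n = len(scores)
    n_top = max(1, round(q * n))
    top = set(sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_top])
    known = set(known_bins)
    k = len(known)
    hits = len(top & known)
    enrichment = (hits / k) / (n_top / n)  # observed vs. expected fraction
    # P(X >= hits), X ~ Hypergeometric(n bins, k known, n_top drawn)
    tail = sum(comb(k, x) * comb(n - k, n_top - x)
               for x in range(hits, min(k, n_top) + 1))
    return hits, enrichment, tail / comb(n, n_top)
```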
Gene Enrichment
Epicenters are enriched in (CCDS) genes compared to the genome and to the copy number events because (a) epicenters bracket genes and (b) genes are clustered.
Application: predicting tissue of origin
Random forest classifier using joint sets of epicenters as predictors.
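Before any classifier can be trained, each tumor has to be turned into a vector of epicenter-based predictors. A sketch, assuming an epicenter is a (chromosome, position, sign) triple and a feature is 1 when the tumor has an event of the same sign covering that position; this encoding and the function name are hypothetical.

```python
def epicenter_features(samples, epicenters):
    """0/1 feature rows, one per tumor, one column per epicenter.

    samples: list of event lists, each event (chrom, start, end, sign);
    epicenters: list of (chrom, position, sign).
    """
    rows = []
    for events in samples:
        rows.append([
            int(any(c == ec and s <= pos <= e and sg == esg
                    for c, s, e, sg in events))
            for ec, pos, esg in epicenters
        ])
    return rows
```

The resulting rows, together with tissue labels, could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`.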
Application: early events in breast cancer Compute frequency weighted by inverse number of events for contiguous groups of epicenters. Outliers: FISH-validated early 16p-1q translocation.
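A sketch of the weighted frequency, assuming each tumor contributes 1/(its total number of events) to every group of contiguous epicenters that one of its events overlaps, and that the sum is divided by the number of tumors, so tumors with few events weigh more. The group encoding and the function name are hypothetical.

```python
def weighted_group_frequency(samples, groups):
    """Inverse-event-count weighted frequency for each epicenter group.

    samples: list of event lists, each event (chrom, start, end);
    groups: list of (chrom, first_pos, last_pos) spans of contiguous epicenters.
    """
    freq = [0.0] * len(groups)
    for events in samples:
        if not events:
            continue  # a tumor without events hits no group
        w = 1.0 / len(events)
        for g, (gc, g0, g1) in enumerate(groups):
            if any(c == gc and s <= g1 and e >= g0 for c, s, e in events):
                freq[g] += w
    return [f / len(samples) for f in freq]
```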
Summary
• Pinning is a method for finding copy number variation epicenters in (cancer) genomes.
• Applied to: a set of 257 FISH-validated breast cancer genome profiles; lung and colon cancer sets.
• The epicenters found by pinning are significantly enriched in genes.
• Epicenters predict tissue of origin.
• Epicenters detect early lesions.
ROMA-based Cancer Biology at CSHL: Mike Wigler, Jim Hicks, Rob Lucito, Scott Powers, David Mu
FISH Primer Selection Program & Probes: Nicholas Navin
ROMA: Michael Riggs, Diane Esposito, Joan Alexander, Jen Troge, Evan Leibu
Bioinformatics: Lakshmi Muthuswamy, Boris Yamrom, AK, Vlad Grubor, Yoon-Ha Lee, Tony Leotta, Jude Kendall, Deepa Pai, Andy Reiner, John Healy
FISH (Karolinska): Susanne Maner, Par Lundin
Statistics: Xiaoyue Zhao, Chris Yoon
FACS/Database: Linda Rodgers
Collaborators: Anders Zetterberg (Karolinska Inst.), Anne-Lise Børresen-Dale (Norway Radium Hosp.), Kenny Ye (Albert Einstein Sch. Med.), Thea Tlsty (UCSF), Larry Norton (MSKCC)