1 / 30

Algorithms for Smoothing Array CGH data

Algorithms for Smoothing Array CGH data. Kees Jong (VU, CS and Mathematics) Elena Marchiori (VU, Computer Science) Aad van der Vaart (VU, Mathematics) Gerrit Meijer (VUMC) Bauke Ylstra (VUMC) Marjan Weiss (VUMC). Tumor Cell. Chromosomes of tumor cell:. CGH Data.  C o p y #.

bert-terry
Download Presentation

Algorithms for Smoothing Array CGH data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Algorithms forSmoothing Array CGH data Kees Jong (VU, CS and Mathematics) Elena Marchiori (VU, Computer Science) Aad van der Vaart (VU, Mathematics) Gerrit Meijer (VUMC) Bauke Ylstra (VUMC) Marjan Weiss (VUMC)

  2. Tumor Cell Chromosomes of tumor cell:

  3. CGH Data  C o p y # Clones/Chromosomes 

  4. Naïve Smoothing

  5. “Discrete” Smoothing Copy numbers are integers

  6. Why Smoothing ? • Noise reduction • Detection of Loss, Normal, Gain, Amplification • Breakpoint analysis • Recurrent (over tumors) aberrations may indicate: • an oncogene or • a tumor suppressor gene

  7. Is Smoothing Easy? Measurements are relative to a reference sample Printing, labeling and hybridization may be uneven Tumor sample is inhomogeneous • vertical scale is relative • do expect only few levels

  8. Smoothing: example

  9. Problem Formalization A smoothing can be described by • a number of breakpoints • corresponding levels A fitness function scores each smoothing according to fitness to the data An algorithm finds the smoothing with the highest fitness score.

  10. Smoothing breakpoints variance levels

  11. Fitness Function We assume that data are a realization of a Gaussian noise process and use the maximum likelihood criterion adjusted with a penalization term for taking into account model complexity We could use better models given insight in tumor pathogenesis

  12. Fitness Function (2) CGH values: x1 , ... , xn breakpoints: 0 < y1< … < yN < xN levels: m1, . . ., mN error variances: s12, . . ., sN2 likelihood:

  13. Fitness Function (3) Maximum likelihood estimators of μ and s2 can be found explicitly Need to add a penalty to log likelihood to control number N of breakpoints penalty

  14. Algorithms Maximizing Fitness is computationally hard Use genetic algorithm + local search to find approximation to the optimum

  15. Algorithms: Local Search choose N breakpoints at random while (improvement) - randomly select a breakpoint - move the breakpoint one position to left or to the right

  16. Genetic Algorithm Given a “population” of candidate smoothings create a new smoothing by - select two “parents” at random from population - generate “offspring” by combining parents (e.g. “uniform crossover” or “union”) - apply mutation to each offspring - apply local search to each offspring - replace the two worst individuals with the offspring

  17. Experiments • Comparison of • GLS • GLSo • Multi Start Local Search (mLS) • Multi Start Simulated Annealing (mSA) • GLS is significantly better than the other algorithms.

  18. Comparison to Expert algorithm expert

  19. Relating to Gene Expression

  20. Relating to Gene Expression

  21. Algorithms forSmoothing Array CGH data Kees Jong (VU, CS and Mathematics) Elena Marchiori (VU, CS) Aad van der Vaart (VU, Mathematics) Gerrit Meijer (VUMC) Bauke Ylstra (VUMC) Marjan Weiss (VUMC)

  22. Conclusion • Breakpoint identification as model fitting to search for most-likely-fit model given the data • Genetic algorithms + local search perform well • Results comparable to those produced by hand by the local expert • Future work: • Analyse the relationship between Chromosomal aberrations and Gene Expression

  23. Example of a-CGH Tumor  V a l u e Clones/Chromosomes 

  24. a-CGH DNA In Nucleus Same for every cell DNA on slide Measure Copy Number Variation Expression RNA In Cytoplasm Different per cell cDNA on slide Measure Gene Expression a-CGH vs. Expression

  25. Breakpoint Detection • Identify possibly damaged genes: • These genes will not be expressed anymore • Identify recurrent breakpoint locations: • Indicates fragile pieces of the chromosome • Accuracy is important: • Important genes may be located in a region with (recurrent) breakpoints

  26. Experiments • Both GAs are Robust: • Over different randomly initialized runs breakpoints are (mostly) placed on the same location • Both GAs Converge: • The “individuals” in the pool are very similar • Final result looks very much like (mean error = 0.0513) smoothing conducted by the local expert

  27. Genetic Algorithm 1 (GLS) initialize population of candidate solutions randomly while (termination criterion not satisfied) - select two parents using roulette wheel - generate offspring using uniform crossover - apply mutation to each offspring - apply local search to each offspring - replace the two worst individuals with the offspring

  28. Genetic Algorithm 2 (GLSo) initialize population of candidate solutions randomly while (termination criterion not satisfied) - select 2 parents using roulette wheel - generate offspring using OR crossover - apply local search to offspring - apply “join” to offspring - replace worst individual with offspring

  29. Fitness function (2) CGH values: x1 , ... , xn breakpoints: 0 < y1< … < yN < xN levels: m1, . . ., mN error variances: s12, . . ., sN2 likelihood:

More Related