Yanxin Shi, Fan Guo, Wei Wu, Eric P. Xing. GIMscan: A New Statistical Method for Analyzing Whole-Genome Array CGH Data. RECOMB 2007 Presentation.



  1. Yanxin Shi (1), Fan Guo (1), Wei Wu (2), Eric P. Xing (1). GIMscan: A New Statistical Method for Analyzing Whole-Genome Array CGH Data. RECOMB 2007 Presentation. (1) School of Computer Science, Carnegie Mellon University; (2) Division of Pulmonary, Allergy, and Critical Care Medicine, University of Pittsburgh

  2. Outline • Motivation and Background • Computational framework • Experiments and Results • Summary

  3. Copy number aberration and array CGH • DNA copy number (a.k.a. dosage state) • Normal: 2 DNA copies • Aberrations: deletion (0 copies), loss (1 copy), gain (3 copies), amplification (>3 copies) • Array CGH: a high-throughput method to measure DNA copy number

  4. Array CGH data • Ideally, the log2 ratio (LR) of test to reference (2-copy) DNA is: • Deletion (0 copies): LR = log2(0/2) = -∞ • Loss (1 copy): LR = log2(1/2) = -1 • Normal (2 copies): LR = log2(2/2) = 0 • Gain (3 copies): LR = log2(3/2) ≈ 0.58 • Amplification (≥4 copies): LR ≥ log2(4/2) = 1
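The ideal values above follow directly from defining LR as the base-2 log of test over reference copy number. A minimal Python sketch, assuming a diploid reference and noise-free measurements (the function name `ideal_log2_ratio` is hypothetical):

```python
import math


def ideal_log2_ratio(copies: int, reference_copies: int = 2) -> float:
    """Noise-free aCGH log2 ratio of a test copy number against a diploid reference."""
    if copies == 0:
        # log2(0) diverges: a homozygous deletion has LR = -infinity
        return -math.inf
    return math.log2(copies / reference_copies)


for c in range(5):
    print(c, round(ideal_log2_ratio(c), 2))
```

Running it prints -inf, -1.0, 0.0, 0.58 and 1.0 for 0 through 4 copies, matching the list above.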

  5. However… • Factors influencing the LR values: • Impurity of the test sample (e.g., a mixture of normal and cancer cells) • Variation in hybridization efficiency • Base composition of different probes • Saturation of the array • Divergent sequence lengths of the clones • Many others… • Measurement noise, etc.
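Sample impurity, the first factor listed, can be quantified under a simple mixing assumption that is not taken from the slides: if a fraction `purity` of the cells are tumor cells carrying `tumor_copies` copies and the rest are normal diploid cells, the expected LR is log2((purity * tumor_copies + (1 - purity) * 2) / 2). The sketch below (hypothetical function name) shows how impurity pulls LRs toward 0:

```python
import math


def diluted_log2_ratio(tumor_copies: int, purity: float) -> float:
    """Expected LR when only a fraction `purity` of the cells are tumor cells.

    Simplified model (assumption): normal cells carry 2 copies, tumor cells
    carry `tumor_copies`, and the reference is diploid.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    mean_copies = purity * tumor_copies + (1.0 - purity) * 2.0
    if mean_copies == 0.0:
        return -math.inf  # pure sample with a homozygous deletion
    return math.log2(mean_copies / 2.0)


print(diluted_log2_ratio(0, 0.5))  # homozygous deletion at 50% purity
```

For example, at 50% purity a homozygous deletion yields an LR of -1, the ideal value of a single-copy loss, which is one reason fixed LR cutoffs can misclassify dosage states.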

  6. Segmental pattern and spatial drift [Figure: example LR profiles, panels labeled "Spatial drift" and "Segmental pattern"]

  7. Existing Computational Methods • Threshold Method • Mixture Models (e.g., Hodgson et al., 2001) • Assume observations are i.i.d. samples from a mixture distribution. • Regression Models (e.g., Hsu et al., 2005; Myers et al., 2004) • Smooth the data for visual inspection to detect copy number states. • Segmentation Models (e.g., Hupé et al., 2004) • Directly search for breakpoints in the sequential data. • Spatial Dynamics Models (e.g., Fridlyand et al., 2004)
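The threshold method, the simplest baseline here and one of the comparators in the simulation study, calls each probe independently by comparing its LR with fixed cutoffs. A minimal sketch; the cutoff values are hypothetical, chosen roughly midway between the ideal LRs of adjacent dosage states:

```python
from bisect import bisect_right

# Hypothetical cutoffs between adjacent dosage states; real analyses tune
# these to the platform and sample.
CUTOFFS = [-2.0, -0.5, 0.29, 0.79]
STATES = ["deletion", "loss", "normal", "gain", "amplification"]


def threshold_call(lr: float) -> str:
    """Assign a dosage state to one probe, ignoring its neighbours."""
    return STATES[bisect_right(CUTOFFS, lr)]


profile = [0.05, -0.1, -0.9, -1.2, 0.6, 1.4, -3.0]
print([threshold_call(x) for x in profile])
```

Because each probe is called in isolation, a single noisy probe can flip state, which is exactly what the spatial models below try to avoid.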

  8. Spatial Dynamic Methods • Hidden Markov Models • Dosage states form a Markov chain of hidden variables • Observed LRs are generated from state-specific Gaussian distributions [Figure: chain of hidden dosage states, each emitting an observed LR]
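Once an HMM of this form is fitted, a standard way to annotate a profile is Viterbi decoding of the most probable dosage-state path. The sketch below is illustrative only: the uniform initial distribution, the single shared emission variance `var`, and the `stay_prob` transition structure are assumptions, not details from the slides.

```python
import math


def gaussian_logpdf(y, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)


def viterbi(ys, means, var, stay_prob):
    """Most probable state path under a Gaussian-emission HMM (assumed form).

    Transitions keep the current state with probability stay_prob and
    otherwise move uniformly to one of the other states.
    """
    m = len(means)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (m - 1))
    # delta[k]: best log-probability of any path ending in state k
    delta = [-math.log(m) + gaussian_logpdf(ys[0], means[k], var) for k in range(m)]
    backptr = []
    for y in ys[1:]:
        new_delta, ptr = [], []
        for k in range(m):
            scores = [delta[j] + (log_stay if j == k else log_move) for j in range(m)]
            j_best = max(range(m), key=scores.__getitem__)
            ptr.append(j_best)
            new_delta.append(scores[j_best] + gaussian_logpdf(y, means[k], var))
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state
    state = max(range(m), key=delta.__getitem__)
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]


ys = [0.02, -0.05, 0.1, -0.95, -1.1, -0.9, 0.6, 0.55, 0.62, 0.0, 0.05, -0.02]
print(viterbi(ys, means=[-1.0, 0.0, 0.58], var=0.04, stay_prob=0.99))
```

Here states 0, 1 and 2 stand for loss, normal and gain; the fixed state means are precisely what the dosage-specific Kalman filters relax.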

  9. Dosage-Specific Kalman Filters • Introduce a hidden trajectory to model each state-specific LR distribution (the mean is no longer fixed) • Linear dynamics for dosage state m
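The slides do not reproduce the linear dynamics for dosage state m, so the sketch below assumes one common choice: a scalar trajectory that reverts toward a state-specific level `mu` with coefficient `a` and process-noise variance `b`, observed through noise of variance `r` (`b` and `r` echo the noise labels of the simulation study). It runs a standard Kalman filter for a single dosage state:

```python
def kalman_filter(ys, mu, a, b, r, p0=1.0):
    """Scalar Kalman filter for one dosage-specific trajectory.

    Assumed model (illustration only):
        x_t = mu + a * (x_{t-1} - mu) + w_t,  w_t ~ N(0, b)   # hidden trajectory
        y_t = x_t + v_t,                      v_t ~ N(0, r)   # observed LR
    with prior x_0 ~ N(mu, p0). Returns filtered means and variances of x_t.
    """
    mean, var = mu, p0
    means, variances = [], []
    for t, y in enumerate(ys):
        if t > 0:
            # Predict: push the previous estimate through the dynamics
            mean = mu + a * (mean - mu)
            var = a * a * var + b
        # Update: blend prediction and observation by their precisions
        gain = var / (var + r)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        means.append(mean)
        variances.append(var)
    return means, variances


means, variances = kalman_filter([1.0, 1.0], mu=0.0, a=1.0, b=0.0, r=1.0)
print(means, variances)
```

With `a=1` and `b=0` the trajectory is constant, so the filter reduces to a running precision-weighted average of the prior and the observations.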

  10. Switching Kalman Filters [Figure: trajectories 1 through M above the dosage state chain] • An SKF generates each observation from one of the M trajectories, selected by the dosage state chain.
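The generative story of an SKF can be written as a small simulator, similar in spirit to the "data generated from SKFs" used in the simulation analysis, although the actual simulation settings are not given here. Assumptions (not from the slides): each trajectory m follows x_t = mus[m] + a * (x_{t-1} - mus[m]) + N(0, b) at every probe, the state chain keeps its state with probability `stay_prob` and otherwise jumps uniformly to another state, and the observed LR is the active trajectory plus N(0, r) noise. The function name is hypothetical.

```python
import random


def simulate_skf(T, mus, a, b, r, stay_prob, seed=0):
    """Draw dosage states, all M trajectories, and observed LRs from an assumed SKF."""
    rng = random.Random(seed)
    M = len(mus)
    xs = list(mus)               # every trajectory starts at its state-specific level
    state = rng.randrange(M)
    states, trajs, ys = [], [], []
    for t in range(T):
        if t > 0:
            # Switching process: Markov chain over dosage states
            if rng.random() > stay_prob:
                state = rng.choice([k for k in range(M) if k != state])
            # All trajectories evolve, whether or not they are observed
            xs = [mu + a * (x - mu) + rng.gauss(0.0, b ** 0.5) for mu, x in zip(mus, xs)]
        states.append(state)
        trajs.append(list(xs))
        # The active trajectory emits the observation
        ys.append(xs[state] + rng.gauss(0.0, r ** 0.5))
    return states, trajs, ys


states, trajs, ys = simulate_skf(100, mus=[-1.0, 0.0, 0.58], a=0.9, b=0.001, r=0.01, stay_prob=0.95)
```

Because every trajectory keeps evolving while unobserved, an SKF can represent slow drift within a segment while still switching between dosage states.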

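  11. Posterior Inference • Dosage annotation is equivalent to estimating the posterior distribution of the dosage states given the observed LRs. • Recovering the hidden trajectories: estimating their posterior distribution given the observed LRs.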

  12. Variational Inference • Exact posterior inference is intractable. • Variational inference: decouple the hidden chains. • The decoupled chains have tractable distributions.
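Why exact inference is intractable, and what decoupling means, can be written out in notation the slides do not spell out (assumed here: s_t is the dosage state at probe t, x_t^{(m)} the value of trajectory m, y_t the observed LR, T probes, M states). Given a full switching path, the model is linear-Gaussian, so each conditional term below is Gaussian and the exact posterior is a mixture of M^T Gaussians (already 5^100 components for M = 5 and T = 100). The decoupled approximation drops the coupling between the state chain and the trajectories:

```latex
p\bigl(x^{(m)}_t \mid y_{1:T}\bigr)
  = \sum_{s_{1:T} \in \{1,\dots,M\}^T} p\bigl(s_{1:T} \mid y_{1:T}\bigr)\,
    p\bigl(x^{(m)}_t \mid s_{1:T}, y_{1:T}\bigr)

q\bigl(s_{1:T}, x^{(1)}_{1:T}, \dots, x^{(M)}_{1:T}\bigr)
  = q\bigl(s_{1:T}\bigr) \prod_{m=1}^{M} q\bigl(x^{(m)}_{1:T}\bigr)
```

Under q, the state factor is a single HMM chain and each trajectory factor is a single linear-Gaussian chain, so each can be handled by forward-backward or a Kalman smoother.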

  13. Variational Inference • Use this tractable distribution to approximate the true posterior by minimizing the KL divergence between them. • Fixed-point equations update the variational parameters.
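A compact sketch of such a fixed-point scheme, modeled on Ghahramani and Hinton's structured variational treatment of switching state-space models. The exact GIMscan updates are not shown on these slides; the scalar mean-reverting dynamics (`mus`, `a`, `b`), observation variance `r`, uniform initial state distribution and `stay_prob` transitions are assumptions, and all function names are hypothetical. Each iteration (1) runs a Kalman smoother for every trajectory, weighting probe t by its responsibility q(s_t = m), (2) turns each trajectory's expected squared error into an HMM log-likelihood, and (3) updates the responsibilities by forward-backward.

```python
import math


def _smooth_chain(ys, h, mu, a, b, r, p0):
    """Scalar RTS smoother for one trajectory; probe t is seen with precision h[t] / r."""
    T = len(ys)
    mp, pp, mf, pf = [0.0] * T, [0.0] * T, [0.0] * T, [0.0] * T
    for t in range(T):
        # Predict from the mean-reverting dynamics (prior N(mu, p0) at t = 0)
        mp[t] = mu if t == 0 else mu + a * (mf[t - 1] - mu)
        pp[t] = p0 if t == 0 else a * a * pf[t - 1] + b
        # Update: the gain shrinks to 0 as the responsibility h[t] goes to 0
        k = pp[t] * h[t] / (h[t] * pp[t] + r)
        mf[t] = mp[t] + k * (ys[t] - mp[t])
        pf[t] = (1.0 - k) * pp[t]
    ms, ps = mf[:], pf[:]
    for t in range(T - 2, -1, -1):  # backward (Rauch-Tung-Striebel) pass
        j = pf[t] * a / pp[t + 1]
        ms[t] = mf[t] + j * (ms[t + 1] - mp[t + 1])
        ps[t] = pf[t] + j * j * (ps[t + 1] - pp[t + 1])
    return ms, ps


def _forward_backward(loglik, stay_prob):
    """Posterior state marginals of an HMM given per-probe log-likelihoods."""
    T, M = len(loglik), len(loglik[0])
    trans = [[stay_prob if j == k else (1.0 - stay_prob) / (M - 1) for k in range(M)]
             for j in range(M)]
    # Rescale each probe's likelihoods by their maximum to avoid underflow
    em = [[math.exp(v - max(row)) for v in row] for row in loglik]
    alpha, prev = [], None
    for t in range(T):
        if t == 0:
            a_t = [em[0][k] / M for k in range(M)]  # uniform initial distribution
        else:
            a_t = [em[t][k] * sum(prev[j] * trans[j][k] for j in range(M)) for k in range(M)]
        z = sum(a_t)
        prev = [v / z for v in a_t]
        alpha.append(prev)
    beta = [[1.0] * M for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b_t = [sum(trans[j][k] * em[t + 1][k] * beta[t + 1][k] for k in range(M)) for j in range(M)]
        z = sum(b_t)
        beta[t] = [v / z for v in b_t]
    gamma = []
    for t in range(T):
        g = [alpha[t][m] * beta[t][m] for m in range(M)]
        z = sum(g)
        gamma.append([v / z for v in g])
    return gamma


def variational_skf(ys, mus, a, b, r, stay_prob, n_iter=20, p0=0.01):
    """Alternate updates of q(S) and each q(X^(m)) under q(S, X) = q(S) * prod_m q(X^(m))."""
    T, M = len(ys), len(mus)
    h = [[1.0 / M] * M for _ in range(T)]  # responsibilities q(s_t = m)
    for _ in range(n_iter):
        # 1) Smooth every trajectory, weighting probe t by its responsibility h[t][m]
        chains = [_smooth_chain(ys, [row[m] for row in h], mus[m], a, b, r, p0)
                  for m in range(M)]
        # 2) Expected squared error of each trajectory acts as an HMM log-likelihood
        loglik = [[-0.5 * ((ys[t] - chains[m][0][t]) ** 2 + chains[m][1][t]) / r
                   for m in range(M)] for t in range(T)]
        # 3) Refresh the dosage-state posterior with forward-backward
        h = _forward_backward(loglik, stay_prob)
    states = [max(range(M), key=row.__getitem__) for row in h]
    return states, h, chains
```

`states` gives the dosage annotation (the most probable state per probe) and `chains` holds the smoothed trajectory means and variances. On a noise-free toy profile such as `[0.0] * 10 + [-1.0] * 10 + [0.0] * 10` with `mus=[-1.0, 0.0, 0.58]`, `a=0.9`, `b=1e-4`, `r=0.04` and `stay_prob=0.95`, the middle block is labeled state 0 (loss) and the rest state 1 (normal).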

  14. Parameter Sharing • The CGH dataset contains whole-genome measurements for multiple individuals. • Chromosome-specific parameters shared across individuals: trajectory parameters • Individual-specific parameters shared across chromosomes: all other parameters, e.g., output noise variance
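Under such a sharing scheme, each parameter estimate pools data from every profile that shares it. As a hedged illustration of the individual-specific side only: an output-noise variance shared across one person's chromosomes could be estimated by pooling that person's squared residuals over all chromosomes. The residual input format and the function name are hypothetical, and a full EM update would also add the posterior variance of the trajectories.

```python
from collections import defaultdict


def pooled_noise_variance(residuals):
    """One output-noise variance per individual, pooled over all of its chromosomes.

    residuals: dict mapping (individual, chromosome) -> list of y_t - x_t residuals
    (hypothetical input; e.g. LRs minus smoothed trajectory means).
    """
    sq_sum, count = defaultdict(float), defaultdict(int)
    for (individual, _chromosome), res in residuals.items():
        sq_sum[individual] += sum(e * e for e in res)
        count[individual] += len(res)
    return {ind: sq_sum[ind] / count[ind] for ind in sq_sum}


residuals = {
    ("p1", "chr1"): [0.1, -0.1],
    ("p1", "chr2"): [0.2],
    ("p2", "chr1"): [0.0, 0.3],
}
print(pooled_noise_variance(residuals))
```

Trajectory parameters would be pooled the other way round: one estimate per chromosome, combining that chromosome's data from all individuals.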

  15. Experiment Design • Simulation analysis: • Data generated from SKFs. • Compared with: threshold, HMM. • aCGH profiles of 125 colorectal tumors (Nakao et al., 2004) • Case studies of 3 representative chromosomes. • Population analysis over 125 genomes

  16. Simulation Analysis (1) [Figure: performance of dosage state prediction; b = noise in the hidden dynamics, r = noise in the observations, M = 5]

  17. Simulation Analysis (2) [Figure panels: synthetic data; prediction by HMM; prediction by SKF]

  19. Real aCGH Profiles: Spatial Patterns Difficult for Conventional Methods (1) [Figure: flat-arch pattern]

  20. Real aCGH Profiles: Spatial Patterns Difficult for Conventional Methods (2) [Figure: step pattern]

  21. Real aCGH Profiles: Spatial Patterns Difficult for Conventional Methods (3) [Figure: spikes pattern]

  22. Population Analysis [Figure: frequency of dosage state alteration across 125 individuals. Red bars: copy number gain or amplification; blue bars: copy number loss or deletion; solid vertical lines: boundaries between chromosomes]
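The alteration frequency in such a plot is, per probe, the fraction of individuals whose dosage state is altered in a given direction. A minimal sketch, assuming each individual's calls are aligned on the same probes and coded as hypothetical integers 0 (deletion), 1 (loss), 2 (normal), 3 (gain) and 4 (amplification):

```python
def alteration_frequencies(calls):
    """Per-probe fraction of individuals with a gain/amplification or a loss/deletion.

    calls: list of per-individual state lists, all aligned on the same probes.
    """
    n = len(calls)
    n_probes = len(calls[0])
    gain_freq = [sum(1 for ind in calls if ind[j] >= 3) / n for j in range(n_probes)]
    loss_freq = [sum(1 for ind in calls if ind[j] <= 1) / n for j in range(n_probes)]
    return gain_freq, loss_freq


calls = [[2, 3, 1], [2, 4, 0], [2, 2, 1], [3, 2, 2]]
print(alteration_frequencies(calls))
```

The two lists correspond to the red and blue bars respectively.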

  23. Population Analysis [Figure: frequency of dosage state alteration on 2 chromosomes. Top panel: red squares, copy number gain; blue circles, copy number loss. Bottom panel: red squares, copy number amplification; blue circles, copy number deletion]

  24. Summary • SKF for whole-genome analysis of aCGH data. • SKF can capture variations in the hybridization efficiency. • Parameter sharing scheme for data integration. • Possible Extensions: • Gene expression concordance analysis • Incorporate information about sequence length and distance between clones

  25. Thank you!

  26. Population Analysis [Figure: detailed spectrum of GIM rates over 125 colorectal cancer patients in 4 hotspot regions, annotated with cancer-related genes]

  27. • M is selected by AIC. • We have also run experiments comparing SKF with segmentation methods (results not shown here).
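AIC trades goodness of fit against model size: AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood, and the M with the smallest AIC is chosen. A sketch with made-up numbers (the log-likelihoods and parameter counts below are hypothetical, not results from this study):

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood


def select_num_states(fits):
    """fits: dict mapping M -> (maximized log-likelihood, number of free parameters)."""
    return min(fits, key=lambda m: aic(*fits[m]))


# Hypothetical fits for M = 3, 4, 5 dosage states
fits = {3: (-1200.0, 12), 4: (-1150.0, 17), 5: (-1145.0, 23)}
print(select_num_states(fits))
```

For these numbers the AIC values are 2424, 2334 and 2336, so M = 4 is selected: the jump to M = 5 improves the fit too little to pay for its extra parameters.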

  28. Switching Kalman Filters • An SKF generates observations from one of the trajectories. • The dosage state sequence is the switching process, as in an HMM. • The observed variables are the LRs.
