
Modelling of CGH arrays experiments

Philippe Broët, Faculté de Médecine, Université de Paris-XI; Sylvia Richardson, Imperial College London. Modelling of CGH arrays experiments (CGH = Comparative Genomic Hybridization). Outline: background; mixture model with spatial allocations; performance, comparison with CGH-Miner.


Presentation Transcript


  1. Philippe Broët, Faculté de Médecine, Université de Paris-XI; Sylvia Richardson, Imperial College London. Modelling of CGH arrays experiments. CGH = Comparative Genomic Hybridization

  2. Outline • Background • Mixture model with spatial allocations • Performance, comparison with CGH-Miner • Analyses of CGH-array cancer data sets • Extensions

  3. Aim: study genomic alterations in oncology [Figure: loss of a tumor suppressor gene; gain of an oncogene] The development of solid tumors is associated with the acquisition of complex genetic alterations that modify normal cell growth and survival. Many of these changes involve gains and/or losses of parts of the genome: amplification of an oncogene or deletion of a tumor suppressor gene is considered an important mechanism of tumorigenesis.

  4. [Figure: case and control samples] • CGH = Comparative Genomic Hybridization • Array containing short DNA sequences bound to a glass slide • Fluorescently labelled normal and pathological samples co-hybridised to the array • Protocol: 1. DNA extraction • 2. Labelling (fluorescent dyes) • 3. Co-hybridization • 4. Scanning

  5. Once hybridization has been performed, the signal intensities of the fluorophores are quantified. This provides a means to measure DNA copy-number alterations quantitatively and to map them directly onto the genomic sequence.

  6. MCF7 cell line investigated in Pollack et al. (2002): 23 chromosomes and 6691 cDNA sequences. Data are log transformed: difference between MCF7 and reference.

  7. Types of alterations observed • (Single) gain or deletion of sequences, occurring in contiguous regions: low-level changes in the ratio, expected at ± log 2 but attenuated (dye bias) → observed ratio ≈ ± 0.4 • Multiple gains (small regions): high-level change, easy to pick up • The modelling focuses on the first, common type of alteration

  8. [Figure: chromosome 1 profile with regions marked "Multiple gains?", "Deletion?", "Normal?"]

  9. 2 -- Mixture model

  10. Specificity of CGH array experiments • A priori biological knowledge from conventional CGH: • Limited number of states for a genomic sequence: presence (modal), deletion, gain(s) • These states correspond to different intensity ratios on the array → mixture model to capture the underlying discrete states • Genomic sequences (GS) located contiguously on chromosomes are likely to carry alterations of the same type → use clone spatial location in the allocation model • Hence: a 3-component mixture model with spatial allocation

  11. Mixture model For chromosome k: $Z_{gk}$ is the log ratio of the tumoral versus normal measurement for genomic sequence (GS) g on chromosome k. Dye bias is estimated using a reference array (normal/normal) and then subtracted from $Z_{gk}$. $Z_{gk} \sim w_{1gk} N(\mu_1, \sigma_1^2) + w_{2gk} N(\mu_2, \sigma_2^2) + w_{3gk} N(\mu_3, \sigma_3^2)$, with components 1 = deletion, 2 = presence (modal), 3 = gain. For unique labelling: $\mu_1 < 0$, $\mu_3 > 0$, and $\mu_2 = 0$ (dye bias has been adjusted).
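
A minimal Python sketch of the observation model on slide 11, assuming NumPy is available. The dye bias is taken here to be the mean log ratio of the normal/normal reference array, which is an assumption (the slide only says the bias is estimated from that array), and all parameter values are made up for illustration.

```python
import numpy as np

def normal_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at z (broadcasts over arrays)."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def adjust_dye_bias(z_case, z_reference):
    """Subtract the dye bias estimated from a normal/normal reference array.
    Assumption: the bias is the mean log ratio of the reference array."""
    return z_case - np.mean(z_reference)

def mixture_density(z, w, mu, sigma):
    """Three-component mixture density for each GS on one chromosome.
    z: (G,) adjusted log ratios; w: (G, 3) weights per GS (rows sum to 1);
    mu, sigma: (3,) means and standard deviations, ordered
    1 = deletion, 2 = presence (modal), 3 = gain."""
    dens = normal_pdf(z[:, None], mu[None, :], sigma[None, :])  # shape (G, 3)
    return np.sum(w * dens, axis=1)

# Illustrative values respecting the labelling mu1 < 0, mu2 = 0, mu3 > 0
mu = np.array([-0.4, 0.0, 0.4])
sigma = np.array([0.3, 0.3, 0.3])
z = adjust_dye_bias(np.array([-0.45, 0.05, 0.50]), np.array([0.05, -0.05, 0.0]))
w = np.full((3, 3), 1.0 / 3.0)  # equal weights, before any spatial structure
density = mixture_density(z, w, mu, sigma)
```

In the model the weights $w_{cgk}$ are not fixed as here but vary along the chromosome, as described on the next slide.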

  12. [Figure: GS g-1, g, g+1 along the chromosome; g-1 and g+1 are the spatial neighbours of GS g] Mixture model with spatial allocation • $Z_{gk} \sim w_{1gk} N(\mu_1, \sigma_1^2) + w_{2gk} N(\mu_2, \sigma_2^2) + w_{3gk} N(\mu_3, \sigma_3^2)$ • Spatial structure on the weights (cf. Fernandez and Green, 2002): • Introduce 3 centred Markov random fields $\{u_{mgk}\}$, m = 1, 2, 3, with nearest neighbours along the chromosomes • Define the mixture proportions to depend on chromosomal location via a logistic model: • $w_{cgk} = \exp(u_{cgk}) / \sum_m \exp(u_{mgk})$ • This favours allocation of nearby GS to the same component
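
The logistic (softmax) link from the three fields to the weights can be sketched in NumPy as below. The field values are invented to show the smoothing effect: where the gain field is high on contiguous GS, those GS share a high gain weight. Subtracting the column maximum is a standard numerical-stability trick and does not change the weights.

```python
import numpy as np

def allocation_weights(u):
    """Map fields u[m, g] (rows m = deletion, modal, gain) to mixture weights
    w[c, g] = exp(u[c, g]) / sum_m exp(u[m, g])."""
    u = np.asarray(u, dtype=float)
    shifted = u - u.max(axis=0, keepdims=True)  # stabilise exp; weights unchanged
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)

# Hypothetical field values for 5 contiguous GS on one chromosome
u = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],  # deletion field
    [1.0, 1.0, 0.0, 0.0, 1.0],  # presence (modal) field
    [0.0, 0.0, 2.0, 2.0, 0.0],  # gain field: high at GS 3 and 4
])
w = allocation_weights(u)  # each column sums to 1
```

Adding the same constant to all three fields at a given GS leaves its weights unchanged, which is one reason the fields are centred.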

  13. Prior structure • $w_{cgk} = \exp(u_{cgk}) / \sum_m \exp(u_{mgk})$, with a Gaussian conditional autoregressive (CAR) model: $u_{cgk} \mid u_{c,-g,k} \sim N\left(\sum_{h \sim g} u_{chk} / n_g,\ s_{ck}^2 / n_g\right)$, where h ranges over the neighbours of g ($n_g$ = number of neighbours, one or two in this simple case), with the constraint $\sum_g u_{cgk} = 0$ • The CAR variance parameters $s_{ck}^2$ act as smoothing priors → indexed by chromosome: the 'switching structure' between the states can differ between chromosomes • Means and variances $(\mu_c, \sigma_c^2)$ of the mixture components are common to all chromosomes → borrowing of information • Inverse gamma priors for the variances, uniform priors for the means
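
To make the CAR prior concrete, here is a NumPy sketch of the conditional distribution for one field on one chromosome, plus one sweep of single-site draws from those prior conditionals followed by re-centring. This only illustrates the prior structure: in the full model (fitted in WinBUGS) each update would also involve the allocations, and `s2` stands for one $s_{ck}^2$. Function names are ours.

```python
import numpy as np

def car_conditional(u, g, s2):
    """Conditional mean and variance of u[g] given its chain neighbours
    (g-1 and/or g+1): u[g] | rest ~ N(mean of neighbours, s2 / n_g).
    Assumes the chromosome has at least two GS."""
    G = len(u)
    neighbours = [h for h in (g - 1, g + 1) if 0 <= h < G]
    n_g = len(neighbours)  # 1 at the chromosome ends, 2 elsewhere
    return np.mean(u[neighbours]), s2 / n_g

def car_prior_sweep(u, s2, rng):
    """One sweep of single-site draws from the CAR prior conditionals,
    then re-centring so that sum_g u[g] = 0 (prior only, no likelihood)."""
    u = u.copy()
    for g in range(len(u)):
        m, v = car_conditional(u, g, s2)
        u[g] = rng.normal(m, np.sqrt(v))
    return u - u.mean()
```

A small $s_{ck}^2$ keeps each $u_{cgk}$ close to its neighbours, so the weights, and hence the allocations, switch state rarely along chromosome k.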

  14. Posterior quantities of interest • Bayesian inference via MCMC, implemented using WinBUGS • In particular, the latent allocations $L_{gk}$ of GS g on chromosome k to state c are sampled during the MCMC run • Compute posterior allocation probabilities: $p_{cgk} = P(L_{gk} = c \mid \text{data})$, c = 1, 2, 3 • Probabilistic classification of each GS using a threshold on $p_{cgk}$: assign g to a modified state, deletion (c = 1) or gain (c = 3), if the corresponding $p_{cgk} > 0.8$; otherwise allocate it to the modal state. This gives the subset S of genomic sequences classified as modified (S depends on the chosen threshold)
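
Given stored MCMC draws of the allocations (how they are exported from WinBUGS is not covered here), the posterior allocation probabilities are simply Monte Carlo frequencies. A minimal NumPy sketch with a made-up array of draws, applying the 0.8 threshold from the slide:

```python
import numpy as np

def allocation_probabilities(alloc_samples, n_states=3):
    """alloc_samples: (n_iter, G) MCMC draws of L_g coded 1 = deletion,
    2 = modal, 3 = gain. Returns p of shape (n_states, G) with
    p[c - 1, g] = estimated P(L_g = c | data)."""
    alloc_samples = np.asarray(alloc_samples)
    return np.stack([(alloc_samples == c).mean(axis=0)
                     for c in range(1, n_states + 1)])

def classify(p, threshold=0.8):
    """Deletion (1) or gain (3) if that posterior probability exceeds the
    threshold, otherwise modal (2)."""
    labels = np.full(p.shape[1], 2)
    labels[p[0] > threshold] = 1
    labels[p[2] > threshold] = 3
    return labels

# Hypothetical draws: 5 iterations, 3 GS
samples = np.array([[1, 2, 3],
                    [1, 2, 3],
                    [1, 2, 2],
                    [1, 2, 3],
                    [1, 3, 3]])
p = allocation_probabilities(samples)
labels = classify(p)  # threshold 0.8 as on the slide
```

Because the inequality is strict, the third GS (posterior gain probability exactly 0.8) stays in the modal state. Any threshold of at least 0.5 guarantees that a GS cannot pass it for both deletion and gain.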

  15. False Discovery Rate • Using the posterior allocation probabilities, an estimate of the FDR for the list S can be computed: • $\mathrm{BayesFDR}(S \mid \text{data}) = \frac{1}{|S|} \sum_{g \in S} p_{2gk}$, where $p_{2gk}$ is the posterior probability of allocation to the modal (c = 2) state. Note: the threshold can be adjusted to obtain a desired FDR, and vice versa
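
The Bayes FDR estimate and the "adjust the threshold to get a desired FDR" idea can be sketched as follows, assuming NumPy and the (3, G) layout of posterior probabilities used above. The grid search is an illustrative choice, not necessarily how the authors tuned the threshold, and the probabilities are invented.

```python
import numpy as np

def bayes_fdr(p, selected):
    """Bayes FDR of the list S: mean posterior probability of the modal
    state (row index 1, i.e. c = 2) over the selected GS."""
    selected = np.asarray(selected, dtype=bool)
    if not selected.any():
        return 0.0  # empty list: no false discoveries
    return float(p[1, selected].mean())

def threshold_for_fdr(p, target_fdr, grid=None):
    """Scan thresholds from low to high; return the first one whose list
    S = {g : p1g > t or p3g > t} has Bayes FDR <= target_fdr."""
    if grid is None:
        grid = np.linspace(0.5, 0.99, 50)
    for t in np.sort(grid):
        selected = (p[0] > t) | (p[2] > t)
        if bayes_fdr(p, selected) <= target_fdr:
            return float(t), selected
    return None, np.zeros(p.shape[1], dtype=bool)

# Hypothetical posterior probabilities for 3 GS (each column sums to 1)
p = np.array([[0.90, 0.00, 0.05],
              [0.10, 0.30, 0.15],
              [0.00, 0.70, 0.80]])
t, S = threshold_for_fdr(p, target_fdr=0.15)
```

Here a threshold of 0.5 selects all three GS with an estimated FDR of about 0.18; raising it to roughly 0.7 drops the second GS and brings the estimate down to 0.125.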

  16. 3 -- Performance

  17. [Figure: simulated profile with gain, modal and deletion segments; altered blocks of 30, 20 and 10 GS] Simulation set-up • 200 simulated GS: modal GS with $Z \sim N(0, 0.3^2)$; deletion: a block of 30 GS with $Z \sim N(-\log 2, 0.3^2)$; gains: blocks of 20 and 10 GS with $Z \sim N(\log 2, 0.3^2)$ • Reference array with $Z \sim N(0, 0.3^2)$ • 50 replications
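
A NumPy sketch of one way to generate data following this set-up. The slides do not give the positions of the altered blocks, so the ones below are hypothetical, and "log" is taken as the natural logarithm, which is also an assumption.

```python
import numpy as np

def simulate_cgh(n_gs=200, sigma=0.3, rng=None,
                 deletion_block=(50, 80), gain_blocks=((110, 130), (160, 170))):
    """Simulate one test array and one normal/normal reference array.
    Modal GS: N(0, sigma^2); deletion block (30 GS): N(-log 2, sigma^2);
    gain blocks (20 and 10 GS): N(log 2, sigma^2).
    Block positions and natural log are assumptions.
    Returns (z_test, z_reference, true_state) with states 1/2/3."""
    rng = np.random.default_rng() if rng is None else rng
    state = np.full(n_gs, 2)
    state[deletion_block[0]:deletion_block[1]] = 1
    for start, stop in gain_blocks:
        state[start:stop] = 3
    means = np.array([-np.log(2.0), 0.0, np.log(2.0)])[state - 1]
    z_test = rng.normal(means, sigma)
    z_reference = rng.normal(0.0, sigma, size=n_gs)
    return z_test, z_reference, state

rng = np.random.default_rng(2005)
replicates = [simulate_cgh(rng=rng) for _ in range(50)]  # 50 replications
```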

  18. CGH-Miner • Data mining approach to select gains and losses (Wang et al., 2005): • Hierarchical clustering with a spatial constraint (i.e. only spatially adjacent clusters are joined) • Subtree selection according to predefined rules → focus on selecting large consistent gain/loss regions and small (big spike) regions • Implemented in the CGH-Miner Excel plug-in • FDR estimated using a reference (normal/normal) array and the same set of rules to prune the tree; declared target 1% • The simulation set-up is similar to that of Wang et al.
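
The spatial constraint in the clustering step can be illustrated with a small agglomerative procedure in which only neighbouring segments along the chromosome may merge. This is a sketch under assumptions: clusters are compared by the absolute difference of their mean log ratios (a linkage chosen for illustration), and CGH-Miner's actual linkage and subtree-pruning rules are not reproduced.

```python
import numpy as np

def adjacent_clustering(z):
    """Agglomerative clustering where only spatially adjacent clusters merge.
    Each cluster is a (start, stop) index range along the chromosome.
    Returns the merge history as (left, right, distance) tuples."""
    z = np.asarray(z, dtype=float)
    clusters = [(g, g + 1) for g in range(len(z))]
    history = []
    while len(clusters) > 1:
        means = [z[a:b].mean() for a, b in clusters]
        # distances only between neighbouring clusters (the spatial constraint)
        gaps = [abs(means[i + 1] - means[i]) for i in range(len(clusters) - 1)]
        i = int(np.argmin(gaps))
        left, right = clusters[i], clusters[i + 1]
        history.append((left, right, gaps[i]))
        clusters[i:i + 2] = [(left[0], right[1])]
    return history

history = adjacent_clustering([0.0, 0.1, 2.0, 2.1])
```

The merge history defines the tree; in CGH-Miner, subtrees are then selected by rules to call gain or loss regions.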

  19. Classification obtained by CGH-Miner and CGHmix [Figure: simulated profile with gain, modal and deletion segments; altered blocks of 30, 20 and 10 GS]

  20. Posterior probabilities of allocation to the 3 components

  21. Comparative performance between CGHmix and CGH-Miner

  22. 4 -- Analyses of CGH-array cancer data sets

  23. Breast cancer cell line MCF7 • Data from Pollack et al., 6691 GS on 23 chromosomes • $\hat\mu_1 = -0.35$, $\hat\sigma_1 = 0.37$ • ($\mu_2 = 0$), $\hat\sigma_2 = 0.27$ • $\hat\mu_3 = 0.44$, $\hat\sigma_3 = 0.54$ • Estimated FDR CGHmix = 2.6% • Estimated FDR CGH-Miner = 1.5%

  24. Classification of GS obtained by CGHmix

  25. [Figure legend: known alterations found by both methods; additional known alterations found by CGHmix]

  26. Neuroblastoma KCNR cell line, Curie Institute custom CGH array for chromosome 1 • 190 genomic clones, mostly on the short arm • 3 replicate spots for each • $\hat\mu_1 = -0.49$: loss component • $\hat\mu_3 = 0.04$: not plausible as a gain → no gain in this case • FDR estimated by regrouping the c = 2 and c = 3 classes • Substantial number of deletions on the short arm • No deletion found on the long arm by CGHmix, a result confirmed by classical cytogenetic information
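
Reading "regrouping c = 2 and c = 3" as treating both the modal and the implausible gain component as "not altered", the Bayes FDR of slide 15 would presumably become, for a list S of GS declared deleted (a sketch of this reading, using $p_{1gk} + p_{2gk} + p_{3gk} = 1$):

$$\mathrm{BayesFDR}(S \mid \text{data}) = \frac{1}{|S|} \sum_{g \in S} \left(p_{2gk} + p_{3gk}\right) = \frac{1}{|S|} \sum_{g \in S} \left(1 - p_{1gk}\right)$$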

  27. Long arm

  28. Extensions • Account for variability in the case of repeated measurements → add a measurement model with GS-specific noise and an exchangeable prior • Refine the spatial model: • Incorporate genomic sequence location in the neighbourhood definition of the CAR model: 0-1 contiguity → spatial weights • In particular, account for overlapping sequences by using weights that depend on the overlap
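
The slide does not say how overlap-dependent weights would be defined. As a hypothetical illustration only: give each pair of chain neighbours a weight of 1 plus their overlap relative to the shorter sequence, and use the usual weighted form of the Gaussian CAR conditional (mean = weighted average of neighbours, variance = s² divided by the sum of weights). With all weights equal to 1 this reduces to the 0-1 contiguity model of slide 13. The helper names are ours.

```python
import numpy as np

def overlap_weights(starts, ends):
    """Hypothetical weights for chain neighbours g and g+1:
    W[g, h] = 1 + overlap length / length of the shorter sequence,
    so non-overlapping neighbours keep the 0-1 contiguity weight of 1."""
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    G = len(starts)
    W = np.zeros((G, G))
    for g in range(G - 1):
        h = g + 1
        overlap = max(0.0, min(ends[g], ends[h]) - max(starts[g], starts[h]))
        shorter = min(ends[g] - starts[g], ends[h] - starts[h])
        W[g, h] = W[h, g] = 1.0 + overlap / shorter
    return W

def weighted_car_conditional(u, W, g, s2):
    """Weighted CAR conditional of u[g]:
    mean = sum_h W[g, h] u[h] / W[g, +], variance = s2 / W[g, +]."""
    w_plus = W[g].sum()
    return W[g] @ u / w_plus, s2 / w_plus

# Three hypothetical sequences (positions in bases): the first two overlap by 50
W = overlap_weights([0, 100, 300], [150, 250, 400])
```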
