Some slides adapted from J. Fridlyand

Analysis of Array CGH Data, by Hanni Willenbrock. Some slides adapted from J. Fridlyand. BioSys course: DNA Microarray Analysis – Lecture, 2007.


Presentation Transcript


  1. Analysis of Array CGH Data, by Hanni Willenbrock. Some slides adapted from J. Fridlyand. BioSys course: DNA Microarray Analysis – Lecture, 2007

  2. Outline • Introduction to comparative genomic hybridization (CGH) and array CGH • Data analysis approaches • Breakpoint detection • Loss and gain analysis • Application of segmentation to testing • Real data example 1: Application to a primary tumor dataset • Real data example 2: comparative genomic profiling of bacterial strains

  3. Comparative Genomic Hybridization • Study types: detecting gain or loss of genetic material; finding variations in the genetic material • Purposes: study of chromosomal aberrations often found in cancer and developmental abnormalities; study of variations in the baseline sequence in a microbial population (microbial comparative genomics). PhD defense, October 27th 2006

  4. A Variety of Genetic Alterations Underlie Developmental Abnormalities and Disease • Inappropriate gene activation or inactivation can be caused by: • Mutation • Epigenetic gene silencing (e.g. addition of methyl groups) • Reciprocal translocation (exchange of fragments between two non-homologous chromosomes) • Gain or loss of genetic material. Any of the above may lead to activation of an oncogene or to inactivation of a tumor suppressor.

  5. Existing techniques for detecting structural abnormalities Albertson and Pinkel, Human Molecular Genetics, 2003

  6. Some microarray platforms for copy number analysis • BAC arrays • Affymetrix SNP chip (500 K) • Representational oligonucleotide microarray analysis (ROMA) • Whole genome tiling arrays • Own design (NimbleGen/NimbleExpress)

  7. Array CGH: BAC arrays • HumArray3.1: 2464 human BAC clones spotted in triplicate [Figure labels: 12 mm; 164-196 kbp]

  8. Array CGH Maps DNA Copy Number Alterations to Positions in the Genome [Figure labels: Test Genomic DNA; Reference Genomic DNA; Cot-1 DNA; Ratio vs. Position on Sequence; Gain of DNA copies in tumor; Loss of DNA copies in tumor]

  9. Example: Detection of DiGeorge region. (A) Detection of deletion in the DiGeorge region by FISH. A chromosome 22 subtelomere probe (green) and the TUPLE1 probe for the DiGeorge region (red) were hybridized to metaphase chromosomes from a normal individual and an individual with the deletion. The arrow indicates the missing red FISH signal on the deleted chromosome. (B) Array CGH copy number profile of chromosome 22 showing deletion in the DiGeorge region (arrow). Albertson and Pinkel, Human Molecular Genetics, 2003

  10. Structural abnormalities (*HSR: homogeneously staining region). Albertson and Pinkel, Human Molecular Genetics, 2003

  11. Tumor Genomes are Stable: Copy Number Profiles of a Tumor & Recurrence

  12. Analysis of array CGH • Goal: to partition the clones into sets with the same copy number and to characterize the genomic segments in terms of copy number. • Biological model: genomic rearrangements lead to gains or losses of sizable contiguous parts of the genome, possibly spanning entire chromosomes, or, alternatively, to focal high-level amplifications.

  13. Breakpoints Varying genomic complexity

  14. Exercise Part I: Plot and view array CGH data. DNA Microarray Analysis Course, 2007

  15. Observed clone value and spatial coherence [Figure: two distributions of observed clone values, N(-0.3, 0.08^2) and N(0.6, 0.1^2), with two clones marked "?"] • It is useful to exploit the physical dependence between nearby clones, which translates into copy number dependence. DNA Microarray Analysis Course, 2006

  16. Expected log2 ratio as a function of copy number change, normal cell contamination and ploidy [Figure: panels for reference ploidy = 3 and reference ploidy = 2, with rows at 100%, 50% and 10%; values shown: 2.58, 2.0, 0.58, 0.42, 0.0, 0.58, 0.38, 0.07]
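
The relationship named in the slide title can be sketched numerically. The mixture model below is an assumption, since the slide does not give a formula: the signal is taken to be proportional to the average copy number over tumor and normal cells, and the sample-wide average copy number is taken to sit at a log2 ratio of 0 after normalization. The function name and parameters are hypothetical.

```python
import math

def expected_log2_ratio(copy_number, tumor_fraction, ploidy, normal_copies=2):
    """Expected log2 ratio for a region in a mixture of tumor and normal cells.

    Assumed model (not stated on the slide): the signal is proportional to the
    average copy number over all cells, and normalization centers the
    sample-wide average copy number at a log2 ratio of 0.
    """
    region_avg = tumor_fraction * copy_number + (1 - tumor_fraction) * normal_copies
    sample_avg = tumor_fraction * ploidy + (1 - tumor_fraction) * normal_copies
    return math.log2(region_avg / sample_avg)

# Single-copy gain (3 copies) in a diploid tumor at decreasing tumor content.
for fraction in (1.0, 0.5, 0.1):
    print(f"{fraction:.0%} tumor cells: {expected_log2_ratio(3, fraction, 2):.2f}")
```

Under this assumed model, a pure diploid tumor with 3 copies gives log2(3/2), about 0.58, and 12 copies against a ploidy of 3 gives log2(4) = 2.0. Both values appear among those on the slide, but the slide itself does not confirm the formula.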

  17. Simulation Study • Many algorithms to choose from • Mainly evaluated only on limited examples • Few comparisons between algorithm performance • Choice of evaluation criteria: • False breakpoint detection vs. missed breakpoints • Sample type preferences (size of segments, noise, etc)

  18. Methods for Segmentation • HMM: Hidden Markov Model (aCGH package). Fits HMMs in which any state is reachable from any other state (Fridlyand et al., JMVA, 2004). • CBS: Circular binary segmentation (DNAcopy package). Makes ternary splits of the chromosomes into contiguous regions of equal copy number and assesses the significance of the proposed splits using a permutation reference distribution (Olshen et al., Biostatistics, 2004). • GLAD: Gain and Loss Analysis of DNA (GLAD package). Detects chromosomal breakpoints by estimating a piecewise constant function based on adaptive weights smoothing (Hupe et al., Bioinformatics, 2004).
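
To make the idea of "propose a split, then assess it against a permutation reference distribution" concrete, here is a minimal sketch. It is a simplified stand-in and not the CBS algorithm from the DNAcopy package: it tries only a single binary split at a time (no circular or ternary splits), and all function names and defaults are hypothetical.

```python
import random

def best_split(values):
    """Return (index, score) for the best single split values[:index] | values[index:]."""
    n = len(values)
    total = sum(values)
    best_index, best_score = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Difference in means, scaled for the two segment sizes.
        score = abs(left_mean - right_mean) / (1 / i + 1 / (n - i)) ** 0.5
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score

def split_p_value(values, n_perm=200, seed=0):
    """Permutation p-value: how often does shuffled data split at least as well?"""
    rng = random.Random(seed)
    index, observed = best_split(values)
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_split(shuffled)[1] >= observed:
            exceed += 1
    return index, (exceed + 1) / (n_perm + 1)

def segment(values, offset=0, alpha=0.01, min_len=3):
    """Recursively split while the best split is significant; return breakpoints."""
    if len(values) < 2 * min_len:
        return []
    index, p_value = split_p_value(values)
    if index is None or p_value > alpha:
        return []
    left = segment(values[:index], offset, alpha, min_len)
    right = segment(values[index:], offset + index, alpha, min_len)
    return left + [offset + index] + right

# Hypothetical chromosome: 20 clones near 0.0 followed by 20 clones near 0.6.
jitter = [-0.02, 0.0, 0.02, -0.01, 0.01]
log2_ratios = [jitter[i % 5] for i in range(20)] + [0.6 + jitter[i % 5] for i in range(20)]
print(segment(log2_ratios))
```

Shuffling the clone order destroys any spatial structure, so the shuffled scores show how well pure noise can be split. A real analysis would use the packages named on the slide.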

  19. Comparison Scheme • Use of simulated data, where the truth is known • The noise is controlled (see later slide) [Figure labels: one segment; true breakpoint; false predicted breakpoint]

  20. Breakpoint Detection Accuracy

  21. Exercise Part II: Segmentation and breakpoint prediction

  22. Merging segments • Note: all procedures operate on individual chromosomes, which results in a large number of segments with mean values close to each other. • Additional challenge: reduce the number of segments by merging those that are likely to correspond to the same copy number. This facilitates inference of altered regions.

  23. Merging • Used to estimate actual copy number levels from segmentations
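
As an illustration of merging, the sketch below pools segments from all chromosomes and merges those whose mean log2 ratios lie within a fixed tolerance of each other. This is a deliberately simple stand-in and not the merging procedure used in the lecture; the function name, the tolerance value and the example segments are all hypothetical.

```python
def merge_levels(segments, tolerance=0.1):
    """Merge segments with similar means into shared copy number levels.

    segments: list of (mean_log2_ratio, n_clones), pooled over chromosomes.
    Returns one merged level per input segment, in the input order.
    """
    order = sorted(range(len(segments)), key=lambda k: segments[k][0])
    groups = []  # each group: [weighted_sum, n_clones, member_indices]
    for k in order:
        mean, n_clones = segments[k]
        if groups and abs(mean - groups[-1][0] / groups[-1][1]) <= tolerance:
            # Close to the running mean of the current level: merge into it.
            groups[-1][0] += mean * n_clones
            groups[-1][1] += n_clones
            groups[-1][2].append(k)
        else:
            groups.append([mean * n_clones, n_clones, [k]])
    levels = [None] * len(segments)
    for weighted_sum, n_clones, members in groups:
        for k in members:
            levels[k] = weighted_sum / n_clones  # clone-weighted mean of the level
    return levels

# Five hypothetical segments collapse to three levels (loss, normal, gain).
segments = [(0.02, 50), (0.55, 20), (-0.03, 30), (0.61, 10), (-0.4, 15)]
print(merge_levels(segments))
```

A fixed tolerance ignores the noise level of the individual experiment, so a statistical test between neighboring levels would be a more defensible choice in practice.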

  24. Segmentation and Merging

  25. ROC Curve: Identification of copy number alterations for varying thresholds

  26. Exercise Part III: Estimate copy number gains and losses

  27. Using segmentation for testing (phenotype association studies) • Example case: find clones (or whole segments) that differ significantly in copy number between two cancer subtypes. • Task: investigate whether incorporating spatial information (segmentation) into testing for differential copy number increases detection power. • Data type: samples with either of 2 different phenotypes (e.g. 2 different cancer subtypes). • How: comparison of sensitivity and specificity using: • Original test statistic (no use of spatial information) • Segmented T-statistic derived from original log2 ratios • T-statistic computed from segmented log2 ratios

  28. Simulation of Array CGH Data. Real biological variation considered: • Breast cancer data used as model data: segment length and copy number are taken from the empirical distribution observed in breast cancer data (DNAcopy segmentation). • Mixture of cells (sample is not pure): each sample was assigned a value Pt, the proportion of tumor cells, drawn from a uniform distribution between 0.3 and 0.7. • Experimental noise is Gaussian: standard deviations are drawn from a uniform distribution between 0.1 and 0.2 to imitate real data, where the noise may vary between experiments. • Cancer subtypes are heterogeneous: certain aberrations characteristic of a cancer subtype may exist in only a percentage of the patients with that subtype. Thus, in each sample, segments with copy number alterations (copy number not 2) were removed at random with probability 30%.
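
The simulation steps listed on this slide can be sketched as follows. Two parts are assumptions rather than the lecture's method: the segment template is a small hypothetical list instead of the empirical breast cancer distribution, and the expected log2 ratio uses a simple mixture of tumor and diploid normal cells against a diploid reference. The function name is hypothetical.

```python
import math
import random

def simulate_sample(template, rng):
    """Simulate one sample's log2 ratios from a subtype template.

    template: list of (copy_number, n_clones) segments shared by the subtype.
    Returns (log2_ratios, true_copy_numbers, tumor_fraction, noise_sd).
    """
    tumor_fraction = rng.uniform(0.3, 0.7)  # Pt: proportion of tumor cells
    noise_sd = rng.uniform(0.1, 0.2)        # per-experiment Gaussian noise
    log2_ratios, true_copy_numbers = [], []
    for copy_number, n_clones in template:
        # Heterogeneity: an aberration is absent from this sample with probability 0.3.
        if copy_number != 2 and rng.random() < 0.3:
            copy_number = 2
        # Assumed mixture model: tumor cells plus diploid normal cells,
        # measured against a diploid reference.
        mixed = tumor_fraction * copy_number + (1 - tumor_fraction) * 2
        expected = math.log2(mixed / 2)
        for _ in range(n_clones):
            log2_ratios.append(rng.gauss(expected, noise_sd))
            true_copy_numbers.append(copy_number)
    return log2_ratios, true_copy_numbers, tumor_fraction, noise_sd

rng = random.Random(42)
template = [(2, 30), (3, 10), (2, 20), (1, 15)]  # hypothetical segments
ratios, truth, pt, sd = simulate_sample(template, rng)
print(f"Pt={pt:.2f}, noise SD={sd:.2f}, clones={len(ratios)}")
```

Calling `simulate_sample` repeatedly with the same template gives samples of one subtype that differ in purity, noise level and in which aberrations are present.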

  29. Testing samples (original values) • 20 samples from either of 2 classes; red is the true copy number, black dots are simulated values, and circles mark an example of heterogeneity [Figure labels: 37.5%; x9; 57.0%; x11]

  30. Testing samples (original values) • Red: truly different clones

  31. Testing: why is multiple testing correction necessary? • Standard p-value cutoff for alpha = 0.05 => many false positives

  32. Testing: why is multiple testing correction necessary? • Significance with random class assignments? • By chance, many test statistics are below/above standard significance thresholds [Figure labels: 2.93; 5.29; -3.99; (maximum deviating value)]

  33. The maxT Multiple Testing Correction • By repeating the random class assignment and testing, e.g. 100 times, the following "permutation reference distribution" of the maximum absolute test statistic is obtained (maxT distribution). • We wish to control the family-wise error rate (FWER) at alpha = 0.05 (5% chance of at least one false positive). Therefore, the cut-off should be such that in only 5% of the random cases do we get a false positive (95th percentile). [Figure labels: cutoff = 5; standard significance threshold; maxT multiple testing corrected threshold]
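
The permutation procedure described on this slide can be sketched as below. The choice of a Welch t-statistic, the function names and the small simulated data set are assumptions made for illustration; the slide only specifies repeated random class assignment and taking the 95th percentile of the maximum absolute statistic.

```python
import random
import statistics

def t_statistic(a, b):
    """Welch t-statistic for two lists of values (assumed choice of test)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0

def clone_statistics(data, labels):
    """One t-statistic per clone. data[sample][clone]; labels[sample] is 0 or 1."""
    stats = []
    for j in range(len(data[0])):
        a = [row[j] for row, lab in zip(data, labels) if lab == 0]
        b = [row[j] for row, lab in zip(data, labels) if lab == 1]
        stats.append(t_statistic(a, b))
    return stats

def maxt_cutoff(data, labels, n_perm=100, alpha=0.05, seed=0):
    """Approximate (1 - alpha) percentile of the permutation maxT distribution."""
    rng = random.Random(seed)
    shuffled = list(labels)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random class assignment
        max_stats.append(max(abs(t) for t in clone_statistics(data, shuffled)))
    max_stats.sort()
    return max_stats[min(n_perm - 1, int((1 - alpha) * n_perm))]

# Hypothetical data: 20 samples x 5 clones; only clone 0 differs between classes.
rng = random.Random(1)
labels = [0] * 10 + [1] * 10
data = [[rng.gauss(0.5 * lab if j == 0 else 0.0, 0.1) for j in range(5)] for lab in labels]
cutoff = maxt_cutoff(data, labels)
observed = clone_statistics(data, labels)
print("cutoff:", round(cutoff, 2))
print("significant clones:", [j for j, t in enumerate(observed) if abs(t) > cutoff])
```

Because the cutoff comes from the maximum over all clones in each permutation, a single threshold controls the chance of any false positive across the whole set of tests.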

  34. Testing samples (original values) [Figure labels: maxT p-value cutoff for alpha = 0.05; standard p-value cutoff for alpha = 0.05]

  35. Reference Testing: Segmenting test statistics

  36. Testing segmented samples • 1. Segmentation of individual samples...

  37. Testing segmented samples • 2. T-statistic from segmented individual samples... [Figure label: Reference]

  38. Detecting regions with differential copy number. Willenbrock and Fridlyand. Bioinformatics 2005; 21(22): 4084-91

  39. Variation of Simulation Parameters • Signal-to-noise ratio: CBS consistently has the best performance; HMM has the highest FDR; GLAD is the least sensitive • Alternative empirical distributions of segment lengths: HMM has the highest sensitivity for segment sizes below 10; CBS has the highest sensitivity for segment sizes of 10 or larger; GLAD consistently performs the worst • Outlier detection

  40. Real Data Example 1: Primary Tumor Data • 75 oral squamous cell carcinomas (SCCs) • TP53 mutational status of all samples was determined using sequence information (Snijders et al., 2005) • Tasks: • Characterize wild-type and mutant samples with respect to their genomic alterations • Build a classifier to predict TP53 mutational status

  41. Frequency of Gain/Loss Comparisons • Threshold-based: 5% altered • Merge-based: 33% altered

  42. Why such a difference in alteration frequency? • High threshold-based cut-off is due to the high experimental noise of the paraffin-embedded tumors [Figure labels: + 2.5 x MAD; - 2.5 x MAD]. Willenbrock and Fridlyand. Bioinformatics 2005; 21(22): 4084-91
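
A threshold-based gain/loss call of this kind can be sketched as below. It assumes that the ± 2.5 x MAD lines on the slide are the gain and loss thresholds and that the MAD (median absolute deviation) is computed from the residuals between observed and segmented log2 ratios; neither detail is confirmed by the transcript, and the function names are hypothetical.

```python
import statistics

def mad(values):
    """Median absolute deviation from the median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])

def call_alterations(observed, segmented, factor=2.5):
    """Call each clone as gain (1), loss (-1) or unchanged (0).

    observed: log2 ratio per clone; segmented: segment mean per clone.
    Assumption: noise is estimated as the MAD of (observed - segmented).
    """
    residuals = [o - s for o, s in zip(observed, segmented)]
    threshold = factor * mad(residuals)
    calls = [1 if s > threshold else -1 if s < -threshold else 0 for s in segmented]
    return calls, threshold

# Hypothetical profile: unchanged, gained and lost segments of 5 clones each.
jitter = [-0.02, 0.0, 0.02, -0.01, 0.01]
segmented = [0.0] * 5 + [0.5] * 5 + [-0.5] * 5
observed = [s + jitter[i % 5] for i, s in enumerate(segmented)]
calls, threshold = call_alterations(observed, segmented)
print(round(threshold, 3), calls)
```

With this kind of rule, a noisier experiment has a larger MAD and therefore a wider unchanged band, which is consistent with the slide's point that noisy paraffin-embedded samples yield few threshold-based calls.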

  43. Classification results. Willenbrock and Fridlyand. Bioinformatics 2005; 21(22): 4084-91

  44. Real Data Example 2: Comparative genomic profiling of several Escherichia coli strains • The microarray design included probes for: • 7 known E. coli strains • 39 known E. coli bacteriophages • 104 known E. coli virulence genes • Experimentally: • 2 sequenced control strains (W3110 and EDL933), 3 replicates • 2 non-sequenced strains (D1 and 3538), 3 replicates • Bacteriophage: 3538 (stx2::cat), 2 replicates

  45. Comparative Genomic Profiling: challenges • Ratio problems: some genes might be present in the query strain but not in the known reference strain. • Single-channel or dual-channel microarrays? In this case, we used an Affymetrix single-channel custom-made array (NimbleExpress). • Partly present genes versus similar but different genes.

  46. Homology between the 7 E. coli strains included on the microarray • Very high similarity between the two K-12 strains and between the two O157:H7 strains. • Percentage of homologues for E. coli genomes in columns found in E. coli genomes in rows. Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.

  47. BLAST Atlas Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.

  48. Hybridization Atlases • Probe hybridizations for the experiments (samples) result in a pattern similar to that expected from the BLAST atlas. Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.

  49. Mapping the phage Φ3538 (stx2::cat) Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.

  50. Zoom of phage Φ3538 (stx2::cat) • The hybridization pattern is very similar for the phage, strain 3538 and strain D1. Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.
