Copy Number Analysis in the Cancer Genome Using SNP Arrays

Qunyuan Zhang, Aldi Kraja. Division of Statistical Genomics, Department of Genetics & Center for Genome Sciences, Washington University School of Medicine. Statistical Genetics Forum, 02-12-2007.


Presentation Transcript


  1. Copy Number Analysis in the Cancer Genome Using SNP Arrays. Qunyuan Zhang, Aldi Kraja. Division of Statistical Genomics, Department of Genetics & Center for Genome Sciences, Washington University School of Medicine. Statistical Genetics Forum, 02-12-2007

  2. What is Copy Number?
     • Gene Copy Number: The gene copy number (also "copy number variants" or CNVs) is the number of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells... (from Wikipedia, www.wikipedia.org)
     • DNA Copy Number: A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is ~1 kilobase or larger. (from Nature Reviews Genetics, Feuk et al. 2006)
     • Chromosomal Copy Number: refers to DNA copy number in most publications.

  3. Why Study Copy Number? “Chromosomal copy number alterations can lead to activation of oncogenes and inactivation of tumor suppressor genes (TSGs) in human cancers. … identification of cancer-specific copy number alterations will not only provide new insight into understanding the molecular basis of tumorigenesis but will also facilitate the discovery of new TSGs and oncogenes.”

  4. DNA Copy Number Changes in Tumor Cells
     • Normal cell: CN=2
     • Mechanisms of change: homologous repeats, segmental duplications, chromosomal rearrangements, duplicative transpositions, non-allelic recombinations, …
     • Tumor cells: deletion (CN=0, CN=1), unchanged (CN=2), amplification (CN=3, CN=4)

  5. Why Use SNP Arrays?
     • CGH Array (CGH: comparative genomic hybridization): “Array-based CGH makes it possible to scan the genome for copy number with high resolution by hybridizing to arrayed genomic DNA or cDNA clones. … However, currently available array CGH methods cannot simultaneously detect chromosomal loss of heterozygosity (LOH).”
     • SNP Array: “… to combine the detection of cancer copy number with cancer-specific LOH in the same experiments, we have developed an analytical method to detect DNA copy number changes by hybridization of representations of genomic DNA to commercially available single nucleotide polymorphism (SNP) arrays.”
     SNP arrays simultaneously detect DNA copy number changes and genotype changes (LOH) in tumor cells.

  6. Materials & Methods
     • 5 samples for validation, with known copy numbers of chromosome X (1, 2, 3, 4, 5 copies of chrom. X)
     • 2 diploid cell lines containing cytogenetically mapped partial or whole-chromosome copy number gains or losses
     • 18 lung and breast cancer cell lines
     • 15 normal blood control cell lines
     • Affymetrix XbaI mapping array 130 (10,043 SNPs)
     • Chip scanning and image processing by MAS 5.0
     • Intensity normalization and summarization
     • Raw/observed copy numbers of cancer samples
     • Segmentation and copy number estimation (Hidden Markov Model, HMM)

  7. Normalization & Summarization
     • Normalization (reducing technical variation between chips, making intensities from different chips comparable): baseline array method
     • Summarization (combining the multiple probe intensities for each SNP to produce a summarized signal value for each SNP)
       Perfect match: pm = pmA + pmB
       Mismatch: mm = mmA + mmB
       Model-based summarization: pm/mm difference multiplicative model (Li & Wong, 2001)
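The pm/mm difference multiplicative model can be sketched as follows. This is a minimal illustration, assuming the commonly cited form of the Li & Wong model, in which the difference for array i and probe j is approximated by theta_i * phi_j with the constraint sum(phi_j^2) = number of probes. The function name `fit_multiplicative_model`, the alternating least-squares fit, and the toy numbers are assumptions for illustration, not the procedure used in the talk.

```python
import math

def fit_multiplicative_model(diffs, n_iter=50):
    """Fit diffs[i][j] ~ theta[i] * phi[j] by alternating least squares.

    diffs[i][j] is pm - mm for array i and probe j (hypothetical input layout).
    phi is rescaled so that sum(phi^2) equals the number of probes.
    """
    n_arrays, n_probes = len(diffs), len(diffs[0])
    phi = [1.0] * n_probes
    theta = [0.0] * n_arrays
    for _ in range(n_iter):
        # Least-squares update of the per-array signal given probe effects.
        denom = sum(p * p for p in phi)
        theta = [sum(d * p for d, p in zip(row, phi)) / denom for row in diffs]
        # Least-squares update of the probe effects given per-array signals.
        denom = sum(t * t for t in theta)
        phi = [sum(diffs[i][j] * theta[i] for i in range(n_arrays)) / denom
               for j in range(n_probes)]
        # Rescale phi to satisfy the identifiability constraint.
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    # Final theta update so that theta matches the rescaled phi.
    denom = sum(p * p for p in phi)
    theta = [sum(d * p for d, p in zip(row, phi)) / denom for row in diffs]
    return theta, phi

# Toy data: three arrays, three probes, exactly rank one.
true_theta = [100.0, 200.0, 400.0]
true_phi = [0.5, 1.0, 1.5]
diffs = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_multiplicative_model(diffs)
```

Here `theta[i]` plays the role of the summarized signal for one SNP on array i; only ratios between arrays are meaningful, because the scale is fixed by the constraint on `phi`.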

  8. Observed/Raw Copy Number Data
     For each SNP of each cancer sample:
     $$\text{Observed CN} = \frac{\text{observed signal}}{\text{mean signal of two-copy normal samples}} \times 2$$
     [Figure: log2-transformed intensities and raw CNs. Black: normal; red: tumor; green: tumor/normal.]
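As a worked example of the observed copy number formula on this slide, the following sketch computes a raw CN for a single SNP. The function name `observed_copy_number` and the signal values are made up for illustration.

```python
import math

def observed_copy_number(tumor_signal, normal_signals):
    """Raw CN = (observed signal / mean signal of two-copy normals) * 2."""
    mean_normal = sum(normal_signals) / len(normal_signals)
    return tumor_signal / mean_normal * 2.0

# One SNP: a tumor signal of 1500 against three two-copy normal samples.
raw_cn = observed_copy_number(1500.0, [1000.0, 900.0, 1100.0])

# The same value on the log2 scale relative to two copies.
log2_ratio = math.log2(raw_cn / 2.0)
```

With these numbers the normal mean is 1000, so the raw CN is 3 and the log2 ratio is log2(1.5), i.e. a single-copy gain.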

  9. Segmentation & Estimation [Figure: raw CN data segmented into regions of CN=4, CN=3, CN=2, CN=1.]

  10. CN Estimation: Hidden Markov Model (HMM)
      Software: CNAT (www.affymetrix.com); dChip (www.dchip.org); CNAG (www.genome.umin.jp)
      [Figure: a chain of SNPs (…, SNP_i, SNP_i+1, SNP_i+2, SNP_i+3, SNP_i+4, …), each with a hidden status (unknown CN, "CN=?") and an observed status (observed/raw CN).]
      CN estimation: finding the sequence of CN values that maximizes the likelihood of the observed raw CN.
      Algorithm: Viterbi algorithm
      The following information/assumptions are needed:
      • Background probabilities: overall probabilities of possible CN values. P(CN=x); x=0,1,2,3,…,n (usually n<10)
      • Transition probabilities: probabilities of CN values of each SNP conditional on the previous one. P(CN_i+1=x|CN_i=y); x=0,1,2,3,…,n; y=0,1,2,3,…,n
      • Emission probabilities: probabilities of observed raw CN values of each SNP conditional on the hidden/unknown/true CN status. P(observed CN | CN=y); y=0,1,2,3,…,n
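The Viterbi step described on this slide can be sketched in a few lines. This is a generic log-space implementation, not the code of CNAT, dChip, or CNAG. The Gaussian emission with standard deviation 0.3, the constant "stay" probability of 0.99, and the toy observations are assumptions chosen only to make the example runnable.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden state sequence (log-space Viterbi)."""
    # best[s]: log-probability of the best path that ends in state s.
    best = {s: log_start[s] + log_emit(observations[0], s) for s in states}
    back = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # Best predecessor of s at this position.
            p = max(states, key=lambda q: prev_best[q] + log_trans[q][s])
            best[s] = prev_best[p] + log_trans[p][s] + log_emit(obs, s)
            pointers[s] = p
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

states = [0, 1, 2, 3, 4]

# Assumed priors: CN=2 dominates the background; states rarely change.
log_start = {s: math.log(0.9 if s == 2 else 0.025) for s in states}
log_trans = {y: {x: math.log(0.99 if x == y else 0.0025) for x in states}
             for y in states}

def gaussian_log_emit(obs, cn, sd=0.3):
    """Assumed emission model: observed raw CN ~ Normal(cn, sd)."""
    return -0.5 * ((obs - cn) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

raw_cn = [2.1, 1.9, 2.0, 3.2, 2.9, 3.1, 2.0, 2.1]
estimated_cn = viterbi(raw_cn, states, log_start, log_trans, gaussian_log_emit)
```

On this toy input the three consecutive observations near 3 outweigh the cost of two state changes, so the estimated sequence contains a short CN=3 segment flanked by CN=2.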

  11. Prior Information for HMM
      • Background Probabilities (B): overall probabilities of possible CN values.
        P(CN=2)=0.9
        P(CN=i)=0.1/(N-1), i=0,1,3,4,…,N; N=max CN allowed, e.g. P(CN=i)=0.01 when N=11
      • Transition Probabilities (T): probabilities of CN values of each SNP conditional on the previous one.
        P(CN_i+1=x|CN_i=y); x=0,1,2,3,…,n; y=0,1,2,3,…,n
        Based on genetic distance (Haldane map function)

              0    1    2    3    …    n
         0   p00  p01  p02  p03   …   p0n
         1   p10  p11  p12  p13   …   p1n
         2   p20  p21  p22  p23   …   p2n
         3   p30  p31  p32  p33   …   p3n
         …
         n   pn0  pn1  pn2  pn3   …   pnn

      • Emission Probabilities (E): probabilities of observed raw CN values of each SNP conditional on the hidden/unknown/true CN status.
        Signal | CN ~ t distribution with df=40
      • Maximum likelihood (Observed CN | B, T, E); iterative
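The three kinds of prior information could be constructed roughly as below. Several choices here are assumptions rather than details from the slide: the 0.1 background mass is split evenly over the non-diploid states so that the probabilities sum to 1; the transition rule mixes "stay in the same state" with "return to the background distribution" with weight 2θ, where θ comes from the Haldane map function; and the t distribution is centred on the hidden CN with a hypothetical scale parameter.

```python
import math

def background_probs(max_cn, p_diploid=0.9):
    """P(CN=2) = p_diploid; the remainder is shared evenly by other states."""
    others = [cn for cn in range(max_cn + 1) if cn != 2]
    probs = {cn: (1.0 - p_diploid) / len(others) for cn in others}
    probs[2] = p_diploid
    return probs

def haldane_theta(distance_morgans):
    """Haldane map function: recombination fraction for a genetic distance."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))

def transition_matrix(background, distance_morgans):
    """Assumed rule: P(x | y) = (1 - 2*theta) * [x == y] + 2*theta * P(x)."""
    theta = haldane_theta(distance_morgans)
    states = sorted(background)
    return {y: {x: (1.0 - 2.0 * theta) * (1.0 if x == y else 0.0)
                   + 2.0 * theta * background[x]
                for x in states}
            for y in states}

def t_log_density(x, mean, scale, df=40):
    """Log-density of a location-scale Student t distribution."""
    z = (x - mean) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))

bg = background_probs(max_cn=10)
trans = transition_matrix(bg, distance_morgans=0.01)
```

Under this rule, two SNPs at zero genetic distance always share the same CN, while distant SNPs become nearly independent draws from the background distribution.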

  12. HMM CN estimation for the samples with known CN of Chr. X

  13. Errors of HMM (1-99.2%=0.8%) “… our criteria for homozygous deletion require the presence of at least 2 SNPs that cover an area of 1 kb in addition to an inferred copy number of 0 …”
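The quoted homozygous deletion criterion can be expressed as a small filter over the inferred copy numbers. This sketch assumes that "cover an area of 1 kb" means the first and last SNP of a CN=0 run are at least 1,000 bp apart; the function name `homozygous_deletions` and the positions are hypothetical.

```python
def homozygous_deletions(positions, inferred_cn, min_snps=2, min_span_bp=1000):
    """Return (start, end) positions of runs with inferred CN = 0 that
    contain at least min_snps SNPs and span at least min_span_bp."""
    regions = []
    start = None
    # A trailing sentinel closes a run that reaches the last SNP.
    for i, cn in enumerate(list(inferred_cn) + [None]):
        if cn == 0 and start is None:
            start = i
        elif cn != 0 and start is not None:
            end = i - 1
            n_snps = end - start + 1
            span = positions[end] - positions[start]
            if n_snps >= min_snps and span >= min_span_bp:
                regions.append((positions[start], positions[end]))
            start = None
    return regions

positions = [100, 500, 2000, 5000, 5200, 5300, 9000]
inferred = [0, 0, 0, 2, 0, 0, 2]
calls = homozygous_deletions(positions, inferred)
```

The first run (three SNPs over 1,900 bp) passes, while the second run (two SNPs over 100 bp) is rejected for being too short.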

  14. HMM CN estimation for the samples with known loss or gain regions

  15. HMM CN estimation for cancer cell lines

  16. Contamination Problem

  17. Disadvantages of HMM
      • No significance test
      • Computationally intensive
      • Individual-level analysis

  18. Software
      • Affymetrix Chips (www.affymetrix.com)
      • Illumina Chips (www.illumina.com)
      • CNAT (www.affymetrix.com)
      • dChip (www.dchip.org)
      • CNAG (www.genome.umin.jp)
      • GenePattern (www.broad.mit.edu/cancer/software/genepattern/)
      • BioConductor R Packages (www.bioconductor.org)
        GLAD package: adaptive weights smoothing (AWS) method
        DNAcopy package: circular binary segmentation method

  19. References
      • JL Freeman et al. Genome Research 2006; 16:949-961
      • J Huang et al. Hum Genomics 2004; 1(4):287-99
      • X Zhao et al. Cancer Research 2004; 64:3060-3071
      • Y Nannya et al. Cancer Research 2005; 65:6071-6079
      • … see google …

  20. Genome-wide Raw CN Changes (Pair #105)

  21. Genome-wide Raw CN Changes (average over ~400 pairs)

  22. Raw CN Changes of Chr. 14 (average over ~400 pairs)

  23. Sliding Window Analysis
      Each window (k) contains 30 consecutive SNPs (k, k+1, k+2, k+3, …, k+29).
      [Figure: overlapping windows 1, 2, 3, …, k, …, N sliding along the SNP sequence.]
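A sliding window of 30 consecutive SNPs can be computed with a running sum, as in the sketch below. Averaging the raw CN change within each window is an assumption about the summary used; the function name `sliding_window_means` and the input values are illustrative.

```python
def sliding_window_means(values, window=30):
    """Mean of each run of `window` consecutive values (windows k..k+window-1)."""
    if len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    for k in range(1, len(values) - window + 1):
        # Slide by one SNP: add the entering value, drop the leaving one.
        total += values[k + window - 1] - values[k - 1]
        means.append(total / window)
    return means

# 40 SNPs give 40 - 30 + 1 = 11 overlapping windows.
cn_change = [float(i) for i in range(40)]
window_means = sliding_window_means(cn_change)
```

Each SNP contributes to up to 30 windows, which smooths SNP-level noise at the cost of blurring segment boundaries.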

  24. Genome-wide Raw Copy Number Changes (sliding window plot, averaged over ~400 pairs)

  25. Sliding Window Test of Significance of CN Changes: -log(p) values, based on ~400 pairs
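The slide does not say which test produced the p-values, so the following is only one plausible sketch. It assumes a two-sided one-sample test of whether the mean CN change across pairs differs from zero within a window, using a normal approximation (reasonable with about 400 pairs) and base-10 logarithms. The function name `neg_log10_p` is hypothetical.

```python
import math

def neg_log10_p(pair_values):
    """-log10 of a two-sided p-value for mean(pair_values) == 0,
    using a normal approximation to the test statistic."""
    n = len(pair_values)
    mean = sum(pair_values) / n
    var = sum((v - mean) ** 2 for v in pair_values) / (n - 1)
    if var == 0.0:
        return math.inf if mean != 0.0 else 0.0
    z = mean / math.sqrt(var / n)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return -math.log10(p) if p > 0.0 else math.inf

# One window, 400 tumor/normal pairs: per-pair mean CN change in the window.
no_change = [-1.0, 1.0] * 200       # centred on zero
amplified = [0.4, 0.6] * 200        # consistently above zero
score_null = neg_log10_p(no_change)
score_amp = neg_log10_p(amplified)
```

Applying this function to every window along a chromosome yields a profile comparable in spirit to the -log(p) plot; overlapping windows are not independent, so the resulting p-values would still need a multiple-testing adjustment.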

  26. CN Change Frequencies in Population (Chr. 14, ~400 pairs)
      • Black: Freq.(CN>0)
      • Red: Freq.(CN>0, significant amplification at 0.01 level)
      • Green: Freq.(CN<0, significant deletion at 0.01 level)

  27. Microarray: From Image to Copy Number
      Affymetrix Mapping 250K Sty-I chip: ~250K probe sets, ~250K SNPs; each probe set contains 24 probes.
      More DNA copy number → more DNA hybridization → higher intensity.
      [Figure: tumor vs. normal chip images. Normal: CN=2; tumor: deletion (CN=1, CN=0) or amplification (CN>2).]

  28. General Procedures for Copy Number Analysis
      • Finished chips
      • (scanner) → Raw image data [.DAT files] (experiment info [.EXP])
      • (image processing software) → Probe-level raw intensity data [.CEL files]
      • Preprocessing: background adjustment, normalization, summarization (with the chip description file [.CDF]) → Summarized intensity data
      • Raw copy number (CN) data [log ratio of tumor/normal intensities]
      • Significance test of CN changes; estimation of CN; smoothing and boundary determination
      • Concurrent regions among population; amplification and deletion frequencies among populations
      • Association analysis
