Copy Number Analysis in the Cancer Genome Using SNP Arrays
Qunyuan Zhang, Aldi Kraja
Division of Statistical Genomics, Department of Genetics & Center for Genome Sciences, Washington University School of Medicine
Statistical Genetics Forum, 02-12-2007
What is Copy Number?
• Gene Copy Number: The gene copy number (also "copy number variants" or CNVs) is the number of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells... (from Wikipedia, www.wikipedia.org)
• DNA Copy Number: A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is ~1 kilobase or larger. (from Nature Reviews Genetics, Feuk et al. 2006)
• Chromosomal Copy Number: refers to DNA copy number in most publications.
Why Study Copy Number?
“Chromosomal copy number alterations can lead to activation of oncogenes and inactivation of tumor suppressor genes (TSGs) in human cancers. … identification of cancer-specific copy number alterations will not only provide new insight into understanding the molecular basis of tumorigenesis but will also facilitate the discovery of new TSGs and oncogenes.”
DNA Copy Number Changes in Tumor Cells
[Figure: a normal cell (CN=2) compared with tumor cells whose copy numbers range from deletion to amplification (CN=0, CN=1, CN=2, CN=3, CN=4). Mechanisms listed: homologous repeats, segmental duplications, chromosomal rearrangements, duplicative transpositions, non-allelic recombinations, ……]
Why Use SNP Arrays?
• CGH Array (CGH: comparative genomic hybridization)
“Array-based CGH makes it possible to scan the genome for copy number with high resolution by hybridizing to arrayed genomic DNA or cDNA clones. … However, currently available array CGH methods cannot simultaneously detect chromosomal loss of heterozygosity (LOH).”
• SNP Array
“… to combine the detection of cancer copy number with cancer-specific LOH in the same experiments, we have developed an analytical method to detect DNA copy number changes by hybridization of representations of genomic DNA to commercially available single nucleotide polymorphism (SNP) arrays.”
SNP arrays simultaneously detect DNA copy number changes and genotype changes (LOH) in tumor cells.
Materials & Methods
• 5 samples for validation, with known copy numbers of chromosome X (1, 2, 3, 4, 5 copies of chrom. X)
• 2 diploid cell lines containing cytogenetically mapped partial or whole-chromosome copy number gains or losses
• 18 lung and breast cancer cell lines
• 15 normal blood control cell lines
• Affymetrix XbaI mapping array 130 (10,043 SNPs)
• Chip scanning and image processing by MAS 5.0
• Intensity normalization and summarization
• Raw/observed copy numbers of cancer samples
• Segmentation and copy number estimation (Hidden Markov Model, HMM)
Normalization & Summarization
• Normalization (reducing technical variation between chips, making intensities from different chips comparable)
  - Baseline array method
• Summarization (combining the multiple probe intensities for each SNP to produce a summarized signal value for each SNP)
  - Perfect match: pm = pmA + pmB
  - Mismatch: mm = mmA + mmB
  - Model-based summarization: pm/mm difference multiplicative model (Li & Wong, 2001)
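The multiplicative model named on this slide can be sketched as follows. In the Li & Wong formulation, the pm/mm difference of probe j on array i is modeled as y_ij = theta_i * phi_j + error, with the probe sensitivities constrained so that the sum of phi_j^2 equals the number of probes; theta_i then serves as the summarized signal. The function name, the alternating least-squares loop, the iteration count, and the example numbers are assumptions made for illustration. This is not the dChip implementation, and it omits outlier handling.

```python
def fit_multiplicative_model(y, n_iter=20):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    y[i][j] is the pm - mm difference of probe j on array i.
    Returns (theta, phi) with sum(phi[j] ** 2) equal to the number of probes.
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes  # start from equal probe sensitivities
    for _ in range(n_iter):
        # Least-squares theta for fixed phi.
        ss_phi = sum(p * p for p in phi)
        theta = [sum(row[j] * phi[j] for j in range(n_probes)) / ss_phi
                 for row in y]
        # Least-squares phi for fixed theta.
        ss_theta = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / ss_theta
               for j in range(n_probes)]
        # Rescale phi to the constraint sum(phi^2) = number of probes.
        scale = (n_probes / sum(p * p for p in phi)) ** 0.5
        phi = [p * scale for p in phi]
    # Final theta update so that theta matches the rescaled phi.
    ss_phi = sum(p * p for p in phi)
    theta = [sum(row[j] * phi[j] for j in range(n_probes)) / ss_phi
             for row in y]
    return theta, phi


# Hypothetical pm - mm differences: 3 arrays x 4 probes, exactly rank one.
true_theta = [100.0, 200.0, 300.0]
true_phi = [0.5, 1.0, 1.5, 1.0]
y = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_multiplicative_model(y)
print(theta, phi)
```

Because the example data are exactly rank one, the fitted products theta[i] * phi[j] reproduce y; with real intensities the fit is only approximate.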
Observed/Raw Copy Number Data
For each SNP of each cancer sample:
Observed CN = (observed signal / mean signal of two-copy normal samples) × 2
[Figure: log2-transformed intensities and raw CNs. Black: normal; red: tumor; green: tumor/normal]
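The raw copy number formula on this slide can be written as a small function. This is a minimal sketch; the function name and the signal values are hypothetical.

```python
from statistics import mean


def observed_copy_number(tumor_signal, normal_signals):
    """Raw CN of one SNP: tumor signal over the mean two-copy normal signal, times 2."""
    baseline = mean(normal_signals)
    if baseline <= 0:
        raise ValueError("mean normal signal must be positive")
    return 2.0 * tumor_signal / baseline


# Hypothetical summarized signals for one SNP.
normal_signals = [980.0, 1020.0, 1000.0]
raw_cn = observed_copy_number(1500.0, normal_signals)
print(raw_cn)  # 3.0
```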
Segmentation & Estimation
[Figure: raw copy numbers segmented into regions with estimated CN=1, CN=2, CN=3, CN=4]
CN Estimation: Hidden Markov Model (HMM)
Software: CNAT (www.affymetrix.com); dChip (www.dchip.org); CNAG (www.genome.umin.jp)
[Diagram: consecutive SNPs …, SNP_i, SNP_i+1, SNP_i+2, SNP_i+3, SNP_i+4, …; each SNP has a hidden status (unknown CN, "CN=?") and an observed status (observed/raw CN, "Obs. CN")]
• CN estimation: finding the sequence of CN values that maximizes the likelihood of the observed raw CN.
• Algorithm: Viterbi algorithm
• The information/assumptions below are needed:
  - Background probabilities: overall probabilities of possible CN values. P(CN=x); x = 0, 1, 2, 3, …, n (usually n < 10)
  - Transition probabilities: probabilities of CN values of each SNP conditional on the previous one. P(CN_i+1 = x | CN_i = y); x = 0, 1, 2, 3, …, n; y = 0, 1, 2, 3, …, n
  - Emission probabilities: probabilities of observed raw CN values of each SNP conditional on the hidden/unknown/true CN status. P(observed CN | CN = y); y = 0, 1, 2, 3, …, n
Prior Information for HMM
• Background Probabilities (B)
  - Overall probabilities of possible CN values.
  - P(CN=2) = 0.9
  - P(CN=i) = 0.1/(N-1), i = 0, 1, 3, 4, …, N; N = max CN allowed. E.g. P(CN=i) = 0.01 when N = 11
• Transition Probabilities (T)
  - Probabilities of CN values of each SNP conditional on the previous one.
  - P(CN_i+1 = x | CN_i = y); x = 0, 1, 2, 3, …, n; y = 0, 1, 2, 3, …, n
  - Genetic distance (Haldane map function)
• Emission Probabilities (E)
  - Probabilities of observed raw CN values of each SNP conditional on the hidden/unknown/true CN status.
  - Signal | CN ~ t distribution with df = 40
• Maximum likelihood (Observed CN | B, T, E); iterative
Transition matrix:
       0     1     2     3    …    n
  0   p00   p01   p02   p03   …   p0n
  1   p10   p11   p12   p13   …   p1n
  2   p20   p21   p22   p23   …   p2n
  3   p30   p31   p32   p33   …   p3n
  …
  n   pn0   pn1   pn2   pn3   …   pnn
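The three ingredients above (B, T, E) and the Viterbi step can be put together in a short sketch. Several details are assumptions rather than the method of any of the named packages: CN states run from 0 to n_states - 1 with the non-diploid states sharing probability 0.1 equally; the transition rule says that with probability 2*theta (theta from the Haldane map function) the next SNP is redrawn from the background distribution, and otherwise keeps the current CN; the emission is a t distribution with 40 degrees of freedom centered on the hidden CN with a hypothetical scale of 0.3; and the iterative re-estimation of parameters is left out. All function names and data are illustrative.

```python
import math


def haldane_theta(d_morgans):
    """Haldane map function: recombination fraction for a distance in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def background(n_states, p_diploid=0.9):
    """P(CN=2) = p_diploid; the other states share the remaining mass equally."""
    other = (1.0 - p_diploid) / (n_states - 1)
    return [p_diploid if cn == 2 else other for cn in range(n_states)]


def log_t_pdf(x, loc, scale, df=40):
    """Log density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2.0 * math.log1p(z * z / df))


def viterbi_cn(obs_cn, dist_morgans, n_states=5, scale=0.3):
    """Most likely CN path; dist_morgans[t] separates SNP t and SNP t+1."""
    bg = background(n_states)
    states = range(n_states)
    score = [math.log(bg[cn]) + log_t_pdf(obs_cn[0], cn, scale) for cn in states]
    pointers = []
    for t in range(1, len(obs_cn)):
        two_theta = 2.0 * haldane_theta(dist_morgans[t - 1])
        new_score, ptr = [], []
        for x in states:
            # Assumed transition: redraw from background with prob. 2*theta, else stay.
            cand = [score[y] + math.log(max(two_theta * bg[x]
                                            + (1.0 - two_theta) * (x == y), 1e-300))
                    for y in states]
            best = max(states, key=lambda y: cand[y])
            ptr.append(best)
            new_score.append(cand[best] + log_t_pdf(obs_cn[t], x, scale))
        score = new_score
        pointers.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda cn: score[cn])]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical raw CNs with a gained segment; SNPs spaced 0.001 Morgans apart.
raw_cn = [2.1, 1.9, 2.0, 3.1, 2.9, 3.0, 3.2, 3.0, 2.0, 2.1, 1.9]
gaps = [0.001] * (len(raw_cn) - 1)
path = viterbi_cn(raw_cn, gaps)
print(path)
```

Working in log space avoids numerical underflow on long chromosomes, and the small floor inside the logarithm only guards against SNPs at zero genetic distance.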
Errors of HMM (1-99.2%=0.8%) “… our criteria for homozygous deletion require the presence of at least 2 SNPs that cover an area of 1 kb in addition to an inferred copy number of 0 …”
HMM CN estimation for the samples with known loss or gain regions
Disadvantages of HMM
• No significance test
• Computationally intensive
• Analysis at the individual level only
Software
• Affymetrix chips (www.affymetrix.com)
• Illumina chips (www.illumina.com)
• CNAT (www.affymetrix.com)
• dChip (www.dchip.org)
• CNAG (www.genome.umin.jp)
• GenePattern (www.broad.mit.edu/cancer/software/genepattern/)
• BioConductor R packages (www.bioconductor.org)
  - GLAD package, adaptive weights smoothing (AWS) method
  - DNAcopy package, circular binary segmentation method
References
• JL Freeman et al. Genome Research 2006; 16:949-961
• J Huang et al. Hum Genomics 2004; 1(4):287-299
• X Zhao et al. Cancer Research 2004; 64:3060-3071
• Y Nannya et al. Cancer Research 2005; 65:6071-6079
• … see google …
Sliding Window Analysis
[Figure: overlapping windows (Window 1, Window 2, …, Window k, …, Window N) laid along the sequence of SNPs]
Each window (k) contains 30 consecutive SNPs (k, k+1, k+2, k+3, …, k+29).
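The window definition above (window k covers SNPs k to k+29) can be sketched as below, here used to average raw copy number changes per window. The function names and the toy data are illustrative, and indices are 0-based in the code.

```python
def sliding_windows(values, size=30):
    """Yield (start_index, window) for every run of `size` consecutive SNPs."""
    for k in range(len(values) - size + 1):
        yield k, values[k:k + size]


def window_means(values, size=30):
    """Mean value in each sliding window, ordered by window start."""
    return [sum(window) / size for _, window in sliding_windows(values, size)]


# Toy per-SNP values: 35 SNPs give 35 - 30 + 1 = 6 windows.
raw = [float(i) for i in range(35)]
means = window_means(raw)
print(len(means), means[0], means[-1])  # 6 14.5 19.5
```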
Genome-wide Raw Copy Number Changes (sliding window plot, averaged over ~400 pairs)
Sliding Window Test of Significance of CN Changes
[Figure: -log(p) values, based on ~400 pairs]
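The slides do not name the test behind these p values. As one possible reading, the sketch below computes a one-sample t statistic on the per-pair CN changes of a window (tumor minus normal, one value per pair) and converts it to a two-sided p value with a normal approximation, which is reasonable only because the number of pairs is large (~400). The choice of test, the base-10 logarithm, the function name, and the data are all assumptions.

```python
import math
from statistics import mean, stdev


def neg_log10_p(changes):
    """-log10 of a two-sided p value for mean(changes) == 0 (normal approximation)."""
    n = len(changes)
    sd = stdev(changes)
    if sd == 0:
        raise ValueError("changes have zero variance")
    t = mean(changes) / (sd / math.sqrt(n))
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return -math.log10(max(p, 1e-300))  # floor avoids log10(0) on underflow


# Hypothetical window-level CN changes for 400 tumor/normal pairs.
no_change = [-1.0, 1.0] * 200
gain = [0.1 + 0.05 * (1 if i % 2 else -1) for i in range(400)]
print(neg_log10_p(no_change), neg_log10_p(gain))
```

Applying such a function to every window gives one -log10(p) value per window, which is the quantity a plot of this kind displays.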
CN Change Frequencies in Population (Chr. 14, ~400 pairs)
[Figure: black: Freq.(CN>0); red: Freq.(CN>0, significant amplification at 0.01 level); green: Freq.(CN<0, significant deletion at 0.01 level)]
Microarray: From Image to Copy Number
Affymetrix Mapping 250K Sty-I chip: ~250K probe sets, ~250K SNPs; one probe set has 24 probes.
[Figure: tumor vs. normal hybridization. Normal: CN=2. Tumor: deletion (CN=1, CN=0) and amplification (CN>2)]
More DNA copies lead to more DNA hybridization and therefore higher intensity.
General Procedures for Copy Number Analysis
• Finished chips
• (scanner) Raw image data [.DAT files] (experiment info [.EXP])
• (image processing software) Probe-level raw intensity data [.CEL files]
• Preprocessing, using the chip description file [.CDF]: background adjustment, normalization, summarization
• Summarized intensity data
• Raw copy number (CN) data [log ratio of tumor/normal intensities]
• Significance test of CN changes
• Estimation of CN
• Smoothing and boundary determination
• Concurrent regions among population
• Amplification and deletion frequencies among populations
• Association analysis
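The raw copy number step of this procedure, the log ratio of tumor to normal intensities, can be sketched per SNP as below. Base 2 is an assumption (the slide only says "log ratio"), and the function name and intensities are hypothetical.

```python
import math


def log2_ratios(tumor, normal):
    """Per-SNP log2(tumor / normal) for paired summarized intensities."""
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must cover the same SNPs")
    return [math.log2(t / n) for t, n in zip(tumor, normal)]


# Hypothetical summarized intensities for three SNPs of one tumor/normal pair.
ratios = log2_ratios([200.0, 100.0, 50.0], [100.0, 100.0, 100.0])
print(ratios)  # [1.0, 0.0, -1.0]
```

On this scale a value of 0 means no change relative to the paired normal, positive values suggest amplification, and negative values suggest deletion.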