Detection and Compensation of Cross-Hybridization in DNA Microarray Data. Jim Huang (1). Joint work with Quaid Morris (1), Tim Hughes (2) and Brendan Frey (1). (1) Probabilistic and Statistical Inference Group, University of Toronto
Detection and Compensation of Cross-Hybridization in DNA Microarray Data • Jim Huang (1), joint work with Quaid Morris (1), Tim Hughes (2) and Brendan Frey (1) • (1) Probabilistic and Statistical Inference Group, University of Toronto • (2) Banting & Best Department of Medical Research, University of Toronto
Description and Applications of DNA Microarrays • Microarrays consist of a 2-D array of probes, each with a short DNA sequence attached. These sequences are called oligonucleotide sequences. • The output of each probe is approximately proportional to the amount of DNA that binds to the probe from a given tissue; the data for each probe is an N-dimensional expression profile vector, where N is the number of tissues used on the array. • DNA microarrays can be used to measure the level of gene expression across these N tissues.
Hybridization and cross-hybridization • The process of two complementary DNA strands binding is called hybridization; • Ideally, an oligonucleotide probe will bind only to the DNA sequence for which it was designed and to which it is complementary; • However, many DNA sequences are similar to one another, so a sequence can also bind to probes on the array that were not designed for it; • This phenomenon is called cross-hybridization. [Figure: DNA from a tissue sample (TCGATCCTA) binding to its intended oligonucleotide probe AGCTAGGAT (hybridization) and to a similar probe ATCTAGAAT (cross-hybridization).]
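The distinction between hybridization and cross-hybridization can be illustrated with a minimal sketch that counts complementary base pairs. It uses the sequences that appear in the slide's figure text. The function names are hypothetical, and the position-by-position comparison is a deliberate simplification that ignores strand orientation and binding thermodynamics.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq):
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in seq)


def paired_fraction(probe, target):
    """Fraction of positions at which the target base pairs with the probe base.

    Simplifying assumption: both sequences have equal length and are
    compared position by position.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    pairs = sum(COMPLEMENT[p] == t for p, t in zip(probe, target))
    return pairs / len(probe)


target = "TCGATCCTA"          # DNA from the tissue sample
intended_probe = "AGCTAGGAT"  # probe designed for this target
similar_probe = "ATCTAGAAT"   # probe designed for a different, similar target

full_match = paired_fraction(intended_probe, target)    # hybridization
partial_match = paired_fraction(similar_probe, target)  # cross-hybridization
```

In this toy comparison the intended probe pairs at every position, while the similar probe still pairs at most positions, which is the situation in which cross-hybridization can occur.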
The trouble with cross-hybridization • With cross-hybridization, each probe can signal the presence of sequences other than the one it was designed for; • This skews the observed data away from the expected data. [Figure: the observed expression profile vector (cross-hybridized) shown as a sum that includes the expected expression profile vector (no cross-hybridization).]
Detecting cross-hybridization (1) • To test whether cross-hybridization is affecting the gene expression data, we perform a BLAST sequence match on all oligonucleotide probe sequences used on the microarray; • Many probes are matched with sequences for which they were not specifically designed.
Detecting cross-hybridization (2) • We compute the Pearson correlation coefficient ρ between the expression profiles of BLAST-matched probes and between the profiles of randomly paired probes; • Approximately 33% of the BLAST-matched probe pairs have ρ > 0.95, whereas only 2% of randomly paired probes have ρ > 0.95; • This difference between the two distributions indicates that cross-hybridization has a significant impact on the observed gene expression data.
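The comparison above can be sketched as follows. This is only an illustration on synthetic data: the probe pairs, noise level, and helper names are assumptions, and no real BLAST output or microarray data is used.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two expression profiles."""
    return float(np.corrcoef(a, b)[0, 1])


def fraction_above(profiles, pairs, threshold=0.95):
    """Fraction of probe pairs whose profiles correlate above the threshold."""
    rhos = np.array([pearson(profiles[i], profiles[j]) for i, j in pairs])
    return float(np.mean(rhos > threshold))


# Synthetic stand-in for real data: 200 probes measured across 12 tissues.
rng = np.random.default_rng(0)
n_probes, n_tissues = 200, 12
profiles = rng.normal(size=(n_probes, n_tissues))

# Assumed "sequence-matched" pairs: probe 2k+1 is a noisy copy of probe 2k,
# mimicking two probes that respond to the same transcript.
matched_pairs = [(2 * k, 2 * k + 1) for k in range(50)]
for i, j in matched_pairs:
    profiles[j] = profiles[i] + 0.05 * rng.normal(size=n_tissues)

# Control: pairs of distinct probes drawn uniformly at random.
random_pairs = [tuple(rng.choice(n_probes, size=2, replace=False))
                for _ in range(50)]

matched_fraction = fraction_above(profiles, matched_pairs)
random_fraction = fraction_above(profiles, random_pairs)
```

On real data, `matched_pairs` would come from the BLAST matches, and the two fractions would correspond to the 33% and 2% figures quoted above.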
Compensating for cross-hybridization • We model the observed, cross-hybridized expression profile vector x as the matrix product of a hybridization matrix Λ and an unobserved expression profile vector z in which there is no cross-hybridization. • The elements λij of the matrix Λ are set as parameterized functions of the Gibbs free energy ΔGij between probes i and j. • To compensate for cross-hybridization, we use a generalized Expectation-Maximization algorithm in which we solve for z and Λ iteratively.
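The linear model x = Λz can be sketched as below for the simplified case in which Λ is already known. The exponential parameterization of λij in terms of ΔGij, the `scale` parameter, and the free-energy values are assumptions made for illustration; the slides do not state the actual functional form. The iterative, generalized EM estimation of Λ is not reproduced here, and a direct linear solve stands in for the compensation step.

```python
import numpy as np


def hybridization_matrix(delta_g, scale=0.1):
    """Build a hybridization matrix from pairwise free energies.

    Hypothetical parameterization (not taken from the slides):
    lambda_ii = 1 and lambda_ij = scale * exp(-delta_g_ij) for i != j,
    so that a more negative free energy gives stronger cross-hybridization.
    """
    lam = scale * np.exp(-np.asarray(delta_g, dtype=float))
    np.fill_diagonal(lam, 1.0)
    return lam


def compensate(x, lam):
    """Recover z from x = lam @ z when lam is known and invertible."""
    return np.linalg.solve(lam, x)


# Made-up free energies (arbitrary units) for three probes; the diagonal
# entries are ignored because lambda_ii is fixed to 1.
delta_g = np.array([[0.0, -1.0, -0.1],
                    [-1.0, 0.0, -0.1],
                    [-0.1, -0.1, 0.0]])
lam = hybridization_matrix(delta_g)

# Made-up "true" profiles: 3 probes across 4 tissues.
z_true = np.array([[5.0, 0.0, 1.0, 2.0],
                   [0.0, 3.0, 0.0, 1.0],
                   [1.0, 1.0, 4.0, 0.0]])
x_observed = lam @ z_true                 # forward model adds cross-hybridization
z_recovered = compensate(x_observed, lam)
```

Because Λ is unknown in practice, the slides describe alternating between estimates of z and Λ; this sketch covers only the step in which z is obtained from x for a fixed Λ.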