Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation. Xingyuan Li, Zhili He and Jizhong Zhou. Nucleic Acids Research, 2005, Vol. 33, No. 19, 6114–6123. Presented by: Deepti Malhotra, Biological Sequence Analysis.
MICROARRAY - What is it?
Analysis of the relative expression levels of hundreds or thousands of genes simultaneously, in a single experiment, by determining the amount of messenger RNA (mRNA) present.
[Figure: labeled target, probe (gene of interest), matrix]
cDNA Microarray: NIEHS Tox Chip Nuwaysir E, et al., Molecular Carcinogenesis 24:153-159 (1999)
GeneChip® Probe Arrays
• Single-stranded, fluorescently labeled DNA target hybridizes to the oligonucleotide probe
• Each probe cell or feature (24 µm) contains millions of copies of a specific oligonucleotide probe
• The array (1.28 cm) carries over 200,000 different probes complementary to genetic information of interest
[Figure: GeneChip probe array, hybridized probe cell, image of hybridized probe array. Courtesy: Affymetrix]
GeneChip Probe Arrays
[Figure: GeneChip probe array, probe set, probe pair (PM, MM), probe cell (feature), hybridized probe cell, image of hybridized probe array]
Multiple Specific Probe Pairs per Gene (25-mers). Nature Genetics Supplement, Volume 21, January 1999
What’s the complexity?
• More genes
• More information per experiment

Feature Size   Features/Chip   Genes/Chip*
100 µm         16,384          409
50 µm          65,536          1,638
20 µm          409,600         10,240
10 µm          1,638,400       40,960
* Using 20 probe pairs per gene
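The feature and gene counts can be reproduced with a little arithmetic. The sketch below assumes a square array with a 1.28 cm side (the dimension shown on the GeneChip slide) and that each probe pair occupies two features (PM and MM); the function names are illustrative only. The 50 µm case works out to 256 × 256 = 65,536 features.

```python
# Assumption: square array, 1.28 cm per side, expressed in micrometres.
CHIP_SIDE_UM = 12_800


def features_per_chip(feature_size_um: int) -> int:
    """Number of square features that fit on the array."""
    per_side = CHIP_SIDE_UM // feature_size_um
    return per_side * per_side


def genes_per_chip(feature_size_um: int, probe_pairs_per_gene: int = 20) -> int:
    """Genes per chip, assuming each probe pair uses two features (PM + MM)."""
    features_per_gene = 2 * probe_pairs_per_gene
    return features_per_chip(feature_size_um) // features_per_gene


for size in (100, 50, 20, 10):
    print(f"{size} um: {features_per_chip(size):,} features, "
          f"{genes_per_chip(size):,} genes")
```

Halving the feature size quadruples the number of features, which is why the gene capacity grows so quickly as features shrink.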
Why So Many Probe Pairs?
• Point mutations, deletions, or insertions will not affect the detection of the gene of interest.
• The bioinformatics algorithm accounts for expression across 11 different probe pairs to calculate the expression of the gene.
[Figure: probe pairs, gene of interest]
Redundancy of probe synthesis
• Multiple indicators for the same gene ensure:
  • Quantitative accuracy
  • High sensitivity
• Indicators of oligonucleotide specificity:
  • Sequence identity to non-targets
  • Continuous stretch to non-targets
  • Free energy of binding to non-targets
All 3 criteria are important for the selection of optimal probes
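As a rough illustration of the first two specificity indicators, the sketch below measures percent identity and the longest continuous matching stretch between a probe and a non-target. It assumes an ungapped, equal-length comparison, which is a simplification and not the alignment used by CommOligo; the function names and sequences are made up. Free energy of binding is left out because it needs thermodynamic parameters that the slides do not give.

```python
def percent_identity(probe: str, non_target: str) -> float:
    """Percent of positions that match in an ungapped comparison."""
    if len(probe) != len(non_target):
        raise ValueError("ungapped comparison needs equal-length sequences")
    matches = sum(a == b for a, b in zip(probe, non_target))
    return 100.0 * matches / len(probe)


def longest_matching_stretch(probe: str, non_target: str) -> int:
    """Length of the longest run of consecutive matching positions."""
    best = run = 0
    for a, b in zip(probe, non_target):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


# Toy sequences, for illustration only.
probe = "ACGTACGTAC"
non_target = "ACGTTCGTAA"
print(percent_identity(probe, non_target))
print(longest_matching_stretch(probe, non_target))
```

The two measures can disagree: a non-target with modest overall identity may still share one long perfect stretch with the probe, which is one reason to check more than one criterion.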
Problems with probe synthesis – addressed by CommOligo
• Representation of each sequence in a genome-wide search
• Liberal cut-offs and fewer non-specifics
• Generally use BLAST for local alignment or suffix arrays for exact string search
• Homologous sequence studies versus whole-genome arrays
Applicability to experiments
• Experimental threshold determination
• Inherent variability
• Series of filters checking oligos
• Cut-offs based on CommOligo_PE
• Parameters and thresholds are user-adjustable
• Iterative probe optimization
• All 3 criteria included
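A series of threshold filters with user-adjustable cut-offs could be sketched as follows. This is not CommOligo's code: the class names, field names, and default threshold values are hypothetical placeholders, and each candidate is assumed to carry its worst-case values against all non-targets, computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Placeholder values for illustration; not taken from the paper.
    max_identity: float = 85.0      # percent identity to any non-target
    max_stretch: int = 15           # longest continuous stretch, in bases
    min_free_energy: float = -30.0  # kcal/mol; more negative binds more strongly


@dataclass
class CandidateProbe:
    name: str
    identity: float     # worst-case identity to non-targets
    stretch: int        # worst-case continuous stretch to non-targets
    free_energy: float  # worst-case (most negative) free energy to non-targets


def passes_all_filters(probe: CandidateProbe, t: Thresholds) -> bool:
    """A probe is kept only if it satisfies all three criteria."""
    return (probe.identity <= t.max_identity
            and probe.stretch <= t.max_stretch
            and probe.free_energy >= t.min_free_energy)


def select_probes(candidates, t: Thresholds):
    return [p for p in candidates if passes_all_filters(p, t)]


candidates = [
    CandidateProbe("oligo_1", identity=70.0, stretch=10, free_energy=-20.0),
    CandidateProbe("oligo_2", identity=92.0, stretch=12, free_energy=-25.0),
    CandidateProbe("oligo_3", identity=80.0, stretch=22, free_energy=-28.0),
]
print([p.name for p in select_probes(candidates, Thresholds())])
```

Keeping the thresholds in one object makes it easy to rerun the selection with different cut-offs, which is the kind of adjustment the slide describes.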
Sequence alignment strategy
Dynamic programming matrix
• Uses bit scores from Myers' algorithm during identity calculation
• An alignment corresponds to the path from the bottom row, at a cell with high identity/score, to the top row
• Traverse path / last path
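The idea of filling a dynamic programming matrix and then tracing a path back through it can be sketched with a plain global alignment. Note that this is a textbook Needleman-Wunsch-style implementation with simple match/mismatch/gap scores chosen for illustration; it is not Myers' bit-vector algorithm and not CommOligo's actual scoring.

```python
def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity along one optimal global alignment of a and b."""
    n, m = len(a), len(b)
    # Fill the DP matrix; row 0 and column 0 hold pure-gap prefixes.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the last cell to the origin, counting matches
    # and alignment columns along the path.
    i, j = n, m
    matches = length = 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return 100.0 * matches / length if length else 0.0


print(global_identity("ACGT", "AGT"))
```

Here identity is defined as matches divided by alignment length (including gap columns); other definitions exist, and which one CommOligo uses is not stated on the slide.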
Final optimization and scoring
• Quality score is calculated as: [equation not captured in transcript]
• CommOligo_PE is used to determine the thresholds, and the probes are optimized for maximum coverage and correctness by calculating: [equation not captured in transcript]
• The goal is to maximize NPV and C
• Cross-validation: the data are randomly divided into 10 subsets, one subset is used as the test set, and the calibration is run 10 times
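For reference, NPV (negative predictive value) has the standard definition TN / (TN + FN). The sketch below computes it and builds a random 10-fold split of the kind described above. The function names are illustrative, the example labels are made up, and C is not computed because its formula does not survive in the transcript.

```python
import random


def negative_predictive_value(predicted, actual) -> float:
    """NPV = TN / (TN + FN), with True meaning 'positive'."""
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    return tn / (tn + fn) if (tn + fn) else float("nan")


def k_fold_splits(n_items: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) for k random, disjoint folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


# Made-up labels, for illustration only.
predicted = [False, False, True, False]
actual = [False, True, True, False]
print(negative_predictive_value(predicted, actual))

for train, test in k_fold_splits(25):
    print(len(train), len(test))
```

In each of the 10 runs, thresholds would be calibrated on the training indices and evaluated on the held-out test indices, so every item is tested exactly once.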
Results Training sets:
Genome wide analysis Homologous sequence searches
Take-home message
• CommOligo works well with homologous sequences (3 stringent criteria; cDNA)
• Still works well at the same thresholds for genome-wide searches (Oligochip)
• Actual hybridization data is used
• Better identity and minimum energy filters
• Optimal Tm for the hybridization reaction is based on the oligos selected after passing all the filters, not on all possible oligos
• Iterative threshold optimization