370 likes | 384 Views
Explore advancements in universal DNA tag arrays for versatile genomic analysis. Learn about tag set design, tag assignment, and hybridization models. Discover the advantages of cost-effectiveness, customization, and reliability.
E N D
Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays Ion Mandoiu Claudia Prajescu Dragos Trinca Computer Science & Engineering Department University of Connecticut
Overview • Universal DNA Tag Arrays • Tag Set Design Problem • Tag Assignment Problem • Conclusions
Watson-Crick Complementarity • Four nucleotide types: A,C,T,G • A’s paired with T’s (2 hydrogen bonds) • C’s paired with G’s (3 hydrogen bonds)
Microarray Applications • Gene expression (transcription analysis) • Genomic-based microorganism identification • Single Nucleotide Polymorphism (SNP) genotyping • Molecular diagnosis/susceptibility to disease
Microarray Technologies • Arrays of cDNAs • Obtained by reverse transcription from Expressed Sequence Tags (ESTs) • Oligonucleotide arrays • Short (20-60bp) synthetic DNA strands • Limitations • cDNA arrays are inexpensive, but limited to measuring gene expression • Oligonucleotide arrays are flexible, but expensive unless produced in large quantities
Universal DNA Tag Arrays • [Brenner 97, Morris et al. 98] “Programmable” oligonucleotide arrays • Array consisting of application independent oligonucleotides called tags • Two-part “reporter” probes: aplication specific primers ligated to antitags • Detection carried by a sequence of reactions separately involving the primer and the antitag part of reporter probes
+ Universal Tag Array Experiment
Universal Tag Array Advantages • Cost effective • Same array used in many analyses economies of scale • Easy to customize • Only need to synthesize new set of reporter probes • Reliable • Solution phase hybridization better understood than hybridization on solid support
Overview • Universal DNA Tag Arrays • Tag Set Design Problem • Tag Assignment Problem • Conclusions
t1 t2 t1 t2 t1 t1 t2 Tag Set Requirements • Hybridization constraints (H1) Antitags hybridize strongly to complementary tags (H2) No antitag hybridezes to a non-complementary tag (H3) Antitags do not cross-hybridize to each other
Complementary Oligo Hybridization • Melting temperature Tm: temperature at which 50% of duplexes are in hybridized state • 2-4 rule Tm = 2 #(As and Ts) + 4 #(Cs and Gs) • More accurate models exist, e.g., the near-neighbor model
Non-Complementary Oligo Hybridization Models • Hamming distance • Assumes DNA is rigid strand • Longest Common Subsequence/Edit Distance • Assumes DNA is infinitely elastic • Near-neighbor with mismatches • Resulting thermodynamic alignment problem is NP-Hard
C-token Hybridization Model • Proposed by [Ben-Dor et al. 00] • Based on nucleation complex theory • Duplex formation between non-complementary oligos starts with the formation of a nucleation complex between perfectly complementary substrings • The nucleation complex must be sufficiently stable, i.e., must have weight c, where weight(A)=weight(T)=1, weight(C)=weight(G)=2
c-h Code Problem • c-token:left-minimal DNA string of weight c, i.e., • w(x) c • w(x’) < c for every proper suffix x’ of x • A set of tags is called c-h code if (C1) Every tag has weight h (C2) Every c-token is used at most once • c-h code problem:given c and h, find largest c-h code • [Ben-Dor et al.00] algorithm based on DeBruijn sequences
Extended c-h Codes • Antitag-to-antitag hybridization • Formalization in c-token hybridization model: (C3) No two tags contain complementary substrings of weight c • Tag length constraints • Industry designs (e.g. Affymetrix GenFlex Arrays) require that tags have fixed length (20 nucleotides)
c-Token Counting • W=weight 1 nucleotide, S=weight 2 nucleotide • Gn = #strings of weight n:G1 = 2; G2 = 6; Gn = 2Gn-2 + 2Gn-1 • [Ben-Dor et al.00] c-tokens in a tag have tail weight h-c+1 At most (2Gc-1 +6Gc-2+8Gc-3)/(h-c+1) tags in a valid c-h code
Upperbounds for Extended c-h Codes • Theorem: The number of tags in a feasible extended c-h-l code is at most for odd c, and at most for even c
Algorithms • Alphabetic tree search algorithm • Enumerate candidate tags in lexicographic order, save tags whose c-tokens are not used by previously selected tags • Easily modified to handle various combinations of constraints • [MT 05] Optimum c-h codes can be computed in practical time for small values of c by using integer programming • Maximum integer flow problem w/ set capacity constraints
Periodic Tags • c-token uniqueness constraint in c-h code formulation is too strong • c-tokens should not appear in different tags, but is OK to repeat a c-token in the same tag • Periodic tags make best use of available c-tokens • [MT05] Tag set design can be cast as a maximum vertex-disjoint cycle packing problem • Allowing periodic tags yields ~40% more tags
c-token factor graph, c=4 (incomplete) CC AAG AAC AAAA AAAT
Overview • Background on DNA Microarrays • Universal DNA Tag Arrays • Tag Set Design Problem • Tag Assignment Problem • Conclusions
More Possible Mis-Hybridizations • What can be done: • Leave some tags unassigned • Distribute primers over multiple arrays • Here we focus on avoiding case (a), primer-to-tag hybridization
Constraints on tag assignment • If primer p hybridizes with tag t, then either p or t must be left un-assigned, unless p is assigned to t p’ t p t’
Characterization of Assignable Sets • [Ben-Dor 04] Set P is assignable to T iff X+Y |P|, where, in the hybridization graph induced by P+T • X = number of primers incident to a degree 1 tag • Y = number of degree 0 tags X=1 Y=2
MAPS Problem • Maximum Assignable Primer Set (MAPS) Problem: given primer set P and tag set T, find maximum size assignable subset of P • [Ben-Dor 04] Greedy deletion heuristic: repeatedly delete primer of maximum weight from P until it becomes assignable, where • Potential of tag t is 2-|P(t)| • Potential of primer p is sum of potentials of conflicting tags
Universal Array Multiplexing Problem • Multiplexing Problem: given primer set P and tag set T, find partition of P into minimum number of assignable sets • [Ben-Dor 04] Repeatedly find approximate MAPS
Integration with Probe Selection • In practice, several primer candidates with equivalent functionality • In SNP genotyping, can pick primer from either forward and reverse strand • In gene expression/identification applications, many primers have desired length, Tm, etc.
Pooled Array Multiplexing Problem • Given set of primer pools P and tag set T, find a primer from each pool and a partition of selected primers into minimum number of assignable sets
Pooled Multiplexing Algorithms • Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04]
Pooled Multiplexing Algorithms • Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04] • Primer-Del+ = same, but never delete last primer from pool unless no other choice • Min-Pot = select primer with min potential from each pool, then run Primer-Del • Min-Deg = select primer with min conflict degree, then run Primer-Del
Herpes B Gene Expression Assay GenFlex Tags Periodic Tags
Overview • Background on DNA Microarrays • Universal DNA Tag Arrays • Tag Set Design Problem • Tag Assignment Problem • Conclusions
Conclusions • New techniques for tag set design and tag assignment lead to significantly improved multiplexing rates and more reliable assays • Other applications of universal tags: • Lab-on-chip, DNA-mediated assembly (e.g., carbon nano-tubes), DNA computing [Brenneman&Condon 02] • Other applications of partition/assignment techniques: • Genotyping by mass-spectroscopy [Aumann et al 05] • Genotyping using l-mer arrays
Acknowledgments • UCONN Research Foundation