1 / 37

Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays

Explore advancements in universal DNA tag arrays for versatile genomic analysis. Learn about tag set design, tag assignment, and hybridization models. Discover the advantages of cost-effectiveness, customization, and reliability.

leonorm
Download Presentation

Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays Ion Mandoiu Claudia Prajescu Dragos Trinca Computer Science & Engineering Department University of Connecticut

  2. Overview • Universal DNA Tag Arrays • Tag Set Design Problem • Tag Assignment Problem • Conclusions

  3. Watson-Crick Complementarity • Four nucleotide types: A,C,T,G • A’s paired with T’s (2 hydrogen bonds) • C’s paired with G’s (3 hydrogen bonds)

  4. Microarray Applications • Gene expression (transcription analysis) • Genomic-based microorganism identification • Single Nucleotide Polymorphism (SNP) genotyping • Molecular diagnosis/susceptibility to disease

  5. Microarray Technologies • Arrays of cDNAs • Obtained by reverse transcription from Expressed Sequence Tags (ESTs) • Oligonucleotide arrays • Short (20-60bp) synthetic DNA strands • Limitations • cDNA arrays are inexpensive, but limited to measuring gene expression • Oligonucleotide arrays are flexible, but expensive unless produced in large quantities

  6. Universal DNA Tag Arrays • [Brenner 97, Morris et al. 98] “Programmable” oligonucleotide arrays • Array consisting of application independent oligonucleotides called tags • Two-part “reporter” probes: aplication specific primers ligated to antitags • Detection carried by a sequence of reactions separately involving the primer and the antitag part of reporter probes

  7. + Universal Tag Array Experiment

  8. Universal Tag Array Advantages • Cost effective • Same array used in many analyses  economies of scale • Easy to customize • Only need to synthesize new set of reporter probes • Reliable • Solution phase hybridization better understood than hybridization on solid support

  9. Overview • Universal DNA Tag Arrays • Tag Set Design Problem • Tag Assignment Problem • Conclusions

  10. t1 t2 t1 t2 t1 t1 t2 Tag Set Requirements • Hybridization constraints (H1) Antitags hybridize strongly to complementary tags (H2) No antitag hybridezes to a non-complementary tag (H3) Antitags do not cross-hybridize to each other

  11. Complementary Oligo Hybridization • Melting temperature Tm: temperature at which 50% of duplexes are in hybridized state • 2-4 rule Tm = 2 #(As and Ts) + 4 #(Cs and Gs) • More accurate models exist, e.g., the near-neighbor model

  12. Non-Complementary Oligo Hybridization Models • Hamming distance • Assumes DNA is rigid strand • Longest Common Subsequence/Edit Distance • Assumes DNA is infinitely elastic • Near-neighbor with mismatches • Resulting thermodynamic alignment problem is NP-Hard

  13. C-token Hybridization Model • Proposed by [Ben-Dor et al. 00] • Based on nucleation complex theory • Duplex formation between non-complementary oligos starts with the formation of a nucleation complex between perfectly complementary substrings • The nucleation complex must be sufficiently stable, i.e., must have weight  c, where weight(A)=weight(T)=1, weight(C)=weight(G)=2

  14. c-h Code Problem • c-token:left-minimal DNA string of weight  c, i.e., • w(x)  c • w(x’) < c for every proper suffix x’ of x • A set of tags is called c-h code if (C1) Every tag has weight  h (C2) Every c-token is used at most once • c-h code problem:given c and h, find largest c-h code • [Ben-Dor et al.00] algorithm based on DeBruijn sequences

  15. Extended c-h Codes • Antitag-to-antitag hybridization • Formalization in c-token hybridization model: (C3) No two tags contain complementary substrings of weight  c • Tag length constraints • Industry designs (e.g. Affymetrix GenFlex Arrays) require that tags have fixed length (20 nucleotides)

  16. c-Token Counting • W=weight 1 nucleotide, S=weight 2 nucleotide • Gn = #strings of weight n:G1 = 2; G2 = 6; Gn = 2Gn-2 + 2Gn-1 • [Ben-Dor et al.00] c-tokens in a tag have tail weight h-c+1  At most (2Gc-1 +6Gc-2+8Gc-3)/(h-c+1) tags in a valid c-h code

  17. Upperbounds for Extended c-h Codes • Theorem: The number of tags in a feasible extended c-h-l code is at most for odd c, and at most for even c

  18. Algorithms • Alphabetic tree search algorithm • Enumerate candidate tags in lexicographic order, save tags whose c-tokens are not used by previously selected tags • Easily modified to handle various combinations of constraints • [MT 05] Optimum c-h codes can be computed in practical time for small values of c by using integer programming • Maximum integer flow problem w/ set capacity constraints

  19. Experimental Results

  20. Periodic Tags • c-token uniqueness constraint in c-h code formulation is too strong • c-tokens should not appear in different tags, but is OK to repeat a c-token in the same tag • Periodic tags make best use of available c-tokens • [MT05] Tag set design can be cast as a maximum vertex-disjoint cycle packing problem • Allowing periodic tags yields ~40% more tags

  21. c-token factor graph, c=4 (incomplete) CC AAG AAC AAAA AAAT

  22. Overview • Background on DNA Microarrays • Universal DNA Tag Arrays • Tag Set Design Problem • Tag Assignment Problem • Conclusions

  23. More Possible Mis-Hybridizations • What can be done: • Leave some tags unassigned • Distribute primers over multiple arrays • Here we focus on avoiding case (a), primer-to-tag hybridization

  24. Constraints on tag assignment • If primer p hybridizes with tag t, then either p or t must be left un-assigned, unless p is assigned to t p’ t p t’

  25. Characterization of Assignable Sets • [Ben-Dor 04] Set P is assignable to T iff X+Y  |P|, where, in the hybridization graph induced by P+T • X = number of primers incident to a degree 1 tag • Y = number of degree 0 tags X=1 Y=2

  26. MAPS Problem • Maximum Assignable Primer Set (MAPS) Problem: given primer set P and tag set T, find maximum size assignable subset of P • [Ben-Dor 04] Greedy deletion heuristic: repeatedly delete primer of maximum weight from P until it becomes assignable, where • Potential of tag t is 2-|P(t)| • Potential of primer p is sum of potentials of conflicting tags

  27. Universal Array Multiplexing Problem • Multiplexing Problem: given primer set P and tag set T, find partition of P into minimum number of assignable sets • [Ben-Dor 04] Repeatedly find approximate MAPS

  28. Integration with Probe Selection • In practice, several primer candidates with equivalent functionality • In SNP genotyping, can pick primer from either forward and reverse strand • In gene expression/identification applications, many primers have desired length, Tm, etc.

  29. Pooled Array Multiplexing Problem • Given set of primer pools P and tag set T, find a primer from each pool and a partition of selected primers into minimum number of assignable sets

  30. X+Y Characterization no Longer Holds

  31. Pooled Multiplexing Algorithms • Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04]

  32. Pooled Multiplexing Algorithms • Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04] • Primer-Del+ = same, but never delete last primer from pool unless no other choice • Min-Pot = select primer with min potential from each pool, then run Primer-Del • Min-Deg = select primer with min conflict degree, then run Primer-Del

  33. Results: GenFlex Tags, c=8

  34. Herpes B Gene Expression Assay GenFlex Tags Periodic Tags

  35. Overview • Background on DNA Microarrays • Universal DNA Tag Arrays • Tag Set Design Problem • Tag Assignment Problem • Conclusions

  36. Conclusions • New techniques for tag set design and tag assignment lead to significantly improved multiplexing rates and more reliable assays • Other applications of universal tags: • Lab-on-chip, DNA-mediated assembly (e.g., carbon nano-tubes), DNA computing [Brenneman&Condon 02] • Other applications of partition/assignment techniques: • Genotyping by mass-spectroscopy [Aumann et al 05] • Genotyping using l-mer arrays

  37. Acknowledgments • UCONN Research Foundation

More Related