Engineering a Scalable Placement Heuristic for DNA Probe Arrays. A.B. Kahng, I.I. Mandoiu, P. Pevzner, S. Reda (all UCSD), A. Zelikovsky (GSU).
Outline • DNA probe arrays and unwanted illumination • Synchronous array design (2-D placement) • Asynchronous array design (3-D placement) • Experimental results • Extensions • Conclusions
DNA Probe Arrays • Used in a wide range of genomic analyses • Gene expression monitoring, SNP mapping, sequencing by hybridization, … • Arrays with up to 1000x1000 probes in commercial use, 10^8 probes envisioned for next-generation arrays • Highly scalable algorithms required for array design
Simplified DNA Array Flow • Input: gene sequences, position of SNPs, etc. • Flow: Probe Selection → Probe Placement → Probe Alignment (Mask Design) → Mask Manufacturing → Array Manufacturing → Hybridization Experiment → Analysis of Hybridization Intensities • Steps are split between a soft/computational domain and a hard/biochemistry domain • This talk: Probe Placement and Probe Alignment
Array Manufacturing Process • Very Large-Scale Immobilized Polymer Synthesis: • (1) Treat substrate with chemically protected “linker” molecules, creating a rectangular array (site size = approx. 10x10 microns) • (2) Selectively expose array sites to light; light deprotects exposed molecules, activating further synthesis • (3) Flush chip surface with a solution of protected A, C, G, T; binding occurs at previously deprotected sites • (4) Repeat steps 2 and 3 until the desired probes are synthesized
Photo-Deprotection Step • Our concern: diffraction → unwanted illumination → yield decrease
Probe Synthesis [Figure: placed probes, nucleotide deposition sequence ACG, and the corresponding masks M1, M2, M3]
Measuring Unwanted Illumination • Unwanted illumination is measured by border length [Figure: placed probes, nucleotide deposition sequence ACG, and masks M1, M2, M3 with the mask borders marked]
Synchronous vs. Asynchronous Synthesis • (a) Periodic deposition sequence • (b) Synchronous embedding of CTG • (c) Asynchronous leftmost embedding of CTG • (d) Another asynchronous embedding [Figure: the four panels (a)-(d); the periods of the deposition sequence are marked as 4-groups]
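The difference between the synchronous and the leftmost asynchronous embedding of CTG can be sketched in a few lines. This is an illustrative sketch only: the period ACTG, the 0-based step indices, and both function names are assumptions made for the example, not part of the slides.

```python
def synchronous_embedding(probe, period="ACTG"):
    """Deposit the i-th nucleotide during the i-th repetition of the period."""
    return [i * len(period) + period.index(ch) for i, ch in enumerate(probe)]


def leftmost_embedding(probe, period="ACTG"):
    """Deposit every nucleotide at the earliest step that still works."""
    steps, t = [], 0
    for ch in probe:
        while period[t % len(period)] != ch:
            t += 1          # skip steps that deposit a different nucleotide
        steps.append(t)
        t += 1              # the next nucleotide needs a strictly later step
    return steps


sync_steps = synchronous_embedding("CTG")   # one nucleotide per period
left_steps = leftmost_embedding("CTG")      # finishes within the first period
```

Any other strictly increasing choice of matching steps, such as panel (d), is also a valid asynchronous embedding.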
Problem Formulation (Synchronous Case) • Synchronous Array Design (2-D Placement) Problem: • Minimize placement cost of Hamming graph H (vertices = probes, distance = Hamming distance) • On 2-dimensional grid graph G2 (N x N array, edges between distance-1 neighbors) [Figure: probes of H mapped to sites of G2]
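The placement cost in this formulation can be sketched as the sum of Hamming distances over all pairs of grid-adjacent probes. The grid-as-list-of-rows representation and the function names below are assumptions made for illustration.

```python
def hamming(p, q):
    """Number of positions in which two equal-length probes differ."""
    return sum(a != b for a, b in zip(p, q))


def border_length(grid):
    """Sum of Hamming distances over horizontally and vertically adjacent sites."""
    total = 0
    for r, row in enumerate(grid):
        for c, probe in enumerate(row):
            if c + 1 < len(row):            # right neighbor
                total += hamming(probe, row[c + 1])
            if r + 1 < len(grid):           # lower neighbor
                total += hamming(probe, grid[r + 1][c])
    return total


good = [["ACG", "ACT"], ["TCG", "TCT"]]
bad = [["ACG", "TCT"], ["TCG", "ACT"]]
costs = (border_length(good), border_length(bad))
```

The two example grids hold the same four probes; only the assignment of probes to sites changes the cost.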
2-D Placement Lower Bound • For every probe, take the arcs to its 4 closest probes in H (by Hamming distance) • Lower bound = total weight of these arcs minus the weight of the 4N heaviest arcs
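A minimal sketch of this bound follows. It assumes that every adjacent pair of sites is counted once per direction (an N x N grid then has 4N^2 - 4N directed arcs, which is where the 4N deleted arcs come from); if each pair is counted only once, half of the value is the corresponding bound. All function names are hypothetical, and the exhaustive search is only there to check the bound on a tiny instance.

```python
from itertools import permutations


def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def arc_lower_bound(probes, n):
    """Weight of arcs to each probe's 4 closest probes, minus the 4n heaviest arcs."""
    arcs = []
    for i, p in enumerate(probes):
        dists = sorted(hamming(p, q) for j, q in enumerate(probes) if j != i)
        arcs.extend(dists[:4])              # fewer than 4 if the instance is tiny
    arcs.sort()
    return sum(arcs[:max(0, len(arcs) - 4 * n)])


def directed_cost(grid):
    """Border cost with every adjacent pair counted once per direction."""
    total = 0
    for r, row in enumerate(grid):
        for c, p in enumerate(row):
            if c + 1 < len(row):
                total += 2 * hamming(p, row[c + 1])
            if r + 1 < len(grid):
                total += 2 * hamming(p, grid[r + 1][c])
    return total


def brute_force_optimum(probes, n):
    """Exhaustive optimum over all placements; feasible only for tiny arrays."""
    return min(
        directed_cost([list(perm[r * n:(r + 1) * n]) for r in range(n)])
        for perm in permutations(probes)
    )


probes = ["ACG", "ACT", "TCG", "TCT"]
bound, optimum = arc_lower_bound(probes, 2), brute_force_optimum(probes, 2)
```

On such a tiny instance the bound is weak because each probe has fewer than four other probes to choose from; the sketch is only meant to show how the quantity is assembled.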
TSP + 1-Threading Placement • Hubbell (1990s): • Find TSP tour/path over the given probes w.r.t. Hamming distance • Thread the TSP path in the grid row by row • Hannenhalli, Hubbell, Lipshutz, Pevzner (2002): • Place the probes according to 1-threading • Further decreases total border length by 20%
Lexicographical Sorting + 1-Threading • Radix-sort the probes in lexicographical order • Thread on the chip [Figure: sorted probes threaded on the chip]
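The sort-then-thread idea can be sketched as below. Note the assumption: the sketch threads the sorted list in plain row-by-row snake order, which is simpler than the 1-threading pattern named on the slide, and it uses Python's built-in sort rather than a radix sort. The function name is hypothetical.

```python
def sort_and_thread(probes, n_cols):
    """Sort probes lexicographically, then lay them out in snake order."""
    ordered = sorted(probes)
    rows = [ordered[i:i + n_cols] for i in range(0, len(ordered), n_cols)]
    # Reverse every other row so consecutive probes in the sorted order
    # stay adjacent when the path turns at the end of a row.
    return [row if k % 2 == 0 else row[::-1] for k, row in enumerate(rows)]


layout = sort_and_thread(["TG", "AC", "CA", "AA", "AT", "CG"], 3)
```

Lexicographic neighbors share a long prefix, so placing them next to each other tends to keep their Hamming distance small.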
Matching-Based Probe Placement • Select an independent (mutually nonadjacent) set of placed probes • Re-embed them in the selected sites using an optimal perfect matching • Total cost can only decrease or remain the same • Runtime: roughly proportional to the square of the independent set size [Figure: independent set of sites before and after re-embedding]
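A small sketch of the matching step follows. Because the selected sites are mutually nonadjacent, the cost of putting a probe on one of them depends only on fixed neighbors, so the reassignment is an assignment problem. For brevity the sketch enumerates all permutations, which is only workable for very small sets; a real implementation would use a minimum-cost perfect matching solver. All names are hypothetical.

```python
from itertools import permutations


def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def site_cost(grid, r, c, probe):
    """Conflicts between `probe` placed at (r, c) and its grid neighbors."""
    total = 0
    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            total += hamming(probe, grid[rr][cc])
    return total


def rematch(grid, sites):
    """Optimally reassign the probes sitting on mutually nonadjacent `sites`."""
    probes = [grid[r][c] for r, c in sites]
    best = min(
        permutations(probes),
        key=lambda perm: sum(site_cost(grid, r, c, p)
                             for (r, c), p in zip(sites, perm)),
    )
    new_grid = [row[:] for row in grid]
    for (r, c), p in zip(sites, best):
        new_grid[r][c] = p
    return new_grid


grid = [["GG", "AA", "AA"],
        ["AA", "AA", "GG"],
        ["AA", "GG", "AA"]]
improved = rematch(grid, [(0, 0), (2, 2)])
```

The current assignment is always among the candidates, which is why the total cost cannot increase.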
Sliding Window Matching • Iterate SlidingWindowMatching over the chip until improvement drops below 0.1% • There is a trade-off between solution quality and size/overlap of windows
Effect of Window Size on Solution Quality • Increased window size/overlap decreases the number of conflicts, but increases runtime
Epitaxial Placement Algorithm • Simulates crystal growth • Start with an arbitrary probe placed at the center • Maintain a best probe candidate (i.e., a probe with the minimum number of conflicts with the already placed neighbors) for each border site • Iteratively fill the border site with minimum increase in border length, giving priority to sites with more neighbors filled
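The growth process can be sketched as follows. This is a naive, unoptimized illustration rather than the scalable implementation: it recomputes the best candidate for every border site in each round, and it assumes a strict priority rule (more filled neighbors first, then lower added cost), which is one possible reading of the slide. It also assumes exactly n*n probes; all names are hypothetical.

```python
def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def epitaxial(probes, n):
    """Grow a placement outwards from the center of an n x n grid."""
    grid = [[None] * n for _ in range(n)]
    pool = list(probes)
    grid[n // 2][n // 2] = pool.pop(0)          # arbitrary seed probe

    def filled_neighbors(r, c):
        cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        return [grid[rr][cc] for rr, cc in cells
                if 0 <= rr < n and 0 <= cc < n and grid[rr][cc] is not None]

    while pool:
        best = None
        for r in range(n):
            for c in range(n):
                if grid[r][c] is not None:
                    continue
                nbrs = filled_neighbors(r, c)
                if not nbrs:                    # not a border site yet
                    continue
                added = [sum(hamming(p, q) for q in nbrs) for p in pool]
                i = min(range(len(pool)), key=added.__getitem__)
                key = (-len(nbrs), added[i])    # assumed priority rule
                if best is None or key < best[0]:
                    best = (key, r, c, i)
        _, r, c, i = best
        grid[r][c] = pool.pop(i)
    return grid


placed = epitaxial(["AA", "AC", "CC", "CA", "GG", "GA", "AG", "TT", "TA"], 3)
```

Maintaining the best candidate per border site incrementally, as the slide describes, avoids the repeated scans that make this sketch quadratic or worse.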
Tile- and Row-Epitaxial • Tile-epitaxial • Divide the array into 100x100 tiles • Run Epitaxial within each tile • Take into account the border of already placed tiles • Row-epitaxial • Place probes by a fast method, e.g., sort + 1-thread • Re-place probes row by row, sequentially filling sites within a row • Assign to each site a probe with the minimum number of conflicts among the unplaced probes from the following K rows
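The row-epitaxial pass can be sketched as an in-place reordering of an initial placement. Two assumptions are made here: conflicts are measured against the already fixed left and upper neighbors, and the candidate pool contains the rest of the current row plus the next K rows. Function and parameter names are hypothetical.

```python
def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def row_epitaxial(grid, k):
    """Refill sites in row-major order, looking ahead at most k rows."""
    g = [row[:] for row in grid]
    n_rows, n_cols = len(g), len(g[0])
    for r in range(n_rows):
        for c in range(n_cols):
            def cost(site):
                p = g[site[0]][site[1]]
                total = 0
                if c > 0:
                    total += hamming(p, g[r][c - 1])    # fixed left neighbor
                if r > 0:
                    total += hamming(p, g[r - 1][c])    # fixed upper neighbor
                return total

            # Unplaced probes: rest of this row, then the following k rows.
            candidates = [(r, cc) for cc in range(c, n_cols)]
            candidates += [(rr, cc)
                           for rr in range(r + 1, min(n_rows, r + k + 1))
                           for cc in range(n_cols)]
            br, bc = min(candidates, key=cost)
            g[r][c], g[br][bc] = g[br][bc], g[r][c]     # swap the winner in
    return g


refilled = row_epitaxial([["AA", "GG", "AA", "GG"]], 0)
```

Limiting the lookahead to K rows bounds the work per site, which is what makes this variant scale to large arrays.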
Problem Formulation (Asynchronous Case) • Asynchronous synthesis: • Periodic nucleotide deposition sequence, e.g., (ACTG)^p • Every probe grows asynchronously • Border length = Hamming distance between embedded probes • Asynchronous Array Design (3-D Placement) Problem: • Minimize placement cost of embedded-probe Hamming graph H (vertices = probes, distance = Hamming distance between embedded probes) • On 2-dimensional grid graph G2 (N x N array, edges between neighbors) [Figure: probes of H mapped to sites of G2]
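To make "Hamming distance between embedded probes" concrete, the sketch below writes an embedding as a string over the deposition sequence, with "-" marking steps at which the site is masked. The representation and both function names are assumptions for illustration.

```python
def embed(steps, deposition):
    """Embedded probe: the deposited letter at chosen steps, '-' elsewhere."""
    chosen = set(steps)
    return "".join(ch if t in chosen else "-" for t, ch in enumerate(deposition))


def conflicts(e1, e2):
    """Steps at which exactly one of the two neighboring sites is exposed."""
    return sum(a != b for a, b in zip(e1, e2))


deposition = "ACTG" * 3
sync = embed([1, 6, 11], deposition)    # synchronous embedding of CTG
left = embed([1, 2, 3], deposition)     # leftmost embedding of CTG
border = conflicts(sync, left)
```

Both strings embed the same probe CTG, yet they conflict at every step where only one of them is exposed. This is why the choice of embedding, and not only the placement, affects border length.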
Lower Bound • Sum of distances to 4 closest neighbors minus weight of 4N heaviest arcs • Distance between two probes of length p = 2(p - |LCS|), where LCS is their longest common subsequence • Non-tight bound: example with LB = 8 and best placement cost = 10 [Figure: probes AC, GA, CT, TG with nucleotide deposition sequence S = ACTGA, and an optimum placement]
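The pairwise distance used in this bound can be sketched with a standard longest-common-subsequence DP. Two embedded probes of length p can share at most |LCS| deposition steps, and every unshared nucleotide of either probe contributes one conflict, so at least 2(p - |LCS|) conflicts remain. Function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_distance(a, b):
    """Lower bound on conflicts between any embeddings of equal-length probes."""
    return 2 * (len(a) - lcs_length(a, b))


pairs = {(a, b): lcs_distance(a, b)
         for a in ("AC", "GA", "CT", "TG")
         for b in ("AC", "GA", "CT", "TG") if a < b}
```

For the four probes of the example, the pairs AC-GA, AC-CT, GA-TG and CT-TG are at distance 2 and the other two pairs at distance 4; four adjacent pairs at distance 2 would give a total of 8.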
Optimal Probe Alignment • Find the best alignment of a probe w.r.t. its embedded neighbors • Dynamic programming: • Source-sink paths correspond to feasible embeddings • Runtime O((probe length) x (deposition sequence length)) • Can be extended to simultaneous alignment of two adjacent probes (2x1), with runtime increase by O(probe length) [Figure: DP graph from source to sink over the probe and the deposition sequence]
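A minimal sketch of such a DP is given below, assuming neighbors are supplied as embedded strings with "-" for masked steps and that every conflict counts equally. State (t, i) means "t deposition steps processed, i probe letters placed"; each step is either skipped or, if the letter matches, used. The function name and representation are assumptions, and the sketch presumes the probe can be embedded in the deposition sequence.

```python
def best_embedding(probe, deposition, neighbors):
    """Return (cost, embedding) minimizing conflicts with fixed neighbors."""
    T, p, k = len(deposition), len(probe), len(neighbors)
    busy = [sum(nb[t] != "-" for nb in neighbors) for t in range(T)]
    INF = float("inf")
    cost = [[INF] * (p + 1) for _ in range(T + 1)]
    took = [[False] * (p + 1) for _ in range(T + 1)]
    cost[0][0] = 0
    for t in range(T):
        for i in range(p + 1):
            if cost[t][i] == INF:
                continue
            # Skip step t: conflict with every neighbor exposed at t.
            skip = cost[t][i] + busy[t]
            if skip < cost[t + 1][i]:
                cost[t + 1][i], took[t + 1][i] = skip, False
            # Deposit at step t: conflict with every neighbor masked at t.
            if i < p and deposition[t] == probe[i]:
                dep = cost[t][i] + k - busy[t]
                if dep < cost[t + 1][i + 1]:
                    cost[t + 1][i + 1], took[t + 1][i + 1] = dep, True
    # Walk back from the sink (T, p) to recover the chosen steps.
    out, i = ["-"] * T, p
    for t in range(T, 0, -1):
        if took[t][i]:
            i -= 1
            out[t - 1] = deposition[t - 1]
    return cost[T][p], "".join(out)


deposition = "ACTG" * 3
one = best_embedding("CTG", deposition, ["-CTG--------"])
two = best_embedding("CTG", deposition, ["-CTG--------", "-C----T----G"])
```

The table has (T + 1) x (p + 1) entries with constant work each once the per-step neighbor counts are known, which matches the stated runtime.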
3-D Placement Flows • Simultaneous placement and alignment • Asynchronous epitaxial (slow and low quality) • Synchronous placement followed by in-place probe alignment (analogous to the standard partitioning of VLSI flows) • Using the previous DP to do in-place probe alignment • Synchronous placement followed by probe alignment with reshuffle (analogous to feedback loops in VLSI flows) • Asynchronous sliding window matching
Algorithms for In-Place Probe Alignment • Asynchronous re-embedding after 2-dim placement • Greedy Algorithm • While there exist probes to re-embed with gain • Optimally re-embed the probe with the largest gain • Batched greedy: speed-up by avoiding recalculations • Chessboard Algorithm • While there is gain • Re-embed probes in green sites • Re-embed probes in red sites
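The chessboard scheme can be sketched as follows. The "green" and "red" sites are taken to be the two colors of a chessboard coloring, so sites of one color are never adjacent and can each be re-embedded optimally against fixed neighbors. The sketch reuses the same kind of alignment DP as an inner routine; the embedded-string representation ("-" for masked steps) and all names are assumptions.

```python
def conflicts(e1, e2):
    return sum(a != b for a, b in zip(e1, e2))


def best_embedding(probe, deposition, neighbors):
    """Alignment DP: min-conflict embedding against fixed neighbor embeddings."""
    T, p, k = len(deposition), len(probe), len(neighbors)
    busy = [sum(nb[t] != "-" for nb in neighbors) for t in range(T)]
    INF = float("inf")
    cost = [[INF] * (p + 1) for _ in range(T + 1)]
    took = [[False] * (p + 1) for _ in range(T + 1)]
    cost[0][0] = 0
    for t in range(T):
        for i in range(p + 1):
            if cost[t][i] == INF:
                continue
            if cost[t][i] + busy[t] < cost[t + 1][i]:
                cost[t + 1][i], took[t + 1][i] = cost[t][i] + busy[t], False
            if i < p and deposition[t] == probe[i]:
                dep = cost[t][i] + k - busy[t]
                if dep < cost[t + 1][i + 1]:
                    cost[t + 1][i + 1], took[t + 1][i + 1] = dep, True
    out, i = ["-"] * T, p
    for t in range(T, 0, -1):
        if took[t][i]:
            i -= 1
            out[t - 1] = deposition[t - 1]
    return cost[T][p], "".join(out)


def chessboard(emb, deposition):
    """Alternate between the two colors until no re-embedding gains anything."""
    emb = [row[:] for row in emb]
    n_rows, n_cols = len(emb), len(emb[0])
    gain = True
    while gain:
        gain = False
        for color in (0, 1):
            for r in range(n_rows):
                for c in range(color ^ (r % 2), n_cols, 2):
                    cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    nbrs = [emb[rr][cc] for rr, cc in cells
                            if 0 <= rr < n_rows and 0 <= cc < n_cols]
                    old = sum(conflicts(emb[r][c], nb) for nb in nbrs)
                    probe = emb[r][c].replace("-", "")
                    new, e = best_embedding(probe, deposition, nbrs)
                    if new < old:               # strict gain ensures termination
                        emb[r][c], gain = e, True
    return emb


result = chessboard([["-C----T----G", "-CTG--------"]], "ACTG" * 3)
```

Every accepted change strictly lowers the integer total cost, so the loop stops after finitely many rounds. The greedy variant differs only in the order in which probes are picked for re-embedding.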
Comparison of In-Place Probe Alignments • Post-placement LB = sum of distances to adjacent probes • Distance between two probes of length p = 2(p - |LCS|), where LCS is their longest common subsequence • Useful for assessing the quality of algorithms that change probe embeddings but do not change probe placement
Practical Extensions • Distance-dependent border conflict weights • Take into account conflicts between 2- and 3-hop neighbors rather than only immediate neighbors • Position-dependent border conflict weights • In the alignment DP for two sequences, take into account the importance of conflicts in the middle of probes: the alignment cost has weights on conflicts that depend on the conflict position • Polymorphic probes • The chip contains SNPs, e.g., pairs of probes differing in a single position; they should be placed together, and the alignment DP should align them simultaneously
Alignment DP for 2-SNPs • Optimal embedding of A{C,T}T
Summary • Contributions: • Epitaxial placement reduces border length by an extra 10% over the previously best known method • Asynchronous placement problem formulation • Post-placement improvement by an extra 15.5-21.8% • Lower bounds • Scalable placements (1000x1000 in 20 min) • Ongoing work: • Comparison on industrial benchmarks • Experiments with algorithms for extended formulations (SNPs, distance-dependent weights, etc.)