Real-Time Primer Design for DNA Chips
Annie Hui
CMSC 838 Presentation
Use of primers in PCR and Microarrays
• PCR (polymerase chain reaction):
  • Amplifies a particular DNA fragment
  • Use: to test for the presence of nucleotide sequences
• Test of PCR products:
  • Ladder: a mixture of fragments of known length
  • Lane 1: the PCR fragment is ~1850 bases long.
  • Lanes 2 and 4: the fragments are ~800 bases long.
  • Lane 3: no product is formed, so the PCR failed.
  • Lane 5: multiple bands are formed because one of the primers binds at several different places.
CMSC 838T – Presentation
Use of primers in PCR and Microarrays
[Figure labels: fluorescence; bound to primer; fixed on chip]
• DNA chips (microarrays):
  • Analyse a large number of genes in parallel
• Primers:
  • 20 to 100 bases long
  • Synthetically manufactured
• Automated design of primers:
  • A computational approach
  • Objective: to find primers that bind well without self-hybridizing
  • Critique: how accurate is it?
Motivation
• This group uses the automated NucliSens extraction system (bioMerieux) to develop their primers.
Technique: The computational model
• Select primers from the target sequence
  • Two primers, P (forward) and Q (reverse), for PCR; one primer for a DNA chip (microarray)
• Using window size W, the number of possible primers with length between m and n within one window is:
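The counting formula on this slide did not survive extraction. As a rough sketch only, assuming a candidate primer is any contiguous substring of the window whose length lies between m and n inclusive, the count can be computed and cross-checked by enumeration. The function names and this definition of "candidate" are assumptions for illustration, not taken from the paper.

```python
def count_candidates(window: str, m: int, n: int) -> int:
    """Count substrings of `window` with length in [m, n].

    Assumed definition: a window of size W holds W - L + 1 substrings
    of length L, summed over every allowed length L.
    """
    w = len(window)
    return sum(w - length + 1 for length in range(m, min(n, w) + 1))


def enumerate_candidates(window: str, m: int, n: int) -> list[str]:
    """List the same candidates explicitly, as a cross-check on the count."""
    w = len(window)
    return [
        window[start:start + length]
        for length in range(m, min(n, w) + 1)
        for start in range(w - length + 1)
    ]


if __name__ == "__main__":
    window = "AGCGATATA"  # example window from a later slide
    print(count_candidates(window, 6, 7))
    print(enumerate_candidates(window, 6, 7))
```

Under this assumption the count grows roughly as W times (n - m + 1), which is why the number of candidates per window stays small for short windows.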
Technique: The computational model
(We are starting here.)
• For each primer pair, or single primer, quantify 4 hybridization conditions:
  • Primer length
  • Melting temperature
  • GC content
  • Secondary structure:
    • Self annealing
    • Self end annealing
    • Pair annealing
    • Pair end annealing
Technique: quantifying hybridization conditions
• Primer length, len(P)
  • Affects melting temperature and hybridization
• Melting temperature, Tm(P)
  • Temperature at which the bonds between primer and gene sequence break
• GC content, CG(P)
  • G-C pairs are more stable than A-T pairs (because they form more H-bonds)
  • What is this measure good for?
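A minimal sketch of the three scalar measures follows. The slides do not give the melting-temperature formula, so the Wallace rule of thumb (2 °C per A/T base, 4 °C per G/C base) is swapped in here as a stand-in. A later slide's update rule in terms of H and S suggests the paper uses a thermodynamic model instead. All function names are hypothetical.

```python
def gc_count(primer: str) -> int:
    """Number of G or C bases in the primer."""
    return sum(1 for base in primer.upper() if base in "GC")


def gc_fraction(primer: str) -> float:
    """GC content as a fraction of primer length."""
    return gc_count(primer) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C via the Wallace rule.

    Assumption: this is a stand-in for the paper's own Tm model.
    """
    gc = gc_count(primer)
    at = len(primer) - gc
    return 2 * at + 4 * gc


if __name__ == "__main__":
    p = "GCGATA"
    print(len(p), gc_count(p), gc_fraction(p), wallace_tm(p))
```

The sketch also hints at an answer to the question on the slide: GC content feeds directly into the melting temperature, since each G-C pair contributes more stability than an A-T pair.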
Technique: quantifying hybridization conditions
• Secondary structure
  • Studies how likely a primer is to entangle with itself or with another primer
  • P = {p1, p2, …, pn}, Q = {q1, q2, …, qm}
• Scoring function:
  • S(pi, qj) = 2 if {pi, qj} = {A, T}
  • S(pi, qj) = 4 if {pi, qj} = {C, G}
  • S(pi, qj) = 0 otherwise
• Example:
  P: ...AGCTTTAGCCATAG
  Q: TCTTAGGATCGC...
  score S(pi, q1) = 2+4+2+2+4 = 14 (position i of primer P)
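The scoring function above translates directly into code. The sketch below implements S and sums it over two strings aligned position by position. The helper name `aligned_score` and the position-by-position alignment are assumptions for illustration, since the slide's alignment figure was lost.

```python
def pair_score(a: str, b: str) -> int:
    """S(a, b): 2 for an A-T pair, 4 for a C-G pair, 0 otherwise."""
    pair = {a.upper(), b.upper()}
    if pair == {"A", "T"}:
        return 2
    if pair == {"C", "G"}:
        return 4
    return 0


def aligned_score(p: str, q: str) -> int:
    """Sum S over positions where p and q overlap, aligned from index 0."""
    return sum(pair_score(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    print(pair_score("A", "T"), pair_score("G", "C"), pair_score("A", "G"))
    print(aligned_score("AGCT", "TCGA"))
```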
Technique: quantifying hybridization conditions
[Figure labels: P, P’]
• Four measures of secondary structure:
  • Self annealing, SA(P, P’)
    • P’ = reverse of P
  • Self end annealing, SEA(P, P’)
    • Like self annealing
    • k >= 0
    • Only counts the longest continuous overlaps
  • Pair annealing, PA(P, Q)
    • P and Q are the forward and reverse primers
  • Pair end annealing, PEA(P, Q)
    • Similar to self end annealing
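The exact definitions of the four measures were in a figure that did not survive. The following is one plausible reading, offered as a sketch rather than the paper's definition: annealing is the best summed score over all relative shifts of the two strings, and end annealing counts only the continuous run of bonded pairs that starts at the last base of the first primer. Both readings, and all function names, are assumptions.

```python
def pair_score(a: str, b: str) -> int:
    """S(a, b): 2 for an A-T pair, 4 for a C-G pair, 0 otherwise."""
    pair = {a.upper(), b.upper()}
    if pair == {"A", "T"}:
        return 2
    if pair == {"C", "G"}:
        return 4
    return 0


def annealing(p: str, q: str) -> int:
    """Assumed PA: best total score over every relative shift of q against p."""
    n, m = len(p), len(q)
    best = 0
    for shift in range(-(m - 1), n):
        total = sum(
            pair_score(p[shift + j], q[j])
            for j in range(m)
            if 0 <= shift + j < n
        )
        best = max(best, total)
    return best


def end_annealing(p: str, q: str) -> int:
    """Assumed PEA: best continuous bonded run starting at the last base of p."""
    n, m = len(p), len(q)
    best = 0
    for shift in range(n - m, n):  # shifts where p[-1] faces some base of q
        i, j = n - 1, n - 1 - shift
        run = 0
        while i >= 0 and j >= 0 and pair_score(p[i], q[j]) > 0:
            run += pair_score(p[i], q[j])
            i -= 1
            j -= 1
        best = max(best, run)
    return best


def self_annealing(p: str) -> int:
    """SA(P, P'): P against its own reverse, as stated on the slide."""
    return annealing(p, p[::-1])


def self_end_annealing(p: str) -> int:
    """SEA(P, P'): end annealing of P against its own reverse."""
    return end_annealing(p, p[::-1])
```

With these readings, `annealing("AGA", "TAT")` scores both A-T pairs, while `end_annealing("AGA", "TAT")` stops at the G-A mismatch and scores only one of them, which is the distinction the "longest continuous overlaps" bullet appears to describe.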
Technique: How to apply the model
• For PCR:
  • P is the forward primer, Q is the reverse primer
  • Ideally there is no annealing, and the length, GC content and melting temperature of P equal those of Q
  • The optimization is:
• For DNA chips (microarrays):
  • Q doesn’t exist, so there is no pair annealing to study. Only 5 terms are left.
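The objective itself is missing from this slide, but later slides describe the model as a weighted linear sum. As a hedged sketch of the five-term DNA chip case (length, GC content, melting temperature, SA, SEA), the code below scores a single primer. Penalising deviation from target values, along with every name, weight and target, is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PrimerFeatures:
    length: int   # len(P)
    gc: float     # GC content as a fraction
    tm: float     # melting temperature
    sa: float     # self annealing score
    sea: float    # self end annealing score


@dataclass
class Targets:
    length: int
    gc: float
    tm: float


def chip_score(f: PrimerFeatures, t: Targets, w: dict[str, float]) -> float:
    """Weighted linear sum of five terms; lower is assumed to be better."""
    return (
        w["length"] * abs(f.length - t.length)
        + w["gc"] * abs(f.gc - t.gc)
        + w["tm"] * abs(f.tm - t.tm)
        + w["sa"] * f.sa
        + w["sea"] * f.sea
    )


if __name__ == "__main__":
    weights = {"length": 1.0, "gc": 1.0, "tm": 1.0, "sa": 1.0, "sea": 1.0}
    targets = Targets(length=20, gc=0.5, tm=60.0)
    candidate = PrimerFeatures(length=22, gc=0.5, tm=60.0, sa=8, sea=4)
    print(chip_score(candidate, targets, weights))
```

For PCR the same shape would presumably gain pair annealing and pair end annealing terms and compare P against Q rather than against fixed targets.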
Technique: parallelize SCPCR(p,q) calculation
• Compute PA and PEA in parallel
• Calculate Len, GC, Temp, SA and SEA in parallel
Technique: details
[Figure: annealing matrix pairing bases a, b, c with d, e, f (cells ad … cf)]
• Melting temperature and GC content:
  • Simple adder + divider
  • Use pipelining
    • First primer: O(m)
    • Each subsequent primer: O(1)
• Annealing matrix
• Example:
  • Whole window: AGCGATATA
  • i-th P primer: GCGATA
  • (i+1)-th P primer: CGATAT
  • CG(Pi+1) = CG(Pi) - 1
  • H(Pi+1) = H(Pi) - H(GC) + H(AT); similar for S
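The O(m)-then-O(1) pattern can be illustrated in software with a sliding-window update of the GC count: the first primer is counted in full, and each later primer only adjusts for the base that leaves and the base that enters. This is a sequential sketch of the idea, not the hardware pipeline, and the function name is hypothetical.

```python
def sliding_gc_counts(window: str, m: int) -> list[int]:
    """GC count of every length-m primer in `window`, left to right."""
    window = window.upper()
    if m > len(window):
        return []
    # First primer: full O(m) count.
    count = sum(1 for base in window[:m] if base in "GC")
    counts = [count]
    # Each subsequent primer: O(1) update.
    for start in range(1, len(window) - m + 1):
        leaving = window[start - 1]
        entering = window[start + m - 1]
        count += int(entering in "GC") - int(leaving in "GC")
        counts.append(count)
    return counts


if __name__ == "__main__":
    # Primers: AGCGAT, GCGATA, CGATAT, GATATA
    print(sliding_gc_counts("AGCGATATA", 6))
```

In the slide's example, moving from GCGATA to CGATAT drops a G and gains a T, so the count falls by one, matching CG(Pi+1) = CG(Pi) - 1. The H and S updates follow the same pattern, with the leaving and entering units being adjacent base pairs (GC leaves, AT enters).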
Complexity
• Complexity of the sequential algorithm:
  • For PCR:
    • Number of choices of P (window size = Wp):
    • Number of choices of Q (window size = Wq):
    • Each distance SCPCR(P, Q):
    • Total:
• Complexity of the parallel algorithm:
  • For PCR:
    • Distance measure SCPCR(P, Q) = O(1)
    • Total: O(S*T)
  • Similar but simpler for microarrays
• Note: O(S*S*T*T) is a typo in the paper
Evaluation
• Experimental environment
  • 512 primer pairs, |Wp| = |Wq| = 16
  • 500 MHz Celeron system with integrated hardware accelerator
  • Software implementation
• Evaluation results
  • 1920 s for the software implementation
  • 3.41 s using the hardware accelerator (a speedup of roughly 560×)
Related Work
• Previous approach
  • DOPRIMER
    • Same computational model
    • Differs in the way it does dynamic programming
    • Sequential in nature
• Other primer selection software
  • E.g. Primer Premier 5, Primer3, PrimerGen, PrimerDesign
  • Similarities:
    • Criteria: length, temperature range, GC range, GC clamp, 3’ end stability, uniqueness of 3’ end base, dimers/hairpins, degeneracy, salt concentration, annealing oligo concentration, etc.
  • Differences:
    • Not a weighted linear sum of all criteria
    • Require substantial expert supervision
    • The numerical criteria are used as a guide only
More Related Work
• Case study
  • Burpo did a critical review of PCR primer design algorithms
  • Subject: Saccharomyces cerevisiae deletion strains
  • Conclusion:
    • No suitable program exists for the task of post-design PCR analysis
    • This is especially true for accurately predicting non-specific hybridization events that impair PCR amplification
Observations
• My observations:
  • Minus side:
    • Is the computational model too simplistic?
    • Specifically, is a weighted linear sum justified?
  • Plus side:
    • The design of the parallel architecture is neat.
    • Since primers are about 18-22 bases long, current technology can certainly handle it.
• When would you need fast primer selection?
  • Primer walking to connect contigs together quickly
  • Scanning through a large number of sequences for possible primers