Microarray Synthesis through Multiple-Use PCR Primer Design Research Proficiency Examination Rohan Fernandes
Biology Background:PCR • PCR animation (From the Dolan DNA Learning Center, CSHL) • Applications of PCR include • Genetic Fingerprinting. • Medical Diagnostics. • DNA Sequencing.
What are Microarrays? • A grid with different DNA probes in each location. • Allows one to test a given sample for expression of multiple genes. • Can compare gene expression by using different colored fluorescent markers in two samples.
Genomic Data • Sequences are known for more than 800 organisms! • 100 free-living species have been sequenced already. • But we know very little about most of these organisms’ biology. • Exploiting full-genome sequence data requires that investigators have access to inexpensive custom microarrays.
Why Microarrays? • Microarray technology has revolutionized our understanding of gene expression. • Applications include • Cell cycle analysis. • Response of cells to environmental stress. • Impact of gene knockouts.
A Primer Design True Story!! • Project for Futcher and Leatherwood to design PCR primers for microarray synthesis. • Strict criteria for primer length, melting temperature, and self-similarity were specified. • Designed primers for 5827 genes of S. cerevisiae and 5012 genes of S. pombe. • PCR with a sample set of primers designed for 96 genes each of S. pombe and S. cerevisiae was 100% successful.
The 110,000 Dollar Problem • Good primer design can be crucial in synthesizing microarray DNA. • $110,000 out of a total budget of $220,000 for microarray synthesis was spent on PCR primers alone. • We propose an alternative method of PCR primer design to reduce costs.
Efficiency of PCR • Usually, PCR primers are designed to occur uniquely in the genome. • However, the efficiency of PCR falls exponentially as the length of the product increases. • PCR becomes ineffective for product sizes beyond 1200 bases.
Exploiting PCR Efficiency Drop-off • Amplification is significant only if primers hybridize near each other. • We can reuse primers to amplify several genes, provided each primer pair is unique. • We can save thousands of primers through reuse!
Who can benefit? • The total cost of PCR primers may dissuade investigators of less studied organisms from using microarrays. • Our technique can reduce costs enough to make microarrays more attractive to less funded researchers.
What is the potential win? • Let (n,m) be the (number of genes, minimum number of primers required to amplify them). • m primers can result in m(m+1)/2 unique primer pairs. • About √(2n) primers may be sufficient instead of 2n. • Conventional primer design requires 12,000 primers for 6,000 genes, but 110 might suffice. • In practice this lower bound will be unreachable, but there will still be a large win.
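The lower bound on this slide follows from solving m(m+1)/2 ≥ n for the smallest integer m. A minimal sketch of that arithmetic (the function name `min_primers` is made up for illustration; it assumes, as the slide does, that a primer may also pair with itself):

```python
import math

def min_primers(n_genes):
    """Smallest m such that m*(m+1)/2 >= n_genes (one distinct primer pair per gene)."""
    # Integer solution of m^2 + m - 2n >= 0, computed without floating point.
    m = (math.isqrt(8 * n_genes + 1) - 1) // 2
    if m * (m + 1) // 2 < n_genes:
        m += 1
    return m

print(min_primers(6000))  # 110, versus 12,000 primers with two unique primers per gene
```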
Potential Win? (Example) • Consider the cost of building a spotted microarray for a 20,000 gene organism. • Conventional techniques will require us to use 40,000 primers. • Cost : $160,000 at $4 a primer. • If 3,000 primers suffice, cost is only $12,000. • The best case is overoptimistic, but realistic wins are still impressive.
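The cost comparison on this slide is simple arithmetic. A small sketch using only the figures quoted above ($4 per primer, 20,000 genes, 3,000 reused primers); the function name `primer_cost` is hypothetical:

```python
def primer_cost(num_primers, price_per_primer=4):
    """Total primer cost in dollars at a flat per-primer price."""
    return num_primers * price_per_primer

conventional = primer_cost(2 * 20_000)  # two unique primers per gene
with_reuse = primer_cost(3_000)         # the slide's "if 3,000 primers suffice" case
savings = conventional - with_reuse

print(conventional, with_reuse, savings)  # 160000 12000 148000
```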
Cost of Split Addressing • What is the probability that two random strings will occur in a long random string in a certain order and with no more than a certain gap?
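One way to get a feel for this question is a small Monte Carlo experiment. The sketch below is an illustration under simplifying assumptions, not the analysis behind the slides: it uses uniformly random DNA strings, ignores reverse-complement strands, and the names `occurs_with_gap` and `estimate_probability` are invented here.

```python
import random

def occurs_with_gap(text, p, q, max_gap):
    """True if p occurs in text and q starts at most max_gap characters after p ends."""
    start = text.find(p)
    while start != -1:
        end = start + len(p)
        # Every allowed placement of q lies inside this window.
        if q in text[end:end + max_gap + len(q)]:
            return True
        start = text.find(p, start + 1)
    return False

def estimate_probability(text_len, p_len, q_len, max_gap, trials, seed=0):
    """Fraction of random trials in which the ordered, gapped pair occurs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        text = "".join(rng.choices("ACGT", k=text_len))
        p = "".join(rng.choices("ACGT", k=p_len))
        q = "".join(rng.choices("ACGT", k=q_len))
        hits += occurs_with_gap(text, p, q, max_gap)
    return hits / trials
```

Calling, for example, `estimate_probability(5000, 4, 4, 100, trials=200)` gives an empirical estimate for toy parameters; genome-scale lengths would need an analytic treatment rather than simulation.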
Split Addressing – Conclusion • Total length of primers required to ensure uniqueness of hybridization increases only very slowly with the length of the genome. • The penalty for genome-scale lengths and realistic PCR gap lengths amounts to only an additional 3-4 bases of primer over ungapped matching. • These results support the potential of multiple-use primers.
Hardness of problems • The Minimum Primer Set problem is NP-hard and hard to approximate to within a logarithmic factor. • The Budgeted Primer Set problem is NP-hard and seems to be related to the densest k-subgraph problem. • Approximation bounds for the densest k-subgraph problem are not encouraging.
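Reduction from Set Cover to Minimum Primer Set • (S, X) is a set cover instance, where S is a family of sets over the elements X. • Build a bipartite graph on vertex sets U and W: each set in S maps to a vertex in U and each element of X maps to a vertex in W. Connect a vertex in U to a vertex in W iff the corresponding set in S contains the corresponding element of X. • Label (color) each edge by the name of the element vertex at its end. • An MPS solution will include all element vertices and a minimum number of set vertices that cover all elements. Q.E.D.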
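The construction in this reduction can be sketched in a few lines. The representation below (a dict of named sets, edges as `(set_vertex, element_vertex, color)` tuples) and the function name are assumptions made for illustration:

```python
def set_cover_to_colored_graph(sets):
    """Build the colored bipartite graph of the reduction.

    sets: dict mapping a set name to an iterable of its elements.
    Returns a list of (set_vertex, element_vertex, color) edges, where the
    color of each edge is the element at its end.
    """
    edges = []
    for name, members in sets.items():
        for element in sorted(members):
            edges.append((("set", name), ("elem", element), element))
    return edges

example = {"A": {1, 2}, "B": {2, 3}}
graph = set_cover_to_colored_graph(example)
# Every color (element) must keep at least one edge, so every element vertex
# stays and the chosen set vertices must together cover all elements.
```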
A Heuristic to approximate MPS • Based on the greedy heuristic for finding a densest subgraph. • Each edge is weighted with the value of (1/number of edges bearing that color). • Each vertex weight is set to the sum of the weights of its incident edges. • The algorithm proceeds by removing the minimum-weight vertex whose removal does not eliminate any color. • The algorithm terminates when no more vertices can be eliminated.
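The greedy steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes weights are recomputed after every removal, breaks ties by vertex name, and the function name `greedy_primer_set` is invented here.

```python
from collections import Counter, defaultdict

def greedy_primer_set(edges):
    """edges: iterable of (u, v, color). Returns the set of vertices kept."""
    edges = list(edges)
    while True:
        color_count = Counter(c for _, _, c in edges)
        weight = defaultdict(float)   # vertex -> sum of incident edge weights
        lost = defaultdict(Counter)   # vertex -> color -> edges lost if vertex is removed
        for u, v, c in edges:
            for x in {u, v}:
                weight[x] += 1.0 / color_count[c]
                lost[x][c] += 1
        # A vertex is removable only if every color keeps at least one edge.
        removable = [x for x in weight
                     if all(n < color_count[c] for c, n in lost[x].items())]
        if not removable:
            return set(weight)
        victim = min(removable, key=lambda x: (weight[x], str(x)))
        edges = [e for e in edges if victim not in (e[0], e[1])]

demo_edges = [("p1", "p2", "g1"), ("a1", "a2", "g1"),
              ("p2", "p3", "g2"), ("b1", "b2", "g2")]
print(sorted(greedy_primer_set(demo_edges)))  # ['p1', 'p2', 'p3']
```

With a different tie-breaking rule the same input can end with four primers instead of three, which is one reason the result is only a heuristic.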
Example Run of Algorithm (1) • Initially graph with vertex weights.
Example Run of Algorithm (2) • After removing the minimum-weight vertex.
Example Run of Algorithm (3) • Final graph.
Performance of Heuristic • O(|V|·(|V|+|E|+|C|)) time and O(|V|+|E|+|C|) space. • The running time is quadratic in |V|, which makes this heuristic too slow for large data sets. • For our largest data set this heuristic produced a solution in two days, as opposed to 25 minutes for the next heuristic.
A Linear-time Heuristic • We select an edge of each color that has maximum colored adjacency to form our seed graph. • We switch the edge chosen for a color if that saves us any vertices in the seed graph. • If a switch yields no savings but adds no vertices, we make it with probability p=1/2. • Repeat the above steps until the number of vertices is constant. • Eliminate all colors whose edges are not isolated. • Repeat the above steps for the remaining graph until the number of vertices is constant. Merge the graphs obtained.
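The edge-switching loop at the core of this heuristic can be sketched as below. This is a deliberately reduced illustration: the seed simply takes the first candidate pair of each color instead of the "maximum colored adjacency" edge, the second phase (eliminating colors and merging) is omitted, uniqueness of primer pairs across genes is not enforced, and all names are hypothetical.

```python
import random
from collections import Counter

def _add(use, pair):
    for x in set(pair):
        use[x] += 1

def _remove(use, pair):
    for x in set(pair):
        use[x] -= 1
        if use[x] == 0:
            del use[x]

def local_search(candidates, seed=0):
    """candidates: dict color -> list of (u, v) pairs. Returns dict color -> chosen pair."""
    rng = random.Random(seed)
    chosen = {color: pairs[0] for color, pairs in candidates.items()}  # assumed seed rule
    use = Counter()  # vertex -> number of chosen edges touching it
    for pair in chosen.values():
        _add(use, pair)
    while True:
        before = len(use)
        for color, pairs in candidates.items():
            for pair in pairs:
                old = chosen[color]
                if pair == old:
                    continue
                size_old = len(use)
                _remove(use, old)
                _add(use, pair)
                size_new = len(use)
                # Keep the switch if it saves vertices, or with p = 1/2 on a tie.
                if size_new < size_old or (size_new == size_old and rng.random() < 0.5):
                    chosen[color] = pair
                else:
                    _remove(use, pair)
                    _add(use, old)
        if len(use) == before:  # vertex count is constant: stop
            return chosen

demo = {"g1": [("p1", "p2")], "g2": [("p2", "p3")],
        "g3": [("b1", "b2"), ("p1", "p3")]}
result = local_search(demo)
```

In the demo, gene g3 switches from the isolated pair (b1, b2) to (p1, p3), so three primers serve three genes.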
Preparation of Experimental Data Sets • Candidate primer sets for S. cerevisiae and S. pombe were prepared using Primer3. • Primer length range: 8-12 bases. • PCR product size range: 300-1200 bases. • For each gene, at most 10,000 pairs of primers were selected. • Three melting temperature ranges were selected for each of S. cerevisiae and S. pombe.
Degenerate Data Sets • A degenerate primer is a mix of two or more primers, usually differing in a small number of bases. • Degenerate primers can make the resulting colored graph denser by merging primers. • We created degenerate data sets by merging primers differing in at most one base.
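Merging two primers that differ in at most one base can be illustrated with the standard IUPAC two-base ambiguity codes. The sketch below only shows the pairwise merge; how the data sets were actually built is not described in the slides, and the function name `merge_primers` is made up.

```python
# IUPAC codes for two-base ambiguities.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def merge_primers(p, q):
    """Return a degenerate primer covering p and q, or None if they differ in more than one base.

    Assumes both primers are uppercase strings over A, C, G, T.
    """
    if len(p) != len(q):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(p, q)) if a != b]
    if len(diffs) > 1:
        return None
    if not diffs:
        return p  # identical primers need no degeneracy
    i = diffs[0]
    return p[:i] + IUPAC[frozenset((p[i], q[i]))] + p[i + 1:]

print(merge_primers("ACGT", "ACGA"))  # ACGW
```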
Future Work • Using longer primers would enable more efficient PCR. • Increasing order of degeneracy would give a more dense colored graph and potentially greater savings. • Combining the above two ideas is the focus of our current work. • Consider the use of existing software architecture to solve other primer design problems.
Acknowledgements • Thanks to Steven Skiena, Bruce Futcher and Janet Leatherwood. • Sponsored by NSF Grant CCR-9988112.