This project explores the design of primers for cancer genomics research, focusing on optimizing PCR amplification to detect genetic abnormalities such as deletions and rearrangements. The study delves into strategies for detecting rare variants and developing algorithms to optimize primer interactions and coverage while considering dimerization constraints. Simulated annealing and ILP formulations are discussed, aiming to improve cost functions and achieve efficient primer selection. The research suggests potential improvements and discusses the complexity of integer variables in primer design optimization.
Project: Primer design for cancer genomics Stefano/Hossein
Cancer genomics • In cancers, large genetic changes can occur, including deletions, inversions, and rearrangements of genomes • In the early stages, only a few cells will show this deletion
Polymerase Chain Reaction • PCR is a technique for amplifying and detecting a specific portion of the genome • Amplification takes place only if the primers are an appropriate distance apart (<2 kb)
Assaying for Rare Variants • PCR can be used to assay for a given genomic abnormality, even in a heterogeneous population of tumor and normal cells • [Figure: genomic DNA is extracted and run through a detection PCR; in cells without the abnormality the primers are too far apart for amplification, while in a tumor cell they amplify]
Primer Approximation Multiplex PCR (PAMP)* • Multiple primers are optimally spaced, flanking a breakpoint of interest • Forward primers lie upstream of the breakpoint • Reverse primers lie downstream of the breakpoint • The primers are run in a single multiplex PCR reaction • Any forward/reverse pair can form a viable product • [Figure: deletions in Patient B and Patient C]
Goal • Input: a collection of primer locations and matrices of primer interactions (Forward/Forward, Forward/Reverse, Reverse/Reverse) • Output: a subset of primers that do not interact with each other, are unique, and maximize the covered region
Algorithms for Optimizing the Cost • Preprocessing • Determine the pairs of primers that dimerize (edges in the graph) • Filter the primers to ensure “uniqueness” • Simulated annealing • 1. Start from an initial candidate set P, generated randomly or greedily. • 2. List the neighboring sets P' and compute the cost of each. • 3. Select a step s with a probability that depends on its change in cost and on the temperature T. • 4. Decrease the temperature T and go to step 2.
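The annealing loop on this slide can be sketched as follows. This is only an illustration under stated assumptions: the selection weight exp(-Δ/T) is the usual Boltzmann weighting and is assumed here, since the slide's own expression is not available; the geometric cooling schedule, the toy primer positions, the dimer pairs, `REACH`, and `PENALTY` are all hypothetical.

```python
import math
import random

def anneal(initial, neighbors, cost, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Simulated annealing over primer subsets (frozensets of primer ids).

    Assumption: a neighbor is sampled with weight exp(-delta / T).
    """
    rng = random.Random(seed)
    current, best = initial, initial
    temperature = t0
    for _ in range(steps):
        candidates = neighbors(current)
        if not candidates:
            break
        deltas = [cost(c) - cost(current) for c in candidates]
        # Shift by the smallest delta so the exponentials cannot overflow.
        low = min(deltas)
        weights = [math.exp(-(d - low) / temperature) for d in deltas]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        if cost(current) < cost(best):
            best = current
        temperature *= cooling  # assumed geometric cooling schedule
    return best

# Hypothetical toy instance: primer id -> position in bp, plus dimerizing pairs.
POSITIONS = {0: 0, 1: 400, 2: 900, 3: 1500, 4: 2100}
DIMERS = {(1, 2), (3, 4)}
REACH = 600     # assumed number of bp covered downstream of each primer
PENALTY = 1000  # assumed cost for each dimerizing pair left in the set

def toy_cost(selected):
    """Negative covered length plus a penalty per dimerizing pair."""
    covered, reach = 0, float("-inf")
    for p in sorted(POSITIONS[i] for i in selected):
        covered += max(0, p + REACH - max(p, reach))
        reach = max(reach, p + REACH)
    clashes = sum(1 for a, b in DIMERS if a in selected and b in selected)
    return -covered + PENALTY * clashes

def toggle_neighbors(selected):
    """Neighbors obtained by adding or removing a single primer."""
    return [frozenset(selected ^ {i}) for i in POSITIONS]

best = anneal(frozenset(), toggle_neighbors, toy_cost)
print(sorted(best), toy_cost(best))
```

Tracking `best` separately from `current` means the returned set is never worse than the starting set, even though individual steps may move uphill.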
Cost Function • The cost function takes coverage and dimerization into account • Its terms are labeled coverage, density, and dimerization
Simulated Annealing: Define Neighbors • Approach 1: • Dimerization is a hard constraint: P contains no dimerizing pair • E is the edge set corresponding to dimerizing pairs • Neighbors of P are formed by adding a vertex u to P and removing all vertices dimerizing with u; i.e. $P' = (P \cup \{u\}) \setminus \{v : (u, v) \in E\}$ • Approach 2: • No hard constraint on dimerizing pairs. • Neighbors of P are obtained by adding or removing one vertex from P.
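The two neighborhood definitions above translate fairly directly into set operations. The sketch below is illustrative only; the function names and the representation of P as a `frozenset` of primer ids are assumptions, not taken from the slides.

```python
def neighbors_hard(selected, all_primers, edges):
    """Approach 1: add a primer u, then drop every primer that dimerizes with u."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    result = []
    for u in all_primers:
        if u not in selected:
            result.append(frozenset((selected | {u}) - adjacency.get(u, set())))
    return result

def neighbors_soft(selected, all_primers):
    """Approach 2: add or remove exactly one primer, ignoring dimerization."""
    return [frozenset(selected ^ {u}) for u in all_primers]

# Hypothetical example: primers 0..3, where 1 dimerizes with both 0 and 2.
primers = [0, 1, 2, 3]
edges = {(0, 1), (1, 2)}
current = frozenset({0, 2})

print(neighbors_hard(current, primers, edges))
print(neighbors_soft(current, primers))
```

If the starting set has no dimerizing pair, every neighbor produced by Approach 1 is also free of dimerizing pairs, so the constraint never needs to appear in the cost. Approach 2 can pass through infeasible sets and therefore relies on the cost function to penalize them.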
ILP Formulation • One binary variable per primer i indicates whether primer i is selected. • $q_{ij}$: indicator of candidate primer i being immediately after primer j. • Guaranteed optimality, but intractable for realistic problems • Used here to assess the performance of simulated annealing
Bounds and Numerical Results • A weak theoretical upper bound: • Select all primers, ignoring the dimerization constraints. • For any two adjacent primers whose distance is too large to be covered, reduce the covered region by the size of the resulting gap, in bp.
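A bound of this kind could be computed as in the sketch below. The exact expressions on the slide are not available, so the rule used here is an assumption: each primer is taken to cover `d` bp, and two adjacent primers at distance `l > d` are taken to leave `l - d` bp uncovered. The function name and the sample positions are hypothetical.

```python
def coverage_upper_bound(positions, d):
    """Upper bound on the covered length when every primer is selected.

    Assumption: adjacent primers at distance l > d leave l - d bp uncovered.
    """
    points = sorted(positions)
    if len(points) < 2:
        return 0
    span = points[-1] - points[0]
    lost = sum(max(0, b - a - d) for a, b in zip(points, points[1:]))
    return span - lost

# Hypothetical primer positions in bp, with an assumed reach of d = 1000 bp.
print(coverage_upper_bound([0, 500, 3000, 3400], d=1000))
```

Because dimerization is ignored, no feasible primer set can cover more than this value, which is why the bound is weak but cheap to evaluate.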
Potential Improvements • Improve the cost function formulation • Incorporate multiplexing sets • Find an efficient technique to solve the optimization problem • Improve the analytical bound • Consider the effect of dimerization within the forward/reverse primer set
Pairwise cost function • Measures total possible number of sites that are uncovered given all forward and reverse primer combinations
Multiobjective cost function • Takes coverage and multiplexing sets into account • Minimizes both objectives (the number of sets and the missed coverage) while resolving the dimerization constraint, given a possible solution containing multiplexing sets S
Using Fewer Integer Variables • The formulation in the paper uses $n^2$ auxiliary variables, one for each pair of primers: $q_{ij} = 1$ if and only if primers i and j are selected as two consecutive primers in the candidate set. • The complexity of an ILP (or IQP) generally grows exponentially with the number of integer variables. • In practice, the distance between two consecutive primers in the solution is not much larger than d; otherwise there would be a large gap in the covered region. • Assume an upper bound g on the distance between consecutive primers, and introduce a variable $q_{ij}$ only if $l_i - l_j < g$. • The average number of variables is reduced to $n(1 + \rho g)$, where $\rho$ is the density of the primers in the initial set. • The number of integer variables becomes O(n).
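To make the saving concrete, the sketch below counts variables for a small set of positions. It assumes one selection variable per primer plus one pair variable for each ordered pair closer than g; the function name and the sample positions are hypothetical.

```python
from bisect import bisect_left

def count_variables(positions, g):
    """Return (reduced, full) integer-variable counts.

    reduced: n selection variables plus one pair variable for every primer
             that lies after a given primer and closer than g.
    full:    n selection variables plus n * n pair variables.
    """
    points = sorted(positions)
    n = len(points)
    pairs = 0
    for k, p in enumerate(points):
        # Primers located after index k whose position is below p + g.
        pairs += bisect_left(points, p + g) - (k + 1)
    return n + pairs, n + n * n

# Hypothetical primer positions in bp, with an assumed gap limit g = 300 bp.
reduced, full = count_variables([0, 100, 250, 900, 1000], g=300)
print(reduced, full)
```

For a fixed density and gap limit, the pair count grows linearly with the number of primers, which matches the O(n) claim on the slide.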