BIO/CS 471 – Algorithms for Bioinformatics
Physical Mapping of DNA

Landmarks on the genome
• Identify the order and/or location of sequence landmarks on the DNA
  • Restriction Enzyme Digests (e.g. BamHI – GGATCC)
  • Hybridization Mapping
[Figure: clones with hybridizing probes y, x, z, w]

Producing a map of the genome
[Figure: lettered landmarks placed in order along the genome]

Restriction Fragment Mapping
A:      3  8  6  10
B:      4  5  11  7
A + B:  3  1  5  2  6  3  7

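The relationship between the three rows can be checked mechanically. The sketch below assumes the fragment lengths of A and B are listed in their true left-to-right order; the function names `cut_sites` and `double_digest` are made up for this illustration.

```python
from itertools import accumulate

def cut_sites(fragments):
    """Interior cut positions implied by an ordered list of fragment lengths."""
    return set(accumulate(fragments[:-1]))

def double_digest(frags_a, frags_b):
    """Fragment lengths produced when both enzymes cut the same molecule."""
    total = sum(frags_a)
    assert total == sum(frags_b), "both digests must cover the same DNA"
    # The double digest cuts wherever either enzyme cuts.
    cuts = sorted(cut_sites(frags_a) | cut_sites(frags_b))
    bounds = [0] + cuts + [total]
    return [right - left for left, right in zip(bounds, bounds[1:])]

a = [3, 8, 6, 10]
b = [4, 5, 11, 7]
print(double_digest(a, b))  # [3, 1, 5, 2, 6, 3, 7]
```

The hard direction is the reverse one: in practice only the unordered fragment lengths are measured, and the orderings of A and B have to be found.
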
Set Partitioning
• The Set Partition Problem
  • Input: X = {x1, x2, x3, …, xn}
  • Output: Partition of X into Y and Z such that the sum of the elements of Y equals the sum of the elements of Z
• This problem is NP-complete
• Suppose we have X = {3, 9, 6, 5, 1}
• Can we recast the problem as a double digest problem?

Reduction from Set Partition
• X = {3, 9, 6, 5, 1}
Double digest instance (fragments unordered):
A:      1  3  5  6  9
B:      12  12
A + B:  1  3  5  6  9
A solution (fragments ordered):
A:      9  3  5  6  1
B:      12  12
A + B:  9  3  5  6  1

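A minimal sketch of this reduction, under the assumption that digest A receives the elements of X, digest B receives two fragments of half the total, and A + B receives the elements of X again. The helper names are hypothetical, and the solver is a deliberately naive exhaustive search, used only to show that an ordering of A exposes the partition.

```python
from itertools import accumulate, permutations

def to_double_digest(xs):
    """Map a Set Partition instance to a Double Digest instance (A, B, A+B)."""
    total = sum(xs)
    if total % 2:
        return None  # odd total: no equal-sum partition can exist
    return sorted(xs), [total // 2, total // 2], sorted(xs)

def solve_by_brute_force(frags_a, frags_b, frags_ab):
    """Try every ordering of A and B; exponential time, illustration only."""
    for pa in permutations(frags_a):
        for pb in permutations(frags_b):
            cuts = sorted(set(accumulate(pa[:-1])) | set(accumulate(pb[:-1])))
            bounds = [0] + cuts + [sum(pa)]
            pieces = [r - l for l, r in zip(bounds, bounds[1:])]
            if sorted(pieces) == sorted(frags_ab):
                return list(pa), list(pb)
    return None

instance = to_double_digest([3, 9, 6, 5, 1])
order_a, order_b = solve_by_brute_force(*instance)

# The single cut of B must coincide with a cut of A, so some prefix of
# the ordered A fragments sums to half the total: that prefix is Y.
half = sum(order_a) // 2
k = list(accumulate(order_a)).index(half) + 1
print(order_a[:k], order_a[k:])
```
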
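NP-Completeness
• Suppose we could solve the Double Digest Problem in polynomial time…
  Instance of Set Partition → Convert* → Instance of Double Digest → Solve in polynomial time
  *polynomial time conversion
• Then Set Partition could also be solved in polynomial time, so Double Digest must be at least as hard as Set Partition
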
Hybridization Mapping
• The sequence of the clones remains unknown
• The relative order of the probes is identified
• The sequence of the probes is known in advance
[Figure: clones with hybridizing probes y, x, z, w]

Interval graph representation
[Figure: overlapping intervals a, b, c, d, e and the corresponding interval graph on nodes a, b, c, d, e]
• Becomes a graph coloring problem, which is (you guessed it) NP-Complete

Simplifying Assumptions
• Probes are unique
  • Hybridize only once along the target DNA
• There are no errors
• Every probe hybridizes at every possible position on every possible clone

Consecutive Ones Problem (C1P)
Rearrange the columns such that all the ones in every row are together:
[Figure: 0/1 matrix with one row per clone and one column per probe]

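To make the problem statement concrete, here is a brute-force sketch that tries every column order. It is not the algorithm developed in the following slides; it only pins down what a valid answer looks like. Each row is represented by the set of columns that hold a 1, rows are assumed non-empty, and the three example rows are the ones used in the worked example later in the deck.

```python
from itertools import permutations

def ones_are_consecutive(row_set, order):
    """True if the columns in row_set occupy adjacent positions in order."""
    positions = sorted(order.index(c) for c in row_set)
    return positions[-1] - positions[0] + 1 == len(positions)

def find_c1p_order(rows, columns):
    """Return the first column order that works for every row, or None."""
    for order in permutations(columns):
        if all(ones_are_consecutive(r, order) for r in rows):
            return list(order)
    return None

rows = [{2, 7, 8}, {2, 5, 7}, {1, 4, 7, 8}]
columns = [1, 2, 4, 5, 7, 8]
order = find_c1p_order(rows, columns)
print(order)
```

Trying all n! orders is only feasible for a handful of columns, which is why the component-based algorithm matters.
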
An algorithm for C1P
• Separate the rows (clones) into components
• Permute the components
• Merge the permuted components
S1 = {1, 2, 4, 5, 7, 9}
S2 = {2, 3, 4, 5, 6, 7, 8, 9}

Partitioning clones into components
• Component graph Gc
  • Nodes correspond to clones
  • Connect li and lj iff:

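The connection condition is not preserved in this transcript. The sketch below assumes the condition suggested by the later slides: two clones are joined when their column sets share at least one column and neither set contains the other. The clone `l4` is a hypothetical addition that shows a subset row landing in its own component.

```python
def properly_overlap(a, b):
    """Assumed edge condition: the sets intersect and neither contains the other."""
    return bool(a & b) and not a <= b and not b <= a

def components(clones):
    """clones: dict name -> set of columns with a 1. Returns a list of name sets."""
    names = list(clones)
    adj = {n: set() for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if properly_overlap(clones[u], clones[v]):
                adj[u].add(v)
                adj[v].add(u)
    seen, result = set(), []
    for n in names:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:  # depth-first search from n
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        result.append(comp)
    return result

clones = {
    "l1": {2, 7, 8},
    "l2": {2, 5, 7},
    "l3": {1, 4, 7, 8},
    "l4": {7},  # hypothetical: a subset of every other row
}
print(components(clones))
```
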
Component Graph
[Figure: component graph on clones l1 through l8]
Here, connected components are labeled with Greek letters (α, β, γ, δ).

Assembling a component
• Consider only row 1 of the following:
[Figure: component containing l1, l2, l3]
• Placing all of the ones together, we can place columns 2, 7, and 8 in any order

           {2, 7, 8}  {2, 7, 8}  {2, 7, 8}
l1  …  0       1          1          1      0  …

Row 2
• Because of the way we have constructed the component, l2 will have some columns with 1's where l1 has 1's, and some where l1 does not.
• Shall we place the new 1's to the right or left?
• It doesn't matter, because the reverse permutation is the same answer.

Adding row 2
• Placing column 5 to the left partially resolves the {2, 7, 8} columns
S1 = {2, 7, 8}
S2 = {2, 5, 7}

           {5}  {2, 7}  {2, 7}  {8}
l1  …  0    0     1       1      1    0  …
l2  …  0    1     1       1      0    0  …

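This step amounts to splitting the columns into three ordered blocks. A small sketch, assuming row 2 is placed to the left as on the slide; `place_first_two_rows` is a made-up name, and the order of columns inside each block is still undetermined.

```python
def place_first_two_rows(s1, s2):
    """Ordered column blocks after placing row 2 to the left of row 1."""
    blocks = [
        s2 - s1,  # columns only in row 2: pushed to the far left
        s1 & s2,  # shared columns: must sit between the other two blocks
        s1 - s2,  # columns only in row 1: pushed to the far right
    ]
    return [b for b in blocks if b]

print(place_first_two_rows({2, 7, 8}, {2, 5, 7}))  # [{5}, {2, 7}, {8}]
```

Column 8 is now fixed at the right end of row 1, while columns 2 and 7 can still swap with each other.
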
Additional Rows
• Select a new row k from the component such that edges (i, j) and (i, k) exist for two already added rows i and j.
• Look at the relationship between i and k, and between i and j, to determine if k goes on the same side or the opposite side of j as i.

Definitions
• Let
• Place k on the same side as i if
• Else, place on the opposite side

Placing Rows
• Place k on the same side as i if
• Or:
[Figure: rows l1, l2, l3 in the roles of i, j, k]
So we place l3 on the opposite side of l2 (relative to l1), which is the right.

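The slide's own formulas are not preserved here, so the following is a reconstruction from interval geometry, not necessarily the slide's formulation. Assume a reference row properly overlaps both an already placed row and the new row. If both extend past the same end of the reference row, their overlaps with it are nested, so they share at least as many columns as the smaller overlap. If they extend past opposite ends, they share strictly fewer. The names `reference`, `placed`, and `new` are hypothetical.

```python
def same_side(reference, placed, new):
    """True if `new` extends past `reference` on the same side as `placed`.

    Assumes `reference` properly overlaps both `placed` and `new`
    (shared columns exist, neither set contains the other).
    """
    def dot(a, b):
        return len(a & b)  # number of shared columns

    return dot(placed, new) >= min(dot(reference, placed), dot(reference, new))

l1, l2, l3 = {2, 7, 8}, {2, 5, 7}, {1, 4, 7, 8}
print(same_side(l1, l2, l3))  # False: l3 goes opposite l2, i.e. to the right
```

With l1 as the reference, l2 and l3 share only column 7 while each shares two columns with l1, which agrees with the slide's conclusion that l3 goes to the right.
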
Placing rows
• Repeat for every row in the component

           {5}  {2}  {7}  {8}  {1,4}  {1,4}
l1  …  0    0    1    1    1     0      0     0  …
l2  …  0    1    1    1    0     0      0     0  …
l3  …  0    0    0    1    1     1      1     0  …

Joining Components
• New graph: Gm – the merge graph (directed)
  • Nodes are connected components of Gc
  • Edge (α, β) iff every set in β is a subset of a set in α
[Figure: example components α and β]

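A sketch of the edge rule, assuming an edge runs from the containing component to the contained one (every set of the target is a subset of some set of the source). The component named alpha uses the three rows from the worked example; beta and gamma are hypothetical components added for illustration.

```python
def merge_graph_edges(components):
    """components: dict name -> list of column sets. Returns {(source, target)}."""
    edges = set()
    for a, sets_a in components.items():
        for b, sets_b in components.items():
            if a == b:
                continue
            # Assumed rule: every set in b fits inside some set in a.
            if all(any(s <= t for t in sets_a) for s in sets_b):
                edges.add((a, b))
    return edges

example = {
    "alpha": [{2, 7, 8}, {2, 5, 7}, {1, 4, 7, 8}],
    "beta": [{7}],              # hypothetical singleton component
    "gamma": [{3, 6}, {6, 9}],  # hypothetical, disjoint from alpha
}
print(merge_graph_edges(example))  # {('alpha', 'beta')}
```
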
Constructing Gm
[Figure: component graph on clones l1 through l8 with components α, β, γ, δ, and the resulting merge graph Gm on nodes α, β, δ, γ]

Properties of Gm
• All rows in γ will share the same disjoint/subset relationship with each row of α
  • Rows in different components are either disjoint or one is a subset of the other
  • Rows in the same component share a 1 column
  • If that column matches in a row of α, then subset, else disjoint
[Figure: Gm on nodes α, β, δ, γ]

Ordering components
• Vertices without incoming edges: freeze their columns
• Process the rest in topological order.
• is a singleton and a subset
[Figure: Gm on nodes α, β, δ, γ]

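The processing order can be sketched with the standard library's `graphlib` module (Python 3.9 or later). The edge list below is hypothetical, since the edges of the slide's Gm are not recoverable from this transcript; an edge (source, target) is assumed to mean that the source component must be handled before the target.

```python
from graphlib import TopologicalSorter

def roots(nodes, edges):
    """Components with no incoming edge: their columns are frozen first."""
    targets = {dst for _, dst in edges}
    return [n for n in nodes if n not in targets]

def processing_order(nodes, edges):
    """Order components so each one follows all of its predecessors in Gm."""
    sorter = TopologicalSorter({n: set() for n in nodes})
    for src, dst in edges:
        sorter.add(dst, src)  # dst may only be processed after src
    return list(sorter.static_order())

nodes = ["alpha", "beta", "gamma", "delta"]
edges = [("alpha", "beta"), ("alpha", "gamma"), ("beta", "gamma")]  # hypothetical
print(roots(nodes, edges))
print(processing_order(nodes, edges))
```

`static_order` raises `graphlib.CycleError` if the edges contain a cycle, which the subset-based rule should never produce between distinct components.
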
Ordering Components (2)
• Find the leftmost column with a 1
• In the current assembly, find the rows that contain all ones for that column
• Merge the columns

Ordering Components (3)