Input Sensitive Algorithms for Multiple Sequence Alignment Pankaj Agarwal @Duke Yonatan Bilu @Hebrew University Rachel Kolodny @Stanford
Multiple Sequence Alignment • Quantifies similarities among [DNA, Protein] sequences • Detects highly conserved motifs & remote homologues • Evolutionary insights • Transfer of annotation • Representation of protein families
Multiple Sequence Alignment • Input: k sequences • Output: optimal alignment • Gap-infused sequences (gaps shown as -), one per row • Restrictions on the column patterns
Example input: (1) GARFIELD MET NERMAL (2) ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE (3) GARFIELD AND HIS ASSOCIATE NERMAL
An alignment of the three sequences:
---- GARFIELD MET --------------- NERMAL --------------------------
ODIE -------- --- ANDHISASSOCIATE NERMAL METGARFIELDANDHISASSOCIATE
---- GARFIELD --- ANDHISASSOCIATE NERMAL --------------------------
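The definition above can be checked mechanically. The following is a minimal Python sketch, assuming that the restriction on column patterns is the usual rule that no column consists of gaps only (the slide does not spell it out); the function name `is_valid_alignment` is hypothetical. It tests the three example sequences against the example alignment.

```python
from typing import List

GAP = "-"


def is_valid_alignment(rows: List[str], sequences: List[str]) -> bool:
    """Check that `rows` is a gap-infused alignment of `sequences`.

    Assumed rules (illustrative): one row per sequence, equal row widths,
    deleting gaps from a row recovers its sequence, and no all-gap column.
    """
    if not rows or len(rows) != len(sequences):
        return False
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        return False
    # Deleting the gap symbol must give back each input sequence.
    if any(row.replace(GAP, "") != seq for row, seq in zip(rows, sequences)):
        return False
    # Assumed column restriction: every column holds at least one character.
    return all(any(row[c] != GAP for row in rows) for c in range(width))


seqs = ["GARFIELDMETNERMAL",
        "ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE",
        "GARFIELDANDHISASSOCIATENERMAL"]
# The example alignment, built from blocks so the gap counts are explicit.
rows = ["-" * 4 + "GARFIELDMET" + "-" * 15 + "NERMAL" + "-" * 26,
        "ODIE" + "-" * 11 + "ANDHISASSOCIATE" + "NERMAL" + "METGARFIELDANDHISASSOCIATE",
        "-" * 4 + "GARFIELD" + "-" * 3 + "ANDHISASSOCIATE" + "NERMAL" + "-" * 26]
print(is_valid_alignment(rows, seqs))  # True
```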
Multiple Sequence Alignment • Input: k sequences • Output: optimal alignment • Minimal width • Score function: summation over columns • e.g., sum of pairs
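A column-summed score such as sum of pairs can be sketched in a few lines of Python. The substitution values below (match +1, mismatch -1, character against gap -2, gap against gap 0) are hypothetical choices for illustration, not values from the slides, and the function names are likewise assumptions.

```python
from itertools import combinations
from typing import List

GAP = "-"


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Hypothetical pairwise scores; a gap facing a gap contributes 0."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def column_score(column: str) -> int:
    """Sum of pairs: add the scores of all unordered row pairs in one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def alignment_score(rows: List[str]) -> int:
    """The alignment's score is the summation of its column scores."""
    return sum(column_score("".join(col)) for col in zip(*rows))


print(column_score("GG-"))                # 1 - 2 - 2 = -3
print(alignment_score(["GARF", "GA-F"]))  # 1 + 1 - 2 + 1 = 1
```

Because the score is a sum over columns, any prefix of an alignment has a well-defined partial score, which is what makes the dynamic program on the next slide possible.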
DP solves MSA • Build a score matrix: a k-dimensional hypercube • An alignment is a path • Time: $O(n^k \cdot 2^k)$, that is, the number of nodes, $O(n^k)$ for k sequences of length at most n, times the number of neighbors per node, $2^k - 1$
Example path for k = 2:
GARFIELDMET---------------NERMAL
GARFIELD---ANDHISASSOCIATENERMAL
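The hypercube DP can be sketched directly in Python for small inputs. This is an illustrative exact sum-of-pairs DP, assuming the same hypothetical scores as a simple match/mismatch/gap scheme; `msa_dp` is a made-up name. Each node is a tuple of prefix lengths, and each step advances a nonempty subset of the sequences, so every node has $2^k - 1$ candidate predecessors.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

GAP = "-"
Node = Tuple[int, ...]


def pair_score(a: str, b: str) -> int:
    # Hypothetical scores: match +1, mismatch -1, char vs gap -2, gap vs gap 0.
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def msa_dp(seqs: List[str]) -> Tuple[int, List[str]]:
    """Exact sum-of-pairs MSA by DP over the k-dimensional lattice (hypercube)."""
    k = len(seqs)
    # A step advances a nonempty subset of the sequences: 2^k - 1 neighbors.
    steps = [s for s in product((0, 1), repeat=k) if any(s)]
    best: Dict[Node, int] = {}
    back: Dict[Node, Node] = {}
    # product() yields nodes in lexicographic order, so every predecessor
    # (componentwise smaller) is finished before the node itself.
    for node in product(*(range(len(s) + 1) for s in seqs)):
        if not any(node):
            best[node] = 0  # the origin: all prefixes empty
            continue
        for step in steps:
            prev = tuple(i - d for i, d in zip(node, step))
            if min(prev) < 0:
                continue
            column = [seqs[r][prev[r]] if step[r] else GAP for r in range(k)]
            score = best[prev] + sum(pair_score(a, b) for a, b in combinations(column, 2))
            if node not in best or score > best[node]:
                best[node], back[node] = score, step
    # Trace the optimal path back from the far corner to the origin.
    node = tuple(len(s) for s in seqs)
    total = best[node]
    cols: List[List[str]] = []
    while any(node):
        step = back[node]
        prev = tuple(i - d for i, d in zip(node, step))
        cols.append([seqs[r][prev[r]] if step[r] else GAP for r in range(k)])
        node = prev
    cols.reverse()
    return total, ["".join(col[r] for col in cols) for r in range(k)]


score, rows = msa_dp(["GARFIELDMET", "GARFIELD", "GARFMET"])
print(score)
print("\n".join(rows))
```

The nested loops make the slide's cost visible: every lattice node is visited once and tries all $2^k - 1$ steps (each step also pays $O(k^2)$ to score its column), which is why this exact approach only scales to a handful of sequences.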
Pairwise Restriction • The “true” information: the aligned subsequences and their relative positioning • Study the pairwise alignments first, then restrict the multiple alignment accordingly • Time: • Focus efforts on the “true” tradeoffs
Figure: alignment grid of GARFIELDMETNERMAL against GARFIELDANDHISASSOCIATENERMAL
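A possible first step of the pairwise restriction, sketched in Python: compute a global pairwise alignment with the standard Needleman-Wunsch DP, then read off the aligned subsequences together with their positions. Here "aligned subsequences" is taken to mean maximal runs of gap-free columns, reported as (start in s, start in t, length); that reading, the scores, and both function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

GAP = "-"


def needleman_wunsch(s: str, t: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str]:
    """Global pairwise alignment (Needleman-Wunsch, linear gap penalty)."""
    n, m = len(s), len(t)

    def sub(i: int, j: int) -> int:
        return match if s[i - 1] == t[j - 1] else mismatch

    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the far corner to recover one optimal alignment.
    a: List[str] = []
    b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append(GAP)
            i -= 1
        else:
            a.append(GAP)
            b.append(t[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def aligned_segments(a: str, b: str) -> List[Tuple[int, int, int]]:
    """Maximal runs of gap-free columns as (start in s, start in t, length)."""
    runs: List[Tuple[int, int, int]] = []
    i = j = 0                              # positions in the ungapped sequences
    start: Optional[Tuple[int, int]] = None  # where the current run began
    length = 0
    for x, y in zip(a, b):
        if x != GAP and y != GAP:
            if start is None:
                start, length = (i, j), 0
            length += 1
        elif start is not None:
            runs.append((start[0], start[1], length))
            start = None
        if x != GAP:
            i += 1
        if y != GAP:
            j += 1
    if start is not None:
        runs.append((start[0], start[1], length))
    return runs


# The pairwise alignment from the earlier DP slide.
row_s = "GARFIELDMET" + GAP * 15 + "NERMAL"
row_t = "GARFIELD" + GAP * 3 + "ANDHISASSOCIATE" + "NERMAL"
print(aligned_segments(row_s, row_t))  # [(0, 0, 8), (11, 23, 6)]
```

The output says that GARFIELD is aligned at the start of both sequences and NERMAL at offsets 11 and 23; this is the kind of positional information the restriction keeps.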
Segments Matching Graph (SMG) • Sequences are partitioned into segments • Nodes: the segments • Edges: • Self edges • Between two equal-length segments of different sequences • Edges have scores • The SMG defines the allowed paths and their scores
Figure: the example sequences partitioned into segments: GARFIELD | MET | NERMAL; ODIE | ANDHISASSOCIATE | NERMAL | MET | GARFIELD | ANDHISASSOCIATE; GARFIELD | ANDHISASSOCIATE | NERMAL
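One way to hold an SMG in memory, sketched in Python. The class and method names are hypothetical, self edges are given an assumed per-character score, and the toy rule "identical segments of different sequences match, scoring their length" is an illustration only, not how the slides derive edges. The constructor enforces the two edge conditions stated above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

NodeId = Tuple[int, int]  # (sequence index, segment index)


@dataclass(frozen=True)
class Segment:
    seq: int   # which input sequence the segment belongs to
    pos: int   # index of the segment within that sequence
    text: str


class SegmentsMatchingGraph:
    """Hypothetical SMG: nodes are segments, edges between segments carry scores."""

    def __init__(self, partitions: List[List[str]], self_edge_score_per_char: float = -1.0):
        # Each sequence arrives already partitioned into consecutive segments.
        self.nodes = [[Segment(s, p, text) for p, text in enumerate(parts)]
                      for s, parts in enumerate(partitions)]
        # Every segment has a self edge; its score is an assumed per-character cost.
        self.self_edge_score_per_char = self_edge_score_per_char
        self.edges: Dict[Tuple[NodeId, NodeId], float] = {}

    @staticmethod
    def _key(a: Segment, b: Segment) -> Tuple[NodeId, NodeId]:
        u, v = (a.seq, a.pos), (b.seq, b.pos)
        return (u, v) if u <= v else (v, u)

    def add_edge(self, a: Segment, b: Segment, score: float) -> None:
        if a.seq == b.seq:
            raise ValueError("an edge must join segments of different sequences")
        if len(a.text) != len(b.text):
            raise ValueError("an edge must join segments of equal length")
        self.edges[self._key(a, b)] = score

    def edge_score(self, a: Segment, b: Segment) -> Optional[float]:
        return self.edges.get(self._key(a, b))

    def self_edge_score(self, a: Segment) -> float:
        return self.self_edge_score_per_char * len(a.text)


smg = SegmentsMatchingGraph([
    ["GARFIELD", "MET", "NERMAL"],
    ["ODIE", "ANDHISASSOCIATE", "NERMAL", "MET", "GARFIELD", "ANDHISASSOCIATE"],
    ["GARFIELD", "ANDHISASSOCIATE", "NERMAL"],
])
# Toy edge rule (an assumption): identical segments match, scoring their length.
flat = [seg for row in smg.nodes for seg in row]
for i, a in enumerate(flat):
    for b in flat[i + 1:]:
        if a.seq != b.seq and a.text == b.text:
            smg.add_edge(a, b, float(len(a.text)))
print(len(smg.edges))  # 9
```

The nine edges are three among the GARFIELD segments, three among the NERMAL segments, one between the MET segments, and two linking the third sequence's ANDHISASSOCIATE to the two copies in the second sequence (the two copies in the same sequence are not joined).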
Figure: alignment grid of ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE (segments ODIE | ANDHISASSOCIATE | NERMAL | MET | GARFIELD | ANDHISASSOCIATE) against GARFIELDANDHISASSOCIATENERMAL (segments GARFIELD | ANDHISASSOCIATE | NERMAL); later panels titled "Extreme paths"
Lemma: there is an optimal path that is extreme.
Figure: Venn diagram of all paths, optimal paths, and extreme paths
Improved algorithm: DP on the segments
Figure: segment-level grid of GARFIELDANDHISASSOCIATENERMAL against ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE
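A simplified two-sequence sketch of DP on the segments, in Python. It assumes a path moves one whole segment at a time: either a single segment through its self edge (the segment against gaps, at an assumed per-character cost) or a pair of segments through an SMG edge with that edge's score. The function name and the toy edge scores are hypothetical; this illustrates the idea of a segment-level grid rather than reproducing the authors' algorithm.

```python
from typing import Dict, List, Tuple


def segment_dp(parts_a: List[str], parts_b: List[str],
               edges: Dict[Tuple[int, int], float],
               self_score_per_char: float = -1.0) -> float:
    """Best score over segment-level paths for two sequences (a simplified sketch).

    Nodes are pairs of segment boundaries (i, j). A step either consumes one
    whole segment of one sequence via its self edge (segment against gaps),
    or consumes segment i of A together with segment j of B when the SMG has
    an edge (i, j) between them.
    """
    n, m = len(parts_a), len(parts_b)
    best = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # self edge of segment i-1 of A
                best[i][j] = max(best[i][j],
                                 best[i - 1][j] + self_score_per_char * len(parts_a[i - 1]))
            if j > 0:  # self edge of segment j-1 of B
                best[i][j] = max(best[i][j],
                                 best[i][j - 1] + self_score_per_char * len(parts_b[j - 1]))
            if i > 0 and j > 0 and (i - 1, j - 1) in edges:  # SMG edge
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + edges[(i - 1, j - 1)])
    return best[n][m]


a = ["GARFIELD", "MET", "NERMAL"]
b = ["GARFIELD", "ANDHISASSOCIATE", "NERMAL"]
# Toy SMG edges (assumed): identical segments, scored by their length.
edges = {(0, 0): 8.0, (2, 2): 6.0}
print(segment_dp(a, b, edges))  # 8 - 3 - 15 + 6 = -4.0
```

For this pair the grid has (3 + 1) × (3 + 1) = 16 segment nodes, whereas the character-level grid for GARFIELDMETNERMAL against GARFIELDANDHISASSOCIATENERMAL has (17 + 1) × (29 + 1) = 540 nodes; the work depends on the number of segments rather than on sequence length.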
Transitive PR-MSA (pairwise-restricted MSA) • DNA sequences • No scores in the SMG, only matches
Maximal Directions • Transitivity implies that at any point in the hypercube, the directions are partitioned into cliques • These cliques define the maximal directions • A shortest path can be taken over maximal directions only • This pushes down the work per node
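Under transitivity the match relation among the current segments behaves like an equivalence relation, so the cliques can be found with a standard union-find pass. The Python sketch below is illustrative: `maximal_directions` and the `matches` callback are hypothetical names, and union-find is simply a convenient way to compute the classes, not necessarily the method used by the authors.

```python
from typing import Callable, Dict, List


def maximal_directions(k: int, matches: Callable[[int, int], bool]) -> List[List[int]]:
    """Group the k coordinate directions at one point into cliques.

    matches(r, s) reports whether the current segments of sequences r and s
    are matched in the SMG. Assuming the relation is symmetric and transitive,
    its connected components are cliques; each is one maximal direction.
    """
    parent = list(range(k))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for r in range(k):
        for s in range(r + 1, k):
            if matches(r, s):
                parent[find(r)] = find(s)

    groups: Dict[int, List[int]] = {}
    for r in range(k):
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())


current = ["GARFIELD", "ODIE", "GARFIELD", "MET"]
print(maximal_directions(4, lambda r, s: current[r] == current[s]))
# [[0, 2], [1], [3]]
```

At a node the general DP considers up to $2^k - 1$ steps, while at most k maximal directions exist; restricting moves to them is how the slide's claim about reduced work per node plays out in this sketch.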
Obvious Directions
Figure: two panels over the segmented example sequences (ODIE | ANDHISASSOCIATE | NERMAL | MET | GARFIELD | ANDHISASSOCIATE; GARFIELD | MET | NERMAL; GARFIELD | ANDHISASSOCIATE | NERMAL), labeled Obvious and Non-Obvious
Obvious Directions • Lemma: an optimal path is found even when making obvious decisions • Not all nodes are relevant • Work for every node increases to
Special Vertices • Straight junction • Corner junction
Figure: grid with origin (0,0)