
A Faster Reconstruction of Binary Near-Perfect Phylogenetic Trees

Research on faster reconstruction of near-perfect phylogenetic trees, framed as a Steiner tree problem on the hypercube, with a focus on genotype data and evolutionary relationships.

lblakemore

Presentation Transcript


  1. A Faster Reconstruction of Binary Near-Perfect Phylogenetic Trees. Srinath Sridhar; joint work with Kedar Dhamdhere, Guy E. Blelloch, Eran Halperin, R. Ravi and Russell Schwartz

  2. Steiner Tree Problem • Input: Graph $G = (V, E)$ with edge weights $w: E \to \mathbb{R}$ and a 'terminal' set $S \subseteq V$ • Output: Subtree $T$ of $G$ connecting all vertices in $S$ • Objective: Minimize $w(T) = \sum_{e \in T} w(e)$ • Informally: an MST that may also use intermediate (non-terminal) vertices • NP-complete, even if $G$ is an $m$-dimensional hypercube with unit edge weights
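As a point of reference (this is a classical baseline, not the algorithm from the talk), a minimum spanning tree over the terminals alone, with edge weights equal to Hamming distance (the shortest-path distance in the unit-weight hypercube), has weight at most twice that of an optimal Steiner tree. A minimal sketch of Prim's algorithm for that bound; the names `hamming` and `terminal_mst` are illustrative:

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def terminal_mst(terminals: list[str]) -> tuple[int, list[tuple[str, str]]]:
    """Prim's algorithm on the complete graph over the terminals.

    Edge weights are Hamming distances (shortest-path lengths in the
    unit-weight hypercube), so the MST weight is at most twice the weight
    of an optimal Steiner tree.
    """
    if not terminals:
        return 0, []
    root = terminals[0]
    # best[v] = (distance from v to the current tree, tree vertex achieving it)
    best = {v: (hamming(v, root), root) for v in terminals[1:]}
    edges: list[tuple[str, str]] = []
    total = 0
    while best:
        v = min(best, key=lambda u: best[u][0])  # closest vertex outside the tree
        d, parent = best.pop(v)
        edges.append((parent, v))
        total += d
        for u in best:  # only existing keys are updated, so iteration is safe
            dv = hamming(u, v)
            if dv < best[u][0]:
                best[u] = (dv, v)
    return total, edges


total, edges = terminal_mst(["1100", "0001", "1011", "1001"])
print(total)  # 4
```

On the four creatures from the example slide the MST already has weight 4, the same length as the tree drawn there, so the Steiner node does not shorten that particular example; in general, Steiner nodes can make the tree strictly shorter than the MST.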

  3. Near-Perfect Phylogenetic Trees • Input: set $S$ of $n$ points on an $m$-dimensional hypercube ($n$ bit-strings of length $m$) • Output: Steiner (unrooted) tree $T$ connecting $S$ using intermediate nodes (Steiner nodes) of the hypercube • Objective: Minimize $|T|$ • Assumption: $|T_{opt}| \le m + q$ for a constant $q$
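The assumption is stated relative to $m$ because of a simple lower bound: any column in which $S$ contains both a 0 and a 1 must flip on at least one edge of every tree connecting $S$, so $|T|$ is at least the number of such columns. When no column is constant this gives $m \le |T_{opt}| \le m + q$. A sketch of the bound (the function name `column_lower_bound` is hypothetical):

```python
def column_lower_bound(rows: list[str]) -> int:
    """Count columns of a 0/1 matrix (rows given as equal-length bit-strings)
    that contain both values; each such column must flip on some tree edge."""
    if not rows:
        return 0
    m = len(rows[0])
    return sum(1 for j in range(m) if {r[j] for r in rows} == {"0", "1"})


rows = ["1100", "0001", "1011", "1001"]  # Basilisk, Boggart, Centaur, Goblin
print(column_lower_bound(rows))  # 4, so every connecting tree has |T| >= 4 = m
```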

  4. Why is this important? (Foster et al., 98)

  5. Why is this important? (Wirth et al., 04)

  6. Typical Input Data • Rows: different species, languages, etc. • Columns: yes/no (0/1) properties of the rows • Phenotypes: each column can represent a binary question, e.g. thumbs? color-blind? • DNA: each position has 2 possible states (almost always)

  7. Example • Columns: H (Head), W (Wings), RS (Can read stars), B/NB (Bad/not-so-bad) • Basilisk: 1 1 0 0 • Boggart: 0 0 0 1 • Centaur: 1 0 1 1 • Goblin: 1 0 0 1 [Figure: tree connecting Boggart (0001), Goblin (1001), Centaur (1011) and Basilisk (1100) through a Steiner node (1000)]

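  8. Perfectness • Annotate each edge of tree T with the column that flips along it • Tree T is 'perfect' if every annotation occurs only once • Evolution is assumed to be (nearly) perfect [Figure: the example tree with edges Boggart (0001) to Goblin (1001) flipping column 1, Goblin to Centaur (1011) flipping column 3, Goblin to Steiner (1000) flipping column 4, and Steiner to Basilisk (1100) flipping column 2]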
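The annotation rule can be checked mechanically. The sketch below labels each edge with the 1-based columns whose bits differ and calls a tree perfect when no column label appears on more than one edge. The edge list is read off the example figure's labels (the topology is inferred, not given as text), and the function names are hypothetical:

```python
from collections import Counter


def annotate(edges: list[tuple[str, str]]) -> dict[tuple[str, str], list[int]]:
    """Label each edge with the 1-based columns that differ between its endpoints."""
    return {
        (u, v): [j + 1 for j, (x, y) in enumerate(zip(u, v)) if x != y]
        for u, v in edges
    }


def is_perfect(edges: list[tuple[str, str]]) -> bool:
    """Perfect: no column label appears on more than one edge."""
    counts = Counter(c for cols in annotate(edges).values() for c in cols)
    return all(n == 1 for n in counts.values())


# Boggart-Goblin, Goblin-Centaur, Goblin-Steiner, Steiner-Basilisk (inferred from the figure)
tree = [("0001", "1001"), ("1001", "1011"), ("1001", "1000"), ("1000", "1100")]
print(annotate(tree))
print(is_perfect(tree))  # True: columns 1, 3, 4 and 2 each flip exactly once
```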

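  9. Perfectness • Annotate each edge of tree T with the column that flips along it • Tree T is 'perfect' if every annotation occurs only once • Evolution is assumed to be (nearly) perfect • q-near-perfect: $|T_{opt}| \le m + q$ for a constant $q$ [Figure: the same tree with the Steiner node relabeled Dementor (1000) and a new row, Hippogriff (1101), joined to Basilisk (1100) by an edge flipping column 4; column 4 now flips twice, and the tree is labeled 1-near-perfect]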
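For a concrete tree, the excess $|T| - m$ and the columns that flip more than once can be computed directly. Since the slide's definition bounds the optimal tree, a tree with excess $e$ only shows $|T_{opt}| \le m + e$. This sketch assumes, as the figure's edge label suggests, that Hippogriff (1101) hangs off Basilisk (1100) by a column-4 flip; the helper names are hypothetical:

```python
from collections import Counter


def flip_counts(edges: list[tuple[str, str]]) -> Counter:
    """How many tree edges flip each (1-based) column."""
    counts: Counter = Counter()
    for u, v in edges:
        for j, (x, y) in enumerate(zip(u, v)):
            if x != y:
                counts[j + 1] += 1
    return counts


def excess(edges: list[tuple[str, str]], m: int) -> int:
    """|T| - m: total flips beyond one per column."""
    return sum(flip_counts(edges).values()) - m


# Example tree (Dementor = Steiner node 1000) plus Hippogriff attached to Basilisk
tree = [("0001", "1001"), ("1001", "1011"), ("1001", "1000"),
        ("1000", "1100"), ("1100", "1101")]
counts = flip_counts(tree)
print(excess(tree, m=4))                               # 1
print(sorted(c for c, n in counts.items() if n > 1))   # [4]
```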

  10. General Phylogeny Problem • Input: set $S$ of $n$ strings in $\{1, \dots, k\}^m$ • Output: Steiner tree $T$ connecting all of $S$ (edge lengths are Hamming distances) • Objective: Minimize $|T|$ • Variants: • $k$ bounded by a constant, or $k = 2$ • Tree $T$ is perfect • Tree $T$ is near-perfect
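For the binary ($k = 2$), perfect variant, whether any perfect tree exists can be decided with the classical four-gamete (pairwise compatibility) test: a binary matrix admits an unrooted perfect phylogeny if and only if no pair of columns contains all four combinations 00, 01, 10 and 11. This is a standard result rather than something stated on the slide; a minimal sketch with hypothetical function names:

```python
from itertools import combinations


def four_gamete_conflicts(rows: list[str]) -> list[tuple[int, int]]:
    """Return 1-based column pairs in which all of 00, 01, 10, 11 occur."""
    if not rows:
        return []
    m = len(rows[0])
    return [
        (i + 1, j + 1)
        for i, j in combinations(range(m), 2)
        if len({(r[i], r[j]) for r in rows}) == 4
    ]


def has_perfect_phylogeny(rows: list[str]) -> bool:
    """Binary (k = 2) case only: a perfect tree exists iff no pair conflicts."""
    return not four_gamete_conflicts(rows)


print(has_perfect_phylogeny(["1100", "0001", "1011", "1001"]))  # True
print(four_gamete_conflicts(["00", "01", "10", "11"]))          # [(1, 2)]
```

The test only applies to binary characters; for $k > 2$ states, pairwise compatibility is no longer sufficient in general.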

  11. Some Prior Work

  12. Overview: Discover O(q) edges and the topology they induce [Figure: optimal tree]

  13. Overview: Discover the assignment of rows to super nodes [Figure: optimal tree]

  14. Overview: Grow a perfect phylogeny within each super node [Figure: optimal tree]

  15. Overview: Link the super nodes [Figure: optimal tree]

  16. Current/Future Work • Simpler algorithm • More than 2 states ($k > 2$), near-perfect • Experimental evaluation, usable code • Related harder problem: the input is a 'mixture' of 2 strings over $\{0, 1\}^m$. Input: 2 0 1 1 2; Output: 1 0 1 1 0 and 0 0 1 1 1
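The slide's example reads as: a 2 marks a position where the two hidden strings differ, while a 0 or 1 marks a position where both strings carry that bit. One source of ambiguity is that a mixture with $h \ge 1$ positions marked 2 has $2^{h-1}$ unordered explanations even before any tree is considered. A sketch that checks and enumerates them; the functions are hypothetical illustrations, not the authors' method:

```python
from itertools import product


def explains(mixture: str, s1: str, s2: str) -> bool:
    """True if the 0/1 strings s1 and s2 combine into the given 0/1/2 mixture."""
    if not (len(mixture) == len(s1) == len(s2)):
        return False
    for g, a, b in zip(mixture, s1, s2):
        if g == "2":
            if a == b:  # a 2 requires the two strings to differ here
                return False
        elif not (a == b == g):  # a 0 or 1 requires both strings to match it
            return False
    return True


def explanations(mixture: str) -> list[tuple[str, str]]:
    """All unordered pairs of 0/1 strings that explain the mixture."""
    het = [i for i, g in enumerate(mixture) if g == "2"]
    if not het:
        return [(mixture, mixture)]
    pairs = []
    # Fix s1 = 1 at the first 2-position so (s1, s2) and (s2, s1) are not both listed.
    for bits in product("01", repeat=len(het) - 1):
        s1, s2 = list(mixture), list(mixture)
        for i, b in zip(het, ("1",) + bits):
            s1[i] = b
            s2[i] = "0" if b == "1" else "1"
        pairs.append(("".join(s1), "".join(s2)))
    return pairs


print(explains("20112", "10110", "00111"))  # True
print(len(explanations("20112")))           # 2
```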
