Algorithms research. Tandy Warnow, UT-Austin
“Algorithms group” • UT-Austin: Warnow, Hunt • UCB: Rao, Karp, Papadimitriou, Russell, Myers • UCSD: Huelsenbeck • UNM: Moret, Bader, Williams • External participants: Mossel (UCB), Huson (Germany), Steel (NZ), and others
Main research foci • Solving maximum parsimony and maximum likelihood more effectively • “Fast converging methods” • Gene order and content phylogeny • Reticulate evolution • Multiple sequence alignment at the genomic level
GRAPPA (Genome Rearrangement Analysis under Parsimony and other Phylogenetic Algorithms) http://www.cs.unm.edu/~moret/GRAPPA/ • Heuristics for NP-hard optimization problems • Fast polynomial-time distance-based methods • Contributors: U. New Mexico, U. Texas at Austin, Università di Bologna, Italy • Poster: Jijun Tang
Maximum Parsimony on Rearranged Genomes (MPRG) • The leaves are rearranged genomes. • Find the tree that minimizes the total number of rearrangement events. [Figure: example tree with nodes labeled A through F and edge lengths 3, 3, 6, 4, 2; total length = 18]
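The MPRG objective can be illustrated with a minimal Python sketch that scores one fixed tree. Several assumptions here are not from the slides: genomes are written as signed integer lists, they are treated as circular by default, the breakpoint distance stands in for the number of rearrangement events, and the function names (`breakpoint_distance`, `tree_length`) are hypothetical. This is not GRAPPA's implementation. It only shows what "total length" means once every node of the tree, including internal ones, has a genome assigned.

```python
from typing import Dict, List, Sequence, Set, Tuple


def consecutive_pairs(genome: Sequence[int], circular: bool) -> List[Tuple[int, int]]:
    """Return the ordered pairs of neighbouring genes in a signed gene order."""
    pairs = list(zip(genome, genome[1:]))
    if circular and len(genome) > 1:
        pairs.append((genome[-1], genome[0]))  # wrap around for circular genomes
    return pairs


def adjacencies(genome: Sequence[int], circular: bool = True) -> Set[Tuple[int, int]]:
    """Adjacency set; (a, b) and (-b, -a) describe the same adjacency read from either strand."""
    adj: Set[Tuple[int, int]] = set()
    for a, b in consecutive_pairs(genome, circular):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoint_distance(g1: Sequence[int], g2: Sequence[int], circular: bool = True) -> int:
    """Count adjacencies of g1 that are not preserved in g2."""
    adj2 = adjacencies(g2, circular)
    return sum(1 for pair in consecutive_pairs(g1, circular) if pair not in adj2)


def tree_length(edges: Sequence[Tuple[str, str]],
                genomes: Dict[str, Sequence[int]],
                circular: bool = True) -> int:
    """Sum of edge distances for a tree whose nodes are all labeled with genomes."""
    return sum(breakpoint_distance(genomes[u], genomes[v], circular) for u, v in edges)


if __name__ == "__main__":
    genomes = {
        "A": [1, 2, 3, 4, 5],
        "B": [1, -3, -2, 4, 5],   # A with the segment (2, 3) inverted
        "X": [1, 2, 3, 4, 5],     # hypothetical internal-node genome
    }
    print(breakpoint_distance(genomes["A"], genomes["B"]))
    print(tree_length([("A", "X"), ("B", "X")], genomes))
```

Scoring a labeled tree is the easy part. The hard part of MPRG is choosing both the tree topology and the internal genomes so that this sum is minimal.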
Benchmark gene order dataset: Campanulaceae • 12 genomes + 1 outgroup (Tobacco), 105 gene segments • NP-hard optimization problems: breakpoint and inversion phylogenies • 1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.) • 2000: Using GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor) • 2003: Using latest version of GRAPPA: 2 minutes on a single processor (1-billion-fold speedup per processor)
Reticulate Evolution • Group leader: Randy Linder • Software: (1) producing random networks, (2) simulating sequences down networks, (3) performance evaluation of methods, (4) inferring reticulate networks • Current reconstruction methods are limited to one reticulation event • Poster: Luay Nakhleh
MP/ML heuristics • Disk-Covering Methods (DCMs): Divide-and-conquer strategies that boost the performance of base methods for MP/ML (Warnow) • MrBayes (Huelsenbeck) • New I-DCM3 technique improves upon the Ratchet and TBR • Poster: Usman Roshan (DCM-MP)
Gutell dataset: 854 rRNA sequences • Iterative-DCM3 trials find trees of MP score 103210 in 30 hours, whereas ratchet500 trials take 45 hours to find trees of the same score
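To make "MP score" concrete: for one fixed rooted binary tree, the score is the minimum number of state changes needed to explain the aligned sequences, and it can be computed with Fitch's algorithm. The Python sketch below assumes a nested-tuple tree encoding and a tiny made-up alignment, both chosen only for illustration; the function names are hypothetical. It scores a single tree and does not represent I-DCM3 or the ratchet, which are heuristics for searching over trees.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set, minimum number of changes) for one alignment column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Sum the Fitch cost over all columns of an alignment of equal-length sequences."""
    length = len(next(iter(seqs.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in seqs.items()}
        total += fitch_site(tree, column)[1]
    return total


if __name__ == "__main__":
    # Illustrative toy alignment, not data from the slides.
    seqs = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}
    print(parsimony_score((("A", "B"), ("C", "D")), seqs))
    print(parsimony_score((("A", "C"), ("B", "D")), seqs))
```

On the toy alignment the first topology needs fewer changes than the second, which is the kind of comparison a tree-search heuristic repeats across a very large number of candidate trees.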
Other planned projects (partial list) • Multiple Sequence Alignment (Myers and Williams) • Steiner Tree algorithms - error bounds and new heuristics (Rao) • MCMC methods (Russell and Huelsenbeck) • Symbolic representation of data (Hunt) • Parallel algorithms (Bader and Williams)
Questions for group • How should we measure performance? • How should we use simulated data? • How should we use real datasets? • How can we study criteria (MP, ML, etc.) as opposed to methods? • Should we sponsor DIMACS-style challenges? • Others? (please bring questions, comments, answers, to the break-out session)