This project proposal, titled "Genes to Trees," describes a workflow for reconstructing phylogenetic trees: collecting sequence data from GenBank, curating it, building multiple sequence alignments with tools such as ClustalW, MUSCLE, and MAFFT, and running phylogenetic analyses with tools such as PAUP, MrBayes, and GARLI. The user supplies a set of sequences and taxonomic constraints; homologous sequences are retrieved, smaller groups are eliminated, each remaining group is aligned, uninformative columns are removed, the alignments are combined into a super-matrix, and phylogenetic analysis produces a tree of closely related organisms. The authors argue that the project is feasible because it can be scripted in Perl using the BioPerl libraries, which provide modules for accessing sequence data, manipulating sequences, searching for similar sequences, and working with alignments. The project is relevant because its results can serve as a starting point for further analysis, multiple analyses can be run in parallel, the workflow is modular, and it is a step toward robust, high-throughput phylogenetics.
Genes to Trees
Daniel Ayres and Adam Bazinet
CMSC858P - Project 2 Proposal
Phylogenetic tree reconstruction: “Genes to Trees”
• Data collection (GenBank)
• Data curation
• Multiple sequence alignment (ClustalW, MUSCLE, MAFFT)
• Phylogenetic analysis (PAUP, MrBayes, GARLI)
• Visual inspection and post-processing
How does it work? (Workflow)
• User inputs:
    • Set of DNA or amino acid sequences
    • Taxonomic constraints
• Homologous sequences obtained from GenBank
• Smaller groups eliminated
• Multiple alignment of each group made
• Uninformative columns removed
• “Super-matrix” of all sequences created
• Phylogenetic analysis performed
• Output:
    • Phylogenetic tree of closely related organisms
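The middle steps of the workflow above (eliminating small groups, removing uninformative columns, and building the super-matrix) can be sketched in a few lines. This is only an illustration in plain Python, not the authors' Perl/BioPerl implementation. It assumes each group is already aligned, that "uninformative" means "not parsimony-informative" (the slides do not define the term), and that taxa absent from a group are padded with "?". The function names, the `min_taxa` threshold, and the example data are all hypothetical.

```python
from collections import Counter

MISSING = "-?"  # characters treated as gaps or missing data (an assumption)


def drop_small_groups(groups, min_taxa):
    """Keep only groups (name -> {taxon: aligned sequence}) with >= min_taxa taxa."""
    return {name: aln for name, aln in groups.items() if len(aln) >= min_taxa}


def is_informative(column):
    """Parsimony-informative: at least two states, each seen in at least two taxa."""
    counts = Counter(ch for ch in column if ch not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def remove_uninformative_columns(alignment):
    """Return the alignment restricted to its informative columns."""
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    keep = [i for i in range(length)
            if is_informative([alignment[t][i] for t in taxa])]
    return {t: "".join(alignment[t][i] for i in keep) for t in taxa}


def build_supermatrix(groups):
    """Concatenate groups side by side, padding absent taxa with '?'."""
    all_taxa = sorted({t for aln in groups.values() for t in aln})
    matrix = {t: "" for t in all_taxa}
    for name in sorted(groups):  # fixed group order keeps columns reproducible
        aln = groups[name]
        width = len(next(iter(aln.values())))
        for t in all_taxa:
            matrix[t] += aln.get(t, "?" * width)
    return matrix


def genes_to_supermatrix(groups, min_taxa):
    """Filter small groups, trim each alignment, then concatenate."""
    kept = drop_small_groups(groups, min_taxa)
    trimmed = {name: remove_uninformative_columns(aln) for name, aln in kept.items()}
    return build_supermatrix(trimmed)


# Hypothetical, already-aligned example groups
example_groups = {
    "geneA": {"t1": "ACGTA", "t2": "ACGTA", "t3": "ATGTC", "t4": "ATGAC"},
    "geneB": {"t1": "GG", "t2": "GC"},  # too few taxa, will be dropped
    "geneC": {"t1": "AAT", "t2": "AAT", "t3": "GAC", "t5": "GAC"},
}

for taxon, row in genes_to_supermatrix(example_groups, min_taxa=3).items():
    print(taxon, row)
```

Keeping each step as its own function mirrors the modular workflow the proposal describes: the filtering rule or the column criterion can be swapped without touching the concatenation step.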
Is it feasible?
• Scripting will be done with Perl
• Extensive use of BioPerl libraries
    • Collection of modules for bioinformatics programming
        • Accessing sequence data from local and remote databases
        • Manipulating individual sequences
        • Searching for similar sequences
        • Creating and manipulating sequence alignments
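To give a sense of the sequence-handling tasks listed above, here is a minimal sketch of reading FASTA records and reverse-complementing a DNA sequence. It is written in plain Python as a stand-in for the kind of functionality BioPerl modules such as Bio::SeqIO provide; it is not BioPerl code, and the function names and example records are hypothetical.

```python
import io

# Translation table for DNA bases; other characters pass through unchanged
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical two-record FASTA input
fasta_text = ">seq1 example record\nACGT\nTTGA\n>seq2\nGGCC\n"
records = list(read_fasta(io.StringIO(fasta_text)))

for identifier, sequence in records:
    print(identifier, sequence, reverse_complement(sequence))
```

A real pipeline would rely on a maintained library for parsing and database access rather than hand-rolled code like this, which is the argument the slide makes for BioPerl.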
Why is this relevant?
• Results can serve as a starting point for further analysis
• Multiple analyses can be run in parallel
• Workflow is modular
• A step towards robust, high-throughput phylogenetics