SATCHMO: Simultaneous Alignment and Tree Construction using Hidden Markov mOdels. Edgar, R., and Sjölander, K., Bioinformatics 2003.
SATCHMO algorithm
• Input: unaligned sequences, each forming its own single-sequence subtree
• Initialize: construct a profile HMM for each sequence using Dirichlet mixture densities
  • Dirichlet mixture densities avoid the problems of small counts
• While (#subtrees > 1) {
  • Use profile-profile scoring to select the closest pair to join
    • Relative entropy between columnar distributions
  • Align the pair to each other, keeping columns fixed within each subtree
  • Mask columns with many gaps or high positional relative entropy
  • Construct a profile HMM for the new masked MSA
    • Use Dirichlet mixture densities
  }
• Output: tree and MSA
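The pair-selection step can be illustrated with a minimal sketch. It is not the SATCHMO implementation: it assumes the profiles being compared already have the same number of columns (the real method aligns the profiles first), it replaces Dirichlet mixture densities with a single uniform pseudocount, and it symmetrizes the relative entropy by averaging both directions. The names `column_distribution`, `profile_distance`, and `closest_pair` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_distribution(residues, pseudocount=1.0):
    """Estimate amino-acid probabilities for one alignment column.

    Assumption: a uniform pseudocount stands in for the Dirichlet
    mixture prior; gaps and non-standard symbols are ignored.
    """
    counts = Counter(r for r in residues if r in ALPHABET)
    total = sum(counts.values()) + pseudocount * len(ALPHABET)
    return {a: (counts[a] + pseudocount) / total for a in ALPHABET}


def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits between two column distributions."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in ALPHABET)


def symmetric_relative_entropy(p, q):
    """Average of D(p || q) and D(q || p), so the order of arguments is irrelevant."""
    return 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))


def profile_distance(profile_a, profile_b):
    """Mean column-wise distance between two profiles of equal length (assumed)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("this sketch only compares profiles of equal length")
    return sum(
        symmetric_relative_entropy(p, q) for p, q in zip(profile_a, profile_b)
    ) / len(profile_a)


def closest_pair(profiles):
    """Return the names of the two profiles with the smallest distance."""
    return min(
        combinations(sorted(profiles), 2),
        key=lambda pair: profile_distance(profiles[pair[0]], profiles[pair[1]]),
    )


profiles = {
    "a": [column_distribution("AAAA")],
    "b": [column_distribution("AAAG")],
    "c": [column_distribution("WWWW")],
}
print(closest_pair(profiles))
```

In an agglomerative loop, the selected pair would be merged into one subtree, a new profile built from the merged alignment, and the selection repeated until a single subtree remains.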
SATCHMO performance evaluation
• Evaluating phylogenetic tree accuracy is difficult
  • Simulation studies are used to evaluate evolutionary tree methods
    • These rarely attempt to model the effects of duplication and of structural and functional changes
  • We don't know the evolutionary history of multi-gene families, so benchmark datasets of real protein family phylogenies are not available
• However, we can directly assess alignment accuracy by way of 3D structure
  • The structural alignment of two proteins is accepted as "ground truth" by the computational structural biology community
• We can also assess the functional predictive power of a phylogenetic tree against what is known about the functions of proteins
  • This approach is not universally accepted
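One common way to score a sequence alignment against a structural reference is to count how many residue pairs aligned in the reference are also aligned in the test alignment. The sketch below illustrates that idea only; it is not necessarily the measure used in the SATCHMO evaluation. The function names and the toy sequences are hypothetical, and each alignment is assumed to be a pair of equal-length gapped strings with `-` as the gap character.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (index_in_a, index_in_b) residue pairs placed in the same column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    pairs = set()
    i = j = 0  # positions in the ungapped sequences
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.add((i, j))
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return pairs


def pair_recovery(test, reference):
    """Fraction of reference-aligned residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(ref_pairs & aligned_pairs(*test)) / len(ref_pairs)


# Toy example: the same two sequences (ACDE and ADFE) aligned two ways.
reference = ("ACD-E", "A-DFE")  # stands in for a structural alignment
test = ("ACDE", "ADFE")         # stands in for a sequence-based alignment
score = pair_recovery(test, reference)
print(round(score, 3))
```

A score of 1.0 means every residue pair in the reference is reproduced; lower values mean the test alignment disagrees with the structural alignment at some positions.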
SATCHMO is more robust to extreme structural divergence than other methods
SATCHMO succeeds at alignment of proteins with different overall folds
[Figure: alignment comparison, panels labeled MAFFT and SATCHMO]