Clustal W and Clustal X version 2.0

김영호, 박준호, 최현희, The 9th Protein Folding Winter School

The Paper

Abstract • The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++ • The rewrite will make further development of the alignment algorithms easier • It has also allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems

Contents 1 Introduction 2 Clustal W 2.0 and Clustal X 2.0 3 New Features 4 Related Sources

Introduction • One of the oldest and most widely used multiple sequence alignment programs • First distributed by post on floppy disks (late 1980s, written in Microsoft Fortran for MS-DOS) • Clustal 1 ~ Clustal 4 (1988, 1989, IBM-compatible PCs) • Clustal V (1992, VAX/VMS, Unix, Apple Macintosh, IBM-compatible PCs)

Introduction • Clustal W and Clustal X (late 1990s) • Other powerful tools have since appeared • T-Coffee • MAFFT • MUSCLE • These are evaluated on benchmarks such as BAliBASE • Yet Clustal W and Clustal X continue to be very widely used (the EBI Clustal site receives millions of multiple alignment jobs per year)

Introduction • Clustal W and Clustal X • W: command-line (terminal) interface • X: graphical interface • Procedure • Sequence input (choose a chain or domain from each FASTA sequence) • Concatenate all the query sequences into one file • Run • Output (scores, alignment)
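The "concatenate all the query sequences into one file" step can be sketched in Python as below. This is a minimal illustration, not part of Clustal itself: the helper names `read_fasta` and `concatenate_fasta` are hypothetical, it assumes plain FASTA input (a `>` header line followed by sequence lines), and selecting a particular chain or domain is left to the caller.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; blank lines are ignored."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate_fasta(texts: list[str], width: int = 60) -> str:
    """Merge several FASTA inputs into one multi-FASTA string for a single Clustal run."""
    out_lines = []
    for text in texts:
        for header, seq in read_fasta(text):
            out_lines.append(">" + header)
            # Re-wrap each sequence to a fixed line width
            out_lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out_lines) + "\n"
```

The merged string can then be written to a single file (for example with `pathlib.Path("queries.fasta").write_text(...)`) and given to Clustal W or Clustal X as the input.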

Clustal W 2.0 and Clustal X 2.0 • What’s new? • Rewritten in C++ • Easier to maintain the code • Easier to modify or replace some of the alignment algorithms • UPGMA guide trees • Alternative to the NJ (neighbor-joining) guide trees • Speed up the alignment of large data sets • Iterative alignment facility • Increases alignment accuracy

Clustal W 2.0 and Clustal X 2.0 • Clustal X • Originally developed using NCBI’s Vibrant toolkit • The Vibrant toolkit is no longer supported • Clustal X 2.0 • Rewritten using the Qt GUI toolkit • Qt provides a native look and feel on Windows, Linux and Mac platforms

New Features • UPGMA • Faster than NJ (clusters 10,000 sequences in under a minute, while NJ takes over an hour) • Slightly less accurate than NJ on the BAliBASE benchmark, but on large alignments this is offset by the savings in processing time (2 h vs. 12 h)
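A minimal Python sketch of UPGMA clustering on a precomputed distance matrix shows why it is cheap: each step merges the closest pair of clusters and replaces their distances with a size-weighted average. The function name `upgma` and the nested-tuple tree format are assumptions for illustration; this naive version does not reproduce Clustal W 2.0's own implementation or its distance calculation.

```python
def upgma(names: list[str], dist: list[list[float]]):
    """Cluster `names` with UPGMA from a symmetric distance matrix.

    Returns the tree as nested tuples, e.g. (("A", "B"), ("C", "D")).
    """
    if not names:
        raise ValueError("need at least one sequence")
    n = len(names)
    # Active clusters: id -> (subtree, number of sequences in it)
    clusters = {i: (names[i], 1) for i in range(n)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        a, b = min(d, key=d.get)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        new_id = next_id
        next_id += 1
        # New distance = size-weighted average of the two old distances
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, new_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[new_id] = ((tree_a, tree_b), size_a + size_b)
    tree, _size = next(iter(clusters.values()))
    return tree
```

This naive loop scans every remaining pair at each merge, so it is O(n³) overall; implementations aimed at 10,000 sequences keep the distances in more efficient data structures.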

New Features • Iteration • A quick and effective method of refining alignments • ‘Remove first’ iteration scheme • WSP (weighted sum of pairs) score • During each iteration step, each sequence is removed from the alignment in turn and realigned. If the WSP score is reduced, the resulting alignment is retained.
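The remove-first scheme can be sketched as follows. Everything here is an assumption for illustration: `pair_cost` is a toy scoring (mismatch = 1, residue against gap = 2) rather than Clustal's substitution matrices and gap penalties, WSP is treated as a cost so that a lower score is better (matching "if the WSP score is reduced"), and `realign` is a hypothetical caller-supplied function that removes sequence `i` and realigns it. A real profile aligner is beyond a short example, so `left_justify` is included only as a toy stand-in.

```python
from itertools import combinations
from typing import Callable, Sequence

GAP = "-"


def pair_cost(x: str, y: str, gap_cost: float = 2.0) -> float:
    """Toy cost between two aligned rows (hypothetical, not Clustal's scoring)."""
    cost = 0.0
    for a, b in zip(x, y):
        if a == GAP and b == GAP:
            continue  # gap against gap is free
        if a == GAP or b == GAP:
            cost += gap_cost
        elif a != b:
            cost += 1.0
    return cost


def wsp(alignment: Sequence[str], weights: Sequence[float]) -> float:
    """Weighted sum of pairs: sum over i < j of w_i * w_j * cost(row_i, row_j)."""
    return sum(weights[i] * weights[j] * pair_cost(alignment[i], alignment[j])
               for i, j in combinations(range(len(alignment)), 2))


Realigner = Callable[[list[str], int], list[str]]


def remove_first_iteration(alignment: list[str], weights: Sequence[float],
                           realign: Realigner, num_iters: int = 3):
    """Realign each sequence in turn; keep a candidate only if its WSP drops."""
    current = list(alignment)
    best = wsp(current, weights)
    for _ in range(num_iters):
        improved = False
        for i in range(len(current)):
            candidate = realign(current, i)
            score = wsp(candidate, weights)
            if score < best:
                current, best = candidate, score
                improved = True
        if not improved:
            break  # a full pass without improvement: stop early
    return current, best


def left_justify(alignment: list[str], i: int) -> list[str]:
    """Toy stand-in for realignment: move all gaps of row i to its right end."""
    residues = alignment[i].replace(GAP, "")
    new = list(alignment)
    new[i] = residues + GAP * (len(alignment[i]) - len(residues))
    return new
```

For example, `remove_first_iteration(["ACG", "-AC"], [1.0, 1.0], left_justify)` starts from a toy WSP of 4 and returns `(["ACG", "AC-"], 2.0)`.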

New Features • Command-line options • ‘-clustering=UPGMA’ • Uses UPGMA to build the guide tree • ‘-iteration=alignment’ • Refines only the final alignment • Faster but less accurate • ‘-iteration=tree’ • Refines at each step of the progressive alignment • More accurate but slower • ‘-numiter’ • Sets the maximum number of iteration cycles (default: 3)
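The options above can also be assembled into a command line from a script. This Python sketch only builds the argument list. The executable name `clustalw2`, the `-infile` option and the `-numiter` spelling come from our reading of the Clustal W 2.0 help text, so check them against `clustalw2 -help` on your own installation. The function name `clustalw2_args` is hypothetical.

```python
import shlex


def clustalw2_args(infile: str, clustering: str = "NJ", iteration: str = "NONE",
                   numiter: int = 3, exe: str = "clustalw2") -> list[str]:
    """Build an argument list for a Clustal W 2.0 run (exe name is an assumption)."""
    clustering = clustering.upper()
    iteration = iteration.upper()
    if clustering not in {"NJ", "UPGMA"}:
        raise ValueError("clustering must be NJ or UPGMA")
    if iteration not in {"NONE", "TREE", "ALIGNMENT"}:
        raise ValueError("iteration must be NONE, TREE or ALIGNMENT")
    if numiter < 1:
        raise ValueError("numiter must be positive")
    args = [exe, f"-infile={infile}", f"-clustering={clustering}"]
    # Iteration settings are only passed when refinement is requested
    if iteration != "NONE":
        args += [f"-iteration={iteration}", f"-numiter={numiter}"]
    return args


if __name__ == "__main__":
    # UPGMA guide tree plus tree-level iteration, printed as a shell command
    print(shlex.join(clustalw2_args("queries.fasta", "UPGMA", "tree")))
```

The resulting list can be passed to `subprocess.run(args, check=True)` to launch the alignment without going through a shell.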

Related Sources • EBI website • European Bioinformatics Institute website • Hosts several alignment programs • Lets us try different programs (e.g. ClustalW, MAFFT, T-Coffee, MUSCLE)

Related Sources • Clustal (web)

Related Sources • Clustal (DOS)

Related Sources • MUSCLE

Related Sources • T-Coffee

Related Sources • MAFFT

Related Sources • Kalign

Thank You!
