Clustal W and Clustal X version 2.0 김영호, 박준호, 최현희 The 9th Protein Folding Winter School
Abstract • The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++ • This will facilitate the further development of the alignment algorithms in the future • This has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems
Contents 1 Introduction 2 Clustal W 2.0 and Clustal X 2.0 3 New Features 4 Related Sources
Introduction • Clustal is one of the oldest and most widely used multiple sequence alignment programs • First distributed by post on floppy disks (late 1980s, written in Microsoft Fortran for MS-DOS) • Clustal 1 ~ Clustal 4 (1988, 1989, IBM-compatible PCs) • Clustal V (1992, VAX/VMS, Unix, Apple Macintosh, IBM-compatible PCs)
Introduction • Clustal W (1994) and Clustal X (1997) • Other powerful tools have appeared since • T-Coffee • MAFFT • MUSCLE • BAliBASE (a benchmark for comparing alignment programs, not an aligner itself) • Yet Clustal W and Clustal X continue to be very widely used (the EBI Clustal site gets millions of multiple alignment jobs per year)
Introduction • Clustal W and Clustal X • W: command-line interface • X: graphical interface • Procedure • Sequence input (choose a chain or domain from each FASTA sequence) • Concatenate all the query sequences into one file • Run • Output (score, alignment)
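The "concatenate all the query sequences into one file" step of the procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `concatenate_fasta` and the file names in the usage note are hypothetical, and the sketch assumes each input is a plain-text FASTA file whose records start with a `>` header line.

```python
from pathlib import Path
from typing import Iterable


def concatenate_fasta(inputs: Iterable[str], output: str) -> int:
    """Copy every record from the input FASTA files into one output file.

    Returns the number of records (header lines) written.
    """
    records = 0
    with open(output, "w", encoding="utf-8") as out:
        for name in inputs:
            text = Path(name).read_text(encoding="utf-8")
            for line in text.splitlines():
                line = line.strip()
                if not line:
                    continue  # drop blank lines between records
                if line.startswith(">"):
                    records += 1  # each '>' header starts a new sequence
                out.write(line + "\n")
    return records
```

For example, `concatenate_fasta(["a.fasta", "b.fasta"], "all.fasta")` would produce a single query file that can then be given to the alignment program.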
Clustal W 2.0 and Clustal X 2.0 • What’s new? • Rewritten in C++ • Easier to maintain the code • Easier to modify or replace some of the alignment algorithms • UPGMA guide trees • Alternative to the NJ (neighbor-joining) guide trees • Speeds up the alignment of large data sets • Iterative alignment facility • Increases alignment accuracy
Clustal W 2.0 and Clustal X 2.0 • Clustal X • Developed using NCBI’s Vibrant toolkit • The Vibrant toolkit is no longer supported • Clustal X 2.0 • Rewritten using the Qt GUI toolkit • Qt provides a native look and feel on Windows, Linux and Mac platforms
New Features • UPGMA • Faster than NJ (clustering 10,000 sequences takes less than a minute, while NJ takes over an hour) • Slightly less accurate than NJ on the BAliBASE benchmark, but on large alignments this is offset by the savings in processing time (2 h vs. 12 h)
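To make the UPGMA idea concrete, here is a minimal Python sketch of average-linkage clustering on a distance matrix. It is not the Clustal implementation: the function name `upgma`, the nested-tuple tree representation, and the naive closest-pair search are all assumptions made for illustration, and this version makes no attempt at the speed the slide reports.

```python
from typing import List, Union

Tree = Union[str, tuple]  # a leaf name, or a pair of subtrees


def upgma(names: List[str], dist: List[List[float]]) -> Tree:
    """Build a rooted guide tree by repeatedly merging the two closest clusters."""
    n = len(names)
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances (the UPGMA rule).
        new_d = {}
        for k in clusters:
            dak = d[(min(a, k), max(a, k))]
            dbk = d[(min(b, k), max(b, k))]
            new_d[(k, next_id)] = (size_a * dak + size_b * dbk) / (size_a + size_b)
        # Forget every distance that involved the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

The resulting nested tuples give the order in which a progressive aligner would join sequences and sub-alignments. Unlike NJ, UPGMA needs no per-step correction of the distances, which is one reason it can be made fast.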
New Features • Iteration • A quick and effective method of refining alignments • ‘Remove first’ iteration scheme • WSP (Weighted Sum of Pairs) score • During each iteration step, each sequence is removed from the alignment in turn and realigned. If the WSP score is reduced, then the resulting alignment is retained.
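The loop described above can be sketched as follows. Several things here are assumptions rather than Clustal's actual behaviour: the WSP score is treated as a cost (lower is better, matching "reduced" on the slide) with made-up mismatch and gap penalties, the loop stops early when a full pass brings no improvement, and `realign` is a hypothetical callback that removes one sequence from the alignment and realigns it.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Alignment = List[str]  # equal-length rows; '-' marks a gap


def wsp_cost(rows: Sequence[str], weights: Sequence[float],
             mismatch: float = 1.0, gap: float = 2.0) -> float:
    """Weighted sum-of-pairs cost: sum over row pairs of w_i * w_j * pair cost."""
    total = 0.0
    for i, j in combinations(range(len(rows)), 2):
        pair = 0.0
        for x, y in zip(rows[i], rows[j]):
            if x == y:
                continue  # identical residues (or gap against gap) cost nothing
            pair += gap if "-" in (x, y) else mismatch
        total += weights[i] * weights[j] * pair
    return total


def remove_first(rows: Alignment, weights: Sequence[float],
                 realign: Callable[[Alignment, int], Alignment],
                 num_iters: int = 3) -> Alignment:
    """Remove and realign each sequence in turn; keep a result only if the cost drops."""
    best = list(rows)
    best_cost = wsp_cost(best, weights)
    for _ in range(num_iters):
        improved = False
        for idx in range(len(best)):
            candidate = realign(best, idx)  # hypothetical realignment step
            cost = wsp_cost(candidate, weights)
            if cost < best_cost:
                best, best_cost, improved = candidate, cost, True
        if not improved:
            break  # assumed early exit: another pass would change nothing
    return best
```

The default of three passes mirrors the default iteration count mentioned in the slides.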
New Features • Command-line options • ‘-clustering=UPGMA’ • Uses UPGMA to build the guide tree • ‘-iteration=alignment’ • Refines the final alignment • Less accurate but faster • ‘-iteration=tree’ • Refines at each step in the progressive alignment • More accurate but slower • ‘-numiters’ • Sets the number of iteration cycles (default: 3)
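A small Python helper can assemble these options into an argument list. This is a sketch under stated assumptions: the executable name `clustalw2`, the `-infile=` option, and the `-numiters=N` value syntax are not given on the slide, and the option spellings are copied from the slide as shown, so check the program's own help output before relying on them.

```python
from typing import List, Optional


def clustal_args(infile: str,
                 clustering: Optional[str] = None,
                 iteration: Optional[str] = None,
                 numiters: Optional[int] = None,
                 executable: str = "clustalw2") -> List[str]:
    """Build (but do not run) a Clustal W command line from the slide's options."""
    # Executable name and -infile are assumptions, not taken from the slide.
    args = [executable, f"-infile={infile}"]
    if clustering is not None:
        args.append(f"-clustering={clustering}")  # e.g. "UPGMA"
    if iteration is not None:
        if iteration not in ("alignment", "tree"):
            raise ValueError("iteration must be 'alignment' or 'tree'")
        args.append(f"-iteration={iteration}")
    if numiters is not None:
        # Spelling follows the slide; the '=N' form is an assumption.
        args.append(f"-numiters={numiters}")
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)` on a machine where the program is installed.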
Related Sources • EBI website • European Bioinformatics Institute website • Supports several alignment programs • We can try various programs (e.g. ClustalW, MAFFT, T-Coffee, MUSCLE)
Related Sources • Clustal (web) • Clustal (DOS) • MUSCLE • T-Coffee • MAFFT • Kalign