Demo: Phylip
http://evolution.genetics.washington.edu/phylip.html
Ziheng Yang
Department of Biology, UCL
Phylip: strengths
• C program
• Freely available and runs on all major platforms
• Lots of people around who know how to use it
• Runs can be automated by using redirection and command lines
• Support for phylip format files by other programs such as clustal, treeview etc.
• Easy and transparent interface: each program does one simple job
• Popular everywhere including China & Russia where cash is in short supply.
Phylip: “weaknesses”
• Easy and simple interface (no mice and menus); renaming files can be tedious.
• Parsimony is not as good as in PAUP*.
• Does not automatically estimate substitution parameters (e.g. a universal ts/tv rate ratio).
• Some models or options are not available.
• Does not read NEXUS standard files.
• Sequence names are limited to 10 characters.
Common features
[Diagram: input files -> Phylip programs -> output files]
Input files: infile, intree, weights, categories, fontfile
Output files: outfile, outtree, plotfile
These are default file names. If the input files do not exist, you will be asked for the file name. If the output files exist, you will be asked to confirm overwriting them.
Major programs
• dnadist: DNA alignment -> distance matrix
• protdist: protein alignment -> distance matrix
• neighbor: distance matrix -> NJ tree
• dnaml: DNA alignment -> ML tree
• dnamlk: DNA alignment -> ML tree under clock
• proml: protein alignment -> ML tree
• dnapars: DNA alignment -> parsimony tree
• protpars: protein alignment -> parsimony tree
• seqboot: DNA alignment -> bootstrap datasets
• consense: summarizes bootstrap results
Sequence file format (interleaved)
 9 1141
chimpanzee ATGACCCCGA CACGCAAAAT TAACCCACTA ATAAAATTAA TTAATCACTC
bonobo     ATGACCCCAA CACGCAAAAT CAACCCACTA ATAAAATTAA TTAATCACTC
human      ATGACCCCAA TACGCAAAAT TAACCCCCTA ATAAAATTAA TTAACCGCTC
gorilla    ATGACCCCTA TACGCAAAAC TAACCCACTA GCAAAACTAA TTAACCACTC
bornean    ATGACCCCAA TACGCAAAAC CAACCCACTA ATAAAATTAA TTAACCACTC
sumatran   ATGACCTCAA CACGTAAAAC CAACCCACTA ATAAAATTAA TCAACCACTC
gibbon     ATGACCCCCC TGCGCAAAAC TAACCCACTA ATAAAACTAA TCAACCACTC
horse      ATGACAAACA TCCGGAAATC TCACCCACTA ATTAAAATCA TCAATCACTC
donkey     ATGACAAACA TCCGAAAATC CCACCCGCTA ATTAAAATCA TCAATCACTC

ATTTATCGAC CTCCCCACCC CATCCAACAT TTCCGCATGA TGGAACTTCG
ATTTATCGAC CTCCCCACCC CATCCAATAT TTCCACATGA TGAAACTTCG
ATTCATCGAC CTCCCCACCC CATCCAACAT CTCCGCATGA TGAAACTTCG
ATTCATTGAC CTCCCTACCC CGTCCAACAT CTCCACATGA TGAAACTTCG
ACTCATCGAC CTCCCCACCC CATCAAACAT CTCTGCATGA TGGAACTTCG
ACTTATCGAC CTCCCCACCC CATCAAACAT CTCCGCATGA TGGAACTTCG
ACTTATCGAC CTTCCAGCCC CATCCAACAT TTCTATATGA TGAAACTTTG
TTTTATTGAC CTACCAGCCC CCTCAAACAT TTCATCATGA TGAAACTTCG
TTTTATCGAC CTGCCAACCC CCTCAAACAT TTCATCATGA TGAAACTTTG
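To make the interleaved layout concrete, here is a minimal Python sketch of how such a file can be read. It is not part of Phylip; the function name `parse_interleaved_phylip` is hypothetical, and the sketch assumes the strict layout (names in the first 10 columns of the first block, later blocks holding sequence data only, one line per sequence per block). The slide shows only the first 100 of the 1141 sites, so the sketch uses its own small example.

```python
def parse_interleaved_phylip(text):
    """Parse interleaved PHYLIP text into a {name: sequence} dict.

    Assumptions (illustrative, not taken from the Phylip source code):
    - line 1 holds the number of sequences and the number of sites;
    - in the first block, columns 1-10 of each line are the name;
    - later blocks contain sequence data only, in the same order;
    - blanks inside the sequence data are ignored.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seq, n_sites = (int(tok) for tok in lines[0].split()[:2])
    body = lines[1:]
    if n_seq <= 0 or len(body) % n_seq != 0:
        raise ValueError("data lines are not a multiple of the sequence count")

    names, chunks = [], []
    for i, line in enumerate(body):
        if i < n_seq:                      # first block: name + data
            names.append(line[:10].strip())
            chunks.append([line[10:]])
        else:                              # later blocks: data only
            chunks[i % n_seq].append(line)

    seqs = {}
    for name, parts in zip(names, chunks):
        seq = "".join("".join(parts).split())   # drop all blanks
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, found {len(seq)}")
        seqs[name] = seq
    return seqs


example = (
    " 3 20\n"
    "chimpanzeeATGACCCCGA\n"
    "bonobo    ATGACCCCAA\n"
    "human     ATGACCCCAA\n"
    "\n"
    "CACGCAAAAT\n"
    "CACGCAAAAT\n"
    "TACGCAAAAT\n"
)
for name, seq in parse_interleaved_phylip(example).items():
    print(name, seq)
```

Note that `chimpanzee` fills all 10 name columns, so its sequence may start in column 11 with no separating space.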
Sequence file format (sequential)
 5 285
human      VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYRLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
goat_cow   VLSAADKSNVKAAWGKVGGNAGAYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGEKVAAALTKAVGHLDDLPGTLSDLSDLHAHKLRVDPVNFKLLSHSLLVTLACHLPNDFTPAVHASLDKFLANVSTVLTSKYRLTAEEKAAVTAFWGKVKVDEVGGEALGRLLVVYPWTQRFFESFGDLSTADAVMNNPKVKAHGKKVLDSFSNGMKHLDDLKGTFAALSELHCDKLHVDPENFKLLGNVLVVVLARNFGKEFTPVLQADFQKVVAGVANALAHRYH
rabbit     VLSPADKTNIKTAWEKIGSHGGEYGAEAVERMFLGFPTTKTYFPHFDFTHGSEQIKAHGKKVSEALTKAVGHLDDLPGALSTLSDLHAHKLRVDPVNFKLLSHCLLVTLANHHPSEFTPAVHASLDKFLANVSTVLTSKYRLSSEEKSAVTALWGKVNVEEVGGEALGRLLVVYPWTQRFFESFGDLSSANAVMNNPKVKAHGKKVLAAFSEGLSHLDNLKGTFAKLSELHCDKLHVDPENFRLLGNVLVIVLSHHFGKEFTPQVQAAYQKVVAGVANALAHKYH
...
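Going the other way, the sequential layout is simple enough to write from a script. The following Python sketch is an illustration only: `format_sequential_phylip` is a hypothetical helper, and it assumes each sequence is written on a single line after a name padded with spaces to 10 columns.

```python
def format_sequential_phylip(seqs, name_width=10):
    """Return sequential PHYLIP text for a {name: sequence} dict.

    Assumptions (illustrative): names are padded with spaces to
    `name_width` columns, and each sequence is written on one line.
    """
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")

    lines = [f" {len(seqs)} {lengths.pop()}"]
    for name, seq in seqs.items():
        if len(name) > name_width:
            raise ValueError(f"name {name!r} is longer than {name_width} characters")
        if "\t" in name:
            raise ValueError(f"name {name!r} contains a tab")
        # ljust pads with real spaces, never tabs
        lines.append(name.ljust(name_width) + seq)
    return "\n".join(lines) + "\n"


print(format_sequential_phylip({"human": "VLSPADKTNV", "goat_cow": "VLSAADKSNV"}), end="")
```

Rejecting over-long names, rather than silently truncating them, avoids two sequences ending up with the same 10-character name.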
Common data-file problems
• Input data files are plain text files. Use type or more on Windows (cat or more on Unix) to inspect them.
• The sequence name must occupy exactly 10 characters; pad shorter names with spaces to separate the name from the sequence. A tab is not the same as one or more spaces. Trailing spaces are invisible but are not the same as nothing, so beware of what your editor does with them. If the name human is on a line of its own, make sure it is followed by at least 5 trailing spaces.
• Line endings (line feeds) are known to cause problems, especially when files are transferred between platforms or over the network. Try opening the file in a program and re-saving it. Sequence data files are liable to be corrupted when sent by email, so send zip or gz files instead.
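The problems listed above can be screened for before running any program. The Python sketch below is a rough heuristic, not a validator: `check_phylip_bytes` is a hypothetical name, it assumes the name lines are the first lines after the header (interleaved, or sequential with one line per sequence), and its name check will also flag names that legitimately contain a blank.

```python
def check_phylip_bytes(raw, name_width=10):
    """Return a list of warnings about common PHYLIP data-file problems.

    Heuristic sketch only. `raw` is the file content as bytes, so that
    line endings and tabs are seen exactly as stored on disk.
    """
    problems = []
    if b"\r\n" in raw:
        problems.append("CRLF (Windows) line endings found")
    elif b"\r" in raw:
        problems.append("bare carriage returns found")
    if b"\t" in raw:
        problems.append("tab characters found; pad names with spaces instead")

    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        problems.append("non-ASCII bytes found; the file may not be plain text")
        text = raw.decode("ascii", errors="replace")

    lines = text.splitlines()
    header = lines[0].split() if lines else []
    if len(header) < 2 or not all(tok.isdigit() for tok in header[:2]):
        problems.append("first line should give the number of sequences and sites")
        return problems

    n_seq = int(header[0])
    data = [ln for ln in lines[1:] if ln.strip()]
    for ln in data[:n_seq]:
        field = ln[:name_width]
        if len(ln) <= name_width:
            problems.append(f"{ln!r}: too short to hold a padded name plus data")
        elif len(field.split()) > 1:
            # e.g. "human ATGA": data has leaked into the name columns
            problems.append(f"{field!r}: name may be padded to fewer than {name_width} columns")
    return problems


good = b" 2 10\nhuman     ATGACCCCAA\nbonobo    ATGACCCCAA\n"
bad = b" 2 10\r\nhuman\tATGACCCCAA\r\nbonobo ATGACCCCAA\r\n"
print(check_phylip_bytes(good))
for warning in check_phylip_bytes(bad):
    print(warning)
```

Reading the file with `open(path, "rb").read()` keeps the bytes untouched; opening it in text mode would hide the line-ending problem.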
Windows annoyances
• Turn on file extensions. In Windows Explorer, go to "Tools - Folder options - View" and untick "Hide extensions for known file types".
• Run jobs from the command line rather than by double-clicking in Windows Explorer.
• Use Task Manager to run your large jobs at lower priority (the equivalent of nice and renice on Unix). If you set the cmd process to low priority, all jobs started from that window will run at low priority. Resist the temptation to run a big job on your friend’s machine, or you will lose her.
A parsimony analysis (dnapars)
Windows   Unix
del       rm
copy      cp
move      mv

set p
set path=d:\soft\phylip\;%PATH%
set p
copy cytb.phy infile
dnapars
move outfile cytb.mp.o
del infile out*
dnapars
move outfile cytb.mp.o
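The copy / run / move cycle above can be scripted, which is one way to use the redirection mentioned under strengths. The Python sketch below is an assumption-laden illustration, not a Phylip tool: `run_phylip_program` is a hypothetical helper, it assumes the program is on the PATH, reads `infile` and writes `outfile` in the working directory, and needs no input beyond a single "Y" to accept the default menu settings.

```python
import shutil
import subprocess
from pathlib import Path


def run_phylip_program(command, data_file, result_file, workdir=".", responses="Y\n"):
    """Stage data as `infile`, run one program, keep `outfile` under a new name.

    Assumptions (illustrative): `command` is a list such as ["dnapars"];
    the program reads `infile` and writes `outfile` in `workdir`; the
    text in `responses` is fed to its menu on standard input.
    """
    workdir = Path(workdir)
    infile = workdir / "infile"
    outfile = workdir / "outfile"

    # Remove stale outputs so the program does not stop to ask about
    # overwriting them (missing_ok needs Python 3.8 or later).
    for name in ("outfile", "outtree"):
        (workdir / name).unlink(missing_ok=True)

    shutil.copyfile(data_file, infile)               # copy cytb.phy infile
    subprocess.run(command, input=responses, text=True, cwd=workdir,
                   check=True, stdout=subprocess.DEVNULL)   # dnapars
    shutil.move(str(outfile), str(result_file))      # move outfile cytb.mp.o
    infile.unlink()                                  # del infile


# Example call, commented out because it needs Phylip installed:
# run_phylip_program(["dnapars"], "cytb.phy", "cytb.mp.o")
```

Any `outtree` written by the program is left in the working directory. If a run needs non-default options, `responses` would have to contain the corresponding menu keystrokes before the final "Y".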
Example files
http://abacus.gene.ucl.ac.uk/ziheng/teach/cytb.txt
http://abacus.gene.ucl.ac.uk/ziheng/teach/abglobin.aa
http://abacus.gene.ucl.ac.uk/ziheng/teach/testMB.nex
http://abacus.gene.ucl.ac.uk/ziheng/teach/adh.nex