GCG vs EMBOSS
Gary Williams
Which is better, GCG or EMBOSS?
• You must decide for yourselves
• You may find other packages that do what you want
• Use the tools that do the job
• This is a comparison of GCG and EMBOSS to help you decide
Interfaces
• Web
  • W2H available for both
  • EMBOSS W2H still has rough edges
  • PISE
  • Others under development
• X-Windows
  • GCG: Seqlab
  • EMBOSS: SPIN (others coming)
• Telnet/xterm/character-based
  • emnu
Command line is very similar
• The UNIX command-line interfaces of GCG and EMBOSS are very similar:
  • Type the name of the program
  • Add any options you want to the command line
  • Press the RETURN key
  • You are prompted for any mandatory information that was not given on the command line
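As an illustration only, the prompting behaviour described above can be sketched in Python. This is not GCG or EMBOSS code: resolve_options and its parameters are hypothetical, and the auto flag assumes that a script-friendly mode (such as EMBOSS -auto) takes defaults silently and fails only when a mandatory value has no default.

```python
# Sketch only: not GCG or EMBOSS source code. resolve_options and its
# parameters are hypothetical names chosen for this illustration.
from typing import Callable, Dict, List


def resolve_options(
    given: Dict[str, str],
    required: List[str],
    defaults: Dict[str, str],
    auto: bool = False,
    prompt: Callable[[str], str] = input,
) -> Dict[str, str]:
    """Fill in every mandatory value that was not given on the command line."""
    values = dict(given)
    for name in required:
        if name in values:
            continue  # supplied on the command line: no prompt needed
        default = defaults.get(name, "")
        if auto:
            # Assumed "-auto" behaviour: take the default silently,
            # and fail if there is no default to take.
            if not default:
                raise SystemExit(f"Error: no value given for -{name}")
            values[name] = default
            continue
        # Keep asking until there is a value; RETURN alone accepts a
        # non-empty default.
        answer = ""
        while not answer:
            answer = prompt(f"{name} [{default}]: ").strip() or default
        values[name] = answer
    return values
```

Passing a fake prompt function, for example prompt=lambda message: "embl:hsfau1", lets the sketch run without a terminal.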
GCG command-line
  % name -other=thing
  This is the name program that reads a sequence and writes out something.
  NAME what sequence ? embl:hsfau1
  Begin (* 1 *) ?
  End (* 2016 *) ?
  Reverse (* No *) ?
  What should I call the output (* hsfau.name *) ?
EMBOSS command-line
  % name -other thing
  Reads in sequences and writes a thing
  Input sequence(s): embl:hsfau1
  Output data [hsfau1.name]:
• Use ‘-ask’ to make EMBOSS programs prompt for the start and end of sequences
Some common options
• Running in scripts (don’t prompt; just fail if the command line is insufficient):
  • GCG: -default
  • EMBOSS: -auto
• Help on options:
  • GCG: -check
  • EMBOSS: -help or -help -verbose
• Boolean options (Yes/No, True/False):
  • GCG: -thing, -nothing
  • EMBOSS: -thing, -nothing, -thing=T, -thing=F, -thing=1, -thing=0, -thing=Y, -thing=N
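The EMBOSS boolean forms listed above could be interpreted roughly as in this Python sketch. It is a hypothetical illustration, not the EMBOSS parser: the function name is invented, and accepting lower-case values such as -thing=t is an assumption.

```python
# Sketch only: an illustration of the EMBOSS boolean forms, not EMBOSS's own
# parser. parse_bool_qualifier is a hypothetical name, and accepting
# lower-case values (-thing=t) is an assumption.
from typing import Optional

TRUE_VALUES = {"T", "Y", "1"}   # -thing=T, -thing=Y, -thing=1
FALSE_VALUES = {"F", "N", "0"}  # -thing=F, -thing=N, -thing=0


def parse_bool_qualifier(arg: str, name: str) -> Optional[bool]:
    """Return True or False if arg sets the boolean 'name', otherwise None."""
    if not arg.startswith("-"):
        return None
    body = arg[1:]
    if body == name:
        return True   # -thing
    if body == "no" + name:
        return False  # -nothing
    prefix = name + "="
    if body.startswith(prefix):
        value = body[len(prefix):].upper()
        if value in TRUE_VALUES:
            return True
        if value in FALSE_VALUES:
            return False
        raise ValueError(f"not a boolean value: {arg}")
    return None  # some other qualifier
```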
Sequence options in EMBOSS
"-sequence" related qualifiers:
  -sbegin     integer   first base used
  -send       integer   last base used (default: sequence length)
  -sreverse   bool      reverse (if DNA)
  -sask       bool      ask for begin/end/reverse
  -slower     bool      make lower case
  -supper     bool      make upper case
  -sformat    string    input sequence format
  -ufo        string    UFO features
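The effect of the -sbegin, -send, -sreverse, -slower and -supper qualifiers on a DNA sequence can be sketched as follows. The sketch assumes 1-based, inclusive positions and reads "reverse" as reverse complement, the usual meaning for DNA; apply_sequence_qualifiers is a hypothetical name and the code is not taken from EMBOSS.

```python
# Sketch only: assumes 1-based inclusive positions and treats "reverse" as
# reverse complement. apply_sequence_qualifiers is a hypothetical name.
from typing import Optional

# Complement table for plain DNA; other IUPAC codes are left unchanged here.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def apply_sequence_qualifiers(
    seq: str,
    sbegin: int = 1,
    send: Optional[int] = None,
    sreverse: bool = False,
    slower: bool = False,
    supper: bool = False,
) -> str:
    if send is None:
        send = len(seq)  # default end = sequence length
    if not 1 <= sbegin <= send <= len(seq):
        raise ValueError("need 1 <= sbegin <= send <= sequence length")
    region = seq[sbegin - 1:send]  # 1-based inclusive -> Python slice
    if sreverse:
        region = region.translate(_COMPLEMENT)[::-1]
    if slower:
        region = region.lower()
    if supper:
        region = region.upper()
    return region
```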
Sequence options in EMBOSS
"-outseq" related qualifiers:
  -osformat   string    output sequence format
  -ossingle   bool      separate file for each entry
EMBOSS general options
  -debug      bool      write debug output to program.dbg
  -auto       bool      turn off prompts
  -stdout     bool      write to standard output
  -filter     bool      read standard input, write standard output
  -options    bool      prompt for required and optional values
  -verbose    bool      report full rather than partial command-line options (with -help)
  -help       bool      report command-line options
Data files
• GCG uses ‘..’ to divide comments from data
• EMBOSS does not use ‘..’
• In general, EMBOSS uses ‘#’ to mark a comment line
• Use ‘embossdata’ to extract and check data files
• As in GCG, data files copied into the current or home directory are used in preference to the originals
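To make the difference between the two comment conventions concrete, here is a Python sketch. It assumes that in a GCG-style file everything up to and including the first line containing '..' is comment, and that in an EMBOSS-style file any line starting with '#' (and any blank line) is skipped; the function names are hypothetical.

```python
# Sketch only: simplified versions of the two comment conventions.
# gcg_data_lines and emboss_data_lines are hypothetical names.
from typing import List


def gcg_data_lines(lines: List[str]) -> List[str]:
    # Assumed GCG style: the header ends at the first line containing '..'.
    for i, line in enumerate(lines):
        if ".." in line:
            return lines[i + 1:]  # data starts after the divider
    return lines  # no divider found: treat every line as data


def emboss_data_lines(lines: List[str]) -> List[str]:
    # Assumed EMBOSS style: skip '#' comment lines and blank lines.
    return [
        line for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
```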
List files (files of file names)
• Similar to GCG list files, but without ‘..’
• Comment lines start with ‘#’
• Can contain the names of other list files (prefixed with ‘@’):
    # This is my list file
    embl:hsfau
    embl:ggg*
    myfile.seq:clone10
    file.seq
    @list2
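Expanding a list file such as the one above might look like the following Python sketch. It is an illustration under assumptions, not EMBOSS code: '#' lines are skipped, '@name' is read recursively, other lines are passed through unchanged (wildcards like embl:ggg* are left for the sequence reader), and the file reader is passed in as a function so the example runs without real files. expand_list_file is a hypothetical name.

```python
# Sketch only: expand_list_file is a hypothetical helper, not EMBOSS code.
from typing import Callable, List, Optional, Set


def expand_list_file(
    name: str,
    read_file: Callable[[str], str],
    _open: Optional[Set[str]] = None,
) -> List[str]:
    # _open holds the chain of list files currently being read, so that a
    # file including itself (directly or indirectly) is reported, not looped.
    open_files = set() if _open is None else _open
    if name in open_files:
        raise ValueError(f"list file {name!r} includes itself")
    open_files.add(name)
    entries: List[str] = []
    for raw in read_file(name).splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            entries.extend(expand_list_file(line[1:], read_file, open_files))
        else:
            entries.append(line)  # a sequence specification, kept as-is
    open_files.discard(name)
    return entries
```

With real files, read_file could be lambda path: pathlib.Path(path).read_text().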
File formats
• GCG
  • Only GCG format, MSF and RSF
• EMBOSS
  • Many formats
  • Formats are recognised automatically
  • A format can also be specified using ‘::’ or ‘-osf’, e.g.: clustal::globin.aln -osf gcg
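A specification such as clustal::globin.aln carries an explicit format in front of the '::'. Splitting it off can be sketched in Python as follows; split_format_prefix is a hypothetical helper, and returning None when there is no prefix stands for "let the format be recognised automatically".

```python
# Sketch only: split_format_prefix is a hypothetical helper, not EMBOSS code.
from typing import Optional, Tuple


def split_format_prefix(spec: str) -> Tuple[Optional[str], str]:
    """Split 'clustal::globin.aln' into ('clustal', 'globin.aln')."""
    fmt, sep, rest = spec.partition("::")
    if not sep:
        return None, spec  # no explicit format: recognise automatically
    return fmt, rest
```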
One file, many sequences
• GCG
  • Only one sequence per GCG file
• EMBOSS
  • One or more sequences per file
  • The default is to write all sequences to one file
  • -ossingle changes this to writing one file per sequence
  • GCG, Staden and plain format files can only hold one sequence per file
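The difference between the default single output file and -ossingle can be sketched in Python without touching the disk. This is a hypothetical illustration: the function name is invented, FASTA text is used only as sample content, and naming each -ossingle file after its entry is an assumption rather than documented EMBOSS behaviour.

```python
# Sketch only: plan_output_files is a hypothetical name; the per-entry file
# naming is an assumption. Nothing is written to disk.
from typing import Dict, List, Tuple


def plan_output_files(
    entries: List[Tuple[str, str]],
    outfile: str,
    ossingle: bool = False,
    extension: str = "fasta",
) -> Dict[str, str]:
    """Map each output file name to the text that would be written to it."""
    if not ossingle:
        # Default: every sequence goes into the one output file.
        return {outfile: "".join(text for _, text in entries)}
    # -ossingle: one file per entry (duplicate entry names would collide).
    return {f"{entry}.{extension}": text for entry, text in entries}
```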
Features
• GCG
  • No concept of feature tables
• EMBOSS
  • Many programs now write out results as GFF
  • Soon, all programs that find things will write their results as GFF
  • GFF will become another sequence format
  • Programs to manipulate and display sets of features are planned (cf. showfeat, coderet, maskfeat, diffseq)
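For readers who have not met GFF, this Python sketch builds one feature line with the GFF version 2 columns (sequence name, source, feature, start, end, score, strand, frame and an optional attributes column). The Feature class is hypothetical, and the output is not claimed to match the GFF that EMBOSS programs write byte for byte.

```python
# Sketch only: a generic GFF version 2 line. The Feature class and its field
# names are hypothetical; this is not EMBOSS's own GFF writer.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    seqname: str                   # sequence the feature lies on
    source: str                    # program that found it
    feature: str                   # feature type
    start: int                     # 1-based, inclusive
    end: int                       # 1-based, inclusive
    score: Optional[float] = None  # written as '.' when absent
    strand: str = "."              # '+', '-' or '.'
    frame: Optional[int] = None    # 0, 1 or 2; written as '.' when absent
    attributes: str = ""           # optional last column, omitted if empty

    def to_gff(self) -> str:
        score = "." if self.score is None else f"{self.score:g}"
        frame = "." if self.frame is None else str(self.frame)
        columns = [self.seqname, self.source, self.feature,
                   str(self.start), str(self.end), score, self.strand, frame]
        if self.attributes:
            columns.append(self.attributes)
        return "\t".join(columns)  # GFF columns are tab-separated
```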
Databases
• EMBOSS is poor at grouping many databases under one name
• E.g. there needs to be a way of referring to ‘embl’ and ‘emblnew’ as one database
• This will be done, but currently a list file containing the following seems best:
    embl:*
    emblnew:*
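Command line wildcards
• GCG:
  • embl:* works without problems
• EMBOSS:
  • embl:* fails: the UNIX shell tries to expand it as a file name and complains that it can’t find any matching files
  • The solution is to quote it: "embl:*"
  • or to escape the asterisk: embl:\*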
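When a script rather than a person builds the command line, the same problem can be avoided in Python either by not using a shell at all or by letting shlex.join (Python 3.8 and later) do the quoting. 'name' stands for any EMBOSS program, as on the earlier slides; the helper names are hypothetical and nothing is executed.

```python
# Sketch only: command_argv and command_line are hypothetical helpers.
# 'name' stands for any EMBOSS program; nothing is run here.
import shlex
from typing import List


def command_argv(program: str, seq_spec: str, *options: str) -> List[str]:
    # Passed to subprocess.run() as a list (shell=False, the default), no shell
    # is involved, so the '*' reaches the program unexpanded.
    return [program, seq_spec, *options]


def command_line(program: str, seq_spec: str, *options: str) -> str:
    # If a single shell command string is unavoidable, shlex.join() quotes
    # each argument that needs it, so embl:* becomes 'embl:*'.
    return shlex.join(command_argv(program, seq_spec, *options))
```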
HELP
• GCG: genman, genhelp
• EMBOSS: tfm
What program does what?
• See David Martin’s list of equivalences: http://www.no.embnet.org/Programs/SAL/EMBOSS/fromGCG.php3
• NB: this doesn’t list EMBOSS programs that have no equivalent in GCG!
What EMBOSS does NOT do
• The major deficiencies in the EMBOSS package are BLAST, FASTA and sequence assembly
• For these, use the publicly available software:
  • BLAST: NCBI, HGMP, many other sites
  • FASTA: HGMP
  • Assembly: Staden package
What EMBOSS does do
• Giving ‘stdout’ as the output file name makes output go to the screen
• Much effort is put into removing arbitrary limits
  • E.g. maximum sequence length: 2 Gb
  • Many programs are limited only by available memory
• Source code is available for inspection and change, and for writing your own programs
• EMBOSS is FREE!
  • GNU General Public Licence
  • Open Source Software
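The ‘stdout’ output-name convention is easy to copy in your own programs. A minimal Python sketch, with open_output as a hypothetical helper that is not part of EMBOSS:

```python
# Sketch only: open_output is a hypothetical helper, not part of EMBOSS.
import sys
from contextlib import contextmanager
from typing import Iterator, TextIO


@contextmanager
def open_output(name: str) -> Iterator[TextIO]:
    if name == "stdout":
        yield sys.stdout  # write to the screen; the stream is left open
    else:
        with open(name, "w") as handle:  # closed when the block ends
            yield handle
```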