
GCG vs EMBOSS




  1. GCG vs EMBOSS
     Gary Williams

  2. Which is better, GCG or EMBOSS?
     • You must decide for yourselves.
     • You may find other packages that do what you want.
     • Use the tools that do the job.
     • This is a comparison of GCG and EMBOSS to help you decide.

  3. Interfaces
     • Web
         • W2H available for both
         • EMBOSS W2H still has rough edges
         • PISE
         • Others under development
     • X-Windows
         • GCG - Seqlab
         • EMBOSS - SPIN (+ others coming)
     • Telnet/xterm/Character-based
         • emnu

  4. Command line is very similar
     • The UNIX command-line interfaces of GCG and EMBOSS are very similar.
         • You type the name of the program.
         • You can add any options you want to the command line.
         • Press the RETURN key.
         • You will be prompted for any mandatory information that was not on the command line.

  5. GCG command-line
         % name -other=thing
         This is the name program that reads a sequence and writes out something.
         NAME what sequence ? embl:hsfau1
         Begin (* 1 *) ?
         End (* 2016 *) ?
         Reverse (* No *) ?
         What should I call the output (* hsfau.name *) ?

  6. EMBOSS command-line
         % name -other thing
         Reads in sequences and writes a thing
         Input sequence(s): embl:hsfau1
         Output data [hsfau1.name]:
     • Use ‘-ask’ to make EMBOSS programs prompt for the start and end of sequences

  7. Some common options
     • Running in scripts: don’t prompt, just fail if the command line is insufficient
         • GCG: -default
         • EMBOSS: -auto
     • Help on options
         • GCG: -check
         • EMBOSS: -help or -help -verbose
     • Boolean options (Yes/No, True/False)
         • GCG: -thing, -nothing
         • EMBOSS: -thing, -nothing, -thing=T, -thing=F, -thing=1, -thing=0, -thing=Y, -thing=N
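The boolean spellings listed for EMBOSS on this slide can be read as eight ways of writing two values. The following is a minimal sketch of that equivalence, not EMBOSS's own parser: the function name `read_boolean` is hypothetical, and it accepts only the exact spellings shown on the slide (case handling and abbreviations are not covered).

```python
# Spellings listed on the slide for an EMBOSS boolean option called "thing".
TRUE_VALUES = {"T", "1", "Y"}
FALSE_VALUES = {"F", "0", "N"}


def read_boolean(arg, name):
    """Return True or False for one command-line word (hypothetical helper)."""
    if arg == f"-{name}":
        return True
    if arg == f"-no{name}":
        return False
    prefix = f"-{name}="
    if arg.startswith(prefix):
        value = arg[len(prefix):]
        if value in TRUE_VALUES:
            return True
        if value in FALSE_VALUES:
            return False
    raise ValueError(f"not a recognised spelling of -{name}: {arg!r}")


spellings = ["-thing", "-nothing", "-thing=T", "-thing=F",
             "-thing=1", "-thing=0", "-thing=Y", "-thing=N"]
decoded = [read_boolean(word, "thing") for word in spellings]
print(decoded)
```

The spellings alternate between the "on" and "off" forms, so `decoded` alternates between `True` and `False`.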

  8. Sequence options in EMBOSS: "-sequence" related qualifiers
         -sbegin    integer  first base used
         -send      integer  last base used, def=seq length
         -sreverse  bool     reverse (if DNA)
         -sask      bool     ask for begin/end/reverse
         -slower    bool     make lower case
         -supper    bool     make upper case
         -sformat   string   input sequence format
         -ufo       string   UFO features

  9. Sequence options in EMBOSS: "-outseq" related qualifiers
         -osformat  string   output sequence format
         -ossingle  bool     separate file for each entry

  10. EMBOSS general options
          -debug    bool  write debug output to program.dbg
          -auto     bool  turn off prompts
          -stdout   bool  write standard output
          -filter   bool  read standard input, write standard output
          -options  bool  prompt for required and optional values
          -verbose  bool  report some/full command line options
          -help     bool  report command line options
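When an EMBOSS program is driven from a script, the qualifiers from slides 7 to 10 are usually assembled into one command line. The sketch below shows one way to do that in Python. The helper `emboss_argv` is hypothetical, and `name` and `-other` are the placeholder program and option used on the earlier slides, not real EMBOSS names. It only builds the argument list and does not run anything.

```python
def emboss_argv(program, qualifiers, auto=True):
    """Build an argument list for an EMBOSS-style program (hypothetical helper).

    True becomes "-name", False becomes "-noname", and any other value
    becomes "-name value". With auto=True, "-auto" is added so the
    program fails instead of prompting when information is missing.
    """
    argv = [program]
    if auto:
        argv.append("-auto")
    for name, value in qualifiers.items():
        if value is True:
            argv.append(f"-{name}")
        elif value is False:
            argv.append(f"-no{name}")
        else:
            argv.extend([f"-{name}", str(value)])
    return argv


argv = emboss_argv(
    "name",  # placeholder program name from the slides
    {
        "sequence": "embl:hsfau1",
        "sbegin": 1,
        "send": 2016,
        "sreverse": True,
        "other": "thing",  # placeholder option from the slides
    },
)
print(" ".join(argv))
# prints: name -auto -sequence embl:hsfau1 -sbegin 1 -send 2016 -sreverse -other thing
```

Passing a list like this to `subprocess.run` (without `shell=True`) would start the program directly, so no shell is involved in interpreting the arguments.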

  11. Data files
      • GCG uses ‘..’ to divide comments from data
      • EMBOSS does not use ‘..’
      • In general, EMBOSS uses ‘#’ to mark a comment line
      • Use ‘embossdata’ to extract and check on data files.
      • As in GCG, data files copied into the current or home directory are used in preference to the originals.

  12. List files (files of file names)
      • Similar to GCG list files, but no ‘..’
      • Comment lines start with ‘#’
      • Can contain the names of other list files:
          # This is my list file
          embl:hsfau
          embl:ggg*
          myfile.seq:clone10
          file.seq
          @list2
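The list-file rules above (one entry per line, ‘#’ comments, ‘@’ for a nested list file) can be illustrated with a short reader. This is a sketch of the rules as stated on the slide, not EMBOSS's implementation. The function `expand_list_file` is hypothetical, the contents of `list2` are invented for the demo, and resolving nested names relative to the parent list file's directory is an assumption.

```python
import tempfile
from pathlib import Path


def expand_list_file(path, _seen=None):
    """Return the entries of a list file, following '@' references."""
    seen = set() if _seen is None else _seen
    path = Path(path)
    resolved = path.resolve()
    if resolved in seen:
        return []  # already read: avoid looping on circular references
    seen.add(resolved)

    entries = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            # Assumption: nested names are relative to this file's directory.
            entries.extend(expand_list_file(path.parent / line[1:], seen))
        else:
            entries.append(line)
    return entries


# Demo with the list file from the slide and an invented nested list.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "list2").write_text("# nested list (invented)\nembl:hsfau1\n")
    (tmp / "list1").write_text(
        "# This is my list file\n"
        "embl:hsfau\n"
        "embl:ggg*\n"
        "myfile.seq:clone10\n"
        "file.seq\n"
        "@list2\n"
    )
    entries = expand_list_file(tmp / "list1")

print(entries)
```

Wildcards such as `embl:ggg*` are returned unchanged. Expanding them against a database is left to EMBOSS.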

  13. File formats
      • GCG
          • only GCG format, MSF and RSF
      • EMBOSS
          • many formats
          • automatically recognised
          • can specify using ‘::’ or ‘-osf’
          • e.g.: clustal::globin.aln -osf gcg
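The ‘::’ prefix in `clustal::globin.aln` separates an explicit format name from the rest of the sequence specification. A minimal sketch of that split is shown below. The function `split_format` is hypothetical and handles only the ‘format::rest’ pattern from the slide, not the full EMBOSS sequence-address syntax.

```python
def split_format(spec):
    """Split 'format::rest' into (format, rest); format is None if absent."""
    fmt, sep, rest = spec.partition("::")
    if sep:
        return fmt, rest
    return None, spec  # no explicit format: leave recognition to EMBOSS


print(split_format("clustal::globin.aln"))  # ('clustal', 'globin.aln')
print(split_format("embl:hsfau1"))          # (None, 'embl:hsfau1')
```

A single ‘:’ (as in `embl:hsfau1`) is left alone, since on these slides it separates a database from an entry rather than a format from a file.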

  14. One file, many sequences
      • GCG
          • Only one sequence per GCG file
      • EMBOSS
          • One or more sequences per file
          • Default is to write all sequences to one file
          • -ossingle will change to writing many files
          • GCG, Staden and plain format files can only hold one sequence per file.

  15. Features
      • GCG
          • No concept of feature tables
      • EMBOSS
          • Many programs now write out results as GFF
          • Soon, all programs that find things will write the results as GFF
          • GFF will become another sequence format
          • Programs to manipulate and display sets of features are planned
          • cf. showfeat, coderet, maskfeat, diffseq

  16. Databases
      • EMBOSS is poor at grouping many databases under one name
          • E.g. there needs to be a way of referring to ‘embl’ and ‘emblnew’ as one database.
          • This will be done, but currently a list file containing the following seems best:
              embl:*
              emblnew:*

  17. Command line wildcards
      • GCG:
          • embl:* - no problem
      • EMBOSS:
          • embl:* - UNIX complains it can’t find the files
          • solution is to quote it:
              "embl:*"
          • or escape the asterisk:
              embl:\*
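When a command line is generated by a script and then handed to a shell, the same quoting problem applies. One option in Python is the standard library's `shlex.quote`, sketched below. Note that it produces single quotes rather than the double quotes or backslash shown on the slide, which protects the asterisk from the shell in the same way. The helper `shell_command` and the program name `name` are placeholders.

```python
import shlex


def shell_command(program, *args):
    """Join a program and its arguments into one shell-safe string."""
    return " ".join(shlex.quote(word) for word in (program, *args))


cmd = shell_command("name", "-auto", "embl:*")
print(cmd)  # name -auto 'embl:*'
```

Arguments without shell-special characters, such as `embl:hsfau1`, are left unquoted.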

  18. HELP
      • GCG:
          • genman, genhelp
      • EMBOSS:
          • tfm

  19. What program does what?
      • See David Martin’s list of equivalences: http://www.no.embnet.org/Programs/SAL/EMBOSS/fromGCG.php3
      • NB this doesn’t list EMBOSS programs with no equivalent in GCG!

  20. What EMBOSS does NOT do
      • The major deficiencies in the EMBOSS package are:
          • BLAST, FASTA, ASSEMBLY
      • You should use the publicly available software:
          • Blast - NCBI, HGMP, many other sites
          • Fasta - HGMP
          • Assembly - Staden package

  21. What EMBOSS does do
      • Giving ‘stdout’ as the output file name makes output go to the screen.
      • Much effort is put into removing arbitrary limits.
          • E.g. max. sequence length: 2Gb
          • Many programs limited only by available memory
      • Source code available for inspection, change and writing your own programs
      • EMBOSS is FREE!
          • GNU General Public Licence
          • Open Source Software

  22. THE END
