
Unix Text Editing and Simple Programming




    1. Unix Text Editing and Simple Programming

    2. Text Files. Most bioinformatics work involves messing around with text files. DNA and protein sequences, genotypes, databases, and the results of similarity searches and multiple alignments are all stored on the computer as ordinary ASCII text files. To read, write, and edit these text files you must get familiar with a text editor program.

    3. What is a Text Editor? A text editor is like a word processor on a personal computer, except that it does not apply formatting styles (bold, italics, different fonts, etc.). Unix has line editors (which view and edit one line at a time) and full-screen editors. A screen editor loads the entire document into a buffer, which lets you jump to any point in the document.

    4. Unix Text Editors. There are many different text editors available for Unix computers. You can have multiple editors on one system:
        vi - old, reliable, present on every Unix machine, completely and utterly user hostile
        jed - fairly simple, identical to eve on the old VMS system
        pico - extremely simple, perhaps too simple
        emacs - a compromise between power features and ease of use

    5. Emacs. The full name of the Emacs program is "GNU Emacs, the Extensible, Customizable, Self-Documenting, Real-time Display Editor." Emacs is free software produced by the Free Software Foundation (Boston, MA) and distributed under the GNU General Public License. It is open source software, like Linux. GNU is a recursive acronym for "GNU's Not Unix".

    6. Starting emacs. To start Emacs, at the > command prompt, just type:
            emacs
        To use Emacs to edit a file, type:
            emacs filename
        (where filename is the name of your file). When Emacs is launched, it opens either a blank text window or a window containing the text of an existing file.

    8. The Emacs Display. The display in Emacs is divided into three basic areas. The top area is called the text window. The text window takes up most of the screen, and is where the document being edited appears. Below the text window, there is a single mode line (in reverse type). The mode line gives information about the document and about the Emacs session. The bottom line of the Emacs display is called the minibuffer. The minibuffer holds space for commands that you give to Emacs, and displays status information.

    9. Emacs Commands. Emacs uses Control and Escape characters to distinguish editor commands from text to be inserted in the buffer.
        Ctrl-x means to hold down the control key and type the letter x. (You don't need to capitalize the x, or any other control character.)
        [Esc] x means to press the escape key down, release it, and then type x.

    10. Save & Exit. To save a file as you are working on it, type:
            Ctrl-x Ctrl-s
        To exit emacs and return to the Unix shell, type:
            Ctrl-x Ctrl-c
        If you have made any changes to the file, Emacs will ask you if you want to save:
            Save file /u/browns02/nrdc.msf? (y,n,!,.,q,C-r or C-h)
        Type "y" to save your changes and exit. If you type "n", then it will ask again:
            Modified buffers exist; exit anyway? (yes or no)
        If you answer "no", it will return you to the file. You must answer "yes" to exit without saving changes.

    11. Moving Around. The arrow keys on the keyboard work for moving around one line or one character at a time. Some navigation commands:
        Move to the top of the file: [Esc] <
        Move to the end of the file: [Esc] >
        Next screen (page down): Ctrl-v
        Previous screen (page up): [Esc] v
        Start of the current line: Ctrl-a
        End of the current line: Ctrl-e
        Forward one word: [Esc] f
        Backward one word: [Esc] b

    12. Type Text. Once you move the cursor to the location in the file where you want to do some editing, you can just start typing, just like in an ordinary word processor. The delete key should work to remove characters, and inserted text will push existing text over.

    13. Cut, Copy, and Paste. You can delete or move blocks of text. First move the cursor to the beginning (or end) of the block of text. Then set a mark with:
            Ctrl-spacebar
        Now move to the other end of the block of text and delete or copy the block:
            Delete (cut): Ctrl-w
            Copy: [Esc] w
        To paste a cut or copied block, move to the new location and insert with:
            Ctrl-y

    14. Getting Help in Emacs. Emacs has a built-in help feature. Just type:
            Ctrl-h
        To get help with a specific command, type:
            Ctrl-h k keys
        (where "keys" are the command keys that you type for that command). Emacs has a built-in tutorial:
            Ctrl-h t
        This will be an exercise for this week's computer lab.

    15. Emacs Help on the Web
        Getting Started with Emacs, by Johnathon Poole, University College London, Dept. of Computer Science
            http://www.cs.ucl.ac.uk/teaching/supportdocs/emacs.htm
        LinuxCentral: Emacs Beginner's HOWTO
            http://linuxcentral.com/linux/LDP/HOWTO/Emacs-Beginner-HOWTO.html
        The official GNU Emacs Manual
            http://www.gnu.org/manual/emacs/html_chapter/emacs_toc.html
        Getting Started With the Emacs Screen Editor
            http://www.leeds.ac.uk/iss/documentation/beg/beg6.pdf

    16. Simple Programs. You can use the Unix shell to run simple programs right from the command line.
        Use a for loop to run a program on a bunch of files
        grep lets you look for certain words in the output files
        An if statement allows the program to make decisions:
            Repeat if true
            Sort if e-value is greater than 0.01

    17. for loop. We will use the "for" command in the bash shell for an exercise today. [There are lots of other ways to do this, but I happen to know it this way.]
            > for i in *.fasta; do water -asequence="$i" -bsequence=testseq.fas -auto; done
        This runs a pairwise Smith-Waterman alignment against testseq.fas for every file in the current directory that has a filename ending in .fasta (remember - logical filenames are important).
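Here is a minimal sketch of the same for-loop pattern that runs without EMBOSS installed. The function count_residue_lines is a hypothetical stand-in for a real program such as water, and the three FASTA files are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three small made-up FASTA files to loop over.
for name in geneA geneB geneC; do
    printf '>%s\nACGTACGT\n' "$name" > "$name.fasta"
done

# Hypothetical stand-in for the real program:
# it prints the number of non-header lines in the file.
count_residue_lines() {
    grep -vc '^>' "$1"
}

# The glob *.fasta expands to every matching filename in the directory.
for i in *.fasta; do
    # ${i%.fasta} strips the extension, so geneA.fasta gives geneA.out
    count_residue_lines "$i" > "${i%.fasta}.out"
    echo "processed $i"
done
```

Naming each output after its input file keeps results easy to match up later, which is one reason logical filenames matter.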

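    18. grep. grep is a tool that finds a keyword in a file. We can use it to quickly find sequences that have no matches in a database similarity search:
            > grep -F -l 'No sequences found' *.fasta
        The -F option treats the pattern as a fixed string rather than a regular expression, and -l prints only the names of the files that contain it. UNIX can be short and sweet when you know what you are doing!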
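To see what grep -F -l returns, here is a small sketch run on made-up result files. The file names and their contents are assumptions for illustration only; real search output will look different.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Three made-up result files: two searches found nothing, one found a hit.
printf 'Query: seq1\nNo sequences found\n' > "$workdir/seq1.out"
printf 'Query: seq2\nThe best scores are:\nhitA 1e-30\n' > "$workdir/seq2.out"
printf 'Query: seq3\nNo sequences found\n' > "$workdir/seq3.out"

# -F: fixed-string match; -l: print only the names of files that match.
# The subshell keeps the cd from affecting the rest of the session.
( cd "$workdir" && grep -F -l 'No sequences found' *.out ) > "$workdir/nohit.list"

cat "$workdir/nohit.list"
```

The list of file names can be saved, as here, and then used as the input to another loop.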

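    19. if. if is a tool that makes decisions. We can use it to sort results from some type of search (similarity, pattern match, etc.). The shell's own tests compare integers only, so a decimal e-value is compared here with awk:
            if awk -v e="$eval" 'BEGIN { exit !(e+0 < 0.05) }'; then echo "$seqname" >> goodmatch.txt; else echo "$seqname" >> nomatch.txt; fi
        Using >> appends each name to the file instead of overwriting it. UNIX can be short and sweet when you know what you are doing!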
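Here is a sketch of the if decision applied to several results in a row. The sequence names and e-values are made up, and the helper function is_below is a hypothetical name. It hands the comparison to awk, since bash's own tests only handle integers.

```shell
#!/usr/bin/env bash
outdir=$(mktemp -d)

# Succeeds (exit status 0) when $1 is numerically less than $2.
# Adding 0 forces awk to treat both values as numbers.
is_below() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a+0 < b+0) }'
}

# Made-up sequence names and e-values, one pair per line.
while read -r seqname evalue; do
    if is_below "$evalue" 0.05; then
        echo "$seqname" >> "$outdir/goodmatch.txt"
    else
        echo "$seqname" >> "$outdir/nomatch.txt"
    fi
done <<'EOF'
seqA 1e-20
seqB 0.3
seqC 0.004
EOF

cat "$outdir/goodmatch.txt"
```

The if statement itself only looks at the exit status of the command it runs, so any command that succeeds or fails can be used as the test.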

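    20. until, while. until and while run a loop and make an 'if' decision on every pass: until repeats as long as its test is false, and while repeats as long as its test is true.
            x=1; until [ "$x" -gt 4 ]; do einverted "my$x.seq"; x=$((x+1)); done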
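The two loops can be compared side by side in this sketch. Instead of running a real program, each pass just records the file name it would have processed; the my1.seq to my4.seq names follow the slide, and the variable names are assumptions.

```shell
#!/usr/bin/env bash
# until: keep looping as long as the test is FALSE.
visited_until=""
x=1
until [ "$x" -gt 4 ]; do
    visited_until="$visited_until my$x.seq"
    x=$((x + 1))
done
echo "until visited:$visited_until"

# while: keep looping as long as the test is TRUE.
# This is the same loop with the test reversed.
visited_while=""
y=1
while [ "$y" -le 4 ]; do
    visited_while="$visited_while my$y.seq"
    y=$((y + 1))
done
echo "while visited:$visited_while"
```

Both loops visit the same four files, so which one to use is mostly a matter of which test reads more naturally.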

    21. Use a Script. A script is a set of Unix commands saved as a text file so it can be used over and over again. You can run any EMBOSS program in a script. This is especially good for connecting up several programs in a pipeline:
        use the results of one program as input for the next
        sort the outputs and create a summary file
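A pipeline of this kind can be sketched with standard Unix tools alone. The scores file and its two-column layout are made up for the example; the point is only how the output of one command becomes the input of the next.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Made-up results: a comment line, then "name score" pairs.
cat > "$workdir/scores.txt" <<'EOF'
# name score
seqA 120
seqB 45
seqC 300
EOF

# Step 1: drop the comment line.
# Step 2: sort by the second column, numerically, highest first.
# Step 3: keep the top two lines as the summary.
grep -v '^#' "$workdir/scores.txt" \
    | sort -k2,2nr \
    | head -n 2 > "$workdir/top.txt"

cat "$workdir/top.txt"
```

Each step can be tested on its own at the command line before it is joined to the others with the | symbol.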

    22. Make a Script. The "for" loop that I just showed was done on the command line. Once it is run, it is gone. But you can put the same lines into a text file, save the file, and then run it as a script whenever you need to do a complex operation on a bunch of files. You must change the file permissions to make the text file eXecutable:
            chmod u+x yourfilename.txt
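The whole cycle (write the file, make it executable, run it) looks roughly like this. The script name count_seqs.sh and the demo FASTA file are hypothetical; in practice you would type the script in a text editor rather than create it with cat.

```shell
#!/usr/bin/env bash
scriptdir=$(mktemp -d)

# Save a few commands in a text file.
cat > "$scriptdir/count_seqs.sh" <<'EOF'
#!/bin/sh
# Print each file name given on the command line,
# followed by the number of FASTA header lines it contains.
for f in "$@"; do
    printf '%s %s\n' "$f" "$(grep -c '^>' "$f")"
done
EOF

# Give the owner (u) permission to execute (x) the file.
chmod u+x "$scriptdir/count_seqs.sh"

# A made-up input file with two sequences.
printf '>s1\nACGT\n>s2\nGGCC\n' > "$scriptdir/demo.fasta"

# Run the script by path, like any other program.
result=$(cd "$scriptdir" && ./count_seqs.sh demo.fasta)
echo "$result"
```

The first line of the script (the one starting with #!) tells Unix which shell should interpret the rest of the file.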

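    23. #!/usr/local/bin/tcsh
        # Run a fastx search on every .seq file and record the files with no hit.
        foreach i (*.seq)
          fastx $i -exp=0.05 in2=pir:* -default
          # grep -q prints nothing; it sets $status to 0 when the text is found
          grep -q "No sequences found" $i.fast
          if (! $status) then
            echo "No hit for $i.seq"
            echo $i.seq >> fastx.nohit
          else
            grep -q "The best scores are:" $id.fastx
            if (! $status) then
              # line number where the table of best scores begins
              set line = `grep -n "The best scores are:" $id.fastx|cut -f1 -d:`
            endif
          endif
        end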

    24. Next Step: A Database. Once you have scripted a few hundred FASTA or BLAST searches, you will have a bunch of results files (text files). With grep and a few other scripting tricks, you can sort the data and summarize it in new text files (parsing), leading perhaps to an Excel spreadsheet. A much more elegant (and scalable) solution would be to create a database, but that goes beyond what I can teach in this course.
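As a rough sketch of the parsing step, the loop below turns a directory of result files into one tab-separated summary that a spreadsheet can open. The result-file layout is made up for the example (a "The best scores are:" line followed by one hit per line), so the grep and cut steps would need adjusting for real FASTA or BLAST output.

```shell
#!/usr/bin/env bash
resdir=$(mktemp -d)

# Two made-up result files: one with no hits, one with two hits.
printf 'No sequences found\n' > "$resdir/seq1.out"
printf 'The best scores are:\nhitA 250\nhitB 90\n' > "$resdir/seq2.out"

summary="$resdir/summary.tsv"
printf 'file\thits\ttop_hit\n' > "$summary"

for f in "$resdir"/*.out; do
    name=$(basename "$f")
    if grep -q -F 'No sequences found' "$f"; then
        printf '%s\t0\tNA\n' "$name" >> "$summary"
    else
        # In this made-up layout, every line except the header is a hit.
        hits=$(grep -v -F 'The best scores are:' "$f" | wc -l | tr -d ' ')
        top=$(grep -v -F 'The best scores are:' "$f" | head -n 1 | cut -d ' ' -f 1)
        printf '%s\t%s\t%s\n' "$name" "$hits" "$top" >> "$summary"
    fi
done

cat "$summary"
```

Tab-separated text is a convenient target because spreadsheets and most scripting tools read it without any extra work.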

    25. Scripting Languages. A number of programming languages have been developed to expand the power of Unix scripts.
        Perl is particularly favored by biologists:
            object-oriented programming
            regular expressions
            see the book "Beginning Perl for Bioinformatics" by James Tisdall
        Other biologists favor Python or Java.

    26. Shell or Perl. Shell scripts use the built-in tools of the Unix operating system (grep, cut, sort) and can be very fast. A full programming language such as Perl offers much more power and flexibility. Rule of thumb: use a shell script to save a set of operations that you are comfortable using on the command line (a simple loop, a 2 or 3 step pipeline), and use Perl for more complex work.

    27. Interact with Web Pages. These scripting languages also make it easy to automate the use of a Web page:
        Submit a bunch of sequences
        Choose options, add information to various fields
        Extract the results from the HTML files that are returned

    28. BioPerl. Why re-invent the wheel? Lots of common bioinformatics tasks have already been programmed as "modules" in Perl: grab sequences from GenBank, extract e-values and annotation from BLAST results, etc. Download them from www.bioperl.org

    29. Becoming a Unix Power User
        Learn more Unix commands: http://ss64.com/bash/
        Use the shell to execute simple programs
        Write scripts
        Download and install the latest bioinformatics software
        Drive your system manager crazy... or get your own Unix machine (Linux on an Intel machine or Mac OS X)

    30. Resources
        Notes for Lincoln Stein's course in "Genome Informatics"
            http://stein.cshl.org/genome_informatics/index.html
        BioPerl.org
            http://bio.perl.org/
        PERL for biologists (Kurt Stüber)
            http://caliban.mpiz-koeln.mpg.de/~stueber/perl/
        "Why Biologists Want to Program Computers" by James Tisdall
            http://www.oreilly.com/news/perlbio_1001.html

    31. Resources for Bio-Computing
