MCB3895-004 Lecture #3, Sept 2/14
Intro to UNIX terminal
Introduction to UNIX
• Nearly all bioinformatics software runs on UNIX and its derivatives (e.g., LINUX and Mac OS)
• Very little bioinformatics software runs on Windows
• Bioinformatics is very strongly tied to the open-source software movement
• Lots of help available on-line
• Most programs are free
• Windows is not very open-source friendly
Windows users:
• Option 1: Do all of your work connected to the Biotechnology Cluster server. Download sshclient (ftp://ftp.uconn.edu/restricted/ssh/)
• Option 2: Install LINUX to run in parallel with Windows (e.g., Biolinux, http://nebc.nerc.ac.uk/tools/bio-linux)
Terminal
• The terminal is the primary way to do computational biology
• Mac: Applications/Utilities/Terminal
• Linux: Applications/Accessories/Terminal
• Windows: sshclient
Assignment
• A handy resource to learn the basics of UNIX is the “Unix and Perl Primer for Biologists”, which can be found here: http://korflab.ucdavis.edu/Unix_and_Perl/unix_and_perl_v3.1.1.pdf
• The commands it demonstrates mainly involve creating, removing and moving around files and directories
• Once you learn them, these commands will take you far beyond what you can do with a more familiar GUI like Mac Finder or Windows Explorer
Worthy of special comment
• Directory trees
• Using tab to autocomplete
• Wildcard characters like * to perform the same operation on multiple files (this is insanely useful once you get the hang of it!)
• Using nano as a very basic text editor. Never, ever, ever use Word for this!
• Use underscores “_”, not spaces, in your filenames
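A small sketch of the wildcard idea, for trying out in the terminal. The file and directory names are made up for illustration, and everything happens in a scratch directory created by mktemp so no real files are touched.

```shell
# Work in a throwaway directory (hypothetical file names throughout).
workdir=$(mktemp -d)
cd "$workdir"
touch sample_1.fasta sample_2.fasta sample_3.fasta notes.txt

# "*" matches any run of characters, so this lists only the FASTA files.
ls *.fasta

# The same wildcard works with other commands: move every FASTA file at once.
mkdir sequences
mv *.fasta sequences/

# Count what ended up in the new directory.
n_moved=$(ls sequences | wc -l | tr -d ' ')
echo "files moved: $n_moved"
```

Note that every file name here uses underscores, so no quoting is needed when typing it.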
Directory trees
• All computer files are organized hierarchically
• Each folder has an address, e.g., /Users/Jonathan/Laptop_backup/Desktop/e-Books
A quick reference to where you are in UNIX
• “/” - root
• “~” - your user home directory
• “.” - “here”, the directory you are in now
• “../” - one level up in the directory tree
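The shortcuts above can be tried with cd and pwd. This is only a sketch: the directory names "project" and "data" are invented for the example and live in a scratch directory.

```shell
# Build a tiny directory tree to walk around in (hypothetical names).
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"

cd "$workdir/project/data"
pwd                          # prints the full address of the current directory

cd ..                        # one level up: now in "project"
here=$(basename "$(pwd)")
echo "now in: $here"

cd ./data                    # "." is the current directory, so this enters project/data
cd ../..                     # two levels up: back in the scratch directory

cd ~                         # your user home directory
cd /                         # the root of the whole tree
root=$(pwd)
echo "now in: $root"
```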
More UNIX tricks
• “>” (greater than) redirects the output of a command into a file (the file is created if it does not exist and overwritten if it does)
  e.g., ls * > list
• A list of the files in this directory is now stored in the file “list”
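A short sketch of redirection with “>”, using invented file names in a scratch directory. It uses *.fasta rather than * so that the file “list” never ends up listing itself on a second run.

```shell
# Scratch directory with two hypothetical FASTA files.
workdir=$(mktemp -d)
cd "$workdir"
touch genome_a.fasta genome_b.fasta

# Nothing is printed to the screen: the listing goes into the file "list".
ls *.fasta > list
cat list

# Running the command again overwrites "list" rather than adding to it.
ls *.fasta > list
n_lines=$(wc -l < list | tr -d ' ')
echo "lines in list: $n_lines"
```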
More UNIX tricks
• cat joins multiple files together
  e.g., cat file1 file2 > file3
• file3 contains file1 and file2 joined together
• file1 and file2 still exist as they were
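To see this in action, the sketch below makes two tiny files with invented contents and joins them. Only the cat line comes from the slide; the rest is setup.

```shell
# Two small hypothetical sequence files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf '>seq1\nATGC\n' > file1
printf '>seq2\nGGCC\n' > file2

# Join them: file3 holds the contents of file1 followed by file2.
cat file1 file2 > file3
cat file3

# file3 has the lines of both inputs; file1 and file2 are untouched.
n_lines=$(wc -l < file3 | tr -d ' ')
echo "lines in file3: $n_lines"
```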
More UNIX tricks
• grep extracts all lines containing a particular pattern from a file
  e.g., grep "NP_" file1
• Prints every line that contains the pattern “NP_” to the screen
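A minimal sketch of grep on a made-up protein file. The accession numbers and descriptions are invented purely so that some header lines contain “NP_” and one does not.

```shell
# A hypothetical file with three sequence headers, two of which contain "NP_".
workdir=$(mktemp -d)
cd "$workdir"
printf '>NP_000001.1 hypothetical protein A\nMKT\n' > file1
printf '>WP_000002.1 hypothetical protein B\nMSL\n' >> file1
printf '>NP_000003.1 hypothetical protein C\nMAA\n' >> file1

# Print only the lines that contain the pattern.
grep "NP_" file1

# Keep the matching lines in a variable for later use.
matches=$(grep "NP_" file1)
```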
More UNIX tricks
• wc counts the newlines, words and bytes in a file
  e.g., wc file1
• Prints an output like this:
    10602     18921   752002   file1
    newlines  words   bytes    filename
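A sketch of wc on a tiny file with invented contents, small enough to check the three counts by hand. The -l, -w and -c options select just the newline, word and byte counts.

```shell
# A hypothetical two-line file: "one two" and "three".
workdir=$(mktemp -d)
cd "$workdir"
printf 'one two\nthree\n' > file1

# All three counts at once, followed by the file name.
wc file1

# Each count on its own (reading from "<" leaves out the file name).
lines=$(wc -l < file1 | tr -d ' ')
words=$(wc -w < file1 | tr -d ' ')
bytes=$(wc -c < file1 | tr -d ' ')
echo "newlines=$lines words=$words bytes=$bytes"
```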
More UNIX tricks
• “|” (pipe) directs the output of one command into another
  e.g., grep "NP_" file1 | wc
• Sends the output of the grep command into wc. Because grep extracts lines from a file, this can be used to count the number of lines matching the grep expression
  e.g., grep "NP_" file1 | less
• Displays the grep result as a list you can scroll through
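The counting trick can be sketched as follows, again with an invented file. Adding -l to wc is an optional refinement that prints only the line count.

```shell
# Hypothetical file: two header lines contain "NP_", one does not.
workdir=$(mktemp -d)
cd "$workdir"
printf '>NP_000001.1 protein A\nMKT\n' > file1
printf '>WP_000002.1 protein B\nMSL\n' >> file1
printf '>NP_000003.1 protein C\nMAA\n' >> file1

# grep finds the matching lines; the pipe hands them to wc, which counts them.
grep "NP_" file1 | wc -l

# Store the count for later use (tr strips padding that some wc versions add).
count=$(grep "NP_" file1 | wc -l | tr -d ' ')
echo "matching lines: $count"
```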
More UNIX tricks
• gzip/gunzip: single file compression
  e.g., gunzip file.txt.gz
• Decompresses file.txt
  e.g., gzip file.txt
• Creates compressed file file.txt.gz, removes file.txt
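A round trip through gzip and gunzip, sketched on a throwaway file with invented contents, shows that each command replaces its input file with its output file.

```shell
# One small hypothetical text file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'ATGC\n' > file.txt

# Compress: file.txt.gz appears and file.txt is removed.
gzip file.txt
after_gzip=$(ls)
echo "after gzip: $after_gzip"

# Decompress: file.txt comes back and file.txt.gz is removed.
gunzip file.txt.gz
content=$(cat file.txt)
echo "after gunzip: $(ls)"
```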
More UNIX tricks
• tar: file archive management
  e.g., tar -cf all.tar *
• Creates tar archive all.tar containing all files in that directory, individual files unchanged
  e.g., tar -xf all.tar
• Extracts all files from tar archive all.tar to the current directory, all.tar not deleted
• tar is very commonly used before gzip to make “tarballs”
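A sketch of making and unpacking a tarball by running tar and then gzip, as described above. The directory and file names are invented, and the archive is unpacked in a separate “unpacked” directory so the originals are easy to tell apart from the copies.

```shell
# A hypothetical directory with two small files.
workdir=$(mktemp -d)
cd "$workdir"
mkdir genomes
printf 'ATGC\n' > genomes/a.fasta
printf 'GGCC\n' > genomes/b.fasta

# Step 1: bundle the directory into one archive (originals unchanged).
tar -cf genomes.tar genomes
# Step 2: compress the archive; genomes.tar becomes genomes.tar.gz.
gzip genomes.tar

# Unpack a copy elsewhere by reversing the two steps.
mkdir unpacked
cp genomes.tar.gz unpacked/
cd unpacked
gunzip genomes.tar.gz        # back to genomes.tar
tar -xf genomes.tar          # extracts genomes/, genomes.tar is kept
restored=$(cat genomes/a.fasta)
echo "restored contents: $restored"
```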
Connecting to the Bioinformatics facility server
• UNIX command ssh
  e.g., ssh -l jlklassen bbcsrv3.biotech.uconn.edu
• Will ask for a password
• The first time you connect, you will be asked to accept the server's RSA key (security feature)
• Your terminal now controls the bioinformatics facility server, not your own machine
• You can have multiple terminals open at the same time
Transferring files to the Bioinformatics facility server
• Method 1: Filezilla (https://filezilla-project.org/)
• Nice GUI
• Works on all platforms
• Install the client, not the server
Transferring files to the Bioinformatics facility server
• Method 2: UNIX command scp
  e.g., scp all.tar jlklassen@bbcsrv3.biotech.uconn.edu:all.tar
• Copies all.tar from my computer to the biotech server (the source comes first, the destination second)
  e.g., scp -r jlklassen@bbcsrv3.biotech.uconn.edu:dir/ .
• Copies the directory “dir” from the biotech server to the current working directory
• “-r” flag indicates “recursive”, needed for directories
Text editors
• Using nano works, but can be cumbersome for complex tasks
• Word is always bad! It adds layers of formatting you don’t see.
• Mac and LINUX have TextEdit and Gedit as default text editors, both work well
• Windows: Notepad and Wordpad are insufficient. I suggest downloading Gedit for Windows (https://wiki.gnome.org/Apps/Gedit)
• Other options exist for all platforms
Assignment
• See instructions posted on the website at http://wp.mcb3895.mcb.uconn.edu
• Part 1: work through Korf manual sections U1-U27 (some commands require external files, ignore these but understand what they do)
• Part 2: log on to the Biotech server, download a genome from NCBI and answer the questions given
• The assignment is due at the start of class 1 week from today
Command line power!
• The simplest way to download these data is to use the terminal command wget
  $ wget -r --no-directories --retr-symlinks -P Acaricomes_phytoseiuli/ ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Acaricomes_phytoseiuli/latest_assembly_versions/GCF_000376245.1_ASM37624v1/
• Deconstructed:
• -r: “recursive”, i.e., download everything in this directory
• --no-directories: does not create the entire ftp directory structure
• --retr-symlinks: NCBI uses a fancy file structure using something called “symbolic links”, where a file points to another file somewhere else. “--retr-symlinks” gets the actual files, not just the links
• -P Acaricomes_phytoseiuli/: where to put the output