NGS Bioinformatics Workshop 1.2 Tutorial – Sequence Formats, Databases and Visualization Tools
March 15th, 2012
BioSci room B9242
Facilitator: Richard Bruskiewich, Adjunct Professor, MBB
Learning Objectives
• Linux revisited
• Quick dive into the Open-Bio pool (BioPython)
• A first look at NGS data:
  • NCBI Short Read Archive
  • Processing NGS: FASTX tool kit et al.
  • Visualization: IGV
Files and Permissions
• Linux user permissions: owner, group, or others
  • Owner/user is (usually) the person who created the file and therefore "owns" the file / directory
  • Group is a team of people associated together, e.g. for a group project / team work
  • Others are all other users on the server
• Each file / directory can have its permissions set to (r)ead, (w)rite, or e(x)ecute, separately for owner, group and others
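To see how the three classes fit together, here is a small Python sketch (Python is an assumption, chosen because the workshop moves on to BioPython) that combines read/write/execute letters for owner, group and others into one mode value with the standard `stat` module. `build_mode` is a hypothetical helper, not a system API.

```python
import stat

# Permission bits for each class, from the standard-library stat module.
OWNER = {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR}
GROUP = {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP}
OTHERS = {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH}


def build_mode(owner: str, group: str, others: str, is_dir: bool = False) -> int:
    """Combine per-class permission letters (e.g. 'rw') into one mode integer."""
    mode = stat.S_IFDIR if is_dir else stat.S_IFREG
    for letters, table in ((owner, OWNER), (group, GROUP), (others, OTHERS)):
        for letter in letters:
            mode |= table[letter]
    return mode


mode = build_mode("rw", "r", "")
print(oct(stat.S_IMODE(mode)))  # 0o640 (permission bits only)
print(stat.filemode(mode))      # -rw-r----- (what ls -l would show)
```

Each class contributes one octal digit (r = 4, w = 2, x = 1), which is why permissions are often written as three digits such as 640 or 755.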
chmod: change file permissions
Do a long listing (ls -l), e.g.:
• dr-x-wxrw-
Separated into four sections:
• (d)(r-x)(-wx)(rw-) = directory or file (-), user (owner), group, others
Examples:
chmod o+x foo.txt        grant 'execute' permission to 'others' on foo.txt
chmod g-rw foo.txt       remove 'read' and 'write' permission from group
chmod ugo+rwx foo.txt    grant all rights to everyone
To change the user/group ('owner') of a file, use chown rather than chmod (changing ownership usually requires root privileges):
sudo chown ubuntu:ubuntu foo.txt
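The symbolic chmod examples can be mimicked on a plain integer mode. This is a simplified, hypothetical Python sketch: `apply_symbolic` only understands a single clause of the form `[ugo]+[+-=][rwx]*` (real chmod also accepts `a`, comma-separated clauses and special bits), and `parse_listing` only handles plain r/w/x letters, not the s/t special bits.

```python
import re

CLASS_SHIFT = {"u": 6, "g": 3, "o": 0}  # bit offset of each class's rwx triple
PERM_BITS = {"r": 4, "w": 2, "x": 1}    # value of each letter within a triple


def apply_symbolic(mode: int, spec: str) -> int:
    """Apply one symbolic clause such as 'o+x', 'g-rw' or 'ugo+rwx' to mode."""
    match = re.fullmatch(r"([ugo]+)([+=-])([rwx]*)", spec)
    if match is None:
        raise ValueError(f"unsupported spec: {spec!r}")
    classes, op, perms = match.groups()
    bits = sum(PERM_BITS[p] for p in set(perms))
    for cls in set(classes):
        shift = CLASS_SHIFT[cls]
        if op == "+":
            mode |= bits << shift
        elif op == "-":
            mode &= ~(bits << shift)
        else:  # '=' replaces the whole triple for this class
            mode = (mode & ~(0o7 << shift)) | (bits << shift)
    return mode


def parse_listing(perm: str) -> tuple:
    """Split an `ls -l` string like 'dr-x-wxrw-' into (type, permission bits)."""
    kind, triples = perm[0], perm[1:10]
    value = 0
    for i, ch in enumerate(triples):
        if ch != "-":
            value |= 1 << (8 - i)  # position 0 is owner 'r' (0o400)
    return kind, value


print(oct(apply_symbolic(0o644, "o+x")))   # 0o645
print(oct(apply_symbolic(0o664, "g-rw")))  # 0o604
kind, bits = parse_listing("dr-x-wxrw-")
print(kind, oct(bits))                     # d 0o536
```

Reading the slide's listing this way, `dr-x-wxrw-` is a directory whose permissions are 536 in octal.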
A few useful tips
• Hitting "Tab" will auto-complete file or program names (or suggest possible names)
• The up arrow lets you return to previous commands
• Editing text files: "nano" is an easier alternative to "emacs", but less powerful
  • Alternatively, use an SSH client to transfer files to your Windows desktop, edit them in Windows, then transfer them back
  • BUT: make sure you use a text editor that knows the difference between Windows and Linux text files (e.g. Notepad++)
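Windows text files end each line with CR+LF (`\r\n`), Linux ones with LF only (`\n`); a stray `\r` is a common reason a script edited on Windows fails on Linux. The `dos2unix` utility fixes this where it is installed. As a minimal Python stand-in (`to_unix_line_endings` is a hypothetical helper), the conversion is a byte-level replace:

```python
import tempfile
from pathlib import Path


def to_unix_line_endings(path: Path) -> int:
    """Rewrite a text file in place with Linux (LF) line endings.

    Returns the number of Windows (CRLF) line endings replaced. Works on
    bytes, so the file's character encoding is left untouched.
    """
    data = path.read_bytes()
    count = data.count(b"\r\n")
    if count:
        path.write_bytes(data.replace(b"\r\n", b"\n"))
    return count


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "script.sh"
    sample.write_bytes(b"echo hello\r\necho world\r\n")
    replaced = to_unix_line_endings(sample)
    converted = sample.read_bytes()
print(replaced)   # 2
print(converted)  # b'echo hello\necho world\n'
```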
Some more useful basic Linux commands
• "cd" changes your directory, e.g. 'cd /usr/local'
• "man" displays the manual for a command, e.g. 'man ls'
• "pwd" tells you the directory you are currently in (= working directory)
• "history" lists recent commands, numbered by line. By typing an exclamation point followed by the line number (e.g. !123), you can rerun that command
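Analysis scripts (e.g. BioPython ones) sometimes need the same `cd`/`pwd` steps. A small Python sketch of the idea, assuming a hypothetical `working_directory` helper that restores the previous directory even if an error occurs (Python 3.11+ ships an equivalent `contextlib.chdir`):

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily `cd` into path, restoring the previous directory on exit."""
    previous = os.getcwd()  # like `pwd`
    os.chdir(path)          # like `cd path`
    try:
        yield path
    finally:
        os.chdir(previous)  # back where we started, even after an error


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.path.samefile(os.getcwd(), tmp)
restored = os.getcwd() == start
print(inside, restored)  # True True
```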
Accessing remote servers
• "ssh": Secure Shell
  ssh -i private_keypair user@host
• "scp": Secure CoPy
  scp -i private_keypair [user@host:]sourcefile [user@host:]targetfile
Where user is the account (default: local user) and host is the internet name of the computer (default: local host)
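When copying many files it can help to generate these commands from a script. A hedged Python sketch: `ssh_command` and `scp_command` are hypothetical helpers that only assemble the argument lists shown on the slide; the key file `mykey.pem`, host `example.org` and file `reads.fastq` are placeholders, and nothing is executed unless you pass the list to `subprocess.run` yourself.

```python
import shlex


def ssh_command(key_file: str, user: str, host: str) -> list:
    """Argument list for `ssh -i key_file user@host`."""
    return ["ssh", "-i", key_file, f"{user}@{host}"]


def scp_command(key_file: str, source: str, target: str) -> list:
    """Argument list for `scp -i key_file source target`.

    source/target use the [user@host:]path form; a bare path is local.
    """
    return ["scp", "-i", key_file, source, target]


# Placeholder key, host and file names, for illustration only.
cmd = scp_command("mykey.pem", "ubuntu@example.org:reads.fastq", ".")
print(shlex.join(cmd))  # scp -i mykey.pem ubuntu@example.org:reads.fastq .
# To actually copy: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list (rather than one string) avoids shell-quoting problems with unusual file names.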
OpenBio Case Study: BioPython http://biopython.org/wiki/Biopython http://biopython.org/DIST/docs/tutorial/Tutorial.html
First Look at NGS Data
FASTX tool kit: http://hannonlab.cshl.edu/fastx_toolkit/ (Linux, Mac OS X or other Unix only)
Get the precompiled binary
wget http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
bunzip2 fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
tar -xvf fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar
sudo mv bin/* /usr/local/bin
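`bunzip2` followed by `tar -xvf` can also be done in one step. As a hedged Python alternative using the standard `tarfile` module, mode `"r:bz2"` decompresses and unpacks together. `extract_tar_bz2` is a hypothetical helper, the demo builds a tiny stand-in archive instead of downloading the real one, and the `filter="data"` safety option is used only on Python versions that provide it.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_tar_bz2(archive: Path, dest: Path) -> list:
    """Decompress and unpack a .tar.bz2 in one step; return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # rejects unsafe paths/modes
        else:
            tar.extractall(dest)  # older Pythons without extraction filters
    return names


# Demo: build a tiny stand-in archive (not the real FASTX download) and unpack it.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    archive = tmp / "demo.tar.bz2"
    payload = b"#!/bin/sh\necho hello\n"
    with tarfile.open(archive, "w:bz2") as tar:
        info = tarfile.TarInfo("bin/hello")
        info.size = len(payload)
        info.mode = 0o755
        tar.addfile(info, io.BytesIO(payload))
    members = extract_tar_bz2(archive, tmp / "out")
    extracted_ok = (tmp / "out" / "bin" / "hello").read_bytes() == payload
print(members, extracted_ok)  # ['bin/hello'] True
```

Moving the extracted `bin/*` into `/usr/local/bin` still needs root, as with `sudo mv`.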
FASTX tool kit I
• FASTQ-to-FASTA converter
  • Converts FASTQ files to FASTA files
• FASTQ Information
  • Charts quality statistics and nucleotide distribution
• FASTQ/A Collapser
  • Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts)
• FASTQ/A Trimmer
  • Shortens reads in a FASTQ or FASTA file (removing barcodes or noise)
• FASTQ/A Renamer
  • Renames the sequence identifiers in a FASTQ/A file
• FASTQ/A Clipper
  • Removes sequencing adapters / linkers
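To make the first few tools concrete, here is a pure-Python sketch of what FASTQ-to-FASTA conversion, collapsing and trimming do to the records. It is an illustration under simplifying assumptions (unwrapped 4-line FASTQ records; the `<rank>-<count>` naming of collapsed sequences is illustrative), not the FASTX implementation or its command-line interface, and all function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple


class FastqRecord(NamedTuple):
    name: str  # identifier line without the leading '@'
    seq: str   # nucleotide sequence
    qual: str  # one quality character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse 4-line FASTQ records (assumes sequences are not wrapped)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it)
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated record {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def fastq_to_fasta(records: Iterable[FastqRecord]) -> List[str]:
    """FASTQ-to-FASTA: keep the identifier and sequence, drop the qualities."""
    lines = []
    for rec in records:
        lines += [f">{rec.name}", rec.seq]
    return lines


def collapse(records: Iterable[FastqRecord]) -> List[Tuple[str, str]]:
    """Collapse identical sequences, naming each one '<rank>-<read count>'."""
    counts = Counter(rec.seq for rec in records)
    # most_common() sorts by count; equal counts keep first-seen order
    return [(f"{rank}-{n}", seq) for rank, (seq, n) in enumerate(counts.most_common(), 1)]


def trim(rec: FastqRecord, first: int = 1, last: Optional[int] = None) -> FastqRecord:
    """Keep bases first..last (1-based, inclusive) of sequence and quality."""
    end = len(rec.seq) if last is None else last
    return rec._replace(seq=rec.seq[first - 1:end], qual=rec.qual[first - 1:end])


sample = """@read1
ACGTACGTAA
+
IIIIIIIIII
@read2
ACGTACGTAA
+
IIIII#####
@read3
TTTTGGGGCC
+
IIIIIIIIII
""".splitlines(keepends=True)

records = list(read_fastq(sample))
print(fastq_to_fasta(records)[:2])  # ['>read1', 'ACGTACGTAA']
print(collapse(records))            # [('1-2', 'ACGTACGTAA'), ('2-1', 'TTTTGGGGCC')]
print(trim(records[0], 1, 4).seq)   # ACGT
```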
FASTX tool kit II
• FASTQ/A Reverse-Complement
  • Produces the reverse complement of each sequence in a FASTQ/FASTA file
• FASTQ/A Barcode splitter
  • Splits a FASTQ/FASTA file containing multiple samples
• FASTA Formatter
  • Changes the width of sequence lines in a FASTA file
• FASTA Nucleotide Changer
  • Converts FASTA sequences from/to RNA/DNA
• FASTQ Quality Filter
  • Filters sequences based on quality
• FASTQ Quality Trimmer
  • Trims (cuts) sequences based on quality
• FASTQ Masker
  • Masks nucleotides with 'N' (or another character) based on quality
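The quality-based tools start by decoding each quality character into a Phred score; this sketch assumes the Phred+33 encoding used by Sanger and Illumina 1.8+ FASTQ files (older Illumina files used an offset of 64). The Python below approximates the reverse-complement, quality filter, quality trimmer and masker ideas; the function names and the exact filtering rule (keep a read if at least `min_percent`% of its bases reach `min_q`) are assumptions for illustration, not FASTX's code.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement DNA (IUPAC codes other than N are not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def phred_scores(qual: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def passes_quality_filter(qual: str, min_q: int, min_percent: float, offset: int = 33) -> bool:
    """True if at least min_percent % of bases have quality >= min_q."""
    scores = phred_scores(qual, offset)
    good = sum(q >= min_q for q in scores)
    return bool(scores) and 100.0 * good / len(scores) >= min_percent


def trim_low_quality_tail(seq: str, qual: str, min_q: int, offset: int = 33) -> tuple:
    """Cut bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(qual, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def mask_low_quality(seq: str, qual: str, min_q: int, offset: int = 33, mask: str = "N") -> str:
    """Replace bases whose quality is below min_q with the mask character."""
    return "".join(b if q >= min_q else mask for b, q in zip(seq, phred_scores(qual, offset)))


print(reverse_complement("ACGTTN"))                            # NAACGT
print(passes_quality_filter("IIIII#####", 20, 80))             # False ('I' = 40, '#' = 2)
print(trim_low_quality_tail("ACGTACGTAA", "IIIII#####", 20))   # ('ACGTA', 'IIIII')
print(mask_low_quality("ACGTACGTAA", "IIIII#####", 20))        # ACGTANNNNN
```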
FastQC: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/ (downloads: http://www.bioinformatics.bbsrc.ac.uk/projects/download.html)
Integrative Genomics Viewer http://www.broadinstitute.org/igv/