140 likes | 298 Views
MCB3895-004 Lecture # 11 Sept 30/14. De novo genome assembly using Velvet; using Perl to execute system commands. Velvet. Velvet was one of the first widely used de novo assembly programs http ://www.ebi.ac.uk/~ zerbino/velvet/Manual.pdf
E N D
MCB3895-004 Lecture #11Sept 30/14 De novo genome assembly using Velvet; using Perl to execute system commands
Velvet • Velvet was one of the first widely used de novo assembly programs • http://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf • It is still quite common, although frequently superseded
Velvet step #1: De brujin graph construction Red bases = sequencing error http://en.wikipedia.org/wiki/Velvet_assembler
Velvet step #2: graph simplification • Unbranched paths collapsed into single nodes http://en.wikipedia.org/wiki/Velvet_assembler
Velvet step #3: tip removal • Dead end paths removed http://en.wikipedia.org/wiki/Velvet_assembler
Velvet step #4: bubble popping • "Bubbles" caused by sequencing error resolved http://en.wikipedia.org/wiki/Velvet_assembler
Running velvet: single end • Step #1: velveth to set up database $ velvethSRR826450_1 21 -short -fastqSRR826450_1.fastq • Parameters for directory name, kmer, read type, input file name • Step #2: velvetg to make graph $ velvetg SRR826450_1 -exp_covauto • Parameters for directory name, automatic coverage detection • Note: the max kmer size is set during software compilation. In our case the max is kmer=61 • Note: only odd kmer numbers are allowed
Running velvet: paired end • In a fit of cussedness, velvet requires its paired-end data in a single interleaved file • Use shuffleSequences_fastq.pl • Copy from /opt/bioinformatics2/velvet_1.2.10_31kmer/contrib/shuffleSequences_fasta/ into your directory • Running velvet is otherwise similar $ velveth SRR826450 21 -shortPaired-fastqSRR826450_paired.fastq $ velvetgSRR826450 -exp_covauto
Velvet output • contigs.fa - contains output contigs • If paired end, contigs are joined into scaffolds using strings of Ns • Log - contains what commands have actually been run along with parameters, results summary
Today's assignment part 1 • Compare a single-end and paired-end assembly of SRR826450 • Use a small kmer (e.g., 21) to keep things computationally simple for now
Running commands using Perl • Perl can be used to run terminal commands, just like Bash • Use the systemcommand to run whatever follows as a terminal command • e.g., system"ls -l > list"; # lists all of the files in the # directory where the script was # run and stores this information # in a new file "list"
Using system to run repetitive analyses • A useful way to use this sort of command is in a loop to run the same program multiple times using slightly different parameters • e.g., for (my$kmer= 15; $kmer>= 31; $kmer=$kmer+ 2){ print"kmer=$kmer\n"; system"velvethinfile_$kmer $kmer-shortPaired-fastqinfile"; system"velvetginfile_$kmer-exp_cov auto"; }
Hints for system loops • Use print statements can be helpful to keep track of loop • Run the loop first with all system commands converted to print statements • Run only a single iteration of the entire loop
Today's assignment part 2 • Answer the question: what is the optimal kmer size to assemble the SRR826450 paired end reads? • For Thursday's class, read and be ready to discuss: • Magoc et al. 2013 Bioinformatics 29:1718-1725 • Bradnam et al. GigaScience 2:10