Nimble Perl Programming Using Scriptome
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
1/22/2009
Objectives
Determining whether Scriptome can …
• Enable you to perform operations that are otherwise difficult, time-consuming, or error-prone?
• Help you learn Perl?
Also, we’ll be using anonymous polling to find out whether you’re happy with the material and the speed of delivery …
And don’t worry: this experiment won’t hurt a bit!
So What Is Scriptome?
Scriptome is a Perl program, run on your own computer, that performs various data manipulation tasks useful to biologists.
• Originally developed by Harvard’s FAS Center for Systems Biology
• Maintained and extended by many more volunteers outside Harvard
Why Bother With Scriptome?
• Code is visible, so you can learn how to do things in Perl … or not
• Can handle arbitrarily large files
• No size limits (unlike, e.g., Excel)
• Free; runs on everything: PC, Mac, Linux
• It’s programmatic!
• Much faster than manual operations
• You can string operations together and save them in, e.g., a .bat file
How Do You Use Scriptome?
• You tell Scriptome which function you want it to perform (more later)
• You can also string Scriptome functions together into a protocol
• Input: Scriptome operates on text files
• No binary files, but you could add that capability yourself
• E.g., process Excel files in their native form using Perl modules such as Spreadsheet::ParseExcel
• Output: printed to the command line, or written into another file
Scriptome: Pick Your Flavor
http://lane.stanford.edu/howto/index.html?id=_1257
http://sysbio.harvard.edu/csb/resources/computational/scriptome/
Installing Scriptome - Windows
• Download Scriptome_exe.tar.gz using this link: http://sysbio.harvard.edu/csb/resources/computational/scriptome/bin/Scriptome_exe.tar.gz
→ Final location: I suggest C:\Program Files\Scriptome
• Create a directory named “Scriptome”
• Decompress Scriptome_exe.tar.gz by double-clicking → notice the four files inside
• Update the PATH variable: add this string at the END of its current contents:
;C:\Program Files\Scriptome\Scriptome;C:\Program Files\Scriptome\ScriptPack;C:\Program Files\Scriptome\Scriptome.bat;C:\Program Files\Scriptome\ScriptPack.bat
Scriptome Usage
1. Using a specific tool: Scriptome flags toolname [input_filenames] [> output_filename]
Example:
• Scriptome -t change_fasta_to_tab LONGhmcad.fst
2. Finding a tool by type: Scriptome -t tooltype
where tooltype is one of:
• Calc
• Choose
• Sort
• Fetch
• Merge
• Change
Example:
• Scriptome -t Calc
Let’s examine each area briefly before going over specifics…
Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous
Calc Tool Examples - 1
Compute column sums:
• Scriptome -t calc_col_sum SubjectData1.tab → select the column to sum
IMPORTANT: column numbers start at 0, not 1 (so $col=1 in the code below sums the second column)
• Note the visible Perl code → easy to modify and expand:
perl -e " $col=1; while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum += $F[$col]; } warn qq~\nSum of column $col for $. lines\n\n~; print qq~$sum\n~ " file.tab
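The one-liner above is compact but hard to read. Here is a minimal sketch of the same logic written out as a small script, assuming Perl 5.10 or later (for the `//` operator). The subroutine name `sum_column` and the argument order are hypothetical, not part of Scriptome. Passing a filehandle instead of reading `<>` directly makes the logic easy to try on in-memory data.

```perl
#!/usr/bin/perl
# Sketch: the calc_col_sum one-liner written out as a script.
# sum_column is a hypothetical name, not a Scriptome function.
use strict;
use warnings;

# Sum column $col (0-based) over every line read from $fh.
# Returns the sum and the number of lines read.
sub sum_column {
    my ($fh, $col) = @_;
    my ($sum, $lines) = (0, 0);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n//;               # strip Unix or Windows line endings
        my @fields = split /\t/, $line;   # tab-delimited columns
        $sum += $fields[$col] // 0;       # a missing field counts as 0
        $lines++;
    }
    return ($sum, $lines);
}

# Usage: perl col_sum.pl COLUMN FILE   e.g. perl col_sum.pl 1 file.tab
if (@ARGV == 2) {
    my ($col, $file) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my ($sum, $lines) = sum_column($fh, $col);
    close $fh;
    warn "\nSum of column $col for $lines lines\n\n";
    print "$sum\n";
}
```

Unlike the one-liner, the column number is passed on the command line rather than edited into the code.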
Calc Tool Examples - 2
Compute row sums:
• Scriptome -t calc_row_sum SubjectData1.tab → enter the numbers of the columns to add, e.g., 1, 2, 3 (numbering still starts at 0)
perl -e " @cols=(1, 2, 3); while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum = 0; foreach $col (@cols) { $sum += $F[$col] }; print qq~$_\t$sum\n~; } warn qq~\nSum of columns @cols for each line ($. lines)\n\n~ " in.tab
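A similar written-out sketch for row sums, under the same assumptions (Perl 5.10+; `add_row_sums` is a hypothetical name, not a Scriptome function). The column list comes from the command line instead of being hard-coded as `@cols=(1, 2, 3)`. Output goes to whatever filehandle you pass in, so the same code can print to the screen or to a file.

```perl
#!/usr/bin/perl
# Sketch: the calc_row_sum one-liner written out as a script.
# add_row_sums is a hypothetical name, not a Scriptome function.
use strict;
use warnings;

# Copy each line from $in to $out with the sum of the chosen columns
# (0-based indices in @cols) appended as a new last column.
# Returns the number of lines processed.
sub add_row_sums {
    my ($in, $out, @cols) = @_;
    my $lines = 0;
    while (my $line = <$in>) {
        $line =~ s/\r?\n//;
        my @fields = split /\t/, $line;
        my $sum = 0;
        $sum += $fields[$_] // 0 for @cols;   # missing fields count as 0
        print {$out} "$line\t$sum\n";
        $lines++;
    }
    return $lines;
}

# Usage: perl row_sum.pl FILE COL [COL ...]   e.g. perl row_sum.pl in.tab 1 2 3
if (@ARGV >= 2) {
    my ($file, @cols) = @ARGV;
    open my $in, '<', $file or die "Cannot open $file: $!\n";
    my $lines = add_row_sums($in, \*STDOUT, @cols);
    close $in;
    warn "\nSum of columns @cols for each line ($lines lines)\n\n";
}
```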
Change Tool Examples - 1
Create a tab-delimited file from a FASTA file:
• Scriptome -t change_fasta_to_tab LONGhmcad.fst > LONGhmcad.fst.tab
→ change_fasta_to_tab is an important tool because many Scriptome tools work on tab-delimited files
perl -e " $count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print qq~\n~ } s/ |$/\t/; $count++; $_ .= qq~\t~; } else { s/ //g; $len += length($_) } print $_; } print qq~\n~; warn qq~\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n~; " seqs.fna
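To make the FASTA-to-tab one-liner on this slide easier to follow, here is a minimal sketch of the same idea as a subroutine (hypothetical name `fasta_to_tab`; Perl 5.10+ assumed). As in the one-liner, the ID is the header text up to the first space, the description is the rest, and all sequence lines of a record are joined with spaces removed. Like the one-liner, it streams: only one record is held in memory at a time. One difference: sequence lines that appear before the first header are skipped.

```perl
#!/usr/bin/perl
# Sketch: the change_fasta_to_tab one-liner written out as a script.
# fasta_to_tab is a hypothetical name, not a Scriptome function.
use strict;
use warnings;

# Read FASTA from $in and write one line per record to $out:
#   ID <TAB> description <TAB> sequence
# Returns the number of records and the total sequence length.
sub fasta_to_tab {
    my ($in, $out) = @_;
    my ($count, $len) = (0, 0);
    my $record;                                   # [id, description, sequence]
    while (my $line = <$in>) {
        $line =~ s/\r?\n//;
        $line =~ s/\t/ /g;                        # a tab in a header would shift columns
        if ($line =~ s/^>//) {
            print {$out} join("\t", @$record), "\n" if $record;   # finish previous record
            my ($id, $desc) = split / /, $line, 2;                # ID ends at the first space
            $record = [ $id // '', $desc // '', '' ];
            $count++;
        }
        elsif ($record) {                         # sequence lines before any header are skipped
            $line =~ s/ //g;
            $record->[2] .= $line;
            $len += length $line;
        }
    }
    print {$out} join("\t", @$record), "\n" if $record;           # last record
    return ($count, $len);
}

# Usage: perl fasta2tab.pl seqs.fna > seqs.tab
if (@ARGV) {
    for my $file (@ARGV) {
        open my $in, '<', $file or die "Cannot open $file: $!\n";
        my ($count, $len) = fasta_to_tab($in, \*STDOUT);
        close $in;
        warn "\nConverted $count FASTA records from $file\nTotal sequence length: $len\n\n";
    }
}
```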
Change Tool Examples - 2
Change rows to columns or vice versa:
• Scriptome -t change_transpose_table SubjectData1.tab
• Note: change_transpose_table operates on tab-delimited files
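The slide does not show the code behind change_transpose_table, so the following is only a minimal sketch of one way to transpose a tab-delimited table in Perl, not Scriptome’s implementation (`transpose_table` is a hypothetical name; Perl 5.10+ assumed). Unlike the line-by-line column and row sum one-liners, transposing needs the whole table in memory, because the first output row draws on every input row.

```perl
#!/usr/bin/perl
# Sketch: transpose a tab-delimited table (rows become columns).
# transpose_table is a hypothetical name, not a Scriptome function.
use strict;
use warnings;

# Read a table from $in; return the transposed table as a list of
# tab-joined lines. Short rows are padded with empty fields.
sub transpose_table {
    my ($in) = @_;
    my @table;
    while (my $line = <$in>) {
        $line =~ s/\r?\n//;
        push @table, [ split /\t/, $line, -1 ];   # -1 keeps trailing empty fields
    }
    my $ncols = 0;
    for my $row (@table) {
        $ncols = @$row if @$row > $ncols;         # widest row sets the number of output lines
    }
    my @out;
    for my $j (0 .. $ncols - 1) {
        # Output line $j holds column $j of every input row.
        push @out, join "\t", map { $_->[$j] // '' } @table;
    }
    return @out;
}

# Usage: perl transpose.pl SubjectData1.tab > transposed.tab
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print "$_\n" for transpose_table($in);
    close $in;
}
```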
Change Tool Examples - 3
• Convert a sequence file from one format to another:
Scriptome -t change_bio_format_to_bio_format LONGhmcad.fst
enter ‘fasta’ as the input format (no quotes)
enter ‘genbank’ as the output format (no quotes)
• change_bio_format_to_bio_format addresses the common problem of converting between formats
• Important: requires BioPerl to be installed
* Notice anything interesting? *
perl -MBio::SeqIO -e " $informat= qq~genbank~; $outformat= qq~fasta~; $count = 0; for $infile (@ARGV) { $in = Bio::SeqIO->newFh(-file => $infile , -format => $informat); $out = Bio::SeqIO->newFh(-format => $outformat); while (<$in>) { print $out $_; $count++; } } warn qq~Translated $count sequences from $informat to $outformat format\n~ " myseqs.genbank > myseqs.fasta
Conclusions
Scriptome is …
• A good solution for manipulating medium to large data files quickly and reliably
• A way to learn Perl in a “real” context (no toy problems)
• Able to perform a wide range of tasks, from simple, generic file manipulations to complex, bio-specific tasks
Resources
• For Perl help, see the resources listed in the workshop description for Lane’s Perl Programming for Biologists
• Some recommended titles:
Polling Time: Do you think Scriptome will be useful to your research?
1. Definitely
2. Likely
3. Not likely
4. No way
5. What’s the question again?