390 likes | 544 Views
סקר הוראה. בשבועות הקרובים יתקיים סקר ההוראה ( באתר מידע אישי לתלמיד ). Subroutines Revision. Defining a subroutine. sub reverseComplement { # read string # reverseComplement it # return it }. sub SUB_NAME { # Do something ... }. Subroutines Revision. Invoking a subroutine.
E N D
סקר הוראה בשבועות הקרובים יתקיים סקר ההוראה(באתר מידע אישי לתלמיד)
Subroutines Revision Defining a subroutine sub reverseComplement { # read string # reverseComplement it # return it} sub SUB_NAME { # Do something ...}
Subroutines Revision Invoking a subroutine $reversed = reverseComplement("ACGTTA"); $retval = SUB_NAME(ARGS); sub reverseComplement { # read string # reverseComplement it # return it}
Subroutines Revision $reversed Invoking a subroutine $reversed = reverseComplement("ACGTTA"); sub reverseComplement { # read string # reverseComplement it # return it} When a subroutine is invoked, perl begins to run it separately from the main program
Subroutines Revision $reversed Passing arguments: "ACGTTA" $reversed =reverseComplement( ); "ACGTTA" sub reverseComplement { # read string # reverseComplement it # return it} @_ Arguments are passed using the special array @_
Subroutines Revision $reversed Passing arguments: $seq "ACGTTA" $reversed =reverseComplement( ); sub reverseComplement { ($seq) = @_; # reverseComplement it # return it} @_ "ACGTTA" "ACGTTA" We can then read the arguments from @_
Subroutines Revision $reversed $revSeq $seq $seq "ACGTTA" $reversed =reverseComplement( ); sub reverseComplement { my ($seq) = @_; $seq =~ tr/ACGT/TGCA/; my $revSeq = reverse $seq; # return it} @_ "ACGTTA" "ACGTTA" "TGCAAT" "TAACGT" We can now add whatever content we want to the subroutine
Subroutines Revision $reversed Returning return values: $revSeq $seq $seq "ACGTTA" $reversed =reverseComplement( ); sub reverseComplement { my ($seq) = @_; $seq =~ tr/ACGT/TGCA/; my $revSeq = reverse $seq;return $revSeq;} @_ "ACGTTA" "ACGTTA" "TGCAAT" We return values using the word return "TAACGT" "TAACGT"
Subroutines Revision $first $last Another example: $1 $2 $string my ($first, $last) = firstLastChar("Yellow"); @_ sub firstLastChar{ my ($string) = @_; $string =~ m/^(.).*(.)$/; return ($1,$2); } "Yellow" "Yellow" "Y" "w" "Y" "w"
Subroutines Revision Passing references: $ourPets my @ourPets = ('Liko','Emma','Louis'); printPets (\@ourPets); We create a reference to the array sub printPets { my ($petRef) = @_; foreach my $pet (@{$petRef}) { print "Good $pet\n"; }} 'Louis' 'Emma' 'Liko'
Subroutines Revision Passing references: $ourPets my @ourPets = ('Liko','Emma','Louis'); printPets (\@ourPets); @_ sub printPets { my ($petRef) = @_; foreach my $pet (@{$petRef}) { print "Good $pet\n"; }} Then we pass the reference to the subroutine
Subroutines Revision Passing references: $ourPets $petRef my @ourPets = ('Liko','Emma','Louis'); printPets (\@ourPets); @_ sub printPets { my ($petRef) = @_; foreach my $pet (@{$petRef}) { print "Good $pet\n"; }}
Subroutines Revision Passing references: $ourPets $petRef my @ourPets = ('Liko','Emma','Louis'); printPets (\@ourPets); @_ sub printPets { my ($petRef) = @_; foreach my $pet (@{$petRef}) { print "Good $pet\n"; }} Good Liko Good Emma Good Louis
Debugging subroutines Step into a subroutine (F5)to debug the internal work of the sub Step over a subroutine (F6)to skip the whole operation of the sub Step out of a subroutine (F7)when inside a sub – run it all the way to its end and return to the main script Resume (F8)run till end or next break point Step over Step out Step into
What are modules? • A module or a package is a collection of subroutines, stored in a separate file with a “.pm” suffix (Perl Module). • The subroutines of a module should deal with a well-defined task. For example: manipulate fasta files. Fasta.pm sub getHeaders { ...}sub getSeqNo { ...} sub readFasta { ... }
Writing a module • A module is written in a separate file with a “.pm” suffix. • The name of the module is defined by a “package” line at the beginning of the file: Fasta.pm package Fasta; sub getHeaders { # Get all the fasta headers}sub getSeqNo { # Return the number of # sequences in the file}
Writing a module • The last line of the module has to have true value, so we add:1; Fasta.pm package Fasta; sub getHeaders { # Get all the fasta headers}sub getSeqNo { # Return the number of # sequences in the file} 1;
Using modules • In order to use a module, we write the line: use MODULE_NAME;at the beginning of our script. We usethe subroutines that we defined in the module Fasta MyScript.pl Fasta.pm use strict; use Fasta; # Here we will use the subroutines # defined in the Fasta.pm module package Fasta; sub readFasta { ... } sub getSeqNo { ...} 1;
Tips for using modules Both files must be in the same directory. To use files from different directories, read about use lib. We can even use modules inside modules. MyScript.pl MyScript.pl Fasta.pm Fasta.pm We can use as many modules as we want in a single script use strict; use Fasta; # Here we will use the subroutines # defined in the Fasta.pm module use strict; use Fasta; use geneBank; # Here we will use the subroutines # defined in the Fasta.pm module package Fasta; sub readFasta { ... } sub getSeqNo { ...} package Fasta; use geneBank; sub readFasta { ... } sub getSeqNo { ...} 1;
Using modules - namespaces Now we can invoke a subroutine from within the namespace of that module:PACKAGE::SUBROUTINE(ARGUMENTS) MyScript.pl Fasta.pm package Fasta; use geneBank; sub readFasta { ... } sub printFasta { ...} 1; use strict; use Fasta; use geneBank; @seqs = Fasta::readFasta("ec.fasta"); Fasta::printFasta(@seqs);
Using modules - namespaces We can't access the module without specifying the namespace Undefined subroutine &main::getSeqNo called at... MyScript.pl Fasta.pm use strict; use Fasta; use geneBank; @seqs = readFasta("ec.fasta"); printFasta(@seqs) package Fasta; use geneBank; sub readFasta { ... } sub printFasta { ...} 1;
Class exercise 12a • a) Write a module Fasta.pmthat contains a subroutine headers( ): receive a Fasta filename and returns a reference to an arraycontaining all the headers. Write a script that test your module on EHD_nucleotide.richFasta. b) Add to the module the following subroutine:seqLengths( ): receive a Fasta filename and returns a reference to an arraycontaining all the sequences lengths. Test this subroutine as well. • (Home ex6 q3) Create a module readSeq.pm with the following functions: • a) readFastaSeq( ): Reads sequences from a FASTA file. Return a hash – the header lines are the keys and the sequences are the values (similar to ex5.1b). • b) readGenbak( ): Reads a genome annotations (such as adeno12.gbs) file and extract CDS information (similar to class_ex. 10b.2a) as follows:$genes{$name}{"protein_id"} = PROTEIN_ID $genes{$name}{"strand"} = STRAND $genes{$name}{"CDS"} = [START, END]Return a reference to the complex data structure. Test the module with an appropriate script!
BioPerl • The BioPerl project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research. • Things you can do with BioPerl: • Read and write sequence files of different format, including: Fasta, GenBank, EMBL, SwissProt and more… • Extract gene annotation from GenBank, EMBL, SwissProt files • Read and analyse BLAST results. • Read and process phylogenetic trees and multiple sequence alignments. • Analysing SNP data. • And more…
$obj0x225d14 func()anotherFunc() Object-oriented use of packages Many packages are meant to be used as objects. In Perl, an object is a data structure that can use subroutines that are associated with it. We will not learn object oriented programming,but we will learn how to create and use objects defined by BioPerlpackages.
BioPerl: the SeqIO module BioPerl modules are named Bio::xxxx The Bio::SeqIO module deals with Sequences Input and Output: In order to create a new SeqIO object, use new Bio::SeqIO as follows: use Bio::SeqIO; my $in = newBio::SeqIO();
$in0x25e211 next_seq()write_seq() BioPerl: the SeqIO module BioPerl modules are named Bio::xxxx The Bio::SeqIO module deals with Sequences Input and Output: We will pass arguments to the new argument of the file name and format use Bio::SeqIO; my $in = newBio::SeqIO( "-file"=> "<seq.gb","-format"=> "GenBank"); File argument(filename as would be in open) Format argument A list of all the sequence formats BioPerl can read is in:http://www.bioperl.org/wiki/HOWTO:SeqIO#Formats
$in0x25e211 next_seq() write_seq() BioPerl: the SeqIO module use Bio::SeqIO; my $in = newBio::SeqIO( "-file" => "<seq.gb", "-format" => "GenBank"); my $seqObj = $in->next_seq(); Perform next_seq()subroutine on $in You could think of it as:SeqIO::next_seq($in) next_seq() returns the next sequence in the file as a Bio::Seq object (we will talk about them soon)
$in0x25e211 $out0x3001a3 next_seq() write_seq() next_seq()write_seq() BioPerl: the SeqIO module use Bio::SeqIO; my $in = newBio::SeqIO( "-file" => "<seq.gb", "-format" => "GenBank"); my $out = new Bio::SeqIO("-file" => ">seq2.fasta", "-format" => "Fasta"); my $seqObj = $in->next_seq(); while (defined $seqObj){ $out->write_seq($seqObj); $seqObj = $in->next_seq(); } write_seq()write a Bio::Seq object to $out according to its format
BioPerl: the Seq module Bio:seqIOincludesuse Bio::Seq use Bio::SeqIO; my $in = newBio::SeqIO( "-file" => "<seq.fasta", "-format" => "Fasta"); my $seqObj = $in->next_seq(); while (defined $seqObj) { print "ID:".$seqObj->id()."\n"; #1st word in header print "Desc:".$seqObj->desc()."\n"; #rest of header print "Sequence: ".$seqObj->seq()."\n"; #seq string print "Length:".$seqObj->length()."\n"; #seq length $seqObj = $in->next_seq() } You can read more about the Bio::Seq subroutines in: http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object This one is extremely useful ...
Printing last 30aa of each sequence open (IN, "<seq.fasta") or die "Cannot open seq.fasta..."; my $fastaLine = <IN>; while (defined $fastaLine) { chomp $fastaLine; # Read first word of header if (fastaLine =~ m/^>(\S*)/) { my $header = substr($fastaLine,1); $fastaLine = <IN>;} # Read seq until next header my $seq = ""; while ((defined $fastaLine) and(substr($fastaLine,0,1) ne ">" )) { chomp $fastaLine; $seq = $seq.$fastaLine; $fastaLine = <IN>; } # print last 30aa my $subseq = substr($seq,-30); print "$header\n"; print "$subseq\n"; }
Now using BioPerl use Bio::SeqIO; my $in = new Bio::SeqIO("-file" => "<seq.fasta","-format" => "Fasta"); my $seqObj = $in->next_seq(); while (defined $seqObj) { # Read first word of header my $header = $seqObj->id(); # print last 30aa my $seq = $seqObj->seq(); my $subseq = substr($seq,-30); print "$header\n"; print "$subseq\n"; $seqObj = $in->next_seq(); }
Class exercise 12b • Write a script that uses Bio::SeqIO to read a FASTA file (use the EHD nucleotide FASTA from the webpage) and print to an output FASTA file only sequences shorter than 3,000 bases. • Write a script that uses Bio::SeqIO to read a FASTA file, and print (to the screen) all header lines that contain the words "Mus musculus" (you may use the same file). • Write a script that uses Bio::SeqIO to read a GenPept file (use preProInsulinRecords.gp from the webpage), and convert it to FASTA. • 4*. Same as Q1, but print to the FASTA the reverse complement of each sequence. (Do not use the reverse or tr// functions! BioPerl can do it for you - read the BioPerl documentation).
Installing BioPerl – how to add a repository to the PPM • Start All Programs Active Perl… Perl Package manager • You might need to add a repository to the PPM before installing BioPerl:
Installing BioPerl – how to add a repository to the PPM • Click the “Repositories” tab, enter “bioperl” in the “Name” field and http://bioperl.org/DIST in the “Location” field, click “Add”, and finally “OK”:
Installing modules from the internet • The best place to search for Perl modules that can make your life easier is:http://www.cpan.org/ • The easiest way to download and install a module is to use the Perl Package Manager (part of the ActivePerl installation) 1.Choose “View all packages” 2. Enter module name(e.g. bioperl) 3. Choose module(e.g. bioperl) 4. Add it to the installation list 5. Install! Note: ppm installs the packages under the directory “site\lib\” in the ActivePerl directory. You can put packages there manually if you would like to download them yourself from the net, instead of using ppm.
BioPerl installation • In order to add BioPerl packages you need to download and execute the bioperl10.bat file from the course website. • If that that does not work – follow the instruction in the last three slides of the BioPerl presentation. • Reminder:BioPerl warnings about:Subroutine ... redefined at ... Should not trouble you, it is a known issue – it is not your fault and won't effect your script's performances.
Installing modules from the internet • Alternatively in older Active Perl versions- Note: ppm installs the packages under the directory “site\lib\” in the ActivePerl directory. You can put packages there manually if you would like to download them yourself from the net, instead of using ppm.