350 likes | 540 Views
Computer Programming for Biologists. Class 9 Dec 5 th , 2013 Karsten Hokamp http://bioinf.gen.tcd.ie/GE3M25. Computer Programming for Biologists. Overview. revision variable scope course project file handles. Computer Programming for Biologists. Revision - Pragmas.
E N D
Computer Programming for Biologists Class 9 Dec 5th, 2013 Karsten Hokamp http://bioinf.gen.tcd.ie/GE3M25
Computer Programming for Biologists Overview • revision • variable scope • course project • file handles
Computer Programming for Biologists Revision - Pragmas • Start of Perl scripts: • use warnings; • use strict; • Changes behaviour of Perl • Leads to better programming practice
Computer Programming for Biologists Revision - Hashes my %seq = (); # initialisation $freq{$char} = 0; # storing a value $freq{$char}++; # changing a value my $aa = $code{$codon}; # extracting foreach my $header (sort keys %seq) { my $seq = $seq{$header}; … }
0 0 0 0 my $A = 0; my $C = 0; my $G = 0; my $T = 0; • A C G T Hash Variables Scalars vs Hash Initialisation of values: my %frequency = ();
1 C Hash Variables Scalars vs Hash Increment: %frequency C if ($char eq 'A') { $A++; } elsif ($char eq 'C') { $C++ } elsif ($char eq 'G') { $G++; } elsif ($char eq 'T') { $T++; } 1 $frequency{$char}++;
5 9 7 5 • A C G T Hash Variables Scalars vs Hash %frequency T G 9 C A
5 7 9 5 • A T G C Hash Variables Scalars vs Hash %frequency Output: T G print "Frequency of A: $A"\n; print "Frequency of C: $C"\n; print "Frequency of G: $G"\n; print "Frequency of T: $T"\n; 9 C A foreach my $char (keys %frequency) { print "Frequency of $char: $frequency{$char}++; }
Computer Programming for Biologists Revision - Subroutines my $prot = &translate($seq); # call sub translate {# definition my $seq = shift @_; # parameters … return $prot; # return value(s) }
Computer Programming for Biologists scope • The area of the script in which a variable is visible. • Different blocks defined by: • main program • subroutines • loops • branches • different namespaces
Computer Programming for Biologists scope my $global_1; … while (my $input = <>) { statement a; } if (condition) { my $local_1 = 'xxx'; } sub subroutine { my $local_1; foreach my $nuc (@bases) { statement a; } } main part
Computer Programming for Biologists scope my $global_1; … while (my $input = <>) { statement a; } if (condition) { my $local_1 = 'xxx'; } sub subroutine { my $local_1; foreach my $nuc (@bases) { statement a; } } Blocks
Computer Programming for Biologists scope Tip: Keep local variables within subroutines explicitly pass content between main part and subs, e.g.: $num = &split($seq); value returned from subroutine value passed into subroutine avoid accidentally changing global variables
Computer Programming for Biologists scope # extract header while ($input = <>) { if ($input =~ /^>(.+)/) { my $header = $1; } } print "sequence ID: $header\n"; Wrong: Global symbol "$header" requires explicit package name
Computer Programming for Biologists scope # initialize global variable my $header = ''; # extract header while ($input = <>) { if ($input =~ /^>(.+)/) { $header = $1; } } print "sequence ID: $header\n"; Correct:
Computer Programming for Biologists course project common errors: (scope) my $dna = ''; # read input while (my $input = <>) { # remove line ending chomp $input; # append to sequence string my $dna .= $input; }
Computer Programming for Biologists course project common errors: (scope) my $dna = ''; # read input while (my $input = <>) { # remove line ending chomp $input; # append to sequence string my $dna .= $input; } different variables
Computer Programming for Biologists course project common errors: (scope) my $dna = ''; # read input while (my $input = <>) { # remove line ending chomp $input; # append to sequence string $dna .= $input; } same variable
Computer Programming for Biologists course project common errors: (case) Use of uninitialized value $count{'G'} my $dna = ''; # read input while (my $input = <>) { # remove line ending chomp $input; # append to sequence string $dna .= uc $input; } change input to upper case
Computer Programming for Biologists course project common errors: (arrangement) # print output in chunks of 60 bp width while ($dna) { $out = substr $dna, 0, 60, ''; print "$i $out\n"; $i += length($out); } # change string to array: my @chars = split //, $dna; empties $dna
Computer Programming for Biologists course project considerations: # form the reverse complement $dna = reverse($dna); $dna =~ tr/ACTG/TGAC/; # translate my $protein = &translate($dna); order is important # translate my $protein = &translate($dna); # form the reverse complement $dna = reverse($dna); $dna =~ tr/ACTG/TGAC/;
Computer Programming for Biologists course project considerations: # define variables: my $do_revcomp = ''; my $do_gc_content = ''; my $do_translate = ''; # read sequence my $dna = ''; … # calculate GC content if ($do_gc_content) { &gc_content($dna); } # form the reverse complement if ($do_revcomp) { $dna = reverse($dna); $dna =~ tr/ACTG/TGAC/; } # translate the protein if ($do_translate) { &translate($dna); } make actions optional
Computer Programming for Biologists course project Work on your course project (sequanto.pl): correct errors add "choice" variables at the top move code blocks into subroutines (GC-content, translation)
Data Input/Output Redirect output Program prints output to screen: $ translate.pl seq.fa MGSAILSALLSRRSQRATTIIYHYARITTQRAHGLCDII… Redirect into file: $ translate.pl seq.fa > seq.aa Append to file: $ translate.pl seq.fa >> seq.aa
Data Input/Output Filehandles Reading from STDIN, default input stream: my $in = <>; Use filehandle to read input from a specific file: open (IN, 'input.txt'); # open file for reading while (my $in = <IN>) { … } # read content line by line close IN; # close filehandle when finished filehandle
Data Input/Output Filehandles Syntax: open (FH, filename); # open file for reading open (FH, "< filename"); # open file for reading open (FH, "> filename"); # open file for writing open (FH, ">> filename"); # append to file close FH; # empties buffer Write and append mode will create files if necessary Write mode will empty file first
Data Input/Output Writing to files $file_name = 'results.txt'; if ($write_modus eq 'append') { # append to file (creates file if necessary) open (OUT, ">>$file_name"); } else { # normal write (erases content if file exists) open (OUT, ">$file_name"); } print OUT 'some text'; close OUT; # output might not appear until FH is closed!
Data Input/Output Error check! Always test if an important operation worked out: open (IN, $file_name) or die "Can't read from $file_name: $!"; open (OUT, ">>$file_out") or die "Can't append to $file_out: $!"; # Note: special variable $! contains error message
Data Input/Output Reading from Filehandle One or more file names are specified after the program, loop over each argument: foreach my $file (@ARGV) { # special variable @ARGV open (IN, $file) or die; # open filehandle while (my $in = <IN>) { # read file line by line # do something } close IN; # close filehandle }
Computer Programming for Biologists Reading sequence from a file # get pattern my $pattern = shift @ARGV; my $sequence = ''; while (<>) { next if (/^>/); chomp; $sequence .= $_; } Note: two command line arguments! $ perl split.pl gcctg test.fa # read pattern and sequence my ($pattern, $file) = @ARGV; open (IN, $file) or die "$!"; my $sequence = ''; while (<IN>) { next if (/^>/); chomp; $sequence .= $_; } close IN;
Computer Programming for Biologists course project Work on your course project (sequanto.pl): Add explicit opening of file-handle Store translated sequence into a new file
Computer Programming for Biologists Control through options Perl module Getopt::Long allows processing command line options.
Computer Programming for Biologists Control through options $man Getopt::Long NAME Getopt::Long - Extended processing of command line options SYNOPSIS use Getopt::Long; my $data = "file.dat"; my $length = 24; my $verbose; $result = GetOptions ("length=i" => \$length, # numeric "file=s" => \$data, # string "verbose" => \$verbose); # flag
Computer Programming for Biologists Control through options Try this in your script: use Getopt::Long; my $do_translation = ''; my $do_gc_content = ''; &GetOptions("translate" => \$do_translation, "gc" => \$do_gc_content, );
Computer Programming for Biologists Exam Reminder Exam 2: Thu, Dec 12th, 11 -1 pm