Topics
• Quiz 1
• Homework Review
• Programming Assignment # 1
• Perl shortcuts
• Declaring variables and Scope
• Subroutines
  • passing arguments
  • array references
• Programming Methods
  • Top Down Design
  • Bottom Up Coding and Testing
• Debugging
• Reading manuals and help pages
• Plain old documentation (POD)
• Lab time
BINF 634 FALL 2014

Acknowledgements
• Thanks to John Grefenstette for allowing me to use these slides as a starting point for tonight’s lecture

Some Humor
• Perl can be powerful

Perl Shortcuts
• Any simple statement can be followed by a single modifier right before the ; or closing }
    STATEMENT if EXPR
    STATEMENT unless EXPR
    STATEMENT while EXPR
    STATEMENT until EXPR

    $ave = $ave/$n unless $n == 0;
  Same as:
    unless ($n == 0) { $ave = $ave/$n }
• What does this do?
    $x = 0;
    print $x++, "\n" until $x == 10;
  Output:
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9

Perl Shortcuts
• Any simple statement can be followed by a single modifier
    STATEMENT foreach LIST
• STATEMENT is evaluated for each item in LIST, with $_ set to the current item.
    @A = qw/One two three four/;
    print "$_\n" foreach @A;
  Output:
    One
    two
    three
    four

Perl Shortcuts
• Predefined Perl functions may be used with or without parentheses around their arguments:
    $next = shift @A;        # same as shift(@A)
    open FILE, $filename or die "Can't open $filename";
    @chars = split //, $word;
    @fields = split /:/, $line;
• Many Perl functions assume $_ if their argument is omitted:
    @A = qw/One two three four/;
    print length, " $_\n" foreach @A;
  Output:
    3 One
    3 two
    5 three
    4 four

Scope of variables
• my variables can be accessed only until the end of the enclosing block (or until end of file, if outside any block)
• It's best to declare a variable in the smallest possible scope
    if ($x < $y) { my $tmp = $x; $x = $y; $y = $tmp }
• Variables declared in a control-flow statement are visible only within the associated block:
    my @seq_list = qw/ATT TTT GGG/;
    my $sequence = "NNN";
    for my $sequence (@seq_list){
        $sequence .= "TAG";
        print "$sequence\n";
    }
    print "$sequence\n";
  Output:
    ATTTAG
    TTTTAG
    GGGTAG
    NNN
• Are these two different variables?

Subroutines
Advantages of Subroutines
• Shorter code
• Easier to test
• Easier to understand
• More reliable
• Faster to write
• Re-usable

Subroutines
• Defining a subroutine:
    sub name { BLOCK }
• Arguments are accessed through array @_
• Subroutine values are returned by:
    return VALUE
• Subroutines may be defined anywhere in the file, but are usually placed at end
• They can be arranged alphabetically or by functionality

Passing Parameters Into Subroutines
• Values are passed into subroutines using the special array @_
• How do we know that this is an array? (The @ sigil marks it as one.)
• The shortened name of this argument is _
• It contains all of the scalars passed into the subroutine

Pass by Value
Why are the two values different?
    #!/usr/bin/perl -w
    # A driver program to test a subroutine that
    # uses pass by value
    use strict;
    use warnings;

    my $i = 2;
    simple_sub($i);
    print "In main program, after the subroutine call, \$i equals $i\n\n";
    exit;

    sub simple_sub {
        my($i)=@_;
        $i += 100;
        print "In subroutine simple_sub, \$i equals $i\n\n";
    }
Output
    In subroutine simple_sub, $i equals 102

    In main program, after the subroutine call, $i equals 2

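The copy made by my($i)=@_; is what gives the subroutine its pass-by-value behavior. As a minimal sketch (the subroutine names add_100_copy and add_100_alias are made up for illustration and are not part of the lecture code), the following contrasts that copy with changing $_[0] directly, since the elements of @_ are aliases for the caller's arguments:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: works on a private copy of the argument.
sub add_100_copy {
    my ($n) = @_;      # copy the first argument into a lexical
    $n += 100;         # changes only the copy
    return $n;
}

# Hypothetical helper: modifies the caller's variable through @_.
sub add_100_alias {
    $_[0] += 100;      # $_[0] is an alias for the caller's argument
    return $_[0];
}

my $i = 2;
add_100_copy($i);
print "after add_100_copy:  \$i = $i\n";   # $i is still 2

add_100_alias($i);
print "after add_100_alias: \$i = $i\n";   # $i is now 102
```

Copying @_ into my variables on the first line of a subroutine, as the slide does, is the usual way to avoid changing the caller's data by accident.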
There is a bug in this program as written. Can you find it? How would you fix it to produce the output indicated below?
    #!/usr/bin/perl
    use strict;
    use warnings;
    # File: min.pl

    my $a = <STDIN>; chomp $a;
    my $b = <STDIN>; chomp $b;
    $small = min($a, $b);
    print "min of $a and $b is $small\n";
    exit;

    sub min {
        my ($n, $m) = @_;   # @_ is the array of parameters
        if ($n < $m) { return $n }
        else { return $m }
    }

    % min.pl
    123
    45
    min of 123 and 45 is 45
Answer: $small is not declared. Under use strict it must be declared with my: my $small = min($a, $b);

    #!/usr/bin/perl
    use strict;
    use warnings;
    # File: min_max.pl
    ## Subroutines can return lists

    my $a = <STDIN>; chomp $a;
    my $b = <STDIN>; chomp $b;
    my ($small, $big) = min_max($a, $b);
    print "max of $a and $b is $big\n";
    print "min of $a and $b is $small\n";
    exit;

    sub min_max {
        my ($n, $m) = @_;   # @_ is the array of parameters
        if ($n < $m) { return ($n, $m) }
        else { return ($m, $n) }
    }

    % min_max.pl
    123
    45
    max of 123 and 45 is 123
    min of 123 and 45 is 45

Passing arguments
• All arguments are passed in a single list
    @a = qw/ This will all /;
    $b = "end";
    @c = qw/ up together /;
    @c = foo(@a, $b, @c);
    print "@c\n";

    sub foo {
        my @args = @_;
        return @args;
    }
  Output:
    This will all end up together

Array Flattening
    #!/usr/bin/perl -w
    # A driver program to test a subroutine that
    # illustrates array flattening
    use strict;
    use warnings;

    my @i = ('1', '2', '3');
    my @j = ('a','b','c');
    print "In main program before calling subroutine: i = " . "@i\n";
    print "In main program before calling subroutine: j = " . "@j\n";
    reference_sub(@i, @j);
    print "In main program after calling subroutine: i = " . "@i\n";
    print "In main program after calling subroutine: j = " . "@j\n";
    exit;

    sub reference_sub {
        my (@i, @j) = @_;   # @i takes every argument; @j is left empty
        print "In subroutine : i = " . "@i\n";
        print "In subroutine : j = " . "@j\n";
        push(@i, '4');
        shift(@j);
    }
Output
    In main program before calling subroutine: i = 1 2 3
    In main program before calling subroutine: j = a b c
    In subroutine : i = 1 2 3 a b c
    In subroutine : j =
    In main program after calling subroutine: i = 1 2 3
    In main program after calling subroutine: j = a b c

Passing by Value Versus Passing by Reference
• Passing by Value
  • Pass a copy of the variable
  • Changes made to the variable in the subroutine do not affect the value of variables in the main body
  • Can cause array flattening
• Passing by Reference
  • Pass a reference (pointer) to the variable
  • Must be dereferenced when used in the subroutine
  • This is the cure for array flattening

Perl References - I
• A reference is a scalar variable that refers to (points to) another variable
• So a reference might refer to an array
    $aref = \@array;   # $aref now holds a reference to @array
    $xy = $aref;       # $xy now holds a reference to @array

    # Lines 2 and 3 working together do the same thing as line 1
    $aref = [ 1, 2, 3 ];
    @array = (1, 2, 3);
    $aref = \@array;
http://perl.plover.com/FAQs/references.html

Perl References - II
http://perl.plover.com/FAQs/references.html

Dereferencing
• ${$aref}[3] is too hard to read, so you can write $aref->[3] instead
• Additional helpful discussions can be found at
  • http://oreilly.com/catalog/advperl/excerpt/ch01.html
http://perl.plover.com/FAQs/references.html

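The dereferencing forms mentioned above can be compared side by side. This is a small sketch with made-up variable names (@bases, $aref), showing the block form, the shortened form, and the arrow form, plus dereferencing the whole array:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = qw/A C G T/;
my $aref  = \@bases;            # reference to the named array

# Three equivalent ways to read one element through the reference
my $third_a = ${$aref}[2];      # block form
my $third_b = $$aref[2];        # braces dropped
my $third_c = $aref->[2];       # arrow form, usually easiest to read

# Dereference the whole array
my @copy  = @{$aref};           # same as @$aref
my $count = scalar @$aref;      # number of elements

# Changes made through the reference affect the original array
push @$aref, 'N';

print "third element: $third_a $third_b $third_c\n";
print "count before push: $count, after push: ", scalar(@bases), "\n";
print "copy is unchanged: @copy\n";
```

Note that @copy is an independent copy made at the time of the assignment, so the later push through $aref changes @bases but not @copy.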
Passing by Reference
    #!/usr/bin/perl -w
    # A driver program to test a subroutine that
    # passes by reference
    use strict;
    use warnings;

    my @i = ('1', '2', '3');
    my @j = ('a','b','c');
    print "In main program before calling subroutine: i = " . "@i\n";
    print "In main program before calling subroutine: j = " . "@j\n";
    reference_sub(\@i, \@j);
    print "In main program after calling subroutine: i = " . "@i\n";
    print "In main program after calling subroutine: j = " . "@j\n";
    exit;

    sub reference_sub {
        my ($i, $j) = @_;
        print "In subroutine : i = " . "@$i\n";
        print "In subroutine : j = " . "@$j\n";
        push(@$i, '4');
        shift(@$j);
    }
Output:
    In main program before calling subroutine: i = 1 2 3
    In main program before calling subroutine: j = a b c
    In subroutine : i = 1 2 3
    In subroutine : j = a b c
    In main program after calling subroutine: i = 1 2 3 4
    In main program after calling subroutine: j = b c

Array references
    @a = qw/ This will all /;
    $b = "end";
    @c = qw/ up together /;

    # this passes in references to the arrays
    bar(\@a, $b, \@c);     # \@a is a reference (pointer) to @a

    sub bar {
        my ($x, $b, $z) = @_;   # @_ has three items
        # dereference first argument
        my @A = @$x;            # @$x is the array referenced by $x
        # dereference third argument
        my @C = @$z;
        print "@A\n";
        print "$b\n";
        print "@C\n";
    }
  Output:
    This will all
    end
    up together
• To pass more than one list to a subroutine, use references to the arrays

Program Design
Input
  Q. What is the form of input data?
  Q. How will the program get it?
    • interactive
    • command line
    • parameter file
Algorithm
  Q. How will the program process the data to compute the desired output?
Output
  Q. How will the output be formatted and delivered?
    • Specified by user requirements

Program Design
• Design Top Down
  • Identify the inputs
  • Understand the requirements for the output
  • Design an overall algorithm for computing the output
  • Express overall method in pseudocode
  • Refine pseudocode until each step forms a well-defined subroutine
• Test Bottom Up
  • Write and debug subroutines one at a time
  • Start with “utility” subroutines that will be used by other subroutines
  • Test each subroutine with input data that gives known results
  • Include subroutines that help debugging, such as printing routines for data structures

Pseudocode
• High level, informal program
• No details
Example: print out length statistics and overall nucleotide usage statistics for a file of sequences
  Input:
    get sequences from DNAfile
  Algorithm:
    for each DNA sequence,
      get length statistics
      count each type of nucleotide
  Output:
    print length statistics
    print nucleotide usage statistics

Pseudocode
• Keep pseudocode in Perl program as comments
    # get sequences from DNAfile
    # for each DNA sequence,
    #     get length statistics
    #     count each type of nucleotide
    # print length statistics
    # print nucleotide usage statistics

Refinement
Refine pseudocode into more detailed steps:
  Input:
    get name of DNAfile
    open DNAfile
    read lines from DNAfile, putting DNA sequences in a list
  Algorithm:
    for each DNA sequence in the list
      get length and update statistics
      count each type of nucleotide in the sequence
  Output:
    print length statistics
    print nucleotide usage statistics

Algorithm Refinement
Try to express complex tasks using Perl control structures (e.g. loops) until the inner subtasks form well-defined tasks that can each be done by a single subroutine.
  Algorithm:
    for each DNA sequence in the list
      get length and update statistics
      count each type of nucleotide in the sequence
  Refined:
    for each DNA sequence in the list
      get length and update statistics
      for each base
        count the occurrence of that base in the sequence
Now write a subroutine to count any base in any sequence

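The other inner step, "get length and update statistics", can also be turned into a subroutine. The slides do not say which statistics are wanted, so this sketch assumes minimum, maximum, and mean length; the name length_stats and the sample sequences are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: returns (min, max, mean) of the sequence lengths.
# Takes an array reference so the list of sequences is not copied.
sub length_stats {
    my ($seqs) = @_;
    return unless @$seqs;               # empty list: nothing to report

    my ($min, $max, $total);
    foreach my $seq (@$seqs) {
        my $len = length($seq);
        $min = $len if !defined $min or $len < $min;
        $max = $len if !defined $max or $len > $max;
        $total += $len;
    }
    return ($min, $max, $total / @$seqs);
}

my @seq_list = qw/ATT TTTGG GGGATATA/;
my ($min, $max, $mean) = length_stats(\@seq_list);
printf "min: %d  max: %d  mean: %.2f\n", $min, $max, $mean;
```

Keeping the loop over sequences inside the subroutine means the main program only has to read the file and print the returned values.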
    #!/usr/bin/perl
    # File: sub1.pl
    # subroutine to count A's in DNA
    use warnings;
    use strict;

    my $a;
    my $dna = "tagATAGAC";
    $a = count_A($dna);
    print "$dna\n";
    print "a: $a\n";
    exit;

    #########################################
    # subroutine to count A's in DNA
    #
    sub count_A {
        # @_ is the list of parameters
        my ($dna) = @_;    # list context assignment
        my $count;
        # tr returns number of matches
        $count = ($dna =~ tr/Aa//);
        return $count;
    }
Output:
    tagATAGAC
    a: 4
After you've written a subroutine, ask yourself if it can be made a bit more general

    #!/usr/bin/perl
    # File: sub2.pl
    # subroutine to count any letter in DNA
    use warnings;
    use strict;

    my ($a, $c, $g, $t);
    my $dna = "tagATAGAC";
    $a = count_base('A', $dna);
    $t = count_base('T', $dna);
    $c = count_base('C', $dna);
    $g = count_base('G', $dna);
    print "$dna\n";
    print "a: $a t: $t c: $c g: $g\n";
    exit;

    #########################################
    #
    # subroutine to count any letter in DNA
    #
    sub count_base {
        my( $base, $dna ) = @_;
        my( $count );
        # s///g returns the number of substitutions, or an empty
        # string when nothing matched, so fall back to 0.
        # Only the subroutine's copy of $dna is modified.
        $count = ($dna =~ s/$base//ig) || 0;
        return $count;
    }
Output:
    tagATAGAC
    a: 4 t: 2 c: 1 g: 2

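The question of whether a subroutine can be made more general applies here too: calling a counting routine once per base scans the sequence four times. A possible next step, sketched below, tallies every character in a single pass. It assumes hashes are acceptable at this point in the course, and the name count_all_bases is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: tallies every character of a sequence in a hash,
# ignoring case, and returns a reference to that hash.
sub count_all_bases {
    my ($dna) = @_;
    my %count;
    $count{$_}++ foreach split //, uc($dna);
    return \%count;
}

my $dna    = "tagATAGAC";
my $counts = count_all_bases($dna);

print "$dna\n";
foreach my $base (qw/A C G T/) {
    my $n = $counts->{$base} || 0;      # a base that never occurs has no key
    print lc($base), ": $n ";
}
print "\n";
```

Returning a hash reference keeps the result as a single scalar, which fits the earlier point about passing and returning references instead of flattened lists.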
Program Design: Managing Complexity
• Understand inputs and outputs
• Use pseudocode to refine your algorithm
• Use divide-and-conquer to turn big problems into manageable pieces
  • within a chromosome, process one gene at a time
  • within each gene, process one reading frame at a time
  • within each reading frame, process one ORF at a time
• Pick data structures that make algorithms easier
  • this gets easier with experience!
• Write subroutines to
  • transform one data object to another, for example:
    • dna (string) to reading frame (array of codons)
    • reading frame to orf
  • perform some well defined task
    • compute some statistics on a single data object
    • produce final output format
• Write small programs (drivers) to test each subroutine before combining them together

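As an illustration of "dna (string) to reading frame (array of codons)" together with a small driver, here is a minimal sketch. The subroutine name dna_to_codons, the 1-based frame numbering, and the choice to drop a trailing partial codon are all assumptions made for this example; only forward frames are handled:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: split a DNA string into codons for one
# forward reading frame (1, 2, or 3). A trailing partial codon is dropped.
sub dna_to_codons {
    my ($dna, $frame) = @_;
    my @codons;
    for (my $pos = $frame - 1; $pos + 3 <= length($dna); $pos += 3) {
        push @codons, substr($dna, $pos, 3);
    }
    return @codons;
}

# Driver: run the subroutine on a short sequence whose codons
# are easy to check by hand
my $dna = 'CGACGTCTTCTAAGGCGA';
foreach my $frame (1 .. 3) {
    my @codons = dna_to_codons($dna, $frame);
    print "frame $frame: @codons\n";
}
```

A driver like this stays useful after the subroutine is combined with others, because it can be rerun whenever the subroutine changes.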
Some Good Programming References
• Algorithms + Data Structures = Programs (Prentice-Hall Series in Automatic Computation), Niklaus Wirth
• Introduction to Algorithms, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein

Read The Fine Manual (RTFM)
• The more you read manuals, the easier it will be
• For each function we have covered tonight, read the corresponding description in Ch. 29 of Wall
• If you find something in the manual you don't understand, look it up (or ask someone)
• Learn to use the online help pages, e.g.,
    % perldoc -f join
• To see a list of online tutorials, see
    % perldoc perl
  For example:
    % perldoc perlstyle
• The interface is somewhat vi-like

Debugging Strategies
• Before running the program, always run
    % perl -c prog
• Read the warnings and error messages from the compiler carefully
• Always use strict and use warnings
• Basic strategy: bottom-up debugging
  • Test and debug one subroutine at a time
• Insert print statements
  • to figure out where a program fails
  • to print values of variables
  • Comment out when not needed - don't remove!

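One alternative to commenting print statements in and out is to guard them with a single flag. The sketch below is only one possible convention: the $DEBUG variable, the debug helper, and the mean subroutine are all made up for illustration. Messages go to STDERR so they do not mix with the program's normal output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed convention: one flag turns all debugging output on or off.
my $DEBUG = 1;

# Hypothetical helper: print a labeled message to STDERR when $DEBUG is set.
sub debug {
    my ($label, @values) = @_;
    print STDERR "DEBUG $label: @values\n" if $DEBUG;
    return;
}

# Example subroutine being debugged: the mean of a list of numbers.
sub mean {
    my @numbers = @_;
    debug('mean input', @numbers);
    return 0 unless @numbers;           # avoid dividing by zero
    my $sum = 0;
    $sum += $_ foreach @numbers;
    debug('mean sum', $sum);
    return $sum / @numbers;
}

print "mean: ", mean(2, 4, 9), "\n";
```

Setting $DEBUG to 0 silences every message at once while leaving the print statements in place for the next debugging session.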
Starting the Debugger
    [binf:~/binf634/workspace/binf634_book_examples] jsolka% perl -d example-6-4.pl

    Loading DB routines from perl5db.pl version 1.28
    Editor support available.

    Enter h or `h h' for help, or `man perldebug' for more help.

    main::(example-6-4.pl:11):      my $dna = 'CGACGTCTTCTAAGGCGA';
      DB<1>

Getting Help Within the Debugger - I
      DB<2> h
    List/search source lines:               Control script execution:
      l [ln|sub]  List source code            T           Stack trace
      - or .      List previous/current line  s [expr]    Single step [in expr]
      v [line]    View around line            n [expr]    Next, steps over subs
      f filename  View source in file         <CR/Enter>  Repeat last n or s
      /pattern/ ?patt?   Search forw/backw    r           Return from subroutine
      M           Show module versions        c [ln|sub]  Continue until position
    Debugger controls:                        L           List break/watch/actions
      o [...]     Set debugger options        t [expr]    Toggle trace [trace expr]
      <[<]|{[{]|>[>] [cmd] Do pre/post-prompt b [ln|event|sub] [cnd] Set breakpoint
      ! [N|pat]   Redo a previous command     B ln|*      Delete a/all breakpoints
      H [-num]    Display last num commands   a [ln] cmd  Do cmd before line
      = [a val]   Define/list an alias        A ln|*      Delete a/all actions
      h [db_cmd]  Get help on command         w expr      Add a watch expression
      h h         Complete help page          W expr|*    Delete a/all watch exprs
      |[|]db_cmd  Send output to pager        ![!] syscmd Run cmd in a subprocess
      q or ^D     Quit                        R           Attempt a restart

Getting Help With the Debugger - II
    Data Examination:     expr     Execute perl code, also see: s,n,t expr
      x|m expr       Evals expr in list context, dumps the result or lists methods.
      p expr         Print expression (uses script's current package).
      S [[!]pat]     List subroutine names [not] matching pattern
      V [Pk [Vars]]  List Variables in Package.  Vars can be ~pattern or !pattern.
      X [Vars]       Same as "V current_package [Vars]".  i class inheritance tree.
      y [n [Vars]]   List lexicals in higher scope <n>.  Vars same as V.
      e     Display thread id     E Display all thread ids.
    For more help, type h cmd_letter, or run man perldebug for all docs.

Stepping Through Statements With the Debugger
    main::(example-6-4.pl:11):      my $dna = 'CGACGTCTTCTAAGGCGA';
      DB<2> p $dna

      DB<3>
      DB<3> n
    main::(example-6-4.pl:12):      my @dna;
      DB<6> l
    12==>   my @dna;
    13:     my $receivingcommittment;
    14:     my $previousbase = '';
    15
    16:     my$subsequence = '';
    17
    18:     if (@ARGV) {
    19:       my$subsequence = $ARGV[0];
    20      }else{
    21:       $subsequence = 'TA';
      DB<6> p $dna
    CGACGTCTTCTAAGGCGA

Using the Perl Debugger
      DB<7> n
    main::(example-6-4.pl:13):      my $receivingcommittment;
      DB<7> n
    main::(example-6-4.pl:14):      my $previousbase = '';
      DB<7> n
    main::(example-6-4.pl:16):      my$subsequence = '';
      DB<7> n
    main::(example-6-4.pl:18):      if (@ARGV) {
      DB<7> n
    main::(example-6-4.pl:21):        $subsequence = 'TA';
      DB<7> n
    main::(example-6-4.pl:24):      my $base1 = substr($subsequence, 0, 1);

Using the Perl Debugger
      DB<7> n
    main::(example-6-4.pl:25):      my $base2 = substr($subsequence, 1, 1);
      DB<7> n
    main::(example-6-4.pl:28):      @dna = split ( '', $dna );
      DB<7> p $base1
    T
      DB<8> p $base2
    A
      DB<9>
      DB<9> n
    main::(example-6-4.pl:39):      foreach (@dna) {
      DB<9> p @dna
    CGACGTCTTCTAAGGCGA
      DB<10> p "@dna"
    C G A C G T C T T C T A A G G C G A
      DB<11>

Examining the Loop
      DB<12> l 39-52
    39==>   foreach (@dna) {
    40:       if ($receivingcommittment) {
    41:         print;
    42:         next;
    43        } elsif ($previousbase eq $base1) {
    44:         if ( /$base2/ ) {
    45:           print $base1, $base2;
    46:           $recievingcommitment = 1;
    47          }
    48        }
    49:       $previousbase = $_;
    50      }
    51
    52:     print "\n";
      DB<13>
      DB<13> b 40

Clearing Breakpoints and Exiting the Debugger
      DB<14> c
    main::(example-6-4.pl:40):        if ($receivingcommittment) {
      DB<14> p
    C
      DB<16> B
    Deleting a breakpoint requires a line number, or '*' for all
      DB<18> q
• For additional discussions please see
  • Ch. 20 of Wall or Ch. 6 of Tisdall

Modules and Libraries - I
• We will have more to say about this later
• We will collect subroutines into handy files called modules or libraries
• We tell the Perl compiler to utilize a particular module with the “use” command

Modules and Libraries - II
• Modules end in .pm
    BeginPerlBioinfo.pm
• The last line in a module must be
    1;
• So we would access this module by putting the line
    use BeginPerlBioinfo;
• If the Perl compiler can’t find it you may have to tell it the path
    use lib '/home/tisdall/book';
    use BeginPerlBioinfo;

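To show the shape of a module without needing a second file, the sketch below keeps the module and the main program in one script. The package name SeqUtils and the subroutine revcom are hypothetical and are not taken from BeginPerlBioinfo.pm; in practice the first part would be saved as SeqUtils.pm and loaded with use SeqUtils;

```perl
#!/usr/bin/perl
use strict;
use warnings;

# --- Contents that would normally live in a separate file, SeqUtils.pm ---
package SeqUtils;          # hypothetical module name

# Return the reverse complement of a DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;             # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;      # complement each base
    return $rc;
}

1;    # a module file must end with a true value

# --- Main program; with a separate file this would begin: use SeqUtils; ---
package main;

print SeqUtils::revcom('ATGC'), "\n";  # call with the fully qualified name
```

Calling the subroutine by its fully qualified name works without any export machinery, which keeps this first example small.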
POD (Ch. 26 in Wall)
• Plain Old Documentation produces self-documenting programs
  • Comments can be extracted and formatted by external programs called translators
  • Keeps program documentation consistent with external documentation
• pod text begins with "=identifier" at the start of a line
  • but only where the compiler is expecting a new statement
• All text is ignored by the compiler until the next line starting with "=cut"
• Various translators produce formatted documentation
  • perldoc, pod2text, pod2html, pod2latex, etc.
  • details of the format depend on the identifier

=pod

Put any number of lines of comments here. They will appear in the
proper format when processed by pod translators.

=cut

# program text goes here

=begin comment

The identifier indicates which translator should process this text.
This text will be ignored by all translators. Use this for internal
documentation only.

=cut

# more program text ...

=head1 Section Heading

text goes here, for example:

=head1 SYNOPSIS

usage: fasta.pl fastafile

=over

This starts a list:

=item *

First item in a list.

=item *

Second item.

=back

=cut

An Example Program
#!/usr/bin/perl

=head1 NAME

arglist.pl

=head1 AUTHOR

Jeff Solka

=head1 SYNOPSIS

usage: arglist.pl arg1 arg2 ...

=head1 DESCRIPTION

Echoes out the command line arguments.

=over

=item *

First item in a list.

=item *

Second item.

=back

=cut

### main program
print "The arguments are: @ARGV\n";
exit;

Our Program in Action
    [binf:fall09/binf634/mycode] jsolka% arglist.pl cat
    The arguments are: cat

pod2text Acting on Our Program
    [binf:fall09/binf634/mycode] jsolka% pod2text arglist.pl
    NAME
        arglist.pl

    AUTHOR
        Jeff Solka

    SYNOPSIS
        usage: arglist.pl arg1 arg2 ...

    DESCRIPTION
        Echoes out the command line arguments.

        *   First item in a list.

        *   Second item.
• See Ch. 26 for other formatting tricks.

perldoc Acting on Our Program
    [binf:fall09/binf634/mycode] jsolka% perldoc arglist.pl > arglist.mp
    [binf:fall09/binf634/mycode] jsolka% cat arglist.mp
    ARGLIST(1)     User Contributed Perl Documentation     ARGLIST(1)

    NAME
           arglist.pl

    AUTHOR
           Jeff Solka

    SYNOPSIS
           usage: arglist.pl arg1 arg2 ...

    DESCRIPTION
           Echoes out the command line arguments.

           o   First item in a list.

           o   Second item.

    perl v5.8.8                  2009-09-20                 ARGLIST(1)