CS 360 Perl Part 2
Remember • Assignments are Due Today • Make a web page for all of your assignments • You get to design it so it works well for you • Email the TA with the URL • Lab is due Monday • Clean out your directory • We are asking for more space
Example • dna1.txt has the following in it. • hopper acgtacacactgca • flick acgcacacattgca • spot acgcacaccttgca • How would you write a Perl script to read this file in and print it? (Assume that the file name is given on the command line.)
Whoa!! What about the command line • How about @ARGV or @_? (@ARGV holds the command-line arguments; @_ holds the arguments passed to a subroutine.) • #!/usr/bin/perl • print "ARGV=",@ARGV, "\n"; • print "ARGV[0]=",$ARGV[0], "\n"; • print "ARGV[1]=",$ARGV[1], "\n"; • open(DNAFILE, $ARGV[0]) or die "can't open $ARGV[0]: $!"; • while(<DNAFILE>) { • print "line $_\n"; • }
Now search for the taxon “hopper” Whoa!! How do I search!! • print "line $_\n"; • @words = split /\s+/, $_; • print "$words[0] \n"; • print "$words[1] \n"; • if($words[0] eq "hopper") { • print "found hopper: dna $words[1]\n"; • } else { • print "not hopper\n"; • }
Perl Regular Expressions (reg.pl) • Extremely Powerful Text Processing. • One of Perl's most useful yet most misunderstood features • ‘=~’ indicates a regexp match • if ($var =~ /BLAH/) – Match anywhere in the string • if ($var =~ /^BLAH/) – Start of String • if ($var =~ /BLAH$/) – End of String • if ($var =~ /\w+/) – One or more word characters • \w - Word characters (letters, digits, underscore) • \d - Digits • \s - Whitespace • . - Any character except newline
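The patterns listed above can be tried out on lines from dna1.txt. This is a minimal sketch, not the contents of reg.pl: the subroutine name describe is made up for illustration, and the character class [acgt] and the capture groups $1 and $2 go slightly beyond the items on the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report which of the slide's patterns match a line.
sub describe {
    my ($line) = @_;
    my @notes;
    push @notes, "starts with 'hop'" if $line =~ /^hop/;   # ^ anchors at the start
    push @notes, "ends with 'gca'"   if $line =~ /gca$/;   # $ anchors at the end
    push @notes, "contains a digit"  if $line =~ /\d/;
    # Capture a name (word characters), whitespace, then a run of a/c/g/t.
    if ($line =~ /^(\w+)\s+([acgt]+)$/) {
        push @notes, "name=$1 dna=$2";
    }
    return join("; ", @notes);
}

print describe("hopper acgtacacactgca"), "\n";
print describe("flick acgcacacattgca"), "\n";
```

Parentheses in a pattern save what they matched in $1, $2, and so on, which is often handier than splitting the line first.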
Now search for the taxa “hopper” or “flick” Whoa!! How do I search!! • print "line $_\n"; • @words = split /\s+/, $_; • print "$words[0] \n"; • print "$words[1] \n"; • if($words[0] =~ /^hop|^fli/) { • print "found hopper or flick: dna $words[1]\n"; • } else { • print "not either\n"; • }
Perl – Everything else • Supplied Perl Docs • man perlfunc • man perlfaq • man perlsyn • man perlre (fun) • The Perl Bible • “Perl In a Nutshell” – O’Reilly
Now for some Magic • $dna = $words[1]; • print "dna $dna \n"; • $reversed = scalar reverse $dna; • print "reversed dna $reversed \n"; • $rna = $reversed; • $rna =~ tr/acgt/UGCA/;
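The reverse-then-translate steps above could be wrapped in a subroutine so they can be reused for every line of the file. A minimal sketch; the name reverse_transcribe is an assumption made for illustration, and the tr mapping is copied from the slide.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the DNA string, then map each base to its
# complementary RNA base (a->U, c->G, g->C, t->A), as on the slide.
sub reverse_transcribe {
    my ($dna) = @_;
    my $rna = scalar reverse $dna;   # scalar context reverses the characters
    $rna =~ tr/acgt/UGCA/;
    return $rna;
}

print reverse_transcribe("acgtacacactgca"), "\n";
```

Note that tr works character by character; it is not a regular expression even though it uses the =~ operator.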
Hash Tables • Index using a key (any string) rather than a number
# Hashes
my %organisms = (
    grasshoppers => "gh",
    fleas => "fs",
    lobster => "lb",
);
my @names = keys %organisms;     # keys come back in no particular order
my @abbrev = values %organisms;  # values come back in the same order as the keys
print "names ", $names[1], ", abbrev ", $abbrev[1], "\n";
print "fleas = ", $organisms{"fleas"}, "\n";
print %organisms, "\n";
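A few more everyday hash operations, sketched on the same %organisms data. The extra key ants and its abbreviation are made up for the example.

```perl
use strict;
use warnings;

my %organisms = (
    grasshoppers => "gh",
    fleas        => "fs",
    lobster      => "lb",
);

# Sort the keys to get a repeatable order; keys alone gives no particular order.
for my $name (sort keys %organisms) {
    print "$name => $organisms{$name}\n";
}

# exists tests for a key without creating it.
print "no entry for ants\n" unless exists $organisms{ants};

# Add one entry and delete another.
$organisms{ants} = "an";
delete $organisms{fleas};
print scalar(keys %organisms), " entries\n";
```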
Nucleotide Translation • $codonMap{"gct"}="A"; #Alanine • $codonMap{"tgt"}="C"; #Cysteine • $codonMap{"gcc"}="A"; #Alanine • $codonMap{"tgc"}="C"; #Cysteine • $mrna="gcttgtgcctgc"; • my $pro=""; • while ($mrna=~s/(...)//) { # Dots match any character • print "codon $1\n"; • $pro=$pro.$codonMap{$1}; • print "pro $pro\n"; • }
Now for the fun stuff !! • cd ~/public_html • vi hits.cgi • :r ~clement/public_html/hits.cgi • :wq • chmod a+x hits.cgi • Open in a web browser: http://students.cs.byu.edu/~you/hits.cgi
Why use subroutines? • They will make your program • Shorter, since you're reusing the code. • Easier to test, since you can test the subroutine separately. • Easier to understand, since it reduces clutter and better organizes programs. • More reliable, since you have less code when you reuse subroutines, so there are fewer opportunities for something to go wrong. • Faster to write, since you may, for example, have already written some subroutines that handle basic statistics and can just call the one that calculates the mean without having to write it again. Or better yet, you found a good statistics library someone else wrote, and you never had to write it at all.
Description • Like many languages, Perl provides for user-defined subroutines. • The Perl model for subroutine call and return values is simple: all subroutines are passed as parameters one single flat list of scalars, and all subroutines likewise return to their caller one single flat list of scalars. • Any arguments passed to the subroutine come in as the array @_. • The return value of the subroutine is the value of the last expression evaluated. Alternatively, a return statement may be used to exit the subroutine
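The calling model described above can be seen in a few lines. This is a minimal sketch; the subroutine names count_args, longest, and min_max are made up for illustration.

```perl
use strict;
use warnings;

# All arguments arrive in @_ as one flat list, even when the caller passes arrays.
sub count_args {
    return scalar @_;
}

# No explicit return: the value of the last expression evaluated is returned.
sub longest {
    my ($x, $y) = @_;
    length($x) >= length($y) ? $x : $y;
}

# A subroutine can return a list; the caller unpacks it with list assignment.
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    return ($sorted[0], $sorted[-1]);
}

my @first  = (1, 2, 3);
my @second = (4, 5);
print "count_args saw ", count_args(@first, @second, 6), " scalars\n";
print "longest: ", longest("acgt", "acgtac"), "\n";
my ($min, $max) = min_max(7, 3, 9, 1);
print "min $min, max $max\n";
```

Because the two arrays are flattened into one list, the subroutine cannot tell where @first ends and @second begins; that is the motivation for passing references.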
To declare subroutines
sub NAME;               # A "forward" declaration.
sub NAME(PROTO);        # ditto, but with prototypes
sub NAME BLOCK          # A declaration and a definition.
sub NAME(PROTO) BLOCK   # ditto, but with prototypes
To call subroutines
NAME(LIST); OR &NAME(LIST);   # & is optional with parentheses.
NAME LIST;                    # Parentheses optional if predeclared/imported.
&NAME;                        # Makes current @_ visible to called subroutine.
Example (similarity.pl)
sub percent_identity {
    my ($seq1, $seq2) = @_;
    my $len1 = length $seq1;
    my $len2 = length $seq2;
    my $num_mismatches = 0;
    # Compare position by position over the length of the first sequence
    for my $i (0..$len1-1) {
        if (substr($seq1, $i, 1) ne substr($seq2, $i, 1)) {
            $num_mismatches++;
        }
    }
    # Extra characters in a longer second sequence count as mismatches
    if ($len2 > $len1) {
        $num_mismatches += ($len2-$len1);
    }
    return (($len1-$num_mismatches)*100/$len1);
}
$seq1="acctgaatg";
$seq2="atcgtgagtg";
print "percent identity = ". percent_identity($seq1, $seq2) . "\n";
Scoping and Arguments • The variables declared with my belong only to the block in which they are declared. In our example, $DNA has effect only in the subroutine. • You don’t have to worry about name conflicts outside the subroutine • You don’t have to worry about accidentally changing the values of some variables • All the arguments passed to the subroutine are stored in the array @_. You can access parameters with • my $DNA = $_[0];
Another simple example
#!/usr/bin/perl -w
# Counting the number of G's in some DNA
my($DNA) = "ACGAGCTGCGAGGCGACTAGCGAGCTAGCGATCAGCTA";
# Call the routine that does the real work and collect the result.
my($number_of_Gs) = countG($DNA);
print "\nThe DNA sequence $DNA has $number_of_Gs G\'s in it.\n\n";
exit;
#########################################################
# Subroutines
#########################################################
sub countG {
    my($DNA) = @_;
    my($count) = 0;
    $count = ($DNA =~ tr/Gg//);
    return $count;
}
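Pass by value • When a subroutine copies its arguments out of @_ (for example my($DNA) = @_;), it works on copies of the values. • Whatever happens to those copies in the subroutine doesn't affect the values of the arguments in the main program. • The elements of @_ themselves are aliases for the caller's variables, so assigning to $_[0] directly does change the caller's value.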
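The difference between working on a copy and working on @_ directly can be checked with two short subroutines. A minimal sketch; the names upcase_copy and upcase_in_place are made up for illustration.

```perl
use strict;
use warnings;

# Copies its argument first, so the caller's variable is untouched.
sub upcase_copy {
    my ($dna) = @_;
    $dna = uc $dna;
    return $dna;
}

# Assigns to $_[0] directly; @_ aliases the caller's variables,
# so the caller sees the change.
sub upcase_in_place {
    $_[0] = uc $_[0];
    return;
}

my $seq  = "acgt";
my $copy = upcase_copy($seq);
print "after upcase_copy: seq=$seq copy=$copy\n";
upcase_in_place($seq);
print "after upcase_in_place: seq=$seq\n";
```

Copying with my (...) = @_; at the top of every subroutine is the usual habit, since it gives the pass-by-value behavior most people expect.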
Pass-by-reference (reference.pl)
#!/usr/bin/perl
# Example of pass-by-reference (a.k.a. call-by-reference)
use strict;
use warnings;
my @i = ('1', '2', '3');
my @j = ('a', 'b', 'c');
reference_sub(\@i, \@j);
print "In main program after calling subroutine: i = " . "@i\n";
print "In main program after calling subroutine: j = " . "@j\n";
exit;
############################################################
# Subroutine
############################################################
sub reference_sub {
    my ($i, $j) = @_;
    print "In subroutine : i = " . "@$i\n";
    print "In subroutine : j = " . "@$j\n";
    # push and shift are built-in functions on arrays
    push(@$i, '4');
    shift(@$j);
}
Output:
In subroutine : i = 1 2 3
In subroutine : j = a b c
In main program after calling subroutine: i = 1 2 3 4
In main program after calling subroutine: j = b c
Pass-by-reference • To pass a parameter by reference, you have to preface the name of the parameter with a backslash. • \@i is a reference to array @i. • In the subroutine, $i gets the value of \@i, so it is also a reference to array @i. • When argument variables are passed in this fashion, anything you do to the values of the argument variables in the subroutine also affects the values of the arguments in the main program.
Arrays • Two Dimensional Arrays
#!/usr/bin/perl
$gap = -2;
$st1="acgtactacg";
$st2="acctaccacgt";
$n1=length($st1);
$n2=length($st2);
# Set the last column to 0; Perl creates each row of @M the first time it is used
for(my $i = $n1-1; $i >= 0; $i--) {
    $M[$i][$n2-1] = 0;
}
Accessing the Matrix sub printmatrix { print "n1 $n1, n2 $n2\n"; for(my $i = 0; $i < $n1; $i++) { for(my $j = 0; $j < $n2; $j++) { print "M[$i][$j]= $M[$i][$j],"; } print "\n"; } }
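The matrix code above relies on global variables. As a sketch of the same idea under use strict, the subroutine below (make_matrix is a made-up name) fills every cell and hands back a reference, so the two-dimensional array is not flattened on return.

```perl
use strict;
use warnings;

# Hypothetical helper: build a $rows x $cols matrix with every cell set to $value.
sub make_matrix {
    my ($rows, $cols, $value) = @_;
    my @M;
    for my $i (0 .. $rows - 1) {
        for my $j (0 .. $cols - 1) {
            $M[$i][$j] = $value;   # each row springs into existence on first use
        }
    }
    return \@M;   # a reference keeps the rows intact
}

my $M = make_matrix(2, 3, 0);
$M->[1][2] = -2;   # e.g. store a gap penalty in the last cell
for my $row (@$M) {
    print join(" ", @$row), "\n";
}
print "rows: ", scalar(@$M), ", cols: ", scalar(@{ $M->[0] }), "\n";
```

A Perl two-dimensional array is really an array of references to row arrays, which is why $M->[1][2] and $M->[1]->[2] mean the same thing.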
Perl Modules • Allow you to make your code modular • Create an Object Oriented interface • Separate your code into separate files so changes won't be made to working code
Modules • Similar idea to libraries in C. • use CGI; • Useful Modules • CGI – CGI routines. • DBI – Database Connectivity. • strict – Makes you code all proper like. • Data::Dumper – Debugging large objects. • XML::Simple – Simple XML Parsing. • Always ‘use strict’!
What is a Module? • A module is a .pm file that defines a library of related functions • Modules are conceptually similar to old-fashioned Perl libraries (.pl files), but have a cleaner implementation • selective namespace cluttering • simpler function invocation
Example (pasture1.pl) sub Cow::speak { print "a Cow goes moooo!\n"; } sub Horse::speak { print "a Horse goes neigh!\n"; } sub Sheep::speak { print "a Sheep goes baaaah!\n" } @pasture = qw(Cow Cow Horse Sheep Sheep); foreach $animal (@pasture) { $animal->speak; }
Arguments Class->method(@args) attempts to invoke subroutine "Class::method" as: Class::method("Class", @args);
Simplifying sub Sheep::speak { my $class = shift; print "a $class goes baaaah!\n"; }
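The rule that Class->method(@args) calls Class::method("Class", @args) can be checked directly. In this sketch Sheep::speak returns a string instead of printing, purely so the two calls are easy to compare; the argument values are made up.

```perl
use strict;
use warnings;

sub Sheep::speak {
    my ($class, @args) = @_;   # the class name arrives as the first argument
    return "class=$class args=@args";
}

my $via_arrow = Sheep->speak("loudly", "twice");
my $direct    = Sheep::speak("Sheep", "loudly", "twice");
print "$via_arrow\n";
print "$direct\n";
```

The arrow form is preferred because it also searches parent classes, which the direct call does not.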
A second method (pasture2.pl) { package Cow; sub sound { "moooo" } sub speak { my $class = shift; print "a $class goes ", $class->sound, "!\n" } } @pasture = qw(Cow Cow Horse Sheep Sheep); foreach $animal (@pasture) { $animal->speak; }
Inheritance (pasture3.pl)
{ package Animal;
  sub speak {
    my $class = shift;
    print "a $class goes ", $class->sound, "!\n"
  }
}
{ package Cow;
  @ISA = qw(Animal);
  sub sound { "moooo" }
}
On $animal->speak, Perl looks for "Cow::speak". But that’s not there, so Perl checks for the inheritance array @Cow::ISA. It’s there, and contains the single name "Animal". Perl then looks for "Animal::speak", finds it, and calls it with "Cow" as the first argument, so $class->sound resolves to Cow::sound.
Overriding (pasture4.pl) { package Mouse; @ISA = qw(Animal); sub sound { "squeak" } sub speak { my $class = shift; print "a $class goes ", $class->sound, "!\n"; print "[but you can barely hear it!]\n"; } }
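The pasture pieces can be put together into one runnable file. This is a sketch with two changes from the slides, both assumptions made for the example: speak returns its sentence instead of printing it, and Mouse::speak reuses the parent method through SUPER:: rather than repeating its body. Under use strict, @ISA has to be declared with our.

```perl
use strict;
use warnings;

{
    package Animal;
    # Shared behavior: every animal speaks the same way, using its own sound.
    sub speak {
        my $class = shift;
        return "a $class goes " . $class->sound . "!";
    }
}
{
    package Cow;
    our @ISA = ('Animal');   # Cow inherits speak from Animal
    sub sound { "moooo" }
}
{
    package Mouse;
    our @ISA = ('Animal');
    sub sound { "squeak" }
    # Overrides Animal::speak, then calls the inherited version via SUPER::.
    sub speak {
        my $class = shift;
        return $class->SUPER::speak() . " [but you can barely hear it!]";
    }
}

for my $animal (qw(Cow Mouse)) {
    print $animal->speak, "\n";
}
```

Calling through SUPER:: keeps the shared sentence in one place, so a later change to Animal::speak carries over to Mouse as well.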
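How to use a Module
• test.pl
use Foo;
Foo::bar();
• Foo.pm
package Foo;
use Exporter 'import';   # supplies the import method that honors @EXPORT
our @EXPORT = qw(bar);
sub bar { print "hello\n"; }
1;   # a module file must end with a true value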
Package Names and Filenames • Package name is declared on line 1 • This should be the same as the filename, without the .pm extension • If it is different, your functions will not be exported correctly • Should begin with a capital letter to avoid possible conflict with pragmas
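For experimenting without creating a separate Foo.pm, a package can be defined in the same file. This is a sketch of a single-file trick, not how modules are normally laid out: the BEGIN block and the %INC entry exist only to make use Foo believe that Foo.pm has already been loaded. The choice of @EXPORT_OK (export on request) over @EXPORT is also an assumption of the example.

```perl
use strict;
use warnings;

BEGIN {
    package Foo;
    use Exporter 'import';        # supplies Foo->import
    our @EXPORT_OK = qw(bar);     # bar is exported only when asked for
    sub bar { return "hello" }
    $INC{'Foo.pm'} = 1;           # pretend Foo.pm was loaded from disk
}

use Foo qw(bar);                  # calls Foo->import('bar')

print bar(), "\n";                # imported name
print Foo::bar(), "\n";           # fully qualified name always works
```

With a real Foo.pm on disk, the BEGIN wrapper and the %INC line go away and the file ends with 1; instead.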
Summary • Modules are libraries of functions • A simple module just exports a set of functions • Perl modules can be expanded in many directions for arbitrarily sophisticated libraries