Introduction to Perl Bioinformatics
What is Perl?
• Practical Extraction and Report Language
• A scripting language
• Components
    • an interpreter
    • scripts: text files created by the user that describe a sequence of steps to be performed by the interpreter

Installation
• Create a Perl directory under C:\
• Either
    • download AP.msi from the course website (http://curry.ateneo.net/~jpv/BioInf07/) and execute it (installs into the C:\Perl directory)
    • or download AP.zip and unzip it into C:\Perl
• Reset the path variable first (or edit C:\autoexec.bat) so that you can execute scripts from MS-DOS
    C> path=%path%;c:\Perl\bin

Writing and Running Perl Scripts
• Create/edit script (extension: .pl)
    C> edit first.pl
• Execute script
    C> perl first.pl
• Tip: place your scripts in a separate work directory

    # my first script
    print "Hello World";
    print "this is my first script";

Perl Features
• Statements
• Strings
• Numbers and Computation
• Variables and Interpolation
• Input and Output
• Files
• Conditions and Loops
• Pattern Matching
• Arrays and Lists

Statements
• A Perl script is a sequence of statements
• Examples of statements

    print "Type in a value";
    $value = <>;
    $square = $value * $value;
    print "The square is ", $square, "\n";

Comments
• Lines that start with # are ignored by the Perl interpreter
    # this is a comment line
• Within a line, characters that follow # are also ignored
    $count = $count + 1; # increment $count

Strings
• String
    • Sequence of characters
    • Text
• In Perl, strings should be surrounded by quotes
    • 'I am a string'
    • "I am a string"
• Special characters are specified through escape sequences (preceded by a \)
    • "a newline\n and a tab\t"

Numbers
• Integers are specified as a sequence of digits
    • 6
    • 453
• Decimal numbers:
    • 33.2
    • 6.04E24 (scientific notation)

Variables
• Variable: named storage for values (such as strings and numbers)
• Names are preceded by a $
• Sample use:

    $count = 5;          # assignment statement
    $message = "Hello";  # another assignment
    print $count;        # print the value of a variable

Computation
• Fundamental arithmetic operations:
    • + - * /
• Others
    • ** exponentiation
    • () grouping
• Example (try this out as a Perl script)

    $x = 4;
    $y = 2;
    $z = (3 + $x) ** $y;
    print $z, "\n";

Interpolation
• Given the following script:

    $x = "Smith";
    print "Good morning, Mr. $x";
    print 'Good morning, Mr. $x';

• Strings quoted with "" expand variables; strings quoted with '' do not
• Escape characters like \n are also interpreted when strings are quoted with "" but not when they are quoted with ''

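The difference between the two quoting styles can be seen by building both strings and printing them. This is a minimal sketch; the `greetings` helper, `use strict`, and `my` declarations are additions for illustration and do not appear in the slides.

```perl
use strict;
use warnings;

# Build the double-quoted and single-quoted versions of the same greeting.
sub greetings {
    my ($x) = @_;
    my $double = "Good morning, Mr. $x\n";   # $x is expanded, \n becomes a newline
    my $single = 'Good morning, Mr. $x\n';   # kept literally, including the \n
    return ($double, $single);
}

my ($double, $single) = greetings("Smith");
print $double;           # Good morning, Mr. Smith
print $single, "\n";     # Good morning, Mr. $x\n
```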
Input and Output
• Output
    • print function
    • Escape characters
    • Interpolation
• Input
    • Angle-bracket operator (e.g., $line = <>; )
    • Not typed (takes in strings or numbers)

Input Files
• Opening a file
    open INFILE, 'data.txt';
• Input
    $line = <INFILE>;
• Closing a file
    close INFILE;

Output Files
• Opening
    open OUTFILE, '>result.txt';
    open OUTFILE, '>>result.txt';   # or this form, to append
• Output
    print OUTFILE "Hello";
• Closing files
    close OUTFILE;

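The file operations above can be combined into one round trip: write a file, append to it, and read it back. This sketch swaps in the three-argument form of `open` with lexical filehandles and an error check, which is a common modern style and not what the slides use. The helper names `write_line` and `read_lines` are made up for illustration, and the array used to collect lines goes slightly beyond what the slides have covered so far.

```perl
use strict;
use warnings;

# Write one line to a file; $mode is '>' (overwrite) or '>>' (append).
sub write_line {
    my ($mode, $path, $text) = @_;
    open(my $out, $mode, $path) or die "Cannot open $path: $!";
    print $out $text, "\n";
    close $out;
}

# Read every line of a file, removing the trailing newline from each.
sub read_lines {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my @lines;
    while (defined(my $line = <$in>)) {
        chomp $line;
        push @lines, $line;
    }
    close $in;
    return @lines;
}

my $file = 'result.txt';            # file name taken from the slide
write_line('>',  $file, 'Hello');   # creates or overwrites the file
write_line('>>', $file, 'World');   # appends a second line
print "$_\n" for read_lines($file);
```

Checking the result of `open` matters in practice: without the `or die`, a missing or unwritable file fails silently and later reads simply return nothing.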
Conditions
• Can execute statements conditionally
• Syntax:

    if ( condition )
    {
        statement
        statement
        …
    }

• Example:

    if ( $num > 1000 )
    {
        print "Large";
    }

If - Else

    $num = <>;
    if ( $num > 1000 )
    {
        print "Large number\n";
    }
    else
    {
        print "Small number\n";
    }
    print "Thanks\n";

Loops
• Repetitive execution
• Syntax:

    while ( condition )
    {
        statement
        statement
        …
    }

• Example:

    $count = 0;
    while ( $count < 10 )
    {
        print "counting-", $count;
        $count = $count + 1;
    }

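A loop and a condition are often used together. The sketch below counts how many numbers in a list exceed a limit, using only `while` and `if` as introduced above. The `count_large` helper and the sample numbers are hypothetical, and the array it walks over is a feature the slides list but have not yet described.

```perl
use strict;
use warnings;

# Count how many of the given numbers are greater than $limit.
sub count_large {
    my ($limit, @nums) = @_;
    my $count = 0;
    my $i = 0;
    while ( $i < @nums ) {              # @nums in a comparison gives its length
        if ( $nums[$i] > $limit ) {
            $count = $count + 1;
        }
        $i = $i + 1;
    }
    return $count;
}

print count_large(1000, 5, 2000, 1000, 1500), "\n";   # prints 2
```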
Conditions
• ( expr symbol expr )
• Numbers
    ==  equal
    !=  not equal
    <   less than
    >   greater than
    <=  less than or equal
    >=  greater than or equal
• Strings
    eq  ne  lt  gt  le  ge
    =~  pattern match

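Choosing between the numeric and the string operators matters because the same two values can compare differently. A small sketch, with a hypothetical `compare_both` helper:

```perl
use strict;
use warnings;

# Compare two values once as numbers (<) and once as strings (lt).
sub compare_both {
    my ($x, $y) = @_;
    my $numeric = ( $x < $y )  ? "less" : "not less";
    my $string  = ( $x lt $y ) ? "less" : "not less";
    return ($numeric, $string);
}

my ($numeric, $string) = compare_both(9, 10);
print "numeric: 9 is $numeric than 10\n";   # numeric: 9 is less than 10
print "string:  9 is $string than 10\n";    # string:  9 is not less than 10
```

As strings, "9" sorts after "10" because the comparison goes character by character and "9" comes after "1".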
Functions
• length $str returns the number of characters in $str
• defined $str tests whether $str holds a defined value (useful for testing whether $line=<>; succeeded)
• chomp $str removes a trailing newline from $str (useful because $line=<>; includes the newline character)
• print $var displays $var on the output device

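These functions are typically used together when handling a line of input. The following sketch uses a hypothetical `clean_length` helper, and the value -1 for undefined input is an arbitrary choice made for this example.

```perl
use strict;
use warnings;

# Return the length of a line without its trailing newline,
# or -1 if the value is undefined (as <> returns at end of input).
sub clean_length {
    my ($line) = @_;
    return -1 unless defined $line;
    chomp $line;
    return length $line;
}

print clean_length("ACGT\n"), "\n";   # 4: the newline is removed before counting
print clean_length("ACGT"), "\n";     # 4: no newline, so nothing is removed
print clean_length(undef), "\n";      # -1: undefined value
```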
Pattern Matching
• <string> =~ <pattern> is a condition that checks if a string matches a pattern
• Simplest case: <pattern> specifies a search substring
• Example: if ( $s =~ /bio/ ) …
    • holds TRUE if $s is "molecular biology", "bioinformatics", "the bionic man"
    • FALSE if $s is "chemistry", "bicycle", "a BiOpsy" (matching is case-sensitive)

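The TRUE and FALSE cases listed above can be checked directly. This is a short sketch; the `contains_bio` helper is made up for illustration.

```perl
use strict;
use warnings;

# Return 1 if the string contains the substring "bio", otherwise 0.
sub contains_bio {
    my ($s) = @_;
    return $s =~ /bio/ ? 1 : 0;
}

for my $s ("molecular biology", "bioinformatics", "the bionic man",
           "chemistry", "bicycle", "a BiOpsy") {
    print "$s: ", (contains_bio($s) ? "TRUE" : "FALSE"), "\n";
}
```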
Special pattern matching characters
• \w word character (letter, digit, or underscore)
• \d digit
• \s whitespace character (space, tab, newline)
• Example: if ( $s =~ /\w\w\s\d\d\d/ ) …
    • holds TRUE for "CS 123 course", "Take Ma 101 today"
    • FALSE for "Only 1 number here"

Special pattern matching characters
• . any character (except newline)
• ^ beginning of string or line
• $ end of string or line
• Example: if ( $s =~ /^\d\d\d\ss..r/ ) …
    • holds TRUE for "300 spartans"
    • FALSE for "all 100 stars"

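Both example patterns from the special-character slides can be tried against their sample strings. In this sketch the `matches` helper is hypothetical, and `qr//` (which stores a pattern in a variable) is not covered in the slides.

```perl
use strict;
use warnings;

# Report whether a string matches a pattern.
sub matches {
    my ($pattern, $s) = @_;
    return $s =~ $pattern ? "TRUE" : "FALSE";
}

my $course = qr/\w\w\s\d\d\d/;    # two word characters, whitespace, three digits
my $start  = qr/^\d\d\d\ss..r/;   # three digits at the start, whitespace, s, any two characters, r

print matches($course, "CS 123 course"), "\n";        # TRUE
print matches($course, "Take Ma 101 today"), "\n";    # TRUE
print matches($course, "Only 1 number here"), "\n";   # FALSE
print matches($start,  "300 spartans"), "\n";         # TRUE
print matches($start,  "all 100 stars"), "\n";        # FALSE
```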
Groups and Quantifiers
• [xyz] character set (any one of x, y, or z)
• | alternatives
• * zero or more
• + one or more
• ? zero or one
• {M} exactly M repetitions
• {M,N} between M and N repetitions

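A few of these constructs applied to DNA strings, as a hedged sketch. The helper names and sample sequences are invented for this example, and only the character set, `+`, `{M,N}`, and `|` are shown.

```perl
use strict;
use warnings;

# True if the whole string is one or more DNA letters: character set plus +.
sub is_dna {
    my ($s) = @_;
    return $s =~ /^[ACGT]+$/ ? 1 : 0;
}

# True if the whole string is three to five A's: the {M,N} quantifier.
sub is_short_poly_a {
    my ($s) = @_;
    return $s =~ /^A{3,5}$/ ? 1 : 0;
}

# True if the string ends with one of the three stop codons: alternatives.
sub ends_with_stop {
    my ($s) = @_;
    return $s =~ /(TAA|TAG|TGA)$/ ? 1 : 0;
}

print is_dna("GATTACA"), "\n";             # 1
print is_dna("GATXACA"), "\n";             # 0: X is not in the set
print is_short_poly_a("AAAA"), "\n";       # 1
print is_short_poly_a("AA"), "\n";         # 0: fewer than three
print ends_with_stop("ATGAAATAA"), "\n";   # 1
print ends_with_stop("ATGAAA"), "\n";      # 0
```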
NCBI File Example
    /VERSION\s+(\S+)\s+GI:(\S+)/
• Matches a version line
• Parentheses group characters for later retrieval
• $1 holds the version string (the first group); $2 holds the number after "GI:"
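The pattern can be applied to a line and the captured groups read back through $1 and $2. In the sketch below, the `parse_version` helper is hypothetical and the sample VERSION line is an assumed example of the format, not data from the slides.

```perl
use strict;
use warnings;

# Extract the version string and GI number from a VERSION line.
# Returns an empty list if the line does not match.
sub parse_version {
    my ($line) = @_;
    if ( $line =~ /VERSION\s+(\S+)\s+GI:(\S+)/ ) {
        return ($1, $2);
    }
    return;
}

# Assumed sample line for illustration.
my ($version, $gi) = parse_version("VERSION     U49845.1  GI:1293613");
print "version: $version\n";   # version: U49845.1
print "GI: $gi\n";             # GI: 1293613
```

Reading $1 and $2 only inside the `if` is deliberate: after a failed match they still hold values from an earlier successful match.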