Perl for Bioinformatics, Part 2
Stuart Brown, NYU School of Medicine
Sources
• Beginning Perl for Bioinformatics
  James Tisdall, O’Reilly Press, 2000
• Using Perl to Facilitate Biological Analysis in Bioinformatics: A Practical Guide (2nd Ed.)
  Lincoln Stein, Wiley-Interscience, 2001
• Introduction to Programming and Perl
  Alan M. Durham, Computer Science Dept., Univ. of São Paulo, Brazil
Debugging
• Hopefully you were lucky enough to have some bugs in your programs from the first Perl exercise.
• Test each line as you write it
  • Insert extra print statements to check on variables
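Perl Debugging Help
• Add -w on the first line of your programs: #!/usr/bin/perl -w
  • provides ‘warnings’, for example when you use a variable that has no value yet
  • the path after #! must be the location of perl on your system
• Add use strict; as the 2nd line of your programs
  • every variable must be declared (with my) before it is used, which catches misspelled variable names
  • it is still good practice to set each variable to some initial value (such as 0 or an empty string)
  • the short snippets on the following slides omit my, so add it if you run them under use strict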
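
A minimal sketch of how the top of a script might look with both safeguards switched on. The variable name is made up for illustration, and the #! path is an assumption that depends on where perl is installed.

```perl
#!/usr/bin/perl -w
# Assumed path: check where perl lives on your own system.
use strict;

# Under "use strict", every variable is declared with "my" before use.
my $gene_count = 0;              # start from a known value
$gene_count = $gene_count + 1;

print "gene_count is now $gene_count\n";

# If the name were misspelled later on (say $gene_cuont), Perl would
# refuse to run the script instead of silently creating a new variable.
```

Without use strict, the misspelled name would simply be treated as a second, empty variable, which is one of the harder bugs to spot by eye.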
Variable “Interpolation”
• A variable holds a value
    $value = 6;
• When you print the variable, Perl gives the value rather than the name of the variable.
    print $value;
    6
• If you put a variable inside double quotes, Perl substitutes the value (this is called variable interpolation)
    print "The result is $value\n";
    The result is 6
• If you use single quotes, the text is printed exactly as written (no interpolation, and \n is not turned into a newline)
    print 'The result is $value\n';
    The result is $value\n
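
The difference between the two kinds of quotes can be checked with a short script. This is only a sketch; the variable names other than $value are invented for the example.

```perl
use strict;
use warnings;

my $value = 6;

my $double = "The result is $value\n";   # interpolated: holds 6 and a real newline
my $single = 'The result is $value\n';   # literal: holds the characters $value and \n

print $double;
print $single, "\n";

# Curly braces show where the variable name ends when letters follow it.
my $size = "${value}kb";
print "$size\n";
```

The braces in "${value}kb" matter: without them Perl would look for a variable called $valuekb.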
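Input
• A Perl program can take input from the keyboard
• The angle bracket operator (<>) reads one line of input
  • with no file names on the command line, <> reads from the keyboard; <STDIN> always does
• Usually this is assigned to a variable
    print "Please type a number: ";
    $num = <>;
    print "Your number is $num\n";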
chomp
• When data is entered from the keyboard, Perl waits for the Enter key to be typed
• But the string that is captured includes a newline character at its end
• Perl uses the function chomp to remove the newline character:
    print "Enter your name: ";
    $name = <>;
    print "Hello $name, happy to meet you!\n";    # the newline splits this line after the name
    chomp $name;
    print "Hello $name, happy to meet you!\n";    # now prints on one line
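
To see what chomp changes without typing anything, the sketch below uses a fixed string as a stand-in for a line read from the keyboard. The name and the length checks are illustrative assumptions.

```perl
use strict;
use warnings;

# Stand-in for keyboard input: pressing Enter adds "\n" to the end.
my $name = "Rosalind\n";

my $before = length $name;   # 9: eight letters plus the newline
chomp $name;                 # removes the trailing newline
my $after  = length $name;   # 8

print "Hello $name, happy to meet you!\n";
print "Length went from $before to $after\n";

# A second chomp is harmless: it removes nothing when there is no newline.
chomp $name;
```

Because chomp only removes a trailing newline, it is safe to call on input that may or may not end with one.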
Working with Text Files
• To do real work, Perl has to read data out of text files and write results into output files
• This is done in two steps: open the file, then read from it or write to it
• To open a file, you give it a name within the script - this is known as a filehandle
• Use the open command:
    open FILE1, '/u/schmoj01/Seqs/protein1.seq';
Read From the File
• Once the file is open, you can read from it using the <> operator
  • (put the filehandle between the angle brackets)
• Perl reads files one line at a time; each time you input data from the file, the next line is read:
    open FILE1, '/u/prot1.seq';
    $line1 = <FILE1>;
    chomp $line1;
    $line2 = <FILE1>;
    …etc
Write to a File
• Writing to a file is similar to reading from it
• Put > in front of the file name to open the file for writing:
    open FILE1, '>/u/prot1.seq';
• This creates a new file with that name, or overwrites an existing file
• Use >> to append text to an existing file
• print to the file using the filehandle (no comma after the filehandle):
    print FILE1 $data1;
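
The open, write, append and read steps can be tried together in one script. The sketch below is not the form used on the slides: it assumes the three-argument form of open with a filehandle stored in a variable, adds "or die" so a failed open is reported, and works on a scratch file from the core File::Temp module so no real data is touched. The sequence strings are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration.
my ($tmp_fh, $path) = tempfile();
close $tmp_fh;

# '>' creates the file, or wipes it if it already exists.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "ATGGCC\n";
close $out;

# '>>' keeps the existing contents and adds to the end.
open($out, '>>', $path) or die "Cannot append to $path: $!";
print $out "TTGACA\n";
close $out;

# '<' opens for reading; each <$in> returns the next line.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my $line1 = <$in>;
my $line2 = <$in>;
close $in;
chomp($line1, $line2);

print "First line: $line1\nSecond line: $line2\n";
unlink $path;   # remove the scratch file
```

Checking the result of open matters most for input files: a mistyped path otherwise produces no error, only empty reads.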
Making Decisions
• Useful programs must be able to make some decisions on their own
• The if statement is very powerful
• It is generally used together with numerical or string comparison operators
    numerical: ==, !=, >, <, >=, <=
    strings: eq, ne, gt, lt, ge, le
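True/False
• Perl relies on the concept of True/False decisions.
• A comparison is true if it holds: 3 < 5 is true, 3 == 5 is false.
  • Among single values, Perl treats 0, '0', the empty string and undefined values as false; any other value is true
• The not operator ! reverses it
    print "not a negative number" if !($a < 0);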
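
A small sketch for testing which values Perl counts as true. The helper name truth is hypothetical, and it uses the ?: operator (not covered on these slides) to pick one of two words.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the word "true" or "false" for a value.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

print "0        is ", truth(0), "\n";          # false
print "''       is ", truth(''), "\n";         # false
print "'0.0'    is ", truth('0.0'), "\n";      # true: a non-empty string other than '0'
print "-1       is ", truth(-1), "\n";         # true: any non-zero number
print "5 > 3    is ", truth(5 > 3), "\n";      # true
print "!(5 > 3) is ", truth(!(5 > 3)), "\n";   # false: ! reverses the result
```

The '0.0' case is the usual surprise: it equals zero when used as a number, but as a string it is neither empty nor exactly '0', so it is true.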
Conditional Blocks
• An if test can be used to control multiple lines of commands:
    print "Enter your age: ";
    $age = <>;
    chomp $age;
    if ($age < 21) {
        print "You are too young for this kind of work!\n";
        die "too young";
    }
    print "You are old enough to know better!\n";
• If the test is true, execute all the command lines inside the {} brackets. If not, then go on past the closing } to the statements below.
• if evaluates the expression in parentheses (it must be true or false)
• Note: the conditional block is indented
  • Perl doesn’t care about indents, but they make your code more readable for humans
• die is a special function - it stops your script and prints its message
  • Often used when keyboard input is not valid or an input file does not exist.
Else & Elsif
• Instead of just letting the script go on if it fails the if test, you can designate a second block of code for the “or else” condition
• You can also perform multiple tests using elsif (note the spelling: Perl has no elseif)
• Each test goes in parentheses, == compares numbers (a single = assigns), and else takes no test
    if ($A == 10) {
        print "yadda yadda";            # do some stuff
    } elsif ($A > 10) {
        print "yowsa yowsa";            # do different stuff
    } elsif ($A < 10) {
        print "do this other stuff";
    } else {
        print "if it ain't =, >, or <, then I'm stumped";
        die "not a number";
    }
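
One way to make the "not a number" branch reachable is to test the input before comparing it. The sketch below assumes the core module Scalar::Util and its looks_like_number function; the helper name compare_to_ten is hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: reports how $n compares with 10.
sub compare_to_ten {
    my ($n) = @_;

    if (!looks_like_number($n)) {
        return "not a number";        # checked first, before any numeric comparison
    } elsif ($n == 10) {
        return "equal to 10";
    } elsif ($n > 10) {
        return "greater than 10";
    } else {
        return "less than 10";        # the remaining case for an ordinary number
    }
}

print "10:   ", compare_to_ten(10), "\n";
print "42:   ", compare_to_ten(42), "\n";
print "3:    ", compare_to_ten(3), "\n";
print "ACGT: ", compare_to_ten("ACGT"), "\n";
```

Only the first test that succeeds is acted on, so the order of the branches matters: the validity check has to come before the numeric comparisons.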
Loops
• OK, we’ve got variables, input & output and decisions. Now we need Loops.
• Loops test a condition and repeat a block of code based on the result
• while loops repeat while the condition is true
    $count = 1;
    while ($count <= 10) {
        print "$count bottles of pop\n";
        $count = $count + 1;
    }
    print "POP!\n";
  [Try this program yourself]
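
A variation on the same loop, sketched here to show where the counter ends up once the test fails. The running total and the shorthand operators += and ++ are additions for illustration, not part of the slide.

```perl
use strict;
use warnings;

my $count = 1;
my $total = 0;

while ($count <= 10) {
    $total += $count;   # same as $total = $total + $count
    $count++;           # same as $count = $count + 1
}

# The loop stops the first time the test fails, so $count is 11 here.
print "Sum of 1 to 10 is $total; count ended at $count\n";
```

If the line that changes $count is left out, the test never becomes false and the loop runs forever, which is the most common while-loop bug.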
Read a File: line by line
    open FILE1, '/u/doej01/prot1.seq';
    while ($line = <FILE1>) {
        chomp($line);
        $my_sequence = $my_sequence . $line;
    }
    close FILE1;
• Joins every line of the file, with the newlines removed, into the single variable $my_sequence
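
The same loop can be tried without a sequence file on disk. As an assumption for the sake of a runnable sketch, the filehandle below is opened on a string held in memory (a standard feature of Perl's open); the sequence text is made up.

```perl
use strict;
use warnings;

# Made-up file contents: three lines, each ending in a newline.
my $file_text = "ATGGCC\nTTGACA\nTAA\n";

# Opening a reference to a string gives a filehandle that reads from it.
open(my $fh, '<', \$file_text) or die "Cannot open in-memory file: $!";

my $my_sequence = '';   # start empty so the first .= has something to add to
my $line_count  = 0;

while (my $line = <$fh>) {
    chomp $line;              # drop the newline before joining
    $my_sequence .= $line;    # same as $my_sequence = $my_sequence . $line
    $line_count++;
}
close $fh;

print "Read $line_count lines: $my_sequence\n";
```

The loop ends by itself because <$fh> returns an undefined value, which counts as false, once there are no lines left.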
Arrays
• It is awkward to store a large DNA sequence in one variable, or to create many variables for a list of numbers
• Perl has a type of variable called an “array” that can store a list of data
  • multiple lines of a text file
  • a list of numbers
  • a list of words
• Array variables are referred to with an “@” symbol
    @numbers = (1,2,45,234,11);
Bioinformatics Uses Arrays
• Bioinformatics data often comes in the form of arrays
  • tab-delimited lists
  • multi-line text files
• Arrays are handy because the entries are indexed
• You can grab any entry directly by its index; index 3 is the fourth number
    @numbers = (1, 2, 45, 234, 11);
    print "$numbers[3]\n";
    234
    # Note - the index starts with zero!
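
A short sketch of array indexing, plus one way a tab-delimited line might be turned into an array. The split function is not covered on these slides, and the record contents are invented for the example.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 45, 234, 11);

my $first    = $numbers[0];      # 1: the index starts at zero
my $fourth   = $numbers[3];      # 234
my $last     = $numbers[-1];     # 11: negative indexes count from the end
my $how_many = scalar @numbers;  # 5: an array in scalar context gives its size

print "first=$first fourth=$fourth last=$last size=$how_many\n";

# A made-up tab-delimited record, split into one array element per column.
my $record = "geneA\tchr1\t1500";
my @fields = split /\t/, $record;
print "Chromosome column: $fields[1]\n";
```

Note that a single element is written with $ rather than @, because one element of an array is a single value.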
Read a File into an Array
• Rather than read a file one line at a time into a scalar variable, it is often helpful to read the entire file into an array
  • each line of the file becomes one element of the array, newline included
    open FILE1, '/u/doej01/prot1.seq';
    @DNA = <FILE1>;
join & substr
• join combines the elements of an array into a single scalar variable (a string)
    $DNA = join('', @DNA);
      ''   = spacer (empty here)
      @DNA = which array
• substr takes characters out of a string
    $letter = substr($DNA, $position, 1);
      $DNA      = which string
      $position = where in the string (the first character is position 0)
      1         = how many letters to take
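
Putting join and substr together, the sketch below starts from an array that looks like the result of @DNA = <FILE1> and then walks along the joined string one letter at a time. The sequence lines and the choice to count the letter A are assumptions for illustration.

```perl
use strict;
use warnings;

# Made-up lines, as they would look after @DNA = <FILE1>.
my @DNA = ("ATGGCC\n", "TTGACA\n", "TAA\n");
chomp @DNA;                     # chomp on an array strips every element

my $DNA = join('', @DNA);       # empty spacer: the lines are glued together

my $first_letter = substr($DNA, 0, 1);   # position 0 is the first character
my $second_codon = substr($DNA, 3, 3);   # three letters starting at position 3

# Step through the string and count one kind of letter.
my $position = 0;
my $count_a  = 0;
while ($position < length $DNA) {
    my $letter = substr($DNA, $position, 1);
    if ($letter eq 'A') {
        $count_a++;
    }
    $position++;
}

print "Sequence: $DNA\n";
print "First letter: $first_letter, second codon: $second_codon\n";
print "Number of A's: $count_a\n";
```

The comparison uses eq rather than == because the letters are strings, not numbers.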
Exercise
• Read a DNA sequence from a text file
• Calculate the %GC content
• What about non-DNA characters in the file?
  • carriage returns and blank spaces
  • N’s or X’s or unexpected letters
• Write the output to the screen and to a file
  • use append so that the file will grow as you run this program on additional sequences