
Perl for Bioinformatics Part 2

Perl for Bioinformatics Part 2. Stuart Brown, NYU School of Medicine. Sources: Beginning Perl for Bioinformatics, James Tisdall, O’Reilly Press, 2000; Using Perl to Facilitate Biological Analysis, in Bioinformatics: A Practical Guide (2nd Ed.), Lincoln Stein, Wiley-Interscience, 2001.



  1. Perl for Bioinformatics Part 2
     Stuart Brown, NYU School of Medicine

  2. Sources
     • Beginning Perl for Bioinformatics
       • James Tisdall, O’Reilly Press, 2000
     • Using Perl to Facilitate Biological Analysis, in Bioinformatics: A Practical Guide (2nd Ed.)
       • Lincoln Stein, Wiley-Interscience, 2001
     • Introduction to Programming and Perl
       • Alan M. Durham, Computer Science Dept., Univ. of São Paulo, Brazil

  3. Debugging
     • Hopefully you were lucky enough to have some bugs in your programs from the first Perl exercise.
     • Test each line as you write it
     • Insert extra print statements to check on variables

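  4. Perl Debugging Help
     • Add -w on the first line of your programs (adjust the path to wherever perl is installed on your system):
         #!/usr/local/bin/perl -w
       • provides ‘warnings’ about likely mistakes, such as using a variable that has no value yet
     • Add use strict; as the 2nd line of your programs
       • catches misspelled variable names
       • you must declare each variable with my before using it
       • it is good practice to also set it to some initial value (such as 0 or empty)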
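
The debugging setup above might look like this in a complete script. This is only a minimal sketch: the `#!/usr/bin/env perl` line and the variable `$count` are illustrative assumptions, and `use warnings` is used here as the pragma form of -w.

```perl
#!/usr/bin/env perl
use strict;      # variables must be declared with 'my' before use
use warnings;    # pragma form of the -w switch

my $count = 0;                   # declared and given an initial value
$count = $count + 1;
print "count is now $count\n";   # extra print to check on the variable

# With 'use strict', a misspelled name such as $cuont would stop the
# script at compile time instead of silently creating a new variable.
```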

  5. Variable “Interpolation”
     • A variable holds a value
         my $value = 6;
     • When you print the variable, Perl gives the value rather than the name of the variable.
         print $value;                         # prints 6
     • If you put a variable inside double quotes, Perl substitutes the value (this is called variable interpolation)
         print "The result is $value\n";       # prints The result is 6
     • If you use single quotes, the text is printed exactly as typed: neither $value nor \n is interpolated
         print 'The result is $value\n';       # prints The result is $value\n
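
To see the two quoting styles side by side, here is a small sketch. The variable names `$double` and `$single` are made up for this example.

```perl
use strict;
use warnings;

my $value  = 6;
my $double = "The result is $value";    # double quotes: $value is interpolated
my $single = 'The result is $value';    # single quotes: taken literally

print "$double\n";   # prints: The result is 6
print "$single\n";   # prints: The result is $value
```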

  6. Input
     • A Perl program can take input from the keyboard
     • The angle bracket operator (<>) reads one line of input
     • Usually this is assigned to a variable
         print "Please type a number: ";
         my $num = <>;
         print "Your number is $num\n";

  7. chomp
     • When data is entered from the keyboard, Perl waits for the Enter key to be typed
     • But the string that is captured includes a newline character (from the Enter key) at its end
     • Perl uses the function chomp to remove the newline character:
         print "Enter your name: ";
         my $name = <>;
         print "Hello $name, happy to meet you!\n";   # newline still attached: output breaks after the name
         chomp $name;
         print "Hello $name, happy to meet you!\n";   # newline removed: output stays on one line
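
The effect of chomp can also be checked without typing anything. In this sketch a fixed string stands in for the keyboard input; the name and the variables `$before` and `$after` are assumptions for illustration.

```perl
use strict;
use warnings;

# Stand-in for a line typed at the keyboard (hypothetical input).
my $name = "Rosalind\n";

my $before = "Hello $name, happy to meet you!";
chomp $name;                       # removes the trailing newline, if any
my $after  = "Hello $name, happy to meet you!";

print "$before\n";   # the greeting is broken across two lines
print "$after\n";    # the greeting stays on one line
```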

  8. Working with Text Files
     • To do real work, Perl has to read data out of text files and write results into output files
     • This is done in two steps: open the file, then read from it or write to it
     • First, you must give the file a name within the script - this is known as a filehandle
     • Use the open command:
         open FILE1, '/u/schmoj01/Seqs/protein1.seq';

  9. Read From the File
     • Once the file is open, you can read from it using the <> operator
       • (put the filehandle between the angle brackets)
     • Perl reads files one line at a time; each time you input data from the file, the next line is read:
         open FILE1, '/u/prot1.seq';
         my $line1 = <FILE1>;
         chomp $line1;
         my $line2 = <FILE1>;
         …etc

  10. Write to a File
      • Writing to a file is similar to reading from it
      • Put > in front of the file name to open the file for writing:
          open FILE1, '>/u/prot1.seq';
      • This creates a new file with that name, or overwrites an existing file
      • Use >> to append text to an existing file
      • print to the file using the filehandle (no comma after the filehandle):
          print FILE1 $data1;
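
The open, write, append and read steps can be tried end to end. Note that this sketch swaps in a different idiom from the slides: the three-argument form of open with a lexical filehandle (`my $out`) instead of the bareword FILE1, plus `or die` to catch a file that cannot be opened. The temporary file (from the core File::Temp module) and the sequence text are placeholders.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for a real sequence file (hypothetical data).
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# '>' creates the file or overwrites an existing one
open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "MKVLAAGIV\n";
close $out;

# '>>' appends to the existing file
open($out, '>>', $path) or die "Cannot append to $path: $!";
print $out "GLLQSTRAH\n";
close $out;

# '<' opens the file for reading; each <$in> returns the next line
open(my $in, '<', $path) or die "Cannot read $path: $!";
my $line1 = <$in>;
chomp $line1;
my $line2 = <$in>;
chomp $line2;
close $in;

print "$line1\n$line2\n";
```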

  11. Making Decisions
      • Useful programs must be able to make some decisions on their own
      • The if statement is very powerful
      • It is generally used together with numerical or string comparison operators
          numerical: ==, !=, >, <, >=, <=
          strings:   eq, ne, gt, lt, ge, le

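  12. True/False
      • Perl relies on the concept of True/False decisions.
      • A comparison is true if it holds (if the math works). The values 0, '0', the empty string and undef count as false; everything else is true.
      • The not operator ! reverses it
          print "not a negative number" if !($num < 0);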
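
The difference between the numerical and string operators, and what counts as true, shows up in a quick experiment. This is a sketch with made-up values; the foreach loop is used only to run the same test on several values.

```perl
use strict;
use warnings;

# Numeric operators compare numbers; string operators compare text.
my $numeric_less = (9 < 10)      ? 'true' : 'false';   # 9 is less than 10
my $string_less  = ('9' lt '10') ? 'true' : 'false';   # as text, '9' sorts after '1'

print "9 <  10 is $numeric_less\n";
print "9 lt 10 is $string_less\n";

# Values that count as false: 0, '0', the empty string (and undef).
foreach my $v (0, '0', '', 1, 'ACGT') {
    my $verdict = $v ? 'true' : 'false';
    print "'$v' is $verdict\n";
}

my $num = 0;
print "not a negative number\n" if !($num < 0);
```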

  13. Conditional Blocks
      • An if test can be used to control multiple lines of commands:
          print "Enter your age: ";
          my $age = <>;
          chomp $age;
          if ($age < 21) {
              print "You are too young for this kind of work!\n";
              die "too young";
          }
          print "You are old enough to know better!\n";
      • If the test is true, execute all the command lines inside the {} brackets. If not, then go on past the closing } to the statements below.

  14. if evaluates the expression in parentheses (the result must be true or false)
      • Note: the conditional block is indented
        • Perl doesn’t care about indents, but they make your code more readable for humans
      • die is a special function - it stops your script and prints its message
        • Often used to test whether keyboard input data is valid or whether an input file exists.

  15. Else & Elsif
      • Instead of just letting the script go on if it fails the if test, you can designate a second block of code for the “or else” condition
      • You can also perform multiple tests using elsif (note the spelling: there is no second “e”)
      • Each test goes in parentheses and uses == to compare (a single = assigns a value); else takes no test of its own
          if ($A == 10) {
              print "yadda yadda";       # do some stuff
          }
          elsif ($A > 10) {
              print "yowsa yowsa";       # do different stuff
          }
          elsif ($A < 10) {
              print "do this other stuff";
          }
          else {
              print "if it ain't =, >, or <, then I'm stumped";
              die "not a number";
          }
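
Here is one way to try the if / elsif / else chain on several values. The sketch wraps the tests in a subroutine, which goes a step beyond the slides; `compare_to_ten` is a hypothetical helper name used only for this example.

```perl
use strict;
use warnings;

# compare_to_ten is a hypothetical helper name used only for this sketch.
sub compare_to_ten {
    my ($n) = @_;
    if ($n == 10) {
        return "equal to 10";
    }
    elsif ($n > 10) {
        return "greater than 10";
    }
    else {
        return "less than 10";
    }
}

print compare_to_ten(10), "\n";
print compare_to_ten(42), "\n";
print compare_to_ten(3),  "\n";
```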

  16. Loops
      • OK, we’ve got variables, input & output and decisions. Now we need Loops.
      • Loops test a condition and repeat a block of code based on the result
      • while loops repeat while the condition is true
          my $count = 1;
          while ($count <= 10) {
              print "$count bottles of pop\n";
              $count = $count + 1;
          }
          print "POP!\n";
        [Try this program yourself]

  17. Read a File: line by line
          open FILE1, '/u/doej01/prot1.seq';
          my $my_sequence = '';
          while (my $line = <FILE1>) {
              chomp($line);
              $my_sequence = $my_sequence . $line;
          }
          close FILE1;
      • Joins every line of the file, with the newlines removed, into the single variable $my_sequence

  18. Arrays
      • It is awkward to store a large DNA sequence in one variable, or to create many variables for a list of numbers
      • Perl has a type of variable called an “array” that can store a list of data
        • multiple lines of a text file
        • a list of numbers
        • a list of words
      • Array variables are referred to with an “@” symbol
          my @numbers = (1, 2, 45, 234, 11);

  19. Bioinformatics Uses Arrays
      • Bioinformatics data often comes in the form of arrays
        • tab delimited lists
        • multi-line text files
      • Arrays are handy because the entries are indexed
      • You can grab any entry directly by its index
          my @numbers = (1, 2, 45, 234, 11);
          print "$numbers[3]\n";     # prints 234
      • Note - the index starts with zero, so $numbers[3] is the fourth entry!
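
Zero-based indexing is easy to check on the same list of numbers. The variable names `$first`, `$fourth` and `$howmany` in this sketch are illustrative.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 45, 234, 11);

my $first   = $numbers[0];       # index 0 is the first entry
my $fourth  = $numbers[3];       # index 3 is the fourth entry
my $howmany = scalar @numbers;   # number of entries in the array

print "first=$first fourth=$fourth count=$howmany\n";
```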

  20. Read a File into an Array
      • Rather than read a file one line at a time into a scalar variable, it is often helpful to read the entire file into an array
      • Each line of the file, including its newline, becomes one entry in the array
          open FILE1, '/u/doej01/prot1.seq';
          my @DNA = <FILE1>;
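
Reading a whole file into an array can be tried on a small stand-in file. In this sketch the temporary file and its contents are hypothetical, a lexical filehandle replaces the bareword FILE1, and chomp is applied to the whole array to strip the newline from every entry.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small stand-in sequence file (hypothetical data).
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "ACGT\nGGCC\nTTAA\n";
close $tmp;

open(my $in, '<', $path) or die "Cannot read $path: $!";
my @DNA = <$in>;     # every line of the file becomes one array element
close $in;

chomp @DNA;          # chomp on an array strips the newline from each element

print "lines read: ", scalar(@DNA), "\n";
print "second line: $DNA[1]\n";
```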

  21. join & substr
      • join combines the elements of an array into a single scalar variable (a string)
          my $DNA = join('', @DNA);
        • '' is the spacer placed between elements (empty here)
        • @DNA is which array to join
      • substr takes characters out of a string
          my $letter = substr($DNA, $position, 1);
        • $DNA is which string
        • $position is where in the string (counting from 0)
        • 1 is how many letters to take
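
join and substr can be put together on a short made-up sequence. The array contents and the position value in this sketch are assumptions for illustration.

```perl
use strict;
use warnings;

my @DNA = ('ACGT', 'GGCC', 'TTAA');   # hypothetical lines, already chomped

my $DNA      = join('', @DNA);        # 'ACGTGGCCTTAA'
my $position = 4;                     # positions are counted from 0
my $letter   = substr($DNA, $position, 1);

print "sequence: $DNA\n";
print "letter at position $position: $letter\n";
```

Perl treats @DNA (the array) and $DNA (the string) as two separate variables, which is why the slides can use the same name for both.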

  22. Exercise
      • Read a DNA sequence from a text file
      • Calculate the %GC content
      • What about non-DNA characters in the file?
        • carriage returns and blank spaces
        • N’s or X’s or unexpected letters
      • Write the output to the screen and to a file
        • use append so that the file will grow as you run this program on additional sequences
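
One possible starting point for the %GC calculation is sketched below; it is not the only way to do the exercise. `gc_percent` is a hypothetical helper name, and the sketch makes one particular choice about non-DNA characters: anything that is not A, C, G or T is skipped and left out of the total. Reading from a file and appending the result to an output file are left for you.

```perl
use strict;
use warnings;

# gc_percent is a hypothetical helper; it returns undef if no A/C/G/T is found.
sub gc_percent {
    my ($seq) = @_;
    $seq = uc $seq;                       # treat lower case like upper case
    my ($gc, $total) = (0, 0);
    for my $i (0 .. length($seq) - 1) {
        my $base = substr($seq, $i, 1);   # one letter at a time
        if ($base eq 'G' or $base eq 'C') {
            $gc    = $gc + 1;
            $total = $total + 1;
        }
        elsif ($base eq 'A' or $base eq 'T') {
            $total = $total + 1;
        }
        # anything else (newlines, spaces, N, X ...) is skipped
    }
    return undef if $total == 0;
    return 100 * $gc / $total;
}

printf "%.1f%% GC\n", gc_percent("ACGT\nGGNN cc\n");
```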
