
Programming and Perl for Bioinformatics Part I


maitland

Presentation Transcript


  1. Programming and Perl for Bioinformatics Part I

  2. Why Write Programs?
     • Automate computer work that you do by hand: save time and reduce errors
     • Run the same analysis on lots of similar data files (scale-up)
     • Analyze data and make decisions, e.g. sort BLAST results by e-value and/or species of best match
     • Build a pipeline
     • Create new analysis methods

  3. Why Perl?
     • Fairly easy to learn the basics
     • Many powerful functions for working with text: search & extract, modify, combine
     • Can control other programs
     • Free and available for all operating systems
     • Most popular language in bioinformatics
     • Many pre-built “modules” are available that do useful things

  4. Get Perl
     • You can install Perl on any type of computer
     • Just log in - you don’t even need to type any command to make Perl active.
     • Download and install Perl on your own computer: www.perl.org

  5. Programming Concepts
     • Program = a text file that contains instructions for the computer to follow
     • Programming Language = a set of commands that the computer understands (via a “command interpreter”)
     • Input = data that is given to the program
     • Output = something that is produced by the program

  6. Programming
     • Write the program (with a text editor)
     • Run the program
     • Look at the output
     • Correct the errors (debugging)
     • Repeat
     Computers are VERY dumb: they do exactly what you tell them to do, so be careful what you ask for…

  7. Strings
     • Text is handled in Perl as a string
     • This basically means that you have to put quotes around any piece of text that is not an actual Perl instruction.
     • Perl has two kinds of quotes: single '...' and double "..." (they behave differently; more about this later)
     • Use plain straight quotes in programs; the curly quotes that word processors insert are not valid Perl quotes

  8. Print
     • Perl uses the term “print” to create output
     • Without a print statement, you won’t know what your program has done
     • Perl does not add a line break for you: you need to tell it to end the printed line
     • Use the "\n" (newline) escape sequence
     • Include the quotes
     • The “\” character is called an escape - Perl uses it a lot

  9. A Taste of Perl: print a message
     • hello_world.pl: Greet the entire world.
        #!/usr/bin/perl -w            <- command interpretation header
        #greet the entire world       <- a comment
        $x = 6e9;                     <- variable assignment statement
        print "Hello world!\n";       <- function call (output statement)
        print "All $x of you!\n";     <- function call (output statement)

  10. Variables
     • Up till now, we’ve been telling the computer exactly what to print. But in order for the program to generate what is printed, we need to use variables.
     • A scalar variable name starts with “$”
     • It can store either a string or a number.

  11. Basic Syntax and Data Types
     • Whitespace doesn’t matter to Perl. One can write all statements on one line
     • All Perl statements end in a semicolon “;” just like C
     • Comments begin with ‘#’ and Perl ignores everything after the # until end of line.
     • Example: #this is a comment
     • Perl has three basic data types:
        • scalar
        • array (list)
        • associative array (hash)

  12. Variables
     • To be useful at all, a program needs to be able to store information from one line to the next
     • Perl stores information in variables
     • A scalar variable name starts with the “$” symbol, and it can store strings or numbers
     • Variables are case sensitive
     • Give them sensible names
     • Use the “=” sign to assign values to variables
        $one_hundred = 100;
        $my_sequence = "ttattagcc";

  13. Scalars
     • Scalar variables begin with $ followed by an identifier
     • Example: $this_is_a_scalar;
     • An identifier is composed of upper or lower case letters, numbers, and underscore '_', and does not start with a number. Identifiers are case sensitive (like all of Perl)
        $progname = "first_perl";
        $numOfStudents = 4;
     • = (“gets”) sets the content of $progname to be the string "first_perl" and $numOfStudents to be the integer 4

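  14. Scalar Values
     • Numerical Values
        • integer: 5, "3" (a string that Perl converts to a number when used as one), 0, -307
        • floating point: 6.2e9, -4022.33
     NOTE: for practical purposes numerical values behave as “double” precision floating-point numbers; Perl converts between its internal integer, floating-point and string representations as needed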

  15. A program with variables
        #!/usr/bin/perl -w
        #this program uses variables containing numbers
        my $two = 2;
        my $three = $two + 1;
        print "\$two * \$three = $two * $three = ", ($two * $three);
        print "\n";

  16. Do the Math
     • Arithmetic operators work pretty much as you would expect:
        4+7   6*4   43-27   256/12   2/(3-5)
     • Example:
        #!/usr/bin/perl
        print "4+5\n";
        print 4+5 , "\n";
        print "4+5=" , 4+5 , "\n";
        $myNumber = 88;
     • Note: use commas to separate multiple items in a print statement
     What will be the output?
        4+5
        9
        4+5=9

  17. Scalar Values
     • String values
     • Example:
        $day = "Monday";
        print "Happy Monday!\n";
        print "Happy $day!\n";
        print 'Happy Monday!\n';
        print 'Happy $day!\n';
     • Double-quoted: interpolates (replaces a variable name or control character with its value)
     • Single-quoted: NO interpolation done (as-is)
     What will be the output?
        Happy Monday!<newline>
        Happy Monday!<newline>
        Happy Monday!\nHappy $day!\n
     (the single-quoted \n is printed literally, so no line break separates the last two strings)
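
The difference between the two quote styles can be checked with a short sketch. The variable `$gene` and its value are illustrative assumptions, not part of the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variable, chosen only for illustration.
my $gene = "BRCA1";

# Double quotes interpolate the variable and turn \n into a real newline.
my $double = "Gene: $gene\n";

# Single quotes keep every character as typed: no variable, no newline.
my $single = 'Gene: $gene\n';

print $double;
print $single, "\n";
print "Lengths: ", length($double), " vs ", length($single), "\n";
```

Comparing the two lengths is a quick way to see that the single-quoted string still holds the characters `$gene` and a backslash followed by `n`.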

  18. String Manipulation
     Concatenation
        $dna1 = "ACTGCGTAGC";
        $dna2 = "CTTGCTAT";
     • Juxtapose in a string assignment or print statement:
        $new_dna = "$dna1$dna2";
     • Use the concatenation operator '.':
        $new_dna = $dna1 . $dna2;
     Substring: substr(STRING, OFFSET, LENGTH), where OFFSET is the start position counted from 0 and LENGTH is the length of the substring
        $dna = "ACTGCGTAGC";
        $exon1 = substr($dna,2,5);   # TGCGT
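
A minimal runnable sketch of the two concatenation styles and of `substr`, using the sequences from the slide. The variable names `$joined_by_interpolation` and `$joined_by_operator` are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna1 = "ACTGCGTAGC";
my $dna2 = "CTTGCTAT";

# Two equivalent ways to join the sequences.
my $joined_by_interpolation = "$dna1$dna2";
my $joined_by_operator      = $dna1 . $dna2;

# substr(STRING, OFFSET, LENGTH): OFFSET is counted from 0,
# so offset 2 is the third character of the string.
my $exon1 = substr($dna1, 2, 5);

print "$joined_by_operator\n";
print "$exon1\n";   # TGCGT
```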

  19. Substitution
     DNA transcription: T → U
     Substitution operator s/// :
        $dna = "GATTACATACACTGTTCA";
        $rna = $dna;
        $rna =~ s/T/U/g;   # "GAUUACAUACACUGUUCA"
     • =~ is a binding operator: it tells Perl to examine the contents of $rna for the match pattern
     • “g” : global (replace every match, not just the first)
     Ex: Start with $dna = "gaTtACataCACTgttca"; and do the same as above. What will be the output?

  20. Example
     • transcribe.pl:
        $dna = "gaTtACataCACTgttca";
        $rna = $dna;
        $rna =~ s/T/U/g;
        print "DNA: $dna\n";
        print "RNA: $rna\n";
     • Does it do what you expect? If not, why not?
     • Patterns in substitution are case-sensitive! What can we do?
        • Convert all letters to upper/lower case (preferred when possible)
        • If we want to retain mixed case, use the transliteration/translation operator tr///
        $rna =~ tr/tT/uU/;   #replace all t by u, all T by U
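
The case-preserving approach can be sketched as a small subroutine. The name `transcribe` is a hypothetical helper introduced here for illustration; the slides only show the bare `tr///` statement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: transcribe DNA to RNA while keeping the
# upper/lower case of every letter.
sub transcribe {
    my ($dna) = @_;
    my $rna = $dna;        # work on a copy so the input is untouched
    $rna =~ tr/tT/uU/;     # t -> u and T -> U in a single pass
    return $rna;
}

my $dna = "gaTtACataCACTgttca";
print "DNA: $dna\n";
print "RNA: ", transcribe($dna), "\n";
```

Unlike `s/T/U/g`, which only matches the upper-case T, `tr/tT/uU/` maps each listed character to the one at the same position in the replacement list.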

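  21. Case conversion
        $string = "acCGtGcaTGc";
     Upper case:
        $dna = uc($string);            # "ACCGTGCATGC"
        $dna = uc $string;             # same, without parentheses
        $dna = "\U$string";            # same, using the \U string directive
     Lower case:
        $dna = lc($string);            # "accgtgcatgc"
        $dna = "\L$string";            # same, using the \L string directive
     Sentence case:
        $dna = ucfirst(lc($string));   # "Accgtgcatgc" (ucfirst alone changes only the first letter)
        $dna = "\u\L$string";          # same, using string directives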
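
A short sketch comparing the case functions on the slide's string. Note that `ucfirst` on its own changes only the first character, so it is combined with `lc` here to get sentence case; the variable names are assumptions for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "acCGtGcaTGc";

my $upper      = uc $string;            # every letter upper case
my $lower      = lc $string;            # every letter lower case
my $first_only = ucfirst $string;       # only the first letter changes
my $sentence   = ucfirst lc $string;    # lower-case first, then capitalize

print "$upper\n$lower\n$first_only\n$sentence\n";

# The string directives give the same sentence-case result.
my $with_directives = "\u\L$string";
print "$with_directives\n";
```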

  22. Reverse Complement
        5’-A C G T C T A G C . . . . G C A T-3’
        3’-T G C A G A T C G . . . . C G T A-5’
     • Reverse: reverses a string
        $string = "ACGTCTAGC";
        $string = reverse($string);   # "CGATCTGCA"
     • Complementation: use the transliteration operator
        $string =~ tr/ACGT/TGCA/;

  23. What’s Wrong?
        $DNA = "ACGTCTAGC";
        print "$DNA\n\n";
        $revcom = reverse $DNA;
        # Next substitute all bases by their complements,
        # A->T, T->A, G->C, C->G
        $revcom =~ s/A/T/g;
        $revcom =~ s/T/A/g;
        $revcom =~ s/G/C/g;
        $revcom =~ s/C/G/g;
        # Print the reverse complement DNA onto the screen
        print "$revcom\n";
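
The four substitutions run one after another, so each one also rewrites the letters produced by the step before it: every A becomes T, then every T (old and new) becomes A, and likewise for G and C. The result contains only A and G. A minimal corrected sketch uses a single `tr///`, which replaces all four bases in one pass; the subroutine name `revcom` and the handling of lower-case bases are assumptions added for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the reverse complement of a DNA string.
sub revcom {
    my ($dna) = @_;
    # Assigning to a scalar makes reverse() reverse the characters.
    my $rc = reverse $dna;
    # One pass over the string: no base is translated twice.
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $DNA = "ACGTCTAGC";
print "$DNA\n";
print revcom($DNA), "\n";   # GCTAGACGT
```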

  24. More on String Manipulation
     String length:
        length( $dna )
     Index:
        # index STR,SUBSTR,POSITION   (POSITION is optional, default 0)
        index( $strand, $primer, 2 )
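
A brief sketch of `length` and `index`, including the optional start position. The sequences and variable names below are illustrative assumptions; `index` returns the 0-based position of the match, or -1 when the substring is not found.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data, not from the slides.
my $strand = "ACTGCGTAGCTGCG";
my $primer = "TGCG";

my $len     = length($strand);                       # number of characters
my $first   = index($strand, $primer);               # search from position 0
my $second  = index($strand, $primer, $first + 1);   # search after the first hit
my $missing = index($strand, "GGGG");                # not present

print "length: $len\n";
print "first match at $first, next match at $second\n";
print "missing substring gives $missing\n";
```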

  25. Flow Control
     Conditional Statements
     • parts of the code are executed depending on the truth value of a logical statement
     “truth” (logical) values in Perl:
        false = 0 (also written 0.0 or 0e0), the strings "" and "0", and undef; comparisons return "" for false
        true = anything else; comparisons return 1 for true
        ($a, $b) = (75, 83);
        if ( $a < $b ) {
            $a = $b;
            print "Now a = b!\n";
        }
        if ( $a > $b ) { print "Yes, a > b!\n" }   # Compact

  26. Comparison Operators
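
Perl has separate comparison operators for numbers (`==`, `!=`, `<`, `>`, `<=`, `>=`, `<=>`) and for strings (`eq`, `ne`, `lt`, `gt`, `le`, `ge`, `cmp`). A minimal sketch of why the distinction matters; the values and variable names are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric comparison looks at the value: 10 is greater than 9.
my $numeric_order = (10 <=> 9);       # 1

# String comparison goes character by character: "1" sorts before "9".
my $string_order = ("10" cmp "9");    # -1

# String equality is case-sensitive.
my $same_case  = ("ACGT" eq "ACGT") ? "yes" : "no";
my $mixed_case = ("ACGT" eq "acgt") ? "yes" : "no";

print "numeric: $numeric_order, string: $string_order\n";
print "same case equal? $same_case; mixed case equal? $mixed_case\n";
```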

  27. Logical Operators
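
Perl's logical operators are `&&` (and), `||` (or) and `!` (not), with the lower-precedence word forms `and`, `or` and `not`. A hedged sketch combining them with string tests; the sequence, the length threshold and the variable names are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sequence, not from the slides.
my $seq = "ATGCCGTAA";

my $starts_with_atg = (substr($seq, 0, 3) eq "ATG");
my $long_enough     = (length($seq) >= 9);

# && is true only when both sides are true.
my $is_candidate = ($starts_with_atg && $long_enough) ? "yes" : "no";

# || returns its first true operand, which makes it handy for defaults.
my $label = "" || "unnamed";

# ! turns true into false and false into true.
my $not_empty = !($seq eq "") ? "yes" : "no";

print "candidate: $is_candidate, label: $label, not empty: $not_empty\n";
```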

  28. if/else/elsif
     • allows for multiple branching/outcomes
        $a = rand();
        if ( $a < 0.25 ) {
            print "A";
        } elsif ( $a < 0.50 ) {
            print "C";
        } elsif ( $a < 0.75 ) {
            print "G";
        } else {
            print "T";
        }

  29. What’s a block?
     • In the case of an “if” statement:
     • If the test is true, execute all the command lines inside the { } braces. If not, then go on past the closing } to the statements below.
     • You can also run the statements in a block over and over again using a loop.

  30. Conditional Loops
        while ( statement ) { commands … }
     • repeats commands until statement is no longer true
        do { commands } while ( statement );
     • same as while, except commands are executed at least once
     • NOTE the “ ; ” after the while statement
     Short-circuiting commands: next and last
     • next;   #jumps to end, do next iteration
     • last;   #jumps out of the loop completely
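
A minimal sketch of `next` and `last` inside a `while` loop. The input string, the use of "N" as a base to skip and "*" as a stop marker are all assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative input: N marks an unknown base, * marks where to stop.
my $seq   = "ACGNNTAG*CC";
my $clean = "";
my $pos   = 0;

while ( $pos < length($seq) ) {
    my $base = substr($seq, $pos, 1);
    $pos = $pos + 1;    # advance first, so next cannot cause an endless loop

    if ( $base eq "N" ) {
        next;           # skip this base, continue with the next iteration
    }
    if ( $base eq "*" ) {
        last;           # leave the loop completely
    }
    $clean = $clean . $base;
}

print "$clean\n";   # ACGTAG
```

The position is incremented before the tests because `next` jumps straight back to the loop condition; incrementing at the bottom of the block would be skipped.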

  31. While-Loop
     • Loops test a condition and repeat a block of code based on the result
     • while loops repeat while the condition is true
        $count = 1;
        while ( $count <= 10 ) {
            print "$count bottles of pop\n";
            $count = $count + 1;
        }
        print "POP!\n";
     [Try this program yourself]

  32. for and foreach loops
     • Execute a code loop a specified number of times, or for a specified list of values
     • for and foreach are identical: use whichever you want
     Incremental loop (“C style”):
        for ( $i=0 ; $i < 50 ; $i++ ) {
            $x = $i*$i;
            print "$i squared is $x.\n";
        }
     Loop over list (“foreach” loop):
        foreach $name ( "Billy", "Bob", "Edwina" ) {
            print "$name is my friend.\n";
        }
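
As a hedged illustration of the C-style loop on sequence data, the sketch below counts G and C bases one position at a time. The sequence is borrowed from the string manipulation slide; the counting task and variable names are assumptions for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ACTGCGTAGC";
my $gc  = 0;

# Visit every position from 0 to length - 1.
for ( my $i = 0 ; $i < length($dna) ; $i++ ) {
    my $base = substr($dna, $i, 1);
    if ( $base eq "G" || $base eq "C" ) {
        $gc++;
    }
}

print "GC bases: $gc of ", length($dna), "\n";
```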

  33. Standard Input
     • To make the program do something, we need to input data.
     • The angle bracket operator (< >) tells Perl to read a line of input, by default from the keyboard.
     • Usually this is assigned to a variable
        print "Please type a number: ";
        $num = <STDIN>;
        print "Your number is $num\n";

  34. chomp
     • When data is entered from the keyboard, the program waits for you to press the Return (Enter) key.
     • But the string which is captured includes a newline at its end
     • You can use the chomp function to remove the newline character:
        print "Enter your name: ";
        $name = <STDIN>;
        print "Hello $name, happy to meet you!\n";
        chomp $name;
        print "Hello $name, happy to meet you!\n";
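
The effect of `chomp` can be checked without typing anything. This sketch reads from an in-memory string instead of STDIN, which is an assumption made only so the example runs unattended; the input text and variable names are likewise illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for keyboard input: a string that ends with a newline,
# opened as a read-only filehandle.
my $typed = "Ada\n";
open( my $in, '<', \$typed ) or die "Cannot open in-memory input: $!";

my $name   = <$in>;            # reads one line, newline included
my $before = length($name);

chomp $name;                   # removes the trailing newline, if any
my $after = length($name);

close($in);

print "Hello $name, happy to meet you!\n";
print "Length before chomp: $before, after: $after\n";
```

In an interactive program the line `my $name = <$in>;` would simply read from `<STDIN>` as on the slide.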

  35. Basic Data Types
     • Perl has three basic data types:
        • scalar
        • array (list)
        • associative array (hash)
