Programming and Perl for Bioinformatics Part III

Presentation Transcript


  1. Programming and Perl for Bioinformatics, Part III

  2. Basic Data Types • Perl has three basic data types: • scalar • array (list) • associative array (hash)

  3. Associative Arrays/Hashes
     • A collection of scalar values (like an array)
     • Elements are referred to by key, not by index number
     • Elements are stored as a list of key-value pairs:
         %threeletter = ('A','ALA','V','VAL','L','LEU');
         #               key value key value key value
         print $threeletter{'A'};   # "ALA"
         print $threeletter{'L'};   # ?
     • exists checks whether a specific hash key exists:
         if ($threeletter{'E'}) { print $threeletter{'E'}; }   # ?
         print "Exists\n"  if exists  $array{$key};
         print "Defined\n" if defined $array{$key};
         print "True\n"    if $array{$key};
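
The three tests at the end of this slide (exists, defined, true) can give different answers for the same key. The following is a minimal sketch of that difference, assuming two extra entries ('X' and 'Z') and a helper named describe, none of which appear in the slides:

```perl
use strict;
use warnings;

# Same hash as on the slide, plus two hypothetical entries added for illustration.
my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');
$threeletter{'X'} = undef;   # key exists, value is undefined
$threeletter{'Z'} = 0;       # key exists, value is defined but false

# describe (hypothetical helper) lists which of the three tests pass for a key.
sub describe {
    my ($key) = @_;
    my @passed;
    push @passed, 'exists'  if exists  $threeletter{$key};
    push @passed, 'defined' if defined $threeletter{$key};
    push @passed, 'true'    if $threeletter{$key};
    return join(',', @passed);
}

foreach my $key ('A', 'X', 'Z', 'E') {
    print "$key: ", describe($key), "\n";
}
```

Only 'A' passes all three tests; 'E' passes none, because it was never stored in the hash.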

  4. Getting all keys and values in a hash
         %threeletter = ('A','ALA','V','VAL','L','LEU');
     • keys returns a list of all keys
     • values returns a list of all values
     • each returns one key-value pair each time it's called:
         ($key, $val) = each %threeletter;
     • Unlike an array, a hash is not an ordered list (the order of key-value pairs is determined by the Perl interpreter):
         foreach $k ( keys %threeletter ) { print $k; }
         # Might return, for instance, "A L V" rather than "A V L"
         # (the keys are not necessarily sorted)
         foreach $v ( values %threeletter ) { print $v; }   # ?
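
Because the key order is up to the interpreter, a common way to get reproducible output is to sort the keys before looping. A minimal sketch, assuming the variable names @sorted_keys and $pairs_seen (introduced here for illustration):

```perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');

# Sorting the keys gives a reproducible order.
my @sorted_keys = sort keys %threeletter;
foreach my $k (@sorted_keys) {
    print "$k => $threeletter{$k}\n";
}

# each walks the hash one pair at a time, in whatever order Perl stores it.
my $pairs_seen = 0;
while (my ($key, $val) = each %threeletter) {
    $pairs_seen++;
}
print "each returned $pairs_seen pairs\n";
```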

  5. Associative Arrays
     • Some common functions:
         keys(%hash)           # returns a list of all the keys
         values(%hash)         # returns a list of all the values
         each(%hash)           # each time this is called, it returns a
                               # 2-element list consisting of the next
                               # key/value pair in the hash
         delete($hash{$key})   # removes the pair associated with $key
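
Of the functions listed, delete is the only one that changes the hash. A short sketch of its effect, reusing the %threeletter hash from the earlier slides ($removed and $remaining are names chosen here):

```perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');

# delete removes the pair and returns the value that was stored under the key.
my $removed   = delete $threeletter{'V'};
my $remaining = join(' ', sort keys %threeletter);
print "Removed $removed; remaining keys: $remaining\n";
```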

  6. More on Perl
     • Subroutines and functions
         • A way to organize a program
         • Wrap up a block of code
         • Have a name
         • Provide a way to pass values to the block and report back the results
     • Regular expressions

  7. Basics about Subroutines
     • Define a subroutine:
         sub myblock {
             my ($arg1, $arg2, $arg3, …, $argN) = @_;
             # @_ is a special variable containing the args
             print "Please enter something: ";
         }
     • Function call:
         myblock($arg1, $arg2, …, $argN);
     • Example:
         sub add8A {
             my ($rna) = @_;
             $rna .= "AAAAAAAA";
             return $rna;
         }
         # the original rna
         $rna = "CGAAUCUAGGAU";
         $longer_rna = add8A($rna);
         print "I added 8 As to $rna to get $longer_rna.\n";
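
The add8A example takes a single argument. As a sketch of how several arguments are unpacked from @_, here is a hypothetical helper, count_base (not part of the slides), that takes a sequence and a base:

```perl
use strict;
use warnings;

# count_base is a hypothetical helper: it counts how often one base occurs in a sequence.
sub count_base {
    my ($seq, $base) = @_;                   # copy the two arguments out of @_
    my $count = () = $seq =~ /\Q$base\E/g;   # number of matches of $base in $seq
    return $count;
}

my $rna = "CGAAUCUAGGAU";
my $n_a = count_base($rna, 'A');
print "$rna contains $n_a copies of A.\n";
```

Copying @_ into my variables, as add8A does, keeps the subroutine from modifying the caller's data by accident.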

  8. More examples
         sub denaturizing {
             my (@products) = @_;
             my @strands = ();
             foreach $pairs (@products) {
                 ($A,$B) = split /\s/, $pairs;
                 @strands = (@strands, $A, $B);
             }
             return @strands;
         }
         # templates are in the form "A B". Ex. "ACGT TGCA"
         @Denatured = denaturizing(@PCRproducts);
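
The slide does not show what @PCRproducts contains. The sketch below assumes two made-up products in the "A B" form and uses lexical (my) variables and push, which is a variation on the slide's code rather than the original:

```perl
use strict;
use warnings;

# Same logic as the slide's subroutine, written with lexical variables and push.
sub denaturizing {
    my (@products) = @_;
    my @strands = ();
    foreach my $pair (@products) {
        my ($A, $B) = split /\s/, $pair;   # split "A B" on the whitespace
        push @strands, $A, $B;
    }
    return @strands;
}

# Hypothetical input: each product is two strands separated by a space.
my @PCRproducts = ("ACGT TGCA", "GGCC CCGG");
my @Denatured   = denaturizing(@PCRproducts);
print join("\n", @Denatured), "\n";
```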

  9. Variable Scope
     • A variable $a is used both in the subroutine and in the main part of the program:
         $a = 0;
         print "$a\n";
         sub changeA {
             $a = 1;
         }
         print "$a\n";
         changeA();
         print "$a\n";
     • The value of $a is printed three times. Can you guess what values are printed?
     • $a is a global variable. Compare with a version that uses strict and my:
         use strict;
         my $a = 0;
         print "$a\n";
         sub changeA {
             my $a = 1;
         }
         print "$a\n";
         changeA();
         print "$a\n";
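
The two versions on this slide can be compared side by side. The sketch below uses different variable names ($shared, $outer) and records the values in arrays instead of printing them three times; those names and the helper subroutines are assumptions made for illustration:

```perl
use strict;
use warnings;

# Package (global) variable: the subroutine changes the one shared copy.
our $shared = 0;
sub change_shared { $shared = 1; return; }
my @global_run = ($shared);
change_shared();
push @global_run, $shared;

# Lexical variable: my inside the subroutine creates a separate, private variable.
my $outer = 0;
sub change_private { my $outer = 1; return; }
my @lexical_run = ($outer);
change_private();
push @lexical_run, $outer;

print "global:  @global_run\n";    # global:  0 1
print "lexical: @lexical_run\n";   # lexical: 0 0
```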

  10. Ex: What would be the output?
         #!/usr/bin/perl -w
         $dna = 'AAAAA';
         $result = A_to_T($dna);
         print "I changed all the A's in $dna to T's and got $result\n\n";
         #############################################
         # Subroutines
         sub A_to_T {
             my($input) = @_;
             $dna = $input;
             $dna =~ s/A/T/g;
             return $dna;
         }
      Output?
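
In the exercise, the subroutine assigns to $dna without my, so it works on the same global variable as the main program. One possible variant that keeps the caller's sequence unchanged is sketched below; the name $copy is an assumption, not part of the slide:

```perl
use strict;
use warnings;

sub A_to_T {
    my ($input) = @_;
    my $copy = $input;      # private copy; the caller's variable is untouched
    $copy =~ s/A/T/g;
    return $copy;
}

my $dna    = 'AAAAA';
my $result = A_to_T($dna);
print "I changed all the A's in $dna to T's and got $result\n";
```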

  11. Regular Expressions
     • Regular expressions: a language for specifying text strings
     • Regular expressions are a mechanism for specifying character patterns
     • Useful for:
         • Finding files by name
         • Finding text in a file
         • Finding (or not finding) interesting text in a string
         • Text-based search and replace
         • Finding and extracting text

  12. Pattern Finding
     • Problem: find an ORF in a nucleotide sequence
     • Look for start (ATG) and stop codons (TAA, TAG, TGA)
     • Pattern search operator: m// or //
     • $string =~ /<pattern>/ returns true if the pattern matches somewhere in $string, false otherwise
     • Example:
         $dna = "GATGCCATGACACTGTTCA";
         if ($dna =~ /ATG/) {
             print "starting codon is there\n";
         }
         else {
             print "no starting codon!\n";
         }
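
The match operator only reports whether ATG occurs. To also learn where it occurs, one option is a global match in a loop together with Perl's @- array; the sketch below assumes the array name @starts:

```perl
use strict;
use warnings;

my $dna = "GATGCCATGACACTGTTCA";

# Collect the 0-based start position of every ATG in the sequence.
my @starts;
while ($dna =~ /ATG/g) {
    push @starts, $-[0];    # $-[0] is the offset where the last match began
}
print "ATG found at positions: @starts\n";
```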

  13. Regular Expressions
     [Image: Stephen Cole Kleene, with the * and + operators]
     • Optional characters: ?, * and +
         /colou?r/   matches color or colour               ? (0 or 1)
         /oo*h!/     matches oh! or ooh! or ooooh!         * (0 or more)
         /o+h!/      matches oh! or ooh! or ooooh!         + (1 or more)
     • Wild card: .
         /beg.n/     matches begin or began or begun
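
The patterns on this slide can be checked against a few words. This is a sketch only: the test words that should fail ('colouur', 'h!', 'begn') and the %tests and %matched hashes are made up here, and each pattern is anchored with ^ and $ so the whole word must match:

```perl
use strict;
use warnings;

# Pattern => words to try; the last word in each list is expected not to match.
my %tests = (
    'colou?r' => [ 'color', 'colour', 'colouur' ],
    'oo*h!'   => [ 'oh!', 'ooooh!', 'h!' ],
    'o+h!'    => [ 'oh!', 'ooooh!', 'h!' ],
    'beg.n'   => [ 'begin', 'begun', 'begn' ],
);

my %matched;
foreach my $pattern (sort keys %tests) {
    foreach my $word (@{ $tests{$pattern} }) {
        my $ok = ($word =~ /^(?:$pattern)$/) ? 1 : 0;
        $matched{$pattern}{$word} = $ok;
        printf "%-8s %-8s %s\n", $pattern, $word, $ok ? 'match' : 'no match';
    }
}
```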

  14. Common Regular Expressions
     White-space characters: \t (tab), \n (newline), \r (carriage return)
         \s        : match a whitespace character
         x         : the character 'x'
         .         : any character except newline
         ^r        : match r at the beginning of a line
         r$        : match r at the end of a line
         r|s       : match either r or s
         (r)       : group characters (to be saved in $1, $2, etc.)
         [xyz]     : character class; in this case, matches either an 'x', a 'y', or a 'z'
         [abj-oZ]  : character class with a range in it; matches 'a', 'b', any letter from 'j' through 'o', or 'Z'
         r*        : zero or more r's, where r is any regular expression
         r+        : one or more r's
         r?        : zero or one r's (i.e., an optional r)
         {name}    : expansion of the "name" definition (lex/flex notation; not a Perl construct)
         rs        : RE r followed by RE s (i.e., concatenation)
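
Several of these constructs are often combined in one pattern. The sketch below uses anchors, groups, character classes with ranges, \s, + and alternation; the header line and the sequence are made-up inputs, not data from the slides:

```perl
use strict;
use warnings;

# Hypothetical FASTA-style header line, used only for illustration.
my $line = ">seq1 length=45";

# ^ anchors at the start, the two (...) groups are saved in $1 and $2,
# \s+ matches the whitespace between them, and $ anchors at the end.
my ($id, $length);
if ($line =~ /^>([a-z0-9]+)\s+length=([0-9]+)$/) {
    ($id, $length) = ($1, $2);
    print "id=$id length=$length\n";
}

# A character class plus alternation: collect every stop codon in a string.
my $seq   = "ATGTAACCCTGATAG";
my @stops = $seq =~ /(TA[AG]|TGA)/g;
print "stop codons: @stops\n";
```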

  15. Exercise
     Ex1:
         $dna = "AGGCTCGTACGACG";
         if ( $dna =~ /CT[CGT]ACG/ ) {
             print "I found the motif!!\n";   # ?
         }
     Ex2: Find an ORF in a nucleotide sequence (look for start (ATG) and stop codons (TAA, TAG, TGA))
         $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
         ?
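
One possible approach to Ex2, offered as a sketch rather than the intended solution: match the start codon, then as few whole codons as possible, then the first stop codon in the same reading frame. The /i flag is used because the sequence is lowercase; the variable name $orf is an assumption:

```perl
use strict;
use warnings;

my $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";

# atg, then as few whole codons as possible, then the first in-frame stop codon.
my $orf = '';
if ($dna =~ /(atg(?:[acgt]{3})*?(?:taa|tag|tga))/i) {
    $orf = $1;
    print "ORF starts at offset $-[1] and is ", length($orf), " bases long:\n$orf\n";
}
else {
    print "no ORF found\n";
}
```

Stepping three bases at a time matters here: a pattern such as /atg.*?(taa|tag|tga)/ would stop at the first tga it meets, even though that tga is not in the same reading frame as the atg.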
