Programming and Perl for Bioinformatics Part III
Basic Data Types
• Perl has three basic data types:
  • scalar
  • array (list)
  • associative array (hash)
Associative Arrays/Hashes
• List of scalar values (like an array)
• Elements referred to by key, not index number
• Elements stored as a list of key-value pairs

    %threeletter = ('A','ALA','V','VAL','L','LEU');
    #               key value key value key value
    print $threeletter{'A'};   # "ALA"
    print $threeletter{'L'};   # ?

• exists checks if a specific hash key exists

    if ($threeletter{'E'}) { print $threeletter{'E'}; }   # ?

    print "Exists\n"  if exists  $hash{$key};   # the key is present, whatever its value
    print "Defined\n" if defined $hash{$key};   # the value is not undef
    print "True\n"    if $hash{$key};           # the value is true
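The three tests on the slide above give different answers when a value is undef or false. A minimal sketch, assuming a made-up hash %code (not from the slides) that holds one ordinary value, one undef value, and one empty string:

```perl
use strict;
use warnings;

# Hypothetical hash for illustration only: an ordinary value,
# an undef value, and a defined-but-false value (empty string).
my %code = (
    A => 'ALA',
    X => undef,
    Z => '',
);

# For each key, record which of the three tests succeed.
my %result;
foreach my $key (qw(A X Z E)) {
    my @passed;
    push @passed, 'exists'  if exists  $code{$key};
    push @passed, 'defined' if defined $code{$key};
    push @passed, 'true'    if $code{$key};
    $result{$key} = join ',', @passed;
    print "$key: ", ($result{$key} || 'none'), "\n";
}
```

The key E is never stored, so all three tests fail for it. X passes only exists, and Z passes exists and defined but is not true.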
Getting all keys and values in a hash

    %threeletter = ('A','ALA','V','VAL','L','LEU');

• keys returns a list of all keys
• values returns a list of all values
• each returns one key-value pair each time it's called

    ($key, $val) = each %threeletter;

• Unlike an array, a hash is not an ordered list (the order of key-value pairs is determined by the Perl interpreter)

    foreach $k ( keys %threeletter ) { print $k; }
    # Might return the keys in the order A, L, V, for instance,
    # not A, V, L (the keys need not be sorted)

    foreach $v ( values %threeletter ) { print $v; }   # ?
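Because the order of keys is not guaranteed, a common way to get reproducible output is to sort the keys first. The sketch below uses the %threeletter hash from the slide; the names @sorted_keys and $pairs_seen are introduced here for illustration only.

```perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');

# Sorting the keys gives the same order on every run.
my @sorted_keys = sort keys %threeletter;
foreach my $k (@sorted_keys) {
    print "$k => $threeletter{$k}\n";
}

# each walks the hash one pair at a time, in the interpreter's own order,
# and returns an empty list once every pair has been visited.
my $pairs_seen = 0;
while (my ($key, $val) = each %threeletter) {
    $pairs_seen++;
}
print "each returned $pairs_seen pairs\n";
```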
Associative Arrays
• Some common functions:

    keys(%hash)           # returns a list of all the keys
    values(%hash)         # returns a list of all the values
    each(%hash)           # each time this is called, it returns
                          # a 2-element list consisting of the next
                          # key/value pair in the hash
    delete($hash{$key})   # removes the pair associated with $key
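A short sketch of delete in use, reusing the three-letter amino acid codes from the earlier slides. The variable names $removed and $remaining are assumptions made for this example.

```perl
use strict;
use warnings;

my %threeletter = (A => 'ALA', V => 'VAL', L => 'LEU');

# delete removes the pair and returns the value that was stored.
my $removed   = delete $threeletter{'V'};
my $remaining = scalar keys %threeletter;

print "Removed $removed, $remaining pairs left\n";
print "V is gone\n" unless exists $threeletter{'V'};
```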
More on Perl
• Subroutines and Functions
  • A way to organize a program
  • Wrap up a block of code
  • Have a name
  • Provide a way to pass values to the block and report back the results
• Regular expressions
Basics about Subroutines

    # define a subroutine
    sub myblock {
        my ($arg1, $arg2, $arg3, …, $argN) = @_;   # @_ is a special variable containing the args
        print "Please enter something: ";
    }

    # function call
    myblock($arg1, $arg2, …, $argN);

• Example

    sub add8A {
        my ($rna) = @_;
        $rna .= "AAAAAAAA";
        return $rna;
    }

    # the original rna
    $rna = "CGAAUCUAGGAU";
    $longer_rna = add8A($rna);
    print "I added 8 As to $rna to get $longer_rna.\n";
Another example

    sub denaturizing {
        my (@products) = @_;
        my @strands = ();
        foreach $pairs (@products) {
            ($A, $B) = split /\s/, $pairs;
            @strands = (@strands, $A, $B);
        }
        return @strands;
    }

    # templates are in the form "A B", e.g. "ACGT TGCA"
    @Denatured = denaturizing(@PCRproducts);
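The same idea can be written so that it runs under use strict, with lexical variables and push instead of rebuilding the array. This is only a sketch: the name denature_pairs and the sample templates are made up for illustration.

```perl
use strict;
use warnings;

# Split each "A B" template into its two strands.
sub denature_pairs {
    my (@products) = @_;
    my @strands;
    foreach my $pair (@products) {
        # split ' ' splits on any run of whitespace
        my ($forward, $reverse) = split ' ', $pair;
        push @strands, $forward, $reverse;
    }
    return @strands;
}

# Hypothetical input data, in the "A B" form described on the slide.
my @pcr_products = ('ACGT TGCA', 'GGCC CCGG');
my @denatured    = denature_pairs(@pcr_products);
print "@denatured\n";   # prints: ACGT TGCA GGCC CCGG
```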
Variables Scope
• A variable $a is used both in the subroutine and in the main part of the program.

    $a = 0;
    print "$a\n";
    sub changeA {
        $a = 1;
    }
    print "$a\n";
    changeA();
    print "$a\n";

• The value of $a is printed three times. Can you guess what values are printed?
• $a is a global variable
• With use strict and my, the $a inside the subroutine is a separate, local variable:

    use strict;
    my $a = 0;
    print "$a\n";
    sub changeA {
        my $a = 1;
    }
    print "$a\n";
    changeA();
    print "$a\n";
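The difference between a global variable and a my variable can be seen side by side in the sketch below. The names $global, $outer, change_global and shadow_outer are made up for this example; they avoid $a, which Perl also uses as a special variable in sort.

```perl
use strict;
use warnings;

our $global = 0;   # package (global) variable, visible inside subroutines
my  $outer  = 0;   # lexical variable in the main part of the program

# Assigns to the global variable, so the change is seen by the caller.
sub change_global {
    $global = 1;
    return;
}

# Declares a new $outer that exists only inside this subroutine.
sub shadow_outer {
    my $outer = 1;
    return $outer;
}

change_global();
my $inner_value = shadow_outer();
print "global=$global outer=$outer inner=$inner_value\n";
# prints: global=1 outer=0 inner=1
```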
Ex: What would be the output?

    #!/usr/bin/perl -w
    $dna = 'AAAAA';
    $result = A_to_T($dna);
    print "I changed all the A's in $dna to T's and got $result\n\n";

    #############################################
    # Subroutines
    sub A_to_T {
        my($input) = @_;
        $dna = $input;
        $dna =~ s/A/T/g;
        return $dna;
    }

Output?
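For comparison, here is a variant in which the subroutine works on its own my copy, so the caller's $dna cannot be touched. It is a sketch rather than the exercise's official answer, and the name A_to_T_copy is an assumption.

```perl
use strict;
use warnings;

# Replace every A with T in a private copy of the argument.
sub A_to_T_copy {
    my ($input) = @_;
    my $copy = $input;
    $copy =~ s/A/T/g;
    return $copy;
}

my $dna    = 'AAAAA';
my $result = A_to_T_copy($dna);
print "I changed all the A's in $dna to T's and got $result\n";
# prints: I changed all the A's in AAAAA to T's and got TTTTT
```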
Regular Expressions
• Regular expressions: a language for specifying text strings
• Regular expressions are a mechanism for specifying character patterns
• Useful for
  • Finding files by name
  • Finding text in a file
  • Finding (or not finding) interesting text in a string
  • Text-based search and replace
  • Finding and extracting text
Pattern Finding
Problem: find an ORF in a nucleotide sequence
• Look for start (ATG) and stop codons (TAA, TAG, TGA)
• Pattern search operator: m// or //
• $string =~ /<pattern>/ returns true if the pattern matches somewhere in $string, false otherwise
• Example:

    $dna = "GATGCCATGACACTGTTCA";
    if ($dna =~ /ATG/) {
        print "starting codon is there\n";
    }
    else {
        print "no starting codon!\n";
    }
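The match operator only says whether ATG occurs. To see where it occurs, one option is to match repeatedly with the /g modifier and read the start offset from Perl's built-in @- array. A minimal sketch, using the sequence from the slide and a made-up array name @starts:

```perl
use strict;
use warnings;

my $dna = "GATGCCATGACACTGTTCA";

# With /g in a while loop, each iteration finds the next match.
my @starts;
while ($dna =~ /ATG/g) {
    push @starts, $-[0];   # $-[0] is the 0-based offset where this match began
}
print "ATG found at offsets: @starts\n";   # prints: ATG found at offsets: 1 6
```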
Regular Expressions
• Optional characters: ?, * and + (* and + are named after Stephen Cole Kleene)
• /colou?r/ matches color or colour
  • ? (0 or 1)
• /oo*h!/ matches oh! or ooh! or ooooh!
  • * (0 or more)
• /o+h!/ matches oh! or ooh! or ooooh!
  • + (1 or more)
• Wild cards: .
• /beg.n/ matches begin or began or begun
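The patterns on the slide above can be checked directly. The sketch below tests a few strings against each pattern; the helper matches_whole and the two failing test strings (h! and begn) are additions made for illustration.

```perl
use strict;
use warnings;

# Return 1 if the whole string matches the pattern, 0 otherwise.
sub matches_whole {
    my ($pattern, $string) = @_;
    return $string =~ /^(?:$pattern)$/ ? 1 : 0;
}

# Each entry is a pattern followed by a string to test against it.
my @checks = (
    ['colou?r', 'color'],
    ['colou?r', 'colour'],
    ['oo*h!',   'oh!'],
    ['oo*h!',   'ooooh!'],
    ['o+h!',    'h!'],     # + needs at least one o, so this does not match
    ['beg.n',   'begun'],
    ['beg.n',   'begn'],   # . must stand for one character, so this does not match
);

foreach my $check (@checks) {
    my ($pattern, $string) = @$check;
    printf "%-10s %-8s %s\n", "/$pattern/", $string,
        matches_whole($pattern, $string) ? 'match' : 'no match';
}
```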
Common Regular Expressions

    \t, \n, \r : white-space characters: tab, newline, return
    \s         : match a whitespace character
    x          : the character 'x'
    .          : any character except newline
    ^r         : match r at the beginning of a line
    r$         : match r at the end of a line
    r|s        : match either r or s
    (r)        : group characters (to be saved in $1, $2, etc.)
    [xyz]      : character class; in this case, matches either an 'x', a 'y', or a 'z'
    [abj-oZ]   : character class with a range in it; matches 'a', 'b', any letter from 'j' through 'o', or 'Z'
    r*         : zero or more r's, where r is any regular expression
    r+         : one or more r's
    r?         : zero or one r's (i.e., an optional r)
    {name}     : expansion of the "name" definition (lex/flex notation, not a Perl construct)
    rs         : RE r followed by RE s (i.e., concatenation)
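Several of these pieces are often combined in one pattern. The sketch below uses anchors, a character class, \s and capture groups to pull a name and a sequence out of a line, then uses alternation to look for a stop codon. The input line and its format are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical input line: a sequence name, whitespace, then the sequence.
my $line = "seq1\tGATGCCATGACACTGTTCA";

my ($name, $seq) = ('', '');
if ($line =~ /^(\S+)\s+([ACGT]+)$/) {
    # The first pair of parentheses is saved in $1, the second in $2.
    ($name, $seq) = ($1, $2);
}

# r|s: true if any of the three stop codons occurs anywhere in the sequence
# (this check ignores the reading frame).
my $has_stop = $seq =~ /TAA|TAG|TGA/ ? 1 : 0;

print "name=$name seq=$seq has_stop=$has_stop\n";
```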
Exercise

Ex1:

    $dna = "AGGCTCGTACGACG";
    if ( $dna =~ /CT[CGT]ACG/ ) {
        print "I found the motif!!\n";   # ?
    }

Ex2: Find an ORF in a nucleotide sequence (look for start (ATG) and stop codons (TAA, TAG, TGA))

    $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
    ?
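One possible approach to Ex2, offered as a sketch and not as the intended solution. It assumes an ORF means the first ATG, followed by whole codons, up to the first in-frame stop codon, and it converts the sequence to upper case first. The variable names $upper, $orf and $start are introduced here.

```perl
use strict;
use warnings;

my $dna   = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
my $upper = uc $dna;

# ATG, then as few whole codons as possible, then a stop codon.
# The non-greedy *? makes the match end at the first in-frame stop.
my ($orf, $start) = ('', -1);
if ($upper =~ /(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA))/) {
    $orf   = $1;
    $start = $-[1];   # 0-based offset where the captured ORF begins
}

if ($start >= 0) {
    printf "ORF at offset %d, %d bases: %s\n", $start, length($orf), $orf;
}
else {
    print "no ORF found\n";
}
```

For this sequence the sketch reports an ORF of 42 bases starting at offset 1. Stop codons that are not in the same reading frame as the ATG, such as the TGA at offset 12, are skipped.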