Perl IV

Perl IV Part V: Hashing Out the Genetic Code, Bioperl

Hashes
• There are three main data types in Perl: scalar variables, arrays, and hashes.
• Hashes (associative arrays) provide very fast look-up of a value by its key.
• The format is similar to that of an array:
• %hash = ('key' => 'value');
• $value = $hash{'key'};

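Hashes
A hash can be built from a plain list of alternating keys and values:
    %array = (
        'key1', 'value1',
        'key2', 'value2',
        'key3', 'value3',
    );
or, more readably, with the => operator:
    %array = (
        'key1' => 'value1',
        'key2' => 'value2',
        'key3' => 'value3',
    );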

Hashes
• @keys = keys %hash;
• @values = values %hash;
• Both lists come back in no particular order, so sort them if the order matters.
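The hash operations on the slides above can be tried together in one short script. This is only a sketch: the hash %amino_acid and its contents are made-up example data, not part of the original slides.

```perl
use strict;
use warnings;

# Hypothetical example data: one-letter amino acid codes and their names.
my %amino_acid = (
    'A' => 'alanine',
    'C' => 'cysteine',
    'D' => 'aspartate',
);

# Look up a single value by its key.
my $name = $amino_acid{'C'};
print "C is $name\n";

# keys and values return lists in no guaranteed order, so sort them for display.
my @codes = sort keys %amino_acid;
my @names = sort values %amino_acid;
print "codes: @codes\n";
print "names: @names\n";

# exists tests for a key without creating an entry for it.
print "no entry for Z\n" unless exists $amino_acid{'Z'};
```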

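The Binary Search of Arrays
• The 'halving' method is considerably faster than comparing against every element in turn, e.g. finding a match in a sorted array of 30,000 elements takes at most 15 passes through a loop (2^15 = 32,768).
• The array must be sorted first, so this is a good method when you sort once and then do many comparisons.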
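A minimal sketch of the halving method, assuming an array of strings that has already been sorted with Perl's default sort. The subroutine name binary_search and the sample codon list are illustrative choices, not taken from the slides.

```perl
use strict;
use warnings;

# Return the index of $target in the sorted array referenced by $sorted,
# or -1 if it is not present.
sub binary_search {
    my ($sorted, $target) = @_;
    my ($low, $high) = (0, $#{$sorted});
    while ($low <= $high) {
        my $mid   = int(($low + $high) / 2);      # middle of the remaining range
        my $order = $target cmp $sorted->[$mid];
        return $mid if $order == 0;               # found it
        if ($order < 0) { $high = $mid - 1 }      # target lies in the lower half
        else            { $low  = $mid + 1 }      # target lies in the upper half
    }
    return -1;
}

my @codons = sort qw(TGG ATG GCA TCA GAT);        # sort once ...
my $hit    = binary_search(\@codons, 'GCA');      # ... then search many times
my $miss   = binary_search(\@codons, 'AAA');
print "GCA found at index $hit; AAA gives $miss\n";
```

Each pass through the loop discards half of the remaining range, which is where the 15-pass limit for 30,000 elements comes from.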

Comparing Strings
• To compare two strings alphabetically in Perl, use the cmp operator, which returns 0 if the two strings are the same, -1 if they are in alphabetical order, and 1 if they are in reverse order.
• 'zzz' cmp 'zzz' returns 0
• 'AAA' cmp 'ZZZ' returns -1
• 'ZZZ' cmp 'AAA' returns 1
• cmp compares character codes, so every uppercase letter sorts before every lowercase letter.
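The three comparisons on this slide can be checked directly. The last two lines are an added illustration (not from the slides) of how letter case affects cmp, assuming the default behaviour without "use locale".

```perl
use strict;
use warnings;

# The three cases from the slide, collected into one list.
my @results = ('zzz' cmp 'zzz', 'AAA' cmp 'ZZZ', 'ZZZ' cmp 'AAA');
print "@results\n";

# cmp works on character codes: 'Z' comes before 'a'.
my $case_sensitive = 'Z' cmp 'a';

# Lower-casing both sides first gives a case-insensitive comparison.
my $case_folded = lc('Z') cmp lc('a');
print "$case_sensitive $case_folded\n";
```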

Sorting Arrays
• Sorting an array of strings alphabetically
• @array = sort @array;
• if given numbers, this still sorts them lexically (as strings), so 10 comes before 9
• Sorting an array of numbers in ascending order
• @array = sort { $a <=> $b } @array;
• the special variables $a and $b must be used inside the sort block
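The difference between the two kinds of sort is easiest to see on a small list of numbers. The list below is made-up sample data.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

# The default sort compares the elements as strings.
my @lexical = sort @numbers;

# A numeric sort needs an explicit comparison block using $a and $b.
my @ascending  = sort { $a <=> $b } @numbers;

# Swapping $a and $b reverses the order.
my @descending = sort { $b <=> $a } @numbers;

print "lexical:    @lexical\n";
print "ascending:  @ascending\n";
print "descending: @descending\n";
```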

Sorting Hashes
• Sorting by key and printing each value as a row of stars
    foreach (sort keys (%hash)) {
        print "$_\t", "*" x $hash{$_}, "\n";
    }
• Sorting keys by their values in ascending order (swap $a and $b for descending order)
    foreach (sort { $hash{$a} <=> $hash{$b} } keys (%hash)) { …… }
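A short sketch of both hash sorts, assuming a hash of base counts. The hash %count and its numbers are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical base counts.
my %count = (A => 3, C => 1, G => 4, T => 2);

# Sort by key and draw a simple histogram.
foreach my $base (sort keys %count) {
    print "$base\t", '*' x $count{$base}, "\n";
}

# Sort the keys by their values, smallest first.
my @by_value = sort { $count{$a} <=> $count{$b} } keys %count;

# Largest first, falling back to the key itself to break ties.
my @by_value_desc = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

print "ascending:  @by_value\n";
print "descending: @by_value_desc\n";
```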

Nested Arrays
• A nested array is stored as a reference, so printing $array[$i] gives something like ARRAY(0x85d3ad0); add a second index to reach element $j.
• $array[$i]->[$j];
• can be shortened to $array[$i][$j]
• Or use hashes:
    %hash = (
        duck  => ['Huey', 'Louie', 'Dewey'],
        horse => ['Mr. Ed'],
        dog   => ['Benji', 'Lassie'],
    );
    $value = $hash{$key}[$i];
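The nested structures on this slide can be exercised as follows. The @matrix data is made up; the hash is the one from the slide.

```perl
use strict;
use warnings;

# An array of arrays: each element is a reference to an inner array.
my @matrix = ([1, 2, 3], [4, 5, 6]);
my $arrow  = $matrix[1]->[2];      # explicit arrow
my $short  = $matrix[1][2];        # same element, arrow omitted
my @row    = @{ $matrix[1] };      # dereference a whole row

# A hash of arrays, as on the slide.
my %hash = (
    duck  => ['Huey', 'Louie', 'Dewey'],
    horse => ['Mr. Ed'],
    dog   => ['Benji', 'Lassie'],
);
my $value    = $hash{duck}[1];
my $dog_count = scalar @{ $hash{dog} };

foreach my $key (sort keys %hash) {
    print "$key: @{ $hash{$key} }\n";
}
```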

The Genetic Code is Redundant

Searching for codons
DIFFICULT:
    my ($codon) = @_;
    return 'S' if ($codon =~ /TCA/i);
    return 'S' if ($codon =~ /TCC/i);
    return 'S' if ($codon =~ /TCG/i);
    # ... and so on, one test for every codon

Searching for codons
BETTER:
    my ($codon) = @_;
    return 'A' if ($codon =~ /GC./i);
    return 'C' if ($codon =~ /TG[TC]/i);
    return 'D' if ($codon =~ /GA[TC]/i);
    # ... and so on, one pattern for every amino acid

Searching for codons
BEST:
    my ($codon) = @_;
    $codon = uc $codon;
    my (%genetic_code) = (
        'TCA' => 'S',
        'TCC' => 'S',
        'TCG' => 'S',
        # …. and so on for the remaining codons
    );
    return $genetic_code{$codon} if (exists $genetic_code{$codon});
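A runnable sketch of the hash approach. The table is deliberately partial (only a handful of codons from the standard genetic code), and the subroutine names codon2aa and translate, as well as returning 'X' for a codon missing from the table, are assumptions made for this example rather than anything shown on the slides.

```perl
use strict;
use warnings;

# Partial codon table for illustration only; a real table has all 64 codons.
my %genetic_code = (
    'TCA' => 'S', 'TCC' => 'S', 'TCG' => 'S', 'TCT' => 'S',
    'GCA' => 'A', 'GCC' => 'A', 'GCG' => 'A', 'GCT' => 'A',
    'TGT' => 'C', 'TGC' => 'C',
    'GAT' => 'D', 'GAC' => 'D',
    'ATG' => 'M', 'TGG' => 'W',
);

# Translate one codon; 'X' marks a codon not in the table (an assumption).
sub codon2aa {
    my ($codon) = @_;
    $codon = uc $codon;
    return $genetic_code{$codon} if exists $genetic_code{$codon};
    return 'X';
}

# Translate a DNA string three bases at a time, ignoring a trailing partial codon.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        $protein .= codon2aa(substr($dna, $i, 3));
    }
    return $protein;
}

my $protein = translate('atgGCTtcaTGG');
print "$protein\n";
```

Because the table is built once and each look-up is a single hash access, the cost per codon stays the same however many codons the table holds.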

Modules
• Perl can work with methods in an object-oriented manner
• classes are contained in packages
• These are often referred to as modules
• The OO call structure is:
• $objectName->method(arguments)
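To make the package/class/method vocabulary concrete, here is a minimal sketch of a class defined and used in one file. The package name SeqRecord and its methods are hypothetical and are not part of BioPerl or any other library.

```perl
use strict;
use warnings;

# A hypothetical class: the package is the class.
package SeqRecord;

# Constructor: bless a hash reference into the class to make an object.
sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{id}, seq => uc $args{seq} };
    return bless $self, $class;
}

# Accessor and simple methods; the object arrives as the first argument.
sub id         { my ($self) = @_; return $self->{id} }
sub seq_length { my ($self) = @_; return length $self->{seq} }

sub gc_content {
    my ($self) = @_;
    my $gc = ($self->{seq} =~ tr/GC//);   # tr returns the number of matches
    return $gc / $self->seq_length;
}

package main;

# objectName->method(arguments) in practice.
my $record = SeqRecord->new(id => 'demo1', seq => 'atgcgc');
printf "%s: length %d, GC %.2f\n",
    $record->id, $record->seq_length, $record->gc_content;
```

In a real project the package would normally live in its own file (SeqRecord.pm) and be loaded with "use SeqRecord;".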

BioPerl (www.bioperl.org) • The main focus of Bioperl modules is to perform sequence manipulation, provide access to various biology databases (both local and web-based), and parse the output of various programs. • Its modules rely heavily on additional Perl modules available from CPAN (www.cpan.org)

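How to go about comparing an unknown sequence . . .
    use Bio::SeqIO;
    # open a GenBank file and read its first sequence record
    $in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
    $seqobj = $in->next_seq();
    # collect the annotated features and inspect the first one
    @allfeatures = $seqobj->all_SeqFeatures();
    $feat = $allfeatures[0];
    $feature_start = $feat->start;
    $feature_strand = $feat->strand;
    # keep the sequence and its id only for the species of interest
    if ($seqobj->species->common_name =~ /elegans/) {
        $seq = $seqobj->primary_seq->seq;
        $id = $seqobj->id;
    }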


Programming and Perl for Bioinformatics Part IV
