250 likes | 396 Views
Hashes. Hashes. cover hashes before 2-D arrays because 2-D arrays require a new concept (references) hash (aka associative array) a data structure similar to an array can hold any number of values like an array array values are indexed by integers/ numbers
E N D
Hashes • cover hashes before 2-D arrays because 2-D arrays require a new concept (references) • hash (aka associative array) • a data structure similar to an array • can hold any number of values like an array • array values are indexed by integers/ numbers • hash values are indexed by name or key • the indices are not numbers, they are arbitrary unique strings • keys must always be unique, but values may be duplicated
Array VS Hash array hash values keys values indices 0 1 2 3 4 5 "two" "hi" "2" "3.5" "Fred" "betty" 35 35 12.4 12.4 "hello, world" "hello, world" 1.72e30 1.72e30 "bye\n" "bye\n" 69 69 This is misleading because there is no real order to a hash – there if no first element, and no last element.
Hash A hash is more like a "bucket" of data, where each value has a tag attached. The tag is the key. Retrieving a specific item requires knowing the key. You can also reach into the barrel, pull out any key, and see what value is attached (or all the keys, or all the values) 1.72e30 "3.5" "hello, world" "2" "Fred" "bye\n" 69 "betty" 35 "hi" "two" 12.4
Hash VS array • arrays are linear and ordered (0,1,2,3, etc.) • searching an array for a value takes time proportional to the size of the array • hashes map a key to a value • a hash item may be accessed in constant time • generally, hashes are used when one set of data has a natural relationship to another set of data
Example Hash Datasets • Last name, first name (Braun maps to Terry) • Name, address (Terry Braun maps to 814 20th Ave) • Host name, IP address (texas.eng.uiowa.edu maps to 128.255.22.207) • word, word count (the, 42) • user name, disk usage (tabraun, 27.2 gig) • codon, amino acid (ATG, M) • CCA=>P, CCG=>P, CCC=>P, CCT=>P • P=> ??? • Subjects, genotypes • Genes, expression values
Hash Syntax $hash{$some_key} VS $array[$index] Note: Larry Wall says that curly braces are used for hashes because something fancier than array indexing is being performed
Accessing Hash Elements $name{"fred"} = "flintstone"; $name{"barney"} = "rubble"; $name{"dino"} = "flintstone"; @first = qw/ fred barney dino/; foreach $first_name (@first) { print "I've met $first_name $name{$first_name}\n"; } .
Scalars, Arrays, Hashes, and Subroutines, oh my! $person = "terry"; #scalar $person = &person; #subroutine call $person[1] = "bob"; #array $person{$person} = "braun"; #hash # note that the "key" can be any expression # and not just a string literal $person[0] = “braun”; $person{$person} = $person[0]; $test = $person{"jill"}; #accessing outside the hash returns an UNDEF. .
Hash Variable - % %hash = ("foo", 35, "bar", 12.4, 2.5, "hello", "wilma", 1.72e30, "betty", "bye\n"); - key-value pairs @any_array = %some_hash; # This "unwinds" the key-value pairs into an array – the keys and values are placed into the array #Note that the order may be different than the order in which they were placed into the hash
More Hashing %inverse_hash = reverse %any_hash; A hash has no particular ordering. However, in this example, the reverse command unwinds the hash into an array context, of key-value pairs (key, value, key, value, …) Then reverse turns the list end-for-end, so the list is now (value, key, value, key, …) This is then stored in %inverse_hash – the list is "re-hashitized" -- so now values are the keys. This only works if the values are unique, or the "last one wins," meaning the mapping of P=>CCC and P=>CCT depends on which one was processed last.
Another Way to Construct a Hash: => my %name = ( "fred" => "finstone", "dino" => undef, "Barney" => "rubble", "betty" => "rubble", ); #extra comma harmless # easier to read if you put each key-value on its own line .
Hash Table Example -- another way %aminos = ( "TTT", "F", #key first "TTC", "F", "TTA", "L", "TTG", "L", "CTT", "L", "CTC", "L", "CTA", "L", "CTG", "L", "ATT", "I", "ATC", "I", "ATA", "I", "ATG", "M", "GTT", "V", "GTC", "V", "GTA", "V", "GTG", "V", "TCT", "S", "TCC", "S", "TCA", "S", "TCG", "S“)
Hash Functions keys HASH -- returns all keys of a hash values HASH -- returns all values of a hash %my_hash = (a => 1, b=> 2, c=>33); @keys = keys %my_hash; @values = values %my_hash; print "@keys @values\n"; #c a b 33 1 2 Note that the order of the keys will correspond to the order of the values This remains true unless the hash is manipulated (add or remove elements) Perl may re-order the hash for efficiency.
Hash Functions Scalar context – keys returns the number of key-value pairs $count = keys %hash; exists $HASH{KEY} -- returns true if the HASH contains the given KEY if (exists $aminos{CCC}) { print "The amino acid for codon CCC is $aminos{CCC}\n"; } # CCC->P
Hash Functions delete $HASH{KEY} -- remove a key-value pair from the hash $codon = "CCC"; $aminos{$codon} = "P"; delete $aminos{$codon}; each %HASH -- returns a key-value pair as a two-element list while ( ( $key, $value) = each %hash) { print "$key => $value\n"; }
Using foreach with a Hash foreach $key (sort keys %hash) { $value = $hash{$key}; print ”$key => $value\n"; } #same as foreach $key (sort keys %hash) { print "$key => $hash{$key}\n"; }
#same as @keys = keys %hash; @keys = sort @keys; foreach $key (@keys) { print "$key => $hash{$key}\n"; }
Hash Element Interpolation or lack thereof $books{"tom"}=5; $books{"bob"}=32; foreach $person (sort keys %books) { if($books{$person}) { #hash of book count print "$person has $books{$person} items\n"; #bob has 3 items } } print "The hash is %books\n"; #this does NOT interpolate # The hash is %books
Example • Generate random numbers • Use hash to count numbers (to validate ‘randomness’) • Print count for each number • Print average
#!/usr/bin/perl # countLoopSrand.pl # srand(time() ^ ($$ + ($$ << 15)) ); print "Enter sequence length (nucleotides): "; $length = <STDIN> ; #length of sequence to generate -- from screen if(!($length =~ m/^\d+/)) { #match only an integer at the beginning die ("Invalid input: $&\n"); } while($length) { # stay in loop until have generated enough sequence $rand = int rand(4); # Interger number between (0-3) inclusive $length--; #decrease loop counter print "$rand "; } RUN EXAMPLE
#!/usr/bin/perl # countLoopSrand-hash.pl # srand(time() ^ ($$ + ($$ << 15)) ); print "Enter sequence length (nucleotides): "; $length = <STDIN> ; #length of sequence to generate -- from screen if(!($length =~ m/^\d+/)) { #match only an integer at the beginning die ("Invalid input: $&\n"); } while($length) { # stay in loop until have generated enough sequence $rand = int rand(4); # Interger number between (0-3) inclusive $length--; #decrease loop counter $count{$rand}=$count{$rand}+1; #a Hash to count random numbers print "$rand "; } @keys = keys %count; print "\n"; foreach $key (@keys) { print "count:$key $count{$key}\n"; $total = $total + $count{$key}; } print "Average = ".$total/(keys %count)."\n"; RUN EXAMPLE