References and Data Structures
References
• Just as in C, you can create a variable that is a reference (or pointer) to another variable. That is, it contains the address in memory where the other variable is stored.
• In Perl, the backslash is used to create a reference:
    my $var = 5;
    my $var_ref = \$var;
• To dereference a simple reference, put it inside curly braces with another $ in front of it. Thus, ${$var_ref} is the same as $var, that is, the value “5”.
• The curly braces de-reference what is inside them. I like to say “{$var_ref} ‘generates’ the scalar variable”.
• In many cases you can leave the curly braces out: $$var_ref works just as well as ${$var_ref}. But in complicated expressions this can cause havoc due to precedence problems.
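A minimal runnable sketch of the points above, assuming a standard Perl 5 interpreter with strict and warnings enabled. It shows that both dereference forms reach the same value, and that assigning through the reference changes $var itself. The built-in ref function, not mentioned on the slide, reports what kind of thing a reference points to.

```perl
use strict;
use warnings;

my $var     = 5;
my $var_ref = \$var;                         # reference to the scalar $var

print "ref type:   ", ref($var_ref), "\n";   # SCALAR
print "braces:     ${$var_ref}\n";           # 5
print "short form: $$var_ref\n";             # 5

# Assigning through the reference changes the original variable.
${$var_ref} = 42;
print "var is now  $var\n";                  # 42
```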
More References
• This same trick works for arrays and hashes too.
    my @arr = qw(cow horse pig chicken);
    my $arr_ref = \@arr;
    print "Farm animals include @{$arr_ref}\n";   # can leave out {} here: @$arr_ref

    my %hash = ("red" => "stop", "yellow" => "caution", "green" => "go");
    my $hash_ref = \%hash;
    foreach my $key (keys %{$hash_ref}) {
        print "$key means ${$hash_ref}{$key}\n";
    }
    # The {} can be left out on the "foreach" line (keys %$hash_ref), and even on
    # the print line ($$hash_ref{$key}), but there the braces make the intent clearer.
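A sketch using the slide's data: everything you can do to @arr or %hash you can also do through @{$arr_ref} or %{$hash_ref}, and the changes land in the original variables. The added 'goat' and the sort are illustrative additions, not part of the slide; sorting just makes the output order predictable, since keys returns hash keys in no particular order.

```perl
use strict;
use warnings;

my @arr     = qw(cow horse pig chicken);
my $arr_ref = \@arr;

push @{$arr_ref}, 'goat';                 # adds to @arr itself
my $count = scalar @{$arr_ref};           # number of elements, now 5
print "There are $count animals: @arr\n";

my %hash     = ("red" => "stop", "yellow" => "caution", "green" => "go");
my $hash_ref = \%hash;

foreach my $key (sort keys %{$hash_ref}) {
    print "$key means ${$hash_ref}{$key}\n";
}
```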
Arrow Notation
• Perl provides an alternative notation for use with array and hash references. The small arrow (a hyphen followed by greater-than: ->) de-references. To access individual array or hash elements, follow the arrow with [] or {}.
• For example:
    my @arr = (1, 3, 5, 7);
    my $arr_ref = \@arr;
    for (my $i = 0; $i <= $#{$arr_ref}; $i++) {
        print "Element $i is $arr_ref->[$i]\n";
    }
• Similarly, to get a value from a hash reference, put the key inside curly braces after the arrow: $hash_ref->{$key}.
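The same example in a slightly more idiomatic form, as a sketch: a foreach over the range 0 .. $#{$arr_ref} replaces the C-style loop, and the arrow is used with a hash reference both to read a value and to add one. The %signal hash is a made-up example.

```perl
use strict;
use warnings;

my @arr     = (1, 3, 5, 7);
my $arr_ref = \@arr;

# $#{$arr_ref} is the last index of the referenced array (3 here).
foreach my $i (0 .. $#{$arr_ref}) {
    print "Element $i is $arr_ref->[$i]\n";
}

my %signal   = (red => "stop", green => "go");
my $hash_ref = \%signal;
print "Red means $hash_ref->{red}\n";     # same as ${$hash_ref}{red}

$hash_ref->{yellow} = "caution";          # adds a new key to %signal itself
```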
Passing Arrays In and Out of Subroutines
• One important use of references is passing arrays, hashes, and very long strings into and out of subroutines.
• If you pass in a variable, it normally gets copied to a new location when the subroutine unpacks its arguments (my ($x) = @_;). If this is a very long string, such as the DNA sequence of a chromosome, you will use a large amount of memory.
• However, if you pass a reference to that string to the subroutine, only the small reference is copied; the string itself is not.
• Recall that variables are passed into a subroutine by the @_ array. For example:
    process($var1, $var2, @arr);
    sub process {
        my ($x, $y, @z) = @_;
        ...
    }
• If you try to pass in 2 arrays, they both end up together in the first array inside the subroutine. That is, Perl “flattens” multiple arrays into the single @_ array.
• The way around the problem of passing multiple arrays in or out of subroutines is to pass in references, which are just scalar variables.
    process($var1, @arr2, @arr3);     # DOESN'T WORK
    process($var1, \@arr2, \@arr3);   # GOOD
    sub process {
        my ($x, $arr_ref2, $arr_ref3) = @_;
        ...
    }
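A runnable sketch of both problems on this slide. count_args, gc_count, and this version of process are hypothetical helpers written for illustration. count_args shows how many scalars really arrive in @_. gc_count shows a long string handled through a reference, so the sequence itself is never copied.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many scalars actually arrived in @_.
sub count_args { return scalar @_; }

# Hypothetical version of process(): a label plus two array references.
sub process {
    my ($x, $arr_ref2, $arr_ref3) = @_;
    return "$x: " . scalar(@{$arr_ref2}) . " and " . scalar(@{$arr_ref3});
}

# Hypothetical helper: counts G and C bases without copying the sequence.
sub gc_count {
    my ($seq_ref) = @_;
    return ${$seq_ref} =~ tr/GC//;   # tr/// with an empty replacement only counts
}

my @arr2 = (1, 2, 3);
my @arr3 = (4, 5);
my $dna  = "ATGCGCATTA";

print count_args("label", @arr2, @arr3), "\n";     # 6: the arrays were flattened
print count_args("label", \@arr2, \@arr3), "\n";   # 3: one scalar per argument
print process("sizes", \@arr2, \@arr3), "\n";      # sizes: 3 and 2
print gc_count(\$dna), "\n";                       # 4
```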
More on Subroutines
• Similarly, arrays are generally returned from subroutines in the form of array references.
• Note in this example that the array @arr is created within the subroutine, but returned as a reference. The name “@arr” doesn’t exist outside the subroutine, but the array itself survives, because the returned reference still points to it. Each call creates a new @arr.
    sub add_to {
        my @arr;
        for (my $i = 0; $i < 10; $i++) {
            $arr[$i] = $i + 2;
        }
        return \@arr;
    }
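A sketch of calling add_to from the slide. Because each call runs my @arr afresh, every call hands back a reference to a new, independent array. The variable names $first and $second are just for illustration.

```perl
use strict;
use warnings;

sub add_to {
    my @arr;
    for (my $i = 0; $i < 10; $i++) {
        $arr[$i] = $i + 2;
    }
    return \@arr;
}

my $first  = add_to();
my $second = add_to();

print "First call: @{$first}\n";        # First call: 2 3 4 5 6 7 8 9 10 11
$first->[0] = 100;                      # changes only the first array
print "Second call still starts with $second->[0]\n";   # ... starts with 2
```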
Multidimensional Arrays
• Arrays are one-dimensional: a linear set of elements.
• Suppose you want a two-dimensional array to keep track of positions on a grid, for instance in a tic-tac-toe game.
• Each row can be represented as a single array:
    my @row1 = qw(X O O);
    my @row2 = qw(O X O);
    my @row3 = qw(X O X);
• Since the elements of an array are scalars, you can’t just put the row arrays together in a big array to represent the whole game board: (@row1, @row2, @row3) simply flattens into one list of nine elements.
• However, array references are scalars, so the game board could be represented by an array of references to the sub-arrays:
    my @game = (\@row1, \@row2, \@row3);
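A quick check of the flattening claim, as a sketch. Putting the row arrays directly into a list yields nine scalars. The list of references has three, one per row.

```perl
use strict;
use warnings;

my @row1 = qw(X O O);
my @row2 = qw(O X O);
my @row3 = qw(X O X);

my @flat = (@row1, @row2, @row3);      # rows are flattened: 9 scalars
my @game = (\@row1, \@row2, \@row3);   # 3 scalars, each an array reference

print scalar(@flat), " elements vs ", scalar(@game), " rows\n";   # 9 elements vs 3 rows
print "Row 3 is still a unit: @{$game[2]}\n";                      # ... X O X
```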
More on Multidimensional Arrays
• To access a row, you need to de-reference it:
    print "Row 2 is @{$game[1]}\n";
• Note the position of the curly braces which do the de-referencing: they surround $game[1], which is an array reference, \@row2.
• To access an individual element, say the first square in row 2:
    print "${$game[1]}[0]\n";
• You see that the index value [0] for the individual element is OUTSIDE the curly braces. The array reference is inside; once the braces return the array, the $ at the beginning of the expression and the [0] at the end of it access the individual element of that row.
Arrow Notation with Multidimensional Arrays
• You could also use arrow notation:
    print "$game[1]->[0]\n";
• Here, the arrow causes $game[1] to be dereferenced, at which point you can access the individual element [0].
• Perl, in its helpful fashion, lets you omit the arrow between adjacent indices. Thus, this also works:
    print "$game[1][0]\n";
• In this case, @game is an actual array. If you instead used a reference to an array here:
    my $game_ref = \@game;
  you would need to use the arrow between the variable name and the first index value:
    print "$game_ref->[1][0]\n";
• You can leave the arrows out between the indexes, but not between the initial array reference and the first index: without it, $game_ref[1] would mean an element of a different array, @game_ref.
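The last two slides combined in a runnable sketch. The four notations below all reach the same square (row 2, first column), and a loop over @game prints the board one row per line. The join-based board printing is an addition for illustration.

```perl
use strict;
use warnings;

my @row1 = qw(X O O);
my @row2 = qw(O X O);
my @row3 = qw(X O X);
my @game = (\@row1, \@row2, \@row3);
my $game_ref = \@game;

# Four equivalent ways to reach the first square of row 2.
my @same = (${$game[1]}[0], $game[1]->[0], $game[1][0], $game_ref->[1][0]);
print "@same\n";                          # O O O O

# Print the whole board, one row per line.
foreach my $row (@game) {
    print join(" | ", @{$row}), "\n";
}
```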
Anonymous Arrays
• We have been creating an array such as @arr = (1, 3, 5, 7), then creating a reference to that array: $arr_ref = \@arr.
• It isn’t necessary to do this in 2 steps. If we only want to use the array reference, we can create an anonymous array and store a reference to it in a scalar variable. The anonymous array never gets its own name; it is always referred to by its reference.
• Recall that to construct an array you put the array values within parentheses:
    my @arr = (1, 3, 5, 7);
• The anonymous array constructor is square brackets, []:
    my $arr_ref = [1, 3, 5, 7];
• Using square brackets instead of parentheses generates a reference to an anonymous array, which you assign to a scalar variable. In contrast, the parentheses produce a plain list of values, which you assign to a named array (one whose name starts with @).
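A small sketch contrasting a reference to the named array with anonymous arrays built by [ ... ]. Writing [ @arr ] copies the current values into a new anonymous array, so changing the copy leaves @arr alone. Changing through \@arr, by contrast, changes @arr itself.

```perl
use strict;
use warnings;

my @arr      = (1, 3, 5, 7);
my $named    = \@arr;          # reference to the named array @arr
my $copy_ref = [ @arr ];       # new anonymous array holding copies of the values
my $anon_ref = [1, 3, 5, 7];   # anonymous array built directly

$copy_ref->[0] = 99;           # changes only the anonymous copy
$named->[1]    = 30;           # changes @arr itself

print ref($anon_ref), "\n";    # ARRAY
print "@arr | @{$copy_ref}\n"; # 1 30 5 7 | 99 3 5 7
```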
More Anonymous Arrays
• We could create the tic-tac-toe game thus:
    my @game = ( ["X", "O", "O"],
                 ["O", "X", "O"],
                 ["X", "O", "X"] );
• That is, we generate 3 anonymous arrays inside the parentheses that create the top-level array @game.
• Or, we could generate an anonymous array containing 3 references to other anonymous arrays, and assign the whole mess to an array reference scalar:
    my $game_ref = [ ["X", "O", "O"],
                     ["O", "X", "O"],
                     ["X", "O", "X"] ];
• Here we use nested sets of anonymous array constructors (square brackets) to produce the array references we need.
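A sketch of working with the nested anonymous arrays. It walks $game_ref with two loops to count the X squares, then checks the main diagonal. The diagonal test is an illustrative addition covering only that one line, not a full tic-tac-toe judge.

```perl
use strict;
use warnings;

my $game_ref = [ ["X", "O", "O"],
                 ["O", "X", "O"],
                 ["X", "O", "X"] ];

my $x_count = 0;
for my $r (0 .. $#{$game_ref}) {
    for my $c (0 .. $#{ $game_ref->[$r] }) {
        $x_count++ if $game_ref->[$r][$c] eq "X";
    }
}
print "X occupies $x_count squares\n";   # X occupies 4 squares

# Main diagonal: squares (0,0), (1,1), (2,2).
my @diag = map { $game_ref->[$_][$_] } 0 .. 2;
print "X wins on the diagonal\n" if join("", @diag) eq "XXX";
```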
Using Temporary Arrays in a Loop
• Another way to create a 2-dimensional array is to build each row as a temporary named array, then copy it into an anonymous array and push the reference onto a larger array.
    my @big_arr;
    for (my $i = 0; $i <= 3; $i++) {
        my @temp_arr = ($i, $i*2, $i*$i);
        push @big_arr, [ @temp_arr ];
    }
• The name @temp_arr gets used repeatedly, but the values put into it are copied to a separate location each time [ @temp_arr ] creates a new anonymous array.
• There is a temptation to rewrite the “push” line as:
    push @big_arr, \@temp_arr;   # fragile
• This goes wrong whenever the same @temp_arr is reused on every pass, for example if it is declared once outside the loop: its contents change with every pass, \@temp_arr always refers to the same place in memory, and every row ends up holding the last pass’s values. (Because my @temp_arr sits inside the loop body here, Perl actually creates a fresh array on each pass, so \@temp_arr happens to work in this exact code.) In contrast, [ @temp_arr ] copies the values to a new location on each pass, no matter where @temp_arr is declared.
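A sketch that makes the pitfall visible. It assumes @temp_arr is declared once outside the loop, which is the situation where \@temp_arr goes wrong. Every \@temp_arr entry points at the same array, so all four rows show the final pass. The [ @temp_arr ] copies keep each pass's values. The names @wrong, @right, and show are illustrative.

```perl
use strict;
use warnings;

my @temp_arr;                     # declared ONCE, outside the loop
my (@wrong, @right);

for (my $i = 0; $i <= 3; $i++) {
    @temp_arr = ($i, $i*2, $i*$i);
    push @wrong, \@temp_arr;      # the same array every time
    push @right, [ @temp_arr ];   # a fresh anonymous copy every time
}

# Format each row reference as [a b c].
sub show { return join(" ", map { "[" . join(" ", @{$_}) . "]" } @_) }

print "wrong: ", show(@wrong), "\n";   # wrong: [3 6 9] [3 6 9] [3 6 9] [3 6 9]
print "right: ", show(@right), "\n";   # right: [0 0 0] [1 2 1] [2 4 4] [3 6 9]
```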
Auto-vivification
• You don’t need to pre-declare the dimensions or size of a multidimensional array. Perl creates each nested structure automatically the first time you use it. Thus, you could say something like:
    my @arr;
    $arr[5][0][1][4] = 17;
• This brings a 4-dimensional array into being. Perl creates only the arrays along that path; every other element, such as $arr[0] or $arr[5][0][1][0], is simply “undef”.
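A sketch you can run to inspect what auto-vivification actually created after that single assignment. The checks with scalar, defined, and ref are illustrative additions.

```perl
use strict;
use warnings;

my @arr;
$arr[5][0][1][4] = 17;

print scalar(@arr), "\n";                   # 6: indices 0..5 now exist
my $state = defined $arr[0] ? "defined" : "undef";
print "$state\n";                           # undef
print ref($arr[5]), "\n";                   # ARRAY, created automatically
print scalar(@{ $arr[5][0][1] }), "\n";     # 5: indices 0..4
print "$arr[5][0][1][4]\n";                 # 17
```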
Hash of Arrays
• A hash stores a value that is indexed by its key. Sometimes you want to store an array of values indexed by the same key. This can be done by using the anonymous array constructor to create an array for each individual hash key.
• For example, various data about students could be stored in a single hash whose keys are the student ID numbers.
    my %students = ( "z12345" => ["Schmoe", "Joe", "freshman", "F"],
                     "z67890" => ["Smith", "Harold", "sophomore", "C"],
                     "z13579" => ["Vicious", "Nancy", "senior", "A"] );
• To access a student’s info:
    print "@{$students{z12345}}\n";
• To access an individual piece of information, any of these will work:
    print "${$students{z12345}}[3] ";
    print "$students{z12345}->[3] ";
    print "$students{z12345}[3] ";
• Note that $students{z12345} is a reference to an anonymous array.
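A sketch of working with the whole %students hash. It unpacks each anonymous array into named fields and changes one field in place. The loop, the grade change, and the extra student z24680 are hypothetical additions. The final push relies on auto-vivification to create the new anonymous array.

```perl
use strict;
use warnings;

my %students = ( "z12345" => ["Schmoe",  "Joe",    "freshman",  "F"],
                 "z67890" => ["Smith",   "Harold", "sophomore", "C"],
                 "z13579" => ["Vicious", "Nancy",  "senior",    "A"] );

# Unpack each anonymous array into named fields; sort for a stable order.
foreach my $id (sort keys %students) {
    my ($last, $first, $year, $grade) = @{ $students{$id} };
    print "$id: $first $last, $year, grade $grade\n";
}

# Change one field in place through the stored reference.
$students{z12345}[3] = "D";

# Hypothetical new student: push auto-vivifies a new anonymous array.
push @{ $students{z24680} }, "Doe", "Jane", "junior", "B";
```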
Anonymous Hashes
• The anonymous hash generator is the curly braces {}. When used instead of parentheses, they generate a scalar reference to an anonymous hash.
• For example:
    my %hash = ("green" => "go", "yellow" => "caution", "red" => "stop");
    my $hash_ref = {"green" => "go", "yellow" => "caution", "red" => "stop"};
• Hash references are de-referenced just like array references:
    print "A red light means $hash_ref->{red}\n";
    print "A red light means ${$hash_ref}{red}\n";
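A short sketch contrasting a named hash, an anonymous hash built directly, and an anonymous copy made with { %hash }. The copy can be changed without touching %hash. The value "halt" is made up for the example.

```perl
use strict;
use warnings;

my %hash     = ("green" => "go", "yellow" => "caution", "red" => "stop");
my $hash_ref = {"green" => "go", "yellow" => "caution", "red" => "stop"};
my $copy_ref = { %hash };          # new anonymous hash holding copies of %hash's pairs

$copy_ref->{red} = "halt";         # changes only the copy

print ref($hash_ref), "\n";                   # HASH
print scalar(keys %{$hash_ref}), " keys\n";   # 3 keys
print "$hash{red} vs $copy_ref->{red}\n";     # stop vs halt
```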
Array of Hashes
• The anonymous hash generator can be used to create various data structures. An array that contains a set of hash references is an example.
• An example: an array of genes on a chromosome, where the position of the gene in the array corresponds to its relative position on the chromosome. Information about each gene is stored in a hash.
• For example, assume that INFILE contains information about genes, one gene per line, as “key=value” pairs separated by commas (spaces around the = and , are allowed).
    my @gene_arr;
    while (<INFILE>) {
        chomp;                                  # remove the trailing newline
        my @attributes = split /\s*,\s*/;       # split $_ on commas
        my %temp_hash;
        foreach my $pair (@attributes) {
            my ($key, $value) = split /\s*=\s*/, $pair, 2;
            $temp_hash{$key} = $value;
        }
        push @gene_arr, { %temp_hash };         # copy into a new anonymous hash
    }
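A self-contained sketch of the loop above. No real gene file is available here, so it reads two made-up lines from an in-memory filehandle instead of INFILE. The in-memory filehandle is created by calling open on a reference to a string, which Perl 5.8 and later support. The gene names and attributes are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data; a real program would open the gene file instead.
my $data = <<'END';
name=abc1, length=1200, strand=+
name=xyz2, length=850, strand=-
END

open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";

my @gene_arr;
while (my $line = <$in>) {
    chomp $line;
    my %temp_hash;
    foreach my $pair (split /\s*,\s*/, $line) {
        my ($key, $value) = split /\s*=\s*/, $pair, 2;
        $temp_hash{$key} = $value;
    }
    push @gene_arr, { %temp_hash };     # one new anonymous hash per line
}
close $in;

print "Gene at index 1 is $gene_arr[1]{name}, length $gene_arr[1]{length}\n";
```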
Printing from Array of Hashes
• To print an individual element, say the length of the gene at index 1 (the second gene):
    print "$gene_arr[1]{length}\n";
• To print the whole thing:
    foreach my $i (0 .. $#gene_arr) {
        foreach my $key (sort keys %{$gene_arr[$i]}) {
            print "$key = $gene_arr[$i]{$key}\n";
        }
    }
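Another sketch of walking the array of hashes. This version loops over the references directly rather than over indices, and uses grep to pick out genes by an attribute. The gene data is hypothetical and written with the anonymous hash generator.

```perl
use strict;
use warnings;

my @gene_arr = (
    { name => "abc1", length => 1200, strand => "+" },
    { name => "xyz2", length => 850,  strand => "-" },
);

# Each $gene is a hash reference, so no index arithmetic is needed.
foreach my $gene (@gene_arr) {
    foreach my $key (sort keys %{$gene}) {
        print "$key = $gene->{$key}\n";
    }
    print "\n";
}

# Keep only genes longer than 1000 bases.
my @long = grep { $_->{length} > 1000 } @gene_arr;
print "Long genes: ", join(", ", map { $_->{name} } @long), "\n";   # Long genes: abc1
```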
Hash of Hashes
• Here’s a hash of hashes example, based on the previous example of genes on the chromosome. Here we are using a top-level hash whose keys are the gene names.
• The input file has the gene name followed by a colon, followed by a comma-separated list of key=value pairs.
    my %gene_hash;
    while (<INFILE>) {
        chomp;                                  # remove the trailing newline
        my ($gene, $rest) = split /\s*:\s*/, $_, 2;
        my @pairs = split /\s*,\s*/, $rest;
        my %temp_hash;
        foreach my $pair (@pairs) {
            my ($key, $value) = split /\s*=\s*/, $pair, 2;
            $temp_hash{$key} = $value;
        }
        $gene_hash{$gene} = { %temp_hash };
    }
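A sketch of reading back from %gene_hash. To keep it short, the hash is built directly with nested anonymous hashes rather than from a file. The gene names and fields are hypothetical. Two levels of keys replace the index-plus-key access of the array version.

```perl
use strict;
use warnings;

my %gene_hash = (
    abc1 => { length => 1200, strand => "+" },
    xyz2 => { length => 850,  strand => "-" },
);

print "abc1 has length $gene_hash{abc1}{length}\n";   # abc1 has length 1200

$gene_hash{xyz2}{note} = "hypothetical";   # add a field to one gene

foreach my $gene (sort keys %gene_hash) {
    foreach my $key (sort keys %{ $gene_hash{$gene} }) {
        print "$gene: $key = $gene_hash{$gene}{$key}\n";
    }
}
```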
Further
• All kinds of data structures are possible, with as many levels as you like, mixing arrays and hashes freely. All you have to do is not get yourself confused by your own cleverness.
• Also, remember that someone else will probably have to read your code someday, so document the structures and avoid needless complications.