Sorting

Simple Sorting
• As you are probably aware, there are many different sorting algorithms: selection sort, insertion sort, bubble sort, heap sort, quick sort, etc. You can spend a lot of time writing sort functions if you are so inclined. However, most of us have better things to do, so Perl provides a good general-purpose “sort” function. Perl 5.6 and earlier implemented it with quicksort; since Perl 5.8 it uses a merge sort. In most cases it will be efficient and fast enough for your purposes.
• By default, the Perl sort function sorts in ASCII (string) order, so numbers are compared as strings and 10 sorts before 9. Thus, the command
    my @sorted_array = sort @array;
  puts the elements of @array into ASCII-sorted order in @sorted_array.
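A minimal sketch of what the default ordering does in practice. The sample lists are hypothetical and were picked only to show that the default sort compares elements as strings:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen to show how string order treats numbers.
my @numbers = (10, 9, 100, 2);
my @words   = qw(pear Apple banana);

# The default sort compares elements as strings, character by character.
my @sorted_numbers = sort @numbers;   # 10 100 2 9
my @sorted_words   = sort @words;     # Apple banana pear (uppercase sorts first)

print "@sorted_numbers\n";
print "@sorted_words\n";
```

The numbers come out in string order, not numeric order, which is why a custom comparison is needed for numeric data.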

Sort Function
• The way in which “sort” determines ordering can be altered by putting a user-defined subroutine or a block of code between the word “sort” and the list (or array) of things to be sorted:
    sort user_sub @arr;
    sort {code block} @arr;
• Sort calls the subroutine or block each time it needs to compare a pair of elements. The routine returns a negative number if the first element is less than the second, zero if the two elements are equal, and a positive number if the first element is greater than the second.
• The input to the subroutine or block is fixed: two variables, called $a and $b. These names cannot be changed: $a is the first element to be compared, and $b is the second. $a and $b should not be modified by the subroutine.
• A simple comparison block for a numerical sort:
    sort { $a - $b } @arr;
  If $a (the first element) is greater than $b, a positive number is returned; 0 is returned if they are equal, and a negative number is returned if $b is greater than $a.

Built-in Comparison Operators
• Perl includes two built-in operators that give the correct results for sorting numbers or strings.
• For numerical comparisons, <=> (sometimes called the “spaceship” operator) is used. The standard numerical sort is written:
    sort { $a <=> $b } @arr;
• For string comparisons, cmp is used. For an ASCII sort you can use the following (although it is the default, so it is not necessary to write it in a simple sort):
    sort { $a cmp $b } @arr;
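As an illustration, the two operators can be tried directly and then used on the same list. The data here is hypothetical sample input:

```perl
use strict;
use warnings;

# Both operators return -1, 0, or 1.
my @num_results = (2 <=> 10, 10 <=> 10, 10 <=> 2);             # -1 0 1
my @str_results = ('2' cmp '10', 'a' cmp 'a', 'a' cmp 'b');    # 1 0 -1

my @arr = (10, 9, 100, 2);                   # hypothetical sample data
my @by_number = sort { $a <=> $b } @arr;     # 2 9 10 100
my @by_string = sort { $a cmp $b } @arr;     # 10 100 2 9

print "@by_number\n";
print "@by_string\n";
```

Note that '2' cmp '10' returns 1: as strings, "2" comes after "10" because the character 2 sorts after the character 1.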

Blocks vs. Subroutines
• The sort routine can be a block of code, enclosed by curly braces {}, or it can be a subroutine defined elsewhere. Which you use is a matter of taste. If the routine is long and complex, a subroutine might be more appropriate, but for simple and short methods, a block is easier.
• Thus, these two are equivalent:
    sort { $a <=> $b } @arr;

    sort numerically @arr;
    sub numerically { return $a <=> $b; }
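A short sketch of the named-subroutine form. The subroutine name by_magnitude and the sample data are hypothetical, made up for this illustration:

```perl
use strict;
use warnings;

# Hypothetical comparison routine: order numbers by absolute value.
# It reads the package variables $a and $b, exactly as a block would.
sub by_magnitude { return abs($a) <=> abs($b); }

my @arr = (-5, 2, -1, 10);                                 # hypothetical sample data
my @with_sub   = sort by_magnitude @arr;                   # -1 2 -5 10
my @with_block = sort { abs($a) <=> abs($b) } @arr;        # same result

print "@with_sub\n";
```

Giving the routine a name can make the sort call read almost like a sentence, which helps when the same ordering is used in several places.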

Reverse Sort
• By default, sorting is done from smallest to largest. The simplest way to sort from largest to smallest is to swap $a and $b in the sort block:
    numerical: sort { $b <=> $a } @arr;
    ASCII:     sort { $b cmp $a } @arr;
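A small sketch of a descending sort on hypothetical sample data. It also shows one alternative that the text above does not mention: applying Perl's built-in reverse function to an ascending sort.

```perl
use strict;
use warnings;

my @arr = (10, 9, 100, 2);   # hypothetical sample data

# Swap $a and $b to sort from largest to smallest.
my @descending = sort { $b <=> $a } @arr;                  # 100 10 9 2

# Alternative: sort ascending, then reverse the resulting list.
my @also_descending = reverse sort { $a <=> $b } @arr;     # 100 10 9 2

print "@descending\n";
```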

Sorting Hash Keys
• A common problem is to sort the keys of a hash by the values they refer to, for instance to print them in the proper order.
• The trick is to put the keys into an array, then use those keys to access the hash’s values. For example:
    my @keys = keys %hash;
    my @sorted_keys = sort { $hash{$a} <=> $hash{$b} } @keys;
• Here, each pair of hash keys is taken from @keys and substituted into the sort routine, and a sorted list of keys is returned.
• It is necessary to put the sorted keys into an array because hash keys are not stored in a fixed order.
• The original @keys array isn’t necessary. This will also work:
    my @sorted_keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
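The approach above can be sketched end to end as follows. The %score hash and its contents are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical data: names mapped to numeric scores.
my %score = (alice => 82, bob => 95, carol => 71);

# Sort the keys by the values they refer to, lowest score first.
my @by_score = sort { $score{$a} <=> $score{$b} } keys %score;   # carol alice bob

# Use the sorted keys to print the hash in order.
for my $name (@by_score) {
    printf "%-6s %d\n", $name, $score{$name};
}
```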

Sorting Array Indices
• If you have several parallel arrays that you wish to sort simultaneously, you need to create a list of array indices in the sorted order. The procedure is very similar to sorting hash keys.
    my @sorted_ind = sort { $arr[$a] <=> $arr[$b] } 0 .. $#arr;
• Here, the indices are generated by 0 .. $#arr. They are used to compare the actual array elements, and the sorted indices are stored in the “@sorted_ind” array. You can then use this array of indices to put any other array in the same order:
    my @sorted_arr = @arr[@sorted_ind];
    my @second_sorted_arr = @second_arr[@sorted_ind];
• This procedure uses an “array slice”: a list of indices inside the square brackets instead of a single index value.

Example
• You have 3 arrays of student data:
    my @id = (57880, 74675, 13892, 20051, 18834);
    my @names = qw(Ahmed Anderson Blackwell Chilson Coburn);
    my @grades = qw(A C B F B);
• You want to sort all of them in parallel, by ID number.
    my @sorted_ind = sort { $id[$a] <=> $id[$b] } 0 .. $#id;
    # @sorted_ind is (2, 4, 3, 0, 1)
    @id = @id[@sorted_ind];
    @names = @names[@sorted_ind];
    @grades = @grades[@sorted_ind];
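For reference, here is the same example as a runnable script that prints the reordered records. The printing loop is an addition for illustration; the data is taken from the slide:

```perl
use strict;
use warnings;

my @id     = (57880, 74675, 13892, 20051, 18834);
my @names  = qw(Ahmed Anderson Blackwell Chilson Coburn);
my @grades = qw(A C B F B);

# Compute the index order first, before any array is rearranged.
my @sorted_ind = sort { $id[$a] <=> $id[$b] } 0 .. $#id;   # 2 4 3 0 1

# Apply the same index order to every parallel array with a slice.
@id     = @id[@sorted_ind];
@names  = @names[@sorted_ind];
@grades = @grades[@sorted_ind];

for my $i (0 .. $#id) {
    print "$id[$i] $names[$i] $grades[$i]\n";
}
```

The first line printed is "13892 Blackwell B" and the last is "74675 Anderson C".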

Sorting by Two or More Criteria
• You want to sort first by one criterion, then resolve ties using a second criterion. For example, sort by last name, then by first name if necessary.
• To do this, use the “or” operator || between the two comparisons. If the first comparison returns 0 (because the elements are equal), the second comparison is done. The second is not done if the first comparison returns a TRUE value (i.e. non-zero).
    my @last_names = qw(Coburn Smith Jones Jones Smith);
    my @first_names = qw(Fred Harold Mary Jane Hortense);
    my @sorted_ind = sort { $last_names[$a] cmp $last_names[$b] || $first_names[$a] cmp $first_names[$b] } 0 .. $#last_names;
• Then use the sorted indices to put both arrays in the proper order.
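A sketch that carries the example through to the final step, using the data from the slide. The printing loop is added for illustration:

```perl
use strict;
use warnings;

my @last_names  = qw(Coburn Smith Jones Jones Smith);
my @first_names = qw(Fred Harold Mary Jane Hortense);

# Compare last names first; only when they tie (cmp returns 0)
# does || fall through to the first-name comparison.
my @sorted_ind = sort {
    $last_names[$a] cmp $last_names[$b]
        || $first_names[$a] cmp $first_names[$b]
} 0 .. $#last_names;                                   # 0 3 2 1 4

# Put both arrays in the proper order with array slices.
@last_names  = @last_names[@sorted_ind];
@first_names = @first_names[@sorted_ind];

for my $i (0 .. $#last_names) {
    print "$last_names[$i], $first_names[$i]\n";
}
```

The two Jones entries are ordered Jane then Mary, and the two Smith entries Harold then Hortense.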

Grep
• grep is a useful function taken from Unix. It takes a list or array as an argument, tests each element, and returns a list of those elements for which the test is true.
• grep puts each element of the array into $_ for the test. This is similar to a “foreach” loop.
• A typical example: return all words in a list containing “th”:
    @arr = qw(the dog went in there);
    @th_arr = grep /th/, @arr;
    # @th_arr is ("the", "there")
• This is equivalent to (but much shorter than):
    foreach (@arr) {
        push @th_arr, $_ if /th/;
    }
• You can use either an expression or a block as the test. Note that you must put a comma after an expression (before the list of elements), but NOT after a block enclosed by curly braces. Thus, this is equivalent to the above:
    @th_arr = grep { /th/ } @arr;
• The expression or block must return either a true value or a false value when it is evaluated with $_.
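A brief sketch showing both forms side by side, plus a test that is not a regular expression. The length-based filter is a hypothetical extra example:

```perl
use strict;
use warnings;

my @arr = qw(the dog went in there);

my @th_expr  = grep /th/, @arr;        # expression form: comma required
my @th_block = grep { /th/ } @arr;     # block form: no comma

# Any test that yields true or false works, not only a pattern match.
my @short = grep { length($_) <= 3 } @arr;   # the dog in

print "@th_expr\n";
print "@short\n";
```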

Grep Used to Determine Membership in a List
• A good use for grep is determining whether a string or number is present in a list.
• In a scalar context, grep returns the number of times the test expression is true.
• In a list context, grep returns a list of the array items for which the test expression is true.
    my @pets = qw(cat dog ferret gerbil rabbit);
    my $animal = <STDIN>;
    chomp $animal;    # remove the trailing newline before comparing
    if (grep $_ eq $animal, @pets) {
        print "I have a $animal\n";
    } else {
        print "I don't have a $animal\n";
    }
• Here grep is being used in a scalar context: if $animal matches anything on the list @pets, grep will return a value of 1 or more. If nothing matches, grep returns 0 (false).
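A runnable sketch of the membership test. To keep it self-contained, a fixed string stands in for the line read from STDIN; that stand-in value is an assumption made for this illustration:

```perl
use strict;
use warnings;

my @pets = qw(cat dog ferret gerbil rabbit);

# Stand-in for a line typed by the user; real input ends with a newline.
my $animal = "ferret\n";
chomp $animal;

# Assigning to a scalar puts grep in scalar context, so it returns a count.
my $count   = grep { $_ eq $animal } @pets;   # 1
my $missing = grep { $_ eq 'horse' } @pets;   # 0

if ($count) {
    print "I have a $animal\n";
} else {
    print "I don't have a $animal\n";
}
```

The comparison uses eq (string equality). A single = would assign to $_ instead of testing it.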

Map
• map is another useful function related to grep. map takes each element of a list or array, places it in turn into $_, and performs some function on it.
• As with grep, map is simply a shorter way of writing something you could do with a foreach loop.
• A simple example: adding 3 to each element of an array:
    @arr = map { $_ + 3 } @arr;
  Because $_ is an alias for the original element, write $_ + 3 rather than $_ += 3; the latter would also change the original array in place.
• Just like grep, map can use either an expression (followed by a comma), or a block (no comma).
• map can return more than one value for each input value. All returned values end up on a single list, however.
• For example, this function returns a 1, followed by the original value, followed by the square of that value. This function has a real use in multivariate statistics, by the way. The parentheses make it unambiguous that the braces are a block and not an anonymous hash.
    @new_arr = map { (1, $_, $_ ** 2) } @arr;
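A short sketch of both uses of map on hypothetical sample data. The list returned by the second block is wrapped in parentheses so that Perl parses the braces as a block:

```perl
use strict;
use warnings;

my @arr = (1, 2, 3);   # hypothetical sample data

# One output value per input value; @arr itself is left unchanged.
my @plus_three = map { $_ + 3 } @arr;             # 4 5 6

# Three output values per input value, flattened into one list.
my @expanded = map { (1, $_, $_ ** 2) } @arr;     # 1 1 1 1 2 4 1 3 9

print "@plus_three\n";
print "@expanded\n";
```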
