A Fresh Look at Efficient Perl Sorting
Uri Guttman, Sysarch <uri@sysarch.com>
Larry Rosler, Hewlett-Packard Laboratories <lr@hpl.hp.com>
Perl Conference 3.0, August, 1999
The Perl Sorting Paradigm
1. Preprocess the input to extract the sortkeys.
2. Sort the data by comparing the sortkeys.
3. Postprocess the output to retrieve the data.

    @out =                          # These may be separate steps.
        map  POSTPROCESS($_) =>
        sort sortsub
        map  PREPROCESS($_)  => @in;

    @out = sort @in;                # The default sort.
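The comment in the paradigm code notes that the three steps may be written separately. A minimal sketch of that, assuming a made-up list of `name:score` records that should be ordered numerically by score (the data and field layout are illustrative only):

```perl
use strict;
use warnings;

# Made-up records of the form name:score.
my @in = ('carol:72', 'alice:95', 'bob:8');

# 1. Preprocess: pair each record with its sortkey (the score field).
my @keyed = map { [ (split /:/)[1], $_ ] } @in;

# 2. Sort by comparing the sortkeys only.
my @sorted = sort { $a->[0] <=> $b->[0] } @keyed;

# 3. Postprocess: retrieve the original records.
my @out = map { $_->[1] } @sorted;

print "$_\n" for @out;    # bob:8, carol:72, alice:95 (one per line)
```

Plain `sort @in` would instead order these records by name, because the default sort compares whole strings.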
Perl Sorting Techniques
• Naive (no pre- or postprocessing)
  - Sortkeys recomputed on every comparison.
• Cached sortkeys; the Orcish Maneuver
  - Sortkeys cached in hashes.
• The Schwartzian Transform
  - Sortkeys cached in anonymous arrays.
• The Packed-Default Sort
  - Sortkeys and operands packed in strings.
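The first two techniques are not shown in code on these slides. A sketch of both, assuming a hypothetical helper `ip_key()` that turns the dotted-quad address inside a string into a single number (the helper and the data are illustrative, not from the talk):

```perl
use strict;
use warnings;

# Hypothetical helper: extract the dotted quad and return it as one integer.
sub ip_key {
    my ($w, $x, $y, $z) = $_[0] =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/;
    return (($w * 256 + $x) * 256 + $y) * 256 + $z;
}

my @in = ('host c 10.0.0.2', 'host a 9.255.255.255', 'host b 10.0.0.10');

# Naive: ip_key() runs twice on every comparison.
my @naive = sort { ip_key($a) <=> ip_key($b) } @in;

# Orcish Maneuver ("or-cache"): compute each key at most once and keep it
# in a hash. The classic form uses ||=, which recomputes keys that are 0;
# //= (Perl 5.10+) caches those too.
my %cache;
my @orcish = sort {
    ($cache{$a} //= ip_key($a)) <=> ($cache{$b} //= ip_key($b))
} @in;

print "$_\n" for @orcish;
```

The hash trades memory for time: each key is extracted once per distinct input string instead of on every comparison, but every comparison still pays for two hash lookups.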
Schwartzian Transform (ST)
Sort a list of strings according to a dotted-quad IP address.

    @out =
        map  $_->[0] =>
        sort {  $a->[1] <=> $b->[1]
             || $a->[2] <=> $b->[2]
             || $a->[3] <=> $b->[3]
             || $a->[4] <=> $b->[4] }
        map  [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] => @in;
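A self-contained run of the transform above on a few made-up lines. For contrast it also applies the default `sort`, which compares the lines as text and therefore places `10.0.0.10` before `10.0.0.2`, and `192.168.1.1` before `9.255.255.255`:

```perl
use strict;
use warnings;

# Made-up log lines, each containing a dotted-quad address.
my @in = (
    '10.0.0.10 gamma',
    '9.255.255.255 alpha',
    '10.0.0.2 beta',
    '192.168.1.1 delta',
);

my @out =
    map  { $_->[0] }                                    # 3. retrieve the line
    sort {  $a->[1] <=> $b->[1]                         # 2. compare octets,
         || $a->[2] <=> $b->[2]                         #    falling through
         || $a->[3] <=> $b->[3]                         #    on ties
         || $a->[4] <=> $b->[4] }
    map  { [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] } @in;  # 1. cache the octets

# For contrast: the default sort compares the lines as text.
my @text = sort @in;

print "ST:      @out\n";
print "default: @text\n";
```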
ST with Packed Sortkeys
Concatenate the subkeys into a sortable string.

    @out =
        map  $_->[0] =>
        sort { $a->[1] cmp $b->[1] }
        map  [ $_, pack('C4' => /(\d+)\.(\d+)\.(\d+)\.(\d+)/) ] => @in;
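Why the packed key works: the `C4` template turns each octet into one unsigned byte, most significant octet first, so a single `cmp` of the 4-byte strings yields the same order as the four `<=>` tests. A small check on made-up data:

```perl
use strict;
use warnings;

# 'C4' packs four unsigned 8-bit values, first octet first.
my $key_9  = pack('C4', 9, 255, 255, 255);
my $key_10 = pack('C4', 10, 0, 0, 2);

my $packed_order_ok = $key_9 lt $key_10;              # byte 0x09 < 0x0A
my $text_order_ok   = '9.255.255.255' lt '10.0.0.2';  # '9' sorts after '1'

my @in = ('10.0.0.10 gamma', '9.255.255.255 alpha', '10.0.0.2 beta');

# ST with one packed sortkey per line instead of four numeric subkeys.
my @out =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, pack('C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/) ] } @in;

print "packed keys in numeric order: ", ($packed_order_ok ? "yes" : "no"), "\n";
print "text keys in numeric order:   ", ($text_order_ok   ? "yes" : "no"), "\n";
print "$_\n" for @out;
```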
The Packed-Default Sort
Append the operands to the packed sortkeys.

    @out =
        map  substr($_, 4) =>
        sort
        map  pack('C4' => /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ => @in;
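A runnable version of the packed-default sort on made-up data. Because each element is now a plain string that begins with its 4-byte key, `sort` needs no comparison block at all, and `substr($_, 4)` recovers the original line:

```perl
use strict;
use warnings;

my @in = (
    '10.0.0.10 gamma',
    '9.255.255.255 alpha',
    '10.0.0.2 beta',
    '192.168.1.1 delta',
);

# 1. map:  prefix each line with its 4-byte packed key
# 2. sort: default string sort, no Perl-level comparison sub
# 3. map:  strip the key with substr($_, 4) to retrieve the line
my @out =
    map  { substr($_, 4) }
    sort
    map  { pack('C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ } @in;

print "$_\n" for @out;
```

If two lines share an address, the default comparison simply continues into the appended line, so ties come out in text order.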
Selected Benchmarks
[Figure: CPU time (microseconds per line)]
• ST: the O(N log N) comparisons dominate the cost.
• Packed-default (P-D): the O(N) preprocessing dominates the cost.
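The benchmark figures themselves are not reproduced here. A sketch of how such a comparison could be set up with Perl's core `Benchmark` module, assuming synthetic input of 2,000 lines with distinct addresses (the sub names and the data generator are hypothetical, and timings will vary with machine and Perl version):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input: 2,000 lines with distinct addresses (the last two
# octets encode the line number, so no two addresses are equal).
my @in = map {
    join('.', ($_ * 37) % 256, ($_ * 101) % 256, int($_ / 256), $_ % 256) . " host$_"
} 1 .. 2000;

# Schwartzian Transform: up to four numeric subkeys per comparison.
sub st {
    return map  { $_->[0] }
           sort {  $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]
                || $a->[3] <=> $b->[3] || $a->[4] <=> $b->[4] }
           map  { [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] } @_;
}

# ST with a single packed sortkey.
sub st_packed {
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] }
           map  { [ $_, pack('C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/) ] } @_;
}

# Packed-default sort: key and line in one string, default comparison.
sub packed_default {
    return map  { substr($_, 4) }
           sort
           map  { pack('C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ } @_;
}

# All three must agree before their speed is worth comparing.
my $expected = join "\n", st(@in);
for my $alt (\&st_packed, \&packed_default) {
    die "techniques disagree\n" unless join("\n", $alt->(@in)) eq $expected;
}

# Run each sort 50 times and print a relative-speed table.
cmpthese(50, {
    'ST'             => sub { my @r = st(@in) },
    'ST packed'      => sub { my @r = st_packed(@in) },
    'packed-default' => sub { my @r = packed_default(@in) },
});
```

The agreement check matters: with duplicate keys the ST and the packed-default sort could legitimately differ in tie order, which is why the synthetic addresses are kept distinct.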
Packing the Sortkeys
• Strings – fixed or varying lengths; ascending or descending; can be case-insensitive
• Integers – chars, shorts, or longs; signed or unsigned; ascending or descending
• Floating-point numbers – floats or doubles; ascending or descending
• Indexes of strings (to achieve stable sorting) or indexes of arrays or hashes (for retrieval)
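The slide lists the kinds of subkeys that can be packed. A sketch of one possible encoding for several of them, under stated assumptions: integers fit in 32 bits, strings contain no characters above `\xFF`, and the helper names are hypothetical. Each helper returns a byte string whose ordinary string order matches the intended order of its value:

```perl
use strict;
use warnings;

# Hypothetical helpers. Each returns a byte string whose ordinary string
# order (cmp, or the default sort) matches the intended order of its value.

# Unsigned 32-bit integer, ascending: 'N' is big-endian, most significant
# byte first.
sub uint_asc  { return pack('N', $_[0]) }

# Unsigned 32-bit integer, descending: subtract from the maximum.
sub uint_desc { return pack('N', 0xFFFF_FFFF - $_[0]) }

# Signed 32-bit integer, ascending: shift [-2**31, 2**31 - 1] onto
# [0, 2**32 - 1] so negative values come first.
sub int_asc   { return pack('N', $_[0] + 2**31) }

# Fixed-width, case-insensitive string: lc folds case; 'a10' truncates
# to 10 bytes and pads with NULs, so "bob" sorts before "bobby".
sub str_ci    { return pack('a10', lc $_[0]) }

# Double, ascending: big-endian IEEE 754 bytes ('d>' needs Perl 5.10+).
# Set the sign bit of non-negative values; invert every bit of negative
# ones, so larger magnitudes sort lower.
sub double_asc {
    my $bytes = pack('d>', $_[0]);
    return $_[0] < 0 ? ~$bytes : ("\x80" | $bytes);
}

# Stable, case-insensitive sort of names: append each name's index so
# equal keys keep their input order, then use the index for retrieval.
my @in  = ('Bob', 'alice', 'ALICE', 'bob');
my @out =
    map  { $in[ unpack('N', substr($_, 10, 4)) ] }
    sort
    map  { str_ci($in[$_]) . pack('N', $_) } 0 .. $#in;

print "@out\n";    # alice ALICE Bob bob
```

Because `a10` makes every name key exactly 10 bytes, the index sits at a known offset. A varying-length string key would instead need a terminator before any following subkey, for example a NUL byte, assuming the data never contains one.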
The Sort::Records Module
• Combines the packed-default sort technique with automatic subkey extraction, using a simple attribute/value syntax.
• Sort /etc/passwd by user name:

    $sort1 = Sort::Records->new([width => 10, split => [':', 0]]);
    @pw = $sort1->sort(`cat /etc/passwd`);

• Sort /etc/passwd by user ID:

    $sort2 = Sort::Records->new([type => 'int', split => [':', 2]]);
    @pw = $sort2->sort(`cat /etc/passwd`);
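The `Sort::Records` calls above rely on that module's own interface. For comparison, a hand-written packed-default sort that does roughly the same two jobs, assuming colon-separated records like those in `/etc/passwd`. The sample lines are inlined instead of read with backticks, and UIDs are packed as unsigned 32-bit integers; how the module itself encodes `type => 'int'` is not shown on the slide:

```perl
use strict;
use warnings;

# Sample lines in /etc/passwd format (made up for this sketch).
my @passwd = (
    'root:x:0:0:root:/root:/bin/bash',
    'nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin',
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin',
    'alice:x:1000:1000:Alice:/home/alice:/bin/sh',
);

# By user name (field 0) in a 10-byte, NUL-padded key; compare width => 10.
my @by_name =
    map  { substr($_, 10) }
    sort
    map  { pack('a10', (split /:/)[0]) . $_ } @passwd;

# By user ID (field 2) as an unsigned 32-bit key; compare type => 'int'.
my @by_uid =
    map  { substr($_, 4) }
    sort
    map  { pack('N', (split /:/)[2]) . $_ } @passwd;

print "by name: ", join(' ', map { (split /:/)[0] } @by_name), "\n";
print "by uid:  ", join(' ', map { (split /:/)[0] } @by_uid),  "\n";
```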
Conclusions
• Packing subkeys into sortable strings speeds up large sorts, whichever sorting technique is used.
• Appending the operands to the sortkeys makes it possible to use the fast default lexicographic comparison.
• The module Sort::Records encapsulates the code conveniently.
• <URL:http://www.hpl.hp.com/personal/Larry_Rosler/sort/>
  <URL:http://www.sysarch.com/perl/sort/>