A Fresh Look at Efficient Perl Sorting

Uri Guttman, Sysarch <uri@sysarch.com>; Larry Rosler, Hewlett-Packard Laboratories <lr@hpl.hp.com>. Perl Conference 3.0, August 1999.


Presentation Transcript


  1. A Fresh Look at Efficient Perl Sorting
     Uri Guttman, Sysarch <uri@sysarch.com>
     Larry Rosler, Hewlett-Packard Laboratories <lr@hpl.hp.com>
     Perl Conference 3.0, August, 1999

  2. The Perl Sorting Paradigm
     1. Preprocess the input to extract the sortkeys.
     2. Sort the data by comparing the sortkeys.
     3. Postprocess the output to retrieve the data.

         @out =                        # These may be separate steps.
             map POSTPROCESS($_) =>
             sort sortsub
             map PREPROCESS($_) => @in;

         @out = sort @in;              # The default sort.
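The three steps can also be written as separate statements. The following is a minimal sketch, not taken from the slides: the sample data, the choice of string length as the sortkey, and the names `@keys` and `@idx` are all assumptions made for illustration.

```perl
use strict;
use warnings;

my @in = qw(pear fig banana kiwi apple);   # illustrative data

# Step 1: preprocess. Extract one sortkey (here, the length) per input.
my @keys = map { length($_) } @in;

# Step 2: sort. Compare the precomputed sortkeys by index;
# ties fall back to the strings so the result is deterministic.
my @idx = sort { $keys[$a] <=> $keys[$b] or $in[$a] cmp $in[$b] } 0 .. $#in;

# Step 3: postprocess. Retrieve the data in sorted order.
my @out = @in[@idx];

print "@out\n";   # prints: fig kiwi pear apple banana
```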

  3. Perl Sorting Techniques
     • Naive (no pre- or postprocessing)
       • Sortkeys recomputed on every comparison.
     • Cached sortkeys; the Orcish Maneuver
       • Sortkeys cached in hashes.
     • The Schwartzian Transform
       • Sortkeys cached in anonymous arrays.
     • The Packed-Default Sort
       • Sortkeys and operands packed in strings.
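To make the difference between the first two techniques concrete, here is a hedged sketch that counts how often a sortkey is extracted. The `sortkey` helper, the `$calls` counter, and the sample records are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

my @in = ('b 10', 'a 9', 'c 100', 'd 9');   # illustrative records

# Hypothetical sortkey extractor: the number after the space.
my $calls = 0;
sub sortkey { $calls++; return (split ' ', $_[0])[1] }

# Naive: both sortkeys are recomputed on every comparison.
my @naive = sort { sortkey($a) <=> sortkey($b) or $a cmp $b } @in;
my $naive_calls = $calls;

# Orcish Maneuver: ||= ("or-cache") stores each sortkey in a hash
# the first time it is needed. Note that ||= recomputes keys that
# are false (0 or ''); the keys in this sample are all nonzero.
$calls = 0;
my %key;
my @orcish = sort {
    ($key{$a} ||= sortkey($a)) <=> ($key{$b} ||= sortkey($b))
        or $a cmp $b
} @in;
my $orcish_calls = $calls;

print "naive: $naive_calls extractions, orcish: $orcish_calls\n";
```

With the cache, each distinct record is processed once, while the naive version pays for two extractions per comparison.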

  4. Schwartzian Transformation (ST)
     Sort a list of strings according to a dotted-quad IP address.

         @out =
             map $_->[0] =>
             sort { $a->[1] <=> $b->[1] ||
                    $a->[2] <=> $b->[2] ||
                    $a->[3] <=> $b->[3] ||
                    $a->[4] <=> $b->[4] }
             map [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] => @in;

  5. ST with Packed Sortkeys
     Concatenate the subkeys into a sortable string.

         @out =
             map $_->[0] =>
             sort { $a->[1] cmp $b->[1] }
             map [ $_, pack('C4' => /(\d+)\.(\d+)\.(\d+)\.(\d+)/) ] => @in;
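Why a packed key is "sortable" may be easier to see on a small example. In this sketch (sample addresses and the `%packed` hash are assumptions for illustration), `pack 'C4'` stores each octet in one byte, most significant first, so a bytewise `cmp` agrees with the numeric order of the address, which a plain string comparison does not.

```perl
use strict;
use warnings;

my @ips = ('10.0.0.2', '9.255.255.255', '10.0.0.10', '192.168.1.1');

# Plain string comparison misorders dotted quads ("10..." lt "9...").
my @by_string = sort @ips;

# One byte per octet: the packed key is always 4 bytes long and
# compares bytewise in the same order as the numeric address.
my %packed = map { $_ => pack('C4', split /\./, $_) } @ips;
my @by_packed = sort { $packed{$a} cmp $packed{$b} } @ips;

print "string: @by_string\n";
print "packed: @by_packed\n";
```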

  6. The Packed-Default Sort
     Append the operands to the packed sortkeys.

         @out =
             map substr($_, 4) =>
             sort
             map pack('C4' => /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ => @in;
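A small end-to-end run of the packed-default sort, written with block-style `map` for readability. The input lines are made up for illustration; only the technique comes from the slide.

```perl
use strict;
use warnings;

my @in = (
    'host-c 192.168.1.1',
    'host-a 10.0.0.10',
    'host-b 10.0.0.2',
    'host-d 9.255.255.255',
);

# Read bottom to top:
#   1. prefix each line with its 4-byte packed sortkey,
#   2. sort with the default (string) comparison, no sortsub,
#   3. strip the 4-byte key to recover the original line.
my @out =
    map  { substr($_, 4) }
    sort
    map  { pack('C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ }
    @in;

print "$_\n" for @out;
```

Because the operand follows the key in the same string, lines with equal keys end up ordered by the text of the line itself.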

  7. Selected Benchmarks
     [Benchmark chart not preserved in this transcript. Axis: CPU time (microseconds per line).]
     • O(N*logN) comparisons dominate the Schwartzian Transform (ST).
     • O(N) preprocessing dominates the packed-default sort (P-D).
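Timings like these can be reproduced locally with the core `Benchmark` module. The harness below is only a sketch: the synthetic input (2000 random dotted quads), the labels in `%sorts`, and the run length are assumptions, and it is not the authors' benchmark code. No results are shown here because they depend on the machine and the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input (an assumption, not the authors' data).
srand(42);
my @in = map { join '.', map { int rand 256 } 1 .. 4 } 1 .. 2000;

my %sorts = (
    st => sub {
        my @out =
            map  { $_->[0] }
            sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]
                || $a->[3] <=> $b->[3] || $a->[4] <=> $b->[4] }
            map  { [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] } @in;
        return @out;
    },
    st_packed => sub {
        my @out =
            map  { $_->[0] }
            sort { $a->[1] cmp $b->[1] }
            map  { [ $_, pack('C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/) ] } @in;
        return @out;
    },
    packed_default => sub {
        my @out =
            map  { substr($_, 4) }
            sort
            map  { pack('C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ } @in;
        return @out;
    },
);

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%sorts);
```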

  8. Packing the Sortkeys
     • Strings – fixed or varying lengths; ascending or descending; can be case-insensitive
     • Integers – chars, shorts, or longs; signed or unsigned; ascending or descending
     • Floating-point numbers – floats or doubles; ascending or descending
     • Indexes of strings (to achieve stable sorting) or indexes of arrays or hashes (for retrieval)
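A few of these packings can be sketched together. The example below is an illustration under stated assumptions, not the authors' code: records are "name score" strings, scores are unsigned 32-bit integers sorted descending (by subtracting from 0xFFFFFFFF), names are compared case-insensitively in a fixed 10-character field, and a packed index is appended both to keep the sort stable and to retrieve the original record. Floating-point keys need more care and are left out.

```perl
use strict;
use warnings;

my @in = ('bob 7', 'Alice 12', 'carol 7', 'alice 12', 'Dave 3');

my $i = 0;
my @out =
    # Retrieval: the last 4 bytes of each packed string hold the index.
    map  { $in[ unpack 'N', substr($_, -4) ] }
    sort
    map  {
        my ($name, $score) = split ' ';
        pack('N', 0xFFFFFFFF - $score)   # unsigned long, descending
          . pack('A10', lc $name)        # fixed width, case-insensitive
          . pack('N', $i++)              # input index: stability + retrieval
    } @in;

print join(', ', @out), "\n";
# prints: Alice 12, alice 12, bob 7, carol 7, Dave 3
```

The two "alice" records have identical score and name subkeys, so the index subkey decides and they keep their input order.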

  9. The Sort::Records Module
     • Combines the packed-default sort technique with automatic subkey extraction using a simple attribute/value syntax.
     • Sort /etc/passwd by user name.

         $sort1 = Sort::Records->new([width => 10, split => [':', 0]]);
         @pw = $sort1->sort(`cat /etc/passwd`);

     • Sort /etc/passwd by user ID.

         $sort2 = Sort::Records->new([type => 'int', split => [':', 2]]);
         @pw = $sort2->sort(`cat /etc/passwd`);

  10. Conclusions
      • Packing subkeys into sortable strings speeds up large sorts, using any sorting method.
      • Appending the operands to the sortkeys makes it possible to use the fast default lexicographic sort comparison.
      • The module Sort::Records encapsulates the code conveniently.
      • <URL:http://www.hpl.hp.com/personal/Larry_Rosler/sort/>
        <URL:http://www.sysarch.com/perl/sort/>
