
Module “Bit::Vector”

“Bit::Vector - more than the name suggests”, Steffen Beyer, YAPC::Europe, London, UK, ICA, September 22-24, 2000. Agenda: What does it do? • Purpose(s) • Summary of available methods • Characteristics • Alternatives • Some Applications • Questions & Answers, Suggestions

Presentation Transcript


  1. Module “Bit::Vector” “Bit::Vector - more than the name suggests” Steffen Beyer YAPC::Europe, London, UK, ICA, September 22-24 2000

  2. Agenda • What does it do? • Purpose(s) • Summary of available methods • Characteristics • Alternatives • Some Applications • Questions & Answers, Suggestions

  3. What does it do? The Bit::Vector module implements bit arrays of arbitrary size. Not very sexy, you may think. But bit vectors are actually the basis of all computations performed by a computer! Your CPU calls them "processor registers"... By the way, is everybody familiar with two's complement binary representation and arithmetic?
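
To make "bit arrays of arbitrary size" concrete, here is a minimal sketch in C of the usual layout. The bits are packed into an array of machine words, and bit i lives at position i mod w of word i / w, where w is the word size in bits. The type and function names ("bitvec", "bv_new", "bv_on", ...) are hypothetical and are not Bit::Vector's actual C API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Word type and its width in bits (hypothetical choice: unsigned int). */
typedef unsigned int word_t;
#define WORD_BITS (sizeof(word_t) * CHAR_BIT)

typedef struct {
    size_t bits;   /* number of bits in the vector */
    size_t words;  /* number of machine words allocated */
    word_t *data;  /* bit i is stored in data[i / WORD_BITS] */
} bitvec;

/* Allocates a vector of `bits` bits, all cleared. */
static bitvec bv_new(size_t bits) {
    bitvec v;
    v.bits = bits;
    v.words = (bits + WORD_BITS - 1) / WORD_BITS;
    v.data = calloc(v.words ? v.words : 1, sizeof(word_t));
    assert(v.data != NULL);
    return v;
}

static void bv_free(bitvec *v) {
    free(v->data);
    v->data = NULL;
}

/* Sets bit i. */
static void bv_on(bitvec *v, size_t i) {
    assert(i < v->bits);
    v->data[i / WORD_BITS] |= (word_t)1 << (i % WORD_BITS);
}

/* Clears bit i. */
static void bv_off(bitvec *v, size_t i) {
    assert(i < v->bits);
    v->data[i / WORD_BITS] &= ~((word_t)1 << (i % WORD_BITS));
}

/* Returns 1 if bit i is set, 0 otherwise. */
static int bv_test(const bitvec *v, size_t i) {
    assert(i < v->bits);
    return (int)((v->data[i / WORD_BITS] >> (i % WORD_BITS)) & 1u);
}
```

A 100-bit vector therefore takes four 32-bit words. Setting or testing a bit costs one shift and one AND or OR, whatever the vector's size.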

  4. Purpose(s) • Efficient storage and handling of bit arrays • Extend your CPU to any desired number of bits • Efficient set operations • Efficient big integer arithmetic
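
The set purpose and the big-integer purpose can share one representation. The following is a hedged sketch in C. It assumes fixed 32-bit words stored least significant first and uses hypothetical function names. Set intersection and union become one AND or OR per machine word. Addition "extends the CPU" by carrying from one word into the next, and negation uses two's complement (invert all bits, then add 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All arrays hold n 32-bit words, least significant word first.
   Bit i of the array is bit i % 32 of word i / 32. */

/* Set intersection: one AND per word instead of one test per element. */
static void set_intersection(uint32_t *r, const uint32_t *a,
                             const uint32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) r[i] = a[i] & b[i];
}

/* Set union: one OR per word. */
static void set_union(uint32_t *r, const uint32_t *a,
                      const uint32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) r[i] = a[i] | b[i];
}

/* Addition modulo 2^(32*n): the carry out of each word feeds the next one,
   so n words behave like one (32*n)-bit register. Returns the final carry. */
static uint32_t big_add(uint32_t *r, const uint32_t *a,
                        const uint32_t *b, size_t n) {
    uint64_t carry = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t s = (uint64_t)a[i] + b[i] + carry;
        r[i] = (uint32_t)s;
        carry = s >> 32;
    }
    return (uint32_t)carry;
}

/* Two's complement negation modulo 2^(32*n): invert every bit, add 1. */
static void big_negate(uint32_t *r, const uint32_t *a, size_t n) {
    uint64_t carry = 1;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = (uint64_t)(uint32_t)~a[i] + carry;
        r[i] = (uint32_t)s;
        carry = s >> 32;
    }
}
```

For example, adding 1 to the two-word value 0x00000000FFFFFFFF carries into the upper word and gives 0x0000000100000000. Negating 1 gives all ones, which is -1 in 64-bit two's complement.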

  5. Summary of available methods (see file "BitVector.txt") • Especially interesting methods: • "Interval_Substitute()" (is to bit vectors what "splice" is to Perl arrays) • "Interval_Scan_...()" (finds contiguous blocks of set bits) • "Chunk_...()" (accesses bits in chunks of a selectable size at a time) • "...Reverse()" (is to bit vectors what Perl's "reverse" is to strings)
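
The following C sketch shows what an interval scan does, roughly in the spirit of "Interval_Scan_...()". Starting at a given bit, it returns the first and last index of the next block of contiguous set bits. It skips whole words of zeros or ones in a single step. The helper names and the 32-bit word layout are assumptions for illustration, not the module's C interface. It also assumes that unused bits in the last word are zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int test_bit(const uint32_t *w, size_t i) {
    return (int)((w[i / 32] >> (i % 32)) & 1u);
}

/* Finds the next block of contiguous set bits at or after bit `start`
   in a vector of `bits` bits stored in 32-bit words. Returns 1 and stores
   the block's first and last index in *min and *max, or returns 0 if no
   set bit remains. */
static int interval_scan_inc(const uint32_t *w, size_t bits, size_t start,
                             size_t *min, size_t *max) {
    size_t i = start;
    assert(min != NULL && max != NULL);
    /* Skip clear bits; an all-zero word is skipped in one step. */
    while (i < bits && !test_bit(w, i)) {
        if (i % 32 == 0 && w[i / 32] == 0) i += 32;
        else i++;
    }
    if (i >= bits) return 0;
    *min = i;
    /* Extend the block; an all-ones word lying fully inside the vector
       is skipped in one step. */
    while (i < bits && test_bit(w, i)) {
        if (i % 32 == 0 && i + 32 <= bits && w[i / 32] == 0xFFFFFFFFu) i += 32;
        else i++;
    }
    *max = i - 1;
    return 1;
}
```

To enumerate all blocks, a caller can restart each scan at max + 1 until the function returns 0.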

  6. Characteristics (1/3) • Internally written in C (thus fast) • Relies on CPU's machine word operations for maximum speed • Auto-adapts to size of machine word at runtime • Uses efficient algorithms (mostly "divide-and-conquer"), time complexity of many functions O(1), O(n), O(n ld n) • C library at the core can also be used stand-alone (without Perl) • Free Software (GPL+Artistic), C library also LGPL
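
A simple loop at startup is enough to adapt to the machine word size at runtime. The sketch below uses plain "unsigned int" as the word type and hypothetical names. It counts how many left shifts it takes for a single 1 bit to fall off the end of the word. From that count it derives the shift and mask that split a bit index into a word index and a bit position. It shows the idea only, not the module's actual initialization code.

```c
#include <assert.h>
#include <limits.h>

/* Counts the bits in an unsigned int at runtime: shift a single 1 bit
   left until it is shifted out and the word becomes 0. */
static unsigned bits_per_word(void) {
    unsigned int sample = 1u;
    unsigned bits = 1;
    while ((sample <<= 1) != 0u) bits++;
    return bits;
}

/* Derives log2(word size) and the mask used to split a bit index:
   word index = index >> *log_bits, bit position = index & *mod_mask. */
static void index_split_params(unsigned *log_bits, unsigned *mod_mask) {
    unsigned b = bits_per_word();
    unsigned lg = 0;
    assert((b & (b - 1)) == 0);   /* the word size must be a power of two */
    while ((1u << lg) < b) lg++;
    *log_bits = lg;
    *mod_mask = b - 1;
}
```

With 32-bit words this yields a shift of 5 and a mask of 31. Division and modulo by the word size then become a shift and an AND.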

  7. Characteristics (2/3) - Efficient Algorithms • Example: Exponentiation ($x^k$), e.g. $27^{13}$ (base 10), so $k = 13$: $27^{13} = 27 \cdot 27 \cdot \ldots \cdot 27$ (13 factors) $= 11011_2^{\,1101_2}$; with $n = \lfloor \mathrm{ld}\, k \rfloor = 3$: $27^{13} = (11011_2^{\,8})^1 \cdot (11011_2^{\,4})^1 \cdot (11011_2^{\,2})^0 \cdot (11011_2^{\,1})^1$. Worst case: $2n$ multiplications $= O(n) = O(\mathrm{ld}\, k)$ instead of $k - 1 = O(k)$; here only 5 (3 squarings plus 2 multiplications) instead of 12 • Example: Conversion to decimal representation: divides the bit vector by the largest power of 10 that fits into a machine word, then uses machine-word arithmetic to break the remainder down further • Example: Bit counting (number of set bits)
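
Both algorithms on this slide can be sketched in a few lines of C. This is an illustration under simplifying assumptions, not the module's code, and all names are hypothetical. "power()" works on 64-bit machine integers rather than bit vectors. It counts multiplications the same way the slide does: squarings plus combining multiplications, with the first factor assigned rather than multiplied. "to_dec()" takes a number stored as 32-bit words, least significant first, and divides it by $10^9$, the largest power of 10 below $2^{32}$. It then breaks each remainder into digits with ordinary machine arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* x^k by repeated squaring (right-to-left binary method). *mults receives
   the number of multiplications, counted as on the slide. The result
   wraps modulo 2^64 if it does not fit. */
static uint64_t power(uint64_t x, unsigned k, unsigned *mults) {
    uint64_t result = 1;
    int have_result = 0;
    *mults = 0;
    while (k != 0) {
        if (k & 1u) {
            if (have_result) { result *= x; ++*mults; }
            else { result = x; have_result = 1; }
        }
        k >>= 1;
        if (k != 0) { x *= x; ++*mults; }  /* x, x^2, x^4, x^8, ... */
    }
    return result;
}

/* Divides the n-word number w (least significant word first) by d in place,
   from the most significant word down, and returns the remainder. */
static uint32_t div_small(uint32_t *w, size_t n, uint32_t d) {
    uint64_t rem = 0;
    for (size_t i = n; i-- > 0; ) {
        uint64_t cur = (rem << 32) | w[i];
        w[i] = (uint32_t)(cur / d);
        rem = cur % d;
    }
    return (uint32_t)rem;
}

static int is_zero(const uint32_t *w, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (w[i] != 0) return 0;
    return 1;
}

/* Writes the decimal representation of w into out (size bytes) and
   destroys w. */
static void to_dec(uint32_t *w, size_t n, char *out, size_t size) {
    const uint32_t chunk = 1000000000u;  /* 10^9, largest power of 10 < 2^32 */
    size_t len = 0;
    for (;;) {
        uint32_t r = div_small(w, n, chunk);  /* one big-number division */
        int last = is_zero(w, n);
        assert(len + 9 < size);               /* room for 9 digits plus NUL */
        if (last) {                           /* leading chunk: no zero padding */
            do { out[len++] = (char)('0' + r % 10); r /= 10; } while (r != 0);
            break;
        }
        for (int i = 0; i < 9; i++) {         /* inner chunk: exactly 9 digits */
            out[len++] = (char)('0' + r % 10);
            r /= 10;
        }
    }
    out[len] = '\0';
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {  /* digits were reversed */
        char c = out[i]; out[i] = out[j]; out[j] = c;
    }
}
```

For $k = 13$ the loop performs exactly the 5 multiplications counted on the slide. For $k = 15$ (binary 1111) it hits the worst case of $2n = 6$. In "to_dec()", only one expensive big-number division is needed per 9 decimal digits.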

  8. Characteristics (3/3) • Object-oriented interface, e.g. $vec1->intersection($vec2,$vec3); • Optionally(*) provides overloaded operators • one set of operators for set operations, e.g. $set1 = $set2 & $set3; • one set of operators for big integer math, e.g. $bigsum += $bigint; (*): will become optional in version 6.0 (to speed up loading of the "plain" module); currently always loaded

  9. Alternatives (1/2) • vec(): confusing; insufficiently powerful for many applications • PDL: complicated; designed primarily for astronomical data analysis and heavy-duty number crunching (written in C internally) • Math::PARI: very powerful; requires the separate C library "PARI" • Math::BigInt (part of the Perl 5.6 core): slow (written entirely in Perl, stores digits in Perl arrays) • Math::BigInteger: unmaintained, doesn't compile (uses XS and a C library)

  10. Alternatives (2/2) • Set::Bag - implements multisets • Set::IntSpan - optimized for .newsrc-type sets (also supported by Bit::Vector, but needs more memory) • Set::Object - implements sets of arbitrary objects (can be simulated with Bit::Vector using a lookup table; set operations will then be faster) • Set::Scalar - similar to Set::Object (?), but also allows recursion (sets of sets) • Set::Window - optimized for intervals of integers (needs much less memory than Bit::Vector, but of only limited use, since the whole interval is either in or out)
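
The following sketch shows how a bit vector maps to a .newsrc-style run list such as "1-3,5,40-42". It is written in C with 32-bit words and a hypothetical function name. It also makes the memory trade-off concrete. The bit vector always occupies one bit per possible element, whereas a run list only stores the endpoints of each run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int bit_at(const uint32_t *w, size_t i) {
    return (int)((w[i / 32] >> (i % 32)) & 1u);
}

/* Renders the set stored in a bit vector of `bits` bits as a run list
   such as "1-3,5,40-42": ascending order, and a run of length 1 is
   written as a single number. */
static void to_runlist(const uint32_t *w, size_t bits, char *out, size_t size) {
    size_t len = 0, i = 0;
    assert(size > 0);
    out[0] = '\0';
    while (i < bits) {
        if (!bit_at(w, i)) { i++; continue; }
        size_t lo = i;
        while (i < bits && bit_at(w, i)) i++;   /* extend the run */
        size_t hi = i - 1;
        const char *sep = (len > 0) ? "," : "";
        int n = (lo == hi)
            ? snprintf(out + len, size - len, "%s%zu", sep, lo)
            : snprintf(out + len, size - len, "%s%zu-%zu", sep, lo, hi);
        assert(n > 0 && (size_t)n < size - len);  /* output was not truncated */
        len += (size_t)n;
    }
}
```

A set of a few long runs in a huge universe is cheap as a run list but expensive as a bit vector. The reverse holds for dense, scattered sets.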

  11. Simulating Set::Object using lookup table • See file "SetObject.pl"
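
The file "SetObject.pl" is not reproduced here, so the following is only a sketch of the general technique, written in C with hypothetical names. A lookup table assigns each distinct object (here, a string) a bit index. A set of objects is then just a bit mask, so intersection and union are single AND and OR operations. A 64-bit mask limits this sketch to 64 distinct objects; a real bit vector has no such limit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_OBJECTS 64   /* one bit per object in a uint64_t mask */

/* Lookup table mapping each distinct object (here: a string) to a bit index. */
typedef struct {
    const char *names[MAX_OBJECTS];  /* not copied: callers keep them alive */
    int count;
} lookup_table;

/* Returns the object's index, or -1 if it has never been added. */
static int find_index(const lookup_table *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0) return i;
    return -1;
}

/* Returns the object's index, assigning the next free one if it is new. */
static int intern(lookup_table *t, const char *name) {
    int i = find_index(t, name);
    if (i >= 0) return i;
    assert(t->count < MAX_OBJECTS);
    t->names[t->count] = name;
    return t->count++;
}

static uint64_t set_add(lookup_table *t, uint64_t set, const char *name) {
    return set | ((uint64_t)1 << intern(t, name));
}

static int set_contains(const lookup_table *t, uint64_t set, const char *name) {
    int i = find_index(t, name);
    return i >= 0 && ((set >> i) & 1u);
}

/* Number of elements: clears the lowest set bit until none is left. */
static int set_size(uint64_t set) {
    int n = 0;
    while (set != 0) { set &= set - 1; n++; }
    return n;
}
```

Once the table is built, "a & b" and "a | b" are the set operations. Their cost no longer depends on how expensive the objects themselves are to compare.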

  12. Some Applications • Set::IntRange - sets of integers (universe = some interval) • Math::MatrixBool - useful for graph algorithms (e.g. shortest paths / Kleene's Algorithm) • Slice (multiple document version generator) • Parse table generators for compiler-compilers à la "yacc" (calculating first, follow & lookahead character sets) • Cryptography • Easy manipulation of data (files), any number of bits at a time
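
For the graph-algorithm application, the key point is that a whole row of a Boolean adjacency matrix fits into machine words. The sketch below uses Warshall's transitive closure, which is closely related to Kleene's algorithm, with one 64-bit mask per row. It is not Math::MatrixBool's code, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Transitive closure (Warshall) of a graph with up to 64 nodes.
   row[i] is a bit mask: bit j set means there is an edge i -> j.
   After the call, bit j of row[i] is set iff j is reachable from i
   by a path of one or more edges. */
static void transitive_closure(uint64_t *row, int n) {
    assert(n > 0 && n <= 64);
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            if ((row[i] >> k) & 1u)
                row[i] |= row[k];   /* i reaches k, so i reaches all k reaches */
}
```

The inner update merges a whole row with one OR instead of n separate bit operations.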

  13. Application "Slice" • See • homepage screenshot "Slice.bmp" • file "file.in" • file "Slice.txt" • file "file.html.en.OK" • file "file.html.de.OK" • URL http://www.engelschall.com/sw/slice/

  14. Application "Date::Calc" v5.0 (coming soon) • Stores years in bit vectors (one year = one bit vector, one day = one bit) • Bit is "on" if corresponding day is a holiday • Performs calculations taking holidays into account
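
The holiday idea can be sketched in C with a hypothetical layout, not Date::Calc's actual data structure. One year is a 366-bit vector stored in twelve 32-bit words. Bit d, counting January 1st as 0, is on if day d is a holiday. A working-day count is then the number of clear bits in a range.

```c
#include <assert.h>
#include <stdint.h>

/* One year: 366 bits in 12 words of 32 bits (384 bits, the rest unused). */
typedef struct { uint32_t w[12]; } year_bits;

static void mark_holiday(year_bits *y, int day) {
    assert(day >= 0 && day < 366);
    y->w[day / 32] |= (uint32_t)1 << (day % 32);
}

static int is_holiday(const year_bits *y, int day) {
    assert(day >= 0 && day < 366);
    return (int)((y->w[day / 32] >> (day % 32)) & 1u);
}

/* Counts the days in the inclusive range [from, to] that are not holidays. */
static int working_days(const year_bits *y, int from, int to) {
    int n = 0;
    for (int d = from; d <= to; d++)
        if (!is_holiday(y, d)) n++;
    return n;
}
```

In a non-leap year, December 25th is day 358 in this zero-based numbering. Marking it and January 1st leaves 9 working days in the first ten days of January.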

  15. Questions & Answers, Suggestions • Please feel free to ask! • Suggestions are welcome.
