Module “Bit::Vector”
“Bit::Vector - more than the name suggests”
Steffen Beyer
YAPC::Europe, London, UK, ICA, September 22-24 2000

Agenda
• What does it do?
• Purpose(s)
• Summary of available methods
• Characteristics
• Alternatives
• Some Applications
• Questions & Answers, Suggestions
What does it do?
The Bit::Vector module implements bit arrays of arbitrary size. Not very sexy, you may think. But bit vectors actually underlie every computation a computer performs: your CPU calls them "processor registers".
By the way, is everybody familiar with two's complement binary representation and arithmetic?
Purpose(s)
• Efficient storage and handling of bit arrays
• Extend your CPU to any desired number of bits
• Efficient set operations
• Efficient big integer arithmetic
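To make these purposes a little more concrete, here is a minimal sketch in plain Python (not Perl, and not Bit::Vector's API). It uses Python's arbitrary-size integers as stand-in bit arrays; the helper names `make_set`, `members` and `add_fixed_width` are hypothetical and exist only for this illustration.

```python
def make_set(elements):
    """Build a bit array (stored in an int) with one bit set per element."""
    bits = 0
    for e in elements:
        bits |= 1 << e
    return bits


def members(bits):
    """Return the indices of all set bits, lowest first."""
    result = []
    index = 0
    while bits:
        if bits & 1:
            result.append(index)
        bits >>= 1
        index += 1
    return result


def add_fixed_width(x, y, width):
    """Add like a register that is `width` bits wide (overflow wraps around)."""
    return (x + y) & ((1 << width) - 1)


a = make_set([1, 3, 5, 70])
b = make_set([3, 4, 70, 200])

# Set operations become single bitwise operations on whole bit arrays.
print(members(a & b))    # [3, 70]
print(members(a | b))    # [1, 3, 4, 5, 70, 200]
print(members(a & ~b))   # [1, 5]

# A 256-bit "register": the largest value plus one wraps to zero.
all_ones = (1 << 256) - 1
print(add_fixed_width(all_ones, 1, 256))   # 0
```

The same stored bits serve as a set in the first three lines of output and as a very wide unsigned integer in the last one, which is the dual use the list above describes.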
Summary of available methods
(See file "BitVector.txt")
• Especially interesting methods:
  • "Interval_Substitute()" (is to bit vectors what "splice" is to Perl arrays)
  • "Interval_Scan_...()" (finds contiguous blocks of set bits)
  • "Chunk_...()" (gives access to several bits at a time, in chunks of a selectable size)
  • "...Reverse()" (is to bit vectors what Perl's "reverse" is to strings)
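As a rough illustration of what "finding contiguous blocks of set bits" and "reading a chunk of bits" mean, here is a small plain-Python sketch. It is not the module's implementation and does not use its method signatures; the function names `next_run`, `all_runs` and `read_chunk` are hypothetical, and the bit array is again held in a Python integer.

```python
def next_run(bits, size, start):
    """Find the next block of consecutive set bits at or above `start`.

    Returns (lowest, highest) bit index of the block, or None if no bit
    at or above `start` is set within the first `size` bits.
    """
    pos = start
    while pos < size and not (bits >> pos) & 1:
        pos += 1                      # skip clear bits
    if pos >= size:
        return None
    low = pos
    while pos + 1 < size and (bits >> (pos + 1)) & 1:
        pos += 1                      # extend the block while bits are set
    return (low, pos)


def all_runs(bits, size):
    """Collect every block of consecutive set bits, lowest first."""
    runs = []
    start = 0
    while start < size:
        run = next_run(bits, size, start)
        if run is None:
            break
        runs.append(run)
        start = run[1] + 2            # the bit right after a block is clear
    return runs


def read_chunk(bits, chunk_size, offset):
    """Read `chunk_size` bits starting at bit index `offset`."""
    return (bits >> offset) & ((1 << chunk_size) - 1)


vector = 0x321C                       # bits 2-4, 9 and 12-13 are set
print(all_runs(vector, 16))           # [(2, 4), (9, 9), (12, 13)]
print([hex(read_chunk(vector, 4, off)) for off in (0, 4, 8, 12)])
# ['0xc', '0x1', '0x2', '0x3']
```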
Characteristics (1/3)
• Internally written in C (thus fast)
• Relies on the CPU's machine word operations for maximum speed
• Auto-adapts to the size of the machine word at runtime
• Uses efficient algorithms (mostly "divide-and-conquer"); time complexity of many functions is O(1), O(n) or O(n ld n), where ld is the base-2 logarithm
• C library at the core can also be used stand-alone (without Perl)
• Free Software (GPL+Artistic), C library also LGPL
Characteristics (2/3) - Efficient Algorithms
• Example: Exponentiation (x^k)
  E.g. 27^13 (base 10)
  = 27*27*27*27*27*27*27*27*27*27*27*27*27
  = 11011^1101 (base 2)
  k = 13 = 1101 (base 2), n = int(ld k) = 3
  = (11011^8)^1 * (11011^4)^1 * (11011^2)^0 * (11011^1)^1
  Worst case: 2n multiplications = O(n) = O(ld k) instead of k - 1 = O(k); here: only 5 instead of 12
• Example: Conversion to decimal representation
  Divides the bit vector by the largest power of 10 that fits into a machine word, then uses machine word math operations to break the remainder down further
• Example: Bit counting (number of set bits)
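The first two examples can be sketched in plain Python as follows. This is an illustration of the techniques named on the slide (binary exponentiation, and decimal conversion in word-sized pieces), not the module's C code. The function names `power` and `to_decimal` are hypothetical, the 32-bit word size is an assumption, and only non-negative values are handled.

```python
def power(x, k):
    """Return (x**k, number of multiplications) using binary exponentiation."""
    if k == 0:
        return 1, 0
    result = None                     # product of the selected squares so far
    square = x                        # x^(2^i) for the current bit i of k
    mults = 0
    while True:
        if k & 1:
            if result is None:
                result = square       # first factor: nothing to multiply yet
            else:
                result *= square
                mults += 1
        k >>= 1
        if k == 0:
            return result, mults
        square *= square              # next power: x^(2^(i+1))
        mults += 1


WORD_BITS = 32                        # assumed machine word size
CHUNK_DIGITS = 9
CHUNK = 10 ** CHUNK_DIGITS            # largest power of 10 below 2**32
assert CHUNK < 2 ** WORD_BITS <= 10 * CHUNK


def to_decimal(value):
    """Convert a non-negative integer to decimal, nine digits per big division."""
    if value == 0:
        return "0"
    chunks = []
    while value:
        value, remainder = divmod(value, CHUNK)   # one big division per chunk
        chunks.append(remainder)
    text = ""
    for remainder in reversed(chunks):
        digits = []
        for _ in range(CHUNK_DIGITS):             # small, word-sized arithmetic
            remainder, d = divmod(remainder, 10)
            digits.append(str(d))
        text += "".join(reversed(digits))
    return text.lstrip("0")                       # drop padding of the top chunk


value, mults = power(27, 13)
print(mults)                                      # 5
print(to_decimal(value) == str(27 ** 13))         # True
```

For k = 13 the sketch performs three squarings (27^2, 27^4, 27^8) and two further multiplications, which matches the "5 instead of 12" count above. The point of the chunked conversion is that the expensive whole-vector division happens once per nine digits rather than once per digit.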
Characteristics (3/3)
• Object-oriented interface, e.g. $vec1->intersection($vec2,$vec3);
• Optionally(*) provides overloaded operators
  • one set of operators for set operations, e.g. $set1 = $set2 & $set3;
  • one set of operators for big integer math, e.g. $bigsum += $bigint;
(*): currently always loaded; will become optional in version 6.0 (for improved loading speed of the "plain" module)
Alternatives (1/2)
• vec() - confusing, insufficiently powerful for many applications
• PDL - complicated, designed primarily for astronomical data analysis and heavy duty number crunching (written in C, internally)
• Math::PARI - very powerful, requires separate C library "PARI"
• Math::BigInt (is in the Core of Perl 5.6) - slow (written entirely in Perl, stores digits in Perl arrays)
• Math::BigInteger - unmaintained, doesn't compile (uses XS and a C library)
Alternatives (2/2)
• Set::Bag - implements multisets
• Set::IntSpan - optimized for .newsrc file type sets (also supported by Bit::Vector, which needs more memory for them)
• Set::Object - implements sets of arbitrary objects (can be simulated with Bit::Vector using a lookup table; set operations will then be faster)
• Set::Scalar - similar to Set::Object (?), but also allows recursion (set of sets)
• Set::Window - optimized for intervals of integers (needs much less memory than Bit::Vector, but only of limited use since the whole interval is either in or out)
Simulating Set::Object using lookup table
• See file "SetObject.pl"
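The file "SetObject.pl" is not part of this transcript, so the following is not a reconstruction of it. It is only a minimal plain-Python sketch of the general idea: a lookup table assigns each object a bit position, after which sets of objects are ordinary bit arrays. The class name `ObjectUniverse` and its methods are hypothetical.

```python
class ObjectUniverse:
    """Lookup table mapping hashable objects to bit positions and back."""

    def __init__(self):
        self._index = {}      # object -> bit position
        self._objects = []    # bit position -> object

    def bit_for(self, obj):
        """Return the bit position of `obj`, assigning a new one if needed."""
        if obj not in self._index:
            self._index[obj] = len(self._objects)
            self._objects.append(obj)
        return self._index[obj]

    def encode(self, objs):
        """Turn a collection of objects into a bit array (stored in an int)."""
        bits = 0
        for obj in objs:
            bits |= 1 << self.bit_for(obj)
        return bits

    def decode(self, bits):
        """Turn a bit array back into the list of objects it contains."""
        return [obj for i, obj in enumerate(self._objects) if (bits >> i) & 1]


universe = ObjectUniverse()
a = universe.encode(["apple", "pear", "plum"])
b = universe.encode(["pear", "fig", "plum"])

print(universe.decode(a & b))    # ['pear', 'plum']
print(universe.decode(a | b))    # ['apple', 'pear', 'plum', 'fig']
print(universe.decode(a & ~b))   # ['apple']
```

The lookup cost is paid once when a set is built or printed; the set operations in between work on whole machine words.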
Some Applications
• Set::IntRange - sets of integers (universe = some interval)
• Math::MatrixBool - useful for graph algorithms (e.g. shortest paths / Kleene's Algorithm)
• Slice (multiple document version generator)
• Parse table generators for compiler-compilers à la "yacc" (calculating first, follow & lookahead character sets)
• Cryptography
• Easy manipulation of data (files), any number of bits at a time
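To hint at why a boolean matrix built from bit vectors suits graph algorithms, here is a plain-Python sketch of reachability (transitive closure) in the style of Warshall's algorithm, the boolean case of Kleene's algorithm. It does not use Math::MatrixBool; the function name `transitive_closure` is hypothetical, and each matrix row is held in a Python integer.

```python
def transitive_closure(rows):
    """Compute reachability for a directed graph.

    rows[i] is a bit array in which bit j is set if there is an edge i -> j.
    In the result, bit j of row i is set if j can be reached from i in one
    or more steps.
    """
    closure = list(rows)
    n = len(closure)
    for k in range(n):
        for i in range(n):
            if (closure[i] >> k) & 1:
                # i reaches k, so i also reaches everything k reaches.
                # One OR over a whole row replaces an inner loop over j.
                closure[i] |= closure[k]
    return closure


# Chain 0 -> 1 -> 2 -> 3
edges = [0b0010, 0b0100, 0b1000, 0b0000]
for row in transitive_closure(edges):
    print(format(row, "04b"))
# 1110
# 1100
# 1000
# 0000
```

The innermost loop of the textbook algorithm collapses into a single bitwise OR of two rows, which is where word-parallel bit vectors pay off.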
Application "Slice"
• See
  • homepage screenshot "Slice.bmp"
  • file "file.in"
  • file "Slice.txt"
  • file "file.html.en.OK"
  • file "file.html.de.OK"
  • URL http://www.engelschall.com/sw/slice/
Application "Date::Calc" v5.0 (coming soon)
• Stores years in bit vectors (one year = one bit vector, one day = one bit)
• Bit is "on" if corresponding day is a holiday
• Performs calculations taking holidays into account
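The "one year = one bit vector, one day = one bit" layout can be sketched in plain Python as below. This is not Date::Calc's code or interface; the function names and the sample holiday list are made up for the illustration, and date ranges are assumed to stay within a single year.

```python
import datetime


def day_index(date):
    """Zero-based day of the year: 1 January is bit 0."""
    return date.timetuple().tm_yday - 1


def year_vector(year, holidays):
    """Build the bit array for one year from (month, day) holiday pairs."""
    bits = 0
    for month, day in holidays:
        bits |= 1 << day_index(datetime.date(year, month, day))
    return bits


def is_holiday(bits, date):
    """Test the single bit that belongs to `date`."""
    return bool((bits >> day_index(date)) & 1)


def holidays_between(bits, first, last):
    """Count holidays from `first` to `last` inclusive (same year assumed)."""
    low, high = day_index(first), day_index(last)
    mask = ((1 << (high - low + 1)) - 1) << low   # ones for the date range
    return bin(bits & mask).count("1")            # count the set bits


# Hypothetical holiday list for the year 2000
year_2000 = year_vector(2000, [(1, 1), (12, 25), (12, 26)])

print(is_holiday(year_2000, datetime.date(2000, 1, 1)))    # True
print(is_holiday(year_2000, datetime.date(2000, 1, 2)))    # False
print(holidays_between(year_2000,
                       datetime.date(2000, 12, 20),
                       datetime.date(2000, 12, 31)))       # 2
```

Counting holidays in a range reduces to masking and bit counting, so a calculation such as "days between two dates, minus holidays" needs no loop over individual days.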
Questions & Answers, Suggestions
• Please feel free to ask!
• Suggestions are welcome.