P-Tree Implementation Anne Denton
So far: Logical Definition • Cf. Dr. Perrizo’s slides • Logical definition • Defines node information • Leaves the representation of structure open • A wide variety of implementations have been tried
Tree Representation Options • Pointers • Tree-walks • Depth-first • Breadth-first • Node addresses (P-trees: qids) • Note: Any one of these tree representations is enough to make the tree lossless!
Issues • Storage requirements • Suitability for distributed processing • (e.g., avoiding pointer swizzling) • Ease of access to particular nodes • Main issue: data structure must optimize anding speed at each node
Main Desired Property • Anding through bit-vector operations • New node information • New structural information • Why? • Parallelism: up to 32 or 64 bits are processed in parallel on a single CPU
QID-based P-Vector representation (Example: P1V)
[ ]      1001
[01]     0010
[10]     1101
[01.00]  1110
[01.11]  0010
[10.10]  1101
• Node information stored as bit-vector • Structural information: • Traditional relation of degree 2 • Address is key
Can We Convert Address to Bit-Vectors?
[ ]      1001           [ ]   0110
[01]     0010           [01]  1001
[10]     1101    <=>    [10]  0010
[01.00]  1110
[01.11]  0010
[10.10]  1101
• We know this: PMV! • Claim: qid is now redundant • Standard conversion to bit-vectors
Does this Define Structure? • Yes! • Concept: • Similar to Depth-First Search • Mixed vector specifies existing children • Slight modification: • Store all children of one node sequentially • Reason: address can be computed through counts on the mixed vector
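The claim that the mixed vectors alone define the structure can be checked on the small example. The sketch below rebuilds the qids from the mixed vectors 0110, 1001, 0010. It assumes the vectors are listed level by level, which for this three-level example gives the same order as the qid listing; the exact storage order of a deeper tree is not spelled out here, so treat the traversal order and the name `rebuild_qids` as assumptions.

```python
from collections import deque


def rebuild_qids(mixed_vectors):
    """Recover qids from mixed vectors given as bit strings in storage order.

    Assumption: level order, with the root first. Nodes that receive no
    mixed vector are treated as lowest-level nodes.
    """
    qids = [()]             # the root has the empty qid
    pending = deque([()])   # nodes still waiting for their mixed vector
    for mixed in mixed_vectors:
        parent = pending.popleft()
        for index, bit in enumerate(mixed):
            if bit == "1":  # a set bit means this child exists as a node
                child = parent + (index,)
                qids.append(child)
                pending.append(child)
    return qids
```

Called with `["0110", "1001", "0010"]`, it returns the six addresses of the example, from `()` for the root to `(2, 2)` for [10.10].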
P-Tree Anding • Start at root • Pursue new (potentially) mixed children • Deriving new mixed (m) and pure1 (u): • u is the AND of all u_i • m is (the AND of all (m_i OR u_i)) AND NOT u • Cannot be done with either u or m alone
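The two rules for one node can be sketched with integer bit operations. This is a minimal illustration, assuming each operand node is given as a pair of integers (u_i, m_i) and a fanout of 4; the function name `and_nodes` and the second operand in the usage note are made up for the example.

```python
def and_nodes(operands, width=4):
    """AND one node across several operand trees.

    operands: list of (u_i, m_i) integer pairs, one per operand tree.
    Returns (u, m): the pure-1 vector and the potentially-mixed vector.
    """
    full = (1 << width) - 1
    u = full
    not_pure0 = full
    for u_i, m_i in operands:
        u &= u_i                  # pure-1 only if pure-1 in every operand
        not_pure0 &= (m_i | u_i)  # survives only if no operand is pure-0
    m = not_pure0 & ~u & full     # potentially mixed: not pure-0, not pure-1
    return u, m
```

With the example root (u = 1001, m = 0110) and a hypothetical second root (u = 1100, m = 0010), the result is u = 1000 and m = 0110: the last child becomes pure-0 because the second operand is pure-0 there, and only the two middle children need to be pursued further.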
Fast Counting using Table Look-up • How many bits are set in 01100110? • Look-up table stores “4” for index 102 • Works for sequences of up to 8 bits
00000000  0
00000001  1
00000010  1
00000011  2
00000100  1
00000101  2
...
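A table of this kind could be built and used as follows. This is a sketch; the names `BIT_COUNT` and `count_bits` are chosen for the example, and the byte-wise loop for longer vectors is an obvious extension rather than something stated on the slide.

```python
# 256-entry table: BIT_COUNT[i] is the number of set bits in the byte i.
BIT_COUNT = [bin(i).count("1") for i in range(256)]


def count_bits(value):
    """Count the set bits of a non-negative integer, one byte per look-up."""
    total = 0
    while value:
        total += BIT_COUNT[value & 0xFF]
        value >>= 8
    return total
```

The entry for 01100110 (index 102) is `BIT_COUNT[102]`, which holds 4.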
Finding the next 1 • Which is the first bit set in 01100110? • Look-up table stores “1” for index 102 • Works for sequences of up to 8 bits
(00000000  8)
00000001  7
00000010  6
00000011  6
00000100  5
00000101  5
...
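The same idea gives the position of the first set bit, counted from the left starting at 0. In this sketch the names `FIRST_ONE` and `next_one` are illustrative, and masking off the leading bits to continue the search from a later position is an assumed extension of the single look-up shown on the slide.

```python
# FIRST_ONE[i] is the position (0 = leftmost) of the first set bit in the
# byte i; the value 8 marks a byte with no set bit at all.
FIRST_ONE = [8 - i.bit_length() for i in range(256)]


def next_one(byte, start=0):
    """Position of the first set bit at or after `start`, or 8 if none."""
    return FIRST_ONE[byte & (0xFF >> start)]
```

`FIRST_ONE[102]` holds 1, and `next_one(0b01100110, 3)` skips the first three positions and finds the next set bit at position 5.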
Finding a child • Assume children are stored in sequence • For mixed vector 01100110, where is the child with index 5 (part of qid)? • Count the set bits in the prefix 01100: two children are stored before it • Storage location calculated with one table look-up
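The address computation can be sketched like this. It assumes an 8-bit mixed vector with child 0 as the leftmost bit and returns the offset of the child within its parent's block of stored children; the name `child_offset` and the error handling are illustrative choices.

```python
# Same bit-count table as used for fast counting.
BIT_COUNT = [bin(i).count("1") for i in range(256)]


def child_offset(mixed, index):
    """Offset of child `index` among the stored children of one node.

    mixed: 8-bit mixed vector as an integer, child 0 is the leftmost bit.
    """
    if not (mixed >> (7 - index)) & 1:
        raise ValueError("child is not mixed, so no node is stored for it")
    prefix = mixed >> (8 - index)  # keep only the bits left of `index`
    return BIT_COUNT[prefix]       # one table look-up
```

For the vector 01100110 and index 5 the prefix is 01100, so the child is found at offset 2, after the children with indices 1 and 2.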
Potential problems • Eliminating large sub-trees is slow • Speeding up “and”: • Introduce additional access structure • Array indices as pointers • Note: • No lowest level due to adjacent storage of children • Reduces storage by about 1/fanout (e.g., 1/16) • Access structure does not need to be stored (P-tree is lossless without it)
Summary • P1V: node values stored as bit-vectors • Now: tree structure stored as bit-vectors as well • Benefits: Several fast bit-vector algorithms can be used • Description of structure: • Modified depth-first tree-walk • Additional access structure makes anding efficient