
P-tree Structure and Algebra: Lossless Representation and Data Mining-ready Structure

This paper explores the P-tree structure, its variations, and algebra in the context of spatial data representation. P-tree technology is patented by NDSU, and the paper discusses its properties and performance. The P-tree is a Peano Count Tree that provides a lossless and compressed representation of spatial data, making it suitable for data mining applications. The paper also covers various spatial data formats and introduces the bSQ format, which facilitates the creation of an efficient P-tree structure.


Presentation Transcript


  1. The P-tree Structure and its Algebra • Qin Ding, Maleq Khan, Amalendu Roy, William Perrizo • Department of Computer Science, North Dakota State University, USA • (P-tree technology is patented by NDSU)

  2. Outline • P-tree Structure and variations • P-tree Algebra • P-tree Properties • P-tree Performance

  3. Introduction to P-tree • Peano Count Tree (P-tree) • Lossless representation of the original spatial data • Data-mining-ready structure

  4. Spatial Data • Remotely sensed imagery data • Satellite images • Aerial photography • Ground data • Yield production • Moisture • Nitrate • Temperature

  5. Remotely Sensed Imagery data TIFF image Yield Map

  6. Spatial Data Formats • Existing formats • BSQ (Band Sequential) • BIL (Band Interleaved by Line) • BIP (Band Interleaved by Pixel) • New format • bSQ (bit Sequential)

  7. Spatial Data Formats (Cont.) • Example: a 2×2 image with two bands • BAND-1: 254 (1111 1110), 127 (0111 1111), 14 (0000 1110), 193 (1100 0001) • BAND-2: 37 (0010 0101), 240 (1111 0000), 200 (1100 1000), 19 (0001 0011) • BSQ format (2 files) • Band 1: 254 127 14 193 • Band 2: 37 240 200 19

  8. Spatial Data Formats (Cont.) • Same example • BIL format (1 file): 254 127 37 240 14 193 200 19

  9. Spatial Data Formats (Cont.) • Same example • BIP format (1 file): 254 37 127 240 14 200 193 19

  10. Spatial Data Formats (Cont.) • Same example • bSQ format (16 files, B11 to B28; file Bij holds bit j of band i; each row below is one pixel)
        B11 B12 B13 B14 B15 B16 B17 B18 B21 B22 B23 B24 B25 B26 B27 B28
         1   1   1   1   1   1   1   0   0   0   1   0   0   1   0   1
         0   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0
         0   0   0   0   1   1   1   0   1   1   0   0   1   0   0   0
         1   1   0   0   0   0   0   1   0   0   0   1   0   0   1   1
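
The three existing layouts differ only in the order in which the same values are written out. The following is a minimal sketch of that ordering for the 2×2, two-band example on these slides; the function names are hypothetical and the "files" are modeled as plain Python lists.

```python
# The 2x2, two-band example from the slides, stored as rows of pixel values.
band1 = [[254, 127], [14, 193]]
band2 = [[37, 240], [200, 19]]
bands = [band1, band2]


def to_bsq(bands):
    """Band Sequential: one flat sequence per band (one file per band)."""
    return [[v for row in band for v in row] for band in bands]


def to_bil(bands):
    """Band Interleaved by Line: line r of every band, then line r + 1."""
    rows = len(bands[0])
    return [v for r in range(rows) for band in bands for v in band[r]]


def to_bip(bands):
    """Band Interleaved by Pixel: every band value of one pixel, then the next pixel."""
    rows, cols = len(bands[0]), len(bands[0][0])
    return [band[r][c] for r in range(rows) for c in range(cols) for band in bands]


print(to_bsq(bands))
print(to_bil(bands))
print(to_bip(bands))
```

The three printed sequences can be compared directly with the BSQ, BIL, and BIP listings on slides 7 to 9.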

  11. bSQ Format • Split each band into eight separate files, one for each bit position. • Reasons for using the bSQ format • Different bits contribute differently to the value. • The bSQ format facilitates the representation of a precision hierarchy (from 1-bit up to 8-bit precision). • The bSQ format facilitates the creation of an efficient data structure, the P-tree.
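
As an illustration of the split into bit files, here is a small sketch that separates one 8-bit band into eight bit planes and rebuilds values from them. The helper names are hypothetical; plane 1 is taken to be the most significant bit, which matches the B11 to B18 columns of the example.

```python
def to_bsq_bits(values, nbits=8):
    """Return nbits bit planes; plane j (1-based, 1 = most significant) holds bit j of every pixel."""
    return [[(v >> (nbits - j)) & 1 for v in values] for j in range(1, nbits + 1)]


def from_bsq_bits(planes):
    """Rebuild pixel values from the given planes, most significant plane first."""
    values = [0] * len(planes[0])
    for plane in planes:
        values = [(v << 1) | b for v, b in zip(values, plane)]
    return values


band1 = [254, 127, 14, 193]           # Band 1 of the slide example
planes = to_bsq_bits(band1)

print(planes[0])                      # B11, the most significant bit of each pixel
print(from_bsq_bits(planes))          # all 8 planes: the original values (lossless)
print(from_bsq_bits(planes[:4]))      # top 4 planes only: a 4-bit precision view
```

Using only the first k planes gives the k-bit level of the precision hierarchy mentioned above.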

  12. Peano Count Tree (P-tree) • A P-tree represents spatial data bit by bit in a recursive quadrant-by-quadrant arrangement. • A P-tree is a lossless representation of the original data. • A P-tree is a compressed structure.

  13. An example of a P-tree • bSQ: 11111111 11111111 11100000 11110010 11111111 11111111 11111111 11111111 • Arranged in 2-D space in raster order:
        1 1 1 1 1 1 0 0
        1 1 1 1 1 0 0 0
        1 1 1 1 1 1 0 0
        1 1 1 1 1 1 1 0
        1 1 1 1 1 1 1 1
        1 1 1 1 1 1 1 1
        1 1 1 1 1 1 1 1
        0 1 1 1 1 1 1 1
      • P-tree: root count 55 • Level 1: 16 8 15 16 • Level 2: 3 0 4 1 (children of 8), 4 4 3 4 (children of 15) • Leaves: 1110 0010 1101 • Terms: Peano or Z-ordering • Pure (Pure-1/Pure-0) quadrant • Root Count • Level • Fan-out • QID (Quadrant ID)
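
The counts in this example can be reproduced with a short recursive sketch. It assumes a square image whose side is a power of two and stores each node as a hypothetical `(count, children)` pair, where pure quadrants have no children; this illustrates the idea and is not the authors' implementation.

```python
# The 8x8 bit matrix from the slide, one string per raster row.
BITS = [[int(ch) for ch in row] for row in (
    "11111100", "11111000", "11111100", "11111110",
    "11111111", "11111111", "11111111", "01111111",
)]


def build_count_tree(bits, r0=0, c0=0, size=None):
    """Return (count, children); children is empty for pure quadrants and single pixels."""
    if size is None:
        size = len(bits)
    count = sum(bits[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size))
    if count == 0 or count == size * size or size == 1:
        return (count, [])
    half = size // 2
    # Peano (Z) order: upper-left, upper-right, lower-left, lower-right.
    offsets = ((0, 0), (0, half), (half, 0), (half, half))
    return (count, [build_count_tree(bits, r0 + dr, c0 + dc, half) for dr, dc in offsets])


tree = build_count_tree(BITS)
print(tree[0])                               # root count
print([child[0] for child in tree[1]])       # level-1 counts
print([child[0] for child in tree[1][1][1]]) # children of the upper-right quadrant
```

Because pure quadrants are not expanded, large uniform regions cost a single node, which is where the compression comes from.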

  14. An example of a P-tree • Same 8×8 bit matrix as on slide 13, with row 111 and column 001 marked • Quadrants are numbered 0 1 2 3 in Peano order • Root count 55 • Level 1: 16 8 15 16 • Level 2: 3 0 4 1 (children of 8), 4 4 3 4 (children of 15) • Leaves: 1110 0010 1101 • QID example: 2.2.3 is pixel (7, 1); in binary, 10.10.11 is (111, 001) • Terms: Peano or Z-ordering • Pure (Pure-1/Pure-0) quadrant • Root Count • Level • Fan-out • QID (Quadrant ID)
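
The QID example suggests that each quadrant digit contributes one bit to the row and one bit to the column. A minimal sketch of that mapping follows; the function names are hypothetical, and the digit encoding (high bit for the row, low bit for the column) is inferred from the 2.2.3 and 10.10.11 example.

```python
def qid_to_pixel(qid):
    """Convert a dotted quadrant ID such as '2.2.3' to (row, col)."""
    row = col = 0
    for digit in (int(d) for d in qid.split(".")):
        row = (row << 1) | (digit >> 1)   # high bit of the digit selects upper/lower half
        col = (col << 1) | (digit & 1)    # low bit of the digit selects left/right half
    return row, col


def pixel_to_qid(row, col, levels):
    """Convert (row, col) to a dotted quadrant ID with the given number of levels."""
    digits = []
    for shift in range(levels - 1, -1, -1):
        digit = (((row >> shift) & 1) << 1) | ((col >> shift) & 1)
        digits.append(str(digit))
    return ".".join(digits)


print(qid_to_pixel("2.2.3"))
print(pixel_to_qid(7, 1, levels=3))
```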

  15. P-tree variation – PM-tree • Peano Mask tree (PM-tree) uses masks instead of counts. • 1 denotes pure-1, 0 denotes pure-0, and m denotes mixed. • It provides an efficient way of ANDing. • PM-tree of the example image (same 8×8 bit matrix as on slide 13): root m • Level 1: 1 m m 1 • Level 2: m 0 1 m (children of the first m), 1 1 m 1 (children of the second m) • Leaves: 1110 0010 1101
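
A PM-tree for the example image can be sketched in the same recursive way as the count tree. Here a node is represented, purely for illustration, as `1` (pure-1), `0` (pure-0), or a list of four children (mixed, shown as `m`).

```python
# The 8x8 bit matrix from the slides, one string per raster row.
BITS = [[int(ch) for ch in row] for row in (
    "11111100", "11111000", "11111100", "11111110",
    "11111111", "11111111", "11111111", "01111111",
)]


def build_pm_tree(bits, r0=0, c0=0, size=None):
    """Return 1 (pure-1), 0 (pure-0), or a list of four children for a mixed quadrant."""
    if size is None:
        size = len(bits)
    total = sum(bits[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size))
    if total == size * size:
        return 1
    if total == 0:
        return 0
    half = size // 2
    offsets = ((0, 0), (0, half), (half, 0), (half, half))  # Peano (Z) order
    return [build_pm_tree(bits, r0 + dr, c0 + dc, half) for dr, dc in offsets]


def label(node):
    """Mask symbol of a node as drawn on the slide."""
    return "m" if isinstance(node, list) else str(node)


pm_tree = build_pm_tree(BITS)
print([label(child) for child in pm_tree])      # level 1
print([label(child) for child in pm_tree[1]])   # children of the first mixed quadrant
print(pm_tree[1][0])                            # a 2x2 leaf quadrant
```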

  16. P-tree variations – P1-tree and P0-tree • In a P1-tree, 1 indicates a pure-1 quadrant and 0 indicates any other quadrant. • In a P0-tree, 1 indicates a pure-0 quadrant and 0 indicates any other quadrant. • Figure: the P1-tree and the P0-tree of the example image, drawn side by side.

  17. P-tree Algebra • And • Or • Complement • Other (XOR, etc.) • Complement example: 64 - 55 = 9 • P-tree: root count 55 • Level 1: 16 8 15 16 • Level 2: 3 0 4 1 (children of 8), 4 4 3 4 (children of 15) • Leaves: 1110 0010 1101 • Complement: root count 9 • Level 1: 0 8 1 0 • Level 2: 1 4 0 3 (children of 8), 0 0 1 0 (children of 1) • Leaves: 0001 1101 0010
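
The complement example follows from replacing every count by the quadrant area minus that count. The sketch below hard-codes the count tree of the slide as hypothetical `(count, children)` pairs and assumes an 8×8 image, so each level halves the quadrant side.

```python
def pure(count):
    """A pure quadrant: a count with no children."""
    return (count, [])


def leaf(bits):
    """A mixed 2x2 quadrant given as a 4-character bit string in Peano order."""
    return (bits.count("1"), [pure(int(b)) for b in bits])


# Count tree of the slide example (8x8 image, root count 55).
PTREE = (55, [
    pure(16),
    (8, [leaf("1110"), pure(0), pure(4), leaf("0010")]),
    (15, [pure(4), pure(4), leaf("1101"), pure(4)]),
    pure(16),
])


def complement(node, size):
    """Complement a count tree whose root covers a size x size quadrant."""
    count, children = node
    return (size * size - count, [complement(child, size // 2) for child in children])


comp = complement(PTREE, 8)
print(comp[0])                               # 64 - 55
print([child[0] for child in comp[1]])       # level-1 counts of the complement
```

The tree shape is unchanged; only the counts (and the leaf bits) flip.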

  18. P-tree Algebra (Cont.) • P-tree-1: root m • Level 1: 1 m m 1 • Level 2: m 0 1 m (children of the first m), 1 1 m 1 (children of the second m) • Leaves: 1110 0010 1101 • P-tree-2: root m • Level 1: 1 0 m 0 • Level 2: 1 1 1 m (children of m) • Leaf: 0100 • AND-Result: root m • Level 1: 1 0 m 0 • Level 2: 1 1 m m (children of m) • Leaves: 1101 0100 • OR-Result: root m • Level 1: 1 m 1 1 • Level 2: m 0 1 m (children of m) • Leaves: 1110 0010
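
The OR-Result can be checked with a small recursive sketch. The slides give explicit rules only for ANDing, so the OR rules used here (1 absorbs, 0 is neutral, and four pure-1 children collapse to 1) are an assumption, taken as the dual of the AND rules. Nodes are `1`, `0`, or a list of four children for a mixed quadrant.

```python
# The two operand trees of the slide (1 = pure-1, 0 = pure-0, list = mixed).
TREE1 = [1, [[1, 1, 1, 0], 0, 1, [0, 0, 1, 0]], [1, 1, [1, 1, 0, 1], 1], 1]
TREE2 = [1, 0, [1, 1, 1, [0, 1, 0, 0]], 0]


def pm_or(a, b):
    """OR two PM-tree nodes that cover the same quadrant."""
    if a == 1 or b == 1:
        return 1              # a pure-1 operand makes the whole quadrant pure-1
    if a == 0:
        return b              # a pure-0 operand leaves the other unchanged
    if b == 0:
        return a
    children = [pm_or(x, y) for x, y in zip(a, b)]
    # Collapse to pure-1 when every sub-quadrant became pure-1.
    return 1 if all(child == 1 for child in children) else children


print(pm_or(TREE1, TREE2))
```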

  19. P-tree ANDing Operation
        Operand 1 (quadrant) | Operand 2 (quadrant) | Result (quadrant)
        1                    | X2                   | X2
        0                    | X2                   | 0
        X1                   | 1                    | X1
        X1                   | 0                    | 0
        m                    | m                    | 0 if all four sub-quadrants result in 0; otherwise m
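
The rules in this table translate almost line by line into a recursive function. The sketch below uses a hypothetical representation (`1`, `0`, or a list of four children for `m`) and the two operand trees from the P-tree Algebra example on slide 18.

```python
# The two operand trees from slide 18 (1 = pure-1, 0 = pure-0, list = mixed).
TREE1 = [1, [[1, 1, 1, 0], 0, 1, [0, 0, 1, 0]], [1, 1, [1, 1, 0, 1], 1], 1]
TREE2 = [1, 0, [1, 1, 1, [0, 1, 0, 0]], 0]


def pm_and(a, b):
    """AND two PM-tree nodes that cover the same quadrant."""
    if a == 0 or b == 0:
        return 0              # rows "0, X2 -> 0" and "X1, 0 -> 0"
    if a == 1:
        return b              # row "1, X2 -> X2"
    if b == 1:
        return a              # row "X1, 1 -> X1"
    children = [pm_and(x, y) for x, y in zip(a, b)]
    # Row "m, m": pure-0 if all four sub-quadrants are 0, otherwise mixed.
    return 0 if all(child == 0 for child in children) else children


print(pm_and(TREE1, TREE2))
```

Whole subtrees are copied or discarded as soon as one operand is pure, so the recursion only descends where both operands are mixed.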

  20. P-tree ANDing Operation (Cont.) • PM-tree1: root m • Level 1: 1 m m 1 • Level 2: m 0 1 m (children of the first m), 1 1 m 1 (children of the second m) • Leaves: 1110 0010 1101 • PM-tree2: root m • Level 1: 1 0 m 0 • Level 2: 1 1 1 m (children of m) • Leaf: 0100 • Result: root m • Level 1: 1 0 m 0 • Level 2: 1 1 m m (children of m) • Leaves: 1101 0100 • Depth-first pure-1 path code • PM-tree1: 0 100 101 102 12 132 20 21 220 221 223 23 3 • PM-tree2: 0 20 21 22 231 • Matching: 0 & 0 → 0 • 20 & 20 → 20 • 21 & 21 → 21 • 220 221 223 & 22 → 220 221 223 • 23 & 231 → 231 • RESULT: 0 20 21 220 221 223 231
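
The path-code form of ANDing can be sketched as follows: list the path to every pure-1 quadrant, then keep the longer path whenever a path of one tree is a prefix of a path of the other. The pairwise comparison below is a simple quadratic illustration with hypothetical function names, not necessarily how the authors implement it; it relies on each quadrant digit being a single character.

```python
# The two PM-trees of the slide (1 = pure-1, 0 = pure-0, list = mixed).
TREE1 = [1, [[1, 1, 1, 0], 0, 1, [0, 0, 1, 0]], [1, 1, [1, 1, 0, 1], 1], 1]
TREE2 = [1, 0, [1, 1, 1, [0, 1, 0, 0]], 0]


def pure1_paths(node, prefix=""):
    """Depth-first list of the quadrant paths that end in a pure-1 node."""
    if node == 1:
        return [prefix]
    if node == 0:
        return []
    paths = []
    for index, child in enumerate(node):
        paths += pure1_paths(child, prefix + str(index))
    return paths


def and_paths(paths_a, paths_b):
    """Intersect two pure-1 path lists: the more specific path of each nested pair survives."""
    result = []
    for a in paths_a:
        for b in paths_b:
            if a.startswith(b):       # a lies inside (or equals) the pure-1 quadrant b
                result.append(a)
            elif b.startswith(a):     # b lies inside the pure-1 quadrant a
                result.append(b)
    return sorted(result)


codes1 = pure1_paths(TREE1)
codes2 = pure1_paths(TREE2)
print(codes1)
print(codes2)
print(and_paths(codes1, codes2))
```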

  21. Value P-tree • A value P-tree, Pi(v), is the P-tree of value v in band i. Value v can be expressed in 1-bit to 8-bit precision. • Value P-trees can be constructed by ANDing basic P-trees or their complements, e.g., Pi(110) = Pi,1 AND Pi,2 AND Pi,3’
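
To make the construction concrete, the sketch below computes the pixel mask behind a value P-tree from bit planes. It works on flat bit lists rather than compressed trees, which is an assumption made to keep the example short; ANDing the corresponding P-trees would select the same pixels. The helper names are hypothetical.

```python
def bit_plane(values, j, nbits=8):
    """Bit j (1 = most significant) of every pixel, i.e. the content of basic P-tree Pi,j."""
    return [(v >> (nbits - j)) & 1 for v in values]


def value_mask(values, pattern, nbits=8):
    """Pixels whose leading bits equal `pattern`, e.g. '110' for 3-bit precision."""
    mask = [1] * len(values)
    for j, ch in enumerate(pattern, start=1):
        plane = bit_plane(values, j, nbits)
        if ch == "0":
            plane = [1 - b for b in plane]          # use the complement for a 0 bit
        mask = [m & b for m, b in zip(mask, plane)]  # AND with the running result
    return mask


band1 = [254, 127, 14, 193]        # Band 1 of the earlier example
print(value_mask(band1, "110"))    # P1(110): only 193 = 1100 0001 starts with 110
```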

  22. Tuple P-tree • A tuple P-tree, P(v1, v2, …, vn), is the P-tree of the pixels that have value vi in band i, for all i from 1 to n. • P(v1, v2, …, vn) = P1(v1) AND P2(v2) AND … AND Pn(vn) • If value vj is not given, it can be any value in band j; the tuple is written P(v1, v2, …, vj-1, , vj+1, …, vn), and the jth AND operand is simply omitted.
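
A small sketch of the tuple construction, again on flat pixel masks rather than compressed trees (an assumption for brevity). An omitted value is passed as `None`, and its AND operand is skipped; the function name is hypothetical.

```python
def tuple_mask(bands, wanted):
    """AND of the per-band value masks; a value of None means 'any value' in that band."""
    mask = [1] * len(bands[0])
    for band, value in zip(bands, wanted):
        if value is None:
            continue                                  # omitted operand
        mask = [m & int(x == value) for m, x in zip(mask, band)]
    return mask


band1 = [254, 127, 14, 193]
band2 = [37, 240, 200, 19]

print(tuple_mask([band1, band2], (14, 200)))   # both values given
print(tuple_mask([band1, band2], (None, 19)))  # band 1 left open
print(tuple_mask([band1, band2], (14, 19)))    # no pixel has this combination
```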

  23. Interval P-tree • An interval P-tree, Pi(v1, v2), is the P-tree for values in the interval [v1, v2] of band i. We have Pi(v1, v2) = OR Pi(v), for all v in [v1, v2].
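
The definition can be followed literally by ORing one value mask per value in the interval. The sketch below does this on flat pixel masks (an assumption; the slides operate on P-trees) and uses a hypothetical function name.

```python
def interval_mask(values, low, high):
    """OR of the value masks Pi(v) for every v in [low, high]."""
    mask = [0] * len(values)
    for v in range(low, high + 1):
        mask = [m | int(x == v) for m, x in zip(mask, values)]
    return mask


band1 = [254, 127, 14, 193]
print(interval_mask(band1, 100, 200))
```

The result is the same mask a direct range test `low <= x <= high` would give, which is a useful sanity check on the OR formulation.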

  24. Peano Cube (P-cube) • The (v1,v2,v3)th cell of the P-cube contains P(v1,v2,v3) = P1,v1 AND P2,v2 AND P3,v3, where, e.g., Pi,vi = Pi,110 = Pi,1 AND Pi,2 AND Pi,3’ • (The P-cube figure shows just the root counts of the P-trees.) • The P-cube can be rolled up (shown on the left of the figure), sliced, diced… • The characteristic function applied to the NPZ truth tree is a bit-map index for each attribute.
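
Since the figure shows only root counts, a P-cube at that level of detail is a count per value combination. The sketch below illustrates this and a roll-up along one band; the band values are made up for the illustration, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical low-precision values of three bands for five pixels.
bands = [
    [1, 1, 2, 2, 1],   # band 1
    [0, 0, 3, 3, 0],   # band 2
    [5, 6, 5, 5, 5],   # band 3
]


def p_cube_counts(bands):
    """Root count of each tuple P-tree: how many pixels carry each (v1, v2, v3)."""
    return Counter(zip(*bands))


def roll_up(cube, axis):
    """Sum the cube along one band, dropping that band from the cell key."""
    rolled = Counter()
    for key, count in cube.items():
        rolled[key[:axis] + key[axis + 1:]] += count
    return rolled


cube = p_cube_counts(bands)
print(dict(cube))
print(dict(roll_up(cube, axis=2)))
```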

  25. P-tree Properties • Lemma 1: For any two P-trees P1 and P2, rc(P1 | P2) = 0 ⇒ rc(P1) = 0 and rc(P2) = 0. More strictly, rc(P1 | P2) = 0 if and only if rc(P1) = 0 and rc(P2) = 0.

  26. P-tree Properties (Cont.) • Lemma 2: a) rc(P1) = 0 or rc(P2) = 0 ⇒ rc(P1 & P2) = 0 b) rc(P1) = 0 and rc(P2) = 0 ⇒ rc(P1 & P2) = 0 c) rc( ) = 0 d) rc( ) = N e) f) g) h) i) j)

  27. P-tree Properties (Cont.) • Lemma 3: v1 ≠ v2 ⇒ rc{Pi(v1) & Pi(v2)} = 0, for any band i. • Lemma 4: rc(P1 | P2) = rc(P1) + rc(P2) - rc(P1 & P2). • Theorem: rc{Pi(v1) | Pi(v2)} = rc{Pi(v1)} + rc{Pi(v2)}, where v1 ≠ v2.
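
These counting properties can be checked numerically. The sketch below treats a P-tree as its underlying pixel mask and the root count `rc` as the number of 1 bits; the masks and band values are made up for the check. The theorem then follows from Lemma 4 because Lemma 3 makes the intersection term zero.

```python
def rc(mask):
    """Root count: the number of 1 bits the P-tree represents."""
    return sum(mask)


def mask_or(p, q):
    return [a | b for a, b in zip(p, q)]


def mask_and(p, q):
    return [a & b for a, b in zip(p, q)]


# Lemma 4 on two arbitrary (hypothetical) masks.
p1 = [1, 1, 0, 0, 1, 0, 1, 0]
p2 = [0, 1, 1, 0, 1, 0, 0, 0]
print(rc(mask_or(p1, p2)), rc(p1) + rc(p2) - rc(mask_and(p1, p2)))

# Lemma 3 and the theorem on value masks of one band with distinct values.
band = [3, 1, 3, 2, 1, 3]
pv1 = [int(x == 3) for x in band]
pv2 = [int(x == 1) for x in band]
print(rc(mask_and(pv1, pv2)))                   # distinct values never share a pixel
print(rc(mask_or(pv1, pv2)), rc(pv1) + rc(pv2))
```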

  28. P-tree Performance Comparison of file size for different bits of Band 1 & 2 of a TIFF image

  29. P-tree Performance (Cont.) Comparison of file size for different bits of Band 3 & 4 of a SPOT image

  30. P-tree Performance (Cont.) Comparison of file size for different bits of Band 5 & 6 of a TM image

  31. P-tree Performance (Cont.) • Chart: Time vs. Lowest Bit Number (series: PC-Tree, PMT, Peano Sequence; x-axis: Lowest Bit Number, 0 to 8) • Times required to perform ANDing operation on a TM file (40 million pixels)

  32. P-tree Performance (Cont.) Average time required to perform ANDing operation on a TM file (40 million pixels)

  33. Related Work • Related structures • Quadtrees and their variants (point quadtrees, region quadtrees) • HH-codes • Similarities • Quadrant based • Differences • P-trees focus on the count. • P-trees aren’t indexes; rather, they are representations of the datasets themselves. • P-trees are particularly useful for data mining because they contain the aggregate information needed for data mining.

  34. Conclusion • P-tree algebra and properties • P-tree for efficient data mining • Association rule mining • Classification • Clustering • P-tree application from spatial data to non-spatial • Precision agriculture • DNA Microarray data • VLSI test data analysis • Stock market data • imagery
