A*-tree: A Structure for Storage and Modeling of Uncertain Multidimensional Arrays

Presented by: ZHANG Xiaofei, March 2, 2011

Outline • Motivation • Modeling correlated uncertainty • Construction of A*-tree • Analysis of A*-tree • Query processing • Experiments

Motivation • Multidimensional arrays • Well suited to scientific and engineering applications • Logically equivalent to relational tables <A1, A2, …, An> • A cell of a multidimensional array: (A1, A2, …, Ak, D1, D2, …, Dd)

Motivation (Cont’d) • Uncertain data • Inevitable • Two categories

Motivation (Cont’d) • Correlated uncertain data • Example: geographically distributed sensors • Further examples arise in router network traffic analysis, quantization of images or sound, etc.

Modeling Correlated Uncertainty • PGM: Probabilistic Graphical Model • Bayesian network • Limitations: requires prior knowledge and initial probabilities; significant computational cost (NP-hard)

Modeling Correlated Uncertainty (Cont’d) • PGM: Probabilistic Graphical Model • Markov Random Fields: a graphical model in which a set of random variables has a Markov property described by an undirected graph • Pros: can represent cyclic dependencies • Cons: cannot represent induced dependencies; NP-hard to compute

Modeling Correlated Uncertainty (Cont’d) • Considering the locality of correlation • E.g., a 2-dimensional array

Construction of A*-tree • Basic A*-structure • k-ary tree with k = 2^d, where d is the number of correlated dimensions • Each leaf contains the joint distribution of the four neighboring cells it maps to (the case d = 2) • The joint distribution at each internal node is defined recursively
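The fan-out rule k = 2^d can be checked with a few lines of Python. This is a minimal sketch: the helper names `fan_out` and `tree_height` are hypothetical, and the height calculation assumes each level halves every correlated dimension, which the slide does not state explicitly.

```python
def fan_out(d: int) -> int:
    """Children per internal node: k = 2^d for d correlated dimensions."""
    return 2 ** d


def tree_height(side: int) -> int:
    """Number of halvings needed to reduce a side of `side` cells to one cell.

    Assumption: every level halves each dimension (rounding up for odd sizes).
    """
    return (side - 1).bit_length()


print(fan_out(2))         # 4: a quadtree for two correlated dimensions
print(tree_height(1024))  # 10
```

Under this assumption a 1024*1024 array gives 10 levels of halving.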

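Construction of A*-tree (Cont’d) • Joint distribution at a node • The four children X1, X2, X3, X4 are summarized by their mean: Y = (X1 + X2 + X3 + X4)/4 • Each child is recovered as Xi = Y(1 + Fi) • Fi has range k, the distribution table has r entries, and l bits are used to represent each probability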
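The two relations on this slide, Y = (X1 + X2 + X3 + X4)/4 and Xi = Y(1 + Fi), can be illustrated for a single realization of four cell values. This is only a sketch: the function name `summarize_block` is hypothetical, and a real node would store a distribution over the factors Fi rather than one set of numbers.

```python
def summarize_block(cells):
    """Return (Y, [F_1, ..., F_n]) for neighboring cell values.

    Y is the mean of the cells and each F_i satisfies X_i = Y * (1 + F_i).
    """
    y = sum(cells) / len(cells)
    if y == 0:
        raise ValueError("factors are undefined when the mean is zero")
    factors = [x / y - 1 for x in cells]
    return y, factors


y, factors = summarize_block([10.0, 12.0, 8.0, 10.0])
# Reconstruct the cells from the parent value and the factors.
rebuilt = [y * (1 + f) for f in factors]
print(y, rebuilt)
```

Because Y is the mean of the Xi, the factors always sum to zero (up to floating-point rounding), so a node only needs to describe how its children deviate from the parent value.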

Construction of A*-tree (Cont’d) • Extension of A*-tree • Uneven dimension sizes • A dimension of length 2k+1 is partitioned into k and k+1 • The shorter dimension stops partitioning first, while partitioning of the longer dimension continues
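The partitioning rule for uneven dimensions can be sketched as follows. The block representation (half-open index ranges) and the names `split` and `partition` are assumptions made for illustration, not part of the original structure.

```python
def split(lo, hi):
    """Split the half-open range [lo, hi) in two; length 2k+1 gives k and k+1."""
    mid = lo + (hi - lo) // 2
    return [(lo, mid), (mid, hi)]


def partition(rows, cols):
    """Children of a 2-D block given as two half-open ranges.

    A dimension that is already one cell long is not split again, so the
    shorter dimension stops first while the longer one keeps partitioning.
    """
    row_parts = split(*rows) if rows[1] - rows[0] > 1 else [rows]
    col_parts = split(*cols) if cols[1] - cols[0] > 1 else [cols]
    return [(r, c) for r in row_parts for c in col_parts]


print(partition((0, 2), (0, 2)))  # four children
print(partition((0, 1), (0, 5)))  # two children: column lengths 2 and 3
```

In the second call the row dimension has stopped partitioning, so the node has two children instead of four.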

Construction of A*-tree (Cont’d) • Extension of A*-tree • Basic uncertainty blocks of arbitrary shape • Intuitively, each cell is the basic uncertainty block, but this granularity may be too fine • The initial identification of uncertainty blocks is user- and application-specified

Analysis of A*-tree • Natural mapping from A*-tree to Bayesian Network

Analysis of A*-tree (Cont’d) • How the A*-tree model expresses neighboring correlation • For a random query, the average level at which cell correlation is encoded is low (efficient inference and accurate modeling)

Analysis of A*-tree (Cont’d) • Neighboring cells and clustering distance • Definition

Analysis of A*-tree (Cont’d) • Neighboring cells and clustering distance

Analysis of A*-tree (Cont’d) • CD (Clustering Distance) • For any query that may return q pairs of neighboring cells: expected average CD • E.g., for a 1024*1024 array, h = 10 and E(avgCD) ≈ 1.01

Analysis of A*-tree (Cont’d) • Accuracy vs. Efficiency • Double “flip” • Polynomial time scan O(d*n) • Consider basic uncertainty block

Query Processing • Monte Carlo based query processing • Sampling • Q: SELECT AVG(brightness) FROM space_image WHERE Dis(x, y, z, 322, 108, 251) < 50
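A Monte Carlo evaluation of an AVG query like Q might look like the sketch below. Everything about the data layout is an assumption for illustration: a node is a dict with a hypothetical `table` of (probability, factors) rows and a list of `children`, a single cell is represented by `None`, and `selected` stands in for the cells that satisfy the WHERE predicate. Each sample draws a root value, then draws factors top-down using Xi = Y(1 + Fi).

```python
import random


def sample_cells(node, y, rng):
    """Draw one realization of all cells under `node`, whose own value is y."""
    if node is None:  # a single cell: its value is y
        return [y]
    weights = [p for p, _ in node["table"]]
    rows = [factors for _, factors in node["table"]]
    factors = rng.choices(rows, weights=weights, k=1)[0]
    cells = []
    for child, f in zip(node["children"], factors):
        cells.extend(sample_cells(child, y * (1 + f), rng))
    return cells


def mc_average(root, root_dist, selected, n_samples, seed=0):
    """Estimate AVG over the selected cells (indices in tree order).

    root_dist is a list of (probability, value) rows for the root value.
    """
    rng = random.Random(seed)
    weights, values = zip(*root_dist)
    total = 0.0
    for _ in range(n_samples):
        y = rng.choices(values, weights=weights, k=1)[0]
        cells = sample_cells(root, y, rng)
        picked = [cells[i] for i in selected]
        total += sum(picked) / len(picked)
    return total / n_samples


# Toy tree: one node over four cells with two equally likely factor rows.
leaf = {
    "table": [(0.5, (0.1, -0.1, 0.0, 0.0)), (0.5, (-0.1, 0.1, 0.0, 0.0))],
    "children": [None, None, None, None],
}
print(mc_average(leaf, [(1.0, 100.0)], selected=[0], n_samples=1000))
```

Each sample needs only one top-down pass over the nodes that cover the selected cells, which is the property the comparison with MRF sampling relies on.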

Query Processing (Cont’d) • Comparison with MRF • MRF requires sequential rounds of sampling • Each sampled node is computed from all the nodes

Query Processing (Cont’d) • Other queries • COUNT, AVG and SUM • Minimum Set Cover • Built-in cell-count function • Effective query answering
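One reading of the "minimum set cover" and "cell-count" bullets is that a range aggregate is answered from the fewest tree nodes that exactly tile the query region, using each node's mean and its number of cells. The sketch below follows that reading for a square array whose side is a power of two. The names `cover`, `count`, `range_sum` and the `block_mean` callback are hypothetical, and blocks are half-open tuples (r_lo, r_hi, c_lo, c_hi).

```python
def cover(block, query):
    """Yield the largest tree blocks that lie entirely inside `query`."""
    r0, r1, c0, c1 = block
    q_r0, q_r1, q_c0, q_c1 = query
    if r1 <= q_r0 or q_r1 <= r0 or c1 <= q_c0 or q_c1 <= c0:
        return  # block and query are disjoint
    if q_r0 <= r0 and r1 <= q_r1 and q_c0 <= c0 and c1 <= q_c1:
        yield block  # fully contained: use this node, do not descend
        return
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    for child in ((r0, rm, c0, cm), (r0, rm, cm, c1),
                  (rm, r1, c0, cm), (rm, r1, cm, c1)):
        yield from cover(child, query)


def count(block):
    """Cell count of a block (assumed to be stored with each node)."""
    r0, r1, c0, c1 = block
    return (r1 - r0) * (c1 - c0)


def range_sum(block_mean, root, query):
    """SUM over `query` as the sum of mean * cell count over covering nodes."""
    return sum(block_mean(b) * count(b) for b in cover(root, query))


root = (0, 4, 0, 4)
left_half = (0, 4, 0, 2)
print(list(cover(root, left_half)))              # two 2x2 blocks
print(range_sum(lambda b: 1.0, root, left_half)) # 8.0 when every mean is 1.0
```

COUNT is the sum of the cell counts over the same cover, and AVG follows as SUM divided by COUNT.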

Experiments • Data set description • Evaluations • Accuracy of modeling the underlying joint distribution • Execution time • Aggregate query • Space cost

Experiments (Cont’d) • Accuracy

Experiments (Cont’d) • Execution time

Experiments (Cont’d) • Aggregate query and space cost

Thank you! Q&A
