A*-tree: A Structure for Storage and Modeling of Uncertain Multidimensional Arrays Presented by: ZHANG Xiaofei March 2, 2011
Outline • Motivation • Modeling correlated uncertainty • Construction of A*-tree • Analysis of A*-tree • Query processing • Experiments
Motivation • Multidimensional arrays • Well suited to scientific and engineering applications • Logically equivalent to relational tables <A1, A2, …, An> • A cell of a multidimensional array: (A1, A2, …, Ak, D1, D2, …, Dd) • [Figure: array with dimension axes D1 and D2]
Motivation (Cont’d) • Uncertain data • Inevitable • Two categories
Motivation (Cont’d) • Correlated uncertain data • Example: geographically distributed sensors • Further application examples include router network traffic analysis, quantization of image or sound, etc.
Modeling Correlated Uncertainty • PGM: Probabilistic Graphical Model • Bayesian network • Limitations: requires prior knowledge and initial probabilities; significant computational cost (NP-hard)
Modeling Correlated Uncertainty (Cont’d) • PGM: Probabilistic Graphical Model • Markov Random Fields • A graphical model in which a set of random variables has a Markov property described by an undirected graph • Pros: can represent cyclic dependencies • Cons: cannot represent induced dependencies; NP-hard to compute
Modeling Correlated Uncertainty (Cont’d) • Considering the locality of correlation • E.g., a 2-dimensional array
Construction of A*-tree • Basic A*-tree structure • k-ary tree: k = 2^d, where d is the number of correlated dimensions • Each leaf contains the joint distribution of the four neighboring cells it maps to • The joint distribution at each internal node is defined recursively
Construction of A*-tree (Cont’d) • Joint distribution at a node • Child values X1, X2, X3, X4 • Y = (X1 + X2 + X3 + X4)/4 • Xi = Y(1 + Fi) • Fi has range k; the distribution table has r entries; l bits are used to represent each probability
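The relation between a parent value Y and its four children can be illustrated with a small sketch. It assumes plain numeric cell values (one sample rather than a full distribution), and the function names `to_parent` and `to_children` are hypothetical, chosen only for this illustration. It shows that Y together with the factors Fi recovers the Xi, and that the Fi of one node always sum to zero.

```python
def to_parent(xs):
    """Return (Y, [F1, F2, F3, F4]) for four child values, where Xi = Y * (1 + Fi)."""
    if len(xs) != 4:
        raise ValueError("expected four child values")
    y = sum(xs) / 4.0  # Y = (X1 + X2 + X3 + X4) / 4
    if y == 0:
        raise ValueError("Y is zero, so the factors Fi are undefined")
    fs = [x / y - 1.0 for x in xs]  # solve Xi = Y * (1 + Fi) for Fi
    return y, fs


def to_children(y, fs):
    """Invert to_parent: rebuild the four child values from Y and the factors."""
    return [y * (1.0 + f) for f in fs]


y, fs = to_parent([8.0, 12.0, 10.0, 10.0])
rebuilt = to_children(y, fs)
```

Because the Fi sum to zero, a node only needs to describe how its children deviate from their common average, which is what lets the joint distribution be defined level by level.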
Construction of A*-tree (Cont’d) • Extension of A*-tree • Uneven dimension sizes • A dimension of length 2k+1 is partitioned into k and k+1 • The shorter dimension stops partitioning first, while partitioning of the longer dimension continues
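The partition rule for uneven dimensions can be sketched as follows. This is a minimal illustration, not the authors' code: the names `split` and `partition` are hypothetical, ranges are half-open, and the recursion is assumed to stop at single cells (the slides do not state the exact stopping size).

```python
def split(lo, hi):
    """Split the half-open range [lo, hi) in two; a length of 2k+1 yields k and k+1."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid, hi)


def partition(rows, cols):
    """Recursively partition a block given as (row_range, col_range); return leaf blocks."""
    (r0, r1), (c0, c1) = rows, cols
    can_split_rows = (r1 - r0) > 1
    can_split_cols = (c1 - c0) > 1
    if not can_split_rows and not can_split_cols:
        return [(rows, cols)]
    # A dimension that is already down to one cell stops splitting;
    # the other dimension keeps being halved.
    row_parts = split(r0, r1) if can_split_rows else [rows]
    col_parts = split(c0, c1) if can_split_cols else [cols]
    leaves = []
    for rp in row_parts:
        for cp in col_parts:
            leaves.extend(partition(rp, cp))
    return leaves


leaves = partition((0, 3), (0, 5))  # a 3 x 5 array
```

For a 2 x 8 array, for instance, the row dimension is exhausted after one split while the column dimension is split twice more.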
Construction of A*-tree (Cont’d) • Extension of A*-tree • Basic uncertainty blocks of arbitrary shapes • Intuitively, each cell is a basic uncertainty block; however, this granularity may be too fine • The initial identification of uncertainty blocks is specified by the user and the application
Analysis of A*-tree • Natural mapping from A*-tree to Bayesian Network
Analysis of A*-tree (Cont’d) • How the A*-tree model expresses neighboring correlation • From the perspective of any random query, the average level at which cell correlation is encoded is low (efficient inference and accurate modeling)
Analysis of A*-tree (Cont’d) • Neighboring cells and clustering distance • Definition
Analysis of A*-tree (Cont’d) • Neighboring cells and clustering distance
Analysis of A*-tree (Cont’d) • CD (Clustering Distance) • For any query that may return q pairs of neighboring cells, consider the expected average CD • E.g., for a 1024 × 1024 array, h = 10 and E(avgCD) ≈ 1.01
Analysis of A*-tree (Cont’d) • Accuracy vs. Efficiency • Double “flip” • Polynomial time scan O(d*n) • Consider basic uncertainty block
Query Processing • Monte Carlo based query processing • Sampling • Q: SELECT avg(brightness) FROM space_image WHERE Dis(x, y, z, 322, 108, 251) < 50
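One plausible reading of sampling-based query processing is a top-down pass: draw a value at the root, then at every node draw a factor tuple (F1..F4) and derive the child values via Xi = Y(1 + Fi), down to the cells. The sketch below makes several assumptions that are not in the slides: a 2-dimensional array, distribution tables stored as lists of (probability, value) pairs, and the hypothetical names `Node`, `draw`, `sample_cells`, and `mc_average`. The query predicate stands in for the distance condition in Q.

```python
import random


class Node:
    def __init__(self, f_table, children=None, cells=None):
        self.f_table = f_table    # list of (probability, (F1, F2, F3, F4))
        self.children = children  # four child Nodes, or None at a leaf
        self.cells = cells        # four cell coordinates, only at a leaf


def draw(table, rng):
    """Draw one value from a list of (probability, value) pairs."""
    values = [v for _, v in table]
    weights = [p for p, _ in table]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_cells(node, y, rng, out):
    """Write one joint sample of all cells below `node` into `out`, given its value y."""
    fs = draw(node.f_table, rng)
    xs = [y * (1.0 + f) for f in fs]  # Xi = Y * (1 + Fi)
    if node.children is None:
        for coord, x in zip(node.cells, xs):
            out[coord] = x
    else:
        for child, x in zip(node.children, xs):
            sample_cells(child, x, rng, out)


def mc_average(root, root_table, predicate, n_samples, seed=0):
    """Estimate avg(value) over the cells selected by `predicate` by repeated sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        out = {}
        sample_cells(root, draw(root_table, rng), rng, out)
        picked = [v for coord, v in out.items() if predicate(coord)]
        if not picked:
            raise ValueError("the predicate selects no cells")
        total += sum(picked) / len(picked)
    return total / n_samples


# Usage: a single leaf over a 2 x 2 block, root value fixed at 100.
leaf = Node(
    f_table=[(0.5, (-0.1, 0.1, 0.0, 0.0)), (0.5, (0.1, -0.1, 0.0, 0.0))],
    cells=[(0, 0), (0, 1), (1, 0), (1, 1)],
)
est = mc_average(leaf, [(1.0, 100.0)], lambda coord: True, n_samples=200)
```

Each sample needs only one root-to-leaf pass over the nodes that cover the queried cells, so samples can be drawn independently of one another.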
Query Processing (Cont’d) • Compared with MRF • MRFs require sequential rounds of sampling • Each sampled node is computed from all the nodes
Query Processing (Cont’d) • Other queries • COUNT, AVG and SUM • Minimum Set Cover • Built-in cell-count function • Effective query answering
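A simple way to picture aggregates over a set cover is to tile the query region with the largest tree blocks that lie fully inside it, then combine each block's mean with its cell count. The sketch below is an illustration under assumptions, not the presented algorithm: it uses a square 2-dimensional array whose side is a power of two, half-open integer query rectangles, and the hypothetical names `cover`, `aggregate`, and `node_mean` (a mapping from block to its mean value).

```python
def cover(x0, y0, size, q):
    """Return the maximal blocks (x0, y0, size) that lie fully inside query q.

    q = (qx0, qy0, qx1, qy1) is a half-open rectangle with integer corners.
    """
    qx0, qy0, qx1, qy1 = q
    # Block and query do not overlap.
    if x0 >= qx1 or y0 >= qy1 or x0 + size <= qx0 or y0 + size <= qy0:
        return []
    # Block is fully contained: use it as a whole.
    if qx0 <= x0 and qy0 <= y0 and x0 + size <= qx1 and y0 + size <= qy1:
        return [(x0, y0, size)]
    # Partial overlap: descend into the four children.
    half = size // 2
    blocks = []
    for dx in (0, half):
        for dy in (0, half):
            blocks.extend(cover(x0 + dx, y0 + dy, half, q))
    return blocks


def aggregate(blocks, node_mean):
    """Return (COUNT, SUM, AVG) from per-block means and cell counts."""
    count = sum(size * size for _, _, size in blocks)  # cell count per block
    total = sum(node_mean[b] * b[2] * b[2] for b in blocks)
    return count, total, total / count


blocks = cover(0, 0, 4, (0, 0, 4, 2))  # half of a 4 x 4 array
result = aggregate(blocks, {(0, 0, 2): 10.0, (2, 0, 2): 20.0})
```

Using whole blocks where possible means an aggregate touches only a few nodes instead of every cell in the query region.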
Experiments • Data set description • Evaluations • Accuracy of modeling the underlying joint distribution • Execution time • Aggregate query • Space cost
Experiments (Cont’d) • Accuracy
Experiments (Cont’d) • Accuracy
Experiments (Cont’d) • Execution time
Experiments (Cont’d) • Aggregate query and space cost
Thank you! Q&A