An Efficient Distance Calculation Method for Uncertain Objects Edward Hung csehung@comp.polyu.edu.hk Hong Kong Polytechnic University 2007 CIDM, Hawaii, USA, Apr 1-5, 2007
Uncertain Objects: From Where? • Sources • Sensor readings • Statistical classifiers in image processing • Predictive programs for the stock market • Weather forecasts
Uncertain Objects Handled Traditionally … • Transformed into exact values • Weighted average or mean • Value of highest frequency or possibility • Why bad? • Intermediate and final results become approximate • E.g., cluster centroids deviate and some data are assigned to the wrong cluster
Distance: Why Important? • Various queries and data mining tasks, e.g., • Nearest-neighbor queries • Clustering (e.g., K-means clustering)
Distance: Why Expensive? • An uncertain object has more than one possible location • Continuous case: e.g., take n samples on each uncertain object • More samples in regions of higher probability density • (Figure: sampled uncertain objects o1 and o2)
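Expected Distance: Why Expensive? • Expected distance: weighted average of the distances over all pair-wise combinations of possible locations • VERY expensive: n samples on each object means n × n distance computations per pair of objects • Much cheaper IF we do NOT need to try all combinations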
Analytic Solutions • Uniform pdf • Gaussian pdf
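The slide names the two cases without giving the formulas. As a hedged sketch, assuming the distance is the squared Euclidean distance (the slides do not state the metric) and that the two objects are independent, the expected distance has a closed form that depends only on each object's mean and covariance. The symbols \mu, \Sigma, a_k and b_k are notation introduced here, not taken from the slides.

```latex
% Assumption: squared Euclidean distance, independent X ~ o_i and Y ~ o_j
\mathbb{E}\,\lVert X - Y \rVert^2
  = \lVert \mu_i - \mu_j \rVert^2
  + \operatorname{tr}(\Sigma_i) + \operatorname{tr}(\Sigma_j)

% Gaussian pdf: \mu and \Sigma are the distribution's own parameters.
% Uniform pdf on an axis-aligned box \prod_k [a_k, b_k] (assumed domain shape):
\mu = \tfrac{1}{2}(a + b),
\qquad
\operatorname{tr}(\Sigma) = \sum_k \frac{(b_k - a_k)^2}{12}
```

Under these assumptions no sampling is needed: the cost is linear in the number of dimensions.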
Approximation Methods for Arbitrary pdf • 5 methods proposed …
2. Pair-wise between Random Samples (PRS) • Take n samples on each uncertain object • (Figure: sampled uncertain objects o1 and o2)
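A minimal sketch of PRS, assuming squared Euclidean distance and equally weighted samples (neither is stated on the slide). The function name `expected_distance_prs` and the Gaussian test objects are illustrative only. It shows where the cost comes from: every sample of one object is paired with every sample of the other.

```python
import numpy as np

def expected_distance_prs(samples_i, samples_j):
    """Mean squared Euclidean distance over all pairs of samples.

    Assumption: each sample carries equal weight, so the weighted
    average reduces to a plain mean over the n_i * n_j pairs.
    """
    a = np.asarray(samples_i, dtype=float)   # shape (n_i, d)
    b = np.asarray(samples_j, dtype=float)   # shape (n_j, d)
    diff = a[:, None, :] - b[None, :, :]     # shape (n_i, n_j, d)
    return float((diff ** 2).sum(axis=2).mean())

# Hypothetical objects: 200 samples each from two 2-D Gaussians.
rng = np.random.default_rng(0)
o1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
o2 = rng.normal(loc=[3.0, 4.0], scale=0.5, size=(200, 2))
ed = expected_distance_prs(o1, o2)
```

Time and memory both grow with n_i × n_j × d here, which is the cost the later methods try to avoid.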
3. Grid Approximation and Pair-wise between Samples (GAPS) • Approximation by a grid of √s × √s cells formed on the uncertainty domain • Probability of each cell determined by sampling
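One possible reading of GAPS, sketched for 2-D objects. The assumptions are mine: the uncertainty domain is taken as the bounding box of the samples, each cell is represented by its center, and the distance is squared Euclidean. The helper names `grid_approximation` and `expected_distance_gaps` are hypothetical.

```python
import numpy as np

def grid_approximation(samples, cells_per_side):
    """Return (cell centers, cell probabilities) for a 2-D object.

    Assumption: the uncertainty domain is the samples' bounding box.
    """
    pts = np.asarray(samples, dtype=float)            # shape (n, 2)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    hist, xedges, yedges = np.histogram2d(
        pts[:, 0], pts[:, 1], bins=cells_per_side,
        range=[[lo[0], hi[0]], [lo[1], hi[1]]])
    probs = hist / hist.sum()                         # fraction of samples per cell
    xc = (xedges[:-1] + xedges[1:]) / 2               # cell centers along x
    yc = (yedges[:-1] + yedges[1:]) / 2               # cell centers along y
    gx, gy = np.meshgrid(xc, yc, indexing="ij")       # matches hist[i, j] layout
    centers = np.stack([gx.ravel(), gy.ravel()], axis=1)
    return centers, probs.ravel()

def expected_distance_gaps(samples_i, samples_j, s=16):
    """Probability-weighted pair-wise squared distance between cell centers."""
    k = int(round(np.sqrt(s)))                        # sqrt(s) cells per side
    ci, pi = grid_approximation(samples_i, k)
    cj, pj = grid_approximation(samples_j, k)
    sq = ((ci[:, None, :] - cj[None, :, :]) ** 2).sum(axis=2)   # shape (s, s)
    return float(pi @ sq @ pj)
```

The pair-wise step now costs s × s distance computations regardless of how many samples were drawn, but replacing samples by cell centers discards the spread inside each cell.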
4. Pair-wise between Gaussian Mixture (PGM) • Use K-means to cluster samples into a few clusters • Approximate the uncertain object by a mixture of Gaussian distributions • (Figure: uncertain objects o1 and o2)
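A hedged sketch of PGM. It assumes squared Euclidean distance, so that the expected distance between two Gaussian components depends only on their means and total variances. The small Lloyd-style `kmeans` helper, the component representation, and the default k=3 are illustrative choices, not taken from the slides.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Tiny Lloyd's k-means; returns a cluster label for each point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):                  # leave an empty cluster's center alone
                centers[c] = members.mean(axis=0)
    return labels

def gaussian_mixture(samples, k):
    """One (weight, mean, total variance) triple per non-empty cluster."""
    pts = np.asarray(samples, dtype=float)
    labels = kmeans(pts, k)
    comps = []
    for c in np.unique(labels):
        members = pts[labels == c]
        comps.append((len(members) / len(pts),
                      members.mean(axis=0),
                      members.var(axis=0).sum()))   # trace of covariance (1/n form)
    return comps

def expected_distance_pgm(samples_i, samples_j, k=3):
    """Weighted sum of closed-form distances between mixture components."""
    mix_i = gaussian_mixture(samples_i, k)
    mix_j = gaussian_mixture(samples_j, k)
    total = 0.0
    for wi, mi, vi in mix_i:
        for wj, mj, vj in mix_j:
            total += wi * wj * (((mi - mj) ** 2).sum() + vi + vj)
    return float(total)
```

After clustering, the pair-wise step touches only k × k component pairs instead of n × n sample pairs.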
5. Approximation by Single Gaussian (ASG) • Approximate an uncertain object by a single Gaussian distribution • Complexity = O((n_i + n_j) d) • (Figure: uncertain objects o1 and o2)
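A minimal sketch of ASG, again assuming squared Euclidean distance. Each object is summarized by its sample mean and total variance in one pass over its samples, which matches the stated O((n_i + n_j) d) cost. The function name `expected_distance_asg` is hypothetical.

```python
import numpy as np

def expected_distance_asg(samples_i, samples_j):
    """Expected squared distance from single-Gaussian summaries.

    One pass over each object's samples gives its mean and total
    variance; no pair of samples is ever compared directly.
    """
    a = np.asarray(samples_i, dtype=float)    # shape (n_i, d)
    b = np.asarray(samples_j, dtype=float)    # shape (n_j, d)
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    var_a = a.var(axis=0).sum()               # trace of covariance (1/n form)
    var_b = b.var(axis=0).sum()
    return float(((mean_a - mean_b) ** 2).sum() + var_a + var_b)
```

In a clustering loop the per-object mean and variance could be computed once and cached, so that each later distance evaluation costs only O(d).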
Equivalence of PRS, PGM and ASG • Theorem: • Given any uncertain objects o_i, o_j and their samples, ED_PRS(o_i, o_j) = ED_PGM(o_i, o_j) = ED_ASG(o_i, o_j) • So, ASG vs PRS and PGM • Cheapest with the same accuracy • What about ASG vs DM and GAPS?
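The slide states the theorem without proof. The following is a sketch of why PRS and ASG agree, under the assumption of squared Euclidean distance and equally weighted samples x_1..x_{n_i} of o_i and y_1..y_{n_j} of o_j. The notation \bar{x}, \bar{y}, S_i, S_j is introduced here; S denotes the sample covariance with 1/n normalization.

```latex
\begin{aligned}
ED_{PRS}(o_i, o_j)
  &= \frac{1}{n_i n_j} \sum_{a=1}^{n_i} \sum_{b=1}^{n_j} \lVert x_a - y_b \rVert^2 \\
  &= \frac{1}{n_i} \sum_{a} \lVert x_a \rVert^2
   + \frac{1}{n_j} \sum_{b} \lVert y_b \rVert^2
   - 2\, \bar{x} \cdot \bar{y} \\
  &= \bigl( \lVert \bar{x} \rVert^2 + \operatorname{tr}(S_i) \bigr)
   + \bigl( \lVert \bar{y} \rVert^2 + \operatorname{tr}(S_j) \bigr)
   - 2\, \bar{x} \cdot \bar{y} \\
  &= \lVert \bar{x} - \bar{y} \rVert^2 + \operatorname{tr}(S_i) + \operatorname{tr}(S_j)
   = ED_{ASG}(o_i, o_j)
\end{aligned}
```

The same expansion applied cluster by cluster, with cluster weights n_c / n, gives the PGM value, since the weighted cluster means recombine to the overall mean and the within-cluster second moments sum to the overall second moment.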
Performance Study • Experimental results show that • ASG vs DM • ASG is much more accurate, with comparable speed • ASG vs GAPS • ASG is much faster than GAPS, with higher or comparable accuracy
Conclusion • ASG can obtain highly accurate results quickly • For data with arbitrary pdf, uniform pdf, Gaussian mixture pdf • ASG can replace GAPS used in recent research work