
An Efficient Distance Calculation Method for Uncertain Objects

Edward Hung (csehung@comp.polyu.edu.hk), Hong Kong Polytechnic University. 2007 CIDM, Hawaii, USA, Apr 1-5, 2007.


Presentation Transcript


  1. An Efficient Distance Calculation Method for Uncertain Objects Edward Hung csehung@comp.polyu.edu.hk Hong Kong Polytechnic University 2007 CIDM, Hawaii, USA, Apr 1-5, 2007

  2. Uncertain Objects: From Where? • Sources • Sensor readings • Statistical classifiers in image processing • Predictive programs for the stock market • Weather forecasts

  3. Uncertain Objects handled traditionally … • Transformed into exact values • Weighted average or mean • Value of highest frequency or possibility • Why is this bad? • Intermediate and final results become approximate • E.g., cluster centroids deviate and some data are assigned to the wrong cluster

  4. Distance: Why Important? • Various queries and data mining tasks, e.g., • Nearest-neighbor queries • Clustering (e.g., K-means clustering)

  5. Distance: Why Expensive? • An uncertain object has more than one possible location • Continuous case: e.g., take n samples from each uncertain object • More samples in regions of higher probability density

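  6. Expected Distance: Why Expensive? • Expected distance: weighted average of the distances over all pair-wise combinations of possible locations • VERY expensive: with n samples per object, n² distance computations for each pair of objects • Much cheaper IF we do NOT need to try all combinations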
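
A hedged formalization of the definition on this slide, in notation that does not come from the slides: f_i and f_j denote the pdfs of objects o_i and o_j (assumed independent), d is the chosen distance function, and x_p, y_q are the samples drawn from each object.

```latex
\mathrm{ED}(o_i, o_j)
  = \int\!\!\int d(x, y)\, f_i(x)\, f_j(y)\, dx\, dy
  \;\approx\; \frac{1}{n_i n_j} \sum_{p=1}^{n_i} \sum_{q=1}^{n_j} d(x_p, y_q)
```

The double sum is the source of the cost: the sampled estimate needs n_i n_j distance evaluations for every pair of objects.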

  7. Analytic Solutions • Uniform pdf • Gaussian pdf
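
The slide does not reproduce the formulas. As an illustration of why closed forms can exist, assume the distance is the squared Euclidean distance and the two objects are independent (both are assumptions, not statements from the slides). The expected distance then depends only on the means and covariance matrices of the two pdfs:

```latex
\mathbb{E}\,\lVert X - Y \rVert^2
  = \lVert \mu_i - \mu_j \rVert^2 + \operatorname{tr}(\Sigma_i) + \operatorname{tr}(\Sigma_j)

% Uniform pdf on an axis-aligned box with side lengths \ell_1, \dots, \ell_d:
\operatorname{tr}(\Sigma) = \sum_{k=1}^{d} \frac{\ell_k^2}{12}
```

For a Gaussian pdf, the mean and covariance are the distribution's own parameters, so no sampling is needed under these assumptions.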

  8. Approximation Methods for Arbitrary pdf • 5 methods proposed …

  9. 1. Distance between Means (DM)

  10. 2. Pair-wise between Random Samples (PRS) • Take n samples from each uncertain object
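
A minimal sketch of the PRS idea, assuming the distance is the squared Euclidean distance and that every sample carries equal weight. The function names `sq_dist` and `ed_prs` and the Gaussian test data are illustrative, not taken from the presentation.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ed_prs(samples_i, samples_j):
    """Average the distance over every pair of samples: O(n_i * n_j * d)."""
    total = 0.0
    for p in samples_i:
        for q in samples_j:
            total += sq_dist(p, q)
    return total / (len(samples_i) * len(samples_j))


# Hypothetical uncertain objects: 2-D Gaussian clouds centred at (0, 0) and (5, 0).
rng = random.Random(0)
o1 = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
o2 = [(rng.gauss(5.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
print(ed_prs(o1, o2))
```

With 200 samples per object the nested loop already evaluates 40,000 distances for a single pair of objects, which is the cost the later methods try to avoid.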

  11. 3. Grid Approximation and Pair-wise between Samples (GAPS) • Approximation by a grid of √s × √s cells formed on the uncertainty domain • Probability of each cell determined by sampling
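
A rough 2-D sketch of the grid idea, under several assumptions that the slide does not confirm: the uncertainty domain is taken to be the bounding box of the samples, each cell is represented by its center, the cell probability is the fraction of samples that fall in it, and the distance is the squared Euclidean distance. The names `grid_cells` and `ed_gaps` are hypothetical.

```python
import math
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def grid_cells(samples, side):
    """Map each occupied cell center to its estimated probability (2-D only)."""
    xs = [p[0] for p in samples]
    ys = [p[1] for p in samples]
    x0, y0 = min(xs), min(ys)
    # Cell width and height; fall back to 1.0 if the box is degenerate.
    w = (max(xs) - x0) / side or 1.0
    h = (max(ys) - y0) / side or 1.0
    counts = Counter()
    for x, y in samples:
        cx = min(int((x - x0) / w), side - 1)  # clamp points on the upper edge
        cy = min(int((y - y0) / h), side - 1)
        counts[(cx, cy)] += 1
    n = len(samples)
    return {
        (x0 + (cx + 0.5) * w, y0 + (cy + 0.5) * h): c / n
        for (cx, cy), c in counts.items()
    }


def ed_gaps(samples_i, samples_j, s=16):
    """Probability-weighted pairwise distance between the cells of two grids."""
    side = math.isqrt(s)  # sqrt(s) x sqrt(s) cells
    cells_i = grid_cells(samples_i, side)
    cells_j = grid_cells(samples_j, side)
    return sum(
        pi * pj * sq_dist(ci, cj)
        for ci, pi in cells_i.items()
        for cj, pj in cells_j.items()
    )
```

In this sketch the pairwise step touches at most s × s cell pairs, so its cost is governed by the grid resolution rather than by the number of samples.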

  12. 4. Pair-wise between Gaussian Mixture (PGM) • Use K-means to cluster samples into a few clusters • Approximate the uncertain object by a mixture of Gaussian distributions
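
A sketch of how the two PGM steps could fit together, assuming the squared Euclidean distance, a plain Lloyd-style K-means, and one Gaussian per cluster fitted from the cluster's sample mean and (biased) sample covariance. Under these assumptions only the trace of each covariance is needed. All names (`kmeans`, `fit_mixture`, `ed_pgm`) are illustrative, and this is not claimed to be the author's implementation.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def kmeans(samples, k, iters=20, seed=0):
    """Basic Lloyd iterations; returns the non-empty clusters."""
    centers = random.Random(seed).sample(samples, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in samples:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Keep the old center when a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def fit_mixture(samples, k):
    """One (weight, mean, trace of covariance) triple per cluster."""
    mixture = []
    for cluster in kmeans(samples, k):
        mean = centroid(cluster)
        total_var = sum(sq_dist(p, mean) for p in cluster) / len(cluster)
        mixture.append((len(cluster) / len(samples), mean, total_var))
    return mixture


def ed_pgm(samples_i, samples_j, k=3):
    """Weighted pairwise expected squared distance between mixture components."""
    mix_i = fit_mixture(samples_i, k)
    mix_j = fit_mixture(samples_j, k)
    return sum(
        wi * wj * (sq_dist(mi, mj) + vi + vj)
        for wi, mi, vi in mix_i
        for wj, mj, vj in mix_j
    )
```

After fitting, the pairwise step runs over at most k × k component pairs instead of all sample pairs.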

  13. 5. Approximation by Single Gaussian (ASG) • Approximate an uncertain object by a single Gaussian distribution • Complexity = O((n_i + n_j) d)
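
A minimal sketch of ASG next to DM, assuming the squared Euclidean distance and a Gaussian fitted from the sample mean and the (biased) sample covariance. Under that assumption only the trace of the covariance is needed, so one linear pass over each object's samples suffices. The names `mean_and_total_variance`, `ed_asg` and `ed_dm` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_and_total_variance(samples):
    """Sample mean and trace of the biased sample covariance, in O(n * d)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(p[k] for p in samples) / n for k in range(d)]
    total_var = sum(sq_dist(p, mean) for p in samples) / n
    return mean, total_var


def ed_dm(samples_i, samples_j):
    """Distance between Means: ignores the spread of both objects."""
    m_i, _ = mean_and_total_variance(samples_i)
    m_j, _ = mean_and_total_variance(samples_j)
    return sq_dist(m_i, m_j)


def ed_asg(samples_i, samples_j):
    """Single-Gaussian estimate: mean distance plus both total variances."""
    m_i, v_i = mean_and_total_variance(samples_i)
    m_j, v_j = mean_and_total_variance(samples_j)
    return sq_dist(m_i, m_j) + v_i + v_j


o1 = [(0.0, 0.0), (2.0, 0.0)]
o2 = [(1.0, 0.0), (1.0, 2.0)]
print(ed_dm(o1, o2))   # 1.0
print(ed_asg(o1, o2))  # 3.0
```

For this toy pair the four pairwise squared distances are 1, 5, 1 and 5, whose average is 3, so the ASG value matches the pairwise average while DM underestimates it.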

  14. Equivalence of PRS, PGM and ASG • Theorem: • Given any uncertain objects o_i, o_j and their samples, ED_PRS(o_i, o_j) = ED_PGM(o_i, o_j) = ED_ASG(o_i, o_j) • So, ASG vs PRS and PGM • Cheapest, with the same accuracy • What about ASG vs DM and GAPS?
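
The slide states the theorem without proof. A short derivation sketch for the PRS = ASG part, assuming the distance is the squared Euclidean distance and the Gaussian is fitted with the sample mean m and the biased sample covariance S (assumptions; the notation x_p = m_i + a_p, y_q = m_j + b_q is introduced here, with the deviations a_p and b_q each summing to zero):

```latex
\frac{1}{n_i n_j} \sum_{p=1}^{n_i} \sum_{q=1}^{n_j} \lVert x_p - y_q \rVert^2
  = \frac{1}{n_i n_j} \sum_{p,q} \Bigl(
      \lVert m_i - m_j \rVert^2 + \lVert a_p \rVert^2 + \lVert b_q \rVert^2
      + 2\,(m_i - m_j)^{\top}(a_p - b_q) - 2\, a_p^{\top} b_q \Bigr)
  = \lVert m_i - m_j \rVert^2
      + \frac{1}{n_i} \sum_{p} \lVert a_p \rVert^2
      + \frac{1}{n_j} \sum_{q} \lVert b_q \rVert^2
  = \lVert m_i - m_j \rVert^2 + \operatorname{tr}(S_i) + \operatorname{tr}(S_j)
```

The cross terms vanish because the deviations sum to zero. Applying the same identity to each pair of clusters and weighting by cluster sizes gives the PGM value, so under these assumptions all three estimates coincide.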

  15. Performance Study • Experimental results show that • ASG vs DM • much more accurate with comparable speed • ASG vs GAPS • much faster than GAPS with higher or comparable accuracy

  16. Conclusion • ASG can obtain highly accurate results quickly • For data with arbitrary pdf, uniform pdf, Gaussian mixture pdf • ASG can replace GAPS used in recent research work
