
Statistical Data Mining - 3

Statistical Data Mining - 3. Edward J. Wegman. A Short Course for Interface ‘01. Visual Data Mining. Outline of Lecture . Visual Complexity Description of Basic Techniques Parallel Coordinates Grand Tour Saturation Brushing Illustrations of Basic Techniques


Presentation Transcript


  1. Statistical Data Mining - 3 Edward J. Wegman A Short Course for Interface ‘01

  2. Visual Data Mining

  3. Outline of Lecture • Visual Complexity • Description of Basic Techniques • Parallel Coordinates • Grand Tour • Saturation Brushing • Illustrations of Basic Techniques • Rapid Data Editing, Density Estimation (Pollen Data) • Inverse Regression, Tree Structured Decision Rules (Bank Data) • Classification & Clustering (SALAD Data & Artificial Nose) • Structural Inference (PRIM 7 Data) • Data Mining (BLS Cereal Scanner Data) • Cluster Trees (Oronsay Sand Particle Size Data)

  4. Visual Complexity Scenarios • Typical high-resolution workstation: 1280x1024 = 1.31x10^6 pixels • Realistic, using Wegman, immersion, 4:5 aspect ratio: 2333x1866 = 4.35x10^6 pixels • Very optimistic, using 1 minute of arc, immersion, 4:5 aspect ratio: 8400x6720 = 5.65x10^7 pixels • Wildly optimistic, using Maar(2), immersion, 4:5 aspect ratio: 17,284x13,828 = 2.39x10^8 pixels
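The pixel counts on this slide are simple width-by-height products. As a quick check, the short sketch below recomputes them; the dictionary keys and the function name are illustrative labels, not terms from the lecture. Computed from the rounded dimensions, the third product comes out slightly below the slide's figure, which may have been derived from unrounded dimensions.

```python
# Recompute the pixel counts for the four display scenarios on the slide.
# The scenario labels are shorthand chosen for this sketch.
scenarios = {
    "workstation": (1280, 1024),
    "realistic": (2333, 1866),
    "very optimistic": (8400, 6720),
    "wildly optimistic": (17284, 13828),
}


def pixel_count(width, height):
    """Total number of addressable pixels for a width x height display."""
    return width * height


counts = {name: pixel_count(w, h) for name, (w, h) in scenarios.items()}

for name, n in counts.items():
    # Print the exact count and the same value in scientific notation.
    print(f"{name}: {n:,} pixels ({n:.2e})")
```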

  5. Visual Complexity • Visualization for Data Mining can realistically hope to deal with somewhere on the order of 10^6 to 10^7 observations. This coincides with the approximate limits for interactive computing of O(n^2) algorithms and for data transfer. This also roughly corresponds to the number of foveal cones in the eye.

  6. Methodologies for Visual Data Mining • Parallel Coordinates • Effective Method for High Dimensional Data • High Dimensions = Multiple Attributes • Grand Tour • Generalized Rotation in High Dimensions • In Depth Study of High Dimensional Data • Saturation Brushing • Effective Method for Large Data Sets
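To make the parallel coordinates idea concrete: each attribute gets its own vertical axis, and each observation becomes a polyline that crosses axis j at the height of its j-th value. The following is a minimal sketch under the assumption that every axis is rescaled to [0, 1] by min-max scaling; the function name and the toy data are hypothetical and are not taken from the lecture or from any particular software.

```python
def parallel_coordinates(rows):
    """Map each d-dimensional row to a polyline of (axis_index, height) vertices.

    Every attribute is min-max scaled to [0, 1] so that all axes share one
    vertical range (an assumption of this sketch).
    """
    dims = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(dims)]
    highs = [max(r[j] for r in rows) for j in range(dims)]
    polylines = []
    for r in rows:
        line = []
        for j in range(dims):
            span = highs[j] - lows[j]
            # A constant attribute is drawn at mid-height to avoid dividing by zero.
            y = 0.5 if span == 0 else (r[j] - lows[j]) / span
            line.append((j, y))
        polylines.append(line)
    return polylines


# Toy data: three observations with three attributes (the third is constant).
data = [(1.0, 10.0, 5.0), (2.0, 30.0, 5.0), (3.0, 20.0, 5.0)]
lines = parallel_coordinates(data)
```

Drawing each returned vertex list as a connected line yields the plot. Because the number of axes grows only linearly with the number of attributes, the display extends to many dimensions.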

  7. Visual Data Mining Techniques • Multidimensional Data Visualization • Scatterplot matrix • Parallel coordinate plots • 3-D stereoscopic scatterplots • Grand tour on all plot devices • Density plots • Linked views • Saturation brushing • Pruning and cropping
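Saturation brushing addresses overplotting in large data sets: each observation adds only a small amount of color to the pixel it lands on, so a pixel reaches full saturation only where many observations coincide. The sketch below illustrates that accumulation on a coarse grid. The grid size, the per-point increment, and the function name are assumptions made for illustration, not parameters from the lecture.

```python
def saturation_brush(points, width, height, increment=0.1):
    """Accumulate per-pixel intensity for points with coordinates in [0, 1].

    Each point adds `increment` to its pixel; intensity is capped at 1.0,
    which corresponds to a fully saturated color.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y in points:
        # Convert unit-square coordinates to pixel indices, clamping x == 1.0
        # and y == 1.0 onto the last column and row.
        col = min(int(x * width), width - 1)
        row = min(int(y * height), height - 1)
        grid[row][col] = min(1.0, grid[row][col] + increment)
    return grid


# Five coincident points saturate one pixel; a lone point stays faint.
points = [(0.1, 0.1)] * 5 + [(0.9, 0.9)]
grid = saturation_brush(points, width=2, height=2, increment=0.25)
```

A smaller increment lets more observations overlap before a pixel saturates, which suits larger data sets.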

  8. Crystal Vision

  9. Crystal Vision

  10. Crystal Vision

  11. Crystal Vision

  12. Data Editing and Density Estimation • Pollen Data • 3848 points • 5 dimensions

  13. Pollen Data

  14. Pollen Data

  15. Pollen Data

  16. Pollen Data

  17. Pollen Data

  18. Pollen Data

  19. Inverse Regression and Tree Structured Decision Rules with Financial Data • Bank Demographic Data in 8 Dimensions with 12,000+ points

  20. Inverse Regression and Tree Structured Decision Rules with Financial Data

  21. Inverse Regression and Tree Structured Decision Rules with Financial Data

  22. Inverse Regression and Tree Structured Decision Rules with Financial Data

  23. Classification and Clustering Using SALAD Data • Chemical Agent Detection Data in 13 Dimensions with 10,000+ points

  24. Classification and Clustering Using SALAD Data

  25. Classification and Clustering Using SALAD Data

  26. Artificial Dog Nose • 19 dimensional time series in 2 spectral bands • 60 time steps for 300 chemical species

  27. Artificial Dog Nose Time series in two spectral bands for same chemical species

  28. Artificial Dog Nose Phase loop

  29. Artificial Dog Nose Orthogonal components

  30. Artificial Dog Nose After grand tour, orthogonal variables x2*, x9*, x15*, x16*, x18* separate the two spectral bands
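The starred variables on this slide are the coordinates after a generalized rotation of the original axes. One common way to generate such rotations continuously is to compose plane rotations whose angular speeds are mutually incommensurate, in the spirit of Asimov's torus method. The sketch below shows that construction; it is not necessarily how the rotations behind these slides were computed, and the helper names and the choice of square roots of primes as speeds are assumptions.

```python
import math
from itertools import combinations


def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def make_speeds(dims):
    """Assign one angular speed, the square root of a distinct prime, per coordinate plane."""
    pairs = list(combinations(range(dims), 2))
    return {pair: math.sqrt(p) for pair, p in zip(pairs, first_primes(len(pairs)))}


def givens_rotate(vec, i, j, angle):
    """Rotate vec by `angle` within the plane spanned by axes i and j."""
    c, s = math.cos(angle), math.sin(angle)
    out = list(vec)
    out[i] = c * vec[i] - s * vec[j]
    out[j] = s * vec[i] + c * vec[j]
    return out


def grand_tour_step(vec, t, speeds):
    """Apply the composed plane rotations at time t to one observation."""
    out = list(vec)
    for (i, j), speed in speeds.items():
        out = givens_rotate(out, i, j, speed * t)
    return out


speeds = make_speeds(4)
point = [1.0, 2.0, 3.0, 4.0]
rotated = grand_tour_step(point, 0.7, speeds)
```

Stepping t forward and replotting the rotated coordinates animates the tour. Because every step is a rotation, distances between observations are preserved from frame to frame.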

  31. Artificial Dog Nose Four chemical species, target highlighted in red

  32. Artificial Dog Nose Target species separated by x1*, x3*, x5*, x6*, x11*, x15*

  33. PRIM-7 7 dimensional high energy physics data 500 data points pi-meson proton interaction

  34. Structural Inference Using PRIM 7 Data

  35. Structural Inference Using PRIM 7 Data

  36. Structural Inference Using PRIM 7 Data

  37. Structural Inference Using PRIM 7 Data

  38. Structural Inference Using PRIM 7 Data

  39. Scanner Data for Breakfast Cereals • 5.5 gigabytes of scanner data in relational database • Price, sales volume, promotion, store, chain, PSU, UPC • Work done at BLS • Phase 1 – Basic Data Analysis – Single Month • Phase 2 – Price Relative Effects – 1 Year • Phase 3 – Churning Effects – 5 Years

  40. Scanner Data for Breakfast Cereals Promotion has huge impact on sales volume

  41. Scanner Data for Breakfast Cereals Stores not randomized

  42. Scanner Data for Breakfast Cereals Aggressive promotion pays

  43. Scanner Data for Breakfast Cereals

  44. Scanner Data for Breakfast Cereals

  45. Scanner Data for Breakfast Cereals Phase 2

  46. Scanner Data for Breakfast Cereals

  47. Scanner Data for Breakfast Cereals Outliers belong to same chain

  48. Scanner Data for Breakfast Cereals Promotion both years

  49. Scanner Data for Breakfast Cereals Range of items with no promotion

  50. Scanner Data for Breakfast Cereals One chain ceased promotions
