Statistical Data Mining - 3. Edward J. Wegman. A Short Course for Interface ‘01. Visual Data Mining. Outline of Lecture . Visual Complexity Description of Basic Techniques Parallel Coordinates Grand Tour Saturation Brushing Illustrations of Basic Techniques
Statistical Data Mining - 3 Edward J. Wegman A Short Course for Interface ‘01
Outline of Lecture
• Visual Complexity
• Description of Basic Techniques
  • Parallel Coordinates
  • Grand Tour
  • Saturation Brushing
• Illustrations of Basic Techniques
  • Rapid Data Editing, Density Estimation (Pollen Data)
  • Inverse Regression, Tree Structured Decision Rules (Bank Data)
  • Classification & Clustering (SALAD Data & Artificial Nose)
  • Structural Inference (PRIM 7 Data)
  • Data Mining (BLS Cereal Scanner Data)
  • Cluster Trees (Oronsay Sand Particle Size Data)
Visual Complexity Scenarios
• Typical high-resolution workstation: 1280 x 1024 = 1.31 x 10^6 pixels
• Realistic (using Wegman, immersion, 4:5 aspect ratio): 2333 x 1866 = 4.35 x 10^6 pixels
• Very optimistic (using 1 minute of arc, immersion, 4:5 aspect ratio): 8400 x 6720 = 5.64 x 10^7 pixels
• Wildly optimistic (using Maar(2), immersion, 4:5 aspect ratio): 17,284 x 13,828 = 2.39 x 10^8 pixels
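The pixel counts above are plain products of the display dimensions, so they can be checked directly. The following is a minimal sketch; the scenario labels and the `pixel_count` helper are illustrative names, not part of the lecture.

```python
# Display dimensions (width, height) for the four scenarios on the slide.
# The dictionary keys are shorthand labels chosen for this sketch.
scenarios = {
    "workstation": (1280, 1024),
    "realistic": (2333, 1866),
    "very optimistic": (8400, 6720),
    "wildly optimistic": (17284, 13828),
}


def pixel_count(width, height):
    """Total number of addressable pixels on a width x height display."""
    return width * height


counts = {name: pixel_count(w, h) for name, (w, h) in scenarios.items()}

for name, n in counts.items():
    # Scientific notation with two decimals, matching the slide's style.
    print(f"{name:>18}: {n:.2e} pixels")
```

The spread between the first and last scenario is roughly two orders of magnitude, which is the range the next slide relates to the number of observations a display can carry.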
Visual Complexity • Visualization for data mining can realistically hope to deal with somewhere on the order of 10^6 to 10^7 observations. This coincides with the approximate limits for interactive computing with O(n^2) algorithms and for data transfer. It also roughly corresponds to the number of foveal cones in the eye.
Methodologies for Visual Data Mining
• Parallel Coordinates
  • Effective Method for High Dimensional Data
  • High Dimensions = Multiple Attributes
• Grand Tour
  • Generalized Rotation in High Dimensions
  • In Depth Study of High Dimensional Data
• Saturation Brushing
  • Effective Method for Large Data Sets
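As a rough illustration of why parallel coordinates suit data with many attributes, the sketch below maps each d-dimensional observation to a polyline with one vertex per attribute axis. The function name, the per-axis min-max scaling, and the toy data are assumptions made for this example; the lecture does not prescribe a particular scaling.

```python
def parallel_coordinates(points):
    """Map each d-dimensional point to a polyline with one vertex per axis.

    Axis j is a vertical line at horizontal position j. Each attribute is
    rescaled to [0, 1] with its own minimum and maximum (an assumed choice)
    so that all axes share the same height.
    """
    dims = len(points[0])
    lows = [min(p[j] for p in points) for j in range(dims)]
    highs = [max(p[j] for p in points) for j in range(dims)]

    polylines = []
    for p in points:
        line = []
        for j in range(dims):
            span = highs[j] - lows[j]
            # A constant attribute has no spread; place it mid-axis.
            y = (p[j] - lows[j]) / span if span else 0.5
            line.append((j, y))
        polylines.append(line)
    return polylines


# Toy data: three observations with three attributes each.
data = [(1.0, 10.0, 5.0), (2.0, 30.0, 5.0), (3.0, 20.0, 5.0)]
lines = parallel_coordinates(data)
for line in lines:
    print(line)
```

Adding an attribute adds one more axis and one more vertex per polyline, so the display grows linearly with dimension rather than needing a new pair of orthogonal axes.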
Visual Data Mining Techniques • Multidimensional Data Visualization • Scatterplot matrix • Parallel coordinate plots • 3-D stereoscopic scatterplots • Grand tour on all plot devices • Density plots • Linked views • Saturation brushing • Pruning and cropping
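One way to picture saturation brushing on a large data set is as accumulation: each plotted point adds a small amount of color saturation to its pixel, so heavily overplotted regions become fully saturated while sparse regions stay faint. The sketch below is a simplified model under that assumption; the grid size, the `increment` value, and the function name are illustrative and not taken from the lecture.

```python
def saturation_brush(points, width, height, increment=0.05):
    """Accumulate a small saturation increment per point on a pixel grid.

    `points` are (x, y) pairs already scaled to [0, 1). Saturation in each
    pixel is capped at 1.0, the fully saturated brush color.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y in points:
        col = min(int(x * width), width - 1)
        row = min(int(y * height), height - 1)
        grid[row][col] = min(1.0, grid[row][col] + increment)
    return grid


# Thirty overplotted points in one corner, two points in the opposite one.
pts = [(0.1, 0.1)] * 30 + [(0.9, 0.9)] * 2
grid = saturation_brush(pts, width=4, height=4)
for row in grid:
    print(row)
```

With this model the dense corner saturates while the sparse corner remains barely visible, which is the density cue that makes the technique useful when many observations share a pixel.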
Data Editing and Density Estimation • Pollen Data • 3848 points • 5 dimensions
Inverse Regression and Tree Structured Decision Rules with Financial Data • Bank Demographic Data in 8 Dimensions with 12,000+ points
Classification and Clustering Using SALAD Data • Chemical Agent Detection Data in 13 Dimensions with 10,000+ points
Artificial Dog Nose • 19-dimensional time series in 2 spectral bands • 60 time steps for 300 chemical species
Artificial Dog Nose Time series in two spectral bands for same chemical species
Artificial Dog Nose Phase loop
Artificial Dog Nose Orthogonal components
Artificial Dog Nose After grand tour, orthogonal variables x2*, x9*, x15*, x16*, x18* separate the two spectral bands
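The starred variables are the coordinates of the data after a rotation of the original axes. As a minimal sketch of such a generalized rotation, the code below composes rotations in every pair of coordinate planes, each with its own angular speed, and applies the result to a point. The speeds (square roots of primes), the time value, and all function names are assumptions for illustration; this is not the specific tour algorithm used to produce the plots.

```python
import math


def identity(dim):
    return [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]


def plane_rotation(dim, i, j, angle):
    """Identity matrix with a rotation by `angle` in the (i, j) plane."""
    rot = identity(dim)
    rot[i][i] = rot[j][j] = math.cos(angle)
    rot[i][j] = -math.sin(angle)
    rot[j][i] = math.sin(angle)
    return rot


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def tour_rotation(dim, t):
    """Compose rotations in every coordinate plane at time t.

    Each plane gets a different angular speed (an assumed choice) so the
    combined motion does not quickly return to its starting orientation.
    """
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
    rot = identity(dim)
    k = 0
    for i in range(dim):
        for j in range(i + 1, dim):
            speed = math.sqrt(primes[k % len(primes)])
            rot = matmul(rot, plane_rotation(dim, i, j, speed * t))
            k += 1
    return rot


def rotate(point, rot):
    """Rotated coordinates x* = R x."""
    return [sum(rot[r][c] * point[c] for c in range(len(point)))
            for r in range(len(rot))]


R = tour_rotation(4, t=0.3)
x_star = rotate([1.0, 2.0, 3.0, 4.0], R)
print(x_star)
```

Because the matrix is a product of plane rotations it stays orthogonal, so the rotated variables remain mutually orthogonal and distances between observations are preserved; stepping `t` forward and redrawing gives the animated tour.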
Artificial Dog Nose Four chemical species, target highlighted in red
Artificial Dog Nose Target species separated by x1*, x3*, x5*, x6*, x11*, x15*
PRIM-7 • 7-dimensional high-energy physics data • 500 data points • Pi-meson proton interaction
Structural Inference Using PRIM 7 Data
Scanner Data for Breakfast Cereals • 5.5 gigabytes of scanner data in relational database • Price, sales volume, promotion, store, chain, PSU, UPC • Work done at BLS • Phase 1 – Basic Data Analysis – Single Month • Phase 2 – Price Relative Effects – 1 Year • Phase 3 – Churning Effects – 5 Years
Scanner Data for Breakfast Cereals Promotion has huge impact on sales volume
Scanner Data for Breakfast Cereals Stores not randomized
Scanner Data for Breakfast Cereals Aggressive promotion pays
Scanner Data for Breakfast Cereals Outliers belong to same chain
Scanner Data for Breakfast Cereals Promotion both years
Scanner Data for Breakfast Cereals Range of items with no promotion
Scanner Data for Breakfast Cereals One chain ceased promotions