330 likes | 382 Views
An Introduction to Multivariate Data Visualization and XmdvTool. Matthew O. Ward Computer Science Department Worcester Polytechnic Institute. This work was supported under NSF Grant IIS-9732897. What is Multivariate Data?. Each data point has N variables or observations
E N D
An Introduction to Multivariate Data Visualization and XmdvTool Matthew O. Ward Computer Science Department Worcester Polytechnic Institute This work was supported under NSF Grant IIS-9732897 UC Berkeley, 09/19/00
What is Multivariate Data? • Each data point has N variables or observations • Each observation can be: • nominal or ordinal • discrete or continuous • scalar, vector, or tensor • May or may not have spatial, temporal, or other connectivity attribute UC Berkeley, 09/19/00
Characteristics of a Variable • Order: grades have an order, brand names do not. • Distance metric: for income, distance equals difference. For rankings, difference is not a distance metric. • Absolute zero: temperature has an absolute zero, bank account balances do not. • A variable can be classified by these three attributes, called Scale. • Effective visualizations attempt to match the scale of the data dimension with the graphical attribute conveying it. UC Berkeley, 09/19/00
Sources of Multivariate Data • Sensors (e.g., images, gauges) • Simulations • Census or other surveys • Commerce (e.g., stock market) • Communication systems • Spreadsheets and databases UC Berkeley, 09/19/00
Issues in Visualizing Multivariate Data • How many variables? • How many records? • Types of variables? • User task (exploration, confirmation, presentation) • Data feature of interest (clusters, anomalies, trends, patterns, ….) • Background of user (domain expert, visualization specialist, decision-maker, ….) UC Berkeley, 09/19/00
Methods for Visualizing Multivariate Data • Dimensional Subsetting • Dimensional Reorganization • Dimensional Embedding • Dimensional Reduction UC Berkeley, 09/19/00
Dimensional Subsetting • Scatterplot matrix displays all pairwise plots • Selection allows linkage between views • Clusters, trends, and correlations readily discerned between pairs of dimensions UC Berkeley, 09/19/00
Dimensional Reorganization • Parallel Coordinates creates parallel, rather than orthogonal, dimensions. • Data point corresponds to polyline across axes • Clusters, trends, and anomalies discernable as groupings or outliers, based on intercepts and slopes UC Berkeley, 09/19/00
Dimensional Reorganization (2) • Glyphs map data dimensions to graphical attributes • Size, color, shape, and orientation are commonly used • Similarities/differences in features give insights into relations UC Berkeley, 09/19/00
Dimensional Embedding • Dimensional stacking divides data space into bins • Each N-D bin has a unique 2-D screen bin • Screen space recursively divided based on bin count for each dimension • Clusters and trends manifested as repeated patterns UC Berkeley, 09/19/00
Dimensional Reduction • Map N-D locations to M-D display space while best preserving N-D relations • Approaches include MDS, PCA, and Kohonen Self Organizing Maps • Relationships conveyed by position, links, color, shape, size, etc. UC Berkeley, 09/19/00
The Role of Selection • User needs to interact with display, examine interesting patterns or anomalies, validate hypotheses • Selection allows isolation of subset of data for highlighting, deleting, focussed analysis • Direct (clicking on displayed items ) vs. indirect (range sliders) • Screen space (2-D) vs. data space (N-D) UC Berkeley, 09/19/00
Demonstration of XmdvTool UC Berkeley, 09/19/00
Problems with Large Data Sets • Most techniques are effective with small to moderate sized data sets • Large sets (> 50K records) are increasingly common • When traditional visualizations used, occlusion and clutter make interpretation difficult UC Berkeley, 09/19/00
One Potential Solution • Multiresolution displays with aggregation • Explicit clustering • Break dimensions into bins • Aggregate in a particular order (datacubes) • Implicit clustering • Hierarchical clustering (proximity-based merging) • Hierarchical partitioning (proximity-based splits) • Problem: many ways to cluster, each revealing different aspects of data UC Berkeley, 09/19/00
Display Options • For each cluster, show • Center • Extents for each dimension • Population • Other descriptors (e.g., quartiles) • Color clusters such that siblings have similar color to parents UC Berkeley, 09/19/00
Hierarchical Parallel Coordinates • Bands show cluster extents in each dimension • Opacity conveys cluster population • Color similarity indicates proximity in hierarchy UC Berkeley, 09/19/00
Hierarchical Scatterplots • Clusters displayed as rectangles, showing extents in 2 dimensions • Color/opacity consistently used for relational and population info UC Berkeley, 09/19/00
Hierarchical Glyphs • Star glyph with bands • Analogous to parallel coordinates, with radial rather than parallel dimensions • Glyph position critical for conveying relational info UC Berkeley, 09/19/00
Hierarchical Dimensional Stacking • Clusters occupy multiple bins • Overlaps can be reduced by increasing number of bins • Cell colors can be blended, or display last cluster mapped to space UC Berkeley, 09/19/00
Hierarchical Star Fields • Dimensional reduction techniques commonly displayed with starfields • Each cluster becomes circle/sphere in field • Alternatively, can show glyph at cluster location UC Berkeley, 09/19/00
Navigating Hierarchies • Drill-down, roll-up operations for more or less detail • Need selection operation to identify subtrees for exploration, pruning • Need indications of where you are in hierarchy, and where you’ve been during exploration process UC Berkeley, 09/19/00
Structure-Based Brushing • Enhancement to screen-based and data-based methods • Specify focus, extents, and level of detail • Intuitive - wedge of tree and depth of interest • Implemented by labeling/numbering terminals and propagating ranges to parents UC Berkeley, 09/19/00
Structure-Based Brush • White contour links terminal nodes • Red wedge is extents selection • Color curve is depth specification • Color bar maps location in tree to unique color • Direct and indirect manipulation of brush UC Berkeley, 09/19/00
Demonstration of Hierarchical Features in XmdvTool UC Berkeley, 09/19/00
Auxiliary Tools • Extent scaling to reduce occlusion of bands • Dimensional zooming - fill display with selected subspace (N-D distortion) • Dynamic masking to fade out selected or unselected data • Saving selected subsets • Enabling/disabling dimensions • Univariate displays (Tukey box plots, tree maps) UC Berkeley, 09/19/00
Summary • Many ways to map multivariate data to images, each with strengths and weaknesses • Linking between and within displays with brushing enhances static displays and combines their strengths. • Hierarchies and aggregations allow visualization of large data sets • Intuitive navigation, filtering, and focus critical to exploration process • Each basic multivariate visualization method is readily extensible to displaying cluster information UC Berkeley, 09/19/00
Problems and Future Work • Many data characteristics not currently supported (e.g., text fields, records with missing entries, data quality) • Navigation tool for hierarchies assumes linear order of nodes • Looking at tools for dynamic reorganization • Exploring 2-D or higher navigation interfaces • Very large hierarchies are difficult to focus on a narrow subset • Developing multiresolution interface • Investigating distortion techniques for navigation UC Berkeley, 09/19/00
Other Future Work • Projection pursuit or view recommender • Linking structure and data brushing • User studies • Customization based on domain • Query optimization via caching and prefetching UC Berkeley, 09/19/00
For More Information • XmdvTool available to the public domain • Both Unix and Windows support • Next release will have Oracle interface, with query optimization to support exploratory operations • http://davis.wpi.edu/~xmdv • Papers in Vis ‘94, ‘95, ‘99, Infovis ‘99, IEEE TVCG Vol. 6, No. 2, 2000 UC Berkeley, 09/19/00
Thanks to….. • Elke Rundensteiner • Ying-Huey Fua • Daniel Stroe • Yang Jing • Suggestions from Xmdv users • NSF UC Berkeley, 09/19/00