An Introduction to Multivariate Data Visualization and XmdvTool

  1. An Introduction to Multivariate Data Visualization and XmdvTool Matthew O. Ward Computer Science Department Worcester Polytechnic Institute This work was supported under NSF Grant IIS-9732897 UC Berkeley, 09/19/00

  2. What is Multivariate Data? • Each data point has N variables or observations • Each observation can be: • nominal or ordinal • discrete or continuous • scalar, vector, or tensor • May or may not have spatial, temporal, or other connectivity attributes

  3. Characteristics of a Variable • Order: grades have an order, brand names do not. • Distance metric: for income, distance equals difference. For rankings, difference is not a distance metric. • Absolute zero: temperature has an absolute zero, bank account balances do not. • A variable can be classified by these three attributes, called Scale. • Effective visualizations attempt to match the scale of the data dimension with the graphical attribute conveying it.
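
A minimal sketch of how the three attributes on this slide could determine a variable's scale. The names `Variable` and `scale_of` are hypothetical, and the four labels follow the usual nominal/ordinal/interval/ratio terminology, which the slide itself does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical description of one data dimension."""
    name: str
    has_order: bool
    has_distance: bool
    has_absolute_zero: bool


def scale_of(v: Variable) -> str:
    """Classify a variable from the three attributes, checked in order."""
    if not v.has_order:
        return "nominal"    # e.g. brand names
    if not v.has_distance:
        return "ordinal"    # e.g. rankings
    if not v.has_absolute_zero:
        return "interval"   # e.g. bank account balances
    return "ratio"          # ordered, with distance and an absolute zero


print(scale_of(Variable("brand", False, False, False)))
print(scale_of(Variable("ranking", True, False, False)))
```

A visualization tool could use such a label to pick a graphical attribute, for example reserving position or length for variables that carry a distance metric.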

  4. Sources of Multivariate Data • Sensors (e.g., images, gauges) • Simulations • Census or other surveys • Commerce (e.g., stock market) • Communication systems • Spreadsheets and databases

  5. Issues in Visualizing Multivariate Data • How many variables? • How many records? • Types of variables? • User task (exploration, confirmation, presentation) • Data feature of interest (clusters, anomalies, trends, patterns, ...) • Background of user (domain expert, visualization specialist, decision-maker, ...)

  6. Methods for Visualizing Multivariate Data • Dimensional Subsetting • Dimensional Reorganization • Dimensional Embedding • Dimensional Reduction

  7. Dimensional Subsetting • Scatterplot matrix displays all pairwise plots • Selection allows linkage between views • Clusters, trends, and correlations readily discerned between pairs of dimensions

  8. Dimensional Reorganization • Parallel Coordinates creates parallel, rather than orthogonal, dimensions. • Each data point corresponds to a polyline across the axes • Clusters, trends, and anomalies discernible as groupings or outliers, based on intercepts and slopes
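
The mapping from a data point to a polyline can be sketched as follows. This is an illustration only, assuming each axis is scaled linearly between a per-dimension minimum and maximum; the function name `polyline` and the unit-spaced axes are assumptions, not XmdvTool's actual code.

```python
def polyline(record, mins, maxs):
    """Return one (axis_index, height) vertex per dimension.

    The axis index gives the horizontal position of the parallel axis and
    height is the value rescaled to [0, 1] along that axis.
    """
    vertices = []
    for axis, (value, lo, hi) in enumerate(zip(record, mins, maxs)):
        # A constant dimension has no extent, so place it mid-axis.
        height = 0.5 if hi == lo else (value - lo) / (hi - lo)
        vertices.append((axis, height))
    return vertices


print(polyline((5, 10), (0, 0), (10, 10)))  # [(0, 0.5), (1, 1.0)]
```

Drawing line segments between consecutive vertices gives the polyline; records with similar values then appear as bundles of nearly coincident lines.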

  9. Dimensional Reorganization (2) • Glyphs map data dimensions to graphical attributes • Size, color, shape, and orientation are commonly used • Similarities/differences in features give insights into relations

  10. Dimensional Embedding • Dimensional stacking divides data space into bins • Each N-D bin has a unique 2-D screen bin • Screen space recursively divided based on bin count for each dimension • Clusters and trends manifested as repeated patterns
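
One way to obtain a unique 2-D screen bin for every N-D bin is a mixed-radix encoding, sketched below. It assumes the dimensions are listed from outermost to innermost and alternate between the horizontal and vertical screen axes; that ordering and the name `stack_bins` are assumptions made for illustration.

```python
def stack_bins(bin_index, bin_counts):
    """Map an N-D bin index to a (column, row) screen cell.

    Dimensions at even positions subdivide the horizontal axis and those
    at odd positions subdivide the vertical axis, outermost first.
    """
    col = row = 0
    for pos, (b, count) in enumerate(zip(bin_index, bin_counts)):
        if not 0 <= b < count:
            raise ValueError(f"bin {b} out of range for dimension {pos}")
        if pos % 2 == 0:
            col = col * count + b   # recursive split of screen columns
        else:
            row = row * count + b   # recursive split of screen rows
    return col, row


print(stack_bins((1, 2, 3, 4), (2, 3, 4, 5)))  # (7, 14)
```

Because each axis is a positional number system over its own dimensions, two different N-D bins can never share a screen cell.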

  11. Dimensional Reduction • Map N-D locations to M-D display space while best preserving N-D relations • Approaches include MDS, PCA, and Kohonen Self-Organizing Maps • Relationships conveyed by position, links, color, shape, size, etc.

  12. The Role of Selection • User needs to interact with display, examine interesting patterns or anomalies, validate hypotheses • Selection allows isolation of a subset of data for highlighting, deleting, focused analysis • Direct (clicking on displayed items) vs. indirect (range sliders) • Screen space (2-D) vs. data space (N-D)
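
A data-space (N-D) selection driven by range sliders can be sketched as a simple filter. The function name `brush` and the dictionary of per-dimension ranges are hypothetical; dimensions without a slider setting are left unconstrained.

```python
def brush(records, ranges):
    """Return indices of records that fall inside every given range.

    ranges maps a dimension index to an inclusive (low, high) pair.
    """
    return [
        i for i, record in enumerate(records)
        if all(lo <= record[d] <= hi for d, (lo, hi) in ranges.items())
    ]


records = [(1, 10), (5, 20), (9, 30)]
print(brush(records, {0: (0, 6), 1: (15, 35)}))  # [1]
```

The returned indices can then drive highlighting or deletion in any linked view.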

  13. Demonstration of XmdvTool

  14. Problems with Large Data Sets • Most techniques are effective with small to moderate-sized data sets • Large sets (> 50K records) are increasingly common • When traditional visualizations are used, occlusion and clutter make interpretation difficult

  15. One Potential Solution • Multiresolution displays with aggregation • Explicit clustering: break dimensions into bins; aggregate in a particular order (datacubes) • Implicit clustering: hierarchical clustering (proximity-based merging); hierarchical partitioning (proximity-based splits) • Problem: many ways to cluster, each revealing different aspects of data
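
Proximity-based merging can be illustrated with a small agglomerative sketch. It assumes Euclidean distance between cluster centroids and a non-empty input; the slides do not say which distance or linkage XmdvTool uses, so treat `agglomerate` as a hypothetical stand-in.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def agglomerate(points):
    """Build a binary cluster tree by repeatedly merging the closest pair.

    Leaves are record indices; internal nodes are (left, right) tuples.
    """
    clusters = [(i, [p]) for i, p in enumerate(points)]  # (subtree, members)
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a][1]), centroid(clusters[b][1]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]


print(agglomerate([(0, 0), (0, 1), (10, 10), (10, 11)]))  # ((0, 1), (2, 3))
```

This exhaustive pairwise search is cubic in the number of records, so it only suits small examples; a different linkage rule would generally produce a different tree, which is the problem the last bullet points out.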

  16. Display Options • For each cluster, show: center; extents for each dimension; population; other descriptors (e.g., quartiles) • Color clusters so that siblings have colors similar to their parent's
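
The per-cluster descriptors listed above can be computed directly from a cluster's member records. The sketch below assumes the center is the arithmetic mean and the extents are per-dimension minimum and maximum; the name `summarize` and the dictionary layout are illustrative.

```python
def summarize(members):
    """Return center, per-dimension extents, and population of a cluster."""
    n = len(members)
    dims = range(len(members[0]))
    return {
        "center": tuple(sum(m[d] for m in members) / n for d in dims),
        "extents": tuple(
            (min(m[d] for m in members), max(m[d] for m in members))
            for d in dims
        ),
        "population": n,
    }


print(summarize([(0, 2), (4, 6)]))
# {'center': (2.0, 4.0), 'extents': ((0, 4), (2, 6)), 'population': 2}
```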

  17. Hierarchical Parallel Coordinates • Bands show cluster extents in each dimension • Opacity conveys cluster population • Color similarity indicates proximity in hierarchy

  18. Hierarchical Scatterplots • Clusters displayed as rectangles, showing extents in 2 dimensions • Color/opacity consistently used for relational and population info

  19. Hierarchical Glyphs • Star glyph with bands • Analogous to parallel coordinates, with radial rather than parallel dimensions • Glyph position critical for conveying relational info

  20. Hierarchical Dimensional Stacking • Clusters occupy multiple bins • Overlaps can be reduced by increasing number of bins • Cell colors can be blended, or a cell can display the last cluster mapped to it

  21. Hierarchical Star Fields • Dimensional reduction techniques commonly displayed with starfields • Each cluster becomes circle/sphere in field • Alternatively, can show glyph at cluster location

  22. Navigating Hierarchies • Drill-down, roll-up operations for more or less detail • Need selection operation to identify subtrees for exploration, pruning • Need indications of where you are in hierarchy, and where you’ve been during exploration process

  23. Structure-Based Brushing • Enhancement to screen-based and data-based methods • Specify focus, extents, and level of detail • Intuitive - wedge of tree and depth of interest • Implemented by labeling/numbering terminals and propagating ranges to parents
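
The last bullet can be sketched as follows: terminals are numbered left to right, each parent inherits the range spanned by its children, and a brush is a wedge of terminal numbers plus a depth. The tree encoding (nested non-empty tuples), the function names, and the exact selection rule are assumptions for illustration, not XmdvTool's implementation.

```python
def label_ranges(tree):
    """Return (depth, lo, hi, is_leaf) for every node, in preorder.

    Internal nodes are non-empty tuples of children; anything else is a
    terminal. lo and hi are the first and last terminal numbers below a node.
    """
    nodes = []
    counter = 0

    def visit(node, depth):
        nonlocal counter
        slot = len(nodes)
        nodes.append(None)  # reserve this node's preorder position
        if isinstance(node, tuple):
            child_ranges = [visit(child, depth + 1) for child in node]
            lo, hi = child_ranges[0][0], child_ranges[-1][1]
            leaf = False
        else:
            lo = hi = counter  # number terminals left to right
            counter += 1
            leaf = True
        nodes[slot] = (depth, lo, hi, leaf)
        return lo, hi

    visit(tree, 0)
    return nodes


def structure_brush(nodes, lo, hi, level):
    """Select nodes inside the wedge [lo, hi] at the requested depth.

    Terminals shallower than the requested depth are kept as well, so
    every branch inside the wedge is represented.
    """
    return [
        n for n in nodes
        if lo <= n[1] and n[2] <= hi
        and (n[0] == level or (n[3] and n[0] < level))
    ]


tree = (("a", "b"), ("c", ("d", "e")))
nodes = label_ranges(tree)
print(structure_brush(nodes, 0, 4, 1))  # [(1, 0, 1, False), (1, 2, 4, False)]
```

With ranges stored on every node, testing whether a subtree lies inside the wedge is two comparisons, which is what makes interactive adjustment of the brush cheap.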

  24. Structure-Based Brush • White contour links terminal nodes • Red wedge is extents selection • Color curve is depth specification • Color bar maps location in tree to unique color • Direct and indirect manipulation of brush

  25. Demonstration of Hierarchical Features in XmdvTool

  26. Auxiliary Tools • Extent scaling to reduce occlusion of bands • Dimensional zooming - fill display with selected subspace (N-D distortion) • Dynamic masking to fade out selected or unselected data • Saving selected subsets • Enabling/disabling dimensions • Univariate displays (Tukey box plots, tree maps)

  27. Summary • Many ways to map multivariate data to images, each with strengths and weaknesses • Linking between and within displays with brushing enhances static displays and combines their strengths • Hierarchies and aggregations allow visualization of large data sets • Intuitive navigation, filtering, and focus critical to exploration process • Each basic multivariate visualization method is readily extensible to displaying cluster information

  28. Problems and Future Work • Many data characteristics not currently supported (e.g., text fields, records with missing entries, data quality) • Navigation tool for hierarchies assumes linear order of nodes: looking at tools for dynamic reorganization; exploring 2-D or higher navigation interfaces • In very large hierarchies, it is difficult to focus on a narrow subset: developing multiresolution interface; investigating distortion techniques for navigation

  29. Other Future Work • Projection pursuit or view recommender • Linking structure and data brushing • User studies • Customization based on domain • Query optimization via caching and prefetching

  30. For More Information • XmdvTool is available in the public domain • Both Unix and Windows support • Next release will have Oracle interface, with query optimization to support exploratory operations • http://davis.wpi.edu/~xmdv • Papers in Vis '94, '95, '99, Infovis '99, IEEE TVCG Vol. 6, No. 2, 2000

  31. Thanks to... • Elke Rundensteiner • Ying-Huey Fua • Daniel Stroe • Yang Jing • Suggestions from Xmdv users • NSF
