Visualization Techniques for Multivariate Discrete and Continuous Data. March 4, 2005 Rachael Brady. Multivariate Data Types. In general, each point has many attributes and/or measurements Type 1: measurements are continuous in nature, and combining dimensions might make sense
Visualization Techniques for Multivariate Discrete and Continuous Data March 4, 2005 Rachael Brady
Multivariate Data Types In general, each point has many attributes and/or measurements • Type 1: measurements are continuous in nature, and combining dimensions might make sense • Weather data - for each x, y, z location we have water density (scalar), temperature (scalar), wind velocity (vector), air pressure (scalar) • Type 2: data is discrete, more like an attribute list, and cannot in general be combined • Baseball statistics - for each player we have at bats, walks, hits, doubles, home runs, RBIs • Populations - eye color of residents in NC, income level, voting record
Approaches • Dimensional Reduction • Principal Component Analysis • Independent Component Analysis • Kohonen Self-Organizing Map http://davis.wpi.edu/~matt/courses/soms/ • Dimensional Subsetting • Dimensional Organization • Dimensional Embedding Source: Matt Ward, Multivariate Vis talk Sept 2000
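As an illustration of dimensional reduction, here is a minimal sketch of finding the first principal component of a small data set. The slides do not specify a method; power iteration on the covariance matrix is swapped in here because it needs no external library. The function name and the sample data are hypothetical, and the sketch assumes the starting vector is not orthogonal to the leading direction.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, 1-D projections) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    # Center each dimension on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and renormalise.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Project every centered point onto the leading direction.
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections

# Hypothetical 3-D measurements that vary mostly along the (1, 2, 0) direction.
data = [[1.0, 2.0, 0.1], [2.0, 4.1, 0.0], [3.0, 5.9, 0.1], [4.0, 8.0, 0.0]]
direction, coords = first_principal_component(data)
```

Each 3-D point is reduced to a single coordinate in `coords`, which could then be plotted on one axis.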
Dimensional Subsetting - Scatter Plots • Invoke the concept of small multiples • Show all pair-wise dimensions in a matrix • Easily see clusters, trends and correlations • Problem: How do you see a trend that requires 2 or more dependent variables? Source: Matt Ward, Multivariate Vis talk Sept 2000
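A minimal sketch of how the pairwise panels of a scatter plot matrix might be enumerated. The function name and the lower-triangle layout are assumptions made for illustration, and the dimension names are borrowed from the baseball example.

```python
from itertools import combinations

def scatterplot_matrix_panels(dimension_names):
    """Map each unordered pair of dimensions to its (row, col) cell below the diagonal."""
    index = {name: i for i, name in enumerate(dimension_names)}
    panels = {}
    for x_name, y_name in combinations(dimension_names, 2):
        # Column = x dimension, row = y dimension, so the panel sits below the diagonal.
        panels[(x_name, y_name)] = (index[y_name], index[x_name])
    return panels

panels = scatterplot_matrix_panels(["at_bats", "walks", "hits", "home_runs"])
print(len(panels))  # 4 dimensions give 4 * 3 / 2 = 6 distinct pairwise panels
```

Every panel involves exactly two dimensions, which is why a trend that depends on three or more variables at once is hard to see in this layout.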
Dimensional Organization • Show each variable with an explicit visual representation • Spatial • Shape • Color • Size • Orientation • Texture The combination of these visual variables can produce information that “pops out”, but it is not additive Images: Chris Healey
Dimensional Organization - Glyphs (show star glyph demo) Image: Matt Ward, Multivariate Vis talk Sept 2000
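A star glyph draws one ray per variable from a common center, with ray length proportional to the value. The sketch below computes the ray endpoints for one data point. The function name, the evenly spaced angles, and the per-dimension min/max normalisation are assumptions for illustration, not details taken from the demo.

```python
import math

def star_glyph_vertices(values, minimums, maximums, radius=1.0):
    """Return (x, y) endpoints of the rays of one star glyph centred on the origin."""
    n = len(values)
    vertices = []
    for k, (v, lo, hi) in enumerate(zip(values, minimums, maximums)):
        # Normalise to [0, 1]; a constant dimension gets a mid-length ray.
        t = 0.5 if hi == lo else (v - lo) / (hi - lo)
        # Spread the n rays evenly around the circle.
        angle = 2.0 * math.pi * k / n
        vertices.append((radius * t * math.cos(angle), radius * t * math.sin(angle)))
    return vertices

# Hypothetical data point with four variables, each ranging over [0, 10].
glyph = star_glyph_vertices([10, 5, 0, 5], [0] * 4, [10] * 4)
```

Connecting the endpoints in order gives the outline of the glyph, so similar data points produce similar shapes.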
Dimensional Organization - Parallel Coords • Parallel coordinates lay out the dimensions as parallel, rather than orthogonal, axes • Each data point corresponds to a polyline across the axes • Clusters, trends, and anomalies are discernible as groupings or outliers, based on intercepts and slopes Show Parallel Coords Demo Source: Matt Ward, Multivariate Vis talk Sept 2000
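A minimal sketch of the data-point-to-polyline mapping. The function name and the min/max scaling of each axis to [0, 1] are assumptions for illustration; the sample rows are made up.

```python
def parallel_coordinates_polylines(rows):
    """Turn each data row into a polyline of (axis_index, normalised_value) vertices."""
    d = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(d)]
    highs = [max(r[j] for r in rows) for j in range(d)]
    polylines = []
    for r in rows:
        line = []
        for j in range(d):
            span = highs[j] - lows[j]
            # A constant dimension is drawn at mid-height on its axis.
            y = 0.5 if span == 0 else (r[j] - lows[j]) / span
            line.append((j, y))
        polylines.append(line)
    return polylines

rows = [[10.0, 200.0, 3.0], [20.0, 100.0, 3.0], [15.0, 150.0, 3.0]]
lines = parallel_coordinates_polylines(rows)
```

In this sample the first two polylines cross between axis 0 and axis 1, which is how a negative relationship between adjacent dimensions shows up.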
Parallel Coords - Useful? Source: http://www.ccs.neu.edu/home/mattsp/
Parallel Coords - Extended Visualizing hierarchical clusters, Fua et al. 1999
Approaches • Dimensional Reduction • Principal Component Analysis • Independent Component Analysis • Kohonen Self-Organizing Map http://davis.wpi.edu/~matt/courses/soms/ • Dimensional Subsetting • Dimensional Organization • Dimensional Embedding Source: Matt Ward, Multivariate Vis talk Sept 2000
Dimensional Embedding • Dimensional stacking divides data space into bins • Each N-D bin has a unique 2-D screen bin • Screen space recursively divided based on bin count for each dimension • Clusters and trends manifested as repeated patterns Source: Matt Ward, Multivariate Vis talk Sept 2000
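The recursive subdivision can be sketched as a mixed-radix index: each dimension assigned to the horizontal (or vertical) direction subdivides every cell of the dimension above it. The function name, the assignment of dimensions to `x_dims` and `y_dims`, and the bin counts are assumptions chosen for illustration.

```python
def stacked_screen_bin(bin_index, bin_counts, x_dims, y_dims):
    """Map an N-D bin index to a unique (col, row) screen bin.

    x_dims and y_dims list dimension indices from outermost to innermost.
    """
    def mixed_radix(dims):
        position = 0
        for dim in dims:
            # Each level subdivides every cell of the level above it.
            position = position * bin_counts[dim] + bin_index[dim]
        return position
    return mixed_radix(x_dims), mixed_radix(y_dims)

# Hypothetical 4-D data space with 3, 2, 4 and 2 bins per dimension.
bin_counts = [3, 2, 4, 2]
col, row = stacked_screen_bin([2, 1, 3, 0], bin_counts, x_dims=[0, 2], y_dims=[1, 3])
```

With these counts the screen grid is 12 columns by 4 rows, one cell for each of the 48 N-D bins.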
Dimensional Embedding - not so easy Producing a good plot is hard • Which dimensions do you choose at which level of the hierarchy? • How do you keep coordinates consistent? • How do you lay out tiles on the page consistently? • Can we do this automatically? Trellis - an attempt by Rick Becker and Bill Cleveland, incorporated into the S/S-PLUS statistical package
Effective use of space Which graph is better? Government payrolls in 1931 [How to Lie with Statistics, Huff 93] Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005
Aspect Ratio - fill space with data Don’t worry about showing zero Yearly CO2 concentrations [Cleveland 85] Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005
Banking to 45 Degrees http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005
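Banking to 45 degrees means choosing the plot's aspect ratio so that line segments are drawn at roughly 45 degrees. The sketch below uses the median-absolute-slope heuristic, which is one of several banking variants and is an assumption here, since the slide does not say which one is meant. The function name is hypothetical, and the sketch assumes the data has a nonzero range and a nonzero median slope.

```python
from statistics import median

def bank_to_45(xs, ys):
    """Return the height/width aspect ratio giving a median absolute on-screen slope of 1."""
    points = list(zip(xs, ys))
    # Absolute slope of each consecutive segment, in data units.
    slopes = [abs((y1 - y0) / (x1 - x0))
              for (x0, y0), (x1, y1) in zip(points, points[1:])
              if x1 != x0]
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # On-screen slope = data slope * (height / width) * (x_range / y_range);
    # setting the median on-screen slope to 1 and solving gives height / width.
    return y_range / (x_range * median(slopes))

# Hypothetical zigzag series: every segment has a data slope of +/-10.
aspect = bank_to_45([0, 1, 2, 3, 4], [0, 10, 0, 10, 0])
print(aspect)  # 0.25: the plot should be four times as wide as it is tall
```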
Clearly mark scale breaks Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005
Scale break vs. Log scale • Both increase visual resolution • Log scale allows easy comparison of all data • Scale break makes comparison across the break more difficult Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005
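A short sketch of why a log scale keeps all values comparable: equal ratios map to equal distances along the axis. The function name and sample values are hypothetical, and the sketch assumes strictly positive values that are not all equal.

```python
import math

def log_positions(values, axis_length=1.0):
    """Place positive values along an axis of the given length using a log10 scale."""
    logs = [math.log10(v) for v in values]
    low, high = min(logs), max(logs)
    # Linear interpolation in log space.
    return [axis_length * (x - low) / (high - low) for x in logs]

positions = log_positions([1, 10, 100, 1000])
```

Each factor of ten occupies one third of the axis here, whereas on a linear axis the three smallest values would be squeezed into the first tenth.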
Transforming Data for Graphing • How well does the curve fit the data? • Plot vertical distance from best fit curve • Residual graph shows accuracy of fit Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005
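A minimal sketch of computing the residuals that such a graph would plot. It assumes the best fit curve is a straight line fitted by ordinary least squares, which the slide does not state; the function name and the data are made up for illustration.

```python
def linear_fit_residuals(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b, residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    # Residual = vertical distance from each point to the fitted line.
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return a, b, residuals

a, b, residuals = linear_fit_residuals([0, 1, 2, 3], [1.0, 2.9, 5.2, 6.9])
```

Plotting `residuals` against `xs` uses the full vertical range for the deviations, so departures from the fit that are invisible on the original graph become easy to see.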
A Trellis Example Lead Concentration vs. Setback Distance Given Day-of-the-Week, Week, and Height On the next slide is a trellis display of lead concentration against setback distance given day-of-the-week (thu-wed), week (1-3), and height (3 values). There are 63 panels arranged into 21 columns and 3 rows. Each row conditions on a different value of height; as we go from bottom to top, the heights increase. The panels in each row are in time order because the panels first cycle through the days of the week and then through the weeks. The display reveals much about the structure of the data. There is a strong interaction between height and setback distance. For the lowest height, lead decreases with setback. But for the middle value of height, lead typically first increases with setback and then decreases. For the highest height, lead has the increase-decrease pattern for about 1/3 of the days, most of them days with large concentrations, and is relatively stable for the remaining days. This behavior is consistent with air transport mechanisms. Lead is emitted at ground level from automobile tail pipes. The closest of the 9 monitors, the one with the lowest height and the closest setback, has the largest concentrations because it is close to the pollution source. From the source, the lead is carried laterally by the wind, spreading upward as it moves. This plume-like behavior can cause the concentrations to be relatively small at the higher monitors at the closest setback. Source: http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis/wwww.html
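The panel layout follows from the conditioning variables: 7 days times 3 weeks gives the columns, and the 3 heights give the rows. Below is a small sketch of that indexing. The function name and the height labels are hypothetical, and this is not the S/S-PLUS Trellis API.

```python
DAYS = ["thu", "fri", "sat", "sun", "mon", "tue", "wed"]
WEEKS = [1, 2, 3]
HEIGHTS = ["low", "middle", "high"]  # hypothetical labels for the 3 height values

def trellis_panel(day, week, height):
    """Return (row, col) of the panel for one combination of conditioning values.

    Row 0 is the bottom row (lowest height); columns run in time order.
    """
    row = HEIGHTS.index(height)
    # Cycle through the days first, then through the weeks.
    col = WEEKS.index(week) * len(DAYS) + DAYS.index(day)
    return row, col

n_cols = len(WEEKS) * len(DAYS)    # 21 columns
n_panels = n_cols * len(HEIGHTS)   # 63 panels
```

Each panel then holds an ordinary plot of lead concentration against setback distance for the observations matching its three conditioning values.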
A Trellis Example Source: http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis/wwww.html
Tensor Visualization High Dimensional Scientific Data Visualization Not Today
Some Interesting Web Sites • The best and worst of statistical graphs • http://www.math.yorku.ca/SCS/Gallery/ • Chris Healey’s Preattentive Vision Applet • http://www.csc.ncsu.edu/faculty/healey/PP/index.html#Preattentive • OpenDX Gallery • http://www.opendx.org/highlights.php • IVTK: An Information Visualization Toolkit • Ivtk.sourceforge.net • Information Visualization Repository • http://www.cs.umd.edu/hcil/InfovisRepository/index.shtml
Resources • Great sources for theory behind multivariate display and perception are • Bertin 1983 • Cleveland 1993 • Tufte 1983, 1990 • Colin Ware, 2000 • A couple of good papers are • Shneiderman, “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations” • Marc Green, “Toward a Perceptual Science of Multidimensional Data Visualization: Bertin and Beyond”