
Visualization Techniques for Multivariate Discrete and Continuous Data

Visualization Techniques for Multivariate Discrete and Continuous Data. March 4, 2005 Rachael Brady. Multivariate Data Types. In general, each point has many attributes and/or measurements Type 1: measurements are continuous in nature, and combining dimensions might make sense


Presentation Transcript


  1. Visualization Techniques for Multivariate Discrete and Continuous Data March 4, 2005 Rachael Brady

  2. Multivariate Data Types In general, each point has many attributes and/or measurements • Type 1: measurements are continuous in nature, and combining dimensions might make sense • Weather data - for each x, y, z location we have water density (scalar), temperature (scalar), wind velocity (vector), air pressure (scalar) • Type 2: data is discrete, more like an attribute list, and cannot in general be combined • Baseball statistics - for each player we have at bats, walks, hits, doubles, home runs, RBIs • Populations - eye color of residents in NC, income level, voting record

  3. Approaches • Dimensional Reduction • Principal Component Analysis • Independent Component Analysis • Kohonen Self Organizing Map http://davis.wpi.edu/~matt/courses/soms/ • Dimensional Subsetting • Dimensional Organization • Dimensional Embedding Source: Matt Ward, Multivariate Vis talk Sept 2000
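
As an illustration of the dimensional reduction approach, the following is a minimal sketch of finding the first principal component. It is not taken from the talk: the function names are hypothetical, and power iteration on the covariance matrix is used here only because it fits in a few lines of standard-library Python.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return a unit vector along the direction of greatest variance.

    rows: list of equal-length numeric tuples (one tuple per data point).
    Hypothetical helper; power iteration is an assumption of this sketch.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: multiply by the covariance matrix and renormalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # the data has no variance along the current direction
        v = [x / norm for x in w]
    return v

def project(rows, direction):
    """Reduce each centered point to one coordinate along the direction."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) * direction[j] for j in range(d)) for r in rows]
```

Projecting onto the returned direction gives a one-dimensional view of the data; a full PCA would repeat the process for further orthogonal directions.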

  4. Dimensional Subsetting - Scatter Plots • Invoke the concept of small multiples • Show all pair-wise dimensions in a matrix • Easily see clusters, trends and correlations • Problem: How do you see a trend that requires 2 or more dependent variables? Source: Matt Ward, Multivariate Vis talk Sept 2000
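
A scatter plot matrix has one panel per pair of dimensions. The sketch below enumerates those panels and attaches a Pearson correlation to each as a rough numeric stand-in for the trend a viewer would look for in that panel. The function names and the dictionary layout are assumptions made for illustration, not part of the talk.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no linear trend to report
    return sxy / math.sqrt(sxx * syy)

def pairwise_panels(table):
    """table maps a dimension name to its column of values.

    Returns {(name_a, name_b): correlation} with one entry per panel
    in the upper triangle of the scatter plot matrix.
    """
    return {(a, b): pearson(table[a], table[b])
            for a, b in combinations(table, 2)}
```

With d dimensions there are d(d-1)/2 distinct panels, and each one still shows only two variables at a time, which is the limitation the slide points out.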

  5. Dimensional Organization • Show each variable with an explicit visual representation • Spatial • Shape • Color • Size • Orientation • Texture The combination of these visual variables can produce information that “pops out”, but it is not additive. Images: Chris Healey

  6. Dimensional Organization - Glyphs (show star glyph demo) Image: Matt Ward, Multivariate Vis talk Sept 2000
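
The demo itself is not part of the transcript. As a rough sketch of how a star glyph can be constructed, assuming one evenly spaced spoke per variable and a spoke length proportional to the value's position within that variable's range (the function name and parameters are hypothetical):

```python
import math

def star_glyph_vertices(values, lows, highs, radius=1.0):
    """Return the (x, y) tip of each spoke for one data point.

    values: the point's value in each dimension.
    lows, highs: the per-dimension range used to normalize spoke lengths.
    """
    k = len(values)
    vertices = []
    for i, (v, lo, hi) in enumerate(zip(values, lows, highs)):
        # Normalize to [0, 1]; a degenerate range gets a zero-length spoke.
        t = 0.0 if hi == lo else (v - lo) / (hi - lo)
        angle = 2 * math.pi * i / k
        vertices.append((radius * t * math.cos(angle),
                         radius * t * math.sin(angle)))
    return vertices
```

Connecting the returned vertices in order gives the glyph outline, so points with similar attribute profiles produce similar shapes.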

  7. Dimensional Organization - Parallel Coords • Parallel Coordinates creates parallel, rather than orthogonal, dimensions. • Each data point corresponds to a polyline across the axes • Clusters, trends, and anomalies are discernible as groupings or outliers, based on intercepts and slopes Show Parallel Coords Demo Source: Matt Ward, Multivariate Vis talk Sept 2000
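
The mapping from a data point to its polyline can be sketched as follows. This is a minimal illustration, assuming each axis is scaled independently to the range [0, 1] from the data's own minimum and maximum; the function name is hypothetical.

```python
def parallel_coordinates_polylines(rows):
    """Turn each data point into a polyline of (axis_index, height) vertices.

    rows: list of equal-length numeric tuples.
    Axis j is drawn at horizontal position j; height is the value
    normalized to [0, 1] within that dimension.
    """
    d = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(d)]
    highs = [max(r[j] for r in rows) for j in range(d)]
    polylines = []
    for r in rows:
        line = []
        for j in range(d):
            span = highs[j] - lows[j]
            # A constant dimension is drawn at mid-height.
            height = 0.5 if span == 0 else (r[j] - lows[j]) / span
            line.append((j, height))
        polylines.append(line)
    return polylines
```

Points that are close in the data produce polylines that run close together, which is why clusters appear as bundles and outliers as stray lines.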

  8. Parallel Coords - Useful? Source: http://www.ccs.neu.edu/home/mattsp/

  9. Parallel Coords - Useful?

  10. Parallel Coords - Extended Visualizing hierarchical clusters, Fua et al. 1999

  11. Approaches • Dimensional Reduction • Principal Component Analysis • Independent Component Analysis • Kohonen Self Organizing Map http://davis.wpi.edu/~matt/courses/soms/ • Dimensional Subsetting • Dimensional Organization • Dimensional Embedding Source: Matt Ward, Multivariate Vis talk Sept 2000

  12. Dimensional Embedding • Dimensional stacking divides data space into bins • Each N-D bin has a unique 2-D screen bin • Screen space recursively divided based on bin count for each dimension • Clusters and trends manifested as repeated patterns Source: Matt Ward, Multivariate Vis talk Sept 2000
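
The bin-to-screen mapping described above can be sketched as follows. The assignment of dimensions to screen axes is an assumption of this sketch (dimensions at even positions nest along x, those at odd positions along y, with earlier dimensions forming the outer grid), and the function names are hypothetical.

```python
from itertools import product

def stacked_screen_bin(bin_index, bin_counts):
    """Map an N-D bin index to a unique 2-D screen bin (x, y).

    bin_index: the bin number in each dimension.
    bin_counts: how many bins each dimension is divided into.
    """
    x = y = 0
    for position, (b, count) in enumerate(zip(bin_index, bin_counts)):
        if not 0 <= b < count:
            raise ValueError("bin index out of range")
        if position % 2 == 0:
            x = x * count + b  # subdivide the current x extent
        else:
            y = y * count + b  # subdivide the current y extent
    return x, y

def all_screen_bins(bin_counts):
    """Return {N-D bin index: screen bin} for every bin in the data space."""
    ranges = [range(c) for c in bin_counts]
    return {idx: stacked_screen_bin(idx, bin_counts) for idx in product(*ranges)}
```

Because each step subdivides the cell chosen by the previous dimensions, the inner dimensions repeat inside every outer cell, which is where the repeated patterns come from.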

  13. Dimensional Embedding - not so easy Producing a good plot is hard • What dimensions do you choose at what hierarchy? • How do you keep coordinates consistent? • How do you lay out tiles on the page with consistency? • Can we do this automatically? Trellis - an attempt by Rick Becker and Bill Cleveland, incorporated into the S/S-PLUS statistical package

  14. A Digression into Plot design…

  15. Effective use of space Which graph is better? Government payrolls in 1931 [How to Lie with Statistics, Huff 93] Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005

  16. Aspect Ratio - fill space with data Don’t worry about showing zero Yearly CO2 concentrations [Cleveland 85] Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005

  17. Banking to 45 Degrees http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005
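
The slide names the technique without giving a formula. One way to compute a banked aspect ratio is the median-absolute-slope rule: choose the plot's height-to-width ratio so that the median line segment is drawn at 45 degrees. The sketch below assumes that rule and a hypothetical function name; it is not necessarily the exact method shown in the lecture.

```python
from statistics import median

def bank_to_45_aspect(xs, ys):
    """Return the height/width ratio that banks the median segment to 45 degrees.

    A segment with data slope s is drawn with physical slope
    s * (range_x / range_y) * (height / width); setting the median
    absolute physical slope to 1 and solving for height / width gives
    range_y / (range_x * median(|s|)).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points")
    range_x = max(xs) - min(xs)
    range_y = max(ys) - min(ys)
    # Slopes of consecutive segments, skipping vertical ones.
    slopes = [abs((y1 - y0) / (x1 - x0))
              for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])
              if x1 != x0]
    if not slopes or median(slopes) == 0:
        raise ValueError("cannot bank a curve with no sloped segments")
    return range_y / (range_x * median(slopes))
```

For a single straight line that fills the plot the result is 1.0 (a square plot), while a rapidly oscillating series yields a ratio well below 1, that is, a wide and short plot.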

  18. Clearly mark scale breaks Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005

  19. Scale break vs. Log scale Both increase visual resolution. Log scale allows easy comparisons of all data. Scale break is more difficult to compare across the break. Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005
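
To see why a log scale increases visual resolution, the sketch below computes where values fall along an axis, as a fraction of its length, on a linear and on a base-10 log scale. The function name and the example values in the usage note are illustrative assumptions.

```python
import math

def axis_fractions(values, log_scale=False):
    """Return each value's position along the axis as a fraction in [0, 1]."""
    if log_scale:
        if min(values) <= 0:
            raise ValueError("a log scale needs strictly positive values")
        values = [math.log10(v) for v in values]
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical")
    return [(v - low) / (high - low) for v in values]
```

For the values 1, 10, 100, 1000, a linear axis places the first three within the bottom tenth of the axis, while the log axis spaces all four evenly, so equal ratios become equal distances.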

  20. Transforming Data for Graphing How well does the curve fit the data? Plot vertical distance from best fit curve. Residual graph shows accuracy of fit. Slide Source: Maneesh Agrawala, Lecture Notes Fall 2005
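
A residual graph plots, for each point, the vertical distance between the observed value and the fitted curve. The sketch below assumes the simplest case, a least-squares straight line; the slide does not say which curve was fitted, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def residuals(xs, ys):
    """Vertical distance of each point from the fitted line (observed - fitted)."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]
```

Plotting the residuals against x on their own axis uses the full vertical range for the deviations, so structure that is invisible next to the fitted curve becomes easy to see.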

  21. A Trellis Example Lead Concentration vs. Setback Distance Given Day-of-the-Week, Week, and Height On the next slide is a trellis display of lead concentration against setback distance given day-of-the-week (thu-wed), week (1-3), and height (3 values). There are 63 panels arranged into 21 columns and 3 rows. Each row conditions on a different value of height; as we go from bottom to top, the heights increase. The panels in each row are in time order because the panels first cycle through the days of the week and then through the weeks. The display reveals much about the structure of the data. There is a strong interaction between height and setback distance. For the lowest height, lead decreases with setback. But for the middle value of height, lead typically first increases with setback and then decreases. For the highest height, lead occasionally has the increase-decrease pattern for about 1/3 of the days, most of them days with large concentrations, and is relatively stable for the remaining days. This behavior is consistent with air transport mechanisms. Lead is emitted at ground level from automobile tail pipes. The closest of the 9 monitors, the one with the lowest height and the closest setback, has the largest concentrations because it is close to the pollution source. From the source, the lead is carried laterally by the wind, spreading upward as it moves. This plume-like behavior can cause the concentrations to be relatively small at the higher monitors at the closest setback. Source: http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis/wwww.html

  22. A Trellis Example Source: http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis/wwww.html

  23. Tensor Visualization: High Dimensional Scientific Data Visualization - Not Today

  24. Some Interesting Web Sites • The best and worst of statistical graphs • http://www.math.yorku.ca/SCS/Gallery/ • Chris Healey’s Preattentive Vision Applet • http://www.csc.ncsu.edu/faculty/healey/PP/index.html#Preattentive • OpenDX Gallery • http://www.opendx.org/highlights.php • IVTK: An Information Visualization Toolkit • Ivtk.sourceforge.net • Information Visualization Repository • http://www.cs.umd.edu/hcil/InfovisRepository/index.shtml

  25. Resources • Great sources for theory behind multivariate display and perception are • Bertin 1983 • Cleveland 1993 • Tufte 1983, 1990 • Colin Ware, 2000 • A couple of good papers are • Shneiderman, “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations” • Marc Green, “Toward a Perceptual Science of Multidimensional Data Visualization: Bertin and Beyond”
