SIMS 247 Lecture 4: Graphing Multivariate Information

Presentation Transcript


  1. SIMS 247 Lecture 4: Graphing Multivariate Information. January 29, 1998. Marti Hearst, SIMS 247

  2. Follow-up to the previous lecture
     • Docuverse:
       • length of arc is proportional to number of subdirectories
       • radius for a given arc is long enough to contain marks for all the files in the directory
     • Nightingale’s “coxcomb”:
       • keep arc length constant
       • vary radius length (proportional to sqrt(freq))
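The coxcomb rule above can be sketched in a few lines. This is only an illustration: it assumes that "arc length constant" means every wedge spans the same angle, and the function names and the counts are invented for the example. Because the area of a wedge grows with the square of its radius, taking the radius proportional to sqrt(freq) makes the wedge area proportional to the frequency.

```python
import math

def coxcomb_radii(freqs, scale=1.0):
    """Radius per wedge, proportional to sqrt(frequency)."""
    return [scale * math.sqrt(f) for f in freqs]

def wedge_areas(radii):
    """Area of each wedge when all wedges share the same angle."""
    angle = 2 * math.pi / len(radii)          # constant angle per wedge
    return [0.5 * angle * r * r for r in radii]

# Hypothetical counts, invented for illustration only.
counts = [4, 9, 16, 25]
radii = coxcomb_radii(counts)
areas = wedge_areas(radii)

print(radii)                 # [2.0, 3.0, 4.0, 5.0]
print(areas[1] / areas[0])   # ratio of areas matches the ratio of counts, 9/4
```

Plotting the raw frequency as the radius instead would exaggerate large values, since the eye tends to compare areas.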

  3. Today: Multivariate Information
     • We see a 3D world
     • How do we handle more than 3 variables?
       • multi-functioning elements
       • Tufte examples
       • cinematography example
       • multiple views

  4. Example Data Sets
     How do we handle 9 variables?
     • Our web access dataset
     • Factors involved in alcoholism: ALCOHOL, USE, AVAILABILITY, CONCERN ABOUT USE, COPING MECHANISMS, PERSONALITY MEASURES, EXTROVERSION, DISINHIBITION, OTHER, GENDER, GPA

  5. Graphing Multivariate Information
     How do we handle cases with more than three variables?
     • Scatterplot matrices
     • Parallel coordinates
     • Multiple views
     • Overlay space and time
     • Interaction/animation across time

  6. Multiple Variables: Scatterplot Matrices (from Wegman et al.)

  7. Multiple Variables: Scatterplot Matrices (from Schall 95)
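As a rough sketch of how a scatterplot matrix is organized (not the method used in either cited figure), the snippet below builds one panel for every ordered pair of distinct variables. The function name, the variable names, and the values are hypothetical; the diagonal is skipped because it usually holds a label or histogram.

```python
from itertools import product

def scatterplot_matrix_panels(columns):
    """Map (row_variable, column_variable) to the (x, y) points of that panel.

    The column variable goes on the x axis and the row variable on the y axis.
    """
    names = list(columns)
    panels = {}
    for row_var, col_var in product(names, repeat=2):
        if row_var == col_var:
            continue  # diagonal cell: no scatterplot
        panels[(row_var, col_var)] = list(zip(columns[col_var], columns[row_var]))
    return panels

# Hypothetical data, invented for illustration only.
columns = {
    "use": [1, 2, 3],
    "extroversion": [5, 3, 4],
    "gpa": [3.1, 2.8, 3.6],
}
panels = scatterplot_matrix_panels(columns)
print(len(panels))              # 6 panels for 3 variables: p * (p - 1)
print(panels[("gpa", "use")])   # x = use, y = gpa
```

The panel count grows quadratically with the number of variables, which is one reason the other techniques in this lecture exist.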

  8. Multiple Views: Star Plot (discussed in Fienberg 79; works better with animation; example taken from Behrens & Yu 95)

  9. Multiple Dimensions: Parallel Coordinates (earthquake data; color indicates longitude, y axis indicates severity of earthquake; from Schall 95)
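A minimal sketch of the parallel-coordinates mapping, assuming each variable is rescaled to [0, 1] on its own vertical axis and the axes sit at x = 0, 1, 2, and so on. The function name, field names, and records are hypothetical and are not taken from the earthquake data in the figure.

```python
def parallel_coordinates(records, variables):
    """Return one polyline per record as a list of (axis_index, scaled_value)."""
    lows = {v: min(r[v] for r in records) for v in variables}
    highs = {v: max(r[v] for r in records) for v in variables}
    lines = []
    for r in records:
        line = []
        for x, v in enumerate(variables):
            span = highs[v] - lows[v]
            # A constant variable has no spread; place it mid-axis.
            y = (r[v] - lows[v]) / span if span else 0.5
            line.append((x, y))
        lines.append(line)
    return lines

# Hypothetical records, invented for illustration only.
records = [
    {"magnitude": 4.0, "depth": 10.0, "longitude": -120.0},
    {"magnitude": 6.0, "depth": 30.0, "longitude": -118.0},
    {"magnitude": 5.0, "depth": 20.0, "longitude": -119.0},
]
lines = parallel_coordinates(records, ["magnitude", "depth", "longitude"])
print(lines[2])   # [(0, 0.5), (1, 0.5), (2, 0.5)]
```

Each record becomes one polyline, so adding a variable adds one axis rather than a whole row and column of panels.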

  10. Multiple Dimensions: Multivariate Star Plot (from Behrens & Yu 95)
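One way to compute the outline of a star plot, sketched under the assumption that each variable gets its own evenly spaced spoke and that the spoke length is the value divided by a per-variable maximum. The function name and numbers are hypothetical.

```python
import math

def star_vertices(values, max_values):
    """Return the (x, y) vertex on each spoke for one multivariate point."""
    n = len(values)
    vertices = []
    for k, (value, max_value) in enumerate(zip(values, max_values)):
        theta = 2 * math.pi * k / n   # spoke k of n, evenly spaced
        r = value / max_value         # spoke length in [0, 1]
        vertices.append((r * math.cos(theta), r * math.sin(theta)))
    return vertices

# Hypothetical point with four variables, each at half of its maximum.
vertices = star_vertices([1, 2, 3, 4], [2, 4, 6, 8])
print(vertices[0])   # (0.5, 0.0)
```

Joining the vertices in order gives the star shape; one star is drawn per data point.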

  11. Chernoff Faces
      • Assumption: people have built-in face recognizers
      • Map variables to features of a cartoon face
        • Example: eyes
          • location, separation, angle, shape, width
        • Example: entire face
          • area, shape, nose length, mouth location, smile curve
      • Originally tongue-in-cheek, but taken seriously
      • Sometimes seems to work for small numbers of points
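The variable-to-feature mapping can be sketched as a linear rescaling of each variable into the allowed range of a face feature. Everything named here is an assumption made for illustration: the feature ranges, the assignment of variables to features, and the function name are not from Chernoff's paper or from Marchette's experiment.

```python
# Hypothetical drawing ranges for three face features.
FEATURE_RANGES = {
    "eye_separation": (0.2, 0.6),
    "nose_length": (0.1, 0.4),
    "smile_curve": (-1.0, 1.0),   # -1 = frown, +1 = smile
}

def chernoff_features(point, lows, highs, assignment):
    """Rescale each assigned variable into the range of its face feature."""
    features = {}
    for variable, feature in assignment.items():
        span = highs[variable] - lows[variable]
        t = (point[variable] - lows[variable]) / span if span else 0.5
        lo, hi = FEATURE_RANGES[feature]
        features[feature] = lo + t * (hi - lo)
    return features

# Hypothetical data point and variable ranges.
lows = {"x1": 0.0, "x2": 0.0, "x3": 0.0}
highs = {"x1": 10.0, "x2": 10.0, "x3": 10.0}
assignment = {"x1": "smile_curve", "x2": "eye_separation", "x3": "nose_length"}
face = chernoff_features({"x1": 5.0, "x2": 0.0, "x3": 10.0}, lows, highs, assignment)
print(face["smile_curve"])   # 0.0, a neutral mouth for a mid-range x1
```

Which variable drives which feature is a free choice, and the questions on the next slide suggest that this choice affects how readable the faces are.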

  12. Chernoff Example (Marchette)
      • Three groups of points
        • each drawn from a different distribution with 5 variables
      • First show scatter-plot matrix
      • Then graph with Chernoff faces
        • vary faces overall
        • vary eyes
        • vary mouth and eyebrows
      • Which seems to be most effective?

  13. Chernoff Experiment (Marchette)

  14. Chernoff Experiment (Marchette)

  15. Chernoff Experiment (Marchette)

  16. Chernoff Experiment (Marchette)

  17. Overlaying Space and Time (Minard’s graph of Napoleon’s march through Russia)

  18. A Detective Story (Inselberg 97)
      • Domain: manufacture of computer chips
      • Objectives: create batches with
        • high yield (X1)
        • high quality (X2)
      • Hypothesized cause of problem:
        • 10 types of defects (X3-X12)
        • some physical properties (X13-X16)
      • Approach:
        • examine data for 473 batches
        • use interactive parallel coordinates

  19. Multidimensional Detective
      • Long term objectives:
        • high quality, high yield
      • Logical approach given the hypothesis:
        • try to eliminate defects
      • First clue:
        • what patterns can be found among batches with high yield and quality?
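The "first clue" query amounts to selecting the batches whose values fall inside a chosen interval on some axes, which is what an interactive range selection on parallel coordinates does. The sketch below assumes values scaled to [0, 1]; the function name, thresholds, and batches are invented and are not Inselberg's data.

```python
def select_batches(batches, constraints):
    """Keep batches whose value on every constrained axis lies in [low, high]."""
    return [
        b for b in batches
        if all(low <= b[axis] <= high for axis, (low, high) in constraints.items())
    ]

# Hypothetical batches: X1 = yield, X2 = quality.
batches = [
    {"id": 1, "X1": 0.90, "X2": 0.95},
    {"id": 2, "X1": 0.50, "X2": 0.90},
    {"id": 3, "X1": 0.85, "X2": 0.40},
]
good = select_batches(batches, {"X1": (0.8, 1.0), "X2": (0.8, 1.0)})
print([b["id"] for b in good])   # [1]
```

Once the good batches are selected, their polylines can be inspected on the remaining axes to see what else they have in common.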

  20. Detectives aren’t intimidated! X1 seems to be normally distributed; X2 bipolar

  21. High quality yields obtained despite defects
      • good batches
      • X15 breaks into two clusters (important physical property)
      • some low X3 defect batches don’t appear here
      • at least one good batch with defects

  22. Low-defect batches are not highest quality!
      • few defects
      • low yield, low quality

  23. Original plot shows defect X6 behaves differently; exclude it from the 9-out-of-10 defects constraint; the best batches return
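One possible reading of this step, sketched as code: require a low defect level on every defect axis, then relax the query by leaving X6 out of the constraint. The threshold, the helper names, and the batches are hypothetical, and the real data may scale the defect axes differently.

```python
DEFECT_AXES = [f"X{i}" for i in range(3, 13)]   # X3 .. X12, ten defect types

def make_batch(name, **defects):
    """Build a hypothetical batch with zero defects except those given."""
    batch = {axis: 0 for axis in DEFECT_AXES}
    batch.update(defects)
    batch["id"] = name
    return batch

def low_defect_batches(batches, threshold=0, excluded=()):
    """Keep batches at or below the threshold on every non-excluded defect axis."""
    axes = [a for a in DEFECT_AXES if a not in excluded]
    return [b for b in batches if all(b[a] <= threshold for a in axes)]

batches = [
    make_batch("A", X6=3),         # defect only on X6
    make_batch("B"),               # no defects at all
    make_batch("C", X3=2, X6=1),   # defects on X3 and X6
]
strict = low_defect_batches(batches)
relaxed = low_defect_batches(batches, excluded=("X6",))
print([b["id"] for b in strict])    # ['B']
print([b["id"] for b in relaxed])   # ['A', 'B']
```

Dropping one axis from the constraint lets batch A back into the selection, which mirrors how the best batches return in the slide.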

  24. Isolate the best batches. Conclusion: defects are necessary!
      • The very best batch has X3 and X6 defects
      • Ensure this is not an outlier: look at the top few batches. The same result is found.

  25. How to graph web page traversals?

  26. References for this Lecture
      • Behrens, John and Yu, Chong Ho. Visualization Techniques of Different Dimensions, 1995. http://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.html
      • Fienberg, S. E. Graphical methods in statistics. The American Statistician, 33, 165-178, 1979.
      • Friendly, Michael. Gallery of Data Visualization. http://www.math.yorku.ca/SCS/Gallery
        • scan of Minard’s graph from Tufte 1983
        • multivariate means comparison
      • Wegman, Edward J. and Luo, Qiang. High Dimensional Clustering Using Parallel Coordinates and the Grand Tour. Conference of the German Classification Society, Freiberg, Germany, 1996. http://galaxy.gmu.edu/papers/inter96.html
      • Cook, R. Dennis and Weisberg, Sanford. An Introduction to Regression Graphics, 1995. http://stat.umn.edu/~rcode/node3.html
      • Schall, Matthew. SPSS DIAMOND: a visual exploratory data analysis tool. Perspective, 18 (2), 1995. http://www.spss.com/cool/papers/diamondw.html
      • Marchette, David. An Investigation of Chernoff Faces for High Dimensional Data Exploration. http://farside.nswc.navy.mil/CSI803/Dave/chern.html
      • Chernoff, H. The Use of Faces to Represent Points in k-Dimensional Space Graphically. Journal of the American Statistical Association, 68, 361-368, 1973.

  27. Next Time: Brushing and Linking
      • An interactive technique
      • Brushing:
        • pick out some points from one viewpoint
        • see how this affects other viewpoints
        • (Cleveland scatterplot matrix example)
      • Graphs must be linked together
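A minimal sketch of the idea, assuming every view plots the same records in the same order so that a record's index links it across views. The function names, view names, and points are hypothetical and do not describe any of the systems listed on the next slide.

```python
def brush(points, x_range, y_range):
    """Return the indices of points inside a rectangular brush in one view."""
    return {
        i for i, (x, y) in enumerate(points)
        if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]
    }

def linked_highlight(views, brushed):
    """Return, for every view, the points that belong to the brushed records."""
    return {name: [pts[i] for i in sorted(brushed)] for name, pts in views.items()}

# Two hypothetical views of the same three records.
views = {
    "yield_vs_quality": [(0.90, 0.95), (0.50, 0.90), (0.85, 0.40)],
    "x3_vs_x15": [(0, 1.2), (3, 0.4), (1, 1.1)],
}
brushed = brush(views["yield_vs_quality"], (0.8, 1.0), (0.0, 1.0))
highlighted = linked_highlight(views, brushed)
print(sorted(brushed))            # [0, 2]
print(highlighted["x3_vs_x15"])   # [(0, 1.2), (1, 1.1)]
```

The shared record index is what "linked together" means here: a selection made in one graph is just a set of indices that every other graph can highlight.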

  28. Brushing and Linking Systems
      • VISAGE: Roth et al.
      • Attribute Explorer: Tweedie et al.
      • SpotFire (IVEE): Ahlberg et al.
