SIMS 247 Lecture 4: Graphing Multivariate Information
January 29, 1998
Marti Hearst, SIMS 247
Follow-up to previous lecture
• Docuverse:
  • length of arc is proportional to number of subdirectories
  • radius for a given arc is long enough to contain marks for all the files in the directory
• Nightingale’s “coxcomb”:
  • keep arc length constant
  • vary radius length (proportional to sqrt(freq))
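The square-root rule for the coxcomb can be checked with a little arithmetic. A circular wedge with angle θ and radius r has area θr²/2, so when every wedge has the same angle, a radius proportional to sqrt(freq) makes wedge area proportional to freq. The sketch below assumes made-up counts; the function names and the `scale` parameter are illustrative, not taken from the lecture.

```python
import math

def coxcomb_radii(freqs, scale=1.0):
    """Return one wedge radius per frequency, proportional to sqrt(freq)."""
    return [scale * math.sqrt(f) for f in freqs]

def wedge_area(radius, n_wedges):
    """Area of a circular sector spanning 2*pi/n_wedges radians."""
    theta = 2 * math.pi / n_wedges
    return 0.5 * theta * radius ** 2

freqs = [4, 9, 16]  # made-up counts, for illustration only
radii = coxcomb_radii(freqs)
areas = [wedge_area(r, len(freqs)) for r in radii]
```

With these values the areas stand in the same ratio as the counts, which is the point of varying the radius by the square root rather than by the count itself.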
Today: Multivariate Information
• We see a 3D world
• How do we handle more than 3 variables?
  • multi-functioning elements
    • Tufte examples
    • cinematography example
  • multiple views
Example Data Sets
How do we handle 9 variables?
• Our web access dataset
• Factors involved in alcoholism
  • ALCOHOL
  • USE
  • AVAILABILITY
  • CONCERN ABOUT USE
  • COPING MECHANISMS
  • PERSONALITY MEASURES
  • EXTROVERSION
  • DISINHIBITION
  • OTHER
  • GENDER
  • GPA
Graphing Multivariate Information
How do we handle cases with more than three variables?
• Scatterplot matrices
• Parallel coordinates
• Multiple views
• Overlay space and time
• Interaction/animation across time
Multiple Variables: Scatterplot Matrices (from Wegman et al.)
Multiple Variables: Scatterplot Matrices (from Schall 95)
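As a rough sketch of what a scatterplot matrix contains: for k variables it is a k-by-k grid in which the panel in row i, column j plots variable j on the x axis against variable i on the y axis. The code below only builds the point lists for each off-diagonal panel and draws nothing. The function name and the data values are assumptions made for illustration.

```python
from itertools import product

def scatterplot_matrix_panels(columns):
    """Map (row_var, col_var) -> list of (x, y) points for each off-diagonal panel."""
    panels = {}
    for row_var, col_var in product(columns, repeat=2):
        if row_var == col_var:
            continue  # diagonal cells usually hold a label or a histogram
        # x comes from the column variable, y from the row variable
        panels[(row_var, col_var)] = list(zip(columns[col_var], columns[row_var]))
    return panels

data = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}  # made-up values
panels = scatterplot_matrix_panels(data)
```

Three variables give six off-diagonal panels, and each pair appears twice with its axes swapped.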
Multiple Views: Star Plot (Discussed in Fienberg 79. Works better with animation. Example taken from Behrens & Yu 95.)
Multiple Dimensions: Parallel Coordinates (earthquake data, color indicates longitude, y axis severity of earthquake, from Schall 95)
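A minimal sketch of the geometry behind parallel coordinates: each variable gets its own vertical axis, each value is rescaled to [0, 1] along that axis, and each record becomes a polyline joining its positions on successive axes. The records below are invented (including the `depth` variable, which is not in the slide) and are not the earthquake data from Schall 95.

```python
def parallel_coordinates(rows, variables):
    """Return one polyline per row: a list of (axis_index, scaled_value) vertices."""
    lo = {v: min(r[v] for r in rows) for v in variables}
    hi = {v: max(r[v] for r in rows) for v in variables}
    polylines = []
    for r in rows:
        line = []
        for i, v in enumerate(variables):
            span = hi[v] - lo[v]
            # A constant column has no spread, so place it mid-axis.
            y = (r[v] - lo[v]) / span if span else 0.5
            line.append((i, y))
        polylines.append(line)
    return polylines

quakes = [  # hypothetical records, for illustration only
    {"severity": 4.0, "depth": 10.0, "longitude": -120.0},
    {"severity": 6.0, "depth": 30.0, "longitude": -118.0},
    {"severity": 5.0, "depth": 20.0, "longitude": -122.0},
]
lines = parallel_coordinates(quakes, ["severity", "depth", "longitude"])
```

A drawing routine would then connect the vertices of each polyline and could color the line by one of the variables, as the slide does with longitude.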
Multiple Dimensions: Multivariate Star Plot (from Behrens & Yu 95)
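One common way to build a star plot is to give each variable a spoke at an evenly spaced angle and to place a vertex on each spoke at a distance proportional to the value. The sketch below computes those vertices for a single record. The scaling by a per-variable maximum and the sample values are assumptions, not details from Behrens & Yu.

```python
import math

def star_vertices(values, maxima):
    """Return (x, y) vertices, one per variable, on evenly spaced spokes."""
    n = len(values)
    points = []
    for k, (value, maximum) in enumerate(zip(values, maxima)):
        angle = 2 * math.pi * k / n  # spoke direction for variable k
        r = value / maximum          # distance along the spoke, in [0, 1]
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

# Made-up record with four variables, each with an assumed maximum of 4.0
vertices = star_vertices([3.0, 1.0, 2.0, 4.0], [4.0, 4.0, 4.0, 4.0])
```

Joining the vertices in order gives the star outline for that record; drawing one star per record lets shapes be compared side by side.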
Chernoff Faces
• Assumption: people have built-in face recognizers
• Map variables to features of a cartoon face
  • Example: eyes
    • location, separation, angle, shape, width
  • Example: entire face
    • area, shape, nose length, mouth location, smile curve
• Originally tongue-in-cheek, but taken seriously
• Sometimes seems to work for small numbers of points
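The mapping step can be sketched as a linear rescaling from each variable's data range to the drawing range of one facial feature. The feature names and ranges below are hypothetical choices loosely based on the examples in the slide. They are not Chernoff's original parameterization, and no face is drawn.

```python
FEATURE_RANGES = {  # hypothetical features and drawing ranges
    "eye_separation": (0.2, 0.6),
    "eye_angle": (-30.0, 30.0),
    "nose_length": (0.1, 0.4),
    "smile_curve": (-1.0, 1.0),
}

def chernoff_parameters(point, bounds):
    """Map each variable linearly onto one facial feature.

    point:  one value per variable
    bounds: one (min, max) pair per variable, in the same order
    """
    params = {}
    for (feature, (f_lo, f_hi)), value, (v_lo, v_hi) in zip(
        FEATURE_RANGES.items(), point, bounds
    ):
        t = (value - v_lo) / (v_hi - v_lo)  # position within the data range
        params[feature] = f_lo + t * (f_hi - f_lo)
    return params

# Made-up data point with four variables, each assumed to range over 0..10
face = chernoff_parameters([0, 5, 10, 10], [(0, 10)] * 4)
```

Which variable is assigned to which feature is a design decision, and the experiment described next varies exactly that choice.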
Chernoff Example (Marchette)
• Three groups of points
  • each drawn from a different distribution with 5 variables
• First show scatter-plot matrix
• Then graph with Chernoff faces
  • vary faces overall
  • vary eyes
  • vary mouth and eyebrows
• Which seems to be most effective?
Chernoff Experiment (Marchette) [four figure slides]
Overlaying Space and Time (Minard’s graph of Napoleon’s march through Russia)
A Detective Story (Inselberg 97)
• Domain: manufacture of computer chips
• Objectives: create batches with
  • high yield (X1)
  • high quality (X2)
• Hypothesized cause of problem:
  • 10 types of defects (X3-X12)
  • some physical properties (X13-X16)
• Approach:
  • examine data for 473 batches
  • use interactive parallel coordinates
Multidimensional Detective
• Long term objectives:
  • high quality, high yield
• Logical approach given the hypothesis:
  • try to eliminate defects
• First clue:
  • what patterns can be found among batches with high yield and quality?
Detectives aren’t intimidated!
• X1 seems to be normally distributed; X2 bipolar
High quality yields obtained despite defects
• good batches
• X15 breaks into two clusters (important physical property)
• some low X3 defect batches don’t appear here
• at least one good batch with defects
Low-defect batches are not highest quality!
• few defects
• low yield, low quality
Original plot shows defect X6 behaves differently; exclude it from the 9-out-of-10 defects constraint; the best batches return
Isolate the best batches. Conclusion: defects are necessary!
• The very best batch has X3 and X6 defects
• Ensure this is not an outlier: look at the top few batches. The same result is found.
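The queries in this detective story amount to range selections on individual axes, combined with a count of how many defect variables are zero. The sketch below imitates that on a toy table. The batch values, the reduced set of three defect axes, and the function names are all invented for illustration and are not Inselberg's data or software.

```python
def brush(batches, **ranges):
    """Keep batches whose value lies inside [lo, hi] on every named axis."""
    return [
        b for b in batches
        if all(lo <= b[axis] <= hi for axis, (lo, hi) in ranges.items())
    ]

def few_defects(batch, defect_axes, min_zero):
    """True if at least min_zero of the defect axes are zero for this batch."""
    return sum(batch[a] == 0 for a in defect_axes) >= min_zero

batches = [  # hypothetical batches; only three defect axes are shown
    {"id": 1, "X1": 0.90, "X2": 0.90, "X3": 2, "X4": 0, "X6": 1},
    {"id": 2, "X1": 0.30, "X2": 0.20, "X3": 0, "X4": 0, "X6": 0},
    {"id": 3, "X1": 0.80, "X2": 0.85, "X3": 1, "X4": 0, "X6": 3},
    {"id": 4, "X1": 0.50, "X2": 0.40, "X3": 0, "X4": 1, "X6": 0},
]
defect_axes = ["X3", "X4", "X6"]

# Batches with high yield (X1) and high quality (X2)
best_ids = [b["id"] for b in brush(batches, X1=(0.75, 1.0), X2=(0.75, 1.0))]
# Batches with no defects on any of the listed axes
clean_ids = [b["id"] for b in batches if few_defects(b, defect_axes, 3)]
```

In this toy table the two selections do not overlap, which mirrors the pattern the slides report: the defect-free batches are not the best ones.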
How to graph web page traversals?
References for this Lecture
• Behrens, John and Yu, Chong Ho. Visualization Techniques of Different Dimensions, 1995. http://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.html
• Fienberg, S. E. Graphical methods in statistics. The American Statistician, 33, 165-178, 1979.
• Friendly, Michael. Gallery of Data Visualization. http://www.math.yorku.ca/SCS/Gallery
  • scan of Minard’s graph from Tufte 1983
  • multivariate means comparison
• Wegman, Edward J. and Luo, Qiang. High Dimensional Clustering Using Parallel Coordinates and the Grand Tour. Conference of the German Classification Society, Freiberg, Germany, 1996. http://galaxy.gmu.edu/papers/inter96.html
• Cook, R. Dennis and Weisberg, Sanford. An Introduction to Regression Graphics, 1995. http://stat.umn.edu/~rcode/node3.html
• Schall, Matthew. SPSS DIAMOND: a visual exploratory data analysis tool. Perspective, 18 (2), 1995. http://www.spss.com/cool/papers/diamondw.html
• Marchette, David. An Investigation of Chernoff Faces for High Dimensional Data Exploration. http://farside.nswc.navy.mil/CSI803/Dave/chern.html
• Chernoff, H. The use of Faces to Represent Points in k-Dimensional Space Graphically. Journal of the American Statistical Association, 68, 361-368, 1973.
Next Time: Brushing and Linking
• An interactive technique
• Brushing:
  • pick out some points from one viewpoint
  • see how this affects other viewpoints
  • (Cleveland scatterplot matrix example)
• Graphs must be linked together
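A minimal sketch of the idea, assuming that linking means every view reads from one shared set of selected record indices: brushing a rectangle in one view updates that set, and every other view then flags the same records. The class name, method names, and records are hypothetical and do not describe any of the systems named in this lecture.

```python
class LinkedViews:
    """Several 2D views over the same records, sharing one selection."""

    def __init__(self, records):
        self.records = records
        self.selected = set()  # indices of brushed records, shared by all views

    def brush(self, x_var, y_var, x_range, y_range):
        """Select the records that fall inside a rectangle in one view."""
        (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
        self.selected = {
            i for i, r in enumerate(self.records)
            if x_lo <= r[x_var] <= x_hi and y_lo <= r[y_var] <= y_hi
        }

    def view(self, x_var, y_var):
        """Return (x, y, highlighted) triples for any pair of variables."""
        return [
            (r[x_var], r[y_var], i in self.selected)
            for i, r in enumerate(self.records)
        ]

records = [  # made-up records with three variables
    {"a": 1, "b": 10, "c": 5},
    {"a": 2, "b": 20, "c": 3},
    {"a": 3, "b": 30, "c": 9},
]
views = LinkedViews(records)
views.brush("a", "b", (1.5, 3.5), (15, 35))  # brush in the (a, b) view
other_view = views.view("b", "c")            # highlights carry over to (b, c)
```

Because the selection is stored by record index rather than by screen position, any view of the same records can show it.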
Brushing and Linking Systems
• VISAGE: Roth et al.
• Attribute Explorer: Tweedie et al.
• SpotFire (IVEE): Ahlberg et al.