Data Visualization. Prepared for IST597C. Ziming Zhuang, School of Information Sciences and Technology, The Pennsylvania State University. November 2005.
Outline • What … • Why … • Concepts • How to … - Texts - Web - Images • Challenges and to-dos • References Zhuang 11/2005
Visualization is … • an interdisciplinary area built upon databases, human-computer interaction, cognitive science, … • the use of interactive visual representations of abstract data to amplify cognition. [Shneiderman et al. ’99] • the selection, transformation, and representation of abstract data in a form that facilitates human interaction for exploration and understanding. [Cugini et al. ’96]
Why visualize? • Early endeavor: scientific visualization [McCormick et al. ’87], “to improve the ability of people to understand the data they work with.” • Representation: “A good picture's worth a thousand words.” • Graphical manipulation: graphical query languages, …
Why visualize? (cont.) • Visualization helps users comprehend large quantities of data. • Visual attributes can present abstract representations of data. • Relationships among displayed entities become apparent. • Graphical techniques allow more direct, intuitive interactions with the entities of interest. [Cugini et al. ’96]
Data Dimensions
• Assume that for every x and y we have a temperature t and a pressure p. We can then define:
  f(x, y) -> (t, p)
  f1(x, y) -> t, f2(x, y) -> p
  f3(x, y, t) -> 0 or 1, f4(x, y, p) -> 0 or 1
  f5(x, y, t, p) -> 0 or 1
• The key is that each mapping must go to a single value (or vector). A mapping such as f(x, t) corresponds to zero or more elements with position x and temperature t, so displaying it as a single value loses information (e.g. hidden surfaces in a projection).
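A minimal sketch of the mappings on this slide, assuming a small hypothetical grid of samples. The sample values and the helper name `project_x_t` are illustrative and not from the slides. It shows how dropping y leaves several elements competing for the same (x, t) slot.

```python
# Hypothetical samples: (x, y) -> (temperature t, pressure p).
samples = {
    (0, 0): (20, 101),
    (0, 1): (20, 99),
    (1, 0): (25, 101),
    (1, 1): (30, 98),
}

def f(x, y):
    """f(x, y) -> (t, p): one vector per position."""
    return samples[(x, y)]

def f1(x, y):
    """f1(x, y) -> t: a single scalar per position."""
    return samples[(x, y)][0]

def f3(x, y, t):
    """f3(x, y, t) -> 0 or 1: does this position have temperature t?"""
    return 1 if samples[(x, y)][0] == t else 0

def project_x_t(x, t):
    """f(x, t): zero or more elements share position x and temperature t."""
    return [(pos, val) for pos, val in samples.items()
            if pos[0] == x and val[0] == t]

hits = project_x_t(0, 20)   # two elements collide once y is dropped
shown = hits[:1]            # a display slot holds one value, so one is hidden
```

Keeping only the first hit is one arbitrary choice; any rule that picks a single element discards the others in the same way.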
Graph Entities and Attributes • Entities: point, line, polyline, glyph, surface, solid, image, text • Attributes: color/intensity, location, style, size, relative position/motion
Rationale of visualization • The gigabit bandwidth of the visual cortex system permits much faster perception of geometric and spatial relationships than any other mode [Consens et al. 1994] • Human eyes are more sensitive to such intuitive representations
Rationale of visualization • Context varies our sensitivity • Visual encodings in order of increasing inaccuracy [Ward et al.]: 1. Position along a common scale 2. Position along identical, non-aligned scales 3. Length 4. Angle/slope 5. Area 6. Volume 7. Hue/saturation/intensity (informally derived)
Basic Visualization Methods • Rendering – what to show in a plot • Manipulation – what to do within plots • Linking – what information to share between plots [Sutherland et al. 2000]
Methods – data rendering • Use scaling and offset to fit in range • Use derived values (residuals, logs) to emphasize changes • Use projections etc. to compress information • Use random jittering to separate overlaps • Use multiple views to handle hidden relations or high dimensions • Use effective grids, keys and labels to aid understanding
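Three of the rendering steps listed on this slide (scaling and offset, derived log values, random jitter) can be sketched with the standard library alone. The function names, the jitter amount and the sample data are assumptions chosen for illustration.

```python
import math
import random

def rescale(values, lo=0.0, hi=1.0):
    """Scale and offset values linearly so they fill the range [lo, hi]."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0   # avoid dividing by zero for constant data
    return [lo + (v - vmin) * (hi - lo) / span for v in values]

def log_transform(values):
    """Derived values: logs emphasize relative change in positive data."""
    return [math.log10(v) for v in values]

def jitter(values, amount=0.05, seed=0):
    """Add small uniform noise so that identical points no longer overlap."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]

raw = [1, 10, 10, 100, 1000]
scaled = rescale(log_transform(raw))   # spans 0.0 to 1.0 on a log scale
plotted = jitter(scaled)               # the two 10s are drawn apart
```

The fixed seed only makes the sketch repeatable; an interactive tool would more likely draw fresh noise or let the user control the amount.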
The Pipeline Method • Proposed in [Buja et al. 1988] [Diagram: Data Model Visualization Geometry Render]
Methods – data manipulation • Dynamically adjust mapping • Tour data by varying views • Deleting to de-cluster / eliminate clusters • Brushing/highlighting to see correspondence in multiple views • Zoom in to focus attention; zoom out to show context • Panning / spinning to explore neighborhoods
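Brushing across multiple views can be sketched as a shared selection of record ids: one view selects, the other highlights. The records, field names and function names below are hypothetical.

```python
# Hypothetical records shared by two linked views.
records = [
    {"id": 0, "x": 1.0, "y": 2.0, "size": 10},
    {"id": 1, "x": 2.5, "y": 0.5, "size": 40},
    {"id": 2, "x": 3.0, "y": 3.5, "size": 25},
    {"id": 3, "x": 0.5, "y": 3.0, "size": 5},
]

def brush(rows, x_range, y_range):
    """Return ids of rows whose (x, y) falls inside the brushed rectangle."""
    (x0, x1), (y0, y1) = x_range, y_range
    return {r["id"] for r in rows if x0 <= r["x"] <= x1 and y0 <= r["y"] <= y1}

def highlight(rows, selected):
    """Second view: mark each row as highlighted if it was brushed elsewhere."""
    return [(r["id"], r["size"], r["id"] in selected) for r in rows]

selected = brush(records, x_range=(0.0, 2.0), y_range=(1.5, 3.5))
linked_view = highlight(records, selected)
```

Because the views share only ids, each view stays free to plot different attributes of the same records.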
Visualizing text databases • Motivation: very large corpus; greater sensitivity to structure, similarity, and connectivity relationships; provides necessary context. • Methods: graphical browsing/query interface
Visualizing text db • InfoGrid [Rao et al ’92]
The Hy+ System • [Consens et al. 1994] Hy+ supports the visual presentation of structured data in the form of hygraphs; supports a visual query language, GraphLog; and supports filtering of data to reduce visual complexity, as well as building new relationships among the data, similar to creating new database views.
The Hy+ System • Two fundamental capabilities - define queries: define new relationships using queries (the derived data or view can be visually presented) - filter queries: selective data visualization (filter relevant data and control the level of detail)
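The two kinds of query can be illustrated in plain Python over a small edge list. This is not GraphLog syntax and not the Hy+ implementation; the edge data, the relation names and both functions are hypothetical stand-ins for the idea of deriving a new relationship and of filtering what is shown.

```python
# Hypothetical edge list: (source, relation, target).
edges = [
    ("a.html", "links_to", "b.html"),
    ("b.html", "links_to", "c.html"),
    ("a.html", "links_to", "d.gif"),
]

def define_two_hop(edge_list, relation):
    """Define query: derive a new relation from paths of length two."""
    derived = set()
    for s1, r1, t1 in edge_list:
        for s2, r2, t2 in edge_list:
            if r1 == r2 == relation and t1 == s2:
                derived.add((s1, "reaches_in_2", t2))
    return sorted(derived)

def filter_edges(edge_list, keep):
    """Filter query: keep only the edges that satisfy a predicate."""
    return [e for e in edge_list if keep(e)]

derived = define_two_hop(edges, "links_to")
html_only = filter_edges(edges, lambda e: e[2].endswith(".html"))
```

The derived edges can be drawn like any others, which is the sense in which a define query resembles creating a new database view.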
Visualizing Text Search • [Baeza-Yates ’96] Goal: visualizing a large number of answers in a text database. - visual browsing - document visualization
Visualizing Text Search • [Baeza-Yates ’96] - visualizing query - visualizing answers
The VIBE System • [Olsen et al. ’93] The location of a document icon is determined by the ratio of the similarities between the document and the POIs (points of interest). s_i is the similarity between a given document and POI i; p_i is the position vector for POI i.
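The slide's formula did not survive extraction. One common reading of "ratio of similarities" is a similarity-weighted average of the POI positions, that is, position = (sum of s_i * p_i) / (sum of s_i). The sketch below implements that reading as an assumption; it is not taken from the VIBE paper, and the function name and sample values are hypothetical.

```python
def vibe_position(similarities, poi_positions):
    """Place a document at the similarity-weighted average of POI positions."""
    total = sum(similarities)
    if total == 0:
        return None   # no similarity to any POI: nothing to place
    dims = len(poi_positions[0])
    return tuple(
        sum(s * p[d] for s, p in zip(similarities, poi_positions)) / total
        for d in range(dims)
    )

pois = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
doc = vibe_position([1.0, 1.0, 2.0], pois)   # pulled toward the third POI
```

Under this reading only the ratios matter: doubling every similarity leaves the icon in the same place.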
Visualizing Search Results • [Nowell et al. ’97] - Goal: to allow users to explore patterns in a large collection of search results. - two-dimensional patterns (x- and y-axes)
Visualizing the Web • [Hasan et al. ’96] Maintained a connectivity db; the Hy+ system was interfaced with a web browser. The GraphLog query language restricted the set of docs displayed in the view by considering doc properties, matches in URL / anchor text, etc. History was stored (“history graph”).
Visualizing the Web • Applications - Touchgraph - NIRVE (The NIST Information Retrieval Visualization Engine) [Cugini et al. ’97]: concept mapping, document space - PRIZE [Cugini et al. ’96]: spiral view, axis view, nearest-neighbor
Visualizing Image Retrieval • Related to our project! • Problems for general Web image search engines (adapted from [Upstill et al. ’01]): - heterogeneity: inconsistency in results - no transparency: why were these images retrieved? Queries are hard to refine - no relationships: the grid layout is meaningless - coarse-grained interaction: search again, or find similar
Example of the problems • This example image grid is generated for the query “clown, circus, tent”. • Similar images are not adjacent in the grid. • The vector evidence is lost when compressing the ranking into a grid.
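The loss described on this slide can be shown with a toy ranking. The image ids and per-term scores below are invented for illustration, and summing the scores is an assumed ranking rule, not the one any particular search engine uses.

```python
def to_grid(ranked_ids, columns):
    """Lay a ranked list out row by row, as a typical result page does."""
    return [ranked_ids[i:i + columns] for i in range(0, len(ranked_ids), columns)]

# Hypothetical per-term evidence (clown, circus, tent), scored 0 to 9.
evidence = {
    "img1": (9, 9, 0),
    "img2": (0, 8, 9),
    "img3": (9, 7, 0),
    "img4": (0, 6, 9),
    "img5": (8, 6, 0),
    "img6": (0, 5, 8),
}

# Rank by the summed score; the three-term vector collapses to one number.
ranked = sorted(evidence, key=lambda k: sum(evidence[k]), reverse=True)
grid = to_grid(ranked, columns=3)
```

Here the clown images (img1, img3, img5) and the tent images (img2, img4, img6) alternate in the ranking, so in the three-column grid every horizontal and vertical neighbour comes from the other group, and the grid itself carries no trace of which terms each image matched.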
Visualizing Image Retrieval • The VISR system, based on the spring model [Olsen et al. ’93]
Visualizing conventional db • Polaris Project - Initially developed at Stanford; now a commercial tool - Goals: interactive analysis and exploration; simple and consistent interface
Visualizing conventional db • Polaris Project – multiscale visualization using data cubes [Stolte ’02]
Visualizing conventional db • Data Cube – data abstraction • Combined with visual abstraction to achieve multiscale visualization.
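Data abstraction in a data cube amounts to aggregating a measure at different levels of detail. The sketch below rolls a few hypothetical sales rows up to two levels; the schema, the values and the function name `roll_up` are assumptions and do not reflect how Polaris is implemented.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, sales).
rows = [
    (2004, "Q1", "East", 10),
    (2004, "Q1", "West", 7),
    (2004, "Q2", "East", 12),
    (2005, "Q1", "East", 9),
    (2005, "Q1", "West", 11),
]

def roll_up(facts, dims):
    """Sum the measure over every dimension not listed in dims."""
    totals = defaultdict(int)
    for year, quarter, region, sales in facts:
        record = {"year": year, "quarter": quarter, "region": region}
        totals[tuple(record[d] for d in dims)] += sales
    return dict(totals)

by_year = roll_up(rows, ["year"])                     # coarse level
by_year_quarter = roll_up(rows, ["year", "quarter"])  # finer level
```

A multiscale display could draw `by_year` when zoomed out and switch to `by_year_quarter` when zoomed in, pairing each data level with a suitable visual level.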
Visualizing conventional db • OpenDX - initially developed at IBM • Chernoff Faces [Chernoff, 1973] - a technique to illustrate trends in multi-dimensional data - different data dimensions are mapped to different facial features - especially effective because the data is tied to facial features, which we are used to telling apart.
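The mapping step behind Chernoff faces can be sketched without any drawing code: each data dimension is normalized and assigned to one facial feature. The feature names and the sample rows are hypothetical, and the assignment of dimensions to features is an arbitrary choice made for illustration.

```python
# Hypothetical assignment of data dimensions to facial features.
FEATURES = ["face_width", "eye_size", "mouth_curve", "brow_slant"]

def normalize(column):
    """Rescale one data dimension to the range 0..1."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1   # a constant dimension maps to 0 everywhere
    return [(v - lo) / span for v in column]

def to_faces(rows):
    """Map each row of k values to a dict of k facial-feature parameters."""
    columns = [normalize(col) for col in zip(*rows)]
    return [dict(zip(FEATURES, values)) for values in zip(*columns)]

data = [(1, 20, 5, 0), (3, 40, 5, 10), (5, 30, 5, 5)]
faces = to_faces(data)
```

A renderer would then turn each dict into a drawn face, for example by scaling an ellipse by `face_width`.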
Challenges & “to-do”s • Standardized metrics? Only “visualization entropy” and “visualization precision” appear, in Olsen’s paper; text-based metrics and/or user studies are still used. • Cognitive modeling • High-dimensional representation and manipulation require faster processors and more RAM.
References • McCormick et al. Visualization in Scientific Computing. SIGGRAPH Computer Graphics 21:6, 1987. • Rao et al. The Information Grid: A framework for information retrieval and retrieval-centered applications. UIST ’92. • Hasan et al. Applying database visualization to the World Wide Web. SIGMOD Record 25:4, 1996. • Consens et al. Architecture and Applications of the Hy+ Visualization System. IBM Systems Journal 33:3, 1994. • Baeza-Yates. Visualization of Large Answers in Text Databases. • Cugini, Piatko, Laskowski. Interactive 3D Visualization for Document Retrieval. Proceedings of the Workshop on New Paradigms in Information Visualization and Manipulation, CIKM ’96, November 1996. • Cugini, Laskowski, Piatko. Document Clustering in Concept Space: The NIST Information Retrieval Visualization Engine (NIRVE). CODATA Euro-American Workshop on Visualization of Information and Data, Paris, France, June 1997.
References • Nowell et al. Exploring Search Results with Envision. CHI ’97. • Olsen et al. Visualization of a document collection: The VIBE system. Information Processing and Management, 29:1, 1993. • Upstill et al. Visual clustering of image search results. In Proc. SPIE Vol. 4302, 2001. • K. Olsen, R. Korfhage, M. Spring, K. Sochats, and J. Williams. Visualization of a Document Collection with Implicit and Explicit Links: The VIBE System. The Scandinavian Journal of Information Systems, August 1993. • Stolte, C. Multiscale Visualization Using Data Cubes. The Eighth IEEE Symposium on Information Visualization, October 2002. • Buja, A., Asimov, D., Hurley, C. & McDonald, J. A. Elements of a Viewing Pipeline for Data Analysis. In W. S. Cleveland & M. E. McGill, eds, Dynamic Graphics for Statistics, Wadsworth, Monterey, CA, 1988, pp. 277-308. • Sutherland, P., Rossini, A., Lumley, T., Lewin-Koh, N., Cook, D., Cox, Z. ORCA: A Visualization Toolkit for High-Dimensional Data. NRCSE Technical Report Series No. 046, May 18, 2000. • Chernoff, H. The use of faces to represent points in k-dimensional space graphically. Journal of the American Statistical Association, v68, 361-368, 1973.
Thank You • Questions and comments?