Visual Data Mining: An Overview

slaytona

Presentation Transcript


  1. Visual Data Mining: An Overview • What is Visual Data Mining? • Survey of techniques • Data Visualization • Visualizing Data Mining Results • Visual Data Mining

  2. What Is Visual Data Mining? • Visual data mining “discovers implicit and useful knowledge from large data sets using data and/or knowledge visualization techniques” • Data visualization + Data mining techniques

  3. Why Visual Data Mining? • Advantages of human visual system • Highly parallel processor • Sophisticated reasoning engine • Large knowledge base • Can be used to comprehend data distributions, patterns, clusters, and outliers

  4. Why Not Only Visual Data Mining? • Disadvantages of human visual system • Needs training • Not automated • Intrinsic bias • Limit of about 10^6 or 10^7 observations (Wegman 1995) • Power of integration with analytical methods

  5. Scope of Visual Data Mining • Visualization: Use of computer graphics to create visual images which aid in the understanding of complex, often massive representations of data • Visual Data Mining: The process of discovering implicit but useful knowledge from large data sets using visualization techniques • Related fields (from the slide diagram): Human Computer Interfaces, Computer Graphics, Multimedia Systems, High Performance Computing, Pattern Recognition

  6. Purpose of Visualization • Gain insight into an information space by mapping data onto graphical primitives • Provide qualitative overview of large data sets • Search for patterns, trends, structure, irregularities, relationships among data • Help find interesting regions and suitable parameters for further quantitative analysis • Provide a visual proof of computer representations derived

  7. Visual Data Mining & Data Visualization • Integration of visualization and data mining • data visualization • data mining result visualization • data mining process visualization • interactive visual data mining • Data visualization • Data in a database or data warehouse can be viewed • at different levels of abstraction • as different combinations of attributes or dimensions • Data can be presented in various visual forms

  8. Abilities of Humans and Computers

  9. Visual Mining vs. Scientific Vis. & Graphics • Scientific Visualization • Often visualize physical model, low dimensionality • Graphics • More concerned with how to render (draw) rather than what to render

  10. Data Visualization • View data in database or data warehouse • User may control • Different levels of details • Subset of attributes • Drawn using boxplots, histograms, polylines, etc.

  11. Historical Overview of Exploratory Data Visualization Techniques (cf. [WB 95]) • Pioneering works of Tufte [Tuf 83, Tuf 90] and Bertin [Ber 81] focus on • Visualization of data with inherent 2D-/3D-semantics • General rules for layout, color composition, attribute mapping, etc. • Development of visualization techniques for different types of data with an underlying physical model • Geographic data, CAD data, flow data, image data, voxel data, etc. • Development of visualization techniques for arbitrary multidimensional data (without an underlying physical model) • Applicable to databases and other information resources

  12. Dimensions of Exploratory Data Visualization

  13. Classification of Data Visualization Techniques • Geometric Techniques: • Scatterplots, Landscapes, Projection Pursuit, Prosection Views, Hyperslice, Parallel Coordinates, ... • Icon-based Techniques: • Chernoff Faces, Stick Figures, Shape-Coding, Color Icons, TileBars, ... • Pixel-oriented Techniques: • Recursive Pattern Technique, Circle Segments Technique, Spiral- & Axes-Techniques, ... • Hierarchical Techniques: • Dimensional Stacking, Worlds-within-Worlds, Treemap, Cone Trees, InfoCube, ... • Graph-Based Techniques: • Basic Graphs (Straight-Line, Polyline, Curved-Line, ...) • Specific Graphs (e.g., DAG, Symmetric, Cluster, ...) • Systems (e.g., Tom Sawyer, Hy+, SeeNet, Narcissus, ...) • Hybrid Techniques: arbitrary combinations from above

  14. Distortion & Dynamic/Interaction Techniques • Distortion Techniques • Simple Distortion (e.g. Perspective Wall, Bifocal Lenses, TableLens, Graphical Fisheye Views, ...) • Complex Distortion (e.g. Hyperbolic Repr., Hyperbox, ...) • Dynamic/Interaction Techniques • Data-to-Visualization Mapping (e.g. Auto Visual, S Plus, XGobi, IVEE, ...) • Projections (e.g. GrandTour, S Plus, XGobi, ...) • Filtering (Selection, Querying) (e.g. MagicLens, Filter/Flow Queries, InfoCrystal, ...) • Linking & Brushing (e.g. Xmdv-Tool, XGobi, DataDesk, ...) • Zooming (e.g. PAD++, IVEE, DataSpace, ...) • Detail on Demand (e.g. IVEE, TableLens, MagicLens, VisDB, ...)

  15. Visual Survey • Data visualization techniques • Scatterplot Matrices, Landscapes, Parallel Coordinates • Icon-based, Dimensional Stacking, Treemaps

  16. Direct Visualization: Ribbons with Twists Based on Vorticity

  17. Geometric Techniques • Basic Idea • Visualization of geometric transformations and projections of the data • Methods • Landscapes [Wis 95] • Projection Pursuit Techniques [Hub 85] (techniques for finding meaningful projections of multidimensional data) • Scatterplot-Matrices [And 72, Cle 93] • Prosection Views [FB 94, STDS 95] • Hyperslice [WL 93] • Parallel Coordinates [Ins 85, ID 90]

  18. Scatterplot-Matrices [Cleveland 93] • Matrix of scatterplots (x-y-diagrams) of the k-dimensional data [total of (k^2 - k)/2 distinct scatterplots] • Used by permission of M. Ward, Worcester Polytechnic Institute
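The plot count can be checked by enumerating attribute pairs. The sketch below is only an illustration: the attribute names are hypothetical, and it assumes each unordered pair of attributes is drawn once, which gives k(k-1)/2 panels.

```python
from itertools import combinations


def scatterplot_pairs(attributes):
    """Return each unordered pair of attributes, one per scatterplot panel."""
    return list(combinations(attributes, 2))


# Hypothetical attribute names, chosen only for illustration.
attrs = ["age", "income", "education", "hours"]
pairs = scatterplot_pairs(attrs)
k = len(attrs)

for x_attr, y_attr in pairs:
    print(f"panel: {x_attr} vs {y_attr}")
print(len(pairs), "panels for k =", k)
```

A full k-by-k matrix display usually shows each pair twice (mirrored above and below the diagonal), so the number of drawn panels can be larger than the number of distinct pairs.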

  19. Landscapes [Wis 95] • Visualization of the data as perspective landscape • The data needs to be transformed into a (possibly artificial) 2D spatial representation which preserves the characteristics of the data • News articles visualized as a landscape • Used by permission of B. Wright, Visible Decisions Inc.

  20. Parallel Coordinates [Ins 85, ID 90] • n equidistant axes which are parallel to one of the screen axes and correspond to the attributes • the axes are scaled to the [minimum, maximum] range of the corresponding attribute • every data item corresponds to a polygonal line which intersects each of the axes at the point which corresponds to the value for the attribute
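A minimal sketch of this construction, assuming numeric attributes and a unit-square drawing area (the function name and the sample rows are hypothetical). Each axis is scaled to its own [minimum, maximum] range, and each data item becomes a list of polyline vertices, one per axis.

```python
def parallel_coordinates(rows, width=1.0):
    """Map each row to polyline vertices (axis_position, scaled_value)."""
    n = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n)]
    highs = [max(r[j] for r in rows) for j in range(n)]
    step = width / (n - 1) if n > 1 else 0.0  # equidistant axes
    polylines = []
    for r in rows:
        line = []
        for j in range(n):
            span = highs[j] - lows[j]
            # A constant attribute has no range; place it mid-axis.
            y = (r[j] - lows[j]) / span if span else 0.5
            line.append((j * step, y))
        polylines.append(line)
    return polylines


# Hypothetical data: three items with three attributes each.
rows = [(1, 10, 5), (3, 20, 5), (2, 15, 5)]
lines = parallel_coordinates(rows)
for line in lines:
    print(line)
```

The vertices can be passed to any line-drawing routine; the scaling step is what makes attributes with different units comparable on neighboring axes.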

  21. Parallel Coordinates

  22. Icon-Based Techniques • Basic Idea • Visualization of the data values as features of icons • Overview • Chernoff-Faces [Che 73, Tuf 83] • Stick Figures [Pic 70, PG 88] • Shape Coding [Bed 90] • Color Icons [Lev 91, KK 94] • TileBars [Hea 95] (use of small icons representing the relevance feature vectors in document retrieval)

  23. Stick Figures • Census data showing age, income, sex, education, etc. • Used by permission of G. Grinstein, University of Massachusetts at Lowell

  24. Hierarchical Techniques • Basic Idea:  Visualization of the data using a hierarchical partitioning into subspaces. • Overview • Dimensional Stacking [LWW 90] • Worlds-within-Worlds [FB 90a/b] • Treemap [Shn 92, Joh 93] • Cone Trees [RMC 91] • InfoCube [RG 93]

  25. Dimensional Stacking [LWW 90] • partitioning of the n-dimensional attribute space in 2-dimensional subspaces which are ‘stacked’ into each other • partitioning of the attribute value ranges into classes • the important attributes should be used on the outer levels • adequate especially for data with ordinal attributes of low cardinality
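One way to read the stacking rule is as a nested grid index. The sketch below is an assumption-laden illustration, not the method of [LWW 90]: it assumes the attributes are already binned into classes, are listed outermost first, and alternate between the x- and y-axis.

```python
def stacked_cell(bins, cardinalities):
    """Return the (x, y) grid cell for one data item.

    bins[i] is the class index of attribute i, cardinalities[i] is its
    number of classes. Attributes alternate x, y, x, y, ... starting with
    the outermost level, so outer attributes select large blocks and
    inner attributes select cells inside them.
    """
    x = y = 0
    for i, (b, c) in enumerate(zip(bins, cardinalities)):
        if not 0 <= b < c:
            raise ValueError(f"class index {b} out of range for attribute {i}")
        if i % 2 == 0:
            x = x * c + b
        else:
            y = y * c + b
    return x, y


# Hypothetical setup: outer x and y have 3 classes, inner x and y have 4.
cards = (3, 3, 4, 4)
cell = stacked_cell((2, 0, 1, 3), cards)
print(cell)
```

With these cardinalities the display is a 12 x 12 grid, which shows why low-cardinality attributes suit the technique: the grid size is the product of the class counts along each axis.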

  26. Dimensional Stacking • Visualization of oil mining data with longitude and latitude mapped to the outer x-, y-axes and ore grade and depth mapped to the inner x-, y-axes • Used by permission of M. Ward, Worcester Polytechnic Institute

  27. Dimensional Stacking • Disadvantages: • Difficult to display more than nine dimensions • Important to map dimensions appropriately • May be difficult to understand visualizations at first

  28. Treemap [JS 91, Shn 92, Joh 93] • Screen-filling method which uses a hierarchical partitioning of the screen into regions depending on the attribute values • The x- and y-dimension of the screen are partitioned alternately according to the attribute values (classes) MSR Netscan image:
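The alternating partitioning can be sketched as a recursive "slice-and-dice" layout. This is a simplified illustration under stated assumptions: the tree format (name, size) or (name, [children]) and the function names are hypothetical, and region area is made proportional to the summed leaf sizes.

```python
def size(node):
    """Total size of a node: its own size for a leaf, else the sum of children."""
    _, payload = node
    if isinstance(payload, list):
        return sum(size(child) for child in payload)
    return payload


def slice_and_dice(node, x, y, w, h, split_x=True, out=None):
    """Collect (name, x, y, width, height) rectangles for all leaves."""
    if out is None:
        out = []
    name, payload = node
    if not isinstance(payload, list):
        out.append((name, x, y, w, h))
        return out
    total = size(node)
    offset = 0.0
    for child in payload:
        frac = size(child) / total if total else 0.0
        if split_x:  # partition the x-extent at this level
            slice_and_dice(child, x + offset * w, y, frac * w, h, False, out)
        else:        # partition the y-extent at the next level
            slice_and_dice(child, x, y + offset * h, w, frac * h, True, out)
        offset += frac
    return out


# Hypothetical file-system-like hierarchy with sizes at the leaves.
tree = ("root", [("a", 2), ("dir", [("b", 1), ("c", 1)])])
rects = slice_and_dice(tree, 0, 0, 4, 2)
for rect in rects:
    print(rect)
```

The leaf rectangles tile the whole 4 x 2 area, which is the screen-filling property the slide describes.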

  29. Treemap of a File System (Shneiderman)

  30. Treemaps • The attributes used for the partitioning and their ordering are user-defined (the most important attributes should be used first) • The color of the regions may correspond to an additional attribute • Suitable to get an overview over large amounts of hierarchical data (e.g., file system) and for data with multiple ordinal attributes (e.g., census data)

  31. Data Mining Result Visualization • Presentation of the results or knowledge obtained from data mining in visual forms • Examples • Scatter plots and boxplots (obtained from descriptive data mining) • Decision trees • Association rules • Clusters • Outliers • Generalized rules • Text mining

  32. Boxplots from Statsoft: Multiple Variable Combinations

  33. Visualization of Data Mining Results in SAS Enterprise Miner: Scatter Plots

  34. Visualization of Association Rules in SGI/MineSet 3.0

  35. Visualization of Decision Tree in SGI/MineSet 3.0

  36. Visualization of Decision Trees

  37. Visualization of Cluster Grouping in IBM Intelligent Miner

  38. Association Rules (MineSet) • LHS and RHS items are mapped to x-, y-axis • Confidence, support correspond to height of the bar or disc, respectively • Interestingness is mapped to Color
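To make the mapping concrete, the sketch below computes support and confidence for one rule using their standard definitions and packs them into a glyph record. The transactions, the rule, and the dictionary keys are hypothetical; this is not MineSet's API, and interestingness is left out because the slide does not define it.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence) of the rule lhs -> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= set(t))
    n_both = sum(1 for t in transactions if (lhs | rhs) <= set(t))
    support = n_both / n if n else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence = rule_metrics(transactions, {"bread"}, {"milk"})

# One glyph per rule: grid position from the items, heights from the metrics.
glyph = {
    "x": "bread",               # LHS item on the x-axis
    "y": "milk",                # RHS item on the y-axis
    "bar_height": confidence,   # confidence drawn as a bar
    "disc_height": support,     # support drawn as a disc
}
print(glyph)
```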

  39. MineSet: Association Rules

  40. Association Ball Graph (DBMiner) • Items are visualized as balls • Arrows indicate rule implication • Size represents support

  41. Classification (SAS EM [SAS 01]) • Color corresponds to relative frequency of a class in a node • Branch line thickness is proportional to the square root of the number of objects • Tree Viewer

  42. Cluster Analysis (H-BLOB: Hierarchical BLOB) [SBG 00] • Form blobs (implicit surfaces) • Cluster • Form ellipsoids

  43. H-BLOB

  44. Text Mining (ThemeRiver [WCF+ 00]) • Visualization of thematic changes in documents • Vertical distance indicates collective strength of the themes

  45. Data Mining Process Visualization • Presentation of the various processes of data mining in visual forms so that users can see the flow of data cleaning, integration, preprocessing, mining • Data extraction process • Where the data is extracted • How the data is cleaned, integrated, preprocessed, and mined • Method selected for data mining • Where the results are stored • How they may be viewed

  46. Visualization of Data Mining Processes by Clementine • See your solution discovery process clearly • Understand variations with visualized data

  47. Interactive Visual Data Mining • Using visualization tools in the data mining process to help users make smart data mining decisions • Example • Display the data distribution in a set of attributes using colored sectors or columns (depending on whether the whole space is represented by either a circle or a set of columns) • Use the display to decide which sector should first be selected for classification and where a good split point for this sector may be

  48. Visual data mining • Projection Pursuits • (Class) Tours [Dhillon et al. ’98] • Visual Classification [Ankerst et al. KDD ’99]

  49. Projection Pursuits • Exploratory projection pursuit: • Goal: reduce dimensionality • Assign an “interestingness” index to each possible projection of a data set • Maximize this index, then project linearly • Not always possible/useful
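A toy sketch of the index-and-maximize idea, reduced to projecting 2D points onto a line. The choice of index is an assumption made here for illustration: the absolute excess kurtosis of the projected values, which is a common measure of departure from a Gaussian shape. The exhaustive search over whole-degree angles stands in for a real optimizer, and the data set is hypothetical.

```python
import math


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0 if m2 > 0 else 0.0


def project(points, angle_deg):
    """Linear projection of 2D points onto the direction at angle_deg."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [c * x + s * y for x, y in points]


def pursue(points, angles=range(0, 180)):
    """Return (angle, index) of the candidate projection with the largest index."""
    best_angle, best_index = None, -1.0
    for a in angles:
        index = abs(excess_kurtosis(project(points, a)))
        if index > best_index:
            best_angle, best_index = a, index
    return best_angle, best_index


# Hypothetical data: two groups separated along x, spread evenly along y.
points = [(x, y) for x in (-2.0, 2.0) for y in (-1.0, -0.5, 0.0, 0.5, 1.0)]
angle, index = pursue(points)
print(angle, index)
```

On this data the search selects the x-direction, where the two groups are fully separated. On real data the index surface can have many local maxima, which is one reason the approach is "not always possible/useful".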
