VizDB: Visual Data Mining Tool for Large Databases

VizDB • A tool to support exploration of large databases • Uses the human visual system to analyze mid-size to large data sets

Data Mining Techniques • Implements several data mining techniques • Pixel-oriented techniques (Spiral, Axes, and Grouping) • Parallel Coordinates • Stick Figures • Exploration of up to a million data values

Concept • The basic idea for visualizing the data is to map the distances to colors and represent each data value by one or more colored pixels. • Interactivity is key!

Requirements • Feedback is required when a query returns unexpected results • Interactivity allows immediate feedback from a modified query • A configurable tool that allows various data visualization techniques • Use of the human visual system for pattern recognition

Basic Technique • Sort query data by relevance and map the relevance factors to colors • Highest relevance factor in the center • Yellow-Green-Blue-Red-Black in decreasing order of relevance • Separate window for each selection predicate in the query • Multiple windows together give a multi-dimensional visualization
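The basic technique can be sketched in a few lines of Python. This is an illustrative sketch only, not VizDB's actual code: the function names (`distance_to_color`, `spiral_positions`, `layout`), the five equal-width color bands, and the treatment of a smaller distance as higher relevance are all assumptions made for the example.

```python
from typing import List, Tuple

# Color scale from the slide, most relevant first.
COLOR_SCALE = ["yellow", "green", "blue", "red", "black"]


def distance_to_color(distance: float, max_distance: float) -> str:
    """Map a distance in [0, max_distance] onto the five-step scale.

    Assumption: the scale is split into equal-width bands.
    """
    if max_distance <= 0:
        return COLOR_SCALE[0]
    fraction = min(max(distance / max_distance, 0.0), 1.0)
    index = min(int(fraction * len(COLOR_SCALE)), len(COLOR_SCALE) - 1)
    return COLOR_SCALE[index]


def spiral_positions(count: int) -> List[Tuple[int, int]]:
    """Return `count` grid offsets on a rectangular spiral starting at (0, 0)."""
    positions = [(0, 0)]
    x = y = 0
    dx, dy = 1, 0
    step = 1
    while len(positions) < count:
        for _ in range(2):              # two legs share each step length
            for _ in range(step):
                x, y = x + dx, y + dy
                positions.append((x, y))
                if len(positions) == count:
                    return positions
            dx, dy = -dy, dx            # turn 90 degrees
        step += 1
    return positions[:count]


def layout(distances: List[float]) -> List[Tuple[Tuple[int, int], str]]:
    """Sort by distance (most relevant first) and place items from the center outward."""
    ordered = sorted(distances)
    max_distance = ordered[-1] if ordered else 0.0
    return [
        (pos, distance_to_color(d, max_distance))
        for pos, d in zip(spiral_positions(len(ordered)), ordered)
    ]
```

Calling `layout` on the distances for one selection predicate gives the pixel positions and colors for that predicate's window; one such window per predicate would give the multi-window view.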

Mapping 2-D To The Axes • Visualization of inherently 2D or 3D data is not dealt with in VizDB • Where no inherent 2D semantics of data exist, VizDB is a valuable tool • Use of two axes for two dimensions • Positive as well as negative values displayed • Some space may be wasted (why?)

Grouping • Each area is arranged in a rectangular spiral shape according to relevance factors • Coloring is similar to the previous method • Grouping allows data similar in one dimension to be grouped together. Data in multiple dimensions are represented as clusters of pixels • Good for larger dimensionality

Interactive Data Exploration • Dynamic Query Modification Techniques • Feedback on the results • Change in color means change in values that are “relevant” • Change in structure means overall distribution of data has changed • Sliders for discrete as well as continuous values • Initial Query is SQL or “Gradi”

Calibrations • Calculation of the “relevance” factor can be calibrated by the user • Starting and ending values for various numeric data • E.g., blood sample counts
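One possible reading of the calibration step, sketched in Python: the user supplies a starting and an ending value, and raw values are rescaled linearly between them before being mapped to colors. The function name `normalize` and the clamping behavior are assumptions for illustration; the slides do not say how VizDB computes this.

```python
def normalize(raw: float, start: float, end: float) -> float:
    """Rescale `raw` to [0, 1] using user-chosen start and end values.

    Assumption: values at or below `start` map to 0, values at or
    above `end` map to 1, and values in between scale linearly.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    return min(max((raw - start) / (end - start), 0.0), 1.0)
```

Changing `start` and `end` changes which part of the value range uses the full color scale, which is the kind of control the calibration bullet describes.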

What about complex queries? • Multiple layers of windows for complex queries using nested AND and OR operators • Data that satisfies ALL joins is yellow. The rest is colored according to number of criteria met • Works well with the relational databases
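Coloring by the number of criteria met can be sketched as follows. This is a hypothetical illustration: the names `color_for_row` and `COLOR_SCALE`, the representation of predicates as Python callables, and the proportional mapping from unmet criteria to colors are assumptions, not VizDB's documented behavior.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Predicate = Callable[[Row], bool]

# Same color order as the basic technique, most relevant first.
COLOR_SCALE = ["yellow", "green", "blue", "red", "black"]


def color_for_row(row: Row, predicates: List[Predicate]) -> str:
    """Yellow when every predicate holds; further down the scale as more fail."""
    if not predicates:
        return COLOR_SCALE[0]
    unmet = sum(1 for predicate in predicates if not predicate(row))
    # Assumption: spread the unmet count proportionally over the scale.
    index = (unmet * (len(COLOR_SCALE) - 1)) // len(predicates)
    return COLOR_SCALE[index]
```

With two predicates, a row meeting both is yellow, a row meeting one is blue, and a row meeting neither is black.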

Implementation • C++ with Motif using X Windows on HP 7xx • Currently being ported to Linux (I couldn’t get this working!)

Adding new techniques • More information visualization techniques can be integrated with the system • The latest version supports parallel coordinates, stick figures, and pan-and-zoom techniques (new stuff!)

Applications • Molecular biology: finding possible docking regions by identifying sets of surface points with distinct characteristics • Databases of geographical data • Environmental data • NASA Earth observation data

Future Work • Automatic generation of queries that correspond to data in specific regions (select some data, and the SQL query that matches that data is generated) • Time series visualization (cool!)
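As a rough idea of what query generation from a selection might look like, the sketch below builds a bounding-range SQL query from the selected rows. It is a guess at the concept, not a planned VizDB feature: `bounding_query`, the table name, and the column names are hypothetical, and the generated query may also match unselected rows that fall inside the same ranges.

```python
from typing import Dict, List


def bounding_query(table: str, rows: List[Dict[str, float]]) -> str:
    """Build a SELECT whose WHERE clause spans the min/max of each column.

    Assumption: all rows share the same numeric columns. A real tool
    should validate identifiers and use parameterized queries instead
    of formatting values into the SQL string.
    """
    if not rows:
        raise ValueError("no rows selected")
    clauses = []
    for column in sorted(rows[0]):
        values = [row[column] for row in rows]
        clauses.append(f"{column} BETWEEN {min(values)} AND {max(values)}")
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)
```

For example, selecting two rows with `temp` values 20.0 and 25.5 yields a clause `temp BETWEEN 20.0 AND 25.5`.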

Thank You • The presentation slides are available at http://filebox.vt.edu/users/adatey/research/VizDB.ppt • A small color picture that shows the different techniques: http://filebox.vt.edu/users/adatey/research/VisDBHandout.eps
