VisDB : Database exploration using Multidimensional Visualization. Daniel A. Keim, Hans-Peter Kriegel Institute for Computer Science, University of Munich. Created By. Rohan Ladkhedkar Ajinkya Raulkar Vrushali Date Anuja Surgude. Contents. Introduction to VisDB Basic Idea of VisDB
VisDB: Database exploration using Multidimensional Visualization Daniel A. Keim, Hans-Peter Kriegel Institute for Computer Science, University of Munich
Created By • Rohan Ladkhedkar • Ajinkya Raulkar • Vrushali Date • Anuja Surgude
Contents • Introduction to VisDB • Basic Idea of VisDB • Techniques used • Basic Visualization • Mapping 2D to Axis • Grouping the Dimensions • Working • Hardware/Software • Future Scope • Conclusion
Introduction to VisDB Typical difficulties faced with large databases: • Finding specific data items • Users may have no knowledge of database systems, the query language, or the data model • Intersection data spots • 1 to 1 queries provide multiple data items with no feedback
Introduction to VisDB • Sorting the data items according to the user query. • Visualizing as many data items as possible (say, out of ten million) at the same time to give the user feedback on the query. • The resolution of current displays (1 to 3 million pixels) is an important consideration. • Interaction of the system with the user.
Basic Idea of VisDB • Supports the query specification process by visually representing the result. • Lets the user restrict the visualization to the dimensions of interest, leaving out those of no interest.
Basic Idea of VisDB • Each pixel of the screen is used to visualize the data items resulting from a query. • Approximate results are determined using distance functions. • These distances are then combined into a relevance factor, which is used for the mapping.
Distance Function • The distance between each attribute value and the corresponding query value is determined. • The distance functions used are data type and application dependent. • In some cases, multiple distance functions can be used even for a single data type. • Distance functions by type: • Number types (Integer) – numerical difference • Ordinal types (Grades) – domain-specific distance functions • Nominal types (Professionals) – distance matrix
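The three kinds of distance function listed above could look roughly like the sketch below. The grade ordering, the profession names and the matrix values are all made up for illustration; the slides do not give the concrete functions VisDB uses.

```python
# Hypothetical distance functions, one per data type.
# All tables and values below are illustrative assumptions.

def numeric_distance(value: float, query: float) -> float:
    """Number types: signed numerical difference (the sign gives the direction)."""
    return value - query

# Ordinal type: a domain-specific distance based on position in an assumed ordering.
GRADE_ORDER = ["A", "B", "C", "D", "F"]

def ordinal_distance(value: str, query: str) -> int:
    """Ordinal types: difference in rank within the ordering."""
    return GRADE_ORDER.index(value) - GRADE_ORDER.index(query)

# Nominal type: no natural order, so distances are looked up in a matrix.
# frozenset keys make the lookup symmetric.
PROFESSION_DISTANCE = {
    frozenset(["doctor", "nurse"]): 1,
    frozenset(["doctor", "lawyer"]): 3,
    frozenset(["nurse", "lawyer"]): 3,
}

def nominal_distance(value: str, query: str) -> int:
    """Nominal types: distance taken from a (symmetric) distance matrix."""
    if value == query:
        return 0
    return PROFESSION_DISTANCE[frozenset([value, query])]
```

Keeping the sign for numeric and ordinal distances is a design choice in this sketch: it preserves the direction of the deviation, which the second visualization technique relies on.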
Combining Distances into Relevance Factor • Combine the independently calculated distances of the different selection predicates. • The combined value should have a global meaning. • User interaction is required: obtain weighting factors (wj, j ∈ {1, …, #sp}) from the user, reflecting the order of importance. • Normalize all distances: linear transformation of the range [dmin, dmax] for each predicate to a fixed range, e.g. [0, 255].
Combining Distances into Relevance Factor • The normalized distances are combined using numerical mean functions: 1. Weighted arithmetic mean for ‘AND’-connected condition parts. 2. Weighted geometric mean for ‘OR’-connected condition parts. • The relevance factor is the inverse of the combined distance value.
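A minimal sketch of the normalization and combination steps, assuming absolute (non-negative) distances and a target range of [0, 255]. The function names are illustrative, and reading "inverse" as "255 minus the combined distance" is an assumption of this sketch, not something the slides specify.

```python
import math

def normalize(distances, lo=0.0, hi=255.0):
    """Linearly map distances from [d_min, d_max] onto [lo, hi]."""
    d_min, d_max = min(distances), max(distances)
    if d_max == d_min:
        return [lo for _ in distances]
    scale = (hi - lo) / (d_max - d_min)
    return [lo + (d - d_min) * scale for d in distances]

def weighted_arithmetic_mean(values, weights):
    """Combination for 'AND'-connected condition parts."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def weighted_geometric_mean(values, weights):
    """Combination for 'OR'-connected condition parts."""
    total = sum(weights)
    return math.prod(v ** (w / total) for v, w in zip(values, weights))

def relevance(combined_distance, hi=255.0):
    # Assumption: "inverse" is interpreted as hi - distance,
    # so a distance of 0 gives the highest relevance.
    return hi - combined_distance
```

With the geometric mean, a distance of zero in any one part yields a combined distance of zero, which matches the intuition that an 'OR' condition is fully satisfied as soon as one of its parts is.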
Reducing the amount of data to be displayed • Adequate heuristics are required to: • Reduce the amount of data • Determine the data items whose distances are to be displayed. • Hence the α-quantile is defined as the lowest value ξα such that:
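Assuming the standard definition of a quantile (the lowest value ξα for which at least a fraction α of the distance values is less than or equal to ξα), the selection of items to display can be sketched as follows. The function names are hypothetical, and the heuristic VisDB actually uses to choose α is not shown here.

```python
import math

def alpha_quantile(distances, alpha):
    """Lowest value q such that at least a fraction alpha of the
    distances is <= q (empirical alpha-quantile)."""
    if not distances or not 0 < alpha <= 1:
        raise ValueError("need a non-empty list and 0 < alpha <= 1")
    ordered = sorted(distances)
    # Number of items that must lie at or below q.
    k = math.ceil(alpha * len(ordered))
    return ordered[k - 1]

def items_to_display(distances, alpha):
    """Indices of the data items whose distance is within the alpha-quantile."""
    q = alpha_quantile(distances, alpha)
    return [i for i, d in enumerate(distances) if d <= q]
```

For example, with eight distances and α = 0.5, the four items closest to the query are kept and the rest are not displayed.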
Techniques Used • Three techniques are used: • Basic Visualization Technique • Mapping Two Dimensions to the Axes • Grouping the Dimensions for each Data Item
1. Basic Visualization Technique • Sorts the data according to relevance with respect to the query. • Then maps the relevance factors to colors. • Sorting is needed to avoid sprinkled images, which are hard for the user to read. • The highest relevance factors are placed in the middle of the window. • Approximate answers form a rectangular spiral around this region (100% correct answers are colored yellow).
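The spiral arrangement can be sketched as below: items are sorted by relevance and assigned to pixel offsets that wind outward from the centre. This is one possible way to walk a rectangular spiral; the exact traversal order used by VisDB is not given in the slides, and the function names are illustrative.

```python
def spiral_positions(n):
    """Return n (x, y) offsets from the centre, walking a rectangular
    spiral outward: 1 step, 1 step, 2 steps, 2 steps, 3 steps, ..."""
    if n <= 0:
        return []
    positions = [(0, 0)]
    x = y = 0
    dx, dy = 1, 0
    leg_length = 1
    while len(positions) < n:
        for _ in range(2):              # two legs share each leg length
            for _ in range(leg_length):
                x, y = x + dx, y + dy
                positions.append((x, y))
                if len(positions) == n:
                    return positions
            dx, dy = -dy, dx            # turn by 90 degrees
        leg_length += 1
    return positions

def arrange(relevance_factors):
    """Map each spiral position to the index of a data item,
    with the most relevant item at the centre."""
    order = sorted(range(len(relevance_factors)),
                   key=lambda i: relevance_factors[i], reverse=True)
    return dict(zip(spiral_positions(len(order)), order))
```

Calling `arrange([10, 250, 90])` places item 1 (relevance 250) at offset `(0, 0)`, with the less relevant items on the first ring around it.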
1. Basic Visualization Technique • Colors range from yellow in the middle through green, blue and red to black. • These colors denote the distance from the correct answers.
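A color scale of this kind can be approximated by interpolating between anchor colors. The RGB anchors and the even spacing below are assumptions made for illustration; the actual VisDB color scale is not specified in the slides.

```python
# Assumed RGB anchors in the order named on the slide:
# yellow -> green -> blue -> red -> black.
ANCHORS = [(255, 255, 0), (0, 255, 0), (0, 0, 255), (255, 0, 0), (0, 0, 0)]

def distance_to_color(distance, d_max=255.0):
    """Map a normalized distance in [0, d_max] to an RGB tuple by
    piecewise-linear interpolation between the anchor colors."""
    t = min(max(distance / d_max, 0.0), 1.0) * (len(ANCHORS) - 1)
    i = min(int(t), len(ANCHORS) - 2)   # index of the lower anchor
    frac = t - i
    lower, upper = ANCHORS[i], ANCHORS[i + 1]
    return tuple(round(a + (b - a) * frac) for a, b in zip(lower, upper))
```

A distance of 0 (a correct answer) maps to yellow, and the maximum distance maps to black.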
1. Basic Visualization Technique • Multidimensional visualization: a separate window is generated for each selection predicate of the query.
Question 1: • 100% correct answers are denoted by which color in the Basic Visualization Technique? • 1) Red • 2) Yellow • 3) Green • 4) White • 5) Blue
Answer 1: • Correct answer: 2
2. Mapping Two Dimensions to Axes • 2D and 3D visualizations are useful but were not pursued because: • They can show only a limited number of data items. • Such systems already exist. • Improvement – adding feedback on the direction of the distance to the visualization.
2. Mapping Two Dimensions to Axes • Assign two dimensions to the axes. • Arrange the relevance factors according to the direction of the distance. • For one dimension, negative distances go to the left and positive distances to the right; for the other dimension, negative distances go to the bottom and positive ones to the top.
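The placement rule can be expressed as a small function over the signed distances of the two dimensions assigned to the axes. Treating a distance of exactly zero as "center" is an assumption of this sketch, as are the function names.

```python
from collections import Counter

def side(distance, negative_side, positive_side):
    """Pick a side of the window from the sign of a distance."""
    if distance < 0:
        return negative_side
    if distance > 0:
        return positive_side
    return "center"   # assumption: exact matches sit on the axis

def placement(dist_x, dist_y):
    """Window region for an item, given its signed distances in the
    dimension on the horizontal axis and the one on the vertical axis."""
    return side(dist_x, "left", "right"), side(dist_y, "bottom", "top")

def region_counts(signed_distances):
    """Count how many items fall into each region of the window."""
    return Counter(placement(dx, dy) for dx, dy in signed_distances)
```

Counting items per region with `region_counts` shows how unevenly a data set fills the window.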
Problems in this method • A corner of the window may be completely empty. • Worst case: two diagonally opposite corners of the window may be completely empty, so that only half as many data items can be presented. • Maximizing the number of displayed data items conflicts with arrangements that assign multiple dimensions to the axes.
Question 2: • In 1 Dimension Negative distances are arranged • 1) at the bottom • 2) to the right • 3) at the top • 4) to the left
Answer 2: • Correct answer: 4
3. Grouping the Dimensions for each Data Item • All dimensions for one data item are grouped together in one area. • Visualizations generated using this arrangement consist of only one window. • We do not focus on shape to distinguish data items, and the criterion and arrangement of the data items are also different. • 2x2 pixels per dimension are needed, as opposed to 1 pixel per dimension in the previous two methods.
Contd… • The grouping arrangement is only suitable for focused search on smaller data sets, because only one-fourth of the data items can be displayed on screen at one time. • But it still provides more visualizations for data sets with larger dimensionality. • In the other two techniques, the pixels for each dimension of a data item are related only by their position.
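The one-fourth figure follows from the pixel budget: 2x2 = 4 pixels per dimension instead of 1. A rough calculation is sketched below, assuming the whole display is available and ignoring window borders and any extra window for the overall result; the function name and the example numbers are illustrative.

```python
def displayable_items(screen_pixels, dimensions, pixels_per_dimension):
    """Upper bound on the number of data items that fit on the display
    when each item needs pixels_per_dimension pixels for every dimension."""
    return screen_pixels // (dimensions * pixels_per_dimension)

# Example: a 1-million-pixel display and 5 dimensions (assumed numbers).
separate_windows = displayable_items(1_000_000, 5, 1)   # 1 pixel per dimension
grouped = displayable_items(1_000_000, 5, 2 * 2)        # 2x2 pixels per dimension
```

Under these assumptions the grouped arrangement shows 50,000 items where the other two techniques show 200,000.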
Working • The interface is divided into the Visualization portion on the left and the Query Modification portion on the right. • The Visualization portion displays the resulting data set, including a certain percentage of approximate answers, using one of the visualization methods. • The Query Modification portion provides sliders for modifying the selection predicates and weighting factors, as well as some other options.
Working contd.. • Different kinds of sliders are available. • Ex: sliders for numbers, sliders for discrete types, sliders for non-metric types (ordinal and nominal data types) • Other parameters listed are: • Number of results • Query range • Weighting factors • Data values for the selected tuple • Data values corresponding to a selected color range
Working contd.. • Changing the percentage of data being displayed may completely change the visualization, because the distance values are normalized according to the new range. • Normal Mode - the system recalculates the visualization after each modification of the query. • Auto-Recalculate Off mode – queries are only recalculated on demand.
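The effect of the displayed percentage on normalization can be seen in a small sketch. It assumes the displayed items are simply the closest ones and that normalization maps their distance range onto [0, 255]; the function name and the sample data are made up.

```python
def normalized_for_display(distances, percent, hi=255.0):
    """Keep the closest `percent` of the items and normalize their
    distances onto [0, hi] using only the displayed range."""
    ordered = sorted(distances)
    count = max(1, len(ordered) * percent // 100)
    shown = ordered[:count]
    d_min, d_max = shown[0], shown[-1]
    span = (d_max - d_min) or 1.0       # avoid division by zero
    return [(d - d_min) / span * hi for d in shown]

distances = [0, 10, 20, 40, 1000]
all_items = normalized_for_display(distances, 100)
most_items = normalized_for_display(distances, 80)
```

The item with distance 20 is normalized to about 5.1 when everything is shown, but to 127.5 once the outlier at 1000 is dropped, so its color changes considerably.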
Question 3: • Into which two sections is VisDB mainly divided? • 1) Visualization Portion • 2) Grouping Dimensions • 3) Query Modification • 4) Coloration of Relevance Factors
Answer 3: • Correct answer: 1 and 3
Question 4: • In which mode does the system recalculate the visualization after each modification of the query? • 1) Normal Mode • 2) Auto Recalculate Mode • 3) Visual Mode • 4) None of the above
Answer 4: • Correct answer: 1
Hardware/Software • Software used • C++ • MOTIF • Hardware used • X-Windows on HP 7xx machines (the current version is main-memory based and allows interactive database exploration for databases containing 50,000 data items)
Future Scope • Automatic generation of queries that correspond to some specific region in one of the visualization windows. • Generate time series of visualizations corresponding to queries that are changed incrementally. • Applying to many different application domains each having its own parameters, distance functions, query requirements and so on.
Conclusion • VisDB allows visualization of the largest amount of data that can be displayed at one time on current displays. • Provides valuable feedback when querying the database. • Allows the user to find results which would otherwise remain hidden in the database.