360 likes | 379 Views
Research Report. Design and Evaluation of Incremental Data Structures and Algorithms for Dynamic Query Interfaces. Institution & Authors. University of Maryland Department of Computer Science Human-Computer Interaction Laboratory http://www.cs.umd.edu/projects/hcil Authors Egemen Tanin
E N D
Research Report Design and Evaluation of Incremental Data Structures and Algorithms for Dynamic Query Interfaces
Institution & Authors • University of Maryland • Department of Computer Science • Human-Computer Interaction Laboratory • http://www.cs.umd.edu/projects/hcil • Authors • Egemen Tanin • Richard Beigel • Ben Shneiderman info vis - spr 2001
Abstract • Dynamic query interfaces (DQIs) are a recently developed database access mechanism that provides continuous real-time feedback to the user during query formulation. Previous work shows that DQIs are an elegant and powerful interface to small databases. Unfortunately, when applied to large databases, previous DQI algorithms slow to a crawl. We present a new incremental approach to DQI algorithms and display updates that work well with large databases, both in theory and in practice. info vis - spr 2001
Outline Dynamic Querying The Incremental Approach Data Structures & Algorithms Theoretical Complexity Experiments Conclusions Future Work Keywords Data Structure Algorithm Database User Interface Information Visualization Direct Manipulation Dynamic Query Outline & Keywords info vis - spr 2001
Unlike textual query languages such as SQL, dynamic query interfaces (DQIs) are graphical Continuous feedback to user as the query is being formulated Tightly coupled: as the hit set varies all widgets are updated to show the hit set’s bounding rectangle “Details on demand” Query Input: Widgets Range sliders Alphanumeric sliders Toggles Checkboxes Query Output: Graphical Display Starfield Bars Charts Dynamic Querying info vis - spr 2001
In DQIs Queries formed incrementally Intermediate result visualization Sliders Conjunctions Constraints Efficient algorithms and supporting data structures required Incremental query formulation paradigm The Incremental Approach info vis - spr 2001
Active Subset Of limited size, stored in main memory Time, not space is constraint in DQI algorithms Auxiliary data structures Augment active subset with data structures that facilitate continuous querying Response time needed of ~0.1 seconds for continuous operations Reprocessing Auxiliary data structures are only reconstructed when user clicks on a widget 1 second (or less) delay for recomputation Incremental Display Slight changes in query tend to cause slight changes in output Compute and display differences for continuous updates Incremental Approach Innovations for Efficiency info vis - spr 2001
Query Previewer DQI algorithms to be used in tandem with a query previewer Allows user to browse a huge database and select a manageably small subset to scan Control Structure Selected subset passed from query previewer to DQI Bounding rectangle determines extremes for each attribute When user action extends beyond subset, control is passed back Data Structures & Algorithms i info vis - spr 2001
Major Operations Setup Selection Querying Setup when query previewer passes control to DQI Initial display is generated from active subset Copies and scales each attribute to the range [1,p] where p is number of pixels in attribute’s range slider Happens infrequently, more time allotted for this task Data Structures & Algorithms ii info vis - spr 2001
Selection When user clicks on a range slider Algorithm calculates auxiliary data structures Depends on currently selected attribute and current ranges for other attributes 1 second response required or users get annoyed Can spend memory to optimize, if needed Querying Occurs continuously as the user drags the mouse to update a slider Each changed mouse position is a single query ~ 0.1 second response required DQI computes maximum hit set during selection DQI computes information needed for redisplay during selection Data Structures & Algorithms iii info vis - spr 2001
Compute Max Hit Set Determined by extreme values for attribute and current ranges for other attributes Partitions max hit set into p buckets, one for each user-specificable value for the current attribute Store each bucket and all left-to-right partial sums of these sizes Linear-time counting sort Data Structures & Algorithms iv info vis - spr 2001
Data Structures & Algorithms v • Compute Redisplay Info • Facilitate computation of histogram and tight coupling of range sliders • Two dimensional array (size p2) maintained • Allows determination of ranges for all other sliders in constant time • Scan buckets for old value vs. new value • Display update in time linear with number of changes info vis - spr 2001
r = # records in active subset a = # attributes b = # bytes needed to store value of single attribute p = length in pixels of each range slider f = area in pixels of the starfield u = average # pixels to be updated in starfield per query (non-trivial dependencies) m = # records in max hit set Active Subset (ra b) bytes Rescaled Active Subset O(ra) bytes Bucket Partition O(ra) bytes Data structures for tight coupling O(ap) bytes Data structures range histograms O(ap2) bytes Starfield f bytes Theoretical Complexity i info vis - spr 2001
r = # records in active subset a = # attributes b = # bytes needed to store value of single attribute p = length in pixels of each range slider f = area in pixels of the starfield u = average # pixels to be updated in starfield per query (non-trivial dependencies) m = # records in max hit set Setup Time O (ra b) Theoretical Complexity ii info vis - spr 2001
r = # records in active subset a = # attributes b = # bytes needed to store value of single attribute p = length in pixels of each range slider f = area in pixels of the starfield u = average # pixels to be updated in starfield per query (non-trivial dependencies) m = # records in max hit set Selection Time Determine max hit set O (ra) Sort max hit set O (m) Compute auxiliary data structures for tight coupling O (ap + ma) Compute auxiliary data structures for histograms O (ap2 + ma) Total Selection Time O (a (r + m + p2)) = O (a (r+ p2)) Theoretical Complexity iii info vis - spr 2001
r = # records in active subset a = # attributes b = # bytes needed to store value of single attribute p = length in pixels of each range slider f = area in pixels of the starfield u = average # pixels to be updated in starfield per query (non-trivial dependencies) m = # records in max hit set Querying Time Tight coupling O (a ) Computing Histograms O (a p) Starfield Update O (u ) Total Querying Time O (a p + u ) Theoretical Complexity iii info vis - spr 2001
Preliminary experiments show that the incremental approach can deal with active subset of 100,000 records with 10 attributes each Film Finder could handle database 10,000 records with 10 attributes Improvement by one order magnitude Experimental Environment Experiments and Results Experimental Complexity Experiments i info vis - spr 2001
Experimental Environment Sample DQI with range sliders Starfield display, preview bar, range sliders Variable sizes for display, point size and sliders SUN SPARC Station 5 32MB ram UNIX os Motif and C Batch Processing Measurements Setup, Selection, Querying, File read, Data structure setup, Sub-selection, Sub-setup Repeated without graphics for ‘pure’ query time Varied the following: Total attributes, total records, starfield size, point sizes on starfield, range slider sizes and jump sizes Worst Case Analysis Experiments ii info vis - spr 2001
Experiments and Results 7200 runs 3600 with starfield display and preview bar disabled 3600 with starfield display and preview bar 1st 3600: Pure query no more than 20 milliseconds (average 10 milliseconds) faster Time to update internal data structures negligible compared to starfield update times Attributes (a) 2,4,6,8 or 10 Starfield Size (f) 4002, 5002 or 6002 pixels Point Size (d) 12, 32, 52 or 72 pixels Range Slider Size (p) 150, 200, 250 pixels Dataset Size (r) 10,000, 25,000, 50,000, 75,000 or 100,000 records Jump Size/range slider size (j/p) 1/50, 1/25, 1/10, or 1/5 Experiments iii info vis - spr 2001
Experimental Complexity Complexity analysis based on 2nd 3600 queries with starfield display and preview bar enabled Ideally, experimental results will equate with theoretical prediction Predicted run-time to hopefully equal experimental run-time Multiple linear regression used to generate best fit line from data Error compensation Experimental error Algorithmic error Setup Selection Querying Experiments iv info vis - spr 2001
Experiments v info vis - spr 2001
Evaluation X2 test to assess correlation between experiments and theoretical terms Actual values for Setup, Selection, Query highly correlated with estimated values Predictive test to see if future outcomes can be estimated based on past experiment Setup deviation 9.5% Selection deviation: 3.97% Querying deviation: 16.63% Discussion Incremental approach achieved better querying and display times than previous implementations using standard data structures Consumed less memory Data structures created when needed Preview bar, histogram, tight coupling info without making additional queries or spending additional processor time Memory is secondary problem compared to starfield updates which is significant for large r Experiments vi info vis - spr 2001
Conclusions The new incremental approach for queries and display updates introduces a better way of dealing with large databases. Experiments show this approach is faster than previous approaches and can deal with an order of magnitude of larger datasets. The querying time is dominated by the starfield update time. The incremental approach enables faster display because only the difference between consecutive queries is updated in the data structures and on the starfield display. Future Work Make another order of magnitude increase in size of datasets that DQIs can handle Implement other widgets Try spatial data structures like K-D tree Combine DQIs with query previewer to produce a new state of the art in interactive dynamic database access Conclusions & Future Work info vis - spr 2001
Data Visualization Sliders AT&T Bell Laboratories Stephen G. Eick
Abstract • Computer sliders are a generic user input mechanism for specifying a numeric value from a range. For data visualization, the effectiveness of sliders may be increased by using the space inside the slider as • An interactive color scale, • A barplot for discrete data, and • A density plot for continuous data. • The idea is to show the selected values in relation to the data and its distribution. Furthermore, the selection mechanism may be generalized using a painting metaphor to specify arbitrarily, disconnected intervals while maintaining an intuitive user-interface. info vis - spr 2001
Outline Introduction Data Visualization Sliders Summary Keywords High Interaction Thresholding Information Visualization Selection Dynamic Graphics Outline & Keywords info vis - spr 2001
Sliders are a general-purpose user input mechanism to specify an input from a well-defined range Sliders are easy to use, intuitive, and provide a sensitive mechanism for specifying values. Sliders have a threshold bar positioned within a scale that the user manipulates with a mouse to select a value. There are many slider or slider-like applications Common idea is filtering. The pruning of visual clutter from data-rich displays by adjusting sliders is particularly effective in information visualization, and even more so when done dynamically. Introduction info vis - spr 2001
Eick improves sliders - use of internal slider space For color scale For data values ‘tick marks’ Distribution of data Density plot or bar length ‘painting metaphor’ Pop-up menu options Toggle slider views Color rescale for increased color fidelity Range zooming for increased scale sensitivity Animation Labeling to print statistic values Interactive partition adjustment Data Visualization Sliders info vis - spr 2001
Data Visualization Sliders • A: combines visualization color scale and slider with interaction techniques • B: extends A by showing smooth distribution • C: maps info to one color coded bar • D: generalizes C to encode info in bar’s length info vis - spr 2001
Generalizes the generic functionality of sliders along several orthogonal directions User can specify disconnected intervals while preserving intuitive slider interface Internal space as a color scale Interactively rebinding colors to active bars Adjusting color divisions Presenting distribution of data Showing individual data values Move between representations under user control Linking sliders to data they control suggests many natural and obvious extensions Summary info vis - spr 2001
the end info vis - spr 2001