630 likes | 645 Views
Explore and analyze large hierarchical relational databases easily using Polaris. Visualize data cubes, hierarchies, and data abstractions. Supports efficient data mining and statistical analysis.
E N D
PolarisQuery, Analysis, and Visualization of Large Hierarchical Relational Databases Chris Stolte Computer Science Department Stanford University
Motivation • Large relational databases have become very common • Corporate data warehouses • Amazon, Walmart,… • Scientific projects: • Human Genome Project • Sloan Digital Sky Survey • Need tools to extract meaning from these databases • Programmatic data mining/statistical analysis • Visual exploration and analysis
Related Work • Formalisms for graphics • Bertin’s “Semiology of Graphics” • Mackinlay’s APT • Roth et al.’s Sage and SageBrush • Wilkinson’s “Grammar of Graphics” • Visual exploration of databases • DeVise • DataSplash/Tioga-2 • Visualization and data mining • SGI’s MineSet • IBM’s Diamond
Outline • Review of Data Warehouses & Data Cubes • A Visualization Formalism • Polaris: Visual Data Mining • Multiscale Visualization
Review: Data Warehouses Data warehouse stores data for analysis • measures (facts) categorized by dimensions: Fact table State Month Product Name Profit Sales Payroll Marketing Inventory Margin ... Nominal / Ordinal fields (categorical dimensions) Coffee chain (courtesy Visual Insights) Quantitative fields (measures)
Hierarchies Data warehouses are very large—need to summarize • Add hierarchical structure to warehouse Dimension tables Time Year Quarter Month Fact table Location Market State State Month Product Name Profit Sales Payroll Marketing Inventory Margin ... Products Product Type Product Name
Each dimension table describes a tree • Each level in tree describes a level-of-detail • Meaningful basis for aggregation • Derived from semantics or programmatic analysis • Context for data Hierarchical Dimensions Time Year Quarter Month
Data Cube For each level-of-detail, summarize relations as cubes • More efficient, powerful model for analysis Each cell aggregatesall measures for those dimensions Each cube axis corresponds to a dimension in the relation at a level-of-detail
Hierarchies & Data Cubes Hierarchies define a lattice of cubes: Least detailed Each cube is defined by a level-of-detail in each dimension. Data abstraction Most detailed
Projecting Data Cubes Can further abstract a cube by “projection”: Data abstraction
Data Warehouse Summary • Industry standard for storing analytic data • Not operational or transactional data • Structured as a lattice of data cubes (aggregations) • Provide summaries of data at meaningful levels of detail • To perform data abstraction: • Choose a cube in the lattice of cubes • Project to relevant dimensions • Where a lot of important data is stored
A Visualization Formalism • Typical approach: • Monolithic objects defining a single “visual metaphor” • Formalism: • Defines a space of visualization and unifies tables, different graphs as a class of visual representation • Succinct specification of sophisticated visualizations • Can be compiled into necessary drawing operations and database queries • Exposes structure and pattern of effective visual metaphors • Powerful tool for describing, comparing, and building visualizations
Polaris Formalism • Visualizations described using visual specifications that define: • Table configuration for visualization (algebra) • Type of graphic in each pane • Encoding of data as visual properties (color, size, shape, …) of marks • Data transformations and queries • Interpreter compiles a specification into drawing commands and database queries
Polaris Algebra: Operands • Ordinal fields: interpret domain as a set that partitions table into rows and columns: • Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} • Quantitative fields: treat domain as single element set and encode spatially as axes: • Profit = {(Profit)}
Concatenation (+) Operator • Ordered union of two sets • Quarter + ProductType • = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}+{(Coffee),(Espresso)} • = {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)} • Profit + Sales • = {(Profit),(Sales)}
Cross () Operator • Direct-product of two sets • Quarter ProductType = • {(Qtr1,Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), • (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4,Tea)} • ProductType Profit =
Categorical Hierarchies • Quarter Month • Direct product of two sets • Would create twelve entries for each quarter, i.e. (Qtr1, December) • Quarter / Month • Based on tuples in fact table not semantics • Would only create three entries per quarter • Can be expensive to compute • Quarter . Month • Based on tuples in dimension tables
Encoding System <color encoding><measure name=‘Profit’></color encoding>
SQL Dataflow • Notes • Aggregation operators applied after sort • Only one layer is shown; additional z-sort Sort Query Results Tuples in Panes Marks in Panes
The Pivot Table Interface • Common interface to statistical packages/Excel • Cross-tabulations • Simple interface based on drag-and-drop
Extending the Pivot Table Interface • Extend the interface by • Generating rich table-based graphical displays rather than tables of text • Providing a single conceptual model for both graphs and tables • Preserving the ability to rapidly construct displays
Polaris Design Goals Design guided by two primary goals: • Interactive analysis and exploration versus static visualization • Simple, consistent interface
Analysis & Exploration Challenges • Designing a user interface for analysis and exploration places several requirements on the user interface: • Data dense displays:display both many tuples & many dimensions • Multiple display types:different displays suited to different tasks • Exploratory interfaces:rapidly change data transformations and views
Simple, Consistent Interface • Excel Pivot tables provide a simple interface for building text-based tables • Graphs require multiple steps: different interfaces and conceptual models • Want to unify tables, graphs, and database queries in one interface
Data Mining and Visualization • Polaris not solely for visual analysis • Precursor to algorithmic analysis • Validate results and establish trust • Incorporate decision trees and classification algorithms into data warehouses as hierarchies
Multiscale Visualization • Directly support analysis process • “Overview first, zoom and filter, then details-on-demand” • Visual representation changes as user pans and zooms • Overview, lots of data data highly abstracted • Zoom, data density decreases more detailed information shown • Visual and data abstraction • Visual abstraction different representation/same data • Data abstractiontransformations to reduce data set size
Existing Multiscale Visualizations • Cartography • Multiscale information visualization • Pad++: alternate desktops • DataSplash • XmdvTool • ADVIZOR • Main limitations: • One zoom path • Primarily visual abstraction
Contributions • Multiscale visualization with both visual and data abstraction using generalized mechanisms: • Data Abstraction Data Cubes • Visual Abstraction Polaris • “Design Patterns”
Path of Exploration • Can think of an analysis as path of specifications
Path of Exploration Visual abstraction
Path of Exploration This is a multiscale visualization! Dataabstraction
Graphical Notation: Templates Instance Template
Specifying Multiscale Visualizations • Specify multiscale visualization using a graph of Polaris specifications • “zoom graphs” • Infovis 2002 paper* describes how to implement Polaris Specification Zooming Possible zoom
Specifying Multiscale Visualizations • Can specify a zooming pattern by using templates
Specifying Multiscale Visualizations • Independent zooming on different dimensions is described as a graph y-axis zoom x-axis zoom
Design Patterns • Zoom graphs simplify specifying and implementing multiscale visualizations • Design is still very hard • “Design patterns” (a la Gamma et al.) • Capture zoom structures that have been used effectively & reuse in new designs • We present four such patterns • Formal way to discuss multiscale visualization