260 likes | 271 Views
Polaris is a system designed for exploring and analyzing large multi-dimensional databases like corporate data warehouses and scientific projects. It extends traditional pivot tables to generate rich graphical displays for effective data exploration, maintaining a simple and consistent interface for interactive analysis.
E N D
Polaris: A System for Query, Analysis, & Visualization of Relational Databases Chris Stolte May 29th, 2002
Motivation • Large multi-dimensional databases have become very common • corporate data warehouses • Amazon, Walmart,… • scientific projects: • Human Genome Project • Sloan Digital Sky Survey • Need effective tools for exploration and analysis of these databases
Existing Tools: Charts • typically provide a “gallery” of charts • hard to iteratively explore • simple charts can display few dimensions
Existing Tools: Pivot Tables • common interface to data warehouses • simple interface based on drag-and-drop • generate text tables from databases:
Polaris: Extending Pivot Tables • generate rich table-based graphical displays rather than tables of text • single conceptual model for both graphs and tables • preserve ability to rapidly construct displays
Polaris Design Goals Two main design goals: • Interactive analysis and exploration versus static visualization • Simple, consistent interface
Design Goal: Analysis & Exploration • Want to extract meaning from data • Process of hypothesis, experiment, and discovery • Path of exploration is unpredictable
UI Requirements for Exploration • Data dense displays: display both many tuples & many dimensions • Multiple display types:different displays suited to different tasks • Exploratory interfaces:rapidly change data transformations and views
Design Goal: Simple, Consistent UI • Excel Pivot tables provide a simple interface for building text-based tables • Graphs require multiple steps: different interfaces and conceptual models • Want to unify tables, graphs, and database queries in one interface
Display Types Gantt charts of events for a parallel graphics application on a 32-processor SGI machine. Flights between major airports in the USA Source code colored by cache misses for a parallel graphics application. Major wars and the births of well known scientists as a timeline.
Polaris Formalism • UI interpreted as visual specification (in XML) that defines: • table configuration • type of graphic in each pane • encoding of data as visual properties of marks • data transformations • Specification then compiled into queries & drawing commands to generate visualization
Design Decision: Use a Formalism • Why a formalism? • unification: unify tables and graphs • expressiveness: build visualizations designers did not think of • interface simplicity: clearly defined semantics and operations • code simplicity: composable language versus monolithic objects • declarative: can state what, not how—allows for optimization, etc.
} table configuration Example specification
Specifying Table Configurations • Interface: define table configuration by dropping fields on shelves • Formalism: shelf content interpreted as expressions in table algebra • Can express extremely wide range of table configurations
Specifying Table Configurations • Operands are the database fields • each operand interpreted as a set {…} • quantitative and ordinal fields interpreted differently • Four operators: • concatenation (+), cross (X), nest (/), dot (.)
Table Algebra: Operands • Ordinal fields: interpret domain as a set that partitions table into rows and columns: Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} • Quantitative fields: treat domain as single element set and encode spatially as axes: Profit = {(Profit[-410,650])}
Quarter + ProductType = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} + {(Coffee), (Espresso)} = {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)} Concatenation (+) operator • Ordered union of set interpretations: Profit + Sales = {(Profit[-310,620]),(Sales[0,1000])}
Cross (x) operator • Cross-product of set interpretations: Quarter x ProductType = {(Qtr1,Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4,Tea)} ProductType x Profit =
Nest (/) operator • Quarter x Month • would create entry twelve entries for each quarter. i.e., (Qtr1, December) • Quarter / Month • would only create three entries per quarter • based on tuples in database not semantics • can be expensive to compute
Dot (.) operator: Hierarchies • Many data warehouses have hierarchical dimensions: • Time:Year, Month, Day • Location:Country, State, Region • Dot (.) works like Nest (/) except it exploits the defined hierarchies • based on semantics not tuples in database • Demo
Formalism • Can mix graph types in single visualization:
Polaris Formalism • Remainder of formalism defined in papers*: • specification of different graph types • encoding of data as retinal properties of marks in graphs • data transformations • translation of visual specification into SQL queries * Relevant papers: Query, Analysis, and Visualization of Hierarchically Structured Data using PolarisChris Stolte, Diane Tang and Pat HanrahanProceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002. Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases (extended paper)Chris Stolte, Diane Tang and Pat HanrahanIEEE Transactions on Visualization and Computer Graphics, Vol. 8, No. 1, January 2002.
Generating Queries • Database queries automatically generated from specification. • Multiple queries required if level-of-detail varies. • Algebraic manipulation can be used to determine minimal set of queries. • Current interpreters can generate SQL, MDX, or Rivet queries.
Related Visualization Projects • Formalisms for Graphics • Wilkinson’s Grammar of Graphics • Bertin’s Semiology of Graphics • Mackinlay’s APT • Visual Exploration of Databases • DeVise, Visage, DataSplash/Tioga-2 • Table-based Visualizations • Table Lens, Spreadsheet for Visualization
Summary • Exploratory visualization versus presentation • Multiple display types—different questions require different visualizations • Polaris: a novel interface for rapidly constructing table-basedgraphical displays from multi-dimensional relational databases • Formalisms: powerful declarative tool for specifying complex graphics and tables