A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration. Jinwook Seo and Ben Shneiderman Human-Computer Interaction Lab. & Department of Computer Science University of Maryland, College Park. Hierarchical Clustering Explorer (HCE).
Hierarchical Clustering Explorer (HCE) “HCE enabled us to find important clusters that we don’t know about yet.”
Goal: Find Interesting Features in Multidimensional Data • Finding correlations, clusters, outliers, gaps, … is difficult in multidimensional data • Cognitive difficulties in >3D • Therefore utilize low-dimensional projections • Perceptual efficiency in 1D and 2D • Use Rank-by-Feature Framework to guide discovery
Outliers (figure; labeled points: He, Rn)
Demonstration • Breakfast Cereals • 77 cereals • 8 dimensions (or variables): sugar, potassium, fiber, protein, etc. • US counties census data • 3138 counties • 14 dimensions: population density, poverty level, unemployment, etc.
Low-dimensional Projections • Techniques • General • combination of variables for an axis (e.g., -2X1+X2, X1+2X2) • Axis parallel • a variable for an axis (e.g., X1, X3) • Number of projections • Interface for Exploration
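The difference between the two projection techniques, and how the number of axis-parallel projections grows with the number of dimensions, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `project`, `axis_parallel_counts`, and `axis_parallel_pairs` are hypothetical and do not come from HCE.

```python
from itertools import combinations


def project(row, weights):
    """General projection: one axis value is a weighted sum of the variables.

    An axis-parallel projection is the special case where exactly one
    weight is 1 and all others are 0.
    """
    return sum(w * x for w, x in zip(weights, row))


def axis_parallel_counts(n_dims):
    """Return (number of 1D projections, number of 2D projections)."""
    return n_dims, n_dims * (n_dims - 1) // 2


def axis_parallel_pairs(names):
    """Enumerate every unordered pair of variables (one scatterplot each)."""
    return list(combinations(names, 2))


row = (1.0, 2.0, 5.0)                 # values of X1, X2, X3 for one item
general_axis = project(row, (-2, 1, 0))   # the axis -2*X1 + X2
parallel_axis = project(row, (0, 0, 1))   # the axis X3
one_d, two_d = axis_parallel_counts(14)   # a 14-dimensional data set
```

With 14 dimensions there are 14 histograms and 91 scatterplots to look at, which is why an interface that orders them matters.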
Exploration by Projections • XGobi, GGobi – Scatterplot Browsing www.research.att.com/areas/stat/xgobi/ www.ggobi.org
Exploration by Projections • Spotfire DecisionSite – Scatterplots www.spotfire.com
Exploration by Projections • XGobi, GGobi – Grand Tour
Exploration by Projections • XmdvTool – Scatterplot Matrix (Worcester Polytechnic Institute)
Square Matrix Display • Corrgram by Michael Friendly • Dimension selection tool in GeoVISTA Studio by Alan M. MacEachren
Exploration by Projections • Spotfire DecisionSite – View Tip orders scatterplots
Design Considerations • Hard to interpret arbitrary linear projections → Axis-parallel projections • Interestingness depends on applications → Incorporate users’ interest • Overview of all possible projections • Rapid change of axis
Demonstration • Breakfast Cereals • 77 cereals • 11 dimensions (or variables): sugar, potassium, fiber, protein, etc. • US counties census data • 3138 counties • 14 dimensions: population density, poverty level, unemployment, etc.
Rank-by-Feature Framework: 1D • Ranking Criterion • Rank-by-Feature Prism • Score List • Manual Projection Browser
Rank-by-Feature Framework: 2D • Ranking Criterion • Rank-by-Feature Prism • Score List • Manual Projection Browser
A Ranking Example • 3138 U.S. counties with 17 attributes • Ranking Criterion: Uniformity (entropy), example scores (6.7, 6.1, 4.5, 1.5) • Ranking Criterion: Pearson correlation, example scores (0.996, 0.31, 0.01, -0.69)
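The two ranking criteria named on this slide can be sketched as follows. This is a minimal sketch, not HCE's implementation: it assumes uniformity is scored as the Shannon entropy of an equal-width histogram (the bin count of 10 is an arbitrary assumption), and the names `histogram_entropy`, `pearson`, `rank_1d`, and `rank_2d` are hypothetical. The sample data is made up and will not reproduce the scores shown on the slide.

```python
import math
from itertools import combinations


def histogram_entropy(values, bins=10):
    """Shannon entropy (bits) of an equal-width histogram; higher = more uniform."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values identical: a single occupied bin
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant variable; treat as 0 here
    return sxy / math.sqrt(sxx * syy)


def rank_1d(columns, score=histogram_entropy):
    """Score every variable and return (score, name) pairs, best first."""
    return sorted(((score(v), name) for name, v in columns.items()), reverse=True)


def rank_2d(columns, score=pearson):
    """Score every pair of variables and return (score, (name_a, name_b)), best first."""
    pairs = combinations(sorted(columns), 2)
    return sorted(((score(columns[a], columns[b]), (a, b)) for a, b in pairs), reverse=True)


# Made-up data: "even" is spread uniformly, "skewed" is almost all zeros.
columns = {
    "even": list(range(100)),
    "skewed": [0] * 99 + [1],
    "double": [2 * v for v in range(100)],
}
ranked_1d = rank_1d(columns)
ranked_2d = rank_2d(columns)
```

Sorting the score list in either direction puts the most and least uniform histograms, or the most positively and most negatively correlated scatterplots, at the two ends of the list.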
Ongoing and Future Work • Identify & implement more ranking criteria • Gaps, outliers, etc. • Ranking based on users’ selection of items • Separability of the selected items • Ranking by using only the selected items • Scalability Issue • How to handle a large number of dimensions • Grouping by clustering dimensions • Filtering uninteresting entries in the prism
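As one possible reading of the "gaps, outliers" item above, additional 1D ranking criteria could look like the sketch below. Both definitions are assumptions chosen for illustration (the 1.5 × IQR rule for outliers, and the largest spacing between sorted neighbors as a fraction of the range for gaps); the slides do not say how HCE would define them, and the function names are hypothetical.

```python
import statistics


def outlier_count(values, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high)


def largest_gap(values):
    """Largest spacing between neighboring sorted values, as a fraction of the range."""
    ordered = sorted(values)
    span = ordered[-1] - ordered[0]
    if span == 0:
        return 0.0  # all values identical: no gap
    widest = max(b - a for a, b in zip(ordered, ordered[1:]))
    return widest / span


sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]   # made-up data with one extreme value
n_outliers = outlier_count(sample)
gap_score = largest_gap(sample)
```

Either function returns one number per variable, so it could be used as a ranking criterion in the same way as a uniformity or correlation score.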
More about HCE • In collaboration with, and sponsored by, Eric Hoffman, Children’s National Medical Center • Freely downloadable at www.cs.umd.edu/hcil/hce • Version 3.0 beta, May 2004 • About 2,000 downloads since April 2002 • Licensed to ViaLactia Biosciences (NZ) Ltd.
More Applications? • Try HCE and the Rank-by-Feature Framework with your problems and data • Join the case studies on the use of HCE and the Rank-by-Feature Framework • Suggestions and comments are welcome