[Visual Analytics] New Directions in Analysis and Visualization Dr Jeremy Walton NAG Ltd, Oxford jeremy.walton@nag.co.uk Research Methods Festival, St Catherine's College, Oxford
Overview
• Introduction
  • NAG, HECToR
• Visualization
  • distribution, collaboration, steering
• Data mining
  • classification, exploratory analysis
• The ADVISE project
  • large data, interactive analysis
NAG profile
• Products
  • Mathematical, statistical, data analysis components
  • 3D visualization, compilers & tools
  • HPC software engineering services
  • HECToR support
• Users
  • Academic researchers
  • Professional developers
  • Analysts / modelers
• Founded 1976
• Not-for-profit company
High-End Computing Terascale Resource (HECToR)
• Latest high-end computing service for UK
  • funded by EPSRC, NERC & BBSRC
  • will run from 2007-2013
• Partners:
  • Hardware: Cray Inc
  • Service Provision: University of Edinburgh HPCx Ltd
    • hardware hosting, user services, help desk
  • CSE Support: NAG Ltd
    • technical assessment of project applications
    • porting / tuning / optimisation of user codes
    • training courses (inc. visualization)
    • best practice guides, documentation, FAQs
Visualization toolkits
• Help construct visualization applications
  • no wheel-reinvention, stone canoes, chocolate teapots
• Proprietary supported commercial systems
  • e.g. Excel, IRIS Explorer, Spotfire
• Open source, freely available software
  • e.g. OpenDX, InfoVis
NAG’s IRIS Explorer…
• General purpose toolkit for data visualization
• Reusable building blocks (modules)
• Connect modules to build application
• Point-and-click development
• Visual programming approach
• Build, execute, reshape
• Add new modules, if required
…in action
• Application in map editor
• Modules in module librarian
• Reads data
• Colormaps it
• Makes ribbon
• Displays it
Make the connections
Add more modules...
• Adds axes
...and even more
• Adds caption
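The build-up above (read, colormap, add more modules) follows a dataflow pattern: each module consumes the output of the one before it. The following is only a minimal sketch of that idea in Python. It is not IRIS Explorer's API; the module names `read_data`, `colormap`, `add_caption` and the helper `run_map` are hypothetical and invented for illustration.

```python
from functools import reduce

def read_data(_):
    # Hypothetical reader module: returns a small series of values in [0, 1].
    return {"values": [0.0, 0.5, 1.0]}

def colormap(data):
    # Hypothetical colormap module: map each value to a grey level 0-255.
    return {**data, "colors": [int(v * 255) for v in data["values"]]}

def add_caption(data):
    # Hypothetical annotation module: attach a caption to the scene.
    return {**data, "caption": "Demo"}

def run_map(modules, source=None):
    """Pass the output of each module to the next, in the order given."""
    return reduce(lambda data, module: module(data), modules, source)

# Adding a module to the application is just extending the list.
result = run_map([read_data, colormap, add_caption])
print(result)
```

A real visual programming system allows branching connections between modules; the linear chain here is a simplification.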
Some examples
Trendalyzer (Gapminder)
Worldmapper: area
Worldmapper: deaths by disease
Many eyes: shared visualization
NAG Data Mining Tools
• Data Cleaning
  • Data imputation - filling in missing values
  • Outlier detection - finding suspect data records
• Data Transformation
  • Scaling Data - before distance computation
  • Principal Component Analysis - reducing # of variables
• Model fitting
  • Cluster analysis - finding interesting groups
  • Classification techniques - # of groups is known
  • Regression - no groups, outcome is continuous
    • Linear / Non-linear / Time series
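Two of the steps listed above, imputation and scaling before distance computation, can be illustrated with a short NumPy sketch. This does not use the NAG tools: the functions `impute_mean` and `standardise` are hypothetical helpers, mean imputation is just one simple choice of method, and the data values are invented.

```python
import numpy as np

def impute_mean(X):
    """Replace NaN entries with the mean of the observed values in that column."""
    X = np.array(X, dtype=float)          # work on a copy
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def standardise(X):
    """Scale each column to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Invented data: two variables on very different scales, one missing value.
data = np.array([[1.0, 200.0],
                 [2.0, np.nan],
                 [3.0, 400.0],
                 [4.0, 300.0]])

filled = impute_mean(data)
Z = standardise(filled)

# Without scaling, the second variable dominates the Euclidean distance.
raw_dist = np.linalg.norm(filled[0] - filled[2])
scaled_dist = np.linalg.norm(Z[0] - Z[2])
print(raw_dist, scaled_dist)
```

After standardising, both variables contribute on a comparable footing, which is why scaling is listed before distance-based methods such as cluster analysis.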
Example: exploratory data analysis
• How many species of water vole (Arvicola) in UK?
• Measurement data
  • Presence / absence of 13 skull characteristics
  • 300 observations, each in one of 14 regions
  • 3 groups:
    • A. terrestris / A. sapidus / unclassified UK cases
• Treatment
  • Average data within each region
  • Gives 14 data points in 13 dimensions
• How to display dataset?
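The treatment step, averaging presence / absence data within each region, can be sketched as follows. The data here are random stand-ins, not the water vole measurements, and the names `traits`, `region` and `region_means` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_traits, n_regions = 300, 13, 14

# Synthetic stand-in for the skull data: 1 = characteristic present, 0 = absent.
traits = rng.integers(0, 2, size=(n_obs, n_traits))

# Assign each observation to a region, guaranteeing every region appears once.
region = np.concatenate([np.arange(n_regions),
                         rng.integers(0, n_regions, size=n_obs - n_regions)])

def region_means(traits, region, n_regions):
    """Return an (n_regions, n_traits) array of per-region trait frequencies."""
    means = np.zeros((n_regions, traits.shape[1]))
    for r in range(n_regions):
        means[r] = traits[region == r].mean(axis=0)
    return means

X = region_means(traits, region, n_regions)
print(X.shape)  # (14, 13)
```

Each row of `X` is one region, described by the fraction of its observations showing each of the 13 characteristics.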
Analysis
• 2D scatterplots?
  • Structure is unclear
  • (13 x 12) / 2 = 78 plots needed
• Principal components analysis?
  • 2 PCs explain 49% of the variance
  • 3 PCs explain 65% of the variance
  • Should be > 85% for confident representation
  • Fisher’s iris dataset (4 variables) is 95%
• Alternative technique
  • Metric scaling
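Both checks above are easy to reproduce: the number of pairwise scatterplots is the number of ways to choose 2 variables from 13, and the variance explained by the leading principal components follows from the singular values of the centred data. The sketch below uses random data as a stand-in for the 14 x 13 regional matrix, so its percentages will not match the 49% and 65% quoted above; `explained_variance_ratio` is a hypothetical helper.

```python
import numpy as np
from math import comb

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centred data are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

n_vars = 13
n_plots = comb(n_vars, 2)   # pairs of variables: (13 x 12) / 2
print(n_plots)              # 78

rng = np.random.default_rng(1)
X = rng.random((14, n_vars))          # synthetic 14 x 13 data
ratio = explained_variance_ratio(X)
cumulative = np.cumsum(ratio)

# Compare the first 2 and 3 PCs against the 85% rule of thumb.
for k in (2, 3):
    print(k, cumulative[k - 1], cumulative[k - 1] > 0.85)
```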
Metric scaling
• 14 data points – one for each region
  • Each point has values for 13 variables
• Construct 14 by 14 dissimilarity matrix, Δ
  • Δij = distance between points i & j in 13D space
  • Δ is symmetric, with zero diagonal elements
• Want to find a new matrix, Δ*
  • set of 14 new data points in 3D space that preserve Δ
• Project Δ to Δ* using metric scaling
• Display data points in 3D
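The steps above can be sketched with classical (Torgerson) metric scaling in NumPy. Two assumptions are made here that the slides do not state: Δ is taken to be the Euclidean distance matrix, and the classical eigendecomposition variant of metric scaling is used. The data are random stand-ins, and `pairwise_distances` and `classical_mds` are hypothetical helper names.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix: symmetric, with zero diagonal."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k=3):
    """Embed n points in k dimensions so that distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centring matrix
    B = -0.5 * J @ (D ** 2) @ J             # double-centred squared distances
    evals, evecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]     # indices of the k largest
    lam = np.clip(evals[order], 0.0, None)  # guard against tiny negatives
    return evecs[:, order] * np.sqrt(lam)

rng = np.random.default_rng(2)
X = rng.random((14, 13))        # synthetic: 14 regions, 13 variables
D = pairwise_distances(X)       # plays the role of Δ
Y = classical_mds(D, k=3)       # 14 points in 3D
D_star = pairwise_distances(Y)  # plays the role of Δ*

# Relative error between the original and the 3D distances.
stress = np.linalg.norm(D - D_star) / np.linalg.norm(D)
print(Y.shape, stress)
```

The rows of `Y` are the coordinates to plot in 3D. The smaller the relative error, the better the 3D picture preserves the original 13D distances.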
Exploratory data analysis conclusions
• 2D scatterplots don’t indicate group structure
  • cf. iris dataset
• 3D PCA unreliable here
• Metric scaling of Δ used to reduce D from 13 to 3
• 3D visualization reveals group structure
  • Distinct A. sapidus group
  • UK sample represents only A. terrestris
The ADVISE project
• DTI-funded research project, started March 2007
  • NAG / VSN / University of Leeds
• Merge visualization & statistics (visual analytics)
  • use statistics to identify key characteristics of dataset
  • understand the characteristics through visualization
• User community
  • pharmaceuticals
  • environmental science
  • engineering
• Initial user meeting held September 2007
Large datasets
• Size matters (but isn’t everything)
• Developer’s view: Too large for our current system
  • Problems of
    • performance
    • robustness
• User’s view: Too large for me to understand
• Current ADVISE datasets are “only” a few GB
  • complications (e.g. comparing several) could raise this
  • HECToR users have TB datasets
ADVISE ideas
• Retention of visual programming interface
• Re-use of algorithmic base
  • IRIS Explorer modules
  • GenStat statistics functionality (from VSN)
• Three layered architecture
  • User interface
  • Web service middleware
  • Visualization components
• Distribution, tailored user interface, collaboration
ADVISE progress
• Porting IE modules to standalone environment
  • some of these use GenStat for statistics
• New system used to revisit air quality demo
  • early (IEEE Viz 96) web-based visualization
  • new system more efficient
• Working with real user data
Conclusions
• NAG offers software components for developers
  • no wheel-reinvention, stone canoes, chocolate teapots
• Visualization & data mining crucial for analysis
  • distribution, steering, classification, exploration
  • interactivity / interrogation important
  • integration is an ongoing field of activity
• ADVISE project
  • developing a new system for visual analysis
  • working with real user problems
  • improving understanding of data