Data Visualisation

HCI 0283 Lecture 3: Quantitative Data

Basic Task • When dealing with quantitative data we are trying to select one item from many based upon the numerical values of their attributes • For instance, when buying a car we may consider price, fuel consumption, engine size, number of passengers… • We need to be able to present this data and allow it to be rearranged in a manner that makes the decision task easier

Dimensionality • The complexity of a display is generally dependent upon the number of attributes (variables) involved • This is the dimensionality of the problem • If we have only one attribute (such as price) then we are dealing with univariate data and the problem is relatively straightforward • If we add a second attribute (such as fuel consumption) then we are dealing with bivariate data and the problem is a little more difficult • Three attributes – trivariate data – leads to even more difficulty in display, and anything more than that – hypervariate data – is a very difficult task indeed • We will consider, in turn, univariate, bivariate, trivariate and hypervariate data and how we can display it

Univariate Data • Suppose we have a list of cars that are characterised by price • This is an example of a collection of univariate data • We could present this data as a table or, more effectively, as a plot of points against some scale • Which we choose may depend on how much space is available

Simple Plot • The simple plot makes it easy to see the lowest and highest values, the general distribution of points, and any bunching of points • It is not easy to add labels • It is not easy to make any judgements based on aggregation, e.g. the mean price • (Figure: points plotted along a Price (£K) axis running from 0 to 60)

Tukey Plots • The easiest way to introduce aggregation is to use a Tukey Box Plot • This shows the 25th, 50th and 75th percentiles as a box, the 10th and 90th percentiles as lines, and anything beyond these as ‘outliers’ • (Figure: box plot on a Price (£K) axis running from 0 to 60)
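
The percentiles named on this slide can be computed directly. The following is a minimal sketch in Python, assuming linear interpolation between ranked values (other percentile conventions give slightly different numbers). The price list and the names `percentile` and `box_summary` are invented for illustration.

```python
def percentile(sorted_values, p):
    """Return the p-th percentile (0-100) of an already sorted list,
    using linear interpolation between neighbouring ranks."""
    if not sorted_values:
        raise ValueError("no data")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def box_summary(values):
    """Summarise values as on the slide: box, lines and outliers."""
    ordered = sorted(values)
    p10, p25, p50, p75, p90 = (percentile(ordered, p) for p in (10, 25, 50, 75, 90))
    return {
        "box": (p25, p50, p75),      # 25th, 50th and 75th percentiles
        "lines": (p10, p90),         # 10th and 90th percentiles
        "outliers": [v for v in ordered if v < p10 or v > p90],
    }


# Hypothetical car prices in £K, for illustration only.
prices = [8, 12, 14, 15, 17, 18, 21, 24, 30, 38, 55]
summary = box_summary(prices)
print(summary)
```

With this made-up list the box spans 14.5 to 27 with the median at 18, and the two extreme prices fall outside the 10th and 90th percentile lines.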

Simple Plots • It is very useful to be able to zoom in to examine particular ranges in detail • BUT zooming should reveal more detail, not larger dots! • (Figure: the 30 to 40 range of the price axis enlarged, with points labelled Volvo, Rover, Merc, BMW and Saab)
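
One way to read "more detail, not larger dots" is that the zoomed view should attach information that the overview had no room for, such as labels. The sketch below illustrates that idea only; the function names and all prices are hypothetical.

```python
# Hypothetical (make, price in £K) pairs; the prices are invented for illustration.
cars = [("Ford", 12), ("Volvo", 31), ("Rover", 33), ("BMW", 36), ("Saab", 39), ("Merc", 52)]


def overview(cars):
    """Whole range: positions only, since there is no room for labels."""
    return sorted(price for _, price in cars)


def zoom(cars, low, high):
    """Narrow range: the same points, now with room to show each label."""
    return sorted((price, make) for make, price in cars if low <= price <= high)


print(overview(cars))
print(zoom(cars, 30, 40))
```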

Bivariate Data • Bivariate data is often displayed using a two-dimensional plot of one variable against the other • Such scatterplots allow us to easily identify global trends, trade-offs and outliers • (Figure: scatterplot of Number of Bedrooms, 1 to 6, against Price, £50K to £300K)

Bivariate Data • If we can control one variable and therefore group the data then we can create a display of multiple boxplots • This makes it easy to make comparisons of aggregate measures between groups • This is beginning to look a bit like a histogram…

Multiple Boxplot
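
Grouping by the controlled variable is the step that makes multiple boxplots possible. As a rough sketch, assuming house prices grouped by number of bedrooms (all values below are invented), the per-group median can be computed like this; the same grouping would feed any other aggregate shown in each box.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (bedrooms, price in £K) pairs, for illustration only.
houses = [(2, 90), (2, 110), (2, 120), (3, 140), (3, 150), (3, 190), (4, 210), (4, 260)]


def group_by_first(pairs):
    """Collect the second element of each pair under its first element."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


groups = group_by_first(houses)
# One aggregate per group: this is what each box in the display summarises.
medians = {bedrooms: median(prices) for bedrooms, prices in sorted(groups.items())}
print(medians)
```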

Histograms • Histograms can also be used to show categorised bivariate data • Simple histograms are of limited use, but we can extend their usefulness by making them interactive • With two histograms we can show how the data in one histogram is related to the data in the other

(Figure: two histograms, one for Attribute A and one for Attribute B)
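
The link between two histograms can be sketched as follows: selecting a bar in the histogram of A picks out a subset of records, and the histogram of B is then recomputed for that subset so it can be highlighted. This is only one possible reading of the interaction; the records, bin widths and function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical records with two attributes, A and B, for illustration only.
records = [
    {"A": 12, "B": 3}, {"A": 18, "B": 7}, {"A": 25, "B": 4},
    {"A": 27, "B": 8}, {"A": 33, "B": 9}, {"A": 38, "B": 2},
]


def bin_index(value, width):
    """Index of the fixed-width bin containing value (bins start at 0)."""
    return int(value // width)


def histogram(rows, attribute, width):
    """Count how many rows fall in each bin of the given attribute."""
    return Counter(bin_index(row[attribute], width) for row in rows)


def linked_counts(rows, select_attr, select_bin, select_width, other_attr, other_width):
    """Histogram of other_attr restricted to rows in the selected bin."""
    selected = [r for r in rows if bin_index(r[select_attr], select_width) == select_bin]
    return histogram(selected, other_attr, other_width)


hist_a = histogram(records, "A", 10)
# Select the A bin covering 20-29 and see where those records fall in B.
highlight_b = linked_counts(records, "A", 2, 10, "B", 5)
print(hist_a, highlight_b)
```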

Coffee Time!

Trivariate Data • We live in a three-dimensional world, so surely a three-dimensional data display is natural, right? • Yes, if you can build a physical model, use virtual reality or use a holographic display • Most people have to make do with two-dimensional monitors or sheets of paper

Trivariate Data • Can we decide if A has a greater value of Price than B? • No • We could aid this by projecting each point onto each axis • (Figure: 3D plot of points A, B, C and D on Price, Bedrooms and Travel Time axes)

Trivariate Data • (Figure: points A, B, C and D shown against the Bedrooms, Price and Travel Time axes)

Scatterplot Matrix • (Figure: scatterplots of points A, B, C and D for pairs of Price, Travel Time and Bedrooms)

Trivariate Data • The scatterplot matrix contains as many scatterplots as there are pairs of parameters • For more than five parameters this becomes unworkable • If we have N objects with M parameters there are M(M−1)/2 plots of N points each, which for three parameters gives N×3 points in total • Labelling this many points is a problem
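
The growth in the number of plots can be checked by enumerating the pairs of parameters. In this sketch each pair gives one scatterplot holding all N objects, so three parameters give three plots and N×3 points. The function names are illustrative.

```python
from itertools import combinations


def scatterplot_pairs(parameters):
    """Every unordered pair of parameters gets one scatterplot."""
    return list(combinations(parameters, 2))


def total_points(n_objects, parameters):
    """Each plot shows every object once."""
    return n_objects * len(scatterplot_pairs(parameters))


trivariate = ["Price", "Bedrooms", "Travel Time"]
print(scatterplot_pairs(trivariate))
print(total_points(4, trivariate))             # objects A, B, C and D
print(len(scatterplot_pairs(list("UVWXYZ"))))  # six parameters
```

Six parameters already need 15 plots, which is consistent with the slide's remark that the matrix becomes unworkable beyond about five.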

Brushing • The scatterplot matrix is still a very useful tool if we use the brushing technique to highlight points of interest • If we select (or brush) a subset of points on one plot then the corresponding points on other plots are also highlighted • Brushing is particularly useful when dealing with hypervariate data and can be implemented in many ways

Brushing • (Figure: linked scatterplots with axes Price, Travel Time and Bedrooms)
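
Brushing can be sketched as two steps: select the objects that fall inside a range on one plot, then look up the same objects' coordinates on every other plot. The code below is one simple way to do this, assuming a one-dimensional brush over a single attribute. The house data and the names `brush` and `highlighted_points` are hypothetical.

```python
# Hypothetical houses A to D; the values are invented for illustration.
houses = {
    "A": {"price": 120, "bedrooms": 2, "travel_time": 45},
    "B": {"price": 150, "bedrooms": 3, "travel_time": 30},
    "C": {"price": 210, "bedrooms": 4, "travel_time": 20},
    "D": {"price": 260, "bedrooms": 5, "travel_time": 60},
}


def brush(items, attribute, low, high):
    """Names of the items whose attribute lies in [low, high]."""
    return {name for name, row in items.items() if low <= row[attribute] <= high}


def highlighted_points(items, brushed, x_attr, y_attr):
    """Coordinates to highlight on another plot for the brushed items."""
    return {name: (items[name][x_attr], items[name][y_attr]) for name in sorted(brushed)}


selected = brush(houses, "price", 100, 160)
on_other_plot = highlighted_points(houses, selected, "travel_time", "bedrooms")
print(selected, on_other_plot)
```

Because the selection is held as a set of object names rather than screen positions, the same set can drive the highlighting on any number of linked plots.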

Trivariate Data • It is also difficult to interpret surfaces • Simple questions such as ‘what is the minimum value of Z?’ are difficult to answer • There are two main approaches to this • Flooding, i.e. slicing through the surface at the desired value • Rotating the surface
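
Flooding can be illustrated on a small grid of Z values. The sketch below treats "flooded" as meaning every cell at or below the chosen level, which is an assumption about how the slice is drawn. Raising the level step by step until the first cell floods then answers the minimum-value question. The grid is invented.

```python
def flooded(surface, level):
    """Grid cells (row, column) whose Z value lies at or below the level."""
    return [
        (r, c)
        for r, row in enumerate(surface)
        for c, z in enumerate(row)
        if z <= level
    ]


def first_flood_level(surface, levels):
    """Lowest of the candidate levels at which any cell is flooded."""
    for level in sorted(levels):
        if flooded(surface, level):
            return level
    return None


# Hypothetical Z values sampled on a 3 x 3 grid.
surface = [
    [5, 4, 6],
    [3, 1, 4],
    [6, 2, 7],
]

cells_at_2 = flooded(surface, 2)
lowest = first_flood_level(surface, range(0, 8))
print(cells_at_2, lowest)
```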

Trivariate Data • Rotate it…

Hypervariate Data • Many real-world situations require us to display more than three variables • One solution is to use parallel coordinate plots • These take all of the axes of a multidimensional space and arrange them parallel to each other • Each data point appears once on each axis

Parallel Coordinate Plot • (Figure: points A, B, C and D shown on Price and Travel Time axes)

Multivariate Data • The parallel coordinate plot is easy to extend to any desired number of dimensions, with each dimension being treated equally
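
The construction of a parallel coordinate plot can be sketched as a mapping from each object to one position per axis. The code below assumes each axis is scaled independently from its minimum to its maximum, which is a common choice but not the only one. The data and function name are hypothetical.

```python
def parallel_coordinates(items, attributes):
    """Map each item to a list of (axis_index, position) points.

    Positions are scaled to [0, 1] along each axis using that
    attribute's own minimum and maximum.
    """
    ranges = {}
    for attr in attributes:
        values = [row[attr] for row in items.values()]
        ranges[attr] = (min(values), max(values))

    lines = {}
    for name, row in items.items():
        line = []
        for axis, attr in enumerate(attributes):
            low, high = ranges[attr]
            # A constant attribute has no spread, so place it mid-axis.
            position = 0.5 if high == low else (row[attr] - low) / (high - low)
            line.append((axis, position))
        lines[name] = line
    return lines


# Hypothetical houses A to D; the values are invented for illustration.
houses = {
    "A": {"price": 120, "bedrooms": 2, "travel_time": 45},
    "B": {"price": 150, "bedrooms": 3, "travel_time": 30},
    "C": {"price": 210, "bedrooms": 4, "travel_time": 20},
    "D": {"price": 260, "bedrooms": 5, "travel_time": 60},
}

lines = parallel_coordinates(houses, ["price", "bedrooms", "travel_time"])
print(lines["A"])
```

Adding a dimension only means appending another attribute name to the list, which reflects the point that every dimension is treated equally.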

Hypervariate Data • A variation on this approach is to use a starplot • In this case the axes radiate from a common origin • This is similar to Florence Nightingale’s original ‘batwing’ plots • Multiple objects can be compared on the basis of their shapes

Starplot • (Figure: starplot with seven axes labelled A to G)
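
The shape of a starplot comes from placing one vertex on each radiating axis. As a sketch, assuming the axes are evenly spaced in angle and each value is scaled by a stated maximum for its axis, the vertices for one object can be computed as follows. The function name and sample values are illustrative.

```python
import math


def starplot_vertices(values, maxima):
    """Polygon vertices for one object.

    Axis k points at angle 2*pi*k/n from the common origin, and the
    vertex sits at distance value/maximum along that axis.
    """
    n = len(values)
    vertices = []
    for k, (value, maximum) in enumerate(zip(values, maxima)):
        angle = 2 * math.pi * k / n
        radius = value / maximum
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


# Hypothetical object measured on four attributes, each with maximum 10.
vertices = starplot_vertices([5, 10, 5, 10], [10, 10, 10, 10])
for x, y in vertices:
    print(round(x, 3), round(y, 3))
```

Joining the vertices in order gives the polygon whose shape is compared between objects.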

Hypervariate Data • Mosaic plots can also be used to represent and rearrange hypervariate data • If we add gender to the eye and hair colour data in the mosaic plot example we used last week then we can extend the mosaic plot to show this

(Figure: mosaic plot split by Male / Female)
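
The geometry of a two-variable mosaic plot can be sketched as follows: the unit square is cut into columns whose widths follow the totals of the first variable, and each column is cut into tiles whose heights follow the second variable within that column. Adding a further variable such as gender would split each tile again in the same way. The counts below are invented and are not the eye and hair colour data from the lecture.

```python
def mosaic_tiles(table):
    """Return {(column, row): (x, y, width, height)} within a unit square.

    table maps each column category to {row category: count}.
    Assumes every column contains at least one observation.
    """
    total = sum(sum(cells.values()) for cells in table.values())
    tiles = {}
    x = 0.0
    for column, cells in table.items():
        column_total = sum(cells.values())
        width = column_total / total          # share of all observations
        y = 0.0
        for row, count in cells.items():
            height = count / column_total     # share within this column
            tiles[(column, row)] = (x, y, width, height)
            y += height
        x += width
    return tiles


# Hypothetical hair colour by eye colour counts, for illustration only.
counts = {
    "Dark": {"Brown": 30, "Blue": 10},
    "Fair": {"Brown": 15, "Blue": 45},
}

tiles = mosaic_tiles(counts)
for key, tile in tiles.items():
    print(key, tile)
```

Each tile's area equals its share of the total count, which is what lets the eye compare cells directly.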

Hypervariate Data • The extension to four dimensions is also straightforward • In April 1912 the cruise liner Titanic sank, killing 1731 of the 2201 people on board • The raw data on these deaths contains four variables – Gender, Survival, Class and Adult/Child

Titanic Raw Data

Hypervariate Data • The Scatterbox principle can also be extended to large numbers of dimensions • The resulting structure is a hyperbox • This looks like a 3D structure constructed so that all possible pairs of variables are shown plotted against each other • This is best used interactively so that the face of interest can be rotated to the front

Hyperbox • Each pair of numbers represents a pair of dimensions • Each face is a bivariate display • Rotating and deforming the object allows each face to be easily viewed and interpreted • (Figure: hyperbox with faces labelled 12, 13, 14, 15, 23, 24, 25, 34, 35 and 45)
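
The face labels follow directly from pairing up the dimensions, and each face is simply the data projected onto its two dimensions. The sketch below assumes dimensions are numbered from 1 and that there are at most nine, so that each label is two single digits. The sample rows are invented.

```python
from itertools import combinations


def face_labels(n_dimensions):
    """One two-digit label per pair of dimensions, e.g. '13'."""
    return [f"{i}{j}" for i, j in combinations(range(1, n_dimensions + 1), 2)]


def face_points(rows, label):
    """Project each row onto the two dimensions named by a face label."""
    i, j = int(label[0]) - 1, int(label[1]) - 1
    return [(row[i], row[j]) for row in rows]


# Hypothetical five-dimensional observations.
rows = [(1, 2, 3, 4, 5), (10, 20, 30, 40, 50)]

labels = face_labels(5)
print(labels)
print(face_points(rows, "13"))
```

Five dimensions give the ten faces labelled on the slide.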

Summary • Univariate: 1 dimension, Tukey Box Plots • Bivariate: 2 dimensions, scatterplots, histograms • Trivariate: 3 dimensions, scatterplot matrix, brushing • Hypervariate: >3 dimensions, parallel coordinate plots, mosaic diagrams, starplots, hyperboxes

Coming Soon… • Next lecture: Representation • Homework: Read chapter 3 of Information Visualisation (Spence) and the two papers handed out in the lecture
