SSPC9C6 University of Stirling Spring 2016. Graphs & Charts: The Art of Data Visualisation. Alasdair Rutherford. Introduction. What is data visualisation, and why do we need it? Graphs and Charts: Some common forms of presenting data Developments in Data Visualisation
SSPC9C6 University of Stirling Spring 2016 Graphs & Charts: The Art of Data Visualisation Alasdair Rutherford
Introduction • What is data visualisation, and why do we need it? • Graphs and Charts: Some common forms of presenting data • Developments in Data Visualisation • Choosing the right visualisation • “The Chart Never Lies”: How visualisation can mislead
Presenting Data • Data visualisation is not just about pretty pictures; it is about selecting the best way to present complex data. • It can be used to summarise data, to identify potential patterns, to present your data, or to tell a story.
Florence Nightingale & Data • Data visualisation has a long history – one of the pioneers was Florence Nightingale. • Watch: • http://www.youtube.com/watch?v=yhX0OR1_Vfc
The right chart at the right time • Commonly used graphs and charts: • Bar charts & histograms • Line graphs • Pie charts • Scatterplots • Generated using: • Spreadsheets, such as Microsoft Excel • Statistical software, such as SPSS or Stata • Online tools
Bar Charts • Bar charts can be used to present categorical data. • Categories are shown along the x-axis, while data such as averages, percentages or frequencies are represented on the y-axis. • Best when you want to show proportional as well as absolute differences.
Example: Bar Chart of Twitter Activity • Days of week categories on the x-axis • Percentage of activity on the y-axis • SOURCE: Sysomos (2009) http://www.sysomos.com/insidetwitter/appendix/
Example: Bar Chart showing percentages • Height of the bars shows percentage • Age categories on the x-axis • Categories are ordered • SOURCE: Sysomos (2009) http://www.sysomos.com/insidetwitter/appendix/
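The step from categorical observations to the percentages a bar chart displays can be sketched in a few lines of Python. This is a minimal illustration, not the method behind the Sysomos charts: the `posts` data and the function names are hypothetical, and the bars are drawn horizontally as text so that no plotting library is needed.

```python
from collections import Counter

# Hypothetical categorical observations: the day of the week of each post.
posts = ["Mon", "Tue", "Tue", "Wed", "Wed", "Wed", "Thu", "Thu", "Fri", "Sat"]
day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def percentage_by_category(observations, categories):
    """Return the percentage of observations in each category, in the given order."""
    counts = Counter(observations)
    total = len(observations)
    return {c: 100 * counts[c] / total for c in categories}


def text_bar_chart(percentages, width=40):
    """Render one text bar per category; bar length is proportional to the percentage."""
    lines = []
    for category, pct in percentages.items():
        bar = "#" * round(width * pct / 100)
        lines.append(f"{category:<4}{bar} {pct:.0f}%")
    return "\n".join(lines)


percentages = percentage_by_category(posts, day_order)
print(text_bar_chart(percentages))
```

Keeping the categories in a fixed order (here the days of the week) matters for ordered categories: the bars then read in their natural sequence, and an empty category such as Sunday still appears with a zero-length bar.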
Line Graphs • Line graphs are often used to represent continuous data. • They can also be used for ordered categorical data that approximates a continuous variable. • Best when you want to show changes across the values of the X variable.
Example: Line Graph • The shape of the line gives a sense of the trend • Continuous data on the x-axis • SOURCE: Sysomos (2009) http://www.sysomos.com/insidetwitter/appendix/
Pie Charts • Pie charts are an accessible way to represent percentages and proportions. • They require categorical data, which may or may not be ordered. • They also require that the categories are mutually exclusive and cover every case, i.e. the percentages sum to 100%.
Example: Pie Chart of User Ages • The area of the segment represents the percentage • Easier to see group combinations e.g. two thirds under 25 • SOURCE: Sysomos (2009) http://www.sysomos.com/insidetwitter/appendix/
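The requirement that pie chart percentages sum to 100% can be checked before drawing anything. The sketch below converts counts into percentages and segment angles; the age bands, the counts and the function name `pie_angles` are illustrative assumptions rather than the Sysomos figures.

```python
import math


def pie_angles(counts):
    """Convert category counts to (percentage, angle in degrees) per segment."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }


# Hypothetical counts of users by age band (illustrative only).
age_counts = {"Under 25": 60, "25-44": 30, "45+": 10}
segments = pie_angles(age_counts)
for category, (pct, angle) in segments.items():
    print(f"{category}: {pct:.0f}% of users, {angle:.0f} degree segment")

# The segments must fill the whole circle.
assert math.isclose(sum(pct for pct, _ in segments.values()), 100)
assert math.isclose(sum(angle for _, angle in segments.values()), 360)
```

If respondents could fall into more than one category, the counts would add up to more than the number of respondents and the segments would no longer represent shares of a single whole, which is why a bar chart is the safer choice in that case.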
Scatterplots • Scatterplots allow the comparison of two metric (interval or ratio) variables. • They are particularly useful for identifying patterns of association ('correlations') between the two variables, and for spotting 'outliers' (cases with unusually high or low values) within data sets.
Example: Scatterplot of Spouse Age • Each dot represents one observation e.g. couple • Outliers can be identified • The correlation seems quite clear • Axes: HUSBAND, WIFE • SOURCE: BHPS (2008)
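What the eye picks up in a scatterplot, the strength of the association and the unusual cases, can also be put into numbers. The sketch below computes a Pearson correlation by hand and flags couples with a large age gap. The ages are invented for illustration and are not BHPS data, and the 15-year threshold is an arbitrary assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ages of eight couples (one observation per couple).
husband = [25, 31, 38, 44, 52, 60, 67, 70]
wife = [24, 29, 37, 41, 50, 58, 66, 35]

# Flag couples whose age gap exceeds an assumed threshold of 15 years.
MAX_GAP = 15
outliers = [(h, w) for h, w in zip(husband, wife) if abs(h - w) > MAX_GAP]
typical = [(h, w) for h, w in zip(husband, wife) if abs(h - w) <= MAX_GAP]

r_all = pearson_r(husband, wife)
r_typical = pearson_r([h for h, _ in typical], [w for _, w in typical])

print(f"Outliers: {outliers}")
print(f"Correlation with all couples: {r_all:.2f}")
print(f"Correlation without outliers: {r_typical:.2f}")
```

A single unusual couple noticeably lowers the correlation in a sample this small, which is one reason to look at the scatterplot itself before relying on a summary statistic.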
Box (and whisker) plots • A box plot is a graphical display, based on quartiles, which helps us to picture the range of values in a variable (the distribution of scores). • You can use them to explore the distribution of one continuous variable, or alternatively you can ask for scores to be broken down for different groups (e.g. age groups).
Example: Box Plot • Contains a lot of information in one graph • Helps to understand the centre, range and shape of data • The statistics here will be introduced later in this course
Example: Box plot by Country • Variable of interest • Here the data for the USA has a much bigger spread than the data for France • Plots by category e.g. country
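The quartiles that a box plot draws can be computed with Python's standard `statistics` module. The sketch below produces a five-number summary per group and compares the spread using the interquartile range. The scores for the two countries are made up for illustration, and statistical packages use slightly different quartile conventions, so values from SPSS or Stata may differ a little.

```python
import statistics


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical scores broken down by country (illustrative only).
scores = {
    "USA": [10, 20, 35, 50, 65, 80, 95, 110, 130],
    "France": [40, 45, 50, 52, 55, 58, 60, 65, 70],
}

summaries = {country: five_number_summary(v) for country, v in scores.items()}
for country, s in summaries.items():
    iqr = s["q3"] - s["q1"]  # height of the box
    print(f"{country}: {s}, IQR = {iqr}")
```

The box runs from the lower to the upper quartile with a line at the median, so a taller box (a larger interquartile range) signals a wider spread in the middle half of the data.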
Developments in visualisation • Improvements in technology and design, alongside developments in data accessibility, have led to the emergence of a number of tools to make presenting data more accessible. • This combines statistical analysis with good design principles, using data to tell a story.
Example: Area plot of CO2 Emissions • Numerical data also included • Areas are easily compared • Alternative to a bar chart • Reference point • SOURCE: http://www.informationisbeautiful.net/
Example: Population Proportions • Each ‘person’ represents 1% • Also conveys information on gender balance • Alternative to a pie chart • SOURCE: http://www.informationisbeautiful.net/
How do you choose a visualisation? • What sort of data are you using? • What characteristics of the data do you need to communicate? • Who is the audience?
Lies, Damn Lies and Data Visualisation • Like the numbers themselves, data visualisation can be used to mislead, either accidentally or intentionally. • You need to be particularly careful when using percentages, and also pay attention to scales.
Top 3 tips for BAD data visualisation • i.e. what NOT to do! • Cram everything you can into the chart – readability is overrated • Choose the scale to hide the inconvenient truth • Emphasise the trivial and ignore the important
Example: Difficult to Read Line Graph SOURCE: Wainer (1984) “How to Display Data Badly” The American Statistician, Vol. 38, No. 2
Example: Misleading Bar Chart SOURCE: http://flowingdata.com/category/statistics/mistaken-data/
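One common way that the choice of scale misleads is a bar chart whose y-axis does not start at zero. The sketch below shows how much such an axis exaggerates a difference. It is a general illustration under assumed numbers, not a reconstruction of the chart cited above; the values and the function name `drawn_height_ratio` are hypothetical.

```python
def drawn_height_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b when the y-axis starts at axis_start."""
    if min(a, b) <= axis_start:
        raise ValueError("axis_start must be below both values")
    return (b - axis_start) / (a - axis_start)


# Hypothetical values: a rise from 50 to 52, i.e. a 4% increase.
before, after = 50.0, 52.0

honest = drawn_height_ratio(before, after)                      # axis starts at 0
truncated = drawn_height_ratio(before, after, axis_start=48.0)  # axis starts at 48

print(f"Axis from 0:  second bar is {honest:.2f} times the height of the first")
print(f"Axis from 48: second bar is {truncated:.2f} times the height of the first")
```

With the axis starting at 48, the second bar is drawn twice as tall as the first even though the underlying value rose by only 4%, so the reader's impression depends on the axis rather than on the data.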
Example: Concealing the data SOURCE: Wainer (1984) “How to Display Data Badly” The American Statistician, Vol. 38, No. 2
Summary • There is a wide range of data visualisations available • Selecting the right graph or chart can help you describe your data • Care must be taken in choosing the appropriate graph or chart depending on your data and audience • Beware of misleading with your visualisations, either intentionally or accidentally