Explore the importance and techniques of data visualisation, including common forms such as bar charts, line graphs, pie charts, and scatterplots. Learn about Florence Nightingale's contributions and deceptive visualisation practices with practical examples.
SSPC9C6, University of Stirling, Spring 2016 • Graphs & Charts: The Art of Data Visualisation • Alasdair Rutherford
Introduction • What is data visualisation, and why do we need it? • Graphs and Charts: Some common forms of presenting data • Developments in Data Visualisation • Choosing the right visualisation • “The Chart Never Lies”: How visualisation can mislead
Presenting Data • Data visualisation is not just about pretty pictures; it is about selecting the best way to present complex data. • It can be used to summarise data, to identify potential patterns, to present your data, or to tell a story.
Florence Nightingale & Data • Data visualisation has a long history – one of the pioneers was Florence Nightingale. • Watch: • http://www.youtube.com/watch?v=yhX0OR1_Vfc
The right chart at the right time • Commonly used graphs and charts: • Bar charts & histograms • Line graphs • Pie charts • Scatterplots • Generated using: • Spreadsheet software, such as Microsoft Excel • Statistical software, such as SPSS or STATA • Online tools
Bar Charts • Bar charts can be used to present categorical data. • Categories are shown along the x-axis, while data such as averages, percentages or frequencies are represented on the y-axis. • Best when you want to show proportional as well as absolute differences.
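To make the bar chart idea concrete, here is a minimal Python sketch, assuming nothing beyond the standard library. It turns raw categorical observations into the counts and percentages a bar chart displays, and prints a crude text version with the bars running sideways. The helper names `category_percentages` and `text_bar_chart` and the weekday data are hypothetical, for illustration only; in practice Excel, SPSS or STATA would draw the chart.

```python
from collections import Counter


def category_percentages(observations, order=None):
    """Return (category, count, percentage) rows ready for a bar chart.

    `order` fixes the category order on the axis, which matters for
    ordered categories such as days of the week or age bands.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    categories = order if order is not None else list(counts)
    return [(c, counts[c], 100 * counts[c] / total) for c in categories]


def text_bar_chart(rows, width=40):
    """Plain-text bar chart (bars run sideways); bar length is proportional to percentage."""
    top = max(pct for _, _, pct in rows)
    lines = []
    for category, count, pct in rows:
        bar = "#" * round(width * pct / top) if top > 0 else ""
        # Show both the relative (%) and the absolute (n) size of each category
        lines.append(f"{category:<4}{bar:<{width}} {pct:5.1f}% (n={count})")
    return "\n".join(lines)


# Hypothetical data: the weekday on which each of ten posts was made
posts = ["Mon", "Tue", "Tue", "Wed", "Thu", "Thu", "Thu", "Fri", "Sat", "Sun"]
week = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
rows = category_percentages(posts, order=week)
print(text_bar_chart(rows))
```

Printing `n` next to each percentage reflects the point above: a bar chart can carry proportional and absolute differences at the same time.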
Example: Bar Chart of Twitter Activity • Days of the week (categories) on the x-axis • Percentage of activity on the y-axis • SOURCE: Sysomos (2009) http://www.sysomos.com/insidetwitter/appendix/
Example: Bar Chart showing percentages • Age categories on the x-axis, and the categories are ordered • Height of each bar shows the percentage • SOURCE: Sysomos (2009) http://www.sysomos.com/insidetwitter/appendix/
Line Graphs • Line graphs are often used to represent continuous data. • They can also be used for ordered categorical data that approximates a continuous variable. • Best when you want to show changes across the values of the X variable.
Example: Line Graph • Continuous data on the x-axis • The shape of the line gives a sense of the trend • SOURCE: Sysomos (2009) http://www.sysomos.com/insidetwitter/appendix/
Pie Charts • Pie charts are an accessible way to represent percentages and proportions. • They require categorical data, which may or may not be ordered. • They also require categories that are mutually exclusive and exhaustive, i.e. the percentages sum to 100%.
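The pie chart requirements can be sketched in a few lines of Python, assuming only the standard library. Each segment's angle is its share of 360 degrees, which only works if every case is counted once and only once. The helper names `pie_segments` and `suitable_for_pie`, the 0.5-point rounding tolerance and the age-band counts are all hypothetical, chosen for illustration.

```python
def pie_segments(counts):
    """Convert category counts into (label, percentage, angle in degrees).

    Assumes every case is counted once and only once: categories are
    mutually exclusive and exhaustive, so the shares sum to 100%.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one case")
    return [(label, 100 * n / total, 360 * n / total) for label, n in counts.items()]


def suitable_for_pie(percentages, tolerance=0.5):
    """Rough check that published percentages can be drawn as a pie.

    Multiple-response questions ('tick all that apply') usually sum to
    more than 100%, which signals overlapping categories.
    """
    return abs(sum(percentages) - 100) <= tolerance


# Hypothetical counts of users by age band
ages = {"Under 18": 30, "18-24": 36, "25-34": 18, "35+": 16}
for label, pct, angle in pie_segments(ages):
    print(f"{label:<9} {pct:5.1f}%  {angle:6.1f} degrees")

print(suitable_for_pie([30, 36, 18, 16]))  # True
print(suitable_for_pie([70, 55, 40]))      # False: overlapping categories
```

The tolerance allows for published percentages that were rounded to whole numbers and so sum to 99% or 101%.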
Example: Pie Chart of User Ages • The area of each segment represents the percentage • Combinations of groups are easier to see, e.g. two thirds of users are under 25 • SOURCE: Sysomos (2009) http://www.sysomos.com/insidetwitter/appendix/
Scatterplots • Scatterplots allow the comparison of two metric (interval or ratio) variables. • They are particularly useful for identifying patterns of association ('correlations') between the two variables, and for spotting 'outliers' (cases with unusually high or low values) within data sets.
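The two uses of a scatterplot mentioned above, spotting correlation and spotting outliers, can each be put into numbers. The following is a minimal standard-library Python sketch. It computes Pearson's correlation coefficient and flags points lying more than `k` residual standard deviations from the least-squares line. That residual rule is one common rule of thumb, assumed here for illustration rather than taken from the slides. The helper names and the couples' ages are hypothetical.

```python
import math


def _centred_sums(xs, ys):
    """Return (sxx, syy, sxy, mean_x, mean_y): sums of centred squares and products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mx, my


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two metric variables."""
    sxx, syy, sxy, _, _ = _centred_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def residual_outliers(xs, ys, k=2.0):
    """Indices of points more than k residual SDs away from the least-squares line."""
    sxx, _, sxy, mx, my = _centred_sums(xs, ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd]


# Hypothetical couples: husband's age and wife's age (the last couple is unusual)
husband = [25, 30, 35, 40, 45, 50, 55, 60, 65, 40]
wife = [24, 29, 33, 39, 44, 48, 53, 58, 63, 20]

print(round(pearson_r(husband, wife), 3))
print(residual_outliers(husband, wife))  # [9]
print(round(pearson_r(husband[:-1], wife[:-1]), 3))
```

With these made-up ages, dropping the single unusual couple pushes r from roughly 0.92 to almost 1. This is why it pays to look at the plot for outliers rather than relying on the coefficient alone.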
Example: Scatterplot of Spouse Ages (axes: HUSBAND and WIFE ages) • Each dot represents one observation, e.g. a couple • Outliers can be identified • The correlation seems quite clear • SOURCE: BHPS (2008)
Box (and whisker) plots • A box plot is a graphical display, based on quartiles, which helps us to picture the range of values in a variable (the distribution of scores). • You can use them to explore the distribution of one continuous variable, or alternatively you can ask for scores to be broken down for different groups (e.g. age groups).
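The numbers behind a box plot can be computed directly. Here is a minimal Python sketch using `statistics.quantiles` from the standard library (Python 3.8+). It assumes the "inclusive" quartile method and the common Tukey convention of whiskers reaching to 1.5 × IQR. Quartile conventions differ between packages, so SPSS or STATA may report slightly different values. The helper names and the scores are hypothetical.

```python
import statistics


def box_plot_summary(values, k=1.5):
    """Quartile-based summary behind a box plot, with Tukey-style k x IQR fences."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers reach the most extreme values still inside the fences
        "whisker_low": min(inside), "whisker_high": max(inside),
        # Anything beyond the fences is drawn as an individual point
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


def box_plots_by_group(records):
    """One summary per group (e.g. per age group) from (group, value) pairs."""
    groups = {}
    for group, value in records:
        groups.setdefault(group, []).append(value)
    return {group: box_plot_summary(vals) for group, vals in groups.items()}


# Hypothetical scores for a single variable
print(box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30]))

# Hypothetical scores broken down by group
records = [("Group A", v) for v in [1, 2, 3, 4, 5]] + [("Group B", v) for v in [1, 5, 10, 15, 20]]
for group, s in box_plots_by_group(records).items():
    print(group, "IQR =", s["iqr"])
```

A wider box (a larger IQR) for one group is exactly the "bigger spread" that a box plot broken down by category makes visible.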
Example: Box Plot • Contains a lot of information in one graph • Helps to understand the centre, range and shape of data • The statistics here will be introduced later in this course
Example: Box plot by Country • Variable of interest • Plots by category, e.g. country • Here the data for the USA has a much bigger spread than the data for France
Developments in visualisation • Improvements in technology and design, alongside greater data accessibility, have led to the emergence of a number of tools that make presenting data more accessible. • These tools combine statistical analysis with good design principles, using data to tell a story.
Example: Area plot of CO2 Emissions • Alternative to a bar chart • Areas are easily compared • Numerical data also included • Reference point • SOURCE: http://www.informationisbeautiful.net/
Example: Population Proportions • Alternative to a pie chart • Each ‘person’ represents 1% • Also conveys information on gender balance • SOURCE: http://www.informationisbeautiful.net/
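Drawing a chart in which each 'person' stands for 1% means turning percentages into exactly 100 whole icons. Published percentages are often rounded, so simple rounding can leave the chart with 99 or 101 people. The sketch below is a minimal Python example, assuming the largest remainder method. That method is one standard way to allocate the icons, not necessarily the one used by the source. The helper name `icons_per_category` and the shares are hypothetical.

```python
import math


def icons_per_category(shares, total_icons=100):
    """Allocate a fixed number of icons (e.g. 100 'people' at 1% each).

    Shares are rescaled to total_icons and rounded down; the icons left
    over go to the categories with the largest fractional remainders
    (largest remainder method), so the counts always sum to total_icons.
    """
    total = sum(shares.values())
    exact = {k: v / total * total_icons for k, v in shares.items()}
    counts = {k: math.floor(x) for k, x in exact.items()}
    leftover = total_icons - sum(counts.values())
    # Sorting is stable, so ties keep the original category order
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# Hypothetical published percentages that were rounded and sum to 99.9
shares = {"Under 25": 33.3, "25-44": 33.3, "45+": 33.3}
naive = {k: round(v) for k, v in shares.items()}
print(sum(naive.values()))  # 99: one 'person' goes missing
print(icons_per_category(shares))
```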
How do you choose a visualisation? • What sort of data are you using? • What characteristics of the data do you need to communicate? • Who is the audience?
Lies, Damn Lies and Data Visualisation • Like the numbers themselves, data visualisation can be used to mislead, either accidentally or intentionally. • You need to be particularly careful when using percentages, and also pay attention to scales.
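The effect of a badly chosen scale can be quantified. Below is a minimal Python sketch using Edward Tufte's "lie factor": the size of the effect shown in the graphic divided by the size of the effect in the data, where size of effect = |second − first| / first. It assumes simple bars whose drawn height is the value minus the point where the axis starts. The helper names and the 50% to 52% figures are hypothetical. The same figures also show why percentages need care: a rise from 50% to 52% is 2 percentage points but a 4% relative increase.

```python
def drawn_heights(values, baseline=0.0):
    """Bar heights as drawn when the value axis starts at `baseline` instead of 0."""
    if any(v <= baseline for v in values):
        raise ValueError("bars must lie above the baseline")
    return [v - baseline for v in values]


def relative_change(first, second):
    """Size of an effect as Tufte defines it: |second - first| / first."""
    return abs(second - first) / first


def lie_factor(first, second, baseline=0.0):
    """Tufte's lie factor: effect shown in the graphic / effect in the data."""
    h1, h2 = drawn_heights([first, second], baseline)
    return relative_change(h1, h2) / relative_change(first, second)


# Hypothetical: a rise from 50% to 52% (2 percentage points, a 4% relative increase)
print(lie_factor(50, 52))                # 1.0: honest axis starting at zero
print(round(lie_factor(50, 52, 49), 1))  # 50.0: axis starting at 49 exaggerates 50-fold
```

Starting the axis at 49 makes the second bar three times as tall as the first, even though the underlying values differ by only 4%.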
Top 3 tips for BAD data visualisation • i.e. what NOT to do! • Cram everything you can into the chart – readability is overrated • Choose the scale to hide the inconvenient truth • Emphasise the trivial and ignore the important
Example: Difficult to Read Line Graph SOURCE: Wainer (1984) “How to Display Data Badly” The American Statistician, Vol. 38, No. 2
Top 3 tips for BAD data visualisation: tip 2 • Choose the scale to hide the inconvenient truth
Example: Misleading Bar Chart SOURCE: http://flowingdata.com/category/statistics/mistaken-data/
Top 3 tips for BAD data visualisation: tip 3 • Emphasise the trivial and ignore the important
Example: Concealing the data SOURCE: Wainer (1984) “How to Display Data Badly” The American Statistician, Vol. 38, No. 2
Summary • There is a wide range of data visualisations available • Selecting the right graph or chart can help you describe your data • Care must be taken to choose the appropriate graph or chart for your data and audience • Beware of misleading your audience with your visualisations, whether intentionally or accidentally