Envisioning Information Lecture 2 Simple Graphs and Charts Ken Brodlie School of Computing University of Leeds ENV 2006
Lecture Outline
• Preliminaries
• Definitions
• Datatypes
• Simple Data Presentation
• Graphs and charts
Fundamentals
• Basic datatypes correspond to different levels of measurement
• Data can be:
  • Categorical - labels
  • Numerical - numbers
• Categorical
  • Nominal: no sense of order (apples, oranges, …)
  • Ordinal: ordered in sequence (January, February, …)
• Numerical
  • Continuous: real numbers (height of students in class)
  • Discrete: typically whole numbers (marks in an exam)
Question
Give an example for each class in which numbers are involved…
• Categorical - nominal
• Categorical - ordinal
• Numerical - continuous
• Numerical - discrete
Exploratory Data Analysis
• Pioneering figure is John Tukey
• New approach to data analysis, heavily based on visualization, as an alternative to classical data analysis
• See Wikipedia
• Two-stage process:
  • Exploratory: search for evidence using all tools available
  • Confirmatory: evaluate strength of evidence using classical data analysis
Simple Data Presentation
Simple Data Presentation
• Simple data tables are often presented as line graphs, bar graphs, pie charts, dot graphs, histograms…
• Which should we use and when?
Line Graph
• Fundamental technique of data presentation
• Used to compare two variables
• X-axis is often the control variable
• Y-axis is the response variable
• Good at:
  • Showing specific values
  • Trends
  • Trends in groups (using multiple line graphs)
• Note: graph labelling is fundamental
[Figure: Mobile phone use]
[Figure: Students participating in sporting activities. Any critical comments here?]
Simple Representations - Bar Graph
• Bar graph presents categorical variables
• Height of bar indicates value
• Double bar graph allows comparison
• Note spacing between bars
• Can be horizontal (when would you use this?)
• Note more space for labels
[Figure: Number of police officers]
[Figure: Internet use at a school]
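The point about horizontal bars leaving more room for labels can be sketched in plain text. This is only an illustration: the function name `horizontal_bars` and the category counts are made up here and are not the data behind the slide's figures.

```python
def horizontal_bars(counts, symbol="#"):
    """Return one text line per category: padded label, bar, then the value."""
    # Pad every label to the longest one so all bars start in the same column.
    label_width = max(len(label) for label in counts)
    return [
        f"{label:<{label_width}} | {symbol * value} {value}"
        for label, value in counts.items()
    ]

# Hypothetical counts, chosen only to show one long category label.
usage = {"Email": 12, "Research for homework": 9, "Games": 5}
lines = horizontal_bars(usage)
for line in lines:
    print(line)
```

With horizontal bars a long label such as "Research for homework" fits on one line, whereas under a vertical bar it would need to be rotated or abbreviated.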
Dot Graph
• Very simple but effective…
• Horizontal to give more space for labelling
Pie Chart
• Pie chart summarises a set of categorical/nominal data
• But use with care…
• … too many segments are harder to compare than in a bar chart
[Figure: Should we have a long lecture?]
[Figure: Favourite movie genres]
Histograms
• Histograms summarise discrete or continuous data that are measured on an interval scale
• No gaps between bars if the variable is continuous
[Figure: Distribution of salaries in a company]
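A minimal sketch of the binning step behind a histogram, assuming equal-width, half-open intervals. The salary values, the bin settings and the name `histogram_counts` are illustrative assumptions, not taken from the slide's figure.

```python
def histogram_counts(values, low, width, bins):
    """Count values in `bins` adjacent intervals [low + i*width, low + (i+1)*width)."""
    counts = [0] * bins
    for v in values:
        index = int((v - low) // width)
        if 0 <= index < bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts

# Made-up salaries in thousands; not the data behind the slide's figure.
salaries = [18, 22, 25, 27, 31, 33, 34, 38, 41, 45, 52, 67]
counts = histogram_counts(salaries, low=10, width=10, bins=6)

# One row per interval; adjacent intervals share an edge, so there are no gaps.
for i, count in enumerate(counts):
    start = 10 + i * 10
    print(f"{start:>3}-{start + 10:<3} {'#' * count}")
```

Because each interval starts exactly where the previous one ends, every value in the covered range lands in exactly one bin, which is why the bars of a continuous variable touch.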
Scatter Plot
• Used to present measurements of two variables
• Effective if a relationship exists between the two variables
• Example taken from NIST Handbook: evidence of strong positive correlation
[Figure: Car ownership by household income]
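The strength of a linear relationship seen in a scatter plot is commonly summarised by Pearson's correlation coefficient. The sketch below computes it from first principles; the income and car figures are invented for illustration and are not the NIST or slide data, and `pearson_r` is just a name chosen here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical household income (thousands) and average cars per household.
income = [10, 20, 30, 40, 50]
cars = [0.4, 0.9, 1.1, 1.6, 2.0]
r = pearson_r(income, cars)
print(f"r = {r:.3f}")
```

A value of r close to +1 corresponds to points lying near a rising straight line, which is what "strong positive correlation" looks like on the plot.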
Scatter Plots in Excel
• The scatter plot is a fundamental tool in Excel
• Chart type XY (Scatter) and subtype Unconnected Points
• http://www2.ncsu.edu:8010/ncsu/chemistry/resource/excel/excel.html
Regression Line
• Excel allows you to add a linear regression line (trend line)
• Remember: correlation does not imply causality, i.e. a relationship exists but one variable is not necessarily causing the other; there may be a third factor
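A linear trend line of this kind is an ordinary least-squares fit. As a rough sketch of what the spreadsheet computes, the code below derives the slope and intercept by hand and then the residuals (observed minus fitted values). The data points and the name `fit_line` are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up measurements that lie roughly on a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = fit_line(xs, ys)

# Residual = observed value minus the value predicted by the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(f"y = {slope:.2f} x + {intercept:.2f}")
```

The fit only describes the association between the two variables; it says nothing about which, if either, causes the other.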
Tukey Sum-Difference Plot
• Better understanding of residuals …
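The slide gives no construction details, so the following is a sketch under an assumption: that the sum-difference plot replots each pair (x, y) as the point (x + y, y - x), so that departures from the line y = x show up as vertical distance from zero. The function name and the sample pairs are hypothetical.

```python
def sum_difference(xs, ys):
    """Map each pair (x, y) to (x + y, y - x) for a sum-difference plot."""
    return [(x + y, y - x) for x, y in zip(xs, ys)]

# Made-up paired measurements, e.g. the same quantity measured twice.
first = [10, 12, 15]
second = [11, 12, 18]
points = sum_difference(first, second)
print(points)
```

Plotting the second coordinate against the first puts the differences on their own axis, where small deviations are easier to judge than along a steep diagonal.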
Box Plots
• In some situations we have not a single data value at a point, but a number of data values, or even a probability distribution
• When might this occur?
• Tukey proposed the idea of a boxplot to visualize the distribution of values
• For explanation and some history, see:
  • http://mathworld.wolfram.com/Box-and-WhiskerPlot.html
  • http://en.wikipedia.org/wiki/Box_plot
• Darwin's plant study: http://www.upscale.utoronto.ca/GeneralInterest/Harrison/Visualisation/Visualisation.html
• M - median
• Q1, Q3 - quartiles
• Whiskers - extend to the most extreme values within 1.5 * interquartile range of the quartiles
• Dots - outliers
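The quantities a box plot draws can be computed directly. This is a minimal sketch, assuming the common convention that whiskers stop at the most extreme data values within 1.5 * interquartile range of the quartiles. Quartile definitions differ between tools, so the `inclusive` method used here is one choice among several, and the sample values are made up.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Return the quartiles, whisker ends and outliers for one batch of values."""
    data = sorted(values)
    # Quartile conventions vary; 'inclusive' treats the data as the whole population.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value inside the fences
        "whisker_high": inside[-1],  # largest value inside the fences
        "outliers": outliers,        # drawn as individual dots
    }

# Made-up sample with one unusually large value.
sample = [2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = boxplot_summary(sample)
print(summary)
```

Here the value 30 lies beyond the upper fence, so it would be drawn as a dot and the upper whisker would end at 9.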
Acknowledgement
• Thanks to Statistics Canada, an excellent web site for simple data presentation
• http://www.statcan.ca/english/edu/power/toc/contents.htm
Exercise for next week
• Understand a bit more about the merits of pie charts and bar graphs
• Create a dataset with roughly equal numbers in each class
• Which is best if the task is to discriminate?
Exercise for next week
• Over the next week look for examples of basic graphs
  • In newspapers, magazines or other print media
  • On news web sites or other electronic media
• Analyse two examples
  • One should be an example where you think the use of graphics is good
  • One should be bad
• Be ready next week to present these results to the class…
Envisioning Information: Practical Work
• Gnuplot
• R
• Excel