STAT131 Week 2, Lecture 2: Making Comparisons. Anne Porter
Lecture Outline • Histograms and Distributions • Comparing two batches of data • Comparing many batches of data • Good graphics
Histograms and Distributions • Video clip: Decisions through Data, Tape 1, Unit 3 • Comparing mean and mode • Symmetric and asymmetric distributions • Changing the bin width
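To see how the bin width changes the look of a histogram, and how the mean and mode can differ in an asymmetric batch, here is a minimal Python sketch. The shoe sizes are made up for illustration, and `bin_counts` is a hypothetical helper, not an SPSS function.

```python
import math
import statistics
from collections import Counter

# Made-up shoe sizes for illustration only (not the class data).
sizes = [6, 7, 7, 7.5, 8, 8, 8, 8, 8.5, 9, 9.5, 10, 11, 12]

def bin_counts(data, start, width):
    """Count values in half-open bins [start + k*width, start + (k+1)*width).

    Returns (lower edge, count) pairs in order, including empty bins.
    """
    counts = Counter(math.floor((x - start) / width) for x in data)
    ks = range(min(counts), max(counts) + 1)
    return [(start + k * width, counts.get(k, 0)) for k in ks]

# The same data look different with different bin widths.
print("width 1:", bin_counts(sizes, 6, 1))
print("width 2:", bin_counts(sizes, 6, 2))

# In a right-skewed batch the mean is pulled above the mode.
print("mean:", round(statistics.mean(sizes), 2), "mode:", statistics.mode(sizes))
```

With width 1 the peak at size 8 stands out; with width 2 that detail is merged into fewer, taller bars. Here the mean (about 8.54) sits above the mode (8) because of the longer tail of larger sizes.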
Comparing data for men and women: what might we do?
Dot plot (with error) • Note how easy it is to see the outlier • What do you see?
Dot plot (error removed) • Note how easy it is to see the: • Spread • Centre of the data • Absence of outliers • Comparison of male and female shoe size • We cannot see the shape of the distribution or the density of dots (i.e. how many people share the same shoe size)
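Coincident dots hide how many people share a shoe size. One common remedy, used in jittered dot plots, is to give each point a small random vertical offset so that repeated values spread out. A minimal Python sketch, assuming a jitter amount of ±0.2 and a fixed random seed; `jitter` is a hypothetical helper name and the sizes are made up.

```python
import random

def jitter(values, amount=0.2, seed=1):
    """Pair each value with a random vertical offset in [-amount, amount].

    The horizontal position (the shoe size itself) is unchanged, so the data
    are not distorted; only the vertical position is spread out so that
    overlapping dots become visible.
    """
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [(x, rng.uniform(-amount, amount)) for x in values]

female = [7, 7, 8, 8, 8, 8.5]  # made-up sizes for illustration
for x, y in jitter(female):
    print(f"size {x:>4}: offset {y:+.2f}")
```

An alternative design is to stack identical values in columns instead of jittering them; stacking shows exact counts, while jittering scales better to large samples.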
Comparing (two) batches of data • SPSS does not produce back-to-back stem-and-leaf plots • Build one in a table in Word, reversing the order of the leaves on one side • What does the data reveal? • Shape of the male and female distributions • Centre (mode) for males (10) and females (8) • Spread for males (7-15) and females (4-11.5) • No outliers (?) • Pattern within each distribution: are there any male half sizes above 13?
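Although SPSS will not draw this plot, the construction is mechanical. A minimal Python sketch, assuming sizes are recorded to the nearest half size so that the stem is the whole size and the leaf is the tenths digit (0 or 5). The names `split` and `back_to_back`, and the sizes, are hypothetical.

```python
from collections import defaultdict

def split(x):
    """Split a size such as 8.5 into stem 8 and leaf 5 (leaf = tenths digit)."""
    tenths = round(x * 10)
    return tenths // 10, tenths % 10

def back_to_back(left, right):
    """Return the rows of a back-to-back stem-and-leaf plot as strings.

    Stems run down the middle. Leaves on the left side are written in reverse
    order so that, on both sides, they increase moving away from the stem.
    """
    leaves = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, data in (("L", left), ("R", right)):
        for x in sorted(data):
            stem, leaf = split(x)
            leaves[side][stem].append(str(leaf))
    stems = set(leaves["L"]) | set(leaves["R"])
    width = max((len(v) for v in leaves["L"].values()), default=0)
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        left_part = "".join(reversed(leaves["L"].get(stem, [])))
        right_part = "".join(leaves["R"].get(stem, []))
        row = f"{left_part.rjust(width)} | {str(stem).rjust(2)} | {right_part}"
        rows.append(row.rstrip())
    return rows

# Hypothetical sizes: females on the left, males on the right (leaf unit 0.1).
female = [7, 7.5, 8, 8, 8.5, 9]
male = [8, 9.5, 10, 10, 11]
for row in back_to_back(female, male):
    print(row)
```

Reading outward from the stem, the row `500 |  8 | 0` records female sizes 8, 8 and 8.5 and one male size 8.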
Box-and-whisker plots (with error) • A boxplot plots the five-number summary • What does it reveal?
Box-and-whisker plots (with error) • What does it reveal? • Based on samples of 31 females and 118 males we can see: • Centre: the median shoe size for males (10) is higher than for females (8) • Spread: the range and interquartile range are roughly the same for males (quartiles 9 and 11) and females (quartiles 7 and 8.5) • Outliers: females have two (sizes 4 and 11.5) and males have two (sizes 14 and 15) • Shape: the female data is more asymmetric than the male data, with a relatively shorter tail of upper values
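A boxplot is drawn from the five-number summary (minimum, lower quartile, median, upper quartile, maximum), and points are flagged as outliers by a rule based on the interquartile range. A minimal Python sketch, assuming the common 1.5 × IQR rule and Python's `statistics.quantiles` with `method="inclusive"` (Python 3.8+). Statistical packages differ slightly in how they compute quartiles, so this need not match SPSS output exactly. The sizes are made up.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Quartiles use statistics.quantiles(method="inclusive"); other software
    may use a slightly different quartile rule.
    """
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

def outliers(data, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

female = [4, 7, 7, 7.5, 8, 8, 8, 8.5, 8.5, 11.5]  # made-up sizes
print(five_number_summary(female))
print(outliers(female))
```

Here the quartiles are 7.125 and 8.375, so the fences are 5.25 and 10.25, and sizes 4 and 11.5 fall outside them.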
When comparing batches of data • The plots must be on the same scale to allow comparison of: • Centre • Distribution (shape) • Spread • Outliers • Patterns • Comparing two separate plots is not appropriate • Different plots will reveal different aspects of the data
Utility: box plots vs stem-and-leaf plots • Box plots are especially useful for comparing two or more samples. They show the key points of a sample, but not the individual values. • Stem-and-leaf plots show individual values and give a better picture of the shape of the distribution, but their detail makes them unsuitable for comparing more than two groups (side by side or back to back).
Comparing more than two groups • Use the same axes/scale • Compare: • Centre (medians) • Spread (interquartile range, the middle 50%) • Outliers/extreme points • Shape (symmetry) • Be detailed (e.g. superlooper median approx. 180 cm, compared with cone dart 590 cm and glider 400 cm) • Include units of measurement
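When there are many groups, a short summary listing each group's median and interquartile range with units supports this kind of detailed comparison. A minimal Python sketch; the group names and distances are hypothetical, and quartiles use `statistics.quantiles` with `method="inclusive"`, which may differ slightly from other packages.

```python
import statistics

def compare_groups(groups, unit):
    """Return one line per group (highest median first) with median and IQR."""
    ordered = sorted(groups.items(),
                     key=lambda item: statistics.median(item[1]),
                     reverse=True)
    lines = []
    for name, values in ordered:
        q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
        lines.append(f"{name}: median {med:g} {unit}, IQR {q3 - q1:g} {unit}")
    return lines

# Hypothetical distances for three designs, all measured in the same unit.
distances = {
    "design A": [150, 170, 180, 190, 210],
    "design B": [380, 390, 400, 420, 450],
    "design C": [560, 580, 590, 600, 640],
}
for line in compare_groups(distances, "cm"):
    print(line)
```

Sorting by median puts the groups in a meaningful order, which makes differences in centre easy to read off; the IQR column then shows whether the spreads are similar.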
Graphical Excellence • Convey the message about the data • Label axes, units, variable names and figures • DO NOT: • Distort the data • Use pie charts (there is always a better chart) • Use more dimensions than necessary (e.g. 3D instead of 2D) • Add unnecessary pattern, fill, ink or decoration
Never use a pie chart • Percentage of students in each lab group
Never use a pie chart! • There is always a better chart than the pie chart. • The better chart is easier to read: the minimum and maximum percentages are easy to find, and groups are easier to compare.
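As one illustration of a better alternative, a horizontal bar chart sorted by size makes the smallest and largest groups obvious and puts every group on a common scale. A minimal text-only Python sketch; the lab names and percentages are hypothetical, and `text_bar_chart` is an illustrative helper.

```python
def text_bar_chart(percentages, scale=2):
    """Return rows of a horizontal bar chart, largest category first.

    Each '#' represents about `scale` percentage points. All bars share one
    scale, so their lengths can be compared directly.
    """
    ordered = sorted(percentages.items(), key=lambda kv: kv[1], reverse=True)
    label_width = max(len(name) for name in percentages)
    rows = []
    for name, pct in ordered:
        bar = "#" * round(pct / scale)
        rows.append(f"{name:<{label_width}} {bar} {pct}%")
    return rows

labs = {"Lab 1": 30, "Lab 2": 18, "Lab 3": 27, "Lab 4": 25}  # hypothetical
for row in text_bar_chart(labs):
    print(row)
```

Unlike pie slices, the bars are read against a single linear scale, so small differences (such as 27% vs 25%) remain visible.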
Use the fewest dimensions possible • If the bars have the same width, the reader reads their height • In general, it is the area that matters in histograms and bar charts • 3D adds volume, which can distract the reader • Note the use of a bar chart for discrete data and a histogram for continuous data
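To see why area rather than height carries the information, consider a histogram with bins of unequal width. A minimal Python sketch, assuming density scaling (height = count ÷ (n × bin width)), so that each bar's area equals the proportion of observations in that bin; the bin edges and counts are hypothetical.

```python
def histogram_heights(edges, counts):
    """Return density heights so that bar AREA is proportional to frequency.

    height = count / (n * width), so the areas sum to 1. With unequal bin
    widths, plotting raw counts as heights would exaggerate the wide bins.
    """
    n = sum(counts)
    widths = [b - a for a, b in zip(edges, edges[1:])]
    return [c / (n * w) for c, w in zip(counts, widths)]

edges = [4, 6, 7, 8, 9, 12]   # hypothetical bins; the last one is wider
counts = [2, 4, 10, 8, 6]     # hypothetical frequencies (n = 30)
heights = histogram_heights(edges, counts)
areas = [h * (b - a) for h, a, b in zip(heights, edges, edges[1:])]
print([round(h, 3) for h in heights])
print(round(sum(areas), 6))
```

The wide last bin holds 6 observations, but because it spans 3 sizes its height is only about 0.067, lower than the 0.133 of the one-unit bin from 6 to 7 that holds 4.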
To reveal: • Centre • Spread • Outliers • Distribution • Patterns • Anything unusual • Comparisons • And more
What do you want to see in data? • Information • Meaning • We must turn data into information in order to have meaning
What can we see in data? • Location (centre) • Spread • Shape • Outliers • Unusual patterns • Gaps, clusters • How batches differ
Tools for making meaning from data • Ordering data • Dot plots and jittered dot plots • Stem-and-leaf plots • Histograms • Boxplots • Bar charts • Pie charts • Frequency tables • Numerical summaries
Selecting the tool depends on: • The question asked • How the variable is measured • The structure of the data • The utility of the tool
Homework • Textbook reading: Utts & Heckard (2004), Chapter 2, or • Textbook reading: Moore and McCabe, pp. 38-55, or • Textbook reading: Griffiths, Stirling and Weldon (1998), Chapters 1, 2, 6 (pp. ) • Complete the lab and prepare for next week's lab.