"A professional development seminar sponsored by the Ann Arbor Chapter of the ASA. Explore data collection, variable types, shapes of distributions, measures of center and spread, and the use of standard deviation to fit data to a normal distribution."
GAISEing into the Common Core Standards – Day 1 A Professional Development Seminar sponsored by the Ann Arbor Chapter of the ASA
Getting to Know You …Let’s Collect Some Data! • Go around the room and enter data for yourself on the charts. • Men: blue markers • Women: red markers
Let’s think about our data • What are the different types of variables that we measured? • How did you measure each of the variables? • Were any of these hard to measure? • What were the units for each variable? • What might the context be for each of these charts?
Quantitative vs Categorical data • Height (in) • Number of letters in your first name • Number of siblings (not including yourself) • Favorite color • Do you currently have a dog? • How many pets do you currently have? • Travel time to this workshop? (min) • How many years have you been teaching?
Quantitative vs Categorical data Applet: http://mathnstats.com/applets/Categorical-Quantitative.html
Quantitative vs Categorical data • Common Misconceptions: • Histograms vs Bar charts • Don’t discuss shape for bar charts! • Zip code?
Shape • Skewed right (positive) / left (negative)
Measures of Center • What is a typical value in a given situation? • Tallest bar: mode • Middle Value: median • Median: differs for odd and even sample sizes • Show it on your hand!
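The odd/even distinction for the median can be sketched in a few lines of Python. The sibling counts below are hypothetical values made up for illustration, and `middle_value` is an illustrative helper name, not something from the seminar materials.

```python
from statistics import median

# HYPOTHETICAL data: number of siblings for a few participants.
odd_sample = [0, 1, 1, 2, 5]       # n = 5: the median is the 3rd ordered value
even_sample = [0, 1, 1, 2, 3, 5]   # n = 6: the median averages the 3rd and 4th

def middle_value(values):
    """Median by hand: sort, then take the middle value (or average the two middles)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(middle_value(odd_sample))    # 1
print(middle_value(even_sample))   # 1.5
# The standard library agrees with the hand computation.
print(median(even_sample))         # 1.5
```

For an even sample size the median need not be a value in the data set, which mirrors the "show it on your hand" idea of landing between two fingers.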
Measures of Center • Mean: • Add and divide • “fair share” • Pencil activity • Block activity • Glass/beaker activity
Measures of Center Applets for comparing medians and means: http://onlinestatbook.com/stat_sim/descriptive/index.html http://www.stat.tamu.edu/~west/ph/ http://bcs.whfreeman.com/ips4e/cat_010/applets/meanmedian.html
Measures of Center: Misconceptions • When is the mean not a good measure of center? • The mean doesn’t have to be a value in the data set. • The mean number of children per household is 2.5 children!
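Both misconceptions can be illustrated with a small Python sketch. All numbers here are hypothetical, chosen only to make the point: four households whose mean is 2.5 children, and a set of travel times with one very long commute.

```python
from statistics import mean, median

# HYPOTHETICAL numbers of children in four households.
children = [2, 3, 2, 3]
print(mean(children))        # 2.5, a value no household can actually have

# HYPOTHETICAL travel times to the workshop (min), with one long commute.
travel_min = [10, 12, 15, 15, 20, 120]
print(mean(travel_min))      # 32: pulled upward by the single value of 120
print(median(travel_min))    # 15.0: still describes a typical commute
```

With a skewed data set like the travel times, the mean exceeds five of the six values, so the median is arguably the better description of a typical value.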
Why Do We Need Measures of Spread? Midterms are returned and the “average” was reported as 76 out of 100. You received a score of 88. How should you feel?
Measures of Spread • Look at the data: discuss spread. • Range = Max – Min = Spread of 100% of data • Interquartile Range = IQR = Q3 – Q1 = Spread of middle 50% of data • Needed for boxplots: the IQR is the length of the box
Measures of Spread • Mean Absolute Deviation • MAD • Average distance of values from the mean • Standard Deviation • Interpretation is similar to MAD
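A minimal sketch comparing the two measures, assuming a small set of hypothetical heights (in inches). The function name `mean_absolute_deviation` is illustrative, and the standard deviation shown is the sample version (divisor n − 1), which is an assumption about the convention in use.

```python
from statistics import mean, stdev

# HYPOTHETICAL heights (in) for five participants.
heights_in = [62, 64, 66, 68, 70]

def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(x - center) for x in values])

mad = mean_absolute_deviation(heights_in)   # distances 4, 2, 0, 2, 4 -> 2.4
sd = stdev(heights_in)                      # sample SD, sqrt(40 / 4) = sqrt(10)

print(f"MAD = {mad:.2f} in")
print(f"SD  = {sd:.2f} in")
```

Both numbers answer roughly the same question, how far a typical value sits from the mean, though the standard deviation weights large deviations more heavily.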
Increasing Spread Consider the following three data sets. I: 20 20 20 II: 18 20 22 III: 17 20 23 (a) Which data set will have the smallest standard deviation? (b) Which data set will have the largest standard deviation? (c) Find the standard deviation for each data set and check your answers to (a) and (b).
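Part (c) can be checked with a short Python sketch. It assumes the sample standard deviation (divisor n − 1); with the population version (divisor n) the values for sets II and III would be smaller, but the ordering would be the same.

```python
from statistics import stdev

data_sets = {
    "I": [20, 20, 20],
    "II": [18, 20, 22],
    "III": [17, 20, 23],
}

# Sample standard deviation (n - 1 divisor) for each data set.
sample_sds = {name: stdev(values) for name, values in data_sets.items()}

for name, sd in sample_sds.items():
    print(f"Data set {name}: SD = {sd:.1f}")
# Data set I: SD = 0.0
# Data set II: SD = 2.0
# Data set III: SD = 3.0
```

Set I has no spread at all, and widening the outer values from ±2 to ±3 around the same mean increases the standard deviation.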
Bin Sizes in Histograms Applet: http://www.stat.sc.edu/~west/javahtml/Histogram.html
Back to the Core • CCSS.Math.Content.HSS-ID.A.4 Use mean and standard deviation of a data set to fit it to a normal distribution and to estimate population percentages. Recognize that there are data sets for which such a procedure is not appropriate. Use calculators, spreadsheets, and tables to estimate areas under the normal curve.
Normally Distributed Data • Bell/Mound Shape • Symmetric • Mean ~ median • Z-scores • Empirical Rule as Frame of Reference • Take them to calculator or table to get probability
Empirical Rule For bell-shaped histograms, approximately … • 68% of values fall within 1 standard deviation of mean in either direction. • 95% of values fall within 2 standard deviations of mean in either direction. • 99.7% of values fall within 3 standard deviations of mean in either direction. A very useful frame of reference!
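The 68 / 95 / 99.7 percentages can be checked against an exact normal curve. This sketch uses `NormalDist` from Python's standard library in place of a calculator or table; `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The rule's round numbers are approximations of these areas, which is why it works as a frame of reference rather than as an exact calculation.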
Exam Scores Scores on final exam have approximately a bell-shaped distribution with a mean score of 70 points and a standard deviation of 10 points. Sketch a picture…
Exam Scores Scores on final exam have approximately a bell-shaped distribution with a mean score of 70 points and a standard deviation of 10 points. Suppose you scored 80 points on the exam. How many standard deviations from the mean is your score?
Standard Score or z-score Empirical Rule (in terms of z-scores) For bell-shaped curves, approximately… • 68% of the values have z-scores between –1 and 1. • 95% of the values have z-scores between –2 and 2. • 99.7% of the values have z-scores between –3 and 3.
Exam Scores Scores on final exam have approximately a bell-shaped distribution with a mean score of 70 points and a standard deviation of 10 points. Suppose Rob’s score was 2 standard deviations above the mean. What was Rob’s score? What can you say about the proportion of students who scored higher than Rob?
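One way to work through the exam-score questions, sketched in Python. The mean of 70 and standard deviation of 10 come from the slides; the function names are illustrative, and the exact tail area assumes the scores follow a normal curve exactly, which the slides only claim approximately.

```python
from statistics import NormalDist

MEAN_SCORE = 70
SD_SCORE = 10

def z_score(score):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - MEAN_SCORE) / SD_SCORE

def score_from_z(z):
    """Convert a z-score back to the exam-score scale."""
    return MEAN_SCORE + z * SD_SCORE

your_z = z_score(80)          # (80 - 70) / 10 = 1.0
rob_score = score_from_z(2)   # 70 + 2 * 10 = 90

# Empirical Rule: about 95% lie within 2 SDs, so about 5% lie outside,
# and by symmetry about half of that (2.5%) lies above.
empirical_above_rob = (1 - 0.95) / 2

# Exact area under a normal curve, for comparison (assumes exact normality).
exact_above_rob = 1 - NormalDist(MEAN_SCORE, SD_SCORE).cdf(rob_score)

print(your_z, rob_score)
print(f"Empirical Rule: about {empirical_above_rob:.1%} scored higher than Rob")
print(f"Normal curve:   about {exact_above_rob:.1%} scored higher than Rob")
```

The Empirical Rule answer (about 2.5%) and the normal-curve answer (about 2.3%) are close, which is the sense in which the rule serves as a frame of reference before going to a calculator or table.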
Check for Nonnormal Features • Are these normal? • Why/Why not?
Are you a Good Timer? • Quick Experiment: • Close your eyes • When you hear the “START”, begin counting off seconds in your head • When you hear the “STOP”, write down the number you reached
Are you a Good Timer? • Come up and Graph the results • What do we see? • Keep your result – we will revisit it later…
Back to the Core Draw informal comparative inferences about two populations. • CCSS.Math.Content.7.SP.B.3 Informally assess the degree of visual overlap of two numerical data distributions with similar variabilities, measuring the difference between the centers by expressing it as a multiple of a measure of variability. For example, the mean height of players on the basketball team is 10 cm greater than the mean height of players on the soccer team, about twice the variability (mean absolute deviation) on either team; on a dot plot, the separation between the two distributions of heights is noticeable. • CCSS.Math.Content.7.SP.B.4 Use measures of center and measures of variability for numerical data from random samples to draw informal comparative inferences about two populations. For example, decide whether the words in a chapter of a seventh-grade science book are generally longer than the words in a chapter of a fourth-grade science book. • CCSS.Math.Content.HSS-IC.B.5 Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.
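The idea in 7.SP.B.3 of expressing a difference in centers as a multiple of a measure of variability can be sketched as follows. The heights are hypothetical values invented for illustration (they are not the data behind the standard's example), and the helper names are assumptions.

```python
from statistics import mean

# HYPOTHETICAL player heights in cm, invented for illustration.
soccer_cm = [165, 170, 175, 180, 185]
basketball_cm = [175, 180, 185, 190, 195]

def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(x - center) for x in values])

difference_cm = mean(basketball_cm) - mean(soccer_cm)   # 185 - 175 = 10
mad_cm = mean_absolute_deviation(soccer_cm)             # 6; same for both teams here
separation_in_mads = difference_cm / mad_cm

print(f"Difference in means: {difference_cm} cm")
print(f"MAD on either team:  {mad_cm} cm")
print(f"Separation: about {separation_in_mads:.1f} MADs")
```

Reporting the gap in units of MAD rather than centimeters is what lets students judge informally whether the two dot plots overlap a lot or a little.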
Parallel Graphs • Use ideas from before to compare: • Shape • Center • Spread • Be sure to use same scale!
Parallel Graphs What do you see?
Revisit the Timer Experiment • How else might we explore this data? • What would be some interesting comparisons to make? • Website for parallel plots, where you can enter data for 2+ groups and the graphs are made for you: http://www.physics.csbsju.edu/stats/box2.html
Balancing your Design • A study collects data on the treatment group to which each subject was assigned, the main response (time to cure), and also other variables like age. • The researchers want to compare the responses for the two treatment groups, but are concerned that age might also be related to the response. • They should check that age is balanced across the two treatment groups before looking for differences in the response by treatment.
Comparing Data: Usefulness of Randomization Study to compare two antibiotics for treating strep throat in children, Amoxicillin and Cefadroxil. At one center, 23 children were randomly assigned to one of two treatment groups. One concern is that age of the child might influence the effectiveness of the antibiotics. The ages of the children in each treatment group are given below. How do the two groups compare with respect to age? Give the five-number summary for each group. Comment on your results. Amoxicillin Group (n=11): 8 9 9 10 10 11 11 12 14 14 17 Five-number summary: Cefadroxil Group (n=12): 7 8 9 9 9 10 10 11 12 13 14 16 Five-number summary: Make side-by-side boxplots for the antibiotic study data.
Comparing Data: Usefulness of Randomization ~ Instructor Side Give the five-number summary for each of the two treatment groups. Comment on your results. Amoxicillin Group (n=11): 8 9 9 10 10 11 11 12 14 14 17 Five-number summary: min=8, Q1=9, median=11, Q3=14, max=17 Cefadroxil Group (n=12): 7 8 9 9 9 10 10 11 12 13 14 16 Five-number summary: min=7, Q1=9, median=10, Q3=12.5, max=16 How long? 10 minutes How might it be done? Ask students to work through this exercise with a partner -- one person can do it for the Amoxicillin data and the other for the Cefadroxil data. Then discuss the results. You could have students start with all 23 children and perform the randomization themselves with a partner. Then each group will have a different answer and the class can see the effect of randomization overall. There may be a group for which the randomization did not do so well -- randomization does not guarantee balancing. How important? Important. It reinforces the concept of why randomization is a useful technique. A complete exercise for comparing two groups and assessing if the researchers need to control for age differences in evaluating the effectiveness of the two antibiotics.
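The five-number summaries above can be reproduced with a short Python sketch. It assumes one common classroom convention for quartiles: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the middle value left out of both halves when n is odd. Calculators and software that use other conventions may report slightly different quartiles. The function name is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

amoxicillin = [8, 9, 9, 10, 10, 11, 11, 12, 14, 14, 17]
cefadroxil = [7, 8, 9, 9, 9, 10, 10, 11, 12, 13, 14, 16]

print(five_number_summary(amoxicillin))   # (8, 9, 11, 14, 17)
print(five_number_summary(cefadroxil))    # (7, 9.0, 10.0, 12.5, 16)
```

The two summaries are similar, which suggests the randomization balanced age reasonably well across the two treatment groups.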
Which is Most Convincing? Study 1 Study 2 Study 3
Is there a Difference? Using Simulations • Background of study here: how this is like taking many samples of size 10 from the same population
Is there a Difference? Using Simulations Experiment: measuring the effect of caffeine (0 mg vs 200 mg) and deciding whether the two doses have differing effects on the number of finger taps per minute (2 hours later)
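A randomization (resampling) test for an experiment like this one can be sketched in Python. The tap counts below are hypothetical values invented for illustration, not the study's data, and the function name, number of repetitions, and seed are all assumptions. The idea: if caffeine had no effect, the group labels would be arbitrary, so the 20 values are shuffled repeatedly into two groups of 10 to see how often chance alone produces a difference at least as large as the one observed.

```python
import random
from statistics import mean

# HYPOTHETICAL finger taps per minute, invented for illustration only.
no_caffeine = [240, 243, 245, 241, 246, 244, 242, 247, 243, 245]
caffeine = [246, 249, 244, 250, 247, 251, 245, 248, 250, 246]

def randomization_test(group_a, group_b, reps=2000, seed=1):
    """One-sided randomization test for mean(group_b) - mean(group_a)."""
    rng = random.Random(seed)            # seeded so the run is repeatable
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(reps):
        rng.shuffle(pooled)              # reassign the group labels at random
        shuffled_diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if shuffled_diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / reps

observed_diff, p_value = randomization_test(no_caffeine, caffeine)
print(f"Observed difference in means: {observed_diff:.1f} taps per minute")
print(f"Proportion of shuffles at least that large: {p_value:.4f}")
```

A small proportion means a difference this large rarely arises from random assignment alone, which is the informal sense of "significant" in HSS-IC.B.5.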
Resampling Applet • Applet for resampling: http://lock5stat.com/statkey/ • We will see more of this on Day 3
Day 1 Wrap Up • What surprised you today? • What did you find interesting? • How might you bring these ideas to your class? • What would you change? • Other activities/ideas to share with the group?