A Short Tour of Probability & Statistics. Presented by: Nick Bennett, Grass Roots Consulting & GUTS; Josh Thorp, Stigmergic Consulting & GUTS; Irene Lee, Santa Fe Institute, GUTS. November 6, 2010. Santa Fe Alliance for Science Professional Enrichment Activity.
Outline • Framing the problem (Nick) • Review of Statistics (Irene) • Randomness (Nick) • Dice & Data (Josh) • Problem of Points (Nick) • Crosswalk of Common Core standards (Irene)
What is Statistics? • The science of collection, organization and interpretation of data.
What do Statisticians do? • Data analysis • Probability • Statistical inference
What is a Statistical question? • One that anticipates variability. • Compare • “How old am I?” • “How old are the students in my school?”
Describing Data • Data (plural) are the raw material • Data are the numbers we use to interpret reality • We will look at a few different ways of describing data. • Dot plot • Frequency table • Stem and Leaf diagram
A Sample Data Set • 92 Penn State students’ weights (in pounds) • MALES: • 140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145, 170, 175, 175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155, 215, 150, 145, 155, 155, 150, 155, 150, 180, 160, 135, 160, 130, 155, 150, 148, 155, 150, 140, 180, 190, 145, 150, 164, 140, 142, 136, 123, 155. • FEMALES: • 140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102, 115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108.
Dot Plot In a dot plot, one dot per student goes over each student’s reported weight.
Frequency table -> Histogram Divide the number line into intervals and count the number of students’ weights within each interval. The “frequency” is the count in any given interval. The “relative frequency” is the proportion of all weights that fall in that interval (the frequency divided by the total number of data points).
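The counting step can be sketched in Python. This is a minimal illustration, not the presenters' method: it uses the female weights from the sample data set, and the 10-pound interval width (`BIN_WIDTH`) and the helper name `frequency_table` are assumptions made for the example.

```python
from collections import Counter

# Female weights (pounds) copied from the sample data set.
female_weights = [
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130,
    120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102,
    115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108,
]

BIN_WIDTH = 10  # assumed interval width; the slides do not specify one


def frequency_table(data, width):
    """Return {interval_start: (frequency, relative_frequency)}."""
    # Each value is assigned to the interval [start, start + width).
    counts = Counter((x // width) * width for x in data)
    n = len(data)
    return {start: (counts[start], counts[start] / n) for start in sorted(counts)}


table = frequency_table(female_weights, BIN_WIDTH)
for start, (freq, rel) in table.items():
    print(f"{start:3d}-{start + BIN_WIDTH - 1:3d}  freq={freq:2d}  rel={rel:.3f}")
```

The frequencies add up to the number of data points and the relative frequencies add up to 1, which is a quick check that no weight was dropped or counted twice.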
Histograms • From the frequency table, we can make a bar graph called a histogram. • Each bar covers an interval and is centered at the midpoint. • The height of the bar corresponds with the number of data points in the interval
Stem-and-Leaf Diagram Both summarizes data and shows all data points. • The STEM shows intervals (ranges in tens) • The LEAVES show data points (ranges in ones) • Put the leaves in order • Is there evidence of reporting bias?
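As a rough sketch of the construction described above, the following Python builds a stem-and-leaf diagram for the female weights, with tens (and hundreds) as the stem and the ones digit as the leaf. The helper name `stem_and_leaf` is hypothetical. The final-digit tally at the end is one possible way to start probing the reporting-bias question; it is an assumption about what the slide has in mind, not something the slides spell out.

```python
from collections import Counter, defaultdict

# Female weights (pounds) copied from the sample data set.
female_weights = [
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130,
    120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102,
    115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108,
]


def stem_and_leaf(data):
    """Map each stem (value // 10) to its leaves (value % 10), in order."""
    stems = defaultdict(list)
    for x in sorted(data):  # sorting first puts the leaves in order
        stems[x // 10].append(x % 10)
    return dict(stems)


diagram = stem_and_leaf(female_weights)
for stem in range(min(diagram), max(diagram) + 1):
    leaves = "".join(str(leaf) for leaf in diagram.get(stem, []))
    print(f"{stem:2d} | {leaves}")

# Tally of final digits: a large share of 0s and 5s would suggest that
# weights were reported rounded rather than measured.
last_digits = Counter(x % 10 for x in female_weights)
print(sorted(last_digits.items()))
```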
Summary Statistics • Central or typical value • Spread about that value
Measures of Center • Mean • Median
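The Mean • Given $n$ data points $x_1, x_2, \ldots, x_n$, the mean is their sum divided by their count: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$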
The Median • The midpoint of the data, once the data are put in numerical order • If odd number of data points, it is the middle value • If even number of data points, average the two data points nearest the middle.
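Both measures of center are short enough to write out by hand. The sketch below is illustrative only: the function names `mean` and `median` are chosen for the example, and the data are the female weights from the sample data set.

```python
# Female weights (pounds) copied from the sample data set.
female_weights = [
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130,
    120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102,
    115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108,
]


def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle pair


print("mean:  ", mean(female_weights))
print("median:", median(female_weights))
```

Python's standard library offers the same calculations as `statistics.mean` and `statistics.median`; the hand-written versions are shown only to make the definitions concrete.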
Measures of Spread • Interquartile range • Standard deviation
Interquartile range • Put the data in numerical order • Divide the data set into two equal groups with the median as the center point. • The median of the low group = 1st quartile • The median of the high group = 3rd quartile • Interquartile range (IQR) = 3rd quartile minus 1st quartile
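The median-of-halves procedure above can be sketched as follows. One detail is an assumption: when the number of data points is odd, this version leaves the overall median out of both groups. Textbooks and software differ on that convention, so other tools may report slightly different quartiles. The function names are hypothetical.

```python
# Female weights (pounds) copied from the sample data set.
female_weights = [
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130,
    120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102,
    115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108,
]


def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the low and high halves.

    Assumption: for an odd count, the overall median belongs to neither half.
    Requires at least two data points.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    low_group = ordered[:half]
    high_group = ordered[-half:]
    return median(low_group), median(high_group)


q1, q3 = quartiles(female_weights)
iqr = q3 - q1
print("Q1 =", q1, " Q3 =", q3, " IQR =", iqr)
```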
Box & Whiskers plot [Figure: box-and-whiskers plot, labeled with the median and a distance of 1.5 IQR on each side]
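Standard deviation • Average squared distance from the mean $\bar{x}$ (dividing by $n - 1$) = $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ • This is the sample variance, $s^2$ • Standard deviation = $s = \sqrt{s^2}$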
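A minimal sketch of the calculation, assuming the sample variance uses the usual $n - 1$ divisor (the slide's own formula did not survive in this transcript). The function names are hypothetical; the result is cross-checked against Python's `statistics.variance` and `statistics.stdev`, which use the same divisor.

```python
import math
import statistics

# Female weights (pounds) copied from the sample data set.
female_weights = [
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130,
    120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102,
    115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108,
]


def sample_variance(values):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


variance = sample_variance(female_weights)
std_dev = sample_std_dev(female_weights)
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")

# Cross-check against the standard library.
print(math.isclose(variance, statistics.variance(female_weights)))
```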
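Z-scores, Standardized Scores • A z-score measures how many standard deviations a value lies from the mean: $z = (x - \bar{x}) / s$ • A student weighing 175 pounds has a z-score of 1.26, that is, about 1.26 standard deviations above the mean weight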
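The 175-pound example can be reproduced from the sample data set. The sketch below assumes the z-score is taken against the mean and sample standard deviation of all 92 weights combined; the slide does not say which group it uses, so treat that as an assumption. The function name `z_score` is hypothetical.

```python
import statistics

# All 92 weights (pounds) copied from the sample data set.
male_weights = [
    140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155,
    153, 145, 170, 175, 175, 170, 180, 135, 170, 157, 130, 185,
    190, 155, 170, 155, 215, 150, 145, 155, 155, 150, 155, 150,
    180, 160, 135, 160, 130, 155, 150, 148, 155, 150, 140, 180,
    190, 145, 150, 164, 140, 142, 136, 123, 155,
]
female_weights = [
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130,
    120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102,
    115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108,
]
all_weights = male_weights + female_weights


def z_score(x, data):
    """Number of sample standard deviations that x lies from the mean of data."""
    return (x - statistics.mean(data)) / statistics.stdev(data)


z = z_score(175, all_weights)
print(f"z-score of a 175-pound student: {z:.2f}")
```

A positive z-score means the value is above the mean, and a negative one means it is below.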
Summary: • Several ways to display data • Measures of Center • Measures of Spread • Standard deviations
Statistical inference • Use random sampling to draw inferences about a population. • Generalizations about a population from a sample are valid only if the sample is representative of that population.
Sampling • With replacement. • Without replacement.
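The difference between the two schemes is easy to see in code. This is an illustrative sketch only: the student ID numbers, the sample size of 10, and the seed are all hypothetical choices made for the example. Python's `random.choices` draws with replacement (the same student can be picked more than once), while `random.sample` draws without replacement (each student appears at most once).

```python
import random

rng = random.Random(2010)  # fixed seed (arbitrary) so the run is repeatable

# Hypothetical population: ID numbers 1..92 standing in for the 92 students.
student_ids = list(range(1, 93))
SAMPLE_SIZE = 10  # assumed sample size, chosen only for illustration

# With replacement: each draw is made from the full population,
# so an ID may repeat.
with_replacement = rng.choices(student_ids, k=SAMPLE_SIZE)

# Without replacement: a drawn ID is removed from the pool,
# so every ID in the sample is distinct.
without_replacement = rng.sample(student_ids, k=SAMPLE_SIZE)

print("with replacement:   ", sorted(with_replacement))
print("without replacement:", sorted(without_replacement))
```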