There are three kinds of lies - lies, damned lies and statistics. ~Benjamin Disraeli, commonly misattributed to Mark Twain APSTAT PART ONE Exploring and Understanding Data
What is Statistics? Chapters 1-3
What is Stat? • Book Says: • A way of reasoning • Collection of tools and methods • Helps us understand the world • Statistics is about variation
Stat Basics • Individuals • Object described by a set of data • People (#1), cars, animals, groups… • Variables • Categorical (Qualitative)– Usually involves words • Examples: sex, advisor, social security #... • Quantitative – Involve #’s • Examples: age, height, income, test score…
Displaying Categorical Data • Frequency tables:
Displaying Categorical Data • Relative Frequency tables: • Just roll up the %’s
Displaying Categorical Data • Contingency Table • Two Way table Age at first “Real Kiss” (ahhhhhhhhhhhh…)
Marginal Distribution Age at first “Real Kiss” (ahhhhhhhhhhhh…) • Conditional Distribution: • % of males whose first kiss came when they were 10-14 • % of 20-24 year old first kissers who were male
The Rest of Chapters 1-3 • Displaying the data • Pie Charts • Bar Charts • Blah Blah Blah…. • Simpson’s Paradox – AP MC • Being Skeptical – Important for real life • 5 W’s + 1H • Ex: 4 out of 5 dentists…. • Displaying data • Lies, Damned Lies, and Statistics
Showing Off Your Data Chapters 4-5
Histograms • Remember bar graphs? Same, but different. • Think of sorting boxes… • Same size boxes • ON TI-83 • Enter Data into L1 (STAT>EDIT) • Go to STAT PLOT (2ND Y=) • Change Options • Go to ZOOM, Choose ZoomStat • OR Go to WINDOW, Change Options • Go to GRAPH
Histograms • Make a histogram of the following data: • Age of Teachers At WPS: 25, 34, 37, 42, 51, 43, 49, 35, 37, 65
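The "sorting boxes" idea can be sketched in Python as a check on the calculator plot. This is a minimal sketch: the bin width of 10 and the helper name `bin_counts` are assumptions, since the slide does not specify either.

```python
from collections import Counter

# Teacher ages from the slide
ages = [25, 34, 37, 42, 51, 43, 49, 35, 37, 65]

BIN_WIDTH = 10  # assumed box size; the slide does not give one


def bin_counts(data, width):
    """Count how many values land in each equal-width box [start, start + width)."""
    counts = Counter((x // width) * width for x in data)
    return dict(sorted(counts.items()))


counts = bin_counts(ages, BIN_WIDTH)

# Text histogram: one '#' per teacher in each box
for start, n in counts.items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * n}")
```

Changing `BIN_WIDTH` shows how the shape of a histogram depends on the box size, which is the same effect as changing Xscl in WINDOW on the TI-83.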
Outliers • An observation that is outside the pattern • For example, ages in this classroom 16, 17, 16, 17, 18, 17, 17, 16, 18, 36 • Formula to determine (l8r, sk8r) • For now “potential” or “possible” outlier
Describing a distribution • Center • Mean - Average • Median - Middle • Shape • Symmetric • Skewed • Uniform • Bell Shaped • Bi- or Multi-modal • Spread • Standard Deviation • Range • IQR • Weird-ness • Outliers • Gaps
Stemplots • Basic • Split Stems • Back-To-Back
Basic Stemplot • Boys’ Weight in class (pounds) • Stems: 10 11 12 13 14 15 16 17 18 • Leaves: 3 4 6 9 9 0 2 5 7 8 8 0 0 1 3 4 4 5 8 9 1 9 • KEY: 10 | 8 = 108 pounds
Split Stem Stemplot • Boys’ Weight in class (pounds) • Stems: 14 14 15 15 16 16 17 17 18 • Leaves: 3 4 6 9 9 0 2 5 7 8 8 0 0 1 3 4 4 5 8 9 1 9 • KEY: 10 | 8 = 108 pounds
Back to Back Stemplot • Girls vs. Boys Weight in class (pounds) • Stems: 10 11 12 13 14 15 16 17 18 • Girls’ leaves: 8 9 3 8 7 7 3 9 4 0 2 1 • Boys’ leaves: 3 4 6 9 9 0 2 5 7 8 8 0 0 1 3 4 4 5 8 9 1 9 • KEY: 10 | 8 or 8 | 10 = 108 pounds
Mean • Average! Add ‘em up and divide by n • Sample Mean denoted as x̄ (x-bar) • Not Resistant to extreme values • e.g. Ages in Mrs. Smith’s Kindergarten Class • 4,5,4,4,4,5,5,4,4,4,5,5,4,4,5,39
Median • Middle! Line ‘em up (in order) and find the middle. If two share it, find their mean. • Resistant to extreme values • e.g. Ages in Mrs. Smith’s Kindergarten Class • 4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,39
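To see what "resistant" means with the kindergarten data, here is a short Python sketch that computes both measures with and without the one 39-year-old. The variable names are illustrative only.

```python
from statistics import mean, median

# Ages in Mrs. Smith's kindergarten class (from the slides), teacher included
ages = [4, 5, 4, 4, 4, 5, 5, 4, 4, 4, 5, 5, 4, 4, 5, 39]

# Same class with the one extreme value (the 39) removed
kids_only = [a for a in ages if a != 39]

print("with teacher:    mean =", mean(ages), " median =", median(ages))
print("without teacher: mean =", mean(kids_only), " median =", median(kids_only))
```

The single extreme value pulls the mean from 4.4 up to 6.5625, while the median stays at 4 either way.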
Quartiles • Median cuts data in half, Quartiles cut the Halves in Half! Recall Teacher Ages: 25, 34, 35, 37, 37, 42, 43, 49, 51, 65 • 1st Quartile Q1 = 35 (middle of the lower half) • Median = 39.5 • 3rd Quartile Q3 = 49 (middle of the upper half)
5-Number Summary • Low-Q1-Median-Q3-High • Shows Spread of Data Recall Teacher Ages: 25, 34, 35, 37, 37, 42, 43, 49, 51, 65 • 5-Number Summary: 25 35 39.5 49 65
Boxplot • Graphical Representation of 5-Number Summary • Shows Shape, Spread, and Center • Always draw to scale: 25 35 39.5 49 65
Outliers • First off, IQR – InterQuartile Range • Distance between Quartiles… Recall Teacher Ages: 25, 34, 35, 37, 37, 42, 43, 49, 51, 65 • IQR is 49-35=14 • Outlier is anything more than 1.5 times IQR below Q1 or above Q3 • Sooo…. 1.5 × 14 = 21, so an outlier would have to be more than 21 below 35 or more than 21 above 49…Below 14 or above 70. Nothing in our data is an outlier!
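The 5-number summary and the 1.5 × IQR rule can be checked with a short Python sketch. It assumes the "median of each half" quartile method used on the slides (other software may use slightly different quartile rules); the function names are made up for illustration.

```python
from statistics import median


def five_number_summary(data):
    """Low, Q1, Median, Q3, High, with quartiles as medians of each half.

    For an odd number of values the middle value is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def find_outliers(data):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


teacher_ages = [25, 34, 35, 37, 37, 42, 43, 49, 51, 65]
summary = five_number_summary(teacher_ages)
teacher_outliers = find_outliers(teacher_ages)

# Classroom ages from the earlier "Outliers" slide
class_ages = [16, 17, 16, 17, 18, 17, 17, 16, 18, 36]
class_outliers = find_outliers(class_ages)

print(summary)           # (25, 35, 39.5, 49, 65)
print(teacher_outliers)  # []
print(class_outliers)    # [36]
```

The teacher ages have no outliers, while the 36 in the classroom data, the earlier "potential" outlier, falls above its upper fence of 21.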
Boxplot Using TI-83 • Enter Teacher Ages into L1 (clear old stuff first): 25, 34, 35, 37, 37, 42, 43, 49, 51, 65 • ON TI-83 • Go to STAT PLOT (2ND Y=) • Change Options • Go to ZOOM, Choose ZoomStat • OR Go to WINDOW, Change Options • Go to GRAPH
Variance & Standard Deviation • Variance – s² • Average of squared distances from the mean (for a sample, divide by n − 1) • In example (Mean = 6): 26/5 = 5.2 • Standard Deviation – s • Square Root of Variance • In example, about 2.28 • Standard Deviation • Measure of Spread • Use with Mean • Non-Resistant • On TI-83 Now….. STAT > CALC > 1-Var Stats
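The data set behind the slide's 26/5 example is not shown, so this sketch walks through the same steps on the teacher ages instead. It assumes the sample formulas (dividing by n − 1) and compares the by-hand result with Python's `statistics` module.

```python
from math import isclose, sqrt
from statistics import mean, stdev, variance

# Teacher ages from the earlier slides (the slide's own example data is not shown)
ages = [25, 34, 35, 37, 37, 42, 43, 49, 51, 65]

x_bar = mean(ages)                                 # sample mean
sum_sq = sum((x - x_bar) ** 2 for x in ages)       # sum of squared distances from the mean
s_squared = sum_sq / (len(ages) - 1)               # sample variance: divide by n - 1
s = sqrt(s_squared)                                # standard deviation = square root of variance

print("mean =", x_bar)
print("variance =", round(s_squared, 2))
print("standard deviation =", round(s, 2))

# The library functions use the same n - 1 definition
assert isclose(s_squared, variance(ages))
assert isclose(s, stdev(ages))

# The slide's arithmetic: a variance of 26/5 gives s of about 2.28
slide_s = sqrt(26 / 5)
```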
It’s Normal to Deviate Chapter 6 – The Normal Model
Density Curve • Area under a density curve is always 1 • Symmetric density curve: Mean, Median and Mode are at the same point (the center)
Density Curve Continued • Density curves are often skewed • Skewed to the Left (tail trails to the left) • Skewed to the Right (tail trails to the right) • Recall Median is “resistant” while Mean is not [Figure: skewed density curves with Mean, Median and Mode marked]
Histograms • Median is “equal areas” point (50% of Population on each side) • Mean is “balance point” – “think Physics”
Normal Distributions (bell shaped) • Center is mean μ (population mean) • Spread is Standard Deviation σ (population standard deviation) • To find σ, look for inflection points: the curve changes from Concave Down to Concave Up at μ − σ and μ + σ
68 – 95 – 99.7 Rule • Also called EMPIRICAL RULE • Probability = 68% within 1σ of μ • Probability = 95% within 2σ of μ • Probability = 99.7% within 3σ of μ [Figure: normal curve with Raw-Score (X) axis from μ − 3σ to μ + 3σ and z-Score (z) axis from −3 to 3]
Percentiles (and quartiles) • Think standardized tests or class rankings • Percent of observations to the LEFT of an observation • Quartiles: • First is at 25th percentile • Median is at 50th percentile • Third is at 75th percentile
Z-SCORE • Number of Standard Deviations (σ) away from the Mean (μ) [Figure: normal curve with Raw-Score (X) axis from μ − 3σ to μ + 3σ and z-Score (z) axis from −3 to 3]
Z-SCORE Continued • Example: You have an IQ of 148. The IQ test you took has a distribution N(105, 20). What is your Z-Score? What does this mean? • z = (X − μ) / σ • μ = population mean, σ = population standard deviation, X = Raw-Score, z = z-Score • Normal Distribution Notation N(μ, σ)
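The IQ example works out as follows in Python. This is a minimal sketch of the standard z-score formula z = (X − μ) / σ; the function name `z_score` is just for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that raw score x lies from the mean mu."""
    return (x - mu) / sigma


# IQ of 148 on a test distributed N(105, 20)
z = z_score(148, mu=105, sigma=20)
print(z)  # 2.15
```

A z-score of 2.15 means the score sits 2.15 standard deviations above the mean.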
Using Tables • Ex. – Your IQ Z-SCORE was 2.15. What does it mean now?
Using Tables • Ex. – If someone’s IQ was at the 10th percentile, what would their Z-SCORE be?
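Both table look-ups can be cross-checked in Python. This sketch assumes the standard library's `statistics.NormalDist` in place of a printed z-table: `cdf` plays the role of looking up a z-score to get the area to its left, and `inv_cdf` goes from a percentile back to a z-score.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# z-score -> percentile: area to the LEFT of z = 2.15
percentile = standard_normal.cdf(2.15)
print(f"z = 2.15 is at about the {percentile:.1%} mark")

# percentile -> z-score: which z has 10% of the area to its left?
z_10th = standard_normal.inv_cdf(0.10)
print(f"10th percentile is at z = {z_10th:.2f}")

# Convert that z-score back to a raw IQ score on N(105, 20)
iq_10th = 105 + z_10th * 20
print(f"10th percentile IQ is about {iq_10th:.1f}")
```

So an IQ z-score of 2.15 is higher than roughly 98% of scores, and the 10th percentile sits about 1.28 standard deviations below the mean.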
Using TI-83 • normalcdf(Xlower, Xupper, μ, σ): use to convert Raw-Scores directly to a probability. • normalcdf(Zlower, Zupper): use to convert z-Scores to a probability ***For Graphics use ShadeNorm (GTANG notes)
Using TI-83 • Test Empirical Rule (68-95-99.7) • Find Normalcdf(-1,1), Normalcdf(-2,2), Normalcdf(-3,3) • Ex. What percent of IQ Scores would fall between 100 and 110 Using N(105, 20)? What percent would be above 150? • Normalcdf(100,110,105,20) • Normalcdf(150,1000000000,105,20)
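For checking calculator answers away from the TI-83, here is a rough Python stand-in for normalcdf. It is a sketch built on `statistics.NormalDist`, not the calculator's own routine; the function name and argument order simply mirror the TI-83 command.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Test the Empirical Rule (68-95-99.7) with z-scores
within_1 = normalcdf(-1, 1)
within_2 = normalcdf(-2, 2)
within_3 = normalcdf(-3, 3)
print(f"{within_1:.4f} {within_2:.4f} {within_3:.4f}")

# IQ scores on N(105, 20)
between_100_110 = normalcdf(100, 110, 105, 20)
above_150 = normalcdf(150, 1000000000, 105, 20)  # huge upper bound, as on the slide
print(f"between 100 and 110: {between_100_110:.1%}")
print(f"above 150: {above_150:.1%}")
```

About 19.7% of scores fall between 100 and 110, and about 1.2% fall above 150.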
Normality • Just check Box and Whisker plot or Histogram on TI-83 • ALWAYS do this if raw data is given • Sketch result and comment on it!