AP Statistics Review Analyzing Data (C2-5 BVD) C2-4: Categorical and Quantitative Data
Categorical variables place an individual into a group or category. • Organize the data into a frequency table or a relative frequency (percent) table. • Graph the data in bar graphs or pie charts. • To use a segmented bar graph or a pie chart, the categories must add to 100% of a single whole: no overlapping categories, and no categories that are not parts of the same whole. Analyzing Categorical Data
Two-way tables or contingency tables may be used to compare two categorical variables. • A marginal distribution of a categorical variable is the distribution for the totals of that variable (in the margins of the table). • A conditional distribution of a variable is the distribution for that variable for a specific value of the other variable. • Side-by-side segmented bar graphs showing the conditional distributions of a variable can be used to look for an association of the variables. If there is no association (i.e. the variables are independent of each other) the segmented bar graphs or corresponding relative frequency distributions will be very similar. Comparing Categorical Variables
Titanic Data: • 1st class: 197 survived, 122 died • 2nd class: 94 survived, 167 died • 3rd class: 151 survived, 476 died Example for Categorical Data
Variables: Ticket class, Survival • Marginal distribution of survival: 442 survived, 765 died (1,207 total) • Conditional distribution of survival for 1st class: 197 survived, 122 died (319 total) • Possible graph: Three segmented bars, one for each class, each divided into two colors showing the relative survival and death rates • Conclusion: The three bars do not look nearly identical, and they are not all like the marginal distribution of survival. About 62% of 1st class passengers survived, compared with about 36% of 2nd class and about 24% of 3rd class. Differences this large suggest an association between survival and ticket class; a formal significance test would be needed to confirm that they are statistically significant. Example for Categorical Data
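The marginal and conditional distributions in the Titanic example can be checked with a short calculation. The following is a minimal Python sketch using only the counts given on the slide; the variable names are illustrative and not part of the original review.

```python
# Counts from the Titanic slide: (survived, died) for each ticket class.
counts = {
    "1st": (197, 122),
    "2nd": (94, 167),
    "3rd": (151, 476),
}

# Marginal distribution of survival: totals over all ticket classes.
total_survived = sum(s for s, d in counts.values())
total_died = sum(d for s, d in counts.values())
grand_total = total_survived + total_died

marginal_survival = {
    "survived": total_survived / grand_total,
    "died": total_died / grand_total,
}

# Conditional distribution of survival within each ticket class.
conditional_survival = {
    cls: {"survived": s / (s + d), "died": d / (s + d)}
    for cls, (s, d) in counts.items()
}

print(f"All passengers: {marginal_survival['survived']:.1%} survived")
for cls, dist in conditional_survival.items():
    print(f"{cls} class: {dist['survived']:.1%} survived")
```

If the variables were independent, each class's survival percentage would be close to the overall percentage. Here the per-class percentages are far apart, which is what the segmented bars would show visually.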
Quantitative variables take numerical values for which taking an average would make sense. Most quantitative variables have units of measurement. • Organize the data into a list. • Graph data using a dot plot, stem and leaf plot, or histogram • Describe the distribution using SOCS (Shape, Outlier/Unusual, Center, Spread) Analyzing Quantitative Data
Dot plot – Use a number line, label the axis, and give the graph a title. • Stem and leaf plot – Stems are usually all but the final digit. The leaf is usually the final digit. You must include a key that shows what the numbers mean. Arrange leaves from least to greatest out from the stem. Do NOT leave out duplicates. You may make back-to-back plots for comparisons. If there are many data points in each stem, you can split the stems. Don’t forget a title. • Histogram – Divide the data into bins of equal width (like 0 to <5, 5 to <10, etc.). Draw a number line with the bin boundaries. Draw bars to the appropriate height to show the count in each bin. Label the axes. No gaps between bars unless there is an empty bin. Choose the bin width to give a reasonable number of bins (around 5 or so). • Review: How to make a histogram on the calculator Tips on Graphing
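The stem and leaf and histogram rules above can be illustrated with a small Python sketch. The function names and the sample scores are hypothetical, chosen only for illustration, and the sketch assumes non-negative whole numbers with the final digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem and leaf rows; stem = all but the final digit."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting keeps leaves in increasing order
        stems[v // 10].append(v % 10)  # duplicates are kept, not dropped
    rows = []
    # Include empty stems so gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


def bin_counts(values, width):
    """Count values in equal-width bins [lower, lower + width)."""
    counts = {}
    for v in values:
        lower = (v // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical test scores, used only as example data.
scores = [72, 85, 85, 91, 68, 77, 79, 90, 55]

print("Test scores")
for row in stem_and_leaf(scores):
    print(row)
print("Key: 7 | 2 means 72")

print(bin_counts(scores, 10))
```

The dictionary returned by `bin_counts` maps each bin's lower boundary to the bar height a histogram would show for that bin.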
Describing Shape • Is the graph roughly symmetric or is it skewed? Skewed left – long tail to the left. Skewed right – long tail to the right. • Is the graph unimodal/bimodal/multimodal? (Don’t call a graph multimodal unless you really believe the multiple peaks are meaningful and not just random variation.) A uniform graph is flat – bars all about the same height. • Most graphs that are unimodal and symmetric are NOT normal. Don’t say a distribution is normal unless it really is! SOCS - Shape
Describing outliers/unusual points • Are there gaps (empty bins)? • Are there any values that are unusually far from the rest? SOCS - Outlier
Describing Center • Which bin would contain the midpoint (median)? • If the data are skewed right, the mean is usually above the median. If skewed left, the mean is usually below the median. If symmetric, the mean is in about the same place as the median. SOCS - Center
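The relationship between skew and the mean can be seen with Python's standard `statistics` module. The three small data sets below are made up for illustration; they are not from the original review.

```python
from statistics import mean, median

# Hypothetical data sets, chosen only to illustrate the pattern.
skewed_right = [1, 2, 2, 3, 3, 3, 4, 20]        # long tail to the right
skewed_left = [1, 17, 18, 18, 18, 19, 19, 20]   # long tail to the left
symmetric = [1, 2, 3, 4, 5]

for name, data in [
    ("skewed right", skewed_right),
    ("skewed left", skewed_left),
    ("symmetric", symmetric),
]:
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")
```

In the skewed-right set the single large value pulls the mean (4.75) above the median (3), while in the skewed-left set the single small value pulls the mean (16.25) below the median (18). The median barely responds to the extreme value in either case.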
Describing Spread • Range • More descriptive measures come in C5. SOCS - Spread
If asked to compare/contrast two variables, do NOT just state SOCS for both displays. • You must COMPARE – tell how they are alike • And CONTRAST – tell how they are different • Do this for each part of SOCS • Don’t leave one out – if there’s nothing interesting to say, then say that. SOCS – Compare/Contrast
Examine the 5 W’s (and How) to see what you really know about the data. • Who – NOT who gathered the data. Who are the subjects the data are ABOUT? These might not be people or living things. • What – what data were gathered, i.e. your variables. Categorical? Quantitative? Units? • When – NOT when the study was published, but when the data occurred. • Where – NOT where the study was published, but where the data occurred. • Why – What is the question the research is trying to answer? • How – how were the data found? Observational study? Experiment? Sampling? Simulation? What is the scope of inference? Are there concerns about the design? 5 W’s