
Quantitative Variables: Plots, Measures, Outliers & Skewness

Learn to recognize and interpret plots (histogram, boxplot), calculate and interpret measures of center (mean, median), and understand the effects of outliers and skewness.

joelgarcia


Presentation Transcript


  1. Statistics 200, Lecture #3, Tuesday, August 30, 2016. Textbook: Sections 2.4 through 2.6. Objectives (all relating to quantitative variables): • Recognize and interpret two plots: – Histogram – Boxplot • Calculate and interpret two measures of center: – Mean – Median • Calculate and interpret the five-number summary • Recognize and understand the effects of outliers and skewness

  2. Motivating example • A group of students was randomly assigned to one of two classes. One class was taught by teacher A and the other by teacher B. At the end of the semester, all students took the same exam. • Investigate whether there is any difference in exam scores between the two teachers. 53 72 35 47 64 66 13 6 35 42 45 59 58 69 53 67 57 53 62 95 74 2 61 84 88 65 69 76 53 71 71 87 98 83 81 73 75

  3. Summarizing Quantitative Variables • The distribution of a quantitative variable is the overall pattern of how often the possible values occur. • Four key aspects of the distribution are: • Location: center, average • Spread: variability • Shape: symmetric, bell, skew • Outliers • Let’s begin with the shape, which is best seen with a visual summary.

  4. Visual summaries for quantitative variables: Histogram and Boxplot. Histogram: • A chart of the data that shows how many observations are in each equally spaced interval. • Usually uses 6-15 intervals • Can use frequency or relative frequency
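
The counting behind a histogram can be sketched in a few lines. This is a minimal illustration, not the method used to draw the slides' plots: the function name `histogram_counts` and the study-hours data are invented for the example, and the code assumes the values are not all identical (so the interval width is nonzero).

```python
def histogram_counts(values, num_bins):
    """Count how many observations fall in each equally spaced interval."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins  # assumes high > low
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last interval.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical study hours per week (not data from the lecture)
hours = [2, 3, 5, 5, 6, 7, 8, 8, 9, 10, 12, 14]

counts = histogram_counts(hours, 6)          # frequency
relative = [c / len(hours) for c in counts]  # relative frequency

print(counts)
print(relative)
```

Frequencies sum to the number of observations, while relative frequencies sum to 1; the shape of the histogram is the same either way.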

  5. Histograms Teacher A Scores Teacher B Scores

  6. Outlier: An individual value that is unusual compared to the bulk of the other values.

  7. Example When considering study hours/week, what percent of the students spend: at most 3 hours? at least 11 hours? between 5 and 9 hours?

  8. Shapes of distributions • Symmetric: the shape of the data is similar on both sides of the center. • Bell-shaped is a special case of symmetric. • Skewed: values are more spread out on one side than the other. • Left-skewed: lower values are more spread out than higher values. • Right-skewed: higher values are more spread out than lower values.

  9. Shape Examples: Question: What is the fastest you have ever driven a car? Shape: symmetric.

  10. Shape Examples: Question: How many coins are you carrying? Shape: right-skewed. Question: What is your grade point average? Shape: left-skewed.

  11. Breakdown of Descriptive Statistical Methods: Quantitative Data (diagram annotations: "did one: histogram", "do now")

  12. Quantitative Data: Measures of Center. Mean: • Average of all numbers • Symbol for sample mean: x̄ • Value is sensitive to outliers. Median: • Middle observation of ordered data • Value is resistant to outliers. Mode: • Observation that occurs most frequently • Not really used in this course
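
As a sketch of how the two main measures of center are calculated by hand, the functions below follow the definitions on the slide. The exam scores are hypothetical, and the rule of averaging the two middle values for an even number of observations is the usual convention rather than something stated on the slide.

```python
def mean(values):
    """Sum of all observations divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle observation of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical exam scores (not the scores from the motivating example)
scores = [72, 53, 95, 61, 84, 65]

print(mean(scores))
print(median(scores))
```

Note that the median requires sorting first; taking the middle entry of unordered data is a common mistake.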

  13. Example: Center and outliers

  14. Sensitive vs. Resistant statistics. Sensitive statistic: • Calculated using ALL observations • Affected by skewness and/or unusual observations • Example: mean. Resistant statistic: • Calculated using only some observations • Not affected much by outliers • Example: median
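
One way to see the difference is to add a single unusual value and compare how far each statistic moves. The speeds below are invented for illustration; only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical fastest-speed responses in mph (not lecture data)
speeds = [85, 90, 95, 100, 105]
with_outlier = speeds + [200]  # add one unusually large value

# How far does each measure of center move when the outlier is added?
mean_shift = statistics.mean(with_outlier) - statistics.mean(speeds)
median_shift = statistics.median(with_outlier) - statistics.median(speeds)

print(f"mean moves by {mean_shift}, median moves by {median_shift}")
```

With these made-up numbers the mean moves by 17.5 mph while the median moves by only 2.5 mph, which is the sense in which the median is resistant.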

  15. Examples: Fastest speed driven: mean = 94.8 mph, median = 95 mph. Coins carried: mean = 17.3 coins, median = 9 coins.

  16. Work together question: Which is most likely true of the salaries ($) in a company that employs: 1. 20 factory workers and 2 very highly paid executives? • mean > median • mean < median • mean ≈ median 2. 2 factory workers and 20 very highly paid executives? • mean > median • mean < median • mean ≈ median
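
The question can be explored numerically. The sketch below assumes two invented salary levels (the slide gives no dollar amounts) and a hypothetical helper, `mean_vs_median`, that builds the salary list for a given mix of employees.

```python
import statistics

# Hypothetical salaries; the slide does not give actual figures.
WORKER_SALARY = 40_000
EXECUTIVE_SALARY = 500_000


def mean_vs_median(num_workers, num_executives):
    """Return (mean, median) salary for the given mix of employees."""
    salaries = ([WORKER_SALARY] * num_workers
                + [EXECUTIVE_SALARY] * num_executives)
    return statistics.mean(salaries), statistics.median(salaries)


mostly_workers = mean_vs_median(20, 2)      # scenario 1
mostly_executives = mean_vs_median(2, 20)   # scenario 2

print(mostly_workers)
print(mostly_executives)
```

In the first scenario the two high salaries pull the mean above the median (right skew); in the second, the two low salaries pull the mean below the median (left skew).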

  17. Percentiles: A percentile tells us how much of the data is below a specific value. What is the value (in study hrs/week) for the: • 5th percentile? • 90th percentile?
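
Taking the slide's definition literally, the percent of the data below a value can be computed by counting. The function name and the study-hours data are hypothetical, and textbooks differ on whether values equal to the cutoff are counted; this sketch counts strictly smaller values only.

```python
def percent_below(values, cutoff):
    """Percent of observations strictly below the cutoff."""
    below = sum(1 for v in values if v < cutoff)
    return 100 * below / len(values)


# Hypothetical study hours per week for 20 students (not lecture data)
hours = [1, 2, 3, 3, 4, 5, 5, 6, 6, 7,
         7, 8, 8, 9, 10, 10, 11, 12, 14, 20]

print(percent_below(hours, 3))
print(percent_below(hours, 11))
```

For this made-up sample, 10% of the values lie below 3 hours and 80% lie below 11 hours, so 11 hours would sit at about the 80th percentile.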

  18. Percentiles of Interest. 25th percentile: • Lower Quartile (QL) • First Quartile (Q1). 50th percentile: • Second Quartile (Q2) • Median. 75th percentile: • Upper Quartile (QU) • Third Quartile (Q3)

  19. We use quartiles for the… • Numerical method for summarizing quantitative data.

  20. Example: 5-Number summary. Descriptive Statistics: Fastest_Speed. Variable: Fastest_Speed; N = 20; Minimum = 45; Q1 = 90; Median = 95; Q3 = 100; Maximum = 135. Fill in the five-number summary.
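
A five-number summary can be computed with Python's standard library, as sketched below. The raw speeds are not given on the slide, so the 11 values here are hypothetical, chosen so the result matches the slide's summary. Quartile conventions differ between software packages; this sketch relies on the default method of `statistics.quantiles`, which may not agree with the software that produced the slide's output on other data.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


# Hypothetical fastest speeds in mph (the slide's raw data is not shown)
speeds = [45, 85, 90, 92, 95, 95, 98, 99, 100, 120, 135]

print(five_number_summary(speeds))
```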

  21. Another look: 5-number summary The 5-number summary divides your data into 4 quarters:

  22. Approximately what percent of the fastest speeds: • are at least 100 mph? • are at most 90 mph?

  23. Approximately what percent of the fastest speeds lie: • between 90 and 100 mph? • (at most 95) or (at least 100)? Five-number summary: 45, 90, 95, 100, 135
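
Because the five-number summary splits the data into four quarters of roughly 25% each, questions like these reduce to counting quarters. The sketch below uses the summary values from the slides; the function name is hypothetical and it only accepts the five summary values as endpoints.

```python
# Five-number summary for fastest speed (mph), taken from the slides
cuts = [45, 90, 95, 100, 135]  # min, Q1, median, Q3, max


def approx_percent_between(lower, upper):
    """Approximate percent of data between two summary values.

    Each gap between neighboring summary values holds about 25%.
    """
    quarters = cuts.index(upper) - cuts.index(lower)
    return 25 * quarters


at_least_100 = approx_percent_between(100, 135)
at_most_90 = approx_percent_between(45, 90)
between_90_and_100 = approx_percent_between(90, 100)
# "At most 95" and "at least 100" do not overlap, so the percents add.
at_most_95_or_at_least_100 = (approx_percent_between(45, 95)
                              + approx_percent_between(100, 135))

print(at_least_100, at_most_90, between_90_and_100,
      at_most_95_or_at_least_100)
```

These are approximations: ties at a quartile and small sample sizes can move the true percentages away from exact multiples of 25.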

  24. Visual summaries for quantitative variables: Histogram and Boxplot. Histogram: • A chart of the data that shows how many observations are in each equally spaced interval. • Usually uses 6-15 intervals • Can use frequency or relative frequency. Boxplot: • Visualization of the 5-number summary • Shows Q1, median, and Q3 as lines around and through a middle box • Identifies outliers
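
The slide does not say how a boxplot decides which points are outliers. A widely used convention, assumed here rather than taken from the lecture, flags values more than 1.5 times the interquartile range (IQR = Q3 - Q1) beyond the box. A minimal sketch using the quartiles from the fastest-speed example:

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences under the (assumed) 1.5 * IQR convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles from the fastest-speed five-number summary on the slides
low_fence, high_fence = outlier_fences(90, 100)

# The minimum (45) and maximum (135) from the same summary
flagged = [x for x in (45, 135) if x < low_fence or x > high_fence]

print(low_fence, high_fence, flagged)
```

Under this convention the fences are 75 and 115 mph, so both the slowest (45) and fastest (135) reported speeds would be drawn as individual outlier points.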

  25. Boxplots: Examples

  26. Boxplot shows same shape as histogram Symmetric

  27. Boxplot shows same shape as histogram Right-skewed

  28. Boxplot shows same shape as histogram Left-skewed

  29. Link measures of center to shape

  30. Another example: Parties per month Outliers!

  31. Parties per month, without the outliers

  32. Median: 50% of students surveyed partied less than 4.5 times per month. • Right-skewed → mean > median

  33. Consider the variables Party and Year • How many parties do you attend in a month? (Response) • What year are you in school? (Explanatory)

  34. Consider the variables Party and Year • How many parties do you attend in a month? (Quantitative) • What year are you in school? (Categorical, ordinal)

  35. Explore relationship with boxplot

  36. Which year has highest median? Largest box? Most outliers? Do we observe a trend?
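
Side-by-side boxplots compare a quantitative variable across the groups of a categorical one. The same comparison can be made numerically by summarizing each group separately, as in this sketch. The party counts are invented and say nothing about the actual survey results.

```python
import statistics

# Hypothetical parties-per-month responses by year (not the survey data)
parties_by_year = {
    "First-year": [2, 4, 5, 8, 10],
    "Sophomore": [1, 3, 4, 6, 12],
    "Junior": [0, 2, 3, 5, 6],
    "Senior": [0, 1, 2, 4, 20],
}

# One median per group: the line drawn inside each box
medians = {year: statistics.median(counts)
           for year, counts in parties_by_year.items()}

# The group whose box has the highest median line
highest = max(medians, key=medians.get)

print(medians)
print(highest)
```

Reading the medians in year order is one simple way to check for a trend across groups before looking at box sizes and outliers.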

  37. Review: If you understood today’s lecture, you should be able to solve • 2.40, 2.41, 2.43, 2.49c, 2.53, 2.59, 2.62 Objectives (all relating to quantitative variables): • Recognize and interpret two plots: – Histogram – Boxplot • Calculate and interpret two measures of center – Mean – Median • Calculate and interpret five-number summary • Recognize and understand effects of outliers & skewness
