Descriptive Statistics


  1. Descriptive Statistics Describing Numerical Data: Numerical Descriptive Statistics

  2. Numerical Descriptive Statistics Numerical Summary Measures: Population Parameters Sample Statistics

  3. Numerical Summary Measures Measures of Central Tendency: Mean Median Mode

  4. Mean Most common measure of central tendency The arithmetic average Acts as ‘balance point’ Affected by extreme values (‘outliers’)

  5. Mean Formula (sample mean): Let $\bar{X}$ = sample mean, $n$ = sample size, $X_i$ = value of measurement $i$ ($i = 1, \dots, n$); then $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

  6. Dataset I: Score = 2726/50 = 54.52
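The mean formula can be sketched in a few lines of Python. The function name `sample_mean` is an illustrative choice, not something from the slides. The 50 individual scores of Dataset I are not listed in the transcript, so the check below uses only the total (2726) and the sample size (50) that the slide reports.

```python
def sample_mean(values):
    """Arithmetic average: sum of the measurements divided by the sample size n."""
    if not values:
        raise ValueError("sample_mean requires at least one value")
    return sum(values) / len(values)


# Dataset I is not reproduced in the slides; only its total and size are given.
dataset_i_total = 2726
dataset_i_n = 50
dataset_i_mean = dataset_i_total / dataset_i_n
print(round(dataset_i_mean, 2))  # 54.52
```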

  7. Dataset I: Score [Figure: distribution of scores on an axis from 30 to 80] mean = 54.52

  8. Median Measure of central tendency Middle value in an ordered array If odd n, middle value of ordered array If even n, average of 2 middle values Not affected by extreme values

  9. Example 1: 45 56 60 65 72 Median = 60

  10. Example 2: 30 45 56 60 65 72 Median = (56+60)/2 = 58
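The odd-n and even-n rules can be checked against Example 1 and Example 2 with a minimal sketch. The function name `median` is our own; Python's standard library offers an equivalent `statistics.median`.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the 2 middle values


example_1 = [45, 56, 60, 65, 72]
example_2 = [30, 45, 56, 60, 65, 72]
print(median(example_1))  # 60
print(median(example_2))  # 58.0
```

Replacing 72 in Example 1 with a much larger value leaves the result at 60, which is the sense in which the median is not affected by extreme values.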

  11. Dataset I: Score Positions 25 and 26 of the 50 ordered test scores both hold the value 55, so Median = (55+55)/2 = 55

  12. Dataset I: Score Mean vs Median [Figure: score axis 30 to 80] mean = 54.52, median = 55

  13. Mean vs Median [Figure: score axis 30 to 80 with percentage labels 40%, 30%, 15%, 10%, 5%; median and mean marked separately]

  14. Mean vs Median [Figure: score axis 30 to 80; mean and median marked separately]

  15. Mean vs Median [Figure: score axis 30 to 80] mean = median = 55

  16. Mean vs Median [Figure: score axis 30 to 80] mean = median = 55

  17. Mode Measure of central tendency Value that occurs most often Not affected by extreme values May be no mode or several modes May be used for numerical & categorical data
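A small sketch of finding the mode, covering the "no mode" and "several modes" cases. The function name `modes` and the sample values are hypothetical (they are not Dataset I), and treating "no value repeats" as "no mode" is one common convention assumed here.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often; an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # assumption: if nothing repeats, report no mode
    return sorted(value for value, count in counts.items() if count == top)


illustrative_scores = [48, 52, 52, 55, 55, 61]  # hypothetical values, two modes
illustrative_colors = ["red", "blue", "red"]    # categorical data works too
print(modes(illustrative_scores))  # [52, 55]
print(modes(illustrative_colors))  # ['red']
print(modes([1, 2, 3]))            # []
```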

  18. Dataset I: Score Mode = 52, 55 (each test score value occurs 4 times)

  19. Dataset I: Score Mean vs Median vs Mode [Figure: score axis 30 to 80] modes = 52 and 55; mean = 54.52; median = 55 (equal to one of the modes)

  20. Quantitative Data Discrete Data: a finite or countable number of possible values Continuous Data: any value within an interval (infinitely many possible values)

  21. Numerical Summary Measures Measures of Non-Central Location: Percentile Quartiles

  22. Percentiles Divide the ordered data into 100 equal parts (hundredths) Example: the 95th percentile has 95% of the data below it & 5% of the data above it

  23. Quartiles Measure of noncentral location Split ordered data into 4 quarters of 25% each, separated by Q1, Q2, Q3

  24. Quartiles Finding quartiles: Q2 = median of the entire dataset Q1 = median of the first half of dataset Q3 = median of the second half of dataset
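The median-of-halves procedure can be sketched as follows. The slide does not say what to do with the middle value when n is odd; this sketch assumes it is counted in both halves, a choice that happens to reproduce the IQR of 20 reported for Dataset A later in the deck. The function names are illustrative.

```python
def _median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for odd n the middle value belongs to both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles requires at least one value")
    lower_half = ordered[: (n + 1) // 2]  # first half (includes the middle if n is odd)
    upper_half = ordered[n // 2 :]        # second half (includes the middle if n is odd)
    return (
        _median_of_sorted(lower_half),
        _median_of_sorted(ordered),
        _median_of_sorted(upper_half),
    )


print(quartiles([45, 56, 60, 65, 72]))      # (56, 60, 65)
print(quartiles([30, 45, 56, 60, 65, 72]))  # (45, 58.0, 65)
```

For n = 50 this rule places Q1 at position 13 and Q3 at position 38 of the ordered data, matching the positions highlighted for Dataset I.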

  25. Dataset I: Score Positions: 1 2 … 13 … 25 26 … 38 … 49 50 Test Scores: 30 40 … 51 … 55 55 … 59 … 67 79 (median = 55)

  26. Quartiles Data set I: Q1 = 51, Q2 = 55, Q3 = 59 (L = lowest value, H = highest value)

  27. Five Number Summary Data Set I: L = 30, Q1 = 51, Q2 = 55, Q3 = 59, H = 79 (25% of the data between each adjacent pair)

  28. Numerical Summary Measures Measures of Dispersion: range interquartile range variance / standard deviation coefficient of variation

  29. Measures of Dispersion Dataset A: n = 5, mean = 40 Dataset B: n = 5, mean = 40

  30. Measures of Dispersion Dataset A: 20, 30, 40, 50, 60 (n = 5, mean = 40) Dataset B: 39, 40, 40, 40, 41 (n = 5, mean = 40)

  31. Range Measure of dispersion Difference between largest & smallest observations Formula: $\text{Range} = X_{\max} - X_{\min}$

  32. Example: Dataset A: 20, 30, 40, 50, 60 (n = 5, mean = 40, range = 40) Dataset B: 39, 40, 40, 40, 41 (n = 5, mean = 40, range = 2)

  33. Interquartile Range Measure of dispersion Also called midspread Spread in middle 50% of data Not affected by extreme values Formula: $\text{IQR} = Q_3 - Q_1$

  34. Example: Dataset A: 20, 30, 40, 50, 60 (n = 5, mean = 40, range = 40, IQR = 20) Dataset B: 39, 40, 40, 40, 41 (n = 5, mean = 40, range = 2, IQR = 0)
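The range and IQR for Datasets A and B can be reproduced with Python's standard `statistics` module. Quartile conventions differ between textbooks and software; `method="inclusive"` is chosen here only because it gives the slide's values for these two datasets, and the helper names are illustrative.

```python
import statistics


def data_range(values):
    """Difference between the largest and smallest observations."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 minus Q1, using the 'inclusive' quartile method (an assumption)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


dataset_a = [20, 30, 40, 50, 60]
dataset_b = [39, 40, 40, 40, 41]

for name, data in (("A", dataset_a), ("B", dataset_b)):
    print(name, "range =", data_range(data), "IQR =", interquartile_range(data))
# Dataset A: range 40, IQR 20; Dataset B: range 2, IQR 0
```

Both datasets share the same mean of 40, so only the dispersion measures distinguish them.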

  35. Variance & Standard Deviation Measure of dispersion Most commonly used measures Consider how data are distributed Show variation about mean ($\bar{X}$ or $\mu$)

  36. Variance & Standard Deviation Calculation based on: deviation where deviation = $X_i - \bar{X}$

  37. Deviations: ($X_i - \bar{X}$) Dataset A (20, 30, 40, 50, 60; mean = 40): -20, -10, 0, +10, +20 Dataset B (39, 40, 40, 40, 41; mean = 40): -1, 0, 0, 0, +1

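  38. Sample Variance Formula: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ n - 1 in denominator! (Use N if population variance)

  39. Sample Standard Deviation Formula: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$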

  40. Example: Dataset A: 20, 30, 40, 50, 60 (mean = 40, range = 40, IQR = 20, s² = 250, s = 15.81) Dataset B: 39, 40, 40, 40, 41 (mean = 40, range = 2, IQR = 0, s² = .5, s = .707)
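A minimal sketch of the sample variance and standard deviation, written out by hand so the n - 1 denominator is visible. The function names are illustrative; `statistics.variance` and `statistics.stdev` in the standard library compute the same quantities.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


dataset_a = [20, 30, 40, 50, 60]
dataset_b = [39, 40, 40, 40, 41]

print(sample_variance(dataset_a), round(sample_std(dataset_a), 2))  # 250.0 15.81
print(sample_variance(dataset_b), round(sample_std(dataset_b), 3))  # 0.5 0.707
```

For Dataset A the squared deviations are 400, 100, 0, 100, 400, which sum to 1000; dividing by n - 1 = 4 gives 250, and the square root of 250 is about 15.81.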

  41. Interpreting Standard Deviation Empirical Rule¹: mean ± 1s encompasses about 68% of data; mean ± 2s encompasses about 95% of data; mean ± 3s encompasses about 100% of data ¹ Applies to mound-shaped distributions only

  42. Example: Dataset C: Mound shaped: mean = 50 and s = 10 [Figure: axis 20 to 80] 40 to 60 holds about 68% of the data; 30 to 70 about 95%; 20 to 80 about 100%

  43. Dataset I: Score Score has a bell-shaped distribution, but does it have a true NORMAL probability distribution? To find out, check the Empirical Rule

  44. Dataset I: Score Score: $\bar{x}$ = 54.52, s = 7.678; $\bar{x} \pm 1s$ = 54.52 ± 1(7.678), i.e. 46.84 to 62.20 Q: What percent of test scores fall in this interval?

  45. Dataset I: Score Ans: (40/50) x 100 = 80% ≠ 68% Score does not have a true NORMAL probability distribution!
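The Empirical Rule check can be sketched as below. Dataset I's individual scores are not in the transcript, so the sketch recomputes only the interval and the reported share (40 of 50 scores); `percent_within` shows how the count would be obtained if the raw scores were available. All function names are illustrative.

```python
def interval_around_mean(mean, s, k=1):
    """Return the interval mean - k*s to mean + k*s."""
    return mean - k * s, mean + k * s


def percent_within(values, low, high):
    """Percentage of values that fall inside [low, high]."""
    inside = sum(1 for x in values if low <= x <= high)
    return 100 * inside / len(values)


low, high = interval_around_mean(54.52, 7.678, k=1)
print(round(low, 2), round(high, 2))  # 46.84 62.2

# Count reported on the slide: 40 of the 50 scores fall inside the interval.
observed_percent = 100 * 40 / 50
expected_percent = 68  # Empirical Rule value for mean ± 1s
print(observed_percent, "vs about", expected_percent)  # 80.0 vs about 68
```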

  46. Dataset I: Score Score has a heavy-tailed distribution (also called an outlier-prone distribution)

  47. Bell-shaped distributions Bell-shaped distributions can be: 1. Normal distributions 2. Heavy-tailed distributions, or 3. Light-tailed distributions

  48. Coefficient of Variation Measure of relative dispersion Always a % Shows variation relative to mean Used to compare 2 or more groups Formula (sample): $CV = \frac{s}{\bar{X}} \times 100\%$

  49. Coefficient of Variation Example: Given the following summary information on two stocks, A and B, listed on the American Stock Exchange, which shows more volatility? Stock A: Average Return = $50, Standard Dev. = $10 Stock B: Average Return = $12, Standard Dev. = $4
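The stock comparison can be worked through with a short sketch of the coefficient of variation, standard deviation divided by mean and expressed as a percentage. The function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative dispersion as a percentage: (s / mean) * 100."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=10, mean=50)
cv_b = coefficient_of_variation(std_dev=4, mean=12)
print(round(cv_a, 1), round(cv_b, 1))  # 20.0 33.3
```

Stock B has the smaller standard deviation in dollars but the larger coefficient of variation (about 33% versus 20%), so relative to its average return it is the more volatile of the two.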

  50. Measures of Skewness & Heavy-tailedness SKEW1 Kurtosis