Descriptive Statistics. Describing Numerical Data: Numerical Descriptive Statistics
Numerical Descriptive Statistics. Numerical summary measures: population parameters; sample statistics
Numerical Summary Measures. Measures of central tendency: mean; median; mode
Mean: the most common measure of central tendency; the arithmetic average; acts as a ‘balance point’; affected by extreme values (‘outliers’).
Mean formula (sample mean): let $\bar{X}$ = sample mean, $n$ = sample size, and $X_i$ = value of measurement $i$ ($i = 1, \dots, n$); then $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Dataset I: Score. $\bar{X}$ = 2726/50 = 54.52 (the sum of the 50 test scores divided by n = 50).
[Figure: Dataset I: Score, distribution on an axis from 30 to 80, with the mean (54.52) marked.]
Median: a measure of central tendency; the middle value in an ordered array. If n is odd, it is the middle value of the ordered array; if n is even, it is the average of the 2 middle values. Not affected by extreme values.
Example 1 (n = 5): 45, 56, 60, 65, 72. Median = 60
Example 2 (n = 6): 30, 45, 56, 60, 65, 72. Median = (56+60)/2 = 58
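
As a minimal sketch, the two median examples can be rechecked with Python's standard `statistics` module. The variable names are illustrative, and the outlier case at the end (72 replaced by a made-up 720) is an assumption added here to show why the mean is sensitive to extreme values while the median is not; it does not come from the slides.

```python
from statistics import mean, median

example_1 = [45, 56, 60, 65, 72]      # odd n: the median is the middle value
example_2 = [30, 45, 56, 60, 65, 72]  # even n: average the two middle values

median_1 = median(example_1)  # middle value of the ordered array
median_2 = median(example_2)  # (56 + 60) / 2
mean_1 = mean(example_1)      # 298 / 5

# Hypothetical outlier (not from the slides): replace 72 with 720.
with_outlier = [45, 56, 60, 65, 720]
mean_outlier = mean(with_outlier)      # pulled far upward by the outlier
median_outlier = median(with_outlier)  # unchanged from Example 1

print(median_1, median_2, mean_1, mean_outlier, median_outlier)
```

In this sketch the mean moves from 59.6 to 189.2 once the outlier is introduced, while the median stays at 60.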
Dataset I: Score
Positions:    1  2  …  25  26  …  49  50
Test scores:           55  55
Median = (55+55)/2 = 55
[Figure: Dataset I: Score, mean vs median on an axis from 30 to 80; mean = 54.52, median = 55.]
[Figure: Mean vs Median, histogram with bars labeled 40%, 30%, 15%, 10%, 5% on an axis from 30 to 80; the median lies to the left of the mean.]
[Figure: Mean vs Median, axis from 30 to 80; the mean lies to the left of the median.]
[Figure: Mean vs Median, axis from 30 to 80; mean = median = 55.]
Mode: a measure of central tendency; the value that occurs most often; not affected by extreme values; there may be no mode or several modes; may be used for numerical and categorical data.
Dataset I: Score. Modes = 52 and 55 (each of these test scores occurs 4 times).
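
A short sketch of finding several modes at once, using `statistics.multimode` from the Python standard library (Python 3.8 or later). The full Dataset I is not reproduced in the slides, so the `scores` list below is hypothetical: it was made up so that 52 and 55 each occur 4 times, as on the slide.

```python
from statistics import multimode

# Hypothetical scores (NOT the real Dataset I): 52 and 55 each occur 4 times.
scores = [48, 52, 52, 52, 52, 54, 55, 55, 55, 55, 61, 79]
modes = multimode(scores)  # every value tied for the highest count

# The mode also works for categorical data (labels here are made up).
grades = ["B", "A", "B", "C", "B", "A"]
grade_modes = multimode(grades)

# When no value repeats, multimode returns every value; this corresponds
# to the "no mode" case on the slide.
no_repeats = multimode([45, 56, 60, 65, 72])

print(modes, grade_modes, no_repeats)
```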
[Figure: Dataset I: Score, mean vs median vs mode on an axis from 30 to 80; mode = 52, mean = 54.52, median = 55 = mode.]
Quantitative Data. Discrete data: a finite or countable number of possible values. Continuous data: infinitely many possible values (any value within an interval).
Numerical Summary Measures. Measures of non-central location: percentiles; quartiles
Percentiles: divide the ordered data into hundredths. Example: the 95th percentile has 95% of the data below it and 5% of the data above it.
Quartiles: a measure of noncentral location; Q1, Q2, and Q3 split the ordered data into 4 quarters, each holding 25% of the data.
Quartiles. Finding quartiles: Q2 = median of the entire dataset; Q1 = median of the first half of the dataset; Q3 = median of the second half of the dataset.
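
The median-of-halves rule can be sketched in a few lines of Python. The function name `quartiles_by_halves` is hypothetical. The slides do not say how to split the data when n is odd, so this sketch assumes the middle value is included in both halves; that assumption reproduces the IQR = 20 reported for Dataset A later in the deck. Other conventions exist and can give different quartiles.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Assumption (not stated on the slides): when n is odd, the middle
    value is counted in both the first and the second half.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = (n + 1) // 2            # size of each half
    first_half = ordered[:half]
    second_half = ordered[n - half:]
    return median(first_half), median(ordered), median(second_half)

# Example 2 from the median slides (even n, so the halves do not overlap).
q1, q2, q3 = quartiles_by_halves([30, 45, 56, 60, 65, 72])
print(q1, q2, q3)
```

For the six values of Example 2 the halves are 30, 45, 56 and 60, 65, 72, so the sketch gives Q1 = 45, Q2 = 58, and Q3 = 65.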
Dataset I: Score
Positions:    1   2   …  13  …  25  26  …  38  …  49  50
Test scores:  30  40  …  51  …  55  55  …  59  …  67  79
Median = 55
Quartiles, Data Set I: Q1 = 51, Q2 = 55, Q3 = 59 (L and H mark the lowest and highest scores).
Five Number Summary, Data Set I: L = 30, Q1 = 51, Q2 = 55, Q3 = 59, H = 79; each of the four intervals between these values holds 25% of the data.
Numerical Summary Measures. Measures of dispersion: range; interquartile range; variance / standard deviation; coefficient of variation
Measures of Dispersion
Dataset A     Dataset B
   20            39
   30            40
   40            40
   50            40
   60            41
n = 5         n = 5
mean = 40     mean = 40
Range: a measure of dispersion; the difference between the largest and smallest observations. Formula: $\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
Example:
Dataset A     Dataset B
   20            39
   30            40
   40            40
   50            40
   60            41
n = 5         n = 5
mean = 40     mean = 40
range = 40    range = 2
Interquartile Range: a measure of dispersion; also called the midspread; the spread in the middle 50% of the data; not affected by extreme values. Formula: $\text{IQR} = Q_3 - Q_1$
Example:
Dataset A     Dataset B
   20            39
   30            40
   40            40
   50            40
   60            41
n = 5         n = 5
mean = 40     mean = 40
range = 40    range = 2
IQR = 20      IQR = 0
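
The range and IQR figures for Datasets A and B can be rechecked with a small sketch. The helper names `data_range` and `iqr` are hypothetical. The quartile convention is an assumption: `statistics.quantiles` with `method="inclusive"` is used because it reproduces the slide values, whereas the function's default (`"exclusive"`) would give a different IQR for Dataset A.

```python
from statistics import quantiles

def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

def iqr(values):
    """Q3 - Q1, with quartiles from the 'inclusive' method (an assumption)."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

dataset_a = [20, 30, 40, 50, 60]
dataset_b = [39, 40, 40, 40, 41]

for name, data in (("A", dataset_a), ("B", dataset_b)):
    print(name, "range =", data_range(data), "IQR =", iqr(data))
```

Both datasets share n = 5 and mean = 40, yet the two measures separate them clearly, which is the point of the example.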
Variance & Standard Deviation: measures of dispersion; the most commonly used measures; consider how the data are distributed; show variation about the mean ($\bar{X}$ or $\mu$).
Variance & Standard Deviation: the calculation is based on deviations, where deviation = $X_i - \bar{X}$.
Deviations: ($X_i - \bar{X}$)
Dataset A (mean = 40): values 20, 30, 40, 50, 60; deviations -20, -10, 0, +10, +20
Dataset B (mean = 40): values 39, 40, 40, 40, 41; deviations -1, 0, 0, 0, +1
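Sample Variance. Formula: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$. Note the n - 1 in the denominator! (Use N for the population variance.) The sample standard deviation is $s = \sqrt{s^2}$.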
Example:
Dataset A     Dataset B
   20            39
   30            40
   40            40
   50            40
   60            41
mean = 40     mean = 40
range = 40    range = 2
IQR = 20      IQR = 0
s² = 250      s² = .5
s = 15.81     s = .707
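
A minimal sketch of the sample variance calculation, written out by hand so the n - 1 denominator is visible. The function names are hypothetical; the data are Datasets A and B from the slides.

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return sqrt(sample_variance(values))

dataset_a = [20, 30, 40, 50, 60]  # deviations: -20, -10, 0, +10, +20
dataset_b = [39, 40, 40, 40, 41]  # deviations: -1, 0, 0, 0, +1

var_a, sd_a = sample_variance(dataset_a), sample_std_dev(dataset_a)
var_b, sd_b = sample_variance(dataset_b), sample_std_dev(dataset_b)
print(var_a, round(sd_a, 2), var_b, round(sd_b, 3))
```

For Dataset A the squared deviations sum to 1000, so s² = 1000/4 = 250 and s = √250 ≈ 15.81; for Dataset B they sum to 2, so s² = 0.5 and s ≈ 0.707.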
Interpreting Standard Deviation. Empirical Rule (applies to mound-shaped distributions only): mean ± 1s encompasses about 68% of the data; mean ± 2s encompasses about 95% of the data; mean ± 3s encompasses nearly 100% (about 99.7%) of the data.
Example: Dataset C, mound-shaped, with mean = 50 and s = 10: about 68% of the data fall between 40 and 60, about 95% between 30 and 70, and nearly 100% between 20 and 80.
Dataset I: Score. Score has a bell-shaped distribution, but does it have a true NORMAL probability distribution? Check the Empirical Rule.
Dataset I: Score. $\bar{X}$ = 54.52, s = 7.678. $\bar{X}$ ± 1s = 54.52 ± 1(7.678), i.e. 46.84 to 62.20. Q: What percent of test scores fall in this interval?
Dataset I: Score. Ans: (40/50) × 100 = 80%, not the roughly 68% the Empirical Rule predicts. Score does not have a true NORMAL probability distribution!
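
The Empirical Rule check can be sketched as below. The first part uses only the summary figures quoted on the slides (mean 54.52, s = 7.678, 40 of 50 scores inside the interval). The helper `share_within` and the `sample` list are hypothetical additions, since the full Dataset I is not reproduced here.

```python
from statistics import mean, stdev

# Summary figures quoted on the slides for Dataset I.
x_bar, s = 54.52, 7.678
one_s_interval = (x_bar - s, x_bar + s)  # about 46.84 to 62.20
observed_share = 40 / 50                 # share of scores inside the interval

def share_within(values, k):
    """Fraction of observations within k sample standard deviations of the mean."""
    centre, spread = mean(values), stdev(values)
    low, high = centre - k * spread, centre + k * spread
    return sum(low <= x <= high for x in values) / len(values)

# Hypothetical sample (NOT Dataset I), used only to exercise the helper.
sample = [40, 45, 50, 50, 50, 55, 60]
share_1s = share_within(sample, 1)
share_2s = share_within(sample, 2)

print(one_s_interval, observed_share, share_1s, share_2s)
```

With the real scores in place of `sample`, the same helper would give the observed percentages to compare against 68%, 95%, and nearly 100%.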
Dataset I: Score Score has a heavy-tailed distribution (also called an outlier-prone distribution)
Bell-shaped distributions Bell-shaped distributions can be: 1. Normal distributions 2. Heavy-tailed distributions, or 3. Light-tailed distributions
Coefficient of Variation: a measure of relative dispersion; always a %; shows variation relative to the mean; used to compare 2 or more groups. Formula (sample): $CV = \frac{s}{\bar{X}} \times 100\%$
Coefficient of Variation Example: Given the following summary information on two stocks, A and B, listed on the American Stock Exchange, which shows more volatility?
                  Stock A    Stock B
Average Return      $50        $12
Standard Dev.       $10        $4
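
A small sketch of the comparison, taking the coefficient of variation as the standard deviation divided by the mean, expressed as a percentage. The function name is hypothetical; the inputs are the summary figures for stocks A and B given above.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation as a percentage of the mean."""
    if mean_value == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean_value

cv_a = coefficient_of_variation(10, 50)  # Stock A
cv_b = coefficient_of_variation(4, 12)   # Stock B

print(f"CV(A) = {cv_a:.1f}%, CV(B) = {cv_b:.1f}%")
```

Stock A has the larger standard deviation in dollars, but relative to its average return Stock B varies more (about 33.3% versus 20%), so B shows more relative volatility.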
Measures of Skewness & Heavy-tailedness: SKEW1; Kurtosis