Descriptive Statistics

Descriptive Statistics: Measures of Central Tendency

Why? What? And How • Remember, data reduction is key • Are the scores generally high or generally low? • Where the center of the distribution tends to be located • Three measures of central tendency • Mode • Median • Mean • Which one you report is related to the scale of measurement and the shape of the distribution

Mode • The most frequently occurring score • Look at the simple frequency of each score • Unimodal or bimodal • Report the mode with a nominal scale: it is the most frequently occurring category • If the distribution is rectangular (every score occurs equally often), do not report the mode
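A minimal Python sketch of finding the mode from simple frequencies. The scores are invented for illustration and are not from the slides; `statistics.multimode` (Python 3.8+) is used so that a bimodal set returns both modes.

```python
from collections import Counter
from statistics import multimode

# Hypothetical scores, invented for illustration (not from the slides)
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]

# Simple frequency of each score
frequencies = Counter(scores)

# multimode returns every score tied for the highest frequency
modes = multimode(scores)                          # one mode: unimodal
bimodal_modes = multimode([1, 2, 2, 3, 4, 4, 5])   # two modes: bimodal

print(frequencies)
print(modes, bimodal_modes)
```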

Median • Score at the 50th percentile (Mdn) • In a normal distribution the Mdn is the same as the mode (and the mean) • Arrange the scores from lowest to highest: with an odd number of scores the Mdn is the middle score; with an even number of scores, average the two middle scores • Used with an ordinal scale and when the distribution is skewed
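The odd and even cases can be sketched in Python with `statistics.median`. The scores are invented for illustration.

```python
from statistics import median

# Hypothetical scores, invented for illustration
odd_scores = [7, 1, 5, 3, 9]         # N = 5 (odd)
even_scores = [7, 1, 5, 3, 9, 11]    # N = 6 (even)

# median() sorts the scores internally before locating the middle
mdn_odd = median(odd_scores)     # middle score of 1, 3, 5, 7, 9
mdn_even = median(even_scores)   # average of the two middle scores, 5 and 7

print(mdn_odd, mdn_even)
```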

Mean • Score at the exact mathematical center of the distribution (average) • $M = \Sigma X / N$ • Used with interval and ratio scales when the distribution is symmetrical and unimodal • Not accurate when the distribution is skewed because it is pulled toward the tail
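A short Python sketch of the mean, and of how one extreme score pulls it toward the tail while the median stays put. Both score sets are invented for illustration.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration
symmetric = [2, 4, 6, 8, 10]
skewed = [2, 4, 6, 8, 100]    # one extreme high score (positive skew)

# M = sum of X divided by N
m_symmetric = sum(symmetric) / len(symmetric)
m_skewed = mean(skewed)

print(m_symmetric, median(symmetric))   # mean and median agree
print(m_skewed, median(skewed))         # mean is pulled toward the tail
```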

Deviations around the Mean • The score minus the mean • Include the plus or minus sign • The sum of the deviations around the mean always equals zero • $\Sigma (X - M) = 0$
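A quick check in Python that signed deviations around the mean sum to zero. The scores are invented for illustration.

```python
# Hypothetical scores, invented for illustration
scores = [3, 5, 7, 9, 11]

m = sum(scores) / len(scores)           # M = 7.0
deviations = [x - m for x in scores]    # signed deviations, X - M

print(deviations)        # negative below the mean, positive above
print(sum(deviations))   # the deviations cancel out
```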

Uses of the Mean • Describes scores • The deviation from the mean gives us the error of our estimate of the score, with total error equal to zero • Predict scores • Describe a score's location • Describe the population mean (μ), which is a parameter • Typically estimate μ

Summarizing Results • Used in all research methods including observational, survey, correlational, and experimental • Compute the mean of the dependent variable for each of the conditions or levels of the independent variable • The mean dependent score changes as a function of changes in the IV • Graph the results using line or bar graphs

Measures of Variability • Extent to which the scores differ from each other or how spread out the scores are • Tells us how accurately the measure of central tendency describes the distribution • Shape of the distribution

Why do we care about variability? • Where would you rather vacation: Gulfside Bungalows, where the mean temperature is 70 degrees, or Kalahari Condos, where the mean temperature is also 70 degrees? • Gulfside temperature range: day = 72, night = 68 • Kalahari temperature range: day = 110, night = 30 • Also consider variability in the range of temperature at each of these places over the years that temperature has been documented
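The vacation example can be checked in a few lines of Python: both places have the same mean, but very different spreads. Only the day and night values from the slide are used.

```python
from statistics import mean

# Day and night temperatures from the slide (degrees)
gulfside = [72, 68]
kalahari = [110, 30]

gulfside_range = max(gulfside) - min(gulfside)
kalahari_range = max(kalahari) - min(kalahari)

# Same mean, very different spread
print("Gulfside mean =", mean(gulfside), "range =", gulfside_range)
print("Kalahari mean =", mean(kalahari), "range =", kalahari_range)
```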

Range • Can report the lowest and highest value • Or report the maximum difference between the lowest and highest • Semi-interquartile range used with the median: one half the distance between the scores at the 25th and 75th percentile
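A sketch of the range and the semi-interquartile range in Python. The scores are invented for illustration. Textbooks and software use slightly different percentile conventions; the `"inclusive"` method of `statistics.quantiles` (Python 3.8+) is one assumption among several and may not match the textbook's hand method on other data.

```python
from statistics import quantiles

# Hypothetical scores, invented for illustration
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range: difference between the highest and lowest scores
score_range = max(scores) - min(scores)

# Quartile cut points (25th, 50th, 75th percentiles) under the
# "inclusive" convention, which is an assumption made here
q1, mdn, q3 = quantiles(scores, n=4, method="inclusive")

# Semi-interquartile range: half the distance between the
# scores at the 25th and 75th percentiles
semi_iqr = (q3 - q1) / 2

print(score_range, q1, q3, semi_iqr)
```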

Variance and Standard Deviation • Definitional and computational formulas (remember order of operations) • Again, most psychological research uses interval and ratio scales of measurement and assumes a normal distribution • Goal is to assess the “average” or typical amount the scores differ from the mean • Biased estimates of the population variance

Sample Variance • Uses the deviation from the mean • Remember, the sum of the deviations always equals zero, so you have to square each of the deviations • $S_X^2 = \Sigma (X - M)^2 / N$: the sum of squared deviations divided by the number of scores (p. 107 and 108) • Provides information about the relative variability

Some Limits • It isn’t the average deviation • Interpretation doesn’t make sense because: • Number is too large • And it is a squared value

So,… Standard Deviation • Take the square root of the variance • p. 109 and 110 • $S_X$ • Uses the same units of measurement as the raw scores • How much scores deviate below and above the mean

The standard deviation • What is a standard deviation (in English)? The mean of the deviations from the mean (sort of) • σ (lowercase sigma) is the population standard deviation • S is the sample standard deviation • s-hat is the sample estimate of σ

The deviation (definitional) formula for the population standard deviation • $\sigma_X = \sqrt{\Sigma (X - \mu)^2 / N}$ • The larger the standard deviation, the more variability there is in the scores • The standard deviation is somewhat less sensitive to extreme outliers than the range (as N increases)

The deviation (definitional) formula for the sample standard deviation • $S_X = \sqrt{\Sigma (X - M)^2 / N}$ • What’s the difference between this formula and the population standard deviation? In the population formula, all the Xs represent the entire population and deviations are taken from μ. In the sample formula, the Xs represent a sample and deviations are taken from M.

Standard Deviation: Example
X     X - M    (X - M)^2
21    -5.8     33.64
25    -1.8      3.24
24    -2.8      7.84
30     3.2     10.24
34     7.2     51.84
$M = 26.8$ • $\Sigma (X - M) = 0$ • $\Sigma (X - M)^2 = 106.8$
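The worked example on this slide (scores 21, 25, 24, 30, 34) can be reproduced with a short Python sketch. The last two steps, dividing by N and taking the square root, apply the sample variance and standard deviation described on the earlier slides; their results are not printed on the slide itself.

```python
scores = [21, 25, 24, 30, 34]
n = len(scores)

m = sum(scores) / n                       # mean, 26.8
deviations = [x - m for x in scores]      # X - M
squared = [d ** 2 for d in deviations]    # (X - M)^2

ss = sum(squared)       # sum of squared deviations, about 106.8
variance = ss / n       # sample variance (N in the denominator)
sd = variance ** 0.5    # sample standard deviation

for x, d, sq in zip(scores, deviations, squared):
    print(f"{x:>3} {d:>6.1f} {sq:>7.2f}")
print(f"M = {m:.1f}, SS = {ss:.1f}, variance = {variance:.2f}, SD = {sd:.2f}")
```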

Calculating S using the raw-score formula • $S_X = \sqrt{(\Sigma X^2 - (\Sigma X)^2 / N) / N}$ • To calculate $\Sigma X^2$ you square all the scores first and then sum them • To calculate $(\Sigma X)^2$ you sum all the scores first and then square them

The raw-score formula: example
X     X^2
21     441
25     625
24     576
30     900
34    1156
$\Sigma X = 134$ • $\Sigma X^2 = 3698$
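A Python sketch of the raw-score calculation for the same five scores (21, 25, 24, 30, 34). It assumes the sample formula with N in the denominator, as on the sample variance slide.

```python
from math import sqrt

scores = [21, 25, 24, 30, 34]
n = len(scores)

sum_x = sum(scores)                       # sum of X: add the scores
sum_x_sq = sum(x ** 2 for x in scores)    # sum of X^2: square first, then sum
sq_of_sum = sum_x ** 2                    # (sum of X)^2: sum first, then square

# Raw-score formula for the sample standard deviation (N in the denominator)
s = sqrt((sum_x_sq - sq_of_sum / n) / n)

print(sum_x, sum_x_sq, sq_of_sum, round(s, 2))
```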

Estimating the population standard deviation from a sample • S, the sample standard deviation, is usually a little smaller than the population standard deviation. Why? • The sample mean minimizes the sum of squared deviations (SS) • Therefore, if the sample mean differs at all from the population mean, the SS computed around the sample mean will be an underestimate of the SS computed around the population mean • Therefore, statisticians alter the formula for the sample standard deviation by subtracting 1 from N in the denominator
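A small Python illustration of why the sample SS tends to run low: the sum of squared deviations is smallest when taken around the sample mean, so taking it around any other value, such as the true population mean, gives a larger number. The scores are the five used in the slides' worked example; the population mean of 28 is hypothetical and chosen only for illustration.

```python
scores = [21, 25, 24, 30, 34]    # scores from the slides' worked example
m = sum(scores) / len(scores)    # sample mean, 26.8


def sum_of_squares(values, center):
    """Sum of squared deviations of the values around a given center."""
    return sum((x - center) ** 2 for x in values)


mu_assumed = 28   # hypothetical population mean, chosen only for illustration

ss_around_m = sum_of_squares(scores, m)             # about 106.8
ss_around_mu = sum_of_squares(scores, mu_assumed)   # larger than ss_around_m

print(ss_around_m, ss_around_mu)
```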

Population Variance and Standard Deviation • When we have data from the entire population we use μ to compute $\sigma_X$ using the same formulas (p. 115) • We usually need to estimate these values from a sample • The variance and standard deviation of the sample are biased estimates of the population values • A sample is limited in how freely its scores can vary • Not all of the deviations in the sample are free to be random

Estimates of Population Variability • p. 117, 118, and 119 • Symbols $s_X^2$ and $s_X$, or s-hat: estimates • Correction factor $N - 1$ • Not all of the deviations in the sample are free to be random • Degrees of freedom (df) • With N = 5, M = 6, and four scores of 1, 5, 7, and 9, the only possibility is for the fifth score to be 8 • More accurate estimate of population variability
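The degrees-of-freedom example can be verified in Python. The slide lists four scores and a mean of 6; the sketch assumes a sample of N = 5, which is what makes 8 the only possible remaining score.

```python
n = 5                   # number of scores (assumed: four listed plus one more)
m = 6                   # sample mean from the slide
known = [1, 5, 7, 9]    # the four scores that were free to vary

# The mean fixes the total: sum of X = N * M, so the last score is determined
last_score = n * m - sum(known)
df = n - 1              # only N - 1 scores are free to vary

print(last_score, df)
```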

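Formulas for s-hat (estimate) • Definitional formula: $s_X = \sqrt{\Sigma (X - M)^2 / (N - 1)}$ • Raw-score formula: $s_X = \sqrt{(\Sigma X^2 - (\Sigma X)^2 / N) / (N - 1)}$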
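A Python sketch comparing the definitional and raw-score forms of the estimated population standard deviation, both with N - 1 in the denominator. The scores are the five from the slides' worked example. Python's `statistics.stdev` also divides by N - 1, so it serves as a cross-check.

```python
from math import sqrt, isclose
from statistics import stdev

scores = [21, 25, 24, 30, 34]
n = len(scores)
m = sum(scores) / n

# Definitional formula with N - 1 in the denominator
s_hat_def = sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))

# Raw-score formula with N - 1 in the denominator
sum_x = sum(scores)
sum_x_sq = sum(x ** 2 for x in scores)
s_hat_raw = sqrt((sum_x_sq - sum_x ** 2 / n) / (n - 1))

# All three values should agree up to floating-point rounding
print(round(s_hat_def, 2), round(s_hat_raw, 2), round(stdev(scores), 2))
```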

The Estimate of the Variance • Remember what the variance is: the standard deviation squared, or the number that you took the square root of to get the standard deviation • The variance is not a very useful descriptive statistic, but it is a very important value that you will use in other techniques (e.g., the analysis of variance, or ANOVA)

Sum up… • Assuming a normal distribution • Sample mean is a good estimate of population mean • The estimate of the population variance and standard deviation tells us how spread out the scores are • About 68% of the scores fall between -1 and +1 $s_X$ of the mean
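The 68% figure can be checked against the normal curve with `statistics.NormalDist` (Python 3.8+). This sketch uses the standard normal distribution; the proportion is the same for any mean and standard deviation.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
z = NormalDist(mu=0, sigma=1)

# Proportion of scores between -1 and +1 standard deviations of the mean
within_one_sd = z.cdf(1) - z.cdf(-1)

print(round(within_one_sd, 4))
```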

Application to Normal Distribution • Knowing the standard deviation you can describe your sample more accurately • Look at the inflection points of the distribution

Transformations • Adding or subtracting a constant just shifts the distribution, without changing the variation (variance) • Multiplying or dividing by a constant changes the variability: the standard deviation is multiplied or divided by that constant (in absolute value), and the variance by its square
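A Python sketch of both transformation rules, using the five scores from the slides' worked example. The constant `k = 3` is hypothetical and chosen only for illustration.

```python
from math import isclose
from statistics import pstdev, pvariance

scores = [21, 25, 24, 30, 34]    # scores from the slides' worked example
k = 3                            # hypothetical constant, for illustration only

shifted = [x + k for x in scores]
scaled = [x * k for x in scores]

# Adding a constant shifts every score but leaves the spread unchanged
print(pstdev(scores), pstdev(shifted))

# Multiplying by k multiplies the SD by |k| and the variance by k squared
print(pstdev(scaled), abs(k) * pstdev(scores))
print(pvariance(scaled), k ** 2 * pvariance(scores))
```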

Variance is Error in Predictions • The larger the variability, the larger the differences between the mean and the scores, so the larger the error when we use the mean to predict the scores • Error or error variance: average error between the predicted mean score and the actual raw scores • Same for the population: estimate of population variance

Summarizing Research Using Variability • Remember, the standard deviation is most often the measure of variability reported • The more consistent the scores are (i.e., the smaller the variance), the stronger the relationship

Proportion of Variance Accounted For • Objective approach: compute the “proportion of variance accounted for” • Can compute the overall mean and standard deviation, not taking into consideration the relationship with the levels of the IV • It is the largest error we would accept • When we look at the relationship, we compute the variance for each condition and average them

Computation • Subtract the average error within the conditions from the error of the total sample • Divide that difference by the error of the total sample • Gives the proportion of error accounted for by the levels of the IV
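The computation can be sketched in Python as follows. The three conditions and their scores are invented for illustration, with equal numbers of scores per condition so that a simple average of the condition variances is appropriate. Sample variance (N in the denominator) is assumed as the measure of error.

```python
from statistics import mean, pvariance

# Hypothetical dependent scores for three levels of the IV (invented data)
conditions = {
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
}

# Error when the relationship is ignored: variance around the overall mean
all_scores = [x for group in conditions.values() for x in group]
total_variance = pvariance(all_scores)

# Error when the relationship is used: average variance within the conditions
within_variance = mean([pvariance(group) for group in conditions.values()])

# Proportion of variance accounted for by the levels of the IV
proportion = (total_variance - within_variance) / total_variance

print(round(total_variance, 2), round(within_variance, 2), round(proportion, 2))
```

With these made-up scores the conditions barely overlap, so most of the total error disappears once the condition means are used for prediction.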

Thus, …. • Proportional improvement in predictions by using a relationship • The stronger and more consistent the relationship, the greater proportion of variance we can account for
