The Data Analysis Plan

The Overall Data Analysis Plan Purpose: To tell a story. To construct a coherent narrative that explains findings, argues against other interpretations, and supports conclusions.

Three Steps of the Data Analysis Plan 1) Getting to know the data- A first step is to examine the data set, the “raw numbers”. “Play” with the raw numbers. 2) Summarize the data- Use descriptive statistics to “summarize” the data. 3) Confirm what the data reveal- Most commonly, using null hypothesis significance testing (NHST).

Step 1: Getting to know the data • Look at raw numbers and check for errors and outliers. • Errors are impossible numbers (outside the possible range). • Outliers are in the possible range, but exceptional. An outlier could be an error or a true score from an unusual participant.

Decision Rule for Errors and Outliers • You “fix” errors if you can • You eliminate outliers (if appropriate). Follow the rules of the journal or organization where the results will be presented/reported. • Either way, you must specify the amount of data eliminated and your reason or “rule” for elimination.
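
The screening step above can be sketched in Python. Everything specific here is an assumption made for illustration: the exam scores, the 0 to 100 valid range, and the 1.5 × IQR rule for flagging outliers. The slides do not prescribe a particular outlier rule, so substitute whatever your journal or organization requires.

```python
import statistics

def screen_scores(scores, low, high, k=1.5):
    """Return (errors, outliers) for a list of scores.

    errors   = values outside the possible range [low, high]
    outliers = possible values beyond k * IQR from the quartiles
               (the IQR rule is an assumed convention, not from the slides)
    """
    errors = [x for x in scores if x < low or x > high]
    valid = [x for x in scores if low <= x <= high]

    # Quartiles of the valid scores only, so errors do not distort the fences.
    q1, _, q3 = statistics.quantiles(valid, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    outliers = [x for x in valid if x < lower_fence or x > upper_fence]
    return errors, outliers

# Hypothetical exam scores on a 0-100 scale.
exam_scores = [72, 75, 78, 80, 81, 83, 85, 88, 12, 140]
errors, outliers = screen_scores(exam_scores, low=0, high=100)
print("errors:", errors)      # impossible values
print("outliers:", outliers)  # possible but exceptional values
```

Whatever rule is used, report how many scores were removed and why.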

2) Look at a “picture” of raw numbers • Stem & Leaf Plots • Histogram (frequency distribution) • Examine the underlying distribution of raw scores, looking for an “unusual” distribution (other than “normal”)
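
A text stem-and-leaf plot is easy to produce by hand or in code. The sketch below assumes non-negative integer scores and uses the tens digit as the stem; the function name and the sample scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Return the lines of a stem-and-leaf plot (stem = tens, leaf = ones).

    Assumes non-negative integer scores.
    """
    stems = defaultdict(list)
    for x in sorted(scores):
        stems[x // 10].append(x % 10)

    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical scores.
plot = stem_and_leaf([12, 15, 21, 34, 35, 38])
print("\n".join(plot))
```

Turned on its side, the plot reads like a histogram, which makes skew or multiple modes easy to spot.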

“Normal” Distribution

Skewed Distributions • Skewed Distribution: If extremely skewed, you may have to transform the scores (for example, using logarithms or changing the scale you use)

Positive Skew- tail trails off to the “positive” side

Negative Skew: Tail trails off to the “negative” side
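
To illustrate the logarithm transformation mentioned for extremely skewed scores, the sketch below compares skewness before and after a base-10 log transform. The data are hypothetical, and the moment-based skewness coefficient is one common choice among several; a log transform also assumes every score is greater than zero.

```python
import math

def skewness(xs):
    """Moment-based skewness: positive = tail to the right, negative = to the left."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical positively skewed scores (all > 0, as a log transform requires).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log10(x) for x in raw]

print("skew of raw scores:   ", round(skewness(raw), 2))
print("skew of logged scores:", round(skewness(logged), 2))
```

The transformed scores are still positively skewed, but much less so than the raw scores.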

“Bi-Modal Distribution” (or multi-modal) Very problematic for further analysis- refer to “experts” for appropriate data analysis.

Step 2: Summarize the Data (Descriptive Statistics) • Purpose– to describe the data • To indicate what is a typical score (central tendency) • To assess the degree to which the scores in the data set differ from one another (variability or dispersion)

Measures of Central Tendency (tendency toward the middle; the typical score) • Mode – Most frequently occurring score Example: 2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12 Mode = 10 • Median – the “middle” score (50% of scores below and 50% of scores above) Median (from above) = 8

Mean – Arithmetic average (sum of scores divided by number of scores) Example: 2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12 Mean = 81/11 = 7.36 In a “Normal Distribution”: Mean=Median=Mode

Mean=Median=Mode

Skewed distribution

If a distribution is skewed, the mean may not be the best descriptor of the typical score, because extreme scores in the tail pull it away from the bulk of the data. In this case, the median is a better estimate of the “typical score”. Usually report BOTH mean and median if the distribution is skewed.
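
The three measures of central tendency can be checked with Python's standard `statistics` module. The first list is the example from the slides; the second, skewed list is hypothetical and is included only to show the mean being pulled toward the tail.

```python
import statistics

# Example scores from the slides.
scores = [2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12]
mode = statistics.mode(scores)      # most frequently occurring score
median = statistics.median(scores)  # middle score
mean = statistics.mean(scores)      # sum of scores / number of scores
print(mode, median, round(mean, 2))  # 10 8 7.36

# Hypothetical positively skewed scores: one extreme value in the tail.
skewed = [20, 22, 25, 27, 30, 250]
skewed_mean = statistics.mean(skewed)
skewed_median = statistics.median(skewed)
print(round(skewed_mean, 2), skewed_median)  # mean is far above the median
```

In the skewed list the median stays near the bulk of the scores while the mean does not, which is why both are usually reported.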

Measures of Variability (dispersion, how different the numbers are from each other) • Range – Officially = (highest score – lowest score) Example: 2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12 Range = 12 – 2 = 10 Usually reported by citing the lowest and highest score in the data set.

Variance–The sum of squared deviations of the scores around the mean divided by either: “N” or “n-1” ???

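Variance of a set of numbers, or the population variance (the sum of squares (SS) divided by N): $\sigma^2 = \frac{SS}{N} = \frac{\sum (X - \mu)^2}{N}$ An estimate of the variance of a population based on a sample (the sum of squares (SS) divided by n-1): $s^2 = \frac{SS}{n-1} = \frac{\sum (X - \bar{X})^2}{n-1}$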

Standard Deviation – The square root of the variance.
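
As a sketch, the range, both versions of the variance, and the standard deviation can be computed for the example scores with the standard `statistics` module. `pvariance` divides SS by N and `variance` divides SS by n-1; the variable names are illustrative only.

```python
import statistics

# Example scores from the slides.
scores = [2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12]

score_range = max(scores) - min(scores)  # highest score - lowest score

# Sum of squared deviations of the scores around the mean (SS).
mean = statistics.mean(scores)
ss = sum((x - mean) ** 2 for x in scores)

pop_variance = statistics.pvariance(scores)    # SS / N
sample_variance = statistics.variance(scores)  # SS / (n - 1)
sample_sd = statistics.stdev(scores)           # square root of sample variance

print("range:", score_range)
print("SS:", round(ss, 2))
print("SS/N:", round(pop_variance, 2))
print("SS/(n-1):", round(sample_variance, 2))
print("standard deviation:", round(sample_sd, 2))
```

Dividing by n-1 always gives the larger value, and the gap between the two shrinks as the sample grows.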

Effect Size or Effect Magnitude • An index of the strength of the relationship between the IV and the DV that is independent of sample size. • How large an effect does the IV have on the DV?

Cohen’s d is one measure of “size of effect” or effect magnitude. • For d, a value of .20 indicates a small magnitude effect, .50 a medium magnitude effect, and .80 a large magnitude effect.

d is the difference between the means at two levels of an IV divided by the standard deviation of the population (that is, the difference between means divided by a measure of variability or dispersion). • As variability increases (standard deviation increases), d decreases (lower effect size).

Example • Suppose you have two levels of an IV and the means for these two levels are 8 and 5 • The difference between the two means is 8-5=3 • If there is moderate variability in the DV (say population standard deviation=6), then: d=3/6 = .50 a medium effect size or magnitude

If the variability of DV is larger (say population standard deviation=15), then: d=3/15=1/5= .20 a small effect size or magnitude

If the variability in DV is really small (standard deviation=3.75) then: d=3/3.75=.80 a large effect size or magnitude
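
The three worked examples above can be reproduced with a few lines of Python. The function name `cohens_d` is hypothetical; it implements the definition given in the slides (difference between the two means divided by the population standard deviation).

```python
def cohens_d(mean_1, mean_2, sd):
    """Difference between two means divided by the standard deviation."""
    return (mean_1 - mean_2) / sd

# Means of 8 and 5 at the two levels of the IV, with three different
# amounts of variability in the DV (values from the slides).
d_medium = cohens_d(8, 5, 6)     # moderate variability
d_small = cohens_d(8, 5, 15)     # larger variability
d_large = cohens_d(8, 5, 3.75)   # small variability

print(round(d_medium, 2), round(d_small, 2), round(d_large, 2))  # 0.5 0.2 0.8
```

The difference between the means is 3 in every case; only the standard deviation changes, and d shrinks as it grows.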

Effect size is one measure that affects the “power” of a statistical analysis and it is used in making decisions about how large a sample size should be used in order to be sufficient to produce a reasonable level of “power”.
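
To make the link between effect size, power, and sample size concrete, here is a rough sketch using a standard normal approximation for comparing two independent group means. The two-tailed alpha of .05 and the target power of .80 are assumed conventions, not values from the slides, and the function name is hypothetical. Exact calculations based on the t distribution give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect effect size d.

    Normal approximation for a two-tailed, two-group comparison:
        n = 2 * ((z_alpha + z_power) / d) ** 2
    alpha and power defaults are assumed conventions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for two-tailed test
    z_power = NormalDist().inv_cdf(power)          # z for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}: about {n_per_group(d)} participants per group")
```

Under these assumptions, a small effect needs several hundred participants per group, while a large effect needs only a few dozen.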

Because the standard deviation (rather than the standard error, which shrinks as sample size grows) is used as the “denominator” for this measure, it is independent of (not affected by) sample size. Thus, you can compare effect sizes across research studies using various sample sizes. This type of comparison is called a “meta-analysis”.
