Intro to Statistics for the Behavioral Sciences PSYC 1900

Lecture 4: The Normal Distribution and Z-Scores

Quick Review of Box-and-Whisker Plots
• First find the median location, (N + 1) / 2, and the median (mdn)
• Find the quartile locations
  • The quartiles are the medians of the upper and lower halves of the distribution
  • Quartile location = (mdn location + 1) / 2
  • Note: drop any fractional part of the mdn location before applying this formula
  • These quartiles are termed the “hinges”
• The hinges bracket the interquartile range (IQR)
• The hinges serve as the top and bottom of the box

Box-and-Whisker Plots
• Find the H-spread
  • The range between the two hinges (quartiles)
  • Equivalent to the IQR
  • Spanned by the box in the plot
• Draw the whiskers
  • Lines from each hinge to the farthest data point that lies no more than 1.5 × H-spread beyond that hinge
• Outliers
  • Points beyond the whiskers
  • Denoted with asterisks
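The steps above can be sketched in Python. This is a minimal sketch, assuming the usual median location (N + 1) / 2 and the hinge-location rule from the review slide, with a location ending in .5 taken as the average of the two neighboring scores. Statistical packages may define quartiles slightly differently, so their boxes can differ a little. The names `value_at` and `boxplot_summary` and the example scores are hypothetical.

```python
import math


def value_at(sorted_scores, location):
    """Score at a 1-based location; a location ending in .5 averages the two neighbors."""
    lower = math.floor(location)
    upper = math.ceil(location)
    return (sorted_scores[lower - 1] + sorted_scores[upper - 1]) / 2


def boxplot_summary(scores):
    xs = sorted(scores)
    n = len(xs)
    mdn_location = (n + 1) / 2
    median = value_at(xs, mdn_location)
    # Drop any fractional part of the median location before finding the hinges.
    hinge_location = (math.floor(mdn_location) + 1) / 2
    lower_hinge = value_at(xs, hinge_location)
    # Count the same number of places down from the top for the upper hinge.
    upper_hinge = value_at(xs[::-1], hinge_location)
    h_spread = upper_hinge - lower_hinge
    # Whiskers may extend at most 1.5 x H-spread beyond each hinge.
    lower_limit = lower_hinge - 1.5 * h_spread
    upper_limit = upper_hinge + 1.5 * h_spread
    inside = [x for x in xs if lower_limit <= x <= upper_limit]
    outliers = [x for x in xs if x < lower_limit or x > upper_limit]
    return {
        "median": median,
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "h_spread": h_spread,
        "whisker_ends": (min(inside), max(inside)),
        "outliers": outliers,
    }


print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30]))  # hypothetical scores
```

For these nine hypothetical scores the hinges are 3 and 7, the H-spread is 4, the whiskers run from 1 to 8, and 30 lies beyond 7 + 1.5 × 4 = 13, so it is flagged as an outlier.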

Stem-and-Leaf Plot
 Frequency    Stem &  Leaf
      1.00       0 .  1
      3.00       0 .  233
      4.00       0 .  4445
      3.00       0 .  667
      1.00 Extremes    (>=12)
 Stem width:   10.00
 Each leaf:    1 case(s)

Outlier Detection
• One rule of thumb classifies points as outliers if they lie more than 3 sd’s from the mean
  • As we’ll see later in this lecture, this implies that such points are very rare (under a normal distribution, only about 0.3% of scores lie more than 3 sd’s from the mean)
• One problem: the presence of the outlier itself inflates the standard deviation
• Box-and-whisker outlier detection is not affected by this issue
  • The H-spread “trims” off the influence of extreme points

Descriptives With and Without the “Outlier”
• If the point is included when computing the sd, it inflates the variance and will not be classified as an outlier
• If it is excluded from that computation, it will be
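The point about variance inflation can be checked numerically. This minimal sketch uses hypothetical scores and the sample sd (n - 1 denominator) from Python's standard `statistics` module; the helper name `z_of` is hypothetical. When the extreme score is included, it pulls the mean up and inflates the sd, so its own z stays below 3; judged against the remaining scores, it lies far beyond 3 sd's.

```python
import statistics


def z_of(score, reference_scores):
    """z of `score` relative to the mean and sample sd of `reference_scores`."""
    mean = statistics.mean(reference_scores)
    sd = statistics.stdev(reference_scores)  # sample sd, n - 1 denominator
    return (score - mean) / sd


scores = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical data with one extreme score
suspect = 30
others = [x for x in scores if x != suspect]

z_included = z_of(suspect, scores)  # the suspect inflates the sd it is judged by
z_excluded = z_of(suspect, others)  # sd computed without the suspect

print(f"z with the point included: {z_included:.2f}")  # 2.57
print(f"z with the point excluded: {z_excluded:.2f}")  # 10.41
print("outlier by 3-sd rule (included)?", abs(z_included) > 3)
print("outlier by 3-sd rule (excluded)?", abs(z_excluded) > 3)
```

With only nine scores, a single point can never reach z = 3 against an sd it helps compute, which is one reason the H-spread rule is the safer screen in small samples.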

Boxplots to Compare Groups
• Useful for a quick visual check on the group distributions in an experiment
• In the example, the mean = 3 in all groups

The Normal Distribution
• A specific distribution characterized by a bell-shaped form
• Widely used to calculate the probabilities of scores on a variable

What’s So Useful About Distributions?
• Distributions specify the way scores deviate around a measure of central tendency
• In so doing, they allow us to calculate the probabilities of specific values occurring

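Pie Chart
• An example for a nominal scale
• Areas “under the curve” (here, the slices of the pie) provide information on probabilities
• In the example, most criminals are on probation; probation and jail together account for 70% of cases, so the probability that a randomly selected criminal is on probation or in jail is .70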

More on Distributions & Probability
• The same “adding” of areas under the curve holds for histograms
• If 64 of 289 cases fall within an interval of interest:
  • 22% of cases have a score in this interval
  • The probability that any randomly selected case has a score in this interval is .22
• For a continuous curve, integrating the area under the curve over an interval provides the probability estimate
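As a worked check of the histogram arithmetic, the probability is simply the proportion of cases in the interval:

$$P(\text{case falls in interval}) = \frac{\text{cases in interval}}{\text{total cases}} = \frac{64}{289} \approx .221 \approx .22$$

Because non-overlapping intervals share no cases, the probability of falling in either of two intervals is the sum of their separate proportions, in the same way the probation and jail slices add in the pie chart.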

Normal Distribution
• For continuous variables, we connect the “tops” of the histogram bars to form a smooth curve
• Abscissa: horizontal axis (values of X)
• Ordinate: vertical axis (density)
• Density: the height of the curve at a given value of X

Normal Distribution
• Mathematically defined as:
  $f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2/(2\sigma^2)}$
• π and e are constants (≈ 3.1416 and ≈ 2.7183)
• Once the mean (μ) and sd (σ) are known, the distribution can be drawn and the density at any value of X determined
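The density formula translates directly into code. This is a minimal sketch using Python's standard `math` module; the function name `normal_density` is hypothetical. The result is cross-checked against `statistics.NormalDist` from the standard library (Python 3.8+).

```python
import math
from statistics import NormalDist


def normal_density(x, mu, sigma):
    """Height of the normal curve N(mu, sigma) at the value x."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Density at the mean of the standard normal, N(0, 1)
print(round(normal_density(0, 0, 1), 4))  # 0.3989

# Same value from the standard library, as a cross-check
print(round(NormalDist(mu=0, sigma=1).pdf(0), 4))  # 0.3989

# Density one sd above the mean of a hypothetical N(50, 10) variable
print(normal_density(60, 50, 10))
```

Note that a density is a height, not a probability; probabilities come from areas under the curve.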

Normal Distribution
• It would be tedious to calculate probabilities/densities anew for each sample
• Therefore, we use the standard normal distribution and transform scores on our variables to fit it
  • The standard normal is a normal distribution with a mean of zero and sd = 1 [N(0, 1)]

Distribution Forms
• Many processes can be described by a normal distribution, but not all
  • E.g., the number of meteor strikes or the number of Supreme Court retirements in a given period
  • Here we use the Poisson distribution, which is governed by the expected number of occurrences (λ) in an interval
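For counts like these, the Poisson probability of observing exactly k events when λ events are expected per interval is P(k) = λ^k e^(-λ) / k!. The sketch below computes it with the standard library; the function name `poisson_probability` and the value λ = 2 are hypothetical illustrations, not data from the lecture.

```python
import math


def poisson_probability(k, expected):
    """P(exactly k events in an interval) when `expected` (lambda) events are expected."""
    return expected ** k * math.exp(-expected) / math.factorial(k)


lam = 2.0  # hypothetical expected count per interval
for k in range(6):
    print(k, round(poisson_probability(k, lam), 4))

# Unlike the normal, the Poisson is discrete: it is defined only for k = 0, 1, 2, ...
total = sum(poisson_probability(k, lam) for k in range(50))
print(round(total, 6))  # probabilities over all k sum to 1
```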

Score Transformations
• In order to use the standard normal tables to determine probabilities, we transform scores
• Linear transformations of scores (e.g., subtracting the mean) do not change the shape of the distribution
• If we have a distribution with a mean of 50, we need to transform scores so that 50 becomes 0
  • Take deviations: (X - 50) become the new score values
• This solves the problem of getting the mean to zero, but what about the standard deviation?

Score Transformations
• The standard normal has sd = 1
• If we divide every value of a variable by a constant, the standard deviation is divided by that constant
• To get sd = 1, we simply divide the mean-centered scores (i.e., the deviation scores) by the sd of the distribution
  • If sd = 5, dividing all deviation scores by 5 produces sd = 1
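The two steps (center, then rescale) can be sketched as follows with hypothetical scores. This sketch assumes the population sd (`statistics.pstdev`, dividing by N); dividing by the sample sd (`statistics.stdev`) instead would give z scores whose sample sd is 1.

```python
import statistics

scores = [42, 47, 50, 53, 58]  # hypothetical raw scores, mean = 50

# Step 1: subtract the mean, so the new mean is 0 (the sd is unchanged)
mean = statistics.mean(scores)
deviations = [x - mean for x in scores]

# Step 2: divide by the sd, so the new sd is 1
sd = statistics.pstdev(scores)
z_scores = [d / sd for d in deviations]

print("deviations:", deviations)
print("sd of raw scores:", sd, " sd of deviations:", statistics.pstdev(deviations))
print("z scores:", [round(z, 2) for z in z_scores])
print("mean of z:", round(statistics.mean(z_scores), 10))
print("sd of z:", round(statistics.pstdev(z_scores), 10))
```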

Z-scores and the Standard Normal Distribution
• This transformation of raw scores produces z scores: $z = \frac{X - \mu}{\sigma}$
• A z score is interpreted as the number of standard deviation units above or below the mean
• A raw score of 7 in a distribution with mean = 10 and sd = 2 produces: $z = \frac{7 - 10}{2} = -1.50$, i.e., 1.5 sd’s below the mean

Z Score Transformation
• A linear transformation
  • Addition, subtraction, multiplication, and/or division by constants
• Does not change the form (shape) of the distribution
  • Z-scoring or “standardizing” a distribution does not make it a normal one
  • The shape stays the same, but mean = 0 and sd = 1
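To see that standardizing leaves the shape alone, this sketch z-scores a hypothetical right-skewed set of scores and compares a simple moment-based skewness index before and after. The `skewness` helper is a hypothetical illustration (population moments, no small-sample correction), not a particular package's statistic.

```python
import statistics


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by sd cubed."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


raw = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14]  # hypothetical, positively skewed
mean, sd = statistics.mean(raw), statistics.pstdev(raw)
z = [(x - mean) / sd for x in raw]

print("skewness of raw scores:", round(skewness(raw), 4))
print("skewness of z scores:  ", round(skewness(z), 4))  # same value: shape unchanged
print("mean and sd of z:", round(statistics.mean(z), 10), round(statistics.pstdev(z), 10))
```

The z scores have mean 0 and sd 1, yet they remain just as positively skewed as the raw scores.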

Z Score Benefits
• Allows us to compare scores collected on different metrics
• Each score is interpreted by its deviation from the mean relative to the standard deviation (the typical size of deviations)
• Allows us to easily obtain probabilities for specific scores from a “known” normal density function
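As a worked sketch with hypothetical numbers (not from the lecture): a student scores 80 on one exam (mean 70, sd 5) and 85 on another (mean 80, sd 10). The raw 85 is higher, but the z scores show that the 80 is the more exceptional performance relative to its class. The function name `z_score` is hypothetical.

```python
def z_score(x, mean, sd):
    """Number of sd units that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


exam_a = z_score(80, mean=70, sd=5)   # hypothetical exam A
exam_b = z_score(85, mean=80, sd=10)  # hypothetical exam B

print(f"Exam A: z = {exam_a:.2f}")  # Exam A: z = 2.00
print(f"Exam B: z = {exam_b:.2f}")  # Exam B: z = 0.50
print("Relatively better on:", "A" if exam_a > exam_b else "B")
```

The same function reproduces the earlier example: `z_score(7, 10, 2)` gives -1.5.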

Z Score to Probabilities
• If we know a z score, we can calculate the probabilities attached to it
• The total area under the curve is 1.00
• Tabled values of the standard normal distribution give the area from the mean to that z value
• Note: if the distribution’s shape differs substantially from normal, these probability estimates will be incorrect

Z Score to Probabilities
• A z = 1.00 in the table corresponds to an area of .34 (more precisely, .3413)
• A score between z = 0 and z = 1 therefore has a probability of .34
• The probability of a score at or below z = 1 is .50 + .34 = .84
• The probability of a score above z = 1 is .50 - .34 = .16 (equivalently, 1.00 - .84 = .16)
• The probability of a score with -1 < z < 1 is .34 + .34 = .68, because the distribution is symmetric
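Instead of a printed table, the same areas can be obtained from the standard normal cumulative distribution function (cdf). This is a minimal sketch using `statistics.NormalDist` from Python's standard library (3.8+); the tabled “mean to z” area is the cdf at z minus .50.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
z = 1.0

below = std_normal.cdf(z)                        # area at or below z
mean_to_z = below - 0.5                          # area between the mean (z = 0) and z, as tabled
above = 1 - below                                # area above z
within = std_normal.cdf(z) - std_normal.cdf(-z)  # area between -z and +z

print(f"mean to z: {mean_to_z:.4f}")     # 0.3413
print(f"at or below z: {below:.4f}")     # 0.8413
print(f"above z: {above:.4f}")           # 0.1587
print(f"-z < score < z: {within:.4f}")   # 0.6827
```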


Setting Probable Limits for Observations
• It is often useful to predict an interval in which a randomly sampled data point will fall
  • E.g., a randomly sampled individual’s score should fall between X and X′ with 95% certainty
• This means finding the central area under the curve that covers 95% (cutting off 2.5% in each tail)


Setting Probable Limits for Observations
• From the table, a z = 1.96 leaves 2.5% remaining in the tail
• We simply calculate the raw scores that correspond to z = ±1.96: $X = \mu \pm 1.96\,\sigma$
• Note that here we must know the population mean and sd

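Setting Probable Limits for Observations
• If the mean is 50 and sd = 10:
  $X = 50 \pm 1.96(10) = 50 \pm 19.6$
• So 95% of randomly sampled scores should fall between 30.4 and 69.6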
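The probable-limits calculation can be sketched with the inverse cdf, which returns the z that cuts off a given lower-tail area. This assumes Python 3.8+ `statistics.NormalDist`; `probable_limits` is a hypothetical helper name. As the slides note, it requires the population mean and sd.

```python
from statistics import NormalDist


def probable_limits(mu, sigma, coverage=0.95):
    """Raw-score interval containing `coverage` of a normal population N(mu, sigma)."""
    tail = (1 - coverage) / 2                # 2.5% in each tail for 95%
    z_crit = NormalDist().inv_cdf(1 - tail)  # about 1.96 for 95%
    return mu - z_crit * sigma, mu + z_crit * sigma


low, high = probable_limits(50, 10)
print(f"95% of scores between {low:.1f} and {high:.1f}")  # 30.4 and 69.6
```

Changing `coverage` (e.g., to 0.99) gives wider limits without needing a new table lookup.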

Converting Z’s to Other Standard Scores
• Standard scores are scores with predetermined means and sd’s
• New score = New SD × z + New Mean
• For IQ [N(100, 15)]:
  • IQ score for z = 1: 15(1) + 100 = 115
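The conversion rule is one line of code. This is a minimal sketch; `to_standard_score` is a hypothetical name, and N(100, 15) is the IQ metric given on the slide.

```python
def to_standard_score(z, new_mean, new_sd):
    """New score = new SD * z + new mean."""
    return new_sd * z + new_mean


print(to_standard_score(1.0, new_mean=100, new_sd=15))   # 115.0
print(to_standard_score(-1.5, new_mean=100, new_sd=15))  # 77.5
```

Because the conversion is linear, it preserves each score's position relative to the others; only the mean and sd of the reporting scale change.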
