Intro to Statistics for the Behavioral Sciences PSYC 1900

markku

Presentation Transcript


  1. Intro to Statistics for the Behavioral Sciences, PSYC 1900, Lecture 4: The Normal Distribution and Z-Scores

  2. Quick Review of Box-and-Whisker Plots • First find the median location, (N + 1) / 2, and the median (mdn) • Find the quartile locations • Medians of the upper and lower halves of the distribution • Quartile location = (mdn location + 1) / 2 • These are termed the “hinges” • Note: drop any fractional part of the mdn location before applying this formula • Hinges bracket the interquartile range (IQR) • Hinges serve as the top and bottom of the box

  3. Box-and-Whisker Plots • Find the H-spread • Range between the two hinges (quartiles) • Simply the IQR • The area inside the box in the plot • Draw the whiskers • Lines from the hinges to the farthest points that lie no more than 1.5 × H-spread beyond the hinges • Outliers • Points beyond the whiskers • Denoted with asterisks
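
The hinge, H-spread, whisker, and outlier rules from slides 2 and 3 can be sketched in Python. This is a minimal sketch under stated assumptions: the median location is taken as (N + 1) / 2, a location ending in .5 averages the two neighboring ordered scores, and the upper hinge is found by counting the same quartile location down from the top. The helper names `value_at` and `boxplot_summary` and the example scores are hypothetical.

```python
import math

def value_at(location, ordered):
    """Score at a 1-based location; a .5 location averages the two neighbors."""
    lo = ordered[math.floor(location) - 1]
    hi = ordered[math.ceil(location) - 1]
    return (lo + hi) / 2

def boxplot_summary(scores):
    """Hypothetical helper: median, hinges, H-spread, whisker ends, outliers."""
    x = sorted(scores)
    n = len(x)
    mdn_location = (n + 1) / 2
    median = value_at(mdn_location, x)
    # Drop the fractional part of the median location before finding quartiles.
    quartile_location = (math.floor(mdn_location) + 1) / 2
    lower_hinge = value_at(quartile_location, x)
    upper_hinge = value_at(quartile_location, x[::-1])  # count down from the top
    h_spread = upper_hinge - lower_hinge  # the IQR
    # Whiskers reach the farthest scores no more than 1.5 H-spreads past a hinge.
    low_limit = lower_hinge - 1.5 * h_spread
    high_limit = upper_hinge + 1.5 * h_spread
    inside = [v for v in x if low_limit <= v <= high_limit]
    outliers = [v for v in x if v < low_limit or v > high_limit]
    return {
        "median": median,
        "hinges": (lower_hinge, upper_hinge),
        "h_spread": h_spread,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

# Hypothetical scores, including one extreme value.
print(boxplot_summary([1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 20]))
```

With these scores the hinges fall at 3 and 6, so the H-spread is 3, the whiskers stop at 1 and 7, and 20 is flagged as an outlier.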

  4. Stem-and-Leaf Plot
     Frequency    Stem &  Leaf
          1.00       0 .  1
          3.00       0 .  233
          4.00       0 .  4445
          3.00       0 .  667
          1.00 Extremes    (>=12)
     Stem width: 10.00
     Each leaf: 1 case(s)

  5. Outlier Detection • One rule of thumb is to classify points as outliers if they are beyond 3 sd’s from the mean. • As we’ll see later in this lecture, that implies that they are very rare occurrences • One problem • Presence of outlier inflates standard deviation • Box-and-Whisker Plot outlier detection is not influenced by this issue. • H-spread “trims” off influence of extreme points

  6. Descriptives With and Without the “Outlier” If the point is included when computing the SD, it inflates the variance and will not be classified as an outlier. If it is excluded, it will be.
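
The effect described on slides 5 and 6 can be checked numerically. This sketch assumes the sample SD (n − 1 denominator, `statistics.stdev`) and uses hypothetical scores; the helper name `z_of` is also hypothetical. The same point lands just under the 3-SD cutoff when it is allowed to inflate the SD, and far beyond it when it is not.

```python
import statistics

def z_of(value, scores):
    """z score of `value` using the mean and sample SD of `scores`."""
    return (value - statistics.mean(scores)) / statistics.stdev(scores)

# Hypothetical scores; 20 is the suspect point.
with_point = [1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 20]
without_point = with_point[:-1]

z_included = z_of(20, with_point)     # the point inflates the SD
z_excluded = z_of(20, without_point)  # SD computed without the point

print(f"included: z = {z_included:.2f}")  # included: z = 2.97
print(f"excluded: z = {z_excluded:.2f}")  # excluded: z = 8.77
```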

  7. Boxplots to Compare Groups • Useful as a quick visual check on group distributions in an experiment • Mean = 3 in all groups

  8. The Normal Distribution • A specific distribution characterized by a bell-shaped form • Widely used to calculate probabilities of scores on variables

  9. What’s So Useful About Distributions? • Distributions specify the way scores deviate around a measure of central tendency. • In so doing, they allow us to calculate the probabilities of specific values occurring.

  10. Pie Chart • An example for a nominal scale • Areas “under the curve” provide information on probabilities • Most criminals are on probation • There is a 70% (.70 probability) chance that a criminal would be on probation or in jail

  11. More on Distributions & Prob • Same “adding” of areas under curve holds for histograms • If 64 of 289 cases occur within an interval of interest: • 22% of cases have this “score” • Probability of any selected case having this score is .22 • Integrating area under curve provides a probability estimate
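
The arithmetic on this slide, and the idea of “adding” areas across adjacent intervals, can be sketched as follows. The 64-of-289 figure comes from the slide; the other bin labels and counts are hypothetical, chosen only so the totals match.

```python
# From the slide: 64 of 289 cases fall in the interval of interest.
in_interval = 64
total = 289
p = in_interval / total
print(f"proportion = {p:.2f}")  # proportion = 0.22

# "Adding" areas: the probability of landing in any of several intervals is
# the sum of their proportions. Bin labels and other counts are hypothetical.
bins = {"low": 100, "middle": 64, "high": 125}
p_low_or_middle = (bins["low"] + bins["middle"]) / sum(bins.values())
print(f"P(low or middle) = {p_low_or_middle:.2f}")  # P(low or middle) = 0.57
```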

  12. Normal Distribution • For continuous variables, we simply connect “tops” of bars to form a curve. • Abscissa: Horizontal Axis • Ordinate: Vertical Axis • Density: Height of curve at a value of X

  13. Normal Distribution • Mathematically defined as: $f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$ • π and e are constants (≈3.14 and ≈2.72) • Once the mean and sd are known, the distribution can be drawn and the density at any given point determined.
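
The density formula on this slide translates directly into code. This is a minimal sketch; the function name `normal_density` and the example distribution (mean 50, SD 10) are hypothetical, and the result is cross-checked against Python's built-in `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Height of the normal curve with mean mu and SD sigma at score x."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Density at the mean of a hypothetical distribution with mean 50 and SD 10.
print(round(normal_density(50, 50, 10), 4))  # 0.0399
print(round(NormalDist(50, 10).pdf(50), 4))  # 0.0399
```

Note that a density is a height, not a probability; probabilities come from areas under the curve.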

  14. Normal Distribution • It would be difficult to calculate probabilities/densities for each new sample. • Therefore, we use the standard normal distribution and transform scores on variables to fit it. • A normal distribution with a mean of zero and a sd=1 [N(0,1)].

  15. Distribution Forms • Many processes can be described by a normal distribution, but not all. • Number of meteor strikes, number of Supreme Court retirements? • Here we use the Poisson distribution, which is governed by the expected number of occurrences in an interval.
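
The Poisson probability of exactly k occurrences depends only on the expected number of occurrences per interval, λ: P(k) = λ^k e^(−λ) / k!. The sketch below uses a hypothetical λ of 2 events per interval; the function name `poisson_probability` is also hypothetical.

```python
import math

def poisson_probability(k, expected):
    """P(exactly k occurrences) when `expected` occurrences are expected per interval."""
    return expected ** k * math.exp(-expected) / math.factorial(k)

# Hypothetical: an average of 2 events (e.g., meteor strikes) per interval.
for k in range(5):
    print(k, round(poisson_probability(k, 2.0), 3))
```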

  16. Score Transformations • To use the standard normal tables to determine probabilities, we transform scores. • Linear transformations of scores do not change the shape of the distribution • If we have a distribution with a mean of 50, we need to transform scores so that 50 = 0 • Take deviations: (X − 50) for new point values • This solves the problem of getting the mean to zero, but what about the standard deviation?

  17. Score Transformations • The standard normal has sd = 1 • If we divide all values of a variable by a constant, we divide the standard deviation by that constant • To get sd = 1, we simply divide the mean-centered scores (i.e., the deviation scores) by the sd of the distribution. • If sd = 5, dividing all deviation scores by 5 produces sd = 1
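
The two-step transformation from slides 16 and 17 (subtract the mean, then divide by the SD) can be sketched with hypothetical scores chosen to have mean 50 and SD 5, matching the slides' examples. The sketch assumes the sample SD (`statistics.stdev`).

```python
import statistics

# Hypothetical scores with mean 50 and (sample) SD 5.
scores = [45, 50, 55]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)

deviations = [x - mean for x in scores]  # step 1: move the mean to 0
z_scores = [d / sd for d in deviations]  # step 2: rescale the SD to 1

print(deviations)
print(z_scores)
print(statistics.mean(z_scores), statistics.stdev(z_scores))
```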

  18. Z-scores and the Standard Normal Distribution • This transformation of raw scores produces z scores: $z = \frac{X - \mu}{\sigma}$ • Z scores are interpreted as the number of standard deviation units above or below the mean • A raw score of 7 in a distribution with mean = 10 and sd = 2 produces: $z = \frac{7 - 10}{2} = -1.5$
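
The slide's example can be reproduced with a one-line function. The name `z_score` is hypothetical.

```python
def z_score(x, mean, sd):
    """Number of SD units that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(7, 10, 2))  # -1.5
```

The negative sign says the score lies 1.5 standard deviations below the mean.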

  19. Z Score Transformation • A linear transformation • addition, subtraction, multiplication, and/or division by constants • Does not change form of the distribution • Z-scoring or “standardizing” a distribution does not make the distribution a normal one • Shape will be the same, but mean = 0 and sd = 1
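
To check that z-scoring leaves the shape unchanged, one can compare a shape statistic before and after the transformation. This sketch uses moment-based skewness as the shape measure (an assumption; the slide does not name one) and hypothetical, positively skewed scores. The skewness is the same for the raw scores and their z scores, so standardizing did not make the distribution normal.

```python
import statistics

def skewness(values):
    """Moment-based skewness: mean cubed deviation in population-SD units."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

# Hypothetical positively skewed scores.
raw = [1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 20]
mean, sd = statistics.mean(raw), statistics.stdev(raw)
z = [(v - mean) / sd for v in raw]

# Same skewness before and after: only the mean and SD changed.
print(round(skewness(raw), 3), round(skewness(z), 3))
```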

  20. Z Score Benefits • Allows us to compare scores collected on different metrics • Each score can be interpreted based on its deviation from the mean with respect to the magnitude of average deviations • Allows us to easily obtain probabilities for specific scores based on a “known” normal distribution density function
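
Comparing scores collected on different metrics amounts to comparing their z scores. The two tests, their means and SDs, and the student's scores below are all hypothetical.

```python
def z_score(x, mean, sd):
    """Number of SD units that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical: one student's scores on two tests with different scales.
test_a = z_score(85, mean=75, sd=10)  # 1.0 SD above the mean
test_b = z_score(30, mean=24, sd=4)   # 1.5 SDs above the mean
better = "test B" if test_b > test_a else "test A"
print(test_a, test_b, better)
```

Although 85 is the larger raw number, the score of 30 is further above its own mean in SD units.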

  21. Z Score to Probabilities • If we know a z score, we can calculate probabilities attached to it. • Area under the curve is 1.00 • Tabled values of standard normal distribution reflect area from the mean to that value • Note that if distribution shape differs substantially from normal, probability estimates will be incorrect

  22. Z Score to Probabilities • A z=1.00 in the table corresponds to an area of 0.34 • A score between z=0 and z=1 has a probability of occurring of 0.34 • The probability of a score at or below z=1 is: • .50+.34=.84 • The probability of a score higher than z=1 is: • .50-.34=.16; or 1.00-.84=.16 • The probability of a score -1<z<1? • .34+.34=.68 • Distribution is symmetric
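
Instead of a printed table, the areas on this slide can be computed from the standard normal cumulative distribution function. This sketch uses Python's `statistics.NormalDist`; note that `cdf` gives the area at or below a value, so the tabled “mean to z” area is that minus .50.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

below_1 = std_normal.cdf(1.0)                          # at or below z = 1
mean_to_1 = below_1 - 0.5                              # tabled area, mean to z = 1
above_1 = 1 - below_1                                  # above z = 1
within_1 = std_normal.cdf(1.0) - std_normal.cdf(-1.0)  # -1 < z < 1

print(f"{mean_to_1:.2f} {below_1:.2f} {above_1:.2f} {within_1:.2f}")  # 0.34 0.84 0.16 0.68
```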

  23. Curve Area Applet

  24. Setting Probable Limits for Observations • Many times, it is useful to predict an interval in which a randomly sampled data point will fall. • A randomly sampled individual’s score should fall between X and X’ with 95% certainty. • This implies we’re looking for the area under the curve that covers 95% (cut off 2.5% in each tail)


  26. Setting Probable Limits for Observations • From the table, we can see that a z=1.96 leaves 2.5% remaining in tail. • We simply need to calculate what raw score corresponds to a z=1.96. • Note that here we must know population mean and sd.
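
The table lookup for z = 1.96 can be reproduced with the inverse of the standard normal CDF, here `statistics.NormalDist.inv_cdf`. Leaving 2.5% in the upper tail means 97.5% of the area lies below the cutoff.

```python
from statistics import NormalDist

std_normal = NormalDist()
# z that leaves 2.5% in the upper tail (97.5% of the area lies below it).
z_crit = std_normal.inv_cdf(0.975)
upper_tail = 1 - std_normal.cdf(z_crit)
print(round(z_crit, 2), round(upper_tail, 3))  # 1.96 0.025
```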

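  27. Setting Probable Limits for Observations • If the mean is 50 and sd = 10: $X = \mu \pm 1.96\sigma = 50 \pm 1.96(10) = 50 \pm 19.6$ • So a randomly sampled score should fall between 30.4 and 69.6 with 95% certainty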
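
The probable limits for a population with mean 50 and SD 10 can be computed by converting the critical z back to raw scores, X = μ ± zσ. As slide 26 notes, this assumes the population mean and SD are known and the scores are normally distributed. The function name `probable_limits` and its `coverage` parameter are hypothetical.

```python
from statistics import NormalDist

def probable_limits(mean, sd, coverage=0.95):
    """Raw-score interval containing `coverage` of a normal population."""
    z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)  # about 1.96 for 95%
    return mean - z * sd, mean + z * sd

lower, upper = probable_limits(50, 10)
print(round(lower, 1), round(upper, 1))  # 30.4 69.6
```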

  28. Converting Z’s to Other Standard Scores • Standard scores are ones with predetermined means and sd’s • New score = New SD (z) + New Mean • For IQ [N(100, 15)]: • IQ score for a z of 1 = 15(1) + 100 = 115
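
The conversion rule on this slide is a single linear transformation. The function name `to_standard_score` is hypothetical; the IQ scale values N(100, 15) come from the slide.

```python
def to_standard_score(z, new_mean, new_sd):
    """Rescale a z score to a scale with a predetermined mean and SD."""
    return new_sd * z + new_mean

print(to_standard_score(1, new_mean=100, new_sd=15))  # 115
```

Because the conversion is linear, it changes only the mean and SD, not the shape of the distribution.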
