
Intro to Statistics for the Behavioral Sciences PSYC 1900


keenan


Presentation Transcript


  1. Intro to Statistics for the Behavioral Sciences PSYC 1900 • Lecture 3: Central Tendency and Dispersion

  2. Measures of Central Tendency • Numerical values that refer to the center of a distribution • Used to provide a “best descriptor” of the score for a sample • Usefulness or quality of the measure depends on shape of distribution • Mode, Median, and Mean

  3. The Mode • Defined as the most common or frequent score • The value with the highest point on a frequency distribution of a variable • 3,4,1,5,7,1,2,3,1,1,6,1,7,2 • The mode = 1
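
As a minimal sketch, the mode of the slide 3 scores can be found by counting how often each value occurs. The variable names are illustrative and not part of the lecture.

```python
from collections import Counter

scores = [3, 4, 1, 5, 7, 1, 2, 3, 1, 1, 6, 1, 7, 2]  # data from slide 3

counts = Counter(scores)           # value -> frequency
top = max(counts.values())         # highest frequency observed
# Collect every value that reaches the highest frequency (there may be ties).
modes = sorted(value for value, count in counts.items() if count == top)

print(modes, top)
```

Collecting all tied values, rather than a single one, makes it easy to spot the adjacent-tie and bimodal cases.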

  4. The Mode • If two adjacent points occur with equal and greatest frequency, the mode can be considered the average of these two. • Mode = 3.5

  5. The Mode • If the two most frequent points are not adjacent but occur equally often, the distribution is bimodal. • Of course, binning might result in a single mode by eliminating error/noise. • Bimodal usually means the two modes are substantially separated

  6. The Median • Score that corresponds to the point at or below which 50% of scores fall • The “middle” number in a ranking of the data • Median Location • Mdn location = (N+1)/2 • If we have 11 numbers, the mdn location is: • (11+1)/2 = 6 • 1,1,2,3,3,3,4,4,5,5,6 • Mdn = 3

  7. The Median • What about: 1,1,2,3,3,3,4,4,5,5,6,6 • Mdn location = (12+1) / 2 = 6.5 • Mdn = 3.5 • When the median location falls between points, the median is defined as the average of those two points.
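
The median-location rule from slides 6 and 7 can be sketched as follows. The helper names `median_location` and `median` are assumptions made for this illustration.

```python
import math

def median_location(n):
    """Median location for n scores: (N + 1) / 2."""
    return (n + 1) / 2

def median(scores):
    """Median via the median location; a .5 location averages its two neighbours."""
    ordered = sorted(scores)
    loc = median_location(len(ordered))
    lower = ordered[math.floor(loc) - 1]   # locations are 1-based
    upper = ordered[math.ceil(loc) - 1]
    return (lower + upper) / 2

odd_sample = [1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 6]       # slide 6, N = 11
even_sample = [1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6]   # slide 7, N = 12

print(median_location(len(odd_sample)), median(odd_sample))
print(median_location(len(even_sample)), median(even_sample))
```

When the location is a whole number, the floor and ceiling coincide, so the same code handles both the odd and the even case.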

  8. Median: Histogram vs. Stem and Leaf
     Stem-and-Leaf Plot
     Frequency   Stem & Leaf
       2.00       1 . 00
       1.00       2 . 0
       3.00       3 . 000
       2.00       4 . 00
       2.00       5 . 00
       2.00       6 . 00
     Stem width: 1.00
     Each leaf: 1 case(s)

  9. The Mean • The average value • The sum of the scores divided by the number of scores • 2,4,5,9,11 • (2+4+5+9+11)=31; 31/5=6.2
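
In symbols, with $\bar{X}$ for the sample mean and $N$ for the number of scores (a conventional notation, assumed here because the transcript does not show one), the slide 9 calculation reads:

```latex
\bar{X} = \frac{\sum X}{N} = \frac{2 + 4 + 5 + 9 + 11}{5} = \frac{31}{5} = 6.2
```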

  10. Relations Among Measures of Central Tendency • When the distributions are symmetric, the three measures will generally correspond. • When the distributions are asymmetric, they will often diverge.

  11. The Mode: Advantages & Disadvantages • Mode is the most commonly occurring score. • Always appears in the data; mean and median may not. • Most likely score to occur. • Useful for nominal data; mean and median are not. • When might the mode be useful?

  12. Loaded Dice
      Frequency   Stem & Leaf
       11.00       1 . 00000000000
        1.00       2 . 0
        2.00       3 . 00
        3.00       4 . 000
        4.00       5 . 0000
        5.00       6 . 00000
        6.00       7 . 000000
        5.00       8 . 00000
        4.00       9 . 0000
        3.00      10 . 000
        2.00      11 . 00
        1.00      12 . 0
      • The mode is your best bet. • Median is not the highest probability. • Mean does not even occur in sample.
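
The three claims on the loaded-dice slide can be checked with a short sketch. It assumes the frequencies listed in the stem-and-leaf display above (47 rolls in total) and expands them into individual rolls.

```python
import statistics

# Frequency of each two-dice total, read off the slide 12 display.
freq = {1: 11, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5,
        7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}

# Expand the frequency table into one entry per roll.
rolls = [total for total, count in freq.items() for _ in range(count)]

mode = statistics.mode(rolls)      # most frequent total
median = statistics.median(rolls)  # middle roll of the 47
mean = statistics.mean(rolls)      # sum of rolls / number of rolls

print(len(rolls), mode, median, round(mean, 2))
print(mean in freq)  # is the mean one of the observed totals?
```

Here the mode is 1, the median is 6, and the mean is 263/47 (about 5.6), a value that no roll can take.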

  13. Disadvantages of The Mode • Mode can vary depending on how data are grouped/binned • May not be representative of entire distribution • Loaded Dice Example • Rare events (e.g., most frequent is zero) • Tells us nothing about cause of nonzero events

  14. Advantages & Disadvantages of the Mean and Median • Let me tell you a story... • Better known as: ALWAYS look at your data distributions

  15. Men, Women, Evolution, & Sex • Is there a gender difference in the number of desired partners? • Evolutionary psychologists say “yes” due to an asymmetry in minimum parental investment needs. • Data appeared to support this

  16. Men, Women, Evolution, & Sex • Mean # partners in next 30 years: • Men = 7.69; Women = 2.78 • You can’t blame men; it’s in their nature! • Yes? No? Any ideas?

  17. Means versus Medians • These folks never considered the form of their data (or did they?) • Without winsorization, men’s mean = 64

  18. Means: Men = 7.69; Women = 2.78 • Medians and Modes = 1

  19. Advantages & Disadvantages of the Mean and Median • Mean • Subject to bias by extreme values • May provide a value for central tendency that does not exist in data set • Major benefit is historical use and ability to be manipulated algebraically • Most mathematical equations depend on it • When assumptions are met, it is quite valid • Median • Not influenced by extreme values (e.g., salaries, home values). • Not as amenable to algebraic manipulation and use.
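
To illustrate the sensitivity of the mean to extreme values, here is a small sketch with made-up salary figures (in thousands). The numbers are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
import statistics

# Hypothetical salaries in thousands; the last one is an extreme value.
typical = [30, 32, 35, 38, 40, 45]
with_extreme = typical + [1000]

mean_before = statistics.mean(typical)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(typical)
median_after = statistics.median(with_extreme)

print(round(mean_before, 1), round(mean_after, 1))  # mean shifts a lot
print(median_before, median_after)                  # median barely moves
```

A single extreme salary pulls the mean far away from every typical value, while the median moves only from 36.5 to 38.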

  20. Measures of Variability/Dispersion • The degree to which individual data points are distributed around the mean • Provide a measure of how representative the mean is of the scores

  21. Several Measures • Range • Distance from lowest to highest values • 1,2,3,4,4,5,6,7; Range = 7-1 = 6 • Suffers from sensitivity to extremes • 1,2,3,4,4,5,6,7,80; Range = 80-1 = 79 • Interquartile Range • Range of the middle 50% of scores • Less dependent on extreme values • Trimmed samples and statistics
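
A rough sketch of how one extreme score affects the range compared with the interquartile range, using the two data sets from slide 21. The quartiles here come from Python's `statistics.quantiles` with its default method, which is an assumption: textbooks and packages define quartiles in slightly different ways, so the exact IQR may differ from a hand calculation.

```python
import statistics

def value_range(scores):
    """Distance from the lowest to the highest value."""
    return max(scores) - min(scores)

def iqr(scores):
    """Interquartile range using statistics.quantiles (default 'exclusive' method)."""
    q1, _q2, q3 = statistics.quantiles(scores, n=4)
    return q3 - q1

base = [1, 2, 3, 4, 4, 5, 6, 7]   # slide 21
with_extreme = base + [80]        # slide 21, with the extreme score added

range_base, range_extreme = value_range(base), value_range(with_extreme)
iqr_base, iqr_extreme = iqr(base), iqr(with_extreme)

print(range_base, range_extreme)  # 6 and 79, as on the slide
print(iqr_base, iqr_extreme)
```

The range jumps by 73 when the extreme score is added, whereas the IQR changes by less than one point.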

  22. Average Deviation • Conceptually Clear • How far individual scores deviate from the mean on average • Problem is that average deviation from the mean is, by definition, zero • 1,2,3,3,4,5 • Deviations: -2,-1,0,0,1,2 • Average Deviation = 0
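
The slide 22 example can be reproduced in a few lines; this is only a sketch of the arithmetic, with illustrative variable names.

```python
scores = [1, 2, 3, 3, 4, 5]  # data from slide 22

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]   # signed distance of each score from the mean
average_deviation = sum(deviations) / len(deviations)

print(mean, deviations, average_deviation)
```

Positive and negative deviations cancel exactly, so the average signed deviation carries no information about spread.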

  23. The Variance • Solves the problem that deviations sum to zero • Variance is defined as the average of the squared deviations about the mean • Squares of negative numbers are positive • Divide by N-1, not N • Sample Variance is used to estimate Population Variance
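
Written as a formula, with $s^2$ for the sample variance and $\bar{X}$ for the sample mean (conventional symbols, assumed here because the transcript does not preserve the slide's notation), the definition above is:

```latex
s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}
```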

  24. The Variance Data: 1,2,3,3,4,4,4,5,6 Volunteer?

  25. Standard Deviation • Square root of the variance • Roughly the average distance of scores from the mean • Gets rid of the squared metric
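
For readers who want to check their answer to the slide 24 exercise, the sketch below computes the sample variance and standard deviation of those scores step by step, dividing by N-1 as described on slide 23. It also compares the result with Python's built-in `statistics.variance`, which uses the same N-1 divisor.

```python
import math
import statistics

scores = [1, 2, 3, 3, 4, 4, 4, 5, 6]  # data from slide 24

n = len(scores)
mean = sum(scores) / n
sum_sq_dev = sum((x - mean) ** 2 for x in scores)  # squared deviations, summed
variance = sum_sq_dev / (n - 1)                    # divide by N - 1, not N
std_dev = math.sqrt(variance)                      # back to the original units

print(round(mean, 3), round(variance, 3), round(std_dev, 3))
print(math.isclose(variance, statistics.variance(scores)))
```

The variance works out to 41/18 (about 2.28) and the standard deviation to about 1.51.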

  26. Computational Formulae • Algebraic manipulations are less clear conceptually but easy to use
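
The slide's formulae are not preserved in the transcript. A standard computational form of the sample variance, which is algebraically equivalent to the definitional one, is shown here with the slide 24 data ($\sum X = 32$, $\sum X^2 = 132$, $N = 9$) as a worked check:

```latex
s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}
    = \frac{132 - \frac{32^2}{9}}{8}
    = \frac{41}{18} \approx 2.28,
\qquad
s = \sqrt{s^2} \approx 1.51
```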

  27. Mean and Variance as Estimators • These descriptive statistics are used to estimate parameters

  28. Bias in Sample Variance • If we calculated the average squared deviation of the sample (as opposed to dividing by N-1), the variance would be a biased estimate of the population variance. • Bias: A property of a statistic whose long-range average is not equal to the parameter it estimates.

  29. Bias in Sample Variance • Why does using N produce bias? • Expected value is the long range avg. of a statistic over repeated samples.

  30. Applet Example

  31. Multiply by constant: N/(N-1)
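
The bias described in slides 28 to 31 can be checked exactly on a toy case. The sketch below assumes a hypothetical three-value population and enumerates every equally likely sample of size 2 drawn with replacement, so the "long-range average" becomes an exact average over all samples. This stands in for the applet, which the transcript does not include.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical population, chosen for illustration
n = 2                   # sample size

def mean(values):
    return Fraction(sum(values), len(values))

def sum_sq_dev(values):
    """Sum of squared deviations about the mean of `values`."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

# Population variance: average squared deviation over the whole population.
pop_var = sum_sq_dev(population) / len(population)

# Every possible sample of size n, each equally likely.
samples = list(product(population, repeat=n))

# Expected value of each estimator = its average over all samples.
expected_div_n = mean([sum_sq_dev(s) / n for s in samples])
expected_div_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

# Multiplying the biased expectation by N / (N - 1) removes the bias.
corrected = expected_div_n * Fraction(n, n - 1)

print(pop_var, expected_div_n, expected_div_n_minus_1, corrected)
```

Dividing by N gives an expected value of 1/3, which underestimates the population variance of 2/3; dividing by N-1, or equivalently multiplying by N/(N-1), recovers 2/3.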

  32. Box-and-Whisker Plots • Graphical representations of dispersion • Quite useful to quickly visualize nature of variability and extreme scores

  33. Box-and-Whisker Plots • First find the median location and mdn • Find the quartile locations • Medians of the upper and lower half of distribution • Quartile location = (mdn location + 1) / 2 • These are termed the “hinges” • Note: drop fractional values of mdn location • Hinges bracket interquartile range (IQR) • Hinges serve as top and bottom of box

  34. Box-and-Whisker Plots • Find the H-spread • Range between two quartiles • Simply the IQR • Area inside box in plot • Draw the whiskers • Lines from hinges to farthest points not more than 1.5 × H-spread from the hinges • Outliers • Points beyond whiskers • Denoted with asterisks
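
The box-plot steps from slides 33 and 34 can be sketched as below. The data set is borrowed from slide 21 (the one with the extreme score of 80), and the helper `value_at` is a name invented for this illustration; it averages the two neighbouring scores when a location ends in .5.

```python
import math

def value_at(ordered, loc):
    """Score at a 1-based location; a .5 location averages its two neighbours."""
    return (ordered[math.floor(loc) - 1] + ordered[math.ceil(loc) - 1]) / 2

scores = sorted([1, 2, 3, 4, 4, 5, 6, 7, 80])  # slide 21 data with the extreme score
n = len(scores)

mdn_loc = (n + 1) / 2                    # median location
q_loc = (math.floor(mdn_loc) + 1) / 2    # quartile location; fraction of mdn_loc dropped

lower_hinge = value_at(scores, q_loc)         # counted up from the bottom
upper_hinge = value_at(scores[::-1], q_loc)   # counted down from the top
h_spread = upper_hinge - lower_hinge

# Whiskers reach the farthest scores within 1.5 x H-spread of the hinges.
low_limit = lower_hinge - 1.5 * h_spread
high_limit = upper_hinge + 1.5 * h_spread
inside = [x for x in scores if low_limit <= x <= high_limit]
whiskers = (min(inside), max(inside))
outliers = [x for x in scores if x < low_limit or x > high_limit]

print(lower_hinge, upper_hinge, h_spread, whiskers, outliers)
```

For these scores the hinges are 3 and 6, the H-spread is 3, the whiskers end at 1 and 7, and 80 is flagged as an outlier.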

  35. Box-and-Whisker Plots
      Stem-and-Leaf Plot
      Frequency   Stem & Leaf
        2.00       0 . 11
        3.00       0 . 223
        3.00       0 . 445
        6.00       0 . 667777
        3.00       0 . 889
        1.00      Extremes (>=15)
      Stem width: 10.00
      Each leaf: 1 case(s)

  36. Example
