1 / 44

Statistics: an introduction

viveka
Download Presentation

Statistics: an introduction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


    1. Psychology 242, Dr. McKirnan Statistics: an introduction

    2. Psychology 242, Dr. McKirnan Research questions, hypotheses & designs

    3. Psychology 242, Dr. McKirnan Statistics introduction 1 Why Use Numbers in Science? Comparability of research outcomes across groups or conditions, within study across studies Statistical operations effect sizes statistical comparisons & probability judgments

    4. Psychology 242, Dr. McKirnan Statistics introduction 1 Costs and & benefits of numbers Virtues of numerical data: Representing data by numbers helps clarify and simplify communication Can show general trend or “common denominator” in the data Danger of numerical data: Can over-simplify / ignore individual differences Liable to misrepresentation or manipulation, as per Broder’s article…

    5. Psychology 242, Dr. McKirnan Statistics introduction 1 The danger of reducing complex social processes to simple numbers:

    6. Psychology 242, Dr. McKirnan Statistics introduction 1 Political (mis)uses of scientific data

    7. Psychology 242, Dr. McKirnan Statistics introduction 1 Thus… Numerical representations of nature are important to scientific progress Key for operational definitions Provide powerful analytic & communication tools Understanding ANY statistical representation requires understanding how it was derived Understanding must involve the context and origin of statistical information ? see examples below. Statistical statements can be literally / technically “true” but misleading. Political (mis)use of numerical / statistical information is common and problematic.

    8. Psychology 242, Dr. McKirnan Statistics introduction 1 Some basic terms:

    9. Psychology 242, Dr. McKirnan Week 3; Experimental designs Research questions, hypotheses & designs

    10. Psychology 242, Dr. McKirnan Statistics introduction 1 Interval Arbitrary or relative ? scales Common in behavioral research, e.g., attitude or rating scales. Ordinal Rank order with non-equal intervals Simple finish place, rank in organization, most, 2nd most, 3rd most... Categorical Categories only Typical of inherent categories: ethnic group, gender, zip code Types of numerical scales

    11. Psychology 242, Dr. McKirnan Statistics introduction 1 Scale details Ratio Scales, e.g., batting average Each scale value describes a physical reality % hits / at bats The zero point is grounded in a physical property 0 hits is meaningful Each scale point is absolute 32o has an absolute meaning, not just relative to other temperatures. Each interval is continuous & exactly equal: .100 ? .150 = .200 ? .250 Used for correlational designs, or as dependent variable in experiments.

    12. Interval scales Psychology 242, Dr. McKirnan Statistics introduction 1

    13. Interval scales Psychology 242, Dr. McKirnan Statistics introduction 1

    14. Interval scales Psychology 242, Dr. McKirnan Statistics introduction 1

    15. Psychology 242, Dr. McKirnan Statistics introduction 1 Two central ways of using numbers. Descriptive Statistics: Simple quantitative description or summary. Batting average in baseball Grade-point average Univariate Analysis: examines cases in terms of a single variable. Frequency distributions group data into categories. E.g., drug use by Age categories, Ethnic groups… Inferential Statistics: Conduct analyses on samples Compare groups (experimental v. control…) Characterize a sample in epidemiological research Use statistical operations to generalize the results to a population.

    16. What is a Frequency Distribution? Psychology 242, Dr. McKirnan Statistics introduction 1

    17. Psychology 242, Dr. McKirnan Statistics introduction 1 How angry are you at the crooks who crashed the economy?

    18. Descriptive statistics example Psychology 242, Dr. McKirnan Statistics introduction 1

    19. We can “block” frequencies by, e.g., gender Psychology 242, Dr. McKirnan Statistics introduction 1

    20. Psychology 242, Dr. McKirnan Research questions, hypotheses & designs

    21. Central tendency Psychology 242, Dr. McKirnan Statistics introduction 1

    22. What is Central Tendency (Mean or average…) Psychology 242, Dr. McKirnan Statistics introduction 1

    23. Psychology 242, Dr. McKirnan Statistics introduction 1 Describing data 1. Central tendency or general “drift” of the scores. Mode ? most common score Median ? middle of the distribution Mean ? average score 2. Variance: how diverse the scores are (how much vary from each other). Range ? …from the highest to lowest score Standard ? “average” amount the scores vary deviation from the Mean score

    24. Psychology 242, Dr. McKirnan Statistics introduction 1 The Mode Example: scores = 15, 20, 21, 20, 36,15, 25,15,12 Show scores as a frequency distribution

    25. Psychology 242, Dr. McKirnan Statistics introduction 1 Median Mid-point of a distribution of scores: half are above, half are below. List scores in numerical order (interval or ratio scale) Locate the score in the center of the sample. First line up the scores: 12,15,15,15,20,20,21,25,36 The middle (5th out of 9) score = 20. If there are an even number of scores, Median = mean of the two middle scores

    26. Psychology 242, Dr. McKirnan Statistics introduction 1 Mean (M) Most common measure of central tendency Total all scores: 12+15+20+21+20+36+15+25+15 = 179 Divide by “n” of scores: 179 / 9 = 19.9 (round to 20). Characteristics: Good for Ratio or interval scales Sensitive to all observed values Highly stable; with larger n is insensitive to subtle changes in values Can be highly sensitive to extreme values (particularly in smaller samples).

    27. The Mean: Statistical notation Psychology 242, Dr. McKirnan Statistics introduction 1

    28. Psychology 242, Dr. McKirnan Statistics introduction 1 Central tendency: Normal Distributions For a normal distribution the mean, mode, and median are all same -- the center of the distribution

    29. Psychology 242, Dr. McKirnan Statistics introduction 1 The distribution of student ages Age is a good example of a variable that is normally distributed

    30. Psychology 242, Dr. McKirnan Statistics introduction 1 Central tendency: Bimodal Distributions A bimodal distribution has two modes, at the outsides of the distribution.

    31. Psychology 242, Dr. McKirnan Statistics introduction 1 Central tendency: Skewed Distributions A skewed distribution has extreme scores in one direction.

    32. Psychology 242, Dr. McKirnan Statistics introduction 1 Central tendency; normal distribution

    33. Local examples of distributions, 1 Psychology 242, Dr. McKirnan Statistics introduction 1

    34. Psychology 242, Dr. McKirnan Statistics introduction 1

    35. Local examples of distributions, 2b Psychology 242, Dr. McKirnan Statistics introduction 1

    36. Psychology 242, Dr. McKirnan Statistics introduction 1 Positive skew example

    37. Example of a strong positive skew: American incomes. Psychology 242, Dr. McKirnan Statistics introduction 1

    38. American income distribution (2000) Psychology 242, Dr. McKirnan Statistics introduction 1

    39. Psychology 242, Dr. McKirnan Statistics introduction 1 Deceptiveness of the M in skewed data: Example from Bush tax cut data Did all Americans benefit equally from the Bush era tax cuts? The M annual tax cut for all tax payers is $1629 per year from 2001 to 2010... a pretty good number.

    40. Psychology 242, Dr. McKirnan Statistics introduction 1 Deceptive Means in skewed data The M for tax cuts is pulled upward by a small number of very high values

    41. Psychology 242, Dr. McKirnan Statistics introduction 1 Tax cuts by income group

    42. Psychology 242, Dr. McKirnan Statistics introduction 1 Mean v. Median in skewed data The difference between the Mode, Median & Mean in skewed data can be crucial: Example from men’s v. women’s number of sex partners.

    43. Psychology 242, Dr. McKirnan Statistics introduction 1 Mean v. Median in skewed data Expressing the data as Ms greatly exaggerates differences between men & women.

    44. Psychology 242, Dr. McKirnan Statistics introduction 1 Bottom line: Most natural processes or variables show a (more or less) normal distribution… Age, Height, IQ, most attitude scales… When a distribution is normal the Mode = Mean = Median, all at the center of the distribution. When a distribution is bimodal or skewed the M can be deceptive. Typically: Mode Median Mean

    45. Psychology 242, Dr. McKirnan Research questions, hypotheses & designs

    46. Psychology 242, Dr. McKirnan Statistics introduction 1 Range 1. The Range of the highest to the lowest score. Ages of males: 18, 25, 20, 21, 20, 23, 24, 26,18, 25, 20, 19, 19. Ages of women: 26, 27, 27, 31, 32, 28, 31, 29, 30, 27, 26, 37, 28

    47. Psychology 242, Dr. McKirnan Statistics introduction 1 Standard deviation Similar to the “average” amount each score deviates from the M of the sample. “Standardizes” scores to a normal curve, allowing basic statistics to be used. More accurate & detailed than range: A few extremely high or low scores (“outliers”) may make the range inaccurate S assesses the deviation of all scores in the sample from the mean

    48. Standard deviation Psychology 242, Dr. McKirnan Statistics introduction 1

    49. Psychology 242, Dr. McKirnan Statistics introduction 1 Standard Deviations; Deviations of scores from the M

    50. Psychology 242, Dr. McKirnan Statistics introduction 1 Standard Deviation & Formulas

    51. Psychology 242, Dr. McKirnan Statistics introduction 1 Degrees of freedom

    52. Psychology 242, Dr. McKirnan Statistics introduction 1 Standard Deviation & Formulas

    53. Psychology 242, Dr. McKirnan Statistics introduction 1 Standard Deviation & Formulas

    54. Studying Psychology 242 Psychology 242, Dr. McKirnan Statistics introduction 1

    55. Psychology 242, Dr. McKirnan Statistics introduction 1 Using Standard Deviations

    56. Psychology 242, Dr. McKirnan Statistics introduction 1 Calculating the standard deviation; high variance

    57. Psychology 242, Dr. McKirnan Statistics introduction 1 Scores with less variance

    58. Psychology 242, Dr. McKirnan Statistics introduction 1 Calculating the standard deviation; lower variance

    59. Differing variances Psychology 242, Dr. McKirnan Statistics introduction 1

    60. Summary Psychology 242, Dr. McKirnan Statistics introduction 1

    61. Psychology 242, Dr. McKirnan Research questions, hypotheses & designs

    62. Psychology 242, Dr. McKirnan Statistics introduction 1 Z scores: How do we characterize one participant’s score on a scale? How do we combine these into a single metric (mathematical description) to characterize a score? Z score:

    63. Psychology 242, Dr. McKirnan Statistics introduction 1 Standard Deviation & Formulas

    64. Psychology 242, Dr. McKirnan Statistics introduction 1 Introduction to normal distribution The normal distribution is a hypothetical distribution of cases in a sample It is segmented into standard deviation units. Each standard deviation unit (Z) has a fixed % of cases We use Z scores & associated % of the normal distribution to make statistical decisions about whether a score might occur by chance.

    65. Standard deviations & distributions, 1 Psychology 242, Dr. McKirnan Statistics introduction 1

    66. Standard deviations & distributions, 2 Psychology 242, Dr. McKirnan Statistics introduction 1

    67. Standard deviations & distributions, 3 Psychology 242, Dr. McKirnan Statistics introduction 1

    68. Standard deviations & distributions, 4 Psychology 242, Dr. McKirnan Statistics introduction 1

    69. Standard deviations & distributions, 4 Psychology 242, Dr. McKirnan Statistics introduction 1

    70. Psychology 242, Dr. McKirnan Statistics introduction 1 Z scores Z describes how far a score is above or below the M in standard deviation units rather than raw scores. “Adjusts” the score to be independent of the original scale. Rather than, e.g., inches, IQ points, attitude scores the scale is in terms of universal standard deviation units. Z allows us to use the general properties of the normal distribution to determine how much of the curve a score is above / below.

    71. (Hypothetical) Sampling Distribution Psychology 242, Dr. McKirnan Statistics introduction 1

    72. Psychology 242, Dr. McKirnan Statistics introduction 1 The Normal Distribution

    73. Psychology 242, Dr. McKirnan Statistics introduction 1

    74. Psychology 242, Dr. McKirnan Statistics introduction 1 Standard deviations and distributions

    75. Psychology 242, Dr. McKirnan Statistics introduction 1 Standard deviations and distributions

    76. Raw scores ? Z scores ? % cases Psychology 242, Dr. McKirnan Statistics introduction 1

    77. Psychology 242, Dr. McKirnan Statistics introduction 1

    78. Psychology 242, Dr. McKirnan Research questions, hypotheses & designs

    79. Psychology 242, Dr. McKirnan Statistics introduction 1 Normal distribution; Z scores To use the normal distribution to calculate how ‘good’ a score is… Calculate how far the score (X) is from the mean (M); X–M. “Adjust” X–M by how much variance there is in the sample; use the standard deviation (S). Z = X–M / S

    80. Psychology 242, Dr. McKirnan Statistics introduction 1 Z scores: areas under the normal curve, 2

    81. Psychology 242, Dr. McKirnan Statistics introduction 1 Z scores: areas under the normal curve, 2

    82. Psychology 242, Dr. McKirnan Statistics introduction 1 Z scores: areas under the normal curve, 2

    83. Psychology 242, Dr. McKirnan Statistics introduction 1 Z scores: areas under the normal curve, 2

    84. Psychology 242, Dr. McKirnan Statistics introduction 1 Z scores: areas under the normal curve, 2

    85. Psychology 242, Dr. McKirnan Statistics introduction 1 Using the Normal Distribution How good is a score of ‘6' in the group described in Table 1? Table 2?

    86. Psychology 242, Dr. McKirnan Statistics introduction 1 Using the normal distribution, 2

    87. Psychology 242, Dr. McKirnan Statistics introduction 1 Comparing Scores: deviation x Variance

    88. Psychology 242, Dr. McKirnan Statistics introduction 1 Normal distribution; high variance

    89. Psychology 242, Dr. McKirnan Statistics introduction 1 Normal distribution; low variance

    90. Psychology 242, Dr. McKirnan Statistics introduction 1 Evaluating scores using Z

    91. Psychology 242, Dr. McKirnan Statistics introduction 1 Summary: evaluating individual scores A. The distance of the score from the M. In both groups ‘6’ is two units > the M (X = 6, M = 4). B. The variance in the rest of the sample One group has low variance and one has higher. C. Criterion for “significantly good” score What % of the sample must the score be higher than…

    92. Psychology 242, Dr. McKirnan Statistics introduction 1 Z / “standard” scores Z scores (or standard deviation units) standardize scores by putting them on a common scale. In our example the target score and M scores are the same, but come from samples with different variances. We compare the target scores by translating them into Zs, which take into account variance. Any scores can be translated into Z scores for comparison…

    93. Psychology 242, Dr. McKirnan Statistics introduction 1 Cannot directly compare scores because they are on different scales. Change scale to common metric: Z scores (% of larger distribution of scores on that scale). Z scores can be compared, since they are standardized by being relative to the larger population of scores.

    94. Psychology 242, Dr. McKirnan Statistics introduction 1 Comparing Zs

    95. Psychology 242, Dr. McKirnan Research questions, hypotheses & designs

    96. Psychology 242, Dr. McKirnan Statistics introduction 1 Using statistics to test hypotheses:

    97. Psychology 242, Dr. McKirnan Statistics introduction 1 Probabilities & Statistical Hypothesis Testing Using the Normal Distribution: More extreme scores have a lower probability of occurring by chance alone Z (# of standard deviation units from M) = the % of cases above or below the observed score A high Z score may be “extreme” enough for us to reject the null hypothesis

    98. Psychology 242, Dr. McKirnan Statistics introduction 1 “Statistical significance” We assume a score with less than 5% probability of occurring has not occurred by chance alone (i.e., higher or lower than 95% of the other scores… p < .05) Z > +1.98 occurs < 95% of the time (p <.05). If Z > 1.98 we consider the score to be “significantly” different from the mean To test if an effect is “statistically significant” Compute a Z score for the effect Compare it to the critical value for p<.05; + 1.98

    99. Psychology 242, Dr. McKirnan Statistics introduction 1

    100. Psychology 242, Dr. McKirnan Statistics introduction 1 Evaluating Research Questions Data Statistical Question

    101. Psychology 242, Dr. McKirnan Statistics introduction 1 Critical ratio Critical ratio =

    102. Psychology 242, Dr. McKirnan Statistics introduction 1 Critical Ratios and distributions We use Mean(s) and Standard Deviation(s) to compute a critical ratio for a given research result.

    103. Psychology 242, Dr. McKirnan Statistics introduction 1 Examples of Critical Ratios

    104. Psychology 242, Dr. McKirnan Statistics introduction 1 Numbers are important for representing “reality” in science (and other fields). Different measures of central tendency are useful & accurate for different data; Mean is the most common. Median useful for skewed data Mode useful for simple categorical data Variance (around the mean) is key to characterizing a set of numbers. We understand a set of scores in terms of the: Central tendency – the average or Mean score The amount of variance in the scores, typically the Standard Deviation.

    105. Psychology 242, Dr. McKirnan Statistics introduction 1 Z is the prototype critical ratio:

More Related