
Assessing Personality 75 Years After Likert: Thurstone Was Right!



Presentation Transcript


  1. Assessing Personality 75 Years After Likert: Thurstone Was Right! (And some implications for I/O)

  2. Colleagues • Sasha Chernyshenko • Steve Stark

  3. Thurstone • In a series of papers in the late 1920s, Thurstone asserted “Attitudes Can Be Measured” and provided several methods for their measurement • He assumed that a conscientious person would endorse a statement that reflected his/her attitude…but • “as a result of imperfections, obscurities, or irrelevancies in the statement, and inaccuracy or carelessness of the subjects” not everyone will endorse a statement, even when it matches their attitude

  4. Thurstone, Psych Review, 1929 • For N1 people with attitude S1, all should endorse a statement with scale value S1 if they were conscientious and the item was perfect; but only n1 actually endorse the item • These people will endorse another statement with scale value S2 with a probability p that is a function of |S1-S2| • Figure from Thurstone’s paper:

  5. Thurstone 1929

  6. Thurstone 1928 Attitudes Can Be Measured • Gave an example of an attitude variable, militarism-pacifism, with six statements representing a range of attitudes:

  7. Thurstone 1928

  8. Thurstone 1928 • A pacifist “would be willing to indorse all or most of the opinions in the range d to e and … he would reject as too extremely pacifistic most of the opinions to the left of d, and would also reject the whole range of militaristic opinions.” • “His attitude would then be indicated by the average or mean of the range that he indorses”

  9. Implications • On Thurstone’s pacifism-militarism scale, three people might endorse two items each: • Person 1 endorses f and d, and is very pacifistic • Person 2 endorses e and b, and is neutral • Person 3 endorses c and a, and is very militaristic • Thus, it is crucial to know which items are endorsed, not just how many!
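A minimal sketch of Thurstone's scoring rule may make the point concrete: the attitude score is the mean scale value of the endorsed statements, so the same number of endorsements can yield very different scores. The scale values below are hypothetical (the slides do not give them); they only assume the ordering f, d, e, b, c, a from most pacifistic to most militaristic that the three-person example implies.

```python
# Hypothetical scale values, NOT taken from Thurstone's paper.
# Negative = pacifistic, positive = militaristic.
SCALE_VALUES = {"f": -2.5, "d": -1.5, "e": -0.5, "b": 0.5, "c": 1.5, "a": 2.5}


def thurstone_score(endorsed):
    """Return the mean scale value of the endorsed statements."""
    if not endorsed:
        raise ValueError("at least one statement must be endorsed")
    return sum(SCALE_VALUES[s] for s in endorsed) / len(endorsed)


def endorsement_count(endorsed):
    """Return how many statements were endorsed (ignores which ones)."""
    return len(endorsed)


people = {
    "Person 1": ["f", "d"],
    "Person 2": ["e", "b"],
    "Person 3": ["c", "a"],
}

for name, endorsed in people.items():
    print(name, endorsement_count(endorsed), thurstone_score(endorsed))
```

All three people endorse two statements, yet under these assumed scale values their mean scores fall at opposite ends and the middle of the continuum.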

  10. Likert 1932 • Proposed a much simpler approach: A five-point response scale with options “Strongly Approve”, “Approve”, “Neutral”, “Disapprove”, and “Strongly Disapprove”. • The numerical values 1 to 5 were assigned to the different response options • And an individual’s score was the sum or mean of the numerical scores

  11. Likert 1932 • Likert evaluated his scales by • Split-half reliability • Item-total correlations • To make this work, he hit upon the idea of reverse scoring, e.g., statements like d and f from Thurstone needed to be scored in the opposite direction of statements like a and c.
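Likert's summated scoring with reverse keying can be sketched as follows. The coding direction (5 = strongest approval) and the example responses are assumptions made for illustration; the item labels a, c, d, f simply reuse Thurstone's statement letters.

```python
def likert_score(responses, reverse_keyed, n_points=5):
    """Mean item score after reverse-scoring the items in `reverse_keyed`.

    responses: dict mapping item label -> integer response in 1..n_points.
    reverse_keyed: set of item labels scored in the opposite direction.
    """
    if not responses:
        raise ValueError("no responses given")
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= n_points:
            raise ValueError(f"response out of range for item {item!r}")
        # Reverse scoring maps 1 -> n_points, 2 -> n_points - 1, and so on.
        total += (n_points + 1 - value) if item in reverse_keyed else value
    return total / len(responses)


# A hypothetical militaristic respondent: approves a and c, disapproves d and f.
responses = {"a": 5, "c": 4, "d": 1, "f": 2}
print(likert_score(responses, reverse_keyed={"d", "f"}))
```

Without reverse scoring, the same respondent's approvals and disapprovals would partly cancel, which is why the summated score needs all items keyed in one direction.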

  12. Likert 1932 • When computing item-total correlations, “if a zero or very low correlation coefficient is obtained, it indicates that the statement fails to measure that which the rest of the statements measure.” (p. 48) • “Thus item analysis reveals the satisfactoriness of any statement so far as its inclusion in a given attitude scale is concerned”
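The item analysis Likert describes can be sketched with a small hand-made data set. Two choices here are assumptions rather than Likert's procedure: the data are invented, and the correlation is the "corrected" variant (each item against the sum of the remaining items), which keeps an item from correlating with itself.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_rest_correlation(data, item_index):
    """Correlate one item with the sum of all other items.

    data: list of rows, one row of item scores per respondent.
    """
    item = [row[item_index] for row in data]
    rest = [sum(row) - row[item_index] for row in data]
    return pearson(item, rest)


# Invented data: items 0 and 1 rise together across respondents;
# item 2 is endorsed most by the middle respondent and least at both extremes.
data = [
    [1, 1, 1],
    [2, 2, 3],
    [3, 3, 5],
    [4, 4, 3],
    [5, 5, 1],
]

for i in range(3):
    print(i, round(item_rest_correlation(data, i), 3))
```

In this toy data the third item would be discarded under Likert's criterion even though responses to it are perfectly systematic.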

  13. Likert 1932 • Likert discarded intermediate statements like “Compulsory military training in all countries should be reduced but not eliminated” • Such a statement is “double-barreled and of little value because it does not differentiate persons in terms of their attitudes” (p. 34)

  14. Likert Scaling • Although Likert didn’t articulate a psychometric model for his procedure, his analysis implies what Coombs (1964) called a dominance response process. • Specifically, someone high on the trait or attitude measured by a scale is likely to “Strongly Agree” with a positively worded item and “Strongly Disagree” with a negatively worded item

  15. Example of a Dominance Process • Person endorses item if her standing on the latent trait, theta, is more extreme than that of the item. • (Figure labels: Person, Item)

  16. Thurstone Scaling • Thurstone assumed people endorse items reflecting attitudes close to their own feelings • Coombs (1964) called this an ideal point process • Sometimes called an unfolding model

  17. Example of an Ideal Point Process • Person endorses item if his standing on the latent trait is near that of the item. • “I enjoy chatting quietly with a friend at a cafe.” • Disagree either because: • Too introverted (uncomfortable in public places) • Too extraverted (chatting over coffee is boring) • (Figure labels: Item, Too Introverted, Too Extraverted)

  18. Important Point: • The item-total correlation of intermediate ideal point items will be close to zero!
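The point can be illustrated numerically under some simplifying assumptions. The sketch below uses a logistic curve for the dominance item and a Gaussian-shaped curve as a stand-in for an ideal point item (it is not the GGUM), and it uses the correlation between theta and the endorsement probability over an evenly spaced, symmetric grid as a proxy for the item-total correlation. All parameter values are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dominance_irf(theta, a=1.5, b=0.0):
    """Logistic curve: endorsement probability rises steadily with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ideal_point_irf(theta, delta=0.0, width=1.0):
    """Single-peaked curve: endorsement is most likely when theta is near delta."""
    return math.exp(-((theta - delta) ** 2) / (2.0 * width ** 2))


# Symmetric grid of trait values from -3.0 to 3.0 in steps of 0.1.
THETAS = [(i - 30) / 10 for i in range(61)]


def trait_correlation(irf):
    """Correlation between theta and the endorsement probability on the grid."""
    return pearson(THETAS, [irf(t) for t in THETAS])


r_dominance = trait_correlation(dominance_irf)
r_moderate = trait_correlation(ideal_point_irf)
r_extreme = trait_correlation(lambda t: ideal_point_irf(t, delta=2.5))

print(round(r_dominance, 3), round(r_moderate, 3), round(r_extreme, 3))
```

Under these assumptions the moderate ideal point item is uncorrelated with the trait because its curve rises and falls symmetrically, while an ideal point item located at the extreme behaves much like a dominance item.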

  19. Which Process is Appropriate for Temperament Assessment? • In a series of studies, we’ve • Examined appropriateness of dominance process by fitting models of increasing complexity to data from two personality inventories • Compared fits of dominance and ideal point models of similar complexity to 16PF data • Compared fits of dominance and ideal point models to sets of items not preselected to fit dominance models

  20. Fitting Traditional Dominance Models to Personality Data • Data • 16PF 5th Edition • 13,059 examinees completed 16 noncognitive scales • Goldberg’s Big Five factor markers • 1,594 examinees completed 5 noncognitive scales • Models examined • Parametric – 2PLM, 3PLM • Nonparametric – Levine’s Maximum Likelihood Formula Scoring (MFSM)

  21. Three-Parameter Logistic Model

  22. Three-Parameter Logistic Model

  23. Three-Parameter Logistic Model

  24. Three-Parameter Logistic Model

  25. Three-Parameter Logistic Model

  26. Three-Parameter Logistic Model

  27. Two-Parameter Logistic Model
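The transcript does not reproduce the formulas from these slides. As a hedged sketch, the standard textbook forms of the two models are shown below: the 3PLM adds a lower asymptote c to the 2PLM, and the optional scaling constant D (often set to 1.7) is a convention that the slides may or may not have used.

```python
import math


def irf_3pl(theta, a, b, c=0.0, D=1.0):
    """Three-parameter logistic item response function.

    a: discrimination, b: location (difficulty/extremity),
    c: lower asymptote, D: optional scaling constant (assumed 1.0 here).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def irf_2pl(theta, a, b, D=1.0):
    """Two-parameter logistic model: the 3PLM with the lower asymptote at 0."""
    return irf_3pl(theta, a, b, c=0.0, D=D)


for theta in (-2.0, 0.0, 2.0):
    print(theta, round(irf_2pl(theta, a=1.2, b=0.0), 3),
          round(irf_3pl(theta, a=1.2, b=0.0, c=0.2), 3))
```

Both functions increase monotonically in theta, which is what makes them dominance models in Coombs's sense.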

  28. Methods for Assessing Fit: Fit Plots

  29. Methods for Assessing Fit: Chi-Squares • Chi-squares typically computed for single items • Very important to examine item pairs and triplets • May indicate violations of local independence or misspecified model

  30. Methods for Assessing Fit: Chi-Squares • To aid interpretation of chi-squares: • Adjust to sample size of 3,000 • Compare groups of different size • The expected value of a non-central chi-square is equal to its df plus N times the noncentrality parameter d, where N is the sample size. So an estimate of the noncentrality parameter is $\hat{d} = (\chi^2 - df)/N$

  31. Adjusted Chi-square • To adjust to a sample size of, say, 250, use $\chi^2_{adj} = df + 250\,(\chi^2 - df)/N$ • For IRT, we usually adjust to N = 3000, and divide by the df to get an adjusted chi-square/df ratio • Less than 2 is great, less than 3 is OK
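The adjustment follows directly from the expected-value relation on the previous slide (expected chi-square = df + N times d). A minimal sketch, with hypothetical function names and an invented example statistic:

```python
def noncentrality_estimate(chi_sq, df, n):
    """Estimate d from E[chi-square] = df + n * d."""
    return (chi_sq - df) / n


def adjusted_chi_sq(chi_sq, df, n, n_target=3000):
    """Chi-square expected at sample size n_target, given the estimated d."""
    return df + n_target * noncentrality_estimate(chi_sq, df, n)


def adjusted_ratio(chi_sq, df, n, n_target=3000):
    """Adjusted chi-square divided by its degrees of freedom."""
    return adjusted_chi_sq(chi_sq, df, n, n_target) / df


# Invented example: chi-square of 130 on 10 df from 12,000 examinees.
print(adjusted_chi_sq(130.0, 10, 12000))   # rescaled to N = 3000
print(adjusted_ratio(130.0, 10, 12000))    # compare against the 2 and 3 cutoffs
```

The estimate is deliberately not truncated at zero here, so an observed chi-square below its df yields an adjusted ratio below 1.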

  32. Adjusted Chi-square/df for an Ability Test • (Figure annotation: adjusted chi-square/df < 3)

  33. Results for 16 PF Sensitivity Scale: Mean Chi-sq/df Ratios

  34. What if Items Assessed Trait Values Along the Whole Continuum? • Items on existing personality scales have been pre-screened on item-total correlation • We speculate that items measuring intermediate trait values are systematically deleted • So, what happens if a scale includes some intermediate items?

  35. TAPAS Well-being Scale • Tailored Adaptive Personality Assessment System • Assesses up to 22 facets of the Big Five • Well-being is a facet of emotional stability • We wrote items reflecting low, moderate, and high well-being

  36. For example, TAPAS Well-Being Scale • WELL04, “I don’t have as many happy moments in my life as others have.” • WELL17, “My life has had about an equal share of ups and downs.” • WELL41, “Most days I feel extremely good about myself.” • In total, 20 items: 5 negative, 9 positive, and 6 neutral

  37. Traditional Analysis Results

  38. Fit Plot for 2PL WELL17

  39. An Ideal Point Model: The Generalized Graded Unfolding Model (GGUM) • Roberts, Donoghue, & Laughlin (2000). Applied Psychological Measurement. • The model assumes that the probability of endorsement is higher the closer the item to the person • GGUM software provides maximum likelihood estimates of item parameters

  40. GGUM • The probability of disagree is: and the probability of agree is
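The equations on this slide did not survive in the transcript. The sketch below implements what is, to the best of my understanding, the dichotomous (agree/disagree) case of the GGUM from Roberts, Donoghue, & Laughlin (2000), with discrimination alpha, item location delta, and a single threshold tau. Treat the parameterization and the example values as assumptions to check against the published paper.

```python
import math


def ggum_binary(theta, alpha, delta, tau):
    """Return (P(disagree), P(agree)) for a dichotomous GGUM item.

    theta: person location, alpha: discrimination,
    delta: item location, tau: threshold (typically negative).
    """
    d = theta - delta
    # Each observable response pools two subjective responses:
    # disagreeing "from below" and "from above", and likewise for agreeing.
    disagree = 1.0 + math.exp(3.0 * alpha * d)
    agree = math.exp(alpha * (d - tau)) + math.exp(alpha * (2.0 * d - tau))
    total = disagree + agree
    return disagree / total, agree / total


# Illustrative parameters for a moderate item located at delta = 0.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    p_disagree, p_agree = ggum_binary(theta, alpha=1.0, delta=0.0, tau=-1.0)
    print(theta, round(p_agree, 3))
```

With these values the agree probability peaks where theta equals delta and falls off symmetrically on both sides, which is the single-peaked shape an ideal point process requires.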

  41. GGUM Estimated IRF for Moderate Item IRF for Agree response to TAPAS Well-being item “My life has had about an equal share of ups and downs.”

  42. TAPAS Well-being Scale 2PL Results: GGUM Results:

  43. Summary of Findings • 2PLM and 3PLM fit scales developed by traditional methods OK, but not when moderate items are included • Chi-squares for item doublets and triplets can be large, especially when moderate items are included • Discrimination parameter estimates are uniformly small for moderate items (and item-total correlations are near zero) • GGUM fits all items, including moderate items • Adj. chi-square to df ratios are small for doublets and triplets • GGUM discrimination parameter estimates are large for the moderate items!

  44. So, for Well-Being • Fitting a dominance item response theory model (the 2-parameter logistic) produced an adjusted Chi-Square to df ratio of 2.955 for pairs • The ideal point model yielded an adjusted Chi-square/df ratio of 0.997 for pairs

  45. Conclusion • Ideal point model seems more appropriate for temperament assessment • BUT there’s a “Fly in the ointment” for I/O • Correct specification of response process does not guarantee more accurate assessment, because … • Traditional items are easily FAKED

  46. Examples of “Traditional” Items that are Easily Faked • In each case, the positively keyed response is obvious. • I get along well with others. (A+) • I try to be the best at everything I do. (C+) • I insult people. (A-) • My peers call me “absent minded.” (C-) • Because these items consist of individual statements, they are commonly referred to as “single stimulus” items.

  47. Army Assessment of Individual Motivation (AIM) • Uses tetrads: • I get along well with others. (A+) • I set very high standards for myself. (C+) • I worry a lot. (ES-) • I like to sit on the couch and eat potato chips. (Physical condition-) • Respondent picks the statement that is Most Like Me and the statement that is Least Like Me • Army AIM has shown less score inflation • What psychometric model would describe this type of data????

  48. So… • US Army researchers Len White and Mark Young (and others) found some fake resistance and criterion-related validity for the tetrad format • But modeling four-dimensional items was too hard for me! • How about two-dimensional items?

  49. Multidimensional Pairwise Preference (MDPP) Format • Create items by pairing stimuli that are similar in desirability, but representing different dimensions • “Which is more like you?” • I get along well with others. (A+) • I always get my work done on time. (C+) • This led to my work on personality assessment over the past 10 years • And the result is:

  50. Tailored Adaptive Personality Assessment System (TAPAS) • TAPAS is designed to overcome existing limitations of personality assessment for selection by incorporating recent advancements in: • Temperament/personality assessment • Item response theory (IRT) • Computerized adaptive testing (CAT) • Our goal is for TAPAS to be innovative in both how we assess (IRT, CAT) and what we assess (facets of personality)
