Individual Differences in the Ability to Judge Others Accurately
David A. Kenny
University of Connecticut
http://davidakenny.net/kenny.htm
Overview
• Review of previous literature
• Reliability
  • Internal consistency
  • Cross-target correlations
  • Parallel forms
• New model: SCARIB
Accuracy About What?
• The target's personality: Is Dave friendly?
• The target's opinions or attitudes: How does Dave feel about Lucy?
• What the target is currently thinking or feeling: What is Dave thinking about now?
• The target's mood: Is Dave excited or bored?
What Is Accuracy?
Correspondence between a judgement and a criterion measure.
A Renewed Interest in Individual Differences
• Interest in emotional intelligence (EQ)
• Models that provide a framework for understanding judge moderators
• Neurological deficits creating lower JA
Types of Measures
• Standardized scales (fixed targets)
  • PONS
  • IPT
  • CARAT
  • Sternberg measures
• Agreement across targets
  • Empathic accuracy (EA)
  • Slide viewing
Standardized Scales
• Develop a pool of items.
• Pick the "good" items.
• Establish reliability as measured by internal consistency (a computational sketch follows).
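For concreteness, here is a minimal sketch of how internal consistency is usually quantified: Cronbach's alpha and the average inter-item correlation (IIC). The function name and the judges × items data matrix are illustrative assumptions, not part of the original talk.

```python
import numpy as np

def internal_consistency(scores):
    """scores: judges x items matrix (e.g., 0/1 correct on each item).
    Returns (Cronbach's alpha, average inter-item correlation)."""
    k = scores.shape[1]
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    alpha = k / (k - 1) * (1 - scores.var(axis=0, ddof=1).sum()
                           / scores.sum(axis=1).var(ddof=1))
    rr = np.corrcoef(scores, rowvar=False)    # k x k item intercorrelations
    iic = rr[np.triu_indices(k, 1)].mean()    # mean off-diagonal correlation
    return alpha, iic
```

By Spearman-Brown logic, a 40-item test with an average IIC of .03 still reaches an alpha of about .55 (40 × .03 / (1 + 39 × .03)), which is why long tests can look respectable despite weak items.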
Maybe an IIC of .03 Is Not All that Bad?
Average inter-item correlations of other well-known scales:
• Peabody Picture Vocabulary Test: .08
• Beck Depression Inventory: .30
• Bem M/F Scale: .19
• Rosenberg Self-Esteem Scale: .34
I guess it is bad.
Agreement Across Targets
• Same procedure, but different targets (e.g., slide viewing).
• Treat each target as an "item" to assess reliability.
Statistical Analysis of Multiple Target Data
• Social Relations Model
• Two-way data structure: Judge × Target
• Three sources of variance:
  • Judge
  • Target
  • Error and Relationship
• Judge/(Judge + Error) is like an IIC (a computational sketch follows).
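The variance partitioning can be sketched as a two-way random-effects ANOVA on the judge × target accuracy matrix. With one score per judge-target pair, relationship and error are confounded, as the slide notes. This is an illustrative decomposition, not Kenny's SRM software:

```python
import numpy as np

def srm_partition(y):
    """y: judges x targets accuracy matrix, one score per (judge, target) pair.
    Returns variance estimates for judge, target, and error/relationship."""
    n, t = y.shape
    ms_judge = t * y.mean(axis=1).var(ddof=1)
    ms_target = n * y.mean(axis=0).var(ddof=1)
    resid = (y - y.mean(axis=1, keepdims=True)
               - y.mean(axis=0, keepdims=True) + y.mean())
    ms_error = (resid ** 2).sum() / ((n - 1) * (t - 1))
    var_judge = max((ms_judge - ms_error) / t, 0.0)
    var_target = max((ms_target - ms_error) / n, 0.0)
    # Judge/(Judge + Error) plays the role of an inter-item correlation
    iic_like = var_judge / (var_judge + ms_error)
    return var_judge, var_target, ms_error, iic_like
```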
Social Relations Model Variance Partitioning: Emotion Recognition
Social Relations Model Variance Partitioning: Empathic Accuracy
Questions About EA Results
• Ickes et al.:
  • Many of the studies show very small amounts of judge variance.
  • Two of the three studies showing the greatest judge variance have only 3 targets, 2 of which are very similar.
• Thomas & Fletcher:
  • Ad hoc analysis
  • Possible nonindependence
• Perhaps individual differences emerge with emotionally charged stimuli?
What Do We Learn?
• Small judge variance: ≈ .10
• Large target variance: ≈ .30
• Large error/relationship variance: ≈ .60
Convergent Validity?
Do different tests of judgemental ability correlate?
Summary of Convergent Validity
• Average correlation of about .10.
• Perhaps there are many skills?
• The different skills do not correlate highly.
Validity of JA?
Recent meta-analysis by Hall, Andrzejewski, and Yopchick (2008):
• Gender differences (Hall: r ≈ .20)
• Positive personality (r ≈ .08)
• Negative personality (r ≈ −.07)
• Social competence
  • Self-rated (r ≈ .10)
  • Other-rated (r ≈ .07)
Are There Individual Differences?
Maybe not:
• Low internal consistency
  • Standardized scales
  • Cross-target studies (mostly)
• Poor convergent validity
Maybe Yes?
• Intuition
• The validity data hint at some real signal.
• Is JA the only skill or competence without any individual differences?
• That is, if people score above chance, would we not expect individual differences?
An Item Response Theory Model
• Presume each question refers to a different item.
• Parameters:
  • r: ability (a normally distributed variable) minus item difficulty
  • g: guessing (assuming two alternatives)
Model
• Probability that the judge is correct:
  e^r/(1 + e^r), where e ≈ 2.718
• Allowing for guessing:
  e^r/(1 + e^r) + g[1 − e^r/(1 + e^r)]
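A one-line implementation of this model, as a sketch (the function name is mine; g = .5 corresponds to the two-alternative case):

```python
import numpy as np

def p_correct(r, g=0.5):
    """Probability of a correct answer: logistic accuracy plus
    random guessing on the trials the judge does not 'know'.
    r: ability minus item difficulty; g: chance rate."""
    p_know = np.exp(r) / (1 + np.exp(r))   # e^r / (1 + e^r)
    return p_know + g * (1 - p_know)

# An average judge on an average item (r = 0): 0.5 + 0.5 * 0.5 = 0.75
print(p_correct(0.0))
```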
Average Item Difficulty
• Probability that judges are correct, averaged across all items, allowing for guessing.
• What is the ideal average item difficulty? 75%?
• Results from a simulation that varies average item difficulty…
Interpretation
• The reliability curves peak when average accuracy is in the high 80s, as predicted by IRT (high .80s).
• Better to design "easy" tests. Why?
• The performance of low-ability judges is almost entirely due to chance; to discriminate among low-ability judges, you need an easy test. (A simulation sketch follows.)
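The simulation can be approximated in a few lines of Python. All parameter values here (numbers of judges and items, difficulty spread) are illustrative assumptions; only the response model comes from the slides above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_judges, n_items = 200, 24
ability = rng.normal(0, 1, n_judges)

def simulate_test(mean_difficulty, g=0.5):
    """Return (mean proportion correct, Cronbach's alpha) for one test."""
    difficulty = rng.normal(mean_difficulty, 1, n_items)
    r = ability[:, None] - difficulty[None, :]      # ability minus difficulty
    p_know = np.exp(r) / (1 + np.exp(r))
    p = p_know + g * (1 - p_know)                   # allow for guessing
    correct = (rng.random((n_judges, n_items)) < p).astype(float)
    k = n_items
    alpha = k / (k - 1) * (1 - correct.var(axis=0, ddof=1).sum()
                           / correct.sum(axis=1).var(ddof=1))
    return correct.mean(), alpha

# Lower mean difficulty -> easier test -> higher mean accuracy;
# alpha should peak at high (not 75%) mean accuracy.
for md in (1, 0, -1, -2, -3):
    m, a = simulate_test(md)
    print(f"mean correct = {m:.2f}  alpha = {a:.2f}")
```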
Limits of the Standard IRT Model
• Guessing is assumed to be random.
• Judges cannot score below chance.
• The model is unidimensional.
SCARIB Model
• Skewed
• Channels
• Attunement
• Reversal
• Information
• Biased Guessing
Channels
• Different sources of information:
  • Face
  • Body
  • Voice
• Different variables:
  • Negative emotion
  • Positive emotion
Attunement
• Judgement is quite difficult: many channels of information must be monitored.
• A given judge generally allocates his or her attention in the same way.
• Metaphor of a radio: a judge is "tuned into" some channels more than others.
• Different judges are more attuned to different channels.
Skewed
• Total attunement represents the total resources that a judge can allocate to the task.
• The distribution of total resources is negatively skewed:
  • Most judges have many resources.
  • A few judges have very few resources.
• Total resources represent the "true score."
Information
• For each channel of each item, there is information available.
• For a given test, there may be more information in some channels than in others.
Reversal
• Very often the information is counter-diagnostic.
• For example, someone who is smiling may be unhappy.
Biased Guessing
• Assume two response alternatives (e.g., happy and sad).
• Some judges are biased in favor of one alternative and some in favor of the other.
Formal Model for Judge i, Item j, and Channel k
• Resources: s_i, negatively skewed, ranging from 0 to 10.
• Attunement: r_ik = (1 − a)s_i/c + a·d_ik·s_i, where d_ik is the allocation for judge i to channel k (Σ_k d_ik = 1) and c is the number of channels.
• Information: x_jk = |z_jk|·σ_S·σ_I·σ_C·σ_IC.
• Reversal: some information is given a negative sign, x_jk → −x_jk.
• Biased guessing: g_ij = w·h_ij + (1 − w)/a, where w is the amount of biased guessing and h_ij is its direction (either 1 or 0).
IRT Equations for the Probability of Being Correct
• Diagnostic information:
  v_ij = Σ_k(r_ik·x_jk) − 1.5(c + 1)
  P(correct) = e^v/(1 + e^v) + g[1 − e^v/(1 + e^v)]
• Counter-diagnostic information:
  v_ij = −Σ_k(r_ik·x_jk) − 1.5(c + 1)
  P(correct) = g[1 − e^v/(1 + e^v)]
Simulation
• 24 items
• 7 channels
• Attunement
• Reversal
• Item biases
• Biased guessing
A sketch of such a simulation follows.
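The following sketch wires the formal model and the two probability equations together. The distributional choices (the beta draw for skewed resources, the Dirichlet draw for attunement allocations, the reversal rate, and every numeric constant except the 24 items, 7 channels, and the −1.5(c + 1) offset) are my illustrative assumptions, not the talk's actual generating model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_judges, n_items, c = 500, 24, 7     # 24 items, 7 channels
a, w, chance = 0.5, 0.3, 0.5          # attunement weight, biased guessing, 2 alternatives

# Skewed: resources s_i on [0, 10], negatively skewed
s = 10 * rng.beta(5, 2, n_judges)

# Attunement: r_ik = (1 - a) s_i / c + a d_ik s_i, rows of d sum to 1
d = rng.dirichlet(np.ones(c), n_judges)
r = (1 - a) * s[:, None] / c + a * d * s[:, None]

# Information: x_jk = |z_jk| (variance components collapsed into one draw here)
x = np.abs(rng.normal(0, 2.5, (n_items, c)))

# Reversal: some item-channel information is counter-diagnostic
x[rng.random((n_items, c)) < 0.15] *= -1

# Biased guessing: g_ij = w h_ij + (1 - w) * chance, with h_ij in {0, 1}
h = rng.integers(0, 2, (n_judges, n_items))
g = w * h + (1 - w) * chance

# Evidence e_ij = sum_k r_ik x_jk, then the two cases from the equations above
e = r @ x.T
v = np.abs(e) - 1.5 * (c + 1)
L = 1 / (1 + np.exp(-v))                        # e^v / (1 + e^v)
p = np.where(e >= 0, L + g * (1 - L), g * (1 - L))

correct = (rng.random((n_judges, n_items)) < p).astype(float)
score = correct.mean(axis=1)
print("mean accuracy:", round(float(score.mean()), 3))
print("validity (corr of resources with score):",
      round(float(np.corrcoef(s, score)[0, 1]), 3))
```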
Results
• SCARIB appears able to reproduce the basic results from JA studies.
• In agreement with IRT and prior studies, the mean and alpha are positively correlated (r = .817).
Why Low Internal Consistency?
• Multiple channels
• Information that varies by item or by item × channel
• Biased guessing
• However, attunement in conjunction with information varying by channel increases internal consistency.
Validity and Cross-Target Correlation
• Validity is lowered by attunement in conjunction with information varying by channel.
• Validity is slightly increased by biased guessing.
• The cross-target correlation mirrors validity (r = .929) much better than internal consistency does (r = .770). (A sketch of the statistic follows.)
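One simple way the cross-target correlation might be computed is as a parallel-forms correlation across random halves of the targets. The split-half operationalization is my illustrative choice, not necessarily the talk's:

```python
import numpy as np

def cross_target_correlation(y, seed=2):
    """y: judges x targets accuracy matrix. Correlate each judge's mean
    accuracy across two random halves of the targets."""
    rng = np.random.default_rng(seed)
    t = y.shape[1]
    perm = rng.permutation(t)
    half_a, half_b = perm[: t // 2], perm[t // 2 :]
    return np.corrcoef(y[:, half_a].mean(axis=1),
                       y[:, half_b].mean(axis=1))[0, 1]
```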
Why Target Variance?
• More information for some targets.
• "Better" information (i.e., fewer reversals) for some targets.
• Stereotype accuracy: some targets conform more to item biases.
• Target differences are largely due to information differences, not to "readability."
Why Below-Chance Responding?
• Reversal
• Item biases
• When below-chance responding is due to reversal, reliability and validity can be improved by reverse-scoring some items: being wrong for the right reason. Reversing items is counterproductive when below-chance responding is due to item biases. (A sketch follows.)
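One way such reverse-scoring might be operationalized is to flip items whose corrected item-total correlation is negative. The criterion is my illustrative choice:

```python
import numpy as np

def reverse_negative_items(correct):
    """correct: judges x items matrix of 0/1 accuracy scores.
    Reverse-score items that correlate negatively with the rest of the test."""
    correct = correct.copy()
    for j in range(correct.shape[1]):
        rest = np.delete(correct, j, axis=1).sum(axis=1)
        if np.corrcoef(correct[:, j], rest)[0, 1] < 0:
            correct[:, j] = 1 - correct[:, j]   # wrong for the right reason
    return correct
```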
One Major Limitation
• SCARIB ignores policy differences: a judge could be attuned to diagnostic information but use it the wrong way.
• Note, though, that even without allowing for policy differences, SCARIB does a good job of reproducing JA results.
Implications
• JA tests should be "easy."
• Establish individual differences for deception.
• The cross-target correlation is a better way of validating a test than internal consistency.
• It may, at times, be beneficial to use "consensual" criteria.
Final Point
Experiments and statistical analyses are needed to better estimate the SCARIB parameters.
Relationship to Funder's RAM Model
• Relevance: Is the information correlated with the correct answer (few reversals)?
• Availability: Does that information vary (|z_jk| and σ_S, σ_C, σ_I, σ_IC)?
• Detection: Is the judge attuned to that information (r_ik)?
• Utilization: Does the judge know how to weight the information (o_ijk)?