Validating Assessment Centers Kevin R. Murphy Department of Psychology Pennsylvania State University, USA
A Prototypic AC • Groups of candidates participate in multiple exercises • Each exercise designed to measure some set of behavioral dimensions or competencies • Performance/behavior in exercises is evaluated by sets of assessors • Information from multiple assessors is integrated to yield a range of scores
Common But Deficient Validation Strategies • Criterion-related validity studies • Correlate OAR with criterion measures • e.g., OAR correlates .40 with performance measures, but written ability tests do considerably better (.50’s) • There may be practical constraints to using tests, but psychometric purists are not concerned with the practical
Common But Deficient Validation Strategies • Construct Validity Studies • Convergent and Discriminant Validity assessments • AC scores often show relatively strong exercise effects and relatively weak dimension/competency effects • This is probably not the right model for assessing construct validity, but it is the one that has dominated much of the literature
Common But Deficient Validation Strategies • Content validation • Map competencies/behavioral descriptions onto the job • If the competencies measured by the AC show reasonable similarity to job competencies, content validity is established • The track record for ACs is nearly perfect because job information is used to select the competencies, but evidence that those competencies are actually measured is often scant
Ask the Wrong Question, Get the Wrong Answer • Too many studies ask “Are Assessment Centers Valid?” • The Question should be “Valid for What?” • That is, validity is not determined by the measurement procedure or even by the data that arises from that procedure. Validity is determined by what you attempt to do with the data
Sources of Validity Information • Validity for what? • Determine the ways you will use the data coming out of an AC. ACs are not valid or invalid in general; they are valid for specific purposes • Cast a wide net! • Virtually everything you do that gives you insight into what the data coming out of an AC mean can be thought of as part of the validation process
Sources of Validity Information • Raters • Rater training, expertise, agreement • Exercises • What behaviors are elicited, what situational factors affect behaviors • Dimensions • Is there evidence to map from AC behavior to dimensions to job
Sources of Validity Information • Scores • Wide range of assessments of the relationships among the different scores obtained in the AC process provide validity information • Processes • Evidence that the processes used in an AC tend to produce reliable and relevant data is part of the assessment of validity
Let’s Validate An Assessment Center! • Design the AC • Identify the data that come out of an AC • Determine how you want to use that data • Collect and evaluate information relevant to those uses • Data from pilot tests • Analysis of AC outcome data • Evaluations of AC components and process • Lit reviews, theory and experience
Design • Job - Entry-level Human Resource Manager • Competencies • Active Listening • Speaking • Management of Personnel Resources • Social Perceptiveness • Being aware of others' reactions and understanding why they react as they do. • Coordination • Adjusting actions in relation to others' actions. • Critical Thinking • Reading Comprehension • Judgment and Decision Making • Negotiation • Complex Problem Solving
Design • Populate the matrix – which competencies and which exercises?
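As a rough illustration (not part of the original design), the competency-by-exercise matrix can be kept as a simple table and checked programmatically. The exercise names and assignments below are hypothetical placeholders; a minimal sketch:

```python
import pandas as pd

# Hypothetical competency-by-exercise design matrix: True means the exercise
# is intended to elicit behavior relevant to that competency.
exercises = ["In-Basket", "Role Play", "Group Discussion", "Case Analysis"]
competencies = [
    "Active Listening", "Speaking", "Management of Personnel Resources",
    "Social Perceptiveness", "Coordination", "Critical Thinking",
    "Reading Comprehension", "Judgment and Decision Making",
    "Negotiation", "Complex Problem Solving",
]

design = pd.DataFrame(False, index=competencies, columns=exercises)

# Illustrative assignments only; the real mapping comes from the job analysis.
design.loc["Active Listening", ["Role Play", "Group Discussion"]] = True
design.loc["Negotiation", ["Role Play", "Group Discussion"]] = True
design.loc["Critical Thinking", ["In-Basket", "Case Analysis"]] = True
design.loc["Judgment and Decision Making", ["In-Basket", "Case Analysis"]] = True

# Flag competencies measured by fewer than two exercises.
coverage = design.sum(axis=1)
print(coverage[coverage < 2])
```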
Assessors • How many, what type, which exercises?
Assessment Data • Individual behavior ratings? • How will we set these up so that we can assess their accuracy or consistency? • Individual competency ratings? • How will we set these up so that we can assess their accuracy or consistency? • Pooled ratings • What level of analysis? • OAR
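To make these levels of analysis concrete, here is a minimal data-structure sketch. The field names and the unit-weighted pooling rule are illustrative assumptions, not a prescribed scoring model:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class CompetencyRating:
    candidate: str
    assessor: str
    exercise: str
    competency: str
    rating: float  # e.g., a 1-5 behaviorally anchored scale (assumed)

def pooled_competency_score(ratings, candidate, competency):
    """Average one candidate's ratings on one competency across assessors and exercises."""
    rel = [r.rating for r in ratings
           if r.candidate == candidate and r.competency == competency]
    return mean(rel) if rel else None

def overall_assessment_rating(ratings, candidate, competencies):
    """Unit-weighted OAR: mean of the pooled competency scores (one of many possible rules)."""
    scores = [pooled_competency_score(ratings, candidate, c) for c in competencies]
    return mean(s for s in scores if s is not None)
```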
Uses of Assessment Data • Competency • Is it important to differentiate strengths and weaknesses? • Exercise • Is the AC working as expected? (exercise effects might or might not be confounds) • OAR • Do you care how people did overall? • Other • Process tracing for integration. Is it important how ratings change in this process?
Validation • The key question in all validation efforts is whether the inferences you want to draw from the data can be supported or justified • A question that often underlies this assessment involves determining whether the data are sufficiently credible to support any particular use
Approaches to Validation • Assessment of the Design • Competency mapping • Do exercises engage the right competencies • Are competency demonstrations in AC likely to generalize • Are these the right competencies? • Can assessors discriminate competencies? • Are the assessors any good? • Do we know how good they are
Approaches to Validation • Assessment of the Data • Inter-rater agreement • Distributional assessments • Reliability and Generalizability analysis • Internal structure • External correlates
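For the inter-rater agreement item, a minimal sketch using the pingouin package (an assumption; any ICC routine would serve) with hypothetical data and column names:

```python
import pandas as pd
import pingouin as pg  # assumes the pingouin package is available

# Long-format ratings: one row per candidate x assessor; values are illustrative.
ratings = pd.DataFrame({
    "candidate": ["C1", "C1", "C2", "C2", "C3", "C3"],
    "assessor":  ["A1", "A2", "A1", "A2", "A1", "A2"],
    "score":     [3.5, 3.0, 4.5, 4.0, 2.0, 2.5],
})

# Intraclass correlations: how consistently do assessors order the candidates?
icc = pg.intraclass_corr(data=ratings, targets="candidate",
                         raters="assessor", ratings="score")
print(icc)
```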
Approaches to Validation • Assessment of the Process • Did assessors have opportunities to observe relevant behaviors? • What is the quality of the behavioral information that was collected? • How were behaviors translated into evaluations • How were observations and evaluations integrated
Approaches to Validation • Assessment of the Track Record • Relevant theory and literature • Relevant experience with similar ACs • Outcomes with dissimilar ACs
Assessment of the Design: Competencies • Competency Mapping (content validation) • Do exercises elicit behaviors that illustrate the competency • Are we measuring the right competencies? • Evidence that exercises reliably elicit the competencies • Generalizability from AC to world of work
Assessment of the Design: Assessor Characteristics • Training and expertise • What do we know about their performance as assessors • One piece of evidence for validity might be information that will allow us to evaluate the performance or the likely performance of our assessors
Assessment of the Data • Distributional assessments • Does the distribution of scores make sense • Is the calibration of assessors reasonable given the population being assessed • Is there adequate variability in scores?
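A minimal sketch of these distributional checks, assuming long-format ratings in a file; the file and column names are purely illustrative:

```python
import pandas as pd

# Hypothetical long-format AC scores: candidate, assessor, competency, score
ratings = pd.read_csv("ac_ratings.csv")

# Overall distribution: is there usable variability, or are scores compressed?
print(ratings["score"].describe())

# Calibration: do some assessors run systematically harsher or more lenient?
print(ratings.groupby("assessor")["score"].agg(["mean", "std", "count"]))

# Competency-level spread: can strengths be distinguished from weaknesses?
print(ratings.groupby("competency")["score"].agg(["mean", "std"]))
```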
Assessment of the Data • Reliability and Generalizability analyses • Distinction between reliability and validity is not as fundamental as most people believe • Assessments of reliability are an important part of validation • The natural structure of AC data fits nicely with generalizability theory
Assessment of the Data • Generalizability • AC data can be classified according to a number of factors – rater, ratee, competency, exercise • ANOVA is the starting point for generalizability analysis – i.e., identifying the major sources of variability • Complexity of the ANOVA design depends largely on whether the same assessors evaluate all competencies and exercises or only a subset of them
Assessment of the Data • Generalizability – an example • Use ANOVA to examine the variability of scores as a function of • Candidates • Dimensions (Competencies) • Exercises (potential source of irrelevant variance) • Assessors
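A sketch of that starting-point ANOVA using statsmodels; the file and column names are hypothetical, and a full generalizability study would go on to estimate variance components from these mean squares:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format AC ratings: one row per
# candidate x competency x exercise x assessor observation.
df = pd.read_csv("ac_ratings.csv")

# Main-effects ANOVA as the starting point for a generalizability analysis:
# how much variance is attributable to candidates (wanted), and how much to
# exercises and assessors (potential sources of irrelevant variance)?
model = ols("score ~ C(candidate) + C(competency) + C(exercise) + C(assessor)",
            data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```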
Assessment of the Data • Internal Structure • Early in the design phase, articulate your expectations regarding the relationships among competencies and dimensions • This articulation becomes the foundation for subsequent assessments • It is impossible to tell if the correlations among ratings of competencies are too high or too low unless you have some idea of the target you are shooting for
Assessment of the Data • Internal Structure • Confirmatory factor analysis is much better than exploratory for making sense of the internal structure • Exercise effects are not necessarily a bad thing. No matter how good assessors are, they cannot ignore overall performance levels • Halo is not necessarily an error, it is part of the judgment process all assessors use
Assessment of the Data • Confirmatory Factor Models • Exercise only • Does this model provide a reasonable fit? • Competency • Does this model provide a reasonable fit? • Competency + exercise • How much better is the fit when you include both sets of factors?
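A sketch of comparing these factor models, assuming the semopy package and hypothetical dimension-within-exercise variable names; a third model combining competency and exercise factors would be specified and compared in the same way:

```python
import pandas as pd
from semopy import Model, calc_stats  # assumes the semopy package is available

# Hypothetical wide-format data: one column per dimension-within-exercise rating.
df = pd.read_csv("ac_dimension_by_exercise.csv")

# Model 1: exercise factors only
exercise_model = """
RolePlay  =~ negot_roleplay + listen_roleplay + speak_roleplay
GroupDisc =~ negot_group + listen_group + speak_group
"""

# Model 2: competency factors only
competency_model = """
Negotiation =~ negot_roleplay + negot_group
Listening   =~ listen_roleplay + listen_group
Speaking    =~ speak_roleplay + speak_group
"""

# Fit each model and compare fit indices (CFI, RMSEA, etc.).
for desc in (exercise_model, competency_model):
    m = Model(desc)
    m.fit(df)
    print(calc_stats(m))
```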
Assessment of the Data • External Correlates • The determination of external correlates depends strongly on • The constructs/competencies you are trying to measure • the intended uses of the data
Assessment of the Data • External Correlates • Alternate measures of competencies • Measures of the likely outcomes and correlates of these competencies
Competencies • Active Listening • Speaking • Management of Personnel Resources • Social Perceptiveness • Coordination • Critical Thinking • Reading Comprehension • Judgment and Decision Making • Negotiation • Complex Problem Solving
Alternative Measures • Critical Thinking, Reading Comprehension – Standardized tests • Judgment and Decision Making – Supervisory ratings, Situational Judgment Tests
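A minimal sketch of these external-correlate checks, with hypothetical file and column names standing in for whatever alternate measures are actually collected:

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical merged file: one row per candidate, AC scores plus external measures.
merged = pd.read_csv("ac_and_external_measures.csv")

# Convergent evidence: AC Critical Thinking rating vs. standardized test score.
r, p = pearsonr(merged["ac_critical_thinking"], merged["test_critical_thinking"])
print(f"Critical Thinking vs. standardized test: r = {r:.2f} (p = {p:.3f})")

# Convergent evidence: AC Judgment rating vs. situational judgment test score.
r, p = pearsonr(merged["ac_judgment"], merged["sjt_score"])
print(f"Judgment and Decision Making vs. SJT: r = {r:.2f} (p = {p:.3f})")
```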
Possible Correlates • Active Listening • Success in coaching assignments • Sought as mentor • Speaking • Asked to serve as spokesman, public speaker • Negotiation • Success in bargaining for scarce resources
Assessments of the Process • Opportunities to observe • Frequency with which target behaviors are recorded • Quality of the information that is recorded • Detail and consistency • Influenced by format – e.g., narrative vs. checklist
Assessments of the Process • Observations to evaluations • How is this done? • Consistent across assessors? • Integration • Clinical vs. statistical • Statistical integration should always be present but should not necessarily trump consensus • Process by which consensus moves away from statistical summary should be transparent and documented
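One way to keep the move from statistical summary to consensus transparent is to log both and summarize the shifts. A sketch with hypothetical file and column names:

```python
import pandas as pd

# Hypothetical integration records: mechanical (statistical) composite vs.
# post-discussion consensus OAR for each candidate.
df = pd.read_csv("integration_records.csv")  # columns: candidate, mechanical_oar, consensus_oar

# How often, and by how much, does the consensus discussion move the rating?
df["shift"] = df["consensus_oar"] - df["mechanical_oar"]
print(df["shift"].describe())
print(df["shift"].abs().gt(0.5).mean(), "proportion shifted by more than 0.5 points")

# Document large departures so the move away from the statistical summary is transparent.
print(df.loc[df["shift"].abs() > 1.0, ["candidate", "mechanical_oar", "consensus_oar"]])
```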
Assessment of the Track Record • The history of similar ACs forms part of the relevant research record • The history of dissimilar ACs is also relevant
The Purpose-Driven AC • What are you trying to accomplish with this AC? • Is there evidence this AC or ones like it have accomplished or will accomplish this thing? • Suppose the AC is intended principally to serve as part of leadership development. Identifying this principal purpose helps to identify relevant criteria
Criteria • Advancement • Leader success • Follower satisfaction • Org success in dealing with turbulent environments • The process of identifying criteria is largely one of thinking through what the people and the organization would be like if your AC worked
An AC Validation Report • Think of validating an AC the same way a pilot does his or her pre-flight checklist • The more you know about each of the items on the checklist, the more compelling the evidence that the AC is valid for its intended purpose
AC Validity Checklist • Do you know (and how do you know) whether: • The exercises elicit behaviors that are relevant to the competencies you are trying to measure • These AC demonstrations of competency are likely to generalize
AC Validity Checklist • Do you know (and how do you know) whether: • Raters have the skill, training, expertise needed? • Raters agree in their observations and evaluations • Their resolutions of disagreements make sense
AC Validity Checklist • Do you know (and how do you know) whether: • Score distributions make sense • Are there differences in scores received by candidates? • Can you distinguish strengths from weaknesses
AC Validity Checklist • Do you know (and how do you know) whether: • Analyses of Candidate X Dimension X Assessor yield sensible outcomes • Assessor – are assessors calibrated? • C X D – do candidates show patterns of strength and weakness? • A X D – do assessors agree about dimensions? • C X D X A – do assessors agree about evaluations of patterns of strengths and weaknesses?
AC Validity Checklist • Do you know (and how do you know) whether: • Factor structure makes sense, given what you are trying to measure • Do you know anything about the relationships among competencies? • Is this reflected in the sorts of factor models that fit?
AC Validity Checklist • Do you know (and how do you know) whether: • Competency scores related to • Alternate measures of these competencies • Likely outcomes and correlates of these competencies
AC Validity Checklist • Do you know (and how do you know) whether: • There are Competency X Treatment interactions • Identifying individual strengths and weaknesses is most useful when different patterns lead to different treatments (training programs, development opportunities) and when making the right treatment decision for each individual leads to better outcomes than treating everyone the same
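A competency x treatment interaction can be examined with an ordinary regression interaction term. A sketch assuming hypothetical follow-up data, where the file and column names are illustrative:

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical follow-up data: AC competency score, development treatment assigned,
# and a later outcome (e.g., post-program performance rating).
df = pd.read_csv("development_outcomes.csv")  # columns: competency_score, treatment, outcome

# The interaction term tests whether candidates with different competency levels
# benefit more from different treatments (the case where differentiation pays off).
model = ols("outcome ~ competency_score * C(treatment)", data=df).fit()
print(model.summary())
```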
AC Validity Checklist • Do you know (and how do you know) whether: • The process supports good measurement • Do assessors have opportunities to observe relevant behaviors? • Do they record the right sort of information? • Is there a sensible process for getting from behavior observation to competency judgment?
AC Validity Checklist • Do you know (and how do you know) whether: • The integration process helps or hurts • How is integration done? • Is it the right method given the purpose of the AC? • How much does the integration process change the outcomes?