Continuing Saga of Adaptive Testing Principles Applied to Performance and Personality Measurement. Walter C. Borman PDRI, an SHL Company and University of South Florida. Presented to Gateway Industrial-Organizational Psychologists April 3, 2013. Outline.
Continuing Saga of Adaptive Testing Principles Applied to Performance and Personality Measurement Walter C. Borman PDRI, an SHL Company and University of South Florida Presented to Gateway Industrial-Organizational Psychologists April 3, 2013
Outline • Description of the “adaptive test” concept • Development of computer-adaptive rating scales for measuring job performance • Development of NCAPS and GPI-Adaptive personality inventories • Initial validation results • Faking issue • Conclusions and next steps
Description of the “Adaptive Test” Concept • Initial ability domain application • Our work began in the performance domain • Forced-choice Computer Adaptive Rating Scales (CARS) • Laboratory study showed lower standard error of measurement and higher validity and accuracy for CARS
Canadian Forces Project • Performance rating scales for 8 leadership competencies for officers and CSMs in four rank clusters • Action Orientation and Initiative • Analytical Thinking • Behavioral Flexibility/Change Leadership • Commitment to Military Ethos • Communication • Developing Self and Others • Results Management • Teamwork
Development of Competency Scales • Workshops with officer and CSM groups to generate items • Editing process • Items assigned to organizational level • Retranslation by I/O psychologists at PDRI and CF • Result is four item pools with 484 to 576 items
Development of Competency Scales (continued) • Example items from Analytical Thinking • Consistently provides insightful observations and analyses regarding the organization and solutions to problems (M = 6.12) • Gathers and then analyzes information from a variety of sources to develop effective and timely solutions to problems (M = 5.13) • Finds creative solutions to problems but is unable to translate these to relevant, realistic, and practical recommendations (M = 3.08) • Has considerable trouble analyzing even straightforward problems (M = 1.29)
Development of Competency Scales (continued) • Example items from Action Orientation • Finds appropriate ways of accomplishing almost all tasks through initiative and hard work, and follow-through is typically outstanding (M = 6.23) • In most cases, takes the initiative to complete tasks on or ahead of time (M = 5.12) • Is not proactive in moving toward objectives, but usually achieves mission success by making steady progress (M = 3.52) • Is reactive to situations, slow to respond, and rarely seeks to resolve the larger issues (M = 1.43)
Forced Choice Format: Example Item Pairs Click on the behavior that is more descriptive of the ratee: Is a good role model for others in the CF related to personal conduct and military ethos (5.69) vs. Usually accepts responsibility for own and subordinates’ actions (4.59)
Forced Choice Format: Example Item Pairs Click on the behavior that is more descriptive of the ratee: Takes pride in serving the interests of the organization (5.42) vs. Actively and consistently promotes the vision and values of the CF; maintains exemplary conduct consistent with this vision (6.76)
Forced Choice Format: Example Item Pairs Click on the behavior that is more descriptive of the ratee: Always ensures that subordinates follow policies, regulations, and orders (5.65) vs. Fully embraces the military profession and takes pride in the history, traditions, and values of the CF (6.47)
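The pairing logic behind these slides can be sketched roughly as follows. This is a simplified bracketing procedure, not the IRT-based algorithm used in CARS: each pair puts one statement below and one above the current performance estimate, and the rater's choice moves the estimate up or down by a shrinking step. The names `Item`, `pick_pair`, and `run_adaptive_rating`, and the `gap`, `start`, and `step` values, are all hypothetical. The scale values are the Analytical Thinking means shown earlier, with shortened statement text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str
    level: float  # scaled effectiveness value on the 1-7 continuum


def pick_pair(pool, estimate, gap=0.5):
    """Pick one unused item below and one at/above the current estimate.

    `gap` (an assumed tuning value) is how far from the estimate each
    statement should ideally sit.
    """
    below = [i for i in pool if i.level < estimate]
    above = [i for i in pool if i.level >= estimate]
    if not below or not above:
        return None  # the pool can no longer bracket the estimate
    lower = min(below, key=lambda i: abs(i.level - (estimate - gap)))
    upper = min(above, key=lambda i: abs(i.level - (estimate + gap)))
    return lower, upper


def run_adaptive_rating(pool, choose, start=4.0, step=1.5, max_pairs=8):
    """Present pairs until the pool is exhausted or max_pairs is reached.

    `choose(lower, upper)` stands in for the rater and must return one of
    the two items it was given.
    """
    estimate = start
    remaining = list(pool)
    for _ in range(max_pairs):
        pair = pick_pair(remaining, estimate)
        if pair is None:
            break
        lower, upper = pair
        chosen = choose(lower, upper)
        # Move toward the chosen statement, then halve the step size.
        estimate += step if chosen is upper else -step
        estimate = min(7.0, max(1.0, estimate))
        step /= 2
        remaining = [i for i in remaining if i not in pair]
    return estimate


POOL = [
    Item("Insightful observations and analyses", 6.12),
    Item("Gathers and analyzes information from varied sources", 5.13),
    Item("Creative but impractical solutions", 3.08),
    Item("Trouble analyzing straightforward problems", 1.29),
]


def simulated_rater(true_level):
    """Hypothetical rater who picks the statement closer to the ratee's true level."""
    return lambda lo, hi: min((lo, hi), key=lambda i: abs(i.level - true_level))


print(run_adaptive_rating(POOL, simulated_rater(5.5)))
```

With only four statements the loop stops after two pairs. The operational item pools described above hold several hundred statements, so a real procedure would have many more candidates at each step.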
Development of NCAPS • Idea was to apply CARS concept to non-cognitive testing • 19 potential personality constructs were identified • Psychologists rated their importance for 79 Navy jobs • 10 constructs selected based on means and SDs
Constructs Identified for NCAPS • Achievement • Social Orientation • Stress Tolerance • Adaptability/Flexibility • Attention to Detail • Dependability • Dutifulness/Integrity • Self Reliance • Willingness to Learn • Vigilance
Development of NCAPS (continued) • PDRI generated 1725 items at all trait levels • Construct and trait level “retranslation” was conducted • 1494 items survived, 106-199 per construct
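A retranslation screen of this kind might look like the sketch below. The slides do not state the retention criteria, so the agreement and SD thresholds here are assumptions chosen for illustration, and `retranslate` is a hypothetical helper. Each judge sorts an item into a construct and rates its trait level. The item survives when enough judges pick the intended construct and their level ratings agree closely.

```python
from statistics import mean, stdev


def retranslate(judgments, intended, min_agreement=0.7, max_sd=1.5):
    """Decide whether an item survives retranslation.

    judgments: list of (construct_chosen, trait_level_rating), one per judge.
    min_agreement, max_sd: assumed cutoffs, not values from the project.
    Returns (keep, mean_level); mean_level becomes the item's scale value.
    """
    constructs = [c for c, _ in judgments]
    levels = [lvl for _, lvl in judgments]
    agreement = constructs.count(intended) / len(constructs)
    keep = agreement >= min_agreement and stdev(levels) <= max_sd
    return keep, round(mean(levels), 2)


# Hypothetical judge data for one Social Orientation item.
judgments = [
    ("Social Orientation", 6),
    ("Social Orientation", 7),
    ("Social Orientation", 6),
    ("Achievement", 6),
]
print(retranslate(judgments, "Social Orientation"))
```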
Example Items for Social Orientation • It is easy for me to find something in common with any person I meet (M = 6.36) • It takes real effort for me to hide my impatience with people who aren’t very bright (M = 1.49) • I am able to make friends when I put some effort into it (M = 3.63)
Validation Results • Concurrent validation study designed with 110 first tour Navy Sailors • NCAPS and a conventional personality inventory administered and supervisor performance ratings gathered on nine “Navy-wide” dimensions
Validation Results (continued) • Unit-weighted composite of the 10 conventional-format NCAPS scales against composite overall performance (r = .13 uncorrected; r = .18 corrected) • Unit-weighted composite of the 10 adaptive NCAPS scales against composite overall performance (r = .27 uncorrected; r = .37 corrected)
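The step from uncorrected to corrected validity can be illustrated with the standard correction for attenuation, r_corrected = r_observed / sqrt(r_yy). The slide does not say which correction or reliability was used. The sketch below assumes a correction for criterion unreliability only, with a hypothetical criterion reliability of .52. That value was picked because it reproduces the rounded figures above; it is not reported in the presentation.

```python
import math


def correct_for_criterion_unreliability(r_observed, criterion_reliability):
    """Classical disattenuation for unreliability in the criterion only."""
    return r_observed / math.sqrt(criterion_reliability)


ASSUMED_RYY = 0.52  # hypothetical reliability of the supervisor ratings

for label, r in [("conventional", 0.13), ("adaptive", 0.27)]:
    corrected = correct_for_criterion_unreliability(r, ASSUMED_RYY)
    print(f"{label}: r = {r:.2f} -> {corrected:.2f}")
```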
Overview of Global Personality Inventory-Adaptive • General assessment of normal adult personality with a focus on workplace applications • Selection, development, classification of employees across levels and industries
Measurement Taxonomy • Targeted a mid-level taxonomy similar in scope to NCAPS and with similar inclusion criteria: • Comprehensiveness and breadth • Unidimensionality • Level of specificity • Expectation of criterion-related and construct validity • Also global applicability • Review of dominant taxonomies resulted in a 13-dimension taxonomy: • Self Development • Flexibility • Collaboration • Thoroughness • Reliability • Sense of Duty • Achievement • Composure • Confidence and Optimism • Independence • Sociability • Influence • Innovation
Statement Identification and Development • Specification of score distribution and statement bank size • 7-point scale • Goal = 150 to 200 statements per trait • Facets specified as an item writing guide and to ensure construct coverage • Similar methodology to NCAPS statement development • Recruited experienced personality item writers • Item writing training • Pilot items reviewed by project team • Item writing assignments by trait; targeted to parts of the trait continuum to result in coverage across trait range
Manager-Level Validation • Concurrent validation study with incumbents in first-line leader roles • Consortium of 8 organizations contributed 14 samples • Diverse set of organizations representing the insurance, telecommunications, financial services, healthcare, and retail industries • N = 1109 supervisors of hourly employees • N = 240 managers of salaried professionals • The same job performance rating form provided consistent performance criteria across samples • 27 performance dimension rating areas • 7 global rating items
Validation Results • Concurrent Validity Study Against Supervisor Performance Ratings (N=1349) • Unit-Weighted Composite Against Overall Performance (r=.28, corrected for criterion unreliability) • Validities Asymptote at 5-8 Item Pairs • Testing Time ½ that of Conventional Inventory
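A unit-weighted composite validity of this kind can be computed as in the minimal sketch below: standardize each trait scale, add the z-scores with equal weight, and correlate the sum with the performance criterion. The function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def zscores(xs):
    """Standardize a list of scores using the population SD."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def unit_weighted_composite(scales):
    """scales: one list of scores per trait, aligned by person.

    Each trait is standardized and then summed with a weight of 1.
    """
    standardized = [zscores(s) for s in scales]
    return [sum(person) for person in zip(*standardized)]


def pearson_r(xs, ys):
    """Pearson correlation as the mean product of z-scores."""
    zx, zy = zscores(xs), zscores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Toy data: two trait scales and a performance rating for five people.
traits = [[3.1, 4.0, 5.2, 2.8, 4.6], [4.4, 3.9, 5.0, 3.0, 4.1]]
performance = [3.5, 3.8, 4.6, 2.9, 4.0]
composite = unit_weighted_composite(traits)
print(round(pearson_r(composite, performance), 2))
```

Unit weighting avoids estimating regression weights in the sample, so the resulting validity is not inflated by capitalizing on chance in that sample.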
Faking Research: Underhill, Lords, & Bearden (2006) • Investigate fake resistance of NCAPS • First study to evaluate the extent to which participants can deliberately elevate their personality scores on NCAPS • NCAPS and non-adaptive/traditionally-formatted versions used • Students (N = 148) • T1: respond honestly • T2: deliberately fake to make the best impression possible for acquiring a job • Differences in personality scores from honest to faking were compared for each instrument • Adaptive NCAPS: no significant mean differences between honest and faking scores • Traditional NCAPS: significant mean differences on all traits measured
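One common way to compare honest and faked scores from the same people is a paired t statistic plus a standardized mean difference on the difference scores. The slide does not say which test the study used, so treat this as an assumed analysis. The helper name `paired_difference` and the toy scores are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_difference(honest, faked):
    """Return (mean_diff, t, d) for within-person honest vs. faked scores.

    t is the paired-samples t statistic with n - 1 degrees of freedom.
    d is the mean difference divided by the SD of the differences.
    """
    diffs = [f - h for h, f in zip(honest, faked)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    m = mean(diffs)
    t = m / (sd / math.sqrt(len(diffs)))
    return m, t, m / sd


# Toy scores for four respondents on one trait (T1 honest, T2 faked).
honest = [4, 4, 3, 4]
faked = [5, 6, 4, 6]
m, t, d = paired_difference(honest, faked)
print(f"mean diff = {m:.2f}, t = {t:.2f}, d = {d:.2f}")
```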
Conclusions and Next Steps • Performance rating application shows promise but field study needed: Canadian Forces • Personality measurement application also promising • Modest validity improvement over non-adaptive • Shorter testing time by ~50% • Faking not as serious as feared
Conclusions and Next Steps (continued) • Paired comparison judgment process and iterative IRT-based algorithm with raters or test takers may be an important factor in generating valid information for performance ratings and self-report personality measures • More research should compare the forced-choice adaptive format with non-adaptive formats in both performance rating and personality testing