Using tests to predict job performance, CALSWEC - 2007. Goals: Define high-stakes testing. Review technical complexities and professional standards. Discuss the risks, rewards
1. Using tests to predict job performance CALSWEC - 2007 Cynthia Parry, PhD
C.F. Parry Associates
Michelle Graef, PhD
Center on Children, Families & the Law – University of Nebraska, Lincoln
Todd Franke, PhD
Department of Social Work – University of California at Los Angeles
Henry Ilian, PhD
New York City Administration for Children’s Services
James Satterwhite Academy for Child Welfare Training
2. Using tests to predict job performance CALSWEC - 2007 Goals:
Define high-stakes testing
Review technical complexities and professional standards
Discuss the risks, rewards & consequences of high-stakes testing
Anticipated Outcome
A phone call to one or more of the presenters prior to any venture in high-stakes testing development
3. Using tests to predict job performance CALSWEC - 2007 Presentation in 4 parts:
1. Overview of high stakes testing
2. Validity considerations
3. Validity issues in high stakes testing and training of child welfare workers
4. The ethics of using high-stakes testing and the ethics of not using high stakes testing
4. Part 1 of 4: Overview of High Stakes Testing Cindy Parry, PhD
5. What do we mean by high stakes? Any use of a test that might affect
Your ability to get or keep a job
Your chances for promotion or raises
Your supervisor’s evaluation of your work
Testing out of training requirements or needing further coaching or other interventions
Such as
Licensing exams
Civil service hiring tests
Certification/credentialing tests
Any testing where individual results are reported to a supervisor/manager or used to predict future competent performance
6. What is the Law and Policy Context? 14th Amendment Requirements
Equal protection
Due process
Civil Rights Acts
Disparate treatment of a protected group
Disparate impact on a protected group
Business necessity
ADA
Accommodations for test takers
7. Considerations in Developing a High Stakes Test Use professional standards to guide test development
The Standards for Educational and Psychological Testing (AERA, APA, NCME)
Uniform Guidelines on Employee Selection Procedures (EEOC)
Principles for the Validation and Use of Personnel Selection Procedures (SIOP)
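One concrete check tied to disparate impact is the "four-fifths" rule of thumb described in the Uniform Guidelines: a group whose pass rate falls below 80% of the highest group's pass rate is generally flagged for closer review. The sketch below is illustrative only. The group labels and counts are hypothetical, and a flag is a screening signal, not a legal finding.

```python
def selection_rates(outcomes):
    """Map each group to its pass rate, given {group: (passed, tested)}."""
    return {group: passed / tested for group, (passed, tested) in outcomes.items()}


def adverse_impact_ratios(outcomes):
    """Ratio of each group's pass rate to the highest group's pass rate."""
    rates = selection_rates(outcomes)
    top_rate = max(rates.values())
    return {group: rate / top_rate for group, rate in rates.items()}


# Hypothetical counts, for illustration only: (number passing, number tested).
example = {"group_a": (48, 60), "group_b": (24, 40)}

ratios = adverse_impact_ratios(example)

# Groups below the four-fifths (0.8) threshold warrant a closer look.
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]
```

Here group_a passes at 80% and group_b at 60%, so group_b's ratio is about 0.75 and it is flagged. With small samples the ratio is unstable, so it would normally be read alongside a significance test.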
8. Key Considerations in Validation Validity is the most fundamental consideration in developing any test
We validate the use of a test score for a specific purpose — not a test or an item
Content validity is the most common type of evidence cited in child welfare applications
Requires full specification of the domain or construct it is intended to measure (1.6)
Standards require complete description of procedures used to develop test content (1.6)
Use of subscales requires validation of the inferences made from subscale scores as well as overall score (1.12)
9. Other Considerations for High Stakes Testing Adequate reliability for overall scores and any sub-scores that will be used in decision making
Investigations of differential item functioning and test fairness
Comparability of alternate forms or tests given over time
Documentation of the method of setting the cut score and the qualifications of judges
Notice of testing and due process procedures
Policy and procedures for accommodating persons with disabilities
Policy and procedures for test security and identification of test takers
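As one illustration of the reliability point, internal consistency is often summarized with Cronbach's alpha. The sketch below assumes a small matrix of right/wrong item scores (hypothetical data) and uses only the Python standard library. It is one possible reliability estimate, not the method prescribed by any of the standards named above.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of item scores (one row per test taker)."""
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # scores grouped by item
    item_variance_sum = sum(pvariance(col) for col in columns)
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 test takers x 4 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
```

For this toy data alpha works out to 0.8. The same function can be run on the items of each sub-score that will feed a decision, since a short subscale can be much less reliable than the overall score.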
10. Part 2 of 4: Validity Considerations
Michelle Graef, PhD
11. Can we use our training evaluation knowledge test to make decisions about candidates’ ability to perform the job? Three points to consider:
Used alone, a written knowledge test is unlikely to be adequate to make inferences about the full range of KSAOs needed to perform child welfare work.
The target of your inferences indicates the appropriate content domain to be sampled. The typical training evaluation knowledge test (“post-test”) items have been sampled from the curriculum domain, which may not accurately represent the job domain.
Even if you have strong evidence of content validity, evidence derived from other validation strategies (criterion-related validity) is recommended to support an employment decision. The implied inference is prediction of future job performance.
Validity = the pattern of evidence supporting a proposed inference to be drawn from a test.
Validity is the most important consideration in test development and use.
3 points to consider regarding validity…
12. POINT #1) Used alone, a written knowledge test is unlikely to be adequate to make inferences about the range of KSAOs needed to perform child welfare work
Job Analysis is a necessary first step when doing employment testing. Determine the human attributes needed to perform the job tasks. This ensures that your test, criterion measures, and curriculum are based on the knowledge, skills, and other attributes people really need to do the work! And, in the event of a legal challenge, the adequacy of your job analysis process will be scrutinized. Don’t leave home without it.
Note the mix of KSAOs: what proportion of them are knowledge? In Nebraska, our JA for CPS positions found: % knowledge, % skills, % other
If you are making inferences about this job domain using only a knowledge test, what can we say about the accuracy of those inferences?
13. POINT #2: The target of your inferences indicates the appropriate content domain to be sampled. The typical (post) training evaluation knowledge test items have been sampled from the curriculum content domain, which may not accurately represent the job domain.
These next 3 slides hopefully illustrate the effects of this problem.
14. Given that:
the child protective services job domain goes beyond knowledge to include numerous skills and other characteristics, and
by design, the typical training evaluation knowledge post-test is developed by sampling from a curriculum domain that may not necessarily closely represent the job domain
Then:
even the best training evaluation knowledge test will be deficient for making inferences about candidates’ true ability to perform the job
Predictor contamination
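The deficiency and contamination ideas above can be pictured as set comparisons between the job domain and the test content. This is a toy sketch: the KSAO labels are hypothetical placeholders, not results from any job analysis.

```python
def compare_domains(job_domain, test_content):
    """Split test coverage into overlap, deficiency, and contamination."""
    overlap = job_domain & test_content          # job-relevant content the test covers
    deficiency = job_domain - test_content       # job requirements the test misses
    contamination = test_content - job_domain    # test content unrelated to the job
    return overlap, deficiency, contamination


# Hypothetical labels, for illustration only.
job_domain = {"policy knowledge", "interviewing skill", "decision making", "report writing"}
test_content = {"policy knowledge", "report writing", "training agency history"}

overlap, deficiency, contamination = compare_domains(job_domain, test_content)
```

In this example the curriculum-based test misses interviewing skill and decision making (deficiency) and includes training agency history, which the job does not require (contamination).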
16. POINT #3: even if you have strong evidence of content validity, evidence derived from other validation strategies (criterion-related) is recommended to support an employment decision.
17. Criterion-related validation study Demonstrate relationship between predictor and criterion using statistical significance testing
Feasibility depends upon:
Availability of appropriate criterion measures
Representativeness of research sample
Adequacy of statistical power
Variety of designs:
Predictive
Concurrent
Use of incumbents or job applicants
Development of predictor and criterion measures that are relevant, uncontaminated, not deficient, free from bias, and demonstrate reliability
Refer to Principles for the Validation and Use of Personnel Selection Procedures (2003)
Criterion = work behavior or outcomes
18. Recommendations Strategies:
1) retrofit: map your existing knowledge test to the job domain and supplement it with other assessments (e.g., skills) to more adequately represent the job domain
2) start over: develop a test specifically for this purpose that directly samples the job domain, such as a work sample test, simulation, or assessment center
3) gather evidence of criterion-related validity in support of the use of your existing training evaluation knowledge test for employment decisions
Not necessarily suggesting that all of these strategies are equally attractive, but situational constraints will dictate the choice.
Note on the feasibility of a criterion-related validity study: contrary to what I’ve heard mentioned by some folks in past years, it is NOT impossible to conduct. You do the study with your existing test while you refrain from using it for any decisions. So, everyone takes the test and, regardless of score, goes on to get a caseload assigned to them. Then you collect job performance measures for them after some time on the job. See the NE (Nebraska) example we’ve discussed at a previous symposium.
19. Part 3 of 4: Validity issues in high stakes testing and training of child welfare workers
Todd Franke, PhD
20. Validity issues in high stakes testing and training of child welfare workers Are we measuring what is important?
21. Los Angeles Context Evidence based practice
Evidence based training
High stakes in Los Angeles
Minimally competent CPS workers
What would one look like?
22. Translation validity -- Focuses on whether the operationalization (i.e., measure) is a good translation of the construct
Face Validity -- On its face, does the operationalization look like a good translation of the construct?
Content Validity -- Operationalization is checked against the relevant content domain for the construct
Criterion-related Validity
Predictive Validity -- Operationalization’s ability to predict something it should theoretically be able to predict (L3)
Concurrent Validity
Convergent Validity
Discriminant Validity
Construct Validity
23. So what constitutes the domains relevant in training child welfare workers? What should be trained/assessed? It depends who you ask! Knowledge of
Policies (over 384 policies in LA-DCFS)
Procedures
submitting mileage
cellular phone reimbursement
use of logs (court calendars, visitation)
new case to-do’s
cubicle organization and systems
24. Examples of current topics Appreciating multiculturalism
Family preservation/Alternative response
Adoption assessments
Home assessments
Team Decision Making
Going to court/testifying
Court report writing
SDM tools
Basic interviewing
25. What else might be important? Suggestions from supervisors/trainees Readiness to learn
Ability to multi-task
Willingness to accept supervision
Ability to make hard decisions
Ability to transfer knowledge to field
Social support
Problem solving
26. What’s next? How should an assessment be used?
Is knowledge only one of the domains that define the construct?
Does it have any predictive validity?
Is it a valid measure of future job performance?
27. Part 4 of 4: The Ethics of Using High-Stakes Testing and the Ethics of Not Using High Stakes Testing Henry Ilian, PhD
I am making a series of assertions. Some of them have not been demonstrated. Some have been demonstrated somewhere. Some of them can be demonstrated reasonably easily, but may not have been. Some may take major research efforts to demonstrate. I am saying if these assertions are true, these are the consequences. In making these assertions, therefore, I am also proposing a research agenda. In the discussion, I challenge people to think of what it would take to confirm or disconfirm these assertions.
28. High-stakes Tests Accomplish Three Things Screen out trainees who cannot demonstrate a specified level of mastery
Compel Studying
Enforce Fidelity to the Curriculum
29. A Comparison from NYC
30. Using and Not Using High-Stakes Tests Each Has Consequences Using High-Stakes Tests
A professionally developed testing program requires significant agency resources
Some people with the potential to be good child protective workers may not pass
Not Using High-Stakes Tests
Risk to families, children, self and colleagues
Difficulty maintaining employee standards and agency morale
Tests must be designed to a high level of reliability and validity. They must not discriminate against protected groups
Holistic measures are better than tests alone
Many people have left other jobs to become CPS workers. If they don’t pass, they not only lose a job, they also suffer everything that accompanies job loss: loss of income, self-esteem and social contact
31. The Major Ethical Issue is the Potential for Harm There is a need to balance potential harm
to caseworker trainees
to children and families served by CPS agencies
to co-workers and the agencies themselves
32. Concerns for the Evaluator The organization may be reluctant/unable to make available the time and resources to develop and validate instruments
Testing decisions are often made in a political or administrative context
Administratively imposed testing may not meet professional standards