Physical Ability Testing and Practical Examinations: They Fought the Law and the Law Won
Expect the Unexpected: Are We Clearly Prepared?
Nikki Shepherd Eatchel, M.A., Vice President, Test Development, Thomson Prometric
Robin Rome, Esq., Vice President, Legal and Contracts, Thomson Prometric
Council on Licensure, Enforcement and Regulation (CLEAR) 2006 Annual Conference, September 14-16, Alexandria, Virginia
Physical Ability Testing and Practical Exams Goals for today’s presentation: • Outline the major risk factors for physical ability and practical examinations • Recommend specific development activities and other measures that will help an examination withstand a legal challenge • Provide recommendations for evaluating exams developed by you or for you
Physical Ability and Practical Exams: Challenges to Validity Although all employment, certification, and licensure testing is certainly open to challenge, exams designed to physically assess a candidate’s performance on specific job skills and tasks are often more vulnerable to challenge than objective written exams.
Physical Ability and Practical Exams: Challenges to Validity Examples of physical ability and practical exams: • Firefighter certification • Police officer pre-employment • Nursing practical for licensure • Corporate product certification • Food safety practical for licensure
Physical Ability and Practical Exams: Challenges to Validity Why are physical ability and practical exams more vulnerable to challenge?
• Reliance on exam rater judgments regarding how a task was performed introduces the possibility of human error in the assessment of the skill or task
• When only one rater is used to assess a candidate, there is an increased likelihood of disagreement between the rater and the candidate
• Physical ability exams typically have greater adverse impact on protected groups than the written exams involved in an employment, certification, or licensure process (though practical exams do not tend to show the same pattern)
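Since adverse impact is central to many of these challenges, the usual first screen is the "four-fifths rule" in the Uniform Guidelines (29 CFR 1607.4(D)): a selection rate for any group that is less than 80% of the highest group's rate is generally regarded as evidence of adverse impact. Below is a minimal Python sketch of that screen; the group names and pass counts are hypothetical.

```python
# Minimal sketch of the "four-fifths rule" screen for adverse impact
# (Uniform Guidelines, 29 CFR 1607.4(D)). All counts are hypothetical.

def selection_rate(passed: int, tested: int) -> float:
    """Proportion of candidates in a group who passed the exam."""
    return passed / tested

# Hypothetical pass counts for a physical ability exam.
rates = {
    "group_a": selection_rate(passed=80, tested=100),
    "group_b": selection_rate(passed=48, tested=100),
}

highest = max(rates.values())
for group, rate in rates.items():
    impact_ratio = rate / highest
    flag = "potential adverse impact" if impact_ratio < 0.80 else "ok"
    print(f"{group}: rate={rate:.2f}, impact ratio={impact_ratio:.2f} -> {flag}")
```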
Standards Used For Exam Evaluation There are two sets of standards that are often used to guide the development and evaluation of exams:
Standards for Educational and Psychological Testing, 1999 – Developed jointly by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME)
Uniform Guidelines on Employee Selection Procedures, 1978 – Developed by the Equal Employment Opportunity Commission (EEOC)
Standards Used For Exam Evaluation Although both sets of standards contain valuable information regarding the development process (and both should be considered when developing a testing program), courts more frequently refer to the Uniform Guidelines as the resource for evaluating exams. The Uniform Guidelines are “entitled to great deference” by courts deciding whether selection devices such as physical ability or practical tests comply with Title VII. Griggs v. Duke Power Co., 401 U.S. 424, 434 (1971)
Physical Ability and Practical Exams: Challenges to Validity What aspects of an examination are most likely to be scrutinized if the validity of a physical ability or practical exam is challenged?
Physical Ability and Practical Exams: Challenges to Validity
• Job Analysis
• Criterion-Related Validity
• Cut Score
• Rater Training
• Candidate Appeal Process
Physical Ability and Practical Exams: Job Analysis A job analysis is crucial in establishing that the content of the physical ability or practical exam is valid. Key components of the job analysis include: • Content Validity • Validity Generalization • Adequate and Diverse Sample Sizes
Job Analysis - Content Validity Although there are multiple validity methods that can be used during the test development process, the foundation for acceptable development practice continues to reside with traditional content validity methods. Supplemental validity methods are typically seen as beneficial, yet not sufficient, when courts evaluate testing processes.
Job Analysis - Content Validity When evidence of validity based on test content is presented, the rationale for defining and describing a specific job content domain in a particular way (e.g., in terms of tasks to be performed or knowledge, skills, abilities, or other personal characteristics) should be stated clearly. Standard 14.9
A job analysis is necessary to identify the knowledge, skills, and abilities necessary for successful job performance. A selection procedure can be supported by a content validity strategy to the extent that it is a representative sample of the content of the job. Guidelines, 29 CFR 1607.14(C)(1)
Job Analysis - Content Validity Case Study: Williams v. Ford
Facts
• Class action claiming that a pre-employment test for unskilled hourly production workers, the Hourly Selection System Test Battery (HSSTB), discriminated against African Americans.
• Physical/practical parts of the HSSTB measured parts assembly, visual speed and accuracy, and precision/manual dexterity.
Job Analysis - Content Validity Case Study: Williams v. Ford (cont’d)
Plaintiffs’ Position
• Disparate impact discrimination, i.e., African Americans failed or scored lower on the test in disproportionately high numbers when compared to whites.
• The HSSTB was not content valid because the job analysis failed to demonstrate a clear linkage between the test and specific job requirements.
Ford’s Position
• The HSSTB was content valid, as supported by a job analysis. The job analysis consisted of:
- Supervisor identification of job inventories
- Supervisor ratings of the importance of the job requirements and job abilities identified in the inventories
- Analysis of rating reliability and data to identify key job requirements
- Development of a test to measure the skills needed to perform the job requirements rated as “important”
Job Analysis - Content Validity Case Study: Williams v. Ford (cont’d)
Holding
Ford demonstrated that the HSSTB was content valid.
Reasoning
• Ford had the burden of showing that the HSSTB was job related: “[Must show] by professionally acceptable methods, [that the test is] predictive of or significantly correlated with important elements of work behavior that comprise or are relevant to the job or jobs for which the candidates are being evaluated.” Williams v. Ford, 187 F.3d 533, 539 (6th Cir. 1999).
• Ford met this burden by showing that the HSSTB was content valid: it used a professional test developer to conduct a job analysis that complied with the EEOC Guidelines.
Job Analysis - Validity Generalization An issue often referred to in test development is validity generalization. Validity generalization is defined as: “Applying validity evidence obtained in one or more situations to other similar situations on the basis of simultaneous estimation, meta-analysis, or synthetic validation arguments.” Standards, 1999, p. 184
Job Analysis - Validity Generalization Transfer of validity work from one demographic and/or geographic area to another, while certainly possible when based on sound initial validity work and a clear delineation of the original and secondary populations, has not been well received by courts as a defensible practice. This has typically been due to a lack of appropriate documentation regarding the similarity of the populations involved in the generalization and of the interpretations resulting from the instrument.
Job Analysis - Validity Generalization Case Study: Legault v. aRusso
Facts
• Challenge to physical abilities tests used to select fire department recruits.
• Selection process included a four-part pass/fail physical abilities test involving climbing a ladder, moving a ladder from a fire engine, running 1.5 miles in 12 minutes, and carrying and pulling a fire hose. It also included a separate physical abilities test focusing on a balance beam, a second hose pull, and an obstacle course.
Job Analysis - Validity Generalization Case Study: Legault v. aRusso (cont’d)
Holding
The fire department failed to show the physical abilities tests were job related.
Reasoning
• The job analysis relied on by the fire department was neither current nor specific:
- Validity was not supported by a “several-year-old job specification that describe[d] the firefighter’s general duties.” Legault v. aRusso, 842 F. Supp. 1479, 1488 (D.N.H. 1994)
- Validity was not supported by a specification identifying only general tasks (e.g., “strenuous physical exertion,” “operating equipment and appurtenances of heavy apparatus,” etc.). The specification also failed to break these tasks into component skills, assess their relative importance, or indicate the level of proficiency required.
• The physical abilities tests were not valid simply because they were similar to those used by other cities:
- There was no evidence these similar tests were validated, and “follow the leader is not an acceptable means of test validation.” Legault, 842 F. Supp. at 1488.
Job Analysis – Adequate and Diverse Sample Sizes Adequate and diverse sample sizes are a necessity for ensuring validity and increasing the defensibility of an exam. “A description of how the research sample compares with the relevant labor market or work force, . . ., and a discussion of the likely effects on validity of differences between the sample and the relevant labor market or work force, are also desirable. Descriptions of educational levels, length of service, and age are also desirable.” “Whether the study is predictive or concurrent, the sample subjects should insofar as feasible be representative of the candidates normally available in the relevant labor market for the job or group of jobs in question . . .” Uniform Guidelines
Job Analysis – Adequate and Diverse Sample Sizes Case Study: Blake v. City of Los Angeles
Facts
• Female applicants challenged the police department’s height requirement and physical abilities test.
• Applicants were required to be 5’6” and to pass a physical abilities test including scaling a wall, hanging, weight dragging, and endurance within specific parameters.
Job Analysis – Adequate and Diverse Sample Sizes Case Study: Blake v. City of Los Angeles (cont’d)
Plaintiffs’ Position
• Challenged the methodology and findings of the validation studies presented by the City.
• The validation studies relating to the height requirement did not include the individuals whom the police department was seeking to reject, i.e., those under 5’6”.
• The validation studies relating to the physical abilities test did not include those who failed the test, and tested only success during academy training, not success on the job.
The City’s Position
• The height requirement was job related; the City offered validation studies correlating height to performance:
- A questionnaire showing that taller officers tend to use more force and experience less suspect resistance
- Simulations demonstrating that taller officers performed bar-arm control better than shorter officers
• The physical abilities test was job related; the City offered validation studies correlating the skills tested to measures of success during academy training and on-the-job requirements (e.g., foot pursuit, field shooting, and emergency rescue).
Job Analysis – Adequate and Diverse Sample Sizes Case Study: Blake v. City of Los Angeles (cont’d)
Holding
The validation studies did not demonstrate that the height requirement and physical abilities tests were job related.
Reasoning
• The validation studies did not reflect an adequate and diverse sampling.
• The City failed to demonstrate the height requirement was job related because persons shorter than 5’6” were not included in the validation study (the study included individuals from 5’8” to 6’2”).
• The City failed to demonstrate the physical abilities test was job related because the validation study relied on measures of training success without showing that those measures were significantly related to job performance.
Criterion-Related Validity When possible, the collection of criterion-related validity evidence is extremely helpful in the defense of a physical ability test or practical exam. A criterion-related study “should consist of empirical data demonstrating that the selection procedure is predictive of or significantly correlated with important elements of job performance.” Guidelines, 29 CFR 1607.5(B)
Criterion-Related Validity The goal of criterion-related validity is to show a significant relationship between how candidates perform on an exam and how they subsequently perform on the job (with higher exam scores corresponding to better job performance). This can be accomplished through the use of concurrent or predictive criterion-related validity studies. Common criterion measures include:
• Job ratings
• Promotional exams
• Other job performance indicators
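As an illustration of the concurrent approach, the following minimal Python sketch correlates exam scores with supervisor job ratings. All data here are hypothetical, and a real study would also need an adequate, representative sample and a test of statistical significance.

```python
# Minimal sketch of a concurrent criterion-related validity check:
# correlate practical exam scores with supervisor job-performance ratings.
# All data are hypothetical.
from math import sqrt

exam_scores = [62, 71, 75, 80, 84, 88, 91, 95]          # practical exam scores
job_ratings = [2.9, 3.1, 3.4, 3.3, 3.8, 4.0, 4.2, 4.5]  # supervisor ratings (1-5)

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(exam_scores, job_ratings)
print(f"validity coefficient r = {r:.2f}")  # higher scores should track better performance
```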
Job Analysis – Criterion-Related Validity Case Study: Zamlen v. City of Cleveland
Facts
• Female plaintiffs challenged the rank-order and physical abilities selection examination for firefighters.
• The physical abilities test required three skills: an overhead lift using barbells, fire scene set-up, and a tower climb and dummy drag.
Job Analysis – Criterion-Related Validity Case Study: Zamlen v. City of Cleveland (cont’d)
Plaintiffs’ Position
• The physical abilities test did not test for attributes identified in the City’s job analysis as important to an effective firefighter.
• The test measured attributes in which men traditionally excel, such as speed and strength (anaerobic traits), and ignored attributes in which women traditionally excel, such as stamina and endurance (aerobic traits).
The City’s Position
• The test was created by a psychologist with significant experience developing tests for municipalities.
• The physical abilities test measured attributes related to specific job skills.
Job Analysis – Criterion-Related Validity Case Study: Zamlen v. City of Cleveland (cont’d)
Holding
The physical abilities test was valid since it was based on a criterion-related study.
Reasoning
• Referred to an earlier case, Berkman v. City of New York, 812 F.2d 52 (2d Cir. 1987), in which the court held that although aerobic attributes are an important component of firefighting, the City’s failure to include physical ability events that tested for such attributes did not invalidate the examination.
• Given the extensive job analysis performed, “although a simulated firefighting examination that does not test for stamina in addition to anaerobic capacity may be a less effective barometer of firefighting abilities than one that does include an aerobic component, the deficiencies of this examination are not of the magnitude to render it defective, and vulnerable to a Title VII challenge.” Zamlen, 906 F.2d 209, 219 (6th Cir. 1990).
Physical Ability and Practical Exams: Cut Score The setting of an examination cut score is perhaps the most controversial step within the test development process, as it is this step that has the most obvious impact on the candidate population. The Uniform Guidelines state the following in regard to the determination of the cut score: “Where cutoff scores are used, they should normally be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force.”
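One common way to operationalize "acceptable proficiency" is a standard-setting study with subject matter experts, such as the modified Angoff technique mentioned in the evaluation checklist at the end of this presentation. The sketch below shows the arithmetic only; the panel, tasks, and ratings are all hypothetical.

```python
# Minimal sketch of the arithmetic behind a modified Angoff cut score study.
# Each SME estimates, per exam task, the probability that a minimally
# competent candidate performs it correctly. All ratings are hypothetical.

sme_ratings = {                    # rows: SMEs; columns: exam tasks
    "sme_1": [0.80, 0.60, 0.70, 0.90],
    "sme_2": [0.75, 0.65, 0.70, 0.85],
    "sme_3": [0.85, 0.55, 0.75, 0.90],
}

# Each SME's recommended cut score is the sum of their task ratings;
# the panel cut score is the average across SMEs.
per_sme = {sme: sum(ratings) for sme, ratings in sme_ratings.items()}
cut_score = sum(per_sme.values()) / len(per_sme)

n_tasks = len(next(iter(sme_ratings.values())))
print(per_sme)
print(f"panel cut score: {cut_score:.2f} of {n_tasks} tasks")
```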
Cut Score Case Study: Lanning v. SEPTA
Facts
• Title VII class action challenging SEPTA’s requirement that applicants for the job of transit police officer be able to run 1.5 miles in 12 minutes.
• In prior related cases, it was established that the running requirement was job related. The sole issue before the court was whether the cutoff was valid.
Cut Score Case Study: Lanning v. SEPTA (cont’d)
Holding
The cutoff established by SEPTA was valid.
Reasoning
• The court looked at whether the cutoff measured the minimum qualifications necessary for the successful performance of a transit police officer.
• Studies introduced by SEPTA showed a statistical link between success on the run test and performance of identified job standards: individuals who passed the run test had a success rate of 70% to 90%, while individuals who failed the run test had a success rate of 5% to 20%.
• The court emphasized that the cutoff does not need to reflect a 100% rate of success, but there should be a showing of why the cutoff is an objective measure of the minimum qualifications for successful performance.
A Good Defense Many organizations spend a considerable amount of time and money on the valid and defensible development of a practical exam or a physical ability test. Surprisingly, after this substantial investment in the development of these exams, some organizations fail to establish appropriate training for the raters involved in administering the exam.
A Good Defense When using practical exams or physical ability tests, there are two aspects of the testing program that, when well established, can reduce the likelihood of a challenge: 1. Rater Training 2. Candidate Appeal Process
Rater Training Proper rater training is key in minimizing challenges to a practical exam/physical ability test. • Standardized training materials and sessions • Inter-rater and intra-rater reliability studies • Follow-up training
Rater Training – Standardized Materials Practical exams and physical ability tests rely on examination raters to identify whether or not a candidate performed the activity or event appropriately. One way to reduce challenges to this type of exam is to have a robust training program that is required of all raters on a regular basis.
Rater Training – Standardized Materials Standardized materials can include the following components:
• Train the Trainer Manual/Materials
• Examination Rater Manual
• Examination Rater Video
• Rater Checklist
Rater Training – Rater Reliability “When subjective judgment enters into test scoring, evidence should be provided on both inter-rater consistency in scoring and within-examinee consistency over repeated measurements.” Standard 2.13
• Does an individual rater apply the testing standards consistently across multiple candidates?
• Do groups of raters rate the same candidate consistently?
Rater Training – Rater Reliability Rater reliability during the training process:
• Part of the rater training process should involve groups of raters rating the same performance, to evaluate whether or not a consistent testing standard is being applied.
• This process should include an opportunity for all raters to discuss outliers and reach consensus about the appropriate standards.
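As a concrete illustration of checking agreement in such a calibration exercise, the sketch below computes Cohen's kappa, a common chance-corrected agreement statistic, for two raters scoring the same ten candidates pass/fail. The ratings are hypothetical; a real study would involve more candidates and, for numeric scores, a statistic such as an intraclass correlation.

```python
# Minimal sketch: Cohen's kappa for two raters scoring the same
# candidates pass/fail during a training calibration session.
# All ratings are hypothetical.

rater_1 = ["P", "P", "F", "P", "F", "P", "P", "F", "P", "P"]
rater_2 = ["P", "P", "F", "F", "F", "P", "P", "F", "P", "F"]

n = len(rater_1)
observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n

# Chance agreement from each rater's marginal pass/fail proportions.
p1_pass = rater_1.count("P") / n
p2_pass = rater_2.count("P") / n
expected = p1_pass * p2_pass + (1 - p1_pass) * (1 - p2_pass)

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement={observed:.2f}, kappa={kappa:.2f}")
```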
Rater Training – Rater Reliability Rater reliability after the training process:
• Trends for individual raters should be evaluated to monitor their consistency over time. Although raters can be expected to evaluate candidates somewhat differently, the data make it possible to see whether an individual rater’s standard is drifting over time.
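A minimal sketch of one way to monitor that kind of drift: track each rater's pass rate per period against the baseline established during calibration. The quarterly counts, baseline, and tolerance below are all hypothetical.

```python
# Minimal sketch: monitoring one rater's pass rate per quarter to spot
# drift in the standard being applied over time. All counts hypothetical.

quarterly = {  # quarter -> (candidates passed, candidates rated)
    "2006-Q1": (41, 50),
    "2006-Q2": (39, 50),
    "2006-Q3": (33, 50),
    "2006-Q4": (27, 50),
}

baseline = 0.80   # pass rate established during training calibration
tolerance = 0.10  # flag shifts larger than this (program-specific choice)

for quarter, (passed, rated) in quarterly.items():
    rate = passed / rated
    flag = " <- review with rater" if abs(rate - baseline) > tolerance else ""
    print(f"{quarter}: pass rate {rate:.2f}{flag}")
```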
Rater Training – Follow-Up Training There are instances in which organizations have developed a valid exam and appropriately trained their raters, and then experienced problems due to a lack of consistent follow-up training sessions for examination raters. Like any other aspect of a testing program, raters should be evaluated on a regular basis. In addition, raters should be required to undergo re-training on a periodic basis.
Candidate Appeal Process One aspect of a testing program that should always be considered at inception is the avenue for candidate feedback and (if necessary) appeals. Often, giving candidates an avenue to request feedback or an investigation into an exam administration will reduce the likelihood that a dispute will progress to a legal challenge.
Candidate Appeal Process Important aspects of a candidate feedback and appeal process:
• Public documentation of the feedback and appeal process
• Clear candidate instructions on the information that should be included in feedback and/or an appeal
• Specific timeframes for responses to feedback or appeals
• A designated group of resources to address feedback and appeal issues
Candidate Appeal Process Developing an avenue for candidate feedback at the inception of a program is viewed much more positively by courts than one set up after a challenge to the exam. Processes developed post-challenge tend to be viewed with an air of suspicion.
Recommendations Case Study: Firefighters United for Fairness v. City of Memphis
Facts
• Class action challenging the practical portion of a fire department promotional test.
• The practical portion consisted of a videotaped response to a factual situation presenting problems commonly encountered by fire department lieutenants and battalion chiefs.
• Plaintiffs claimed the practical test violated their due process and equal protection rights under the Fourteenth Amendment.
Holding
The practical test did not violate Plaintiffs’ rights under the Fourteenth Amendment.
Recommendations Case Study: Firefighters United for Fairness v. City of Memphis (cont’d)
Reasoning
Fairness in grading
• The court upheld the use of two raters to grade transcripts of the practical video components of the test using an answer key developed by subject matter experts. According to the court, this system “ensured that the capricious whim of individual assessors would not contribute to any alleged incorrect scores.” Firefighters United for Fairness v. City of Memphis, 362 F. Supp. 2d 963, 972 (W.D. Tenn. 2005).
Fairness in review
• The City established a multi-level review process:
- Candidates were permitted to review the practical video, the transcript of the practical video, and the raters’ answer key, and to submit “redlines” citing specific concerns with their tests.
- Subject matter experts reviewed the redlines and changed scores to reflect problems inherent in the form, content, or grading of the test, where appropriate.
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists Job Analysis • Does the job analysis define the knowledge, skills, and abilities that compose the important and/or critical aspects of the job in question? • Was the job analysis conducted specifically for the job in question? • Is the job analysis current and based on a relevant candidate population?
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists Criterion-Related Validity
• If possible, were criterion-related validity studies conducted?
- Concurrent study?
- Predictive study?
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists Cut Score • Was a cut score study conducted with a representative sample of subject-matter experts (e.g., Modified Angoff Technique)? • Has the cut score process been documented?
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists Rater Training • Has a standardized rater training program been established? • Does the rater training include opportunities to ensure rater reliability? • Is follow-up training provided on a regular basis? • Is rater data reviewed on a regular basis to identify changes in rating trends?
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists Candidate Appeal Process • Is there an avenue for candidates to provide feedback or submit an appeal regarding an examination administration? • Is that avenue well documented and publicly available? • Are there designated resources available for addressing feedback and appeals?