Seven Steps for Developing a Valid Paper-and-Pencil Written Promotional Examination Using Content Validation Dan A. Biddle, Ph.D., CEO Stacy L. Bell, Director Fire & Police Selection, Inc. March 17, 2006
Presentation Overview
The presentation will:
• Focus on requirements set forth by the Uniform Guidelines on Employee Selection Procedures (1978) to develop a job-related written test.
• Be limited to job knowledge written tests used for promotion.
• Review the steps necessary to use content validation to support a job knowledge written test.
The presentation will not:
• Provide a comprehensive review of litigated cases involving job knowledge written tests used for promotion.
• Provide legal advice on handling litigation.
• Provide a prescription/recipe strategy that can be used for all circumstances.
Proceed with Caution… • Depending on the size and type of your employer, the following agencies or parties can initiate litigation against your testing practices: • Equal Employment Opportunity Commission (EEOC) • Department of Justice (DOJ) • Office of Federal Contract Compliance Programs (OFCCP) (under DOL) • Private Plaintiff Attorney
Examples of Litigated Promotional Written Test Processes • Bouman v. Block, 940 F.2d 1211 (9th Cir. 1991). • Brown v. City of Chicago, 8 F.Supp.2d 1095 (N.D. Ill. 1998). • Hearn v. City of Jackson, Miss., 110 Fed. Appx. 424 (5th Cir. 2004). • Isabel v. City of Memphis, F.Supp.2d, 2003 (6th Cir. 2003). • Paige v. State of California, 102 F.3d 1035, 1040 (9th Cir. 1996).
Disparate Impact Litigation & the Civil Rights Act of 1991 Amends Section 703 of the 1964 Civil Rights Act (Title VII) (k)(1)(A). An unlawful employment practice based on disparate impact is established under this title only if: • A(i) a complaining party demonstrates that a respondent uses a particular employment practice that causes a disparate impact on the basis of race, color, religion, sex, or national origin, and the respondent fails to demonstrate that the challenged practice is job-related for the position in question and consistent with business necessity; OR, • A(ii) the complaining party makes the demonstration described in subparagraph (C) with respect to an alternative employment practice, and the respondent refuses to adopt such alternative employment practice.
How Can Testing Practices be Challenged?
[Flowchart: Title VII Disparate Impact Discrimination, with the statute's two prongs joined by an "OR"]
Right Test…Wrong Use • Brown v. City of Chicago, 8 F.Supp.2d 1095 (N.D. Ill. 1998). • The City of Chicago police promotional written test used for lieutenant resulted in gross disparate impact against minorities. • An alternative employment practice was approved whereby the City suggested combining merit promotions with the rank-order promotions. The City of Chicago failed to incorporate this alternative employment practice (AEP) into its practices. • Despite the overwhelming need for more minority representation, the City made promotions based on a rank-order list.
Successful and Unsuccessful Promotional Written Test Challenges
• Hearn v. City of Jackson, Miss., 110 Fed. Appx. 424 (5th Cir. 2004). Black police officers brought action under Title VII alleging that the selection of individuals for the sergeant position discriminated against Black officers. Plaintiffs argued that all applicants should have been allowed to proceed through all three stages of the selection process rather than be cut at the written test, which had adverse impact. The court ruled that the City had supported the appropriateness of the written test through a job analysis and that the written test was a legitimate and cost-effective method to screen out candidates on the basis of weakness in the technical and legal knowledge required on the job.
• Isabel v. City of Memphis, F.Supp.2d, 2003 (6th Cir. 2003). Sergeants in the Memphis Police Department brought action under Title VII alleging that the selection of individuals for the lieutenant position discriminated against Black candidates. The City had negotiated a 70% cutoff for the test, which had disparate impact against Black candidates. The court ruled that the cutoff score was invalid: "To validate a cutoff score, the inference must be drawn that the cutoff score measures minimal qualifications…" The district court found that the cutoff score was "nothing more than an arbitrary decision and did not measure minimal qualifications."
Disclaimer • The steps outlined in this presentation are based on the requirements outlined by the Uniform Guidelines (1978), the Principles, and the Standards. • The proposed model is not intended to be a one-size-fits-all model, but rather a generic template that could be employed in an ideal setting. • While we cannot guarantee that following these steps will avoid litigation, implementing the practices outlined in this presentation will greatly increase your likelihood of success in the event of a challenge to your promotional written testing process.
7 Steps for Content Validation of a Job Knowledge Written Test for Promotions • Step #1: Conduct a Job Analysis • Step #2: Develop a Selection Plan • Step #3: Identify Test Plan Goals • Step #4: Develop the Test Content • Step #5: Validate the Test • Step #6: Compile the Test • Step #7: Post-Administration Analyses
Step 1: Conduct a Job Analysis • 1-2 page job descriptions are ALMOST NEVER SUFFICIENT for showing validation under the Guidelines…unless, at a bare minimum, they include: • Importance ratings • Job expert input • Operationally defined KSAs • Duty/KSA linkages • In 90% of circumstances where validity is required, we find that new, updated job analyses need to be developed. • A house is only as strong as its foundation.
Creating a Uniform Guidelines Style Job Analysis
• According to the Guidelines, Job Duty ratings include at least:
• Frequency (15B3, 14D4)
This duty is (Select one option from below) by me or other active (xxxx) in my department.
1 not performed
2 performed less than yearly
3 usually performed quarterly to yearly
4 usually performed monthly to quarterly
5 usually performed daily to weekly
6 usually performed several times a day
• Importance (14B2; 14C1, 2, 4; 14D2, 3; 15B3; 15C3, 4, 5; 15D3)
Competent performance of this duty is (Select one option from below) for the job of (xxxx) in my department.
1 not required
2 not important – trivial or of minor significance to the performance of the job
3 important – helpful and/or meaningful to the performance of the job
4 critical – necessary for the performance of the job
5 extremely critical – necessary for the performance of the job, with more extreme consequences
Creating a Uniform Guidelines Style Job Analysis
• According to the Guidelines, Job Duty ratings include at least:
• Differentiating "Best Worker" Ratings (14C9) (Content Validity only)
Above-minimum performance of this duty makes (Select one option from below) in the overall job performance for the job of (xxxx) in my department.
1 little or no difference
2 some difference
3 a significant difference
4 a very significant difference
Note: Performance differentiating ratings are only necessary if ranking or higher-than-minimum-competency cutoffs will be used for WORK SAMPLE types of tests, and are therefore not typically necessary for job knowledge tests.
Creating a Uniform Guidelines Style Job Analysis
• According to the Guidelines, KSAPC ratings include at least:
• Frequency (15B3, 14D4)
This KSAPC is (Select one option from below) by me or other active (xxxx) in my department.
1 not performed
2 performed less than yearly
3 usually performed quarterly to yearly
4 usually performed monthly to quarterly
5 usually performed daily to weekly
6 usually performed several times a day
• Importance (14B2; 14C1, 2, 4; 14D2, 3; 15B3; 15C3, 4, 5; 15D3)
This KSAPC is (Select one option from below) for the job of (xxxx) in my department.
1 not required
2 not important – trivial or of minor significance to the performance of the job
3 important – helpful and/or meaningful to the performance of the job
4 critical – necessary for the performance of the job
5 extremely critical – necessary for the performance of the job, with more extreme consequences
Creating a Uniform Guidelines Style Job Analysis
• According to the Guidelines, KSAPC ratings include at least:
• Differentiating "Best Worker" Ratings (14C9) (Content Validity only)
Above-minimum performance of this KSAPC makes (Select one option from below) in the overall job performance for the job of (xxxx) in my department.
1 little or no difference
2 some difference
3 a significant difference
4 a very significant difference
• When Needed (5F; 14C1)
Use one of the following ratings to indicate when this KSAPC is needed for the (xxxx) position in your department.
1 This knowledge, skill, ability, or personal characteristic is fully developed on the job.
2 This knowledge, skill, ability, or personal characteristic is generally required at entry, but is developed on the job (that is, it is more fully developed or learned through job training or job experience).
3 This knowledge, skill, ability, or personal characteristic is fully required at entry (that is, a firefighter in my department is required to have already learned or developed this before they start the job).
Creating a Uniform Guidelines Style Job Analysis
• Mastery Level Scale
The level of job knowledge held must be at a (Select one option from below) level for successful job performance.
1 – Low: none or only a few general concepts or specifics available in memory, in none or only a few circumstances, without referencing materials or asking questions.
2 – Familiarity: some general concepts and some specifics available in memory, in some circumstances, without referencing materials or asking questions.
3 – Working knowledge: most general concepts and most specifics available in memory, in most circumstances, without referencing materials or asking questions.
4 – Mastery: almost all general concepts and almost all specifics available in memory, in almost all circumstances, without referencing materials or asking questions.
Creating a Uniform Guidelines Style Job Analysis
• Mastery Level Scale
• The data from these ratings are useful for choosing the job knowledges that should be included in a written job knowledge test. We suggest including only job knowledges that have average ratings of 3.0 or higher on written job knowledge tests.
• Section 14C(4) of the Guidelines requires that job knowledges measured on a test be ". . . operationally defined as that body of learned information which is used in and is a necessary prerequisite for observable aspects of work behavior of the job."
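As a rough illustration of that 3.0 screen, the sketch below averages each job knowledge's mastery ratings across job experts and keeps only those at "working knowledge" or above. This is our illustration, not part of the presentation; the knowledge names, ratings, and the mastery_ratings structure are all hypothetical (Python).

from statistics import mean

MASTERY_CUTOFF = 3.0  # "working knowledge" or higher on the 1-4 scale above

# Hypothetical data: one 1-4 mastery rating per job expert, per job knowledge
mastery_ratings = {
    "Rules of evidence": [4, 3, 4, 3],
    "Department rules & regulations": [3, 3, 4, 4],
    "Radio codes": [2, 3, 2, 2],
}

# Keep only job knowledges whose mean rating meets the suggested 3.0 cutoff
testable = {k: mean(v) for k, v in mastery_ratings.items()
            if mean(v) >= MASTERY_CUTOFF}
print(testable)  # {'Rules of evidence': 3.5, 'Department rules & regulations': 3.5}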
Creating a Uniform Guidelines Style Job Analysis
• Duty/KSA Linkages (14C4, Content Validity only)
This KSAPC is ________ to the performance of this duty.
1 not important
2 of minor importance
3 important
4 of major importance
5 critically important
Examples of a Strong and Weak KSAPC • Example of a weak KSAPC: • Knowledge of ventilation practices. • Example of a strong KSAPC: • Knowledge of ventilation practices and techniques to release contained heat, smoke, and gases in order to enter a building. Includes application of appropriate fire suppression techniques and equipment, e.g., manual and power tools and ventilation fans.
Content Validity: Essential (Job Analysis) • 58. Q. Is a full job analysis necessary for all validity studies? • A. It is required for all content and construct studies, but not for all criterion related studies . . . Measures of the results or outcomes of work behaviors such as production rate or error rate may be used without a full job analysis where a review of information about the job shows that these criteria are important to the employment situation of the user. Similarly, measures such as absenteeism, tardiness or turnover may be used without a full job analysis if these behaviors are shown by a review of information about the job to be important in the specific situation. A rating of overall job performance may be used without a full job analysis only if the user can demonstrate its appropriateness for the specific job and employment situation through a study of the job.
Step 2: Develop a Selection Plan • Review the KSAPCs from the job analysis. • Select only those KSAPCs that meet the following criteria: • Required at the time of hire (needed on day one) • Important or critical (necessary) for job performance • Mastery level of knowledge • Of the KSAPCs that meet these criteria, determine the best way to measure each (e.g., on a written test, in a physical ability test, in a structured oral interview, etc.), as sketched below. • Concentrate only on those KSAPCs that are best measured with a written test.
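As a sketch only, the selection-plan screen can be expressed in code. The field names, example KSAPCs, and values below are hypothetical; the numeric codes mirror the "when needed," importance, and mastery scales shown earlier (Python).

# Each KSAPC carries its job-analysis results; values follow the earlier scales
ksapcs = [
    {"name": "Knowledge of rules of evidence",
     "when_needed": 3,        # 3 = fully required at entry (day one)
     "importance": 4,         # 3+ = important, critical, or extremely critical
     "mastery": 3.4,          # mean mastery rating from job experts
     "best_measured_by": "written test"},
    {"name": "Ability to defuse conflicts",
     "when_needed": 3, "importance": 5, "mastery": 1.8,
     "best_measured_by": "structured oral interview"},
]

def passes_selection_plan(k):
    return (k["when_needed"] == 3       # required at the time of hire
            and k["importance"] >= 3     # important or critical for the job
            and k["mastery"] >= 3.0)     # mastery level of knowledge

# Concentrate on KSAPCs that pass the screen AND are best measured in writing
written_test_ksapcs = [k["name"] for k in ksapcs
                       if passes_selection_plan(k)
                       and k["best_measured_by"] == "written test"]
print(written_test_ksapcs)  # ['Knowledge of rules of evidence']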
Step 3: Identify Test Plan Goals • Identify the test sources. • Review relevant job-related materials and discuss the target job in considerable detail with job experts. This will focus attention on job-specific information for the job under analysis. • Review the knowledges that meet the necessary criteria and determine which sources and/or textbooks are best suited to measure the various knowledges. • Ensure that the sources do not contradict one another in content.
Preparatory Materials Offered to Candidates? • Ensure that materials are: • Current • Specific • Released to all candidates taking the test
Preparatory Sessions Offered to Candidates? • The use of study sessions has been shown to increase overall test performance and reduce adverse impact. • Be cautious of administering too many study sessions, as research suggests conducting multiple sessions may actually increase adverse impact. • Try to schedule study sessions at a location that is geographically convenient for all candidates and at a reasonable time of day. • Invite all candidates to attend and provide plenty of notice of the date and time.
Identify the Number of Test Items • Include enough items to ensure high test reliability. • A number of factors impact test reliability, but perhaps the single most important factor is the number and quality of test items per relevant KSAPC and in the test overall. • Consider using job expert input to determine internal weights for the written test (see the sketch after this list). • Job experts are provided with the list of KSAPCs to be measured and asked to distribute 100 points among the KSAPC list. • Ensure adequate sampling of KSAPCs. • A sufficient number of items should be developed to effectively measure each KSAPC at the desired level. • Note that some KSAPCs will require more items than others to make a sufficiently deep assessment. • Ensure proportional sampling of KSAPCs. • The test should be internally weighted in a way that ensures a robust measurement of the relevant KSAPCs.
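A sketch of turning those 100-point weights into per-KSAPC item counts for a fixed-length test. The KSAPC names, expert weights, and 100-item length are assumptions; largest-remainder rounding is one reasonable way to keep the allocated items summing exactly to the planned length (Python).

from statistics import mean

TEST_LENGTH = 100  # planned number of items (assumption for illustration)

# Each job expert distributes 100 points across the KSAPCs to be measured
expert_weights = {
    "Criminal code":     [30, 25, 33],
    "Patrol procedures": [45, 50, 42],
    "Report writing":    [25, 25, 25],
}

avg = {k: mean(w) for k, w in expert_weights.items()}       # mean weight per KSAPC
exact = {k: TEST_LENGTH * w / 100 for k, w in avg.items()}  # fractional item counts
items = {k: int(v) for k, v in exact.items()}               # floor each count first
# Give any leftover items to the KSAPCs with the largest fractional remainders
leftover = TEST_LENGTH - sum(items.values())
for k in sorted(exact, key=lambda k: exact[k] - items[k], reverse=True)[:leftover]:
    items[k] += 1
print(items)  # {'Criminal code': 29, 'Patrol procedures': 46, 'Report writing': 25}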
Determine the Type of Test Items • Following the determination of the length of the test and the number of items to be derived from each source, develop a test plan. • Use a process-by-content matrix to ensure adequate sampling of job knowledge content areas and problem-solving processes. Problem-solving areas involve the following: • A. Knowledge of terminology • B. Understanding of principles • C. Application of knowledge to new situations • While knowledge of terminology is important, the understanding and application of principles may be considered of primary importance.
Process-by-Content Matrix
PROCESS-BY-CONTENT MATRIX (Police Sergeant)
Source                                       DEF  PRINC  APP  TOTAL
1. Essentials of Modern Police Work            4     10   20     34
2. Community Policing                          3      7   13     23
3. Rules of Evidence                           3     10   17     30
4. Department Rules & Regulations              1      3    6     10
5. State Criminal Code                         4      5    9     18
6. State Vehicle Code                          4      6   10     20
7. City Ordinances                             2      2    6     10
8. Performance Appraisal Guidelines/
   Employee Ratings                            0      1    1      2
9. Labor Agreement with the City               0      1    2      3
Total                                         21     45   84    150
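A test plan like this is easy to audit in code. The sketch below, which is our illustration rather than part of the presentation, uses only the first three sources from the matrix above; it checks each row against its stated total and computes the column totals that show how heavily the plan weights application over recall (Python).

plan = {  # source: (DEF, PRINC, APP) item counts, taken from the matrix above
    "Essentials of Modern Police Work": (4, 10, 20),
    "Community Policing": (3, 7, 13),
    "Rules of Evidence": (3, 10, 17),
}
stated_totals = {"Essentials of Modern Police Work": 34,
                 "Community Policing": 23,
                 "Rules of Evidence": 30}

for source, (d, p, a) in plan.items():
    assert d + p + a == stated_totals[source], f"row total off for {source}"

# Column totals reveal the process emphasis across these three sources
col_totals = [sum(counts[i] for counts in plan.values()) for i in range(3)]
print(col_totals)  # [10, 27, 50]: applications dominate definitions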
Consider Bloom's Taxonomy • The use of Bloom's Taxonomy (1956) is very helpful when writing items intended to measure knowledges at various levels. • Job experts should consider how the knowledge is applied on the job (e.g., factual recall, application, analysis, etc.). • Oftentimes the higher levels of Bloom's Taxonomy are best measured in a testing format other than a written test (e.g., an interview, an essay, etc.).
Step 4: Develop the Test Content • Select a diverse panel of 4-10 job experts (with a minimum of one year of experience). • Review the selection plan to ensure that the job experts understand the parameters. • Have job experts sign a "Confidentiality Form." • Train job experts on item-writing (if they are writing items). • Job experts write items to be reviewed by others on the panel. • Job experts review each other's items. • Ensure proper grammar, style, and consistency. • Ensure that the selection plan and test plan requirements are met. • Ensure that all items meet the criteria set forth on the "Validation Survey" form. • Develop extra items! • Create the final test version for the panel to review.
Step 5: Validate the Test • Using the "Validation Survey" form, job experts assign various ratings to the items in the test bank, including ratings: • on the quality of each test item • on the job-relatedness of each test item • regarding the appropriate level of each test item (i.e., difficulty) • to ensure that the test items are based on current information • to determine if the item measures an aspect of job knowledge that must be memorized • to determine the consequence of error if the applicant does not possess the knowledge required to answer the item • Identify an appropriate time limit. • A common rule of thumb used by practitioners to determine a written test time limit is to allow one minute per test item plus thirty additional minutes (e.g., a 150-item test would yield a three-hour time limit). • A reasonable time limit would allow 95% of the candidates to complete the test within the time limit.
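Both time-limit checks reduce to a few lines of code. In this sketch the pilot completion times are invented for illustration (Python).

def rule_of_thumb_minutes(n_items):
    # One minute per item plus thirty additional minutes
    return n_items + 30

limit = rule_of_thumb_minutes(150)                       # 180 minutes = 3 hours
pilot_times = [142, 150, 163, 171, 174, 178, 181, 195]   # minutes, hypothetical

# Share of pilot candidates finishing within the limit; the goal is >= 95%
finished = sum(t <= limit for t in pilot_times) / len(pilot_times)
print(limit, f"{finished:.0%}")  # 180 75% -> this limit would be too tight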
Step 6: Compile the Test • Analyze the "Angoff" ratings provided by job experts. • Discard raters whose ratings are statistically different from those of the other raters. • Evaluate rater reliability • Evaluate high/low rater bias • Calculate the overall difficulty level of the test (called the "unmodified Angoff" level)
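A sketch of these Angoff computations in Python. The rater names and ratings are hypothetical, and the one-standard-deviation outlier screen is an illustrative choice, since the presentation does not specify a flagging rule.

from statistics import mean, stdev

# Each rating estimates the probability that a minimally qualified candidate
# answers the item correctly; one list of item ratings per rater (hypothetical)
angoff = {
    "Rater A": [0.70, 0.60, 0.80, 0.65],
    "Rater B": [0.75, 0.55, 0.85, 0.60],
    "Rater C": [0.40, 0.30, 0.45, 0.35],  # far more severe than the panel
}

rater_means = {r: mean(v) for r, v in angoff.items()}
grand_mean = mean(rater_means.values())
spread = stdev(rater_means.values())

# Flag raters whose overall severity/leniency departs sharply from the panel;
# with so few raters, the one-SD screen here is purely illustrative
outliers = [r for r, m in rater_means.items() if abs(m - grand_mean) > spread]

kept = [r for r in angoff if r not in outliers]
# Unmodified Angoff level: mean retained rating, i.e., the expected proportion
# correct for a minimally qualified candidate
unmodified_angoff = mean(x for r in kept for x in angoff[r])
print(outliers, unmodified_angoff)  # ['Rater C'] 0.6875

With these toy numbers, Rater C is flagged and the unmodified Angoff level comes out to 0.6875, meaning a minimally qualified candidate would be expected to answer about 69% of the items correctly.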
Step 7: Post-Administration Analyses • Conduct an item-level analysis to identify: • Point biserials • Item difficulty • Differential Item Functioning (DIF) (see the caution below) • Remove bad items and adjust the unmodified Angoff • Conduct a test-level analysis to assess descriptive and psychometric statistics (e.g., reliability) • Calculate the Standard Error of Measurement or Conditional Standard Error of Measurement
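A condensed sketch of the item- and test-level statistics named above, computed on an invented 0/1 response matrix; a real analysis would use the full candidate pool and many more items (Python).

from math import sqrt
from statistics import mean, pvariance

responses = [  # one row per candidate, one 0/1 score per item (hypothetical)
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
n_items = len(responses[0])
totals = [sum(row) for row in responses]

def point_biserial(i):
    # Pearson r between item i's 0/1 scores and the rest-of-test score
    x = [row[i] for row in responses]
    y = [t - xi for t, xi in zip(totals, x)]  # rest score avoids self-inflation
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(pvariance(x) * pvariance(y))

difficulty = [mean(row[i] for row in responses) for i in range(n_items)]

# KR-20 reliability for dichotomous items, then the SEM derived from it;
# reliability will be low for a 4-item toy example like this one
pq = sum(p * (1 - p) for p in difficulty)
var_total = pvariance(totals)
kr20 = (n_items / (n_items - 1)) * (1 - pq / var_total)
sem = sqrt(var_total) * sqrt(1 - kr20)

print([round(point_biserial(i), 2) for i in range(n_items)],
      [round(p, 2) for p in difficulty], round(kr20, 2), round(sem, 2))

Items with near-zero or negative point biserials are candidates for removal; after removing items, KR-20, the SEM, and the unmodified Angoff level should be recomputed.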
Before removing test items based on DIF analyses, consider the lesson offered in Hearn v. City of Jackson (Aug. 7, 2003): • FN11. "Plaintiffs suggest in their post-trial memorandum that the test is subject to challenge on the basis that they failed to perform a DIF analysis to determine whether, and if so on which items, blacks performed more poorly than whites, so that an effort could have been made to reduce adverse impact by eliminating those items on which blacks performed more poorly. . . . Dr. Landy testified that the consensus of professional opinion is that DIF modifications of tests is not a good idea because it reduces the validity of the examination." • Dr. Landy explained: "The problem with [DIF] is suppose one of those items is a knowledge item and has to do with an issue like Miranda or an issue in the preservation of evidence or a hostage situation. You're going to take that item out only because whites answer it more correctly than blacks do, in spite of the fact that you'd really want a sergeant to know this [issue] because the sergeant is going to supervise. A police officer is going to count on that officer to tell him or her what to do. So you're reducing the validity of the exam just for the sake of making sure that there are no items in which whites and blacks do differentially, or DIF, and he's assuming that the reason that 65 percent of the blacks got it right and 70 percent of the whites got it right was that it's an unfair item rather than, hey, maybe two or three whites or two or three blacks studied more or less that section of general orders."
Summary • If the test has adverse impact, validate. • Address the Uniform Guidelines, Principles, and Standards (in that order). • A house is only as strong as its foundation: build a solid job analysis. • Consider alternate test uses. • Use the "Test Validation Survey" form.