Lecture: Wednesday, 3/04. Lecture: Wednesday, 3/18. Exam: Monday, 3/23. Spring break: 3/09 and 3/11. ME1: Monday, 3/16. Last day to withdraw: Monday, 3/23. PSY 6430 Unit 5: Validity. Determining whether the selection instruments are job-related.
SO1: NFE, Validity, a little review • Predictor = test/selection instrument • Use the score from the test to predict who will perform well on the job • Possible confusion (again) • You need to determine the validity of the test based on your current employees • Then you administer it to applicants and select employees based on the score (a few students had a problem distinguishing between validity and reliability on E4, example next)
SO1: NFE, Validity, example • Administer a test to current employees • Obtain measures of how well they perform on the job • Correlate the test scores with the performance measures • Assume: The correlation is statistically significant • Assume: Current employees who score 50-75 also are performing very well on the job • Now you administer the exam to applicants, predicting that those who score 50-75 will also perform well on the job (main point next slide)
SO1: NFE, Validity main point • You determine the validity of a selection test or instrument based on your current employees • Then after establishing the validity or job relatedness of the test • Give the test to applicants and select them on the basis of their test scores
SO2: Reliability vs. Validity • Reliability Is the score on the measure stable, dependable, and/or consistent? • Validity Conceptual Definition: Are you actually measuring what you want to be measuring? Operational Definition: Is the measure related to performance on the job?
SO3: Relationship between reliability and validity • A measure can be reliable, but not valid • However, a measure cannot be valid unless it is reliable • *Reliability is a necessary but not sufficient condition for validity • Text gives a perfect example: you can reliably measure eye color; however, it may not be related to job performance at all *key point
Types of validation procedures • Content: expert judgment • Criterion-related: statistical analyses (concurrent & predictive) • Construct (not practical; not covering this) • Validity generalization (transportable, no local validity study – jobs are similar) • Job component validity (not covering this in this unit, but will return to it briefly in the next unit; uses broad job elements/components based on all possible jobs) • Small businesses: Synthetic validity (not covering it, not very relevant now – content validity) (main types are the two kinds of criterion-related and content validity; construct is really a holdover from test construction - not very relevant - I have only seen this used by a few organizations that create their own tests; cover validity generalization, but right now, while validity generalization has excellent professional support, it may not be legal - professional guidelines depart from legal; in one case, the 6th Circuit Court ruled it illegal as a matter of law based on Griggs/Duke and Albemarle - 1987)
SO5 NFE but 7B is: Difference between content and criterion-related validity • Criterion-related validity is also called “empirical” validity • Concurrent validity • Predictive validity • This type of validity relies on statistical analyses (correlation of test scores with measures of job performance) • Measures of job performance = criterion scores (content next slide)
SO5 NFE but related to 7B which is: Difference between content and criterion-related validity • Content validity, in contrast, relies on expert judgment and a match between the “content” of the job and the “content” of the test • Expert judgment refers to • the determination of the tasks and KSAs required to perform the job via a very detailed type of job analysis • linking the KSAs to selection procedures that measure them
Intro to content validity • You do NOT use statistical correlation to validate your tests • Validation is based “only” on your job analysis procedures and descriptively linking the KSAs to selection measures • It is much more widely used than criterion-related validity • Particularly since Supreme Court ruled it was OK to use for adverse impact cases (1995) (again, to emphasize)
SO6: Two reasons why content validity is often used • It can be used with small numbers of employees • Large sample sizes are required to use criterion-related validity due to the correlation procedures • The text, later when discussing criterion-related validity, indicates you may need several hundred or more • Dickinson: usually 50-100 is adequate • How many companies have that many current employees in one position? (small number of incumbents)
SO6: Two reasons why content validity is often used • Many organizations do not have good job performance measures • You need good performance criterion measures to do a criterion-related validity study because you correlate the test scores with job performance measures
SO7A: Content vs. criterion-related validity and the type of selection procedure • If you use content validity you should write the test, not select an off-the-shelf test • If you use criterion-related validity, you can do either • It is much easier and less time consuming to use an off-the-shelf test than to write one! (VERY IMPORTANT!; book waffles on this a bit, indicating that emphasis should be placed on constructing a test, but only in rare situations would I recommend selecting an off-the-shelf test with content validity - legally too risky; why, next slide)
SO7A: Why should you write the test if you use content validity? (this slide, NFE) • Content validity relies solely on the job analysis • The KSAs must be represented proportionately on the selection test as indicated in the job analysis in terms of: • Their relative importance to the job • The percentage of time they are used by the employees • It is highly unlikely that an off-the-shelf test will proportionately represent the KSAs as determined by your job analysis • In some discrimination court cases, the judge has gone through the test item by item to determine whether the items were truly proportional to the KSAs as determined by the job analysis • Both professional measurement reason and legal reason to write the test rather than using an off-the-shelf test
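The proportionality requirement on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the KSA names and weights are hypothetical, and combining importance and percentage of time into a single weight per KSA is an assumption made here, not a procedure prescribed by the text or the Uniform Guidelines.

```python
def allocate_items(ksa_weights, total_items):
    """Split total_items across KSAs in proportion to their weights.

    Uses the largest-remainder method so the counts always sum to total_items.
    """
    total_weight = sum(ksa_weights.values())
    exact = {k: total_items * w / total_weight for k, w in ksa_weights.items()}
    counts = {k: int(v) for k, v in exact.items()}
    leftover = total_items - sum(counts.values())
    # Hand any remaining items to the KSAs with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# Hypothetical weights from a job analysis (assumed: one combined weight per KSA).
ksa_weights = {
    "arithmetic": 4.5,
    "reading blueprints": 3.0,
    "mechanical ability": 2.5,
}
blueprint = allocate_items(ksa_weights, total_items=40)
print(blueprint)
```

An off-the-shelf test would have to match a blueprint like this one item count for item count, which is why it is unlikely to fit a specific job analysis.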
SO7B: Content vs. criterion-related validity: Differences in the basic method used to determine validity (review) • Content validity • Relies solely on expert judgment - no statistical verification of job-relatedness • Criterion-related validity • Relies on statistical verification to determine job-relatedness (I am not going to talk about SO8, face validity; very straightforward)
SO9: What is the “heart” of any validation study and why? • Job analysis • The job analysis determines the content domain of the job – the tasks and KSAs that are required to perform the job successfully
SO10: Major steps of content validity - very, very specific requirements for the job analysis • Describe tasks for the job • *Determine the criticality and/or importance of each of the tasks • Specify the KSAs required for EACH task • KSAs must be linked to each task (NFE) *Now because of ADA, is it an essential function? (cont. next slide)
SO10: Major steps of content validity, cont. • Determine the criticality and/or importance of each KSA* • Operationally define each KSA • Describe the relationship between each KSA and each task statement • You can have KSAs that are required for only one or two tasks, or you can have KSAs that are required to perform several tasks • The more tasks that require the KSAs, the more important/critical they are • Describe the complexity or difficulty of obtaining each KSA (formal degree, experience) • Specify whether the employee must possess each KSA upon entry or whether it can be acquired on the job (cannot test for a KSA if it can be learned within 6 months) • Indicate whether each KSA is necessary/essential for successful performance of the job *Only the first major point will be required for the exam, but I want to stress how detailed your job analysis must be for content validity (cont on next slide)
SO10: Major steps of content validity, cont. • Link important job tasks to important KSAs* (FE) • Reverse analysis; you have linked the tasks to the KSAs, now you must link the KSAs to the tasks (NFE) • KSA # 1 may be relevant to Tasks 1, 6, 7, 10, 12, & 22 • KSA # 2 may be relevant to Tasks 2, 4, & 5 • Etc. • (NFE) Develop test matrix for the KSAs • If you want to see how you go from the task analysis to the actual test, turn ahead to Figures 7.12, 7.13, 7.14, 7.15, and 7.16 on pages 283-286 and Figure 7.17 on page 290
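The reverse analysis on this slide amounts to inverting a mapping: the job analysis records which KSAs each task requires, and the reverse step lists, for each KSA, the tasks that depend on it. A minimal Python sketch follows; the task-to-KSA links are hypothetical and were chosen only so the result matches the two KSA examples on the slide.

```python
from collections import defaultdict

# Hypothetical task -> KSA links from the job analysis (task number: KSA numbers).
task_to_ksas = {
    1: [1],
    2: [2],
    4: [2],
    5: [2],
    6: [1],
    7: [1],
    10: [1],
    12: [1],
    22: [1],
}


def invert_links(task_to_ksas):
    """Return a dict mapping each KSA to the sorted list of tasks that require it."""
    ksa_to_tasks = defaultdict(list)
    for task, ksas in task_to_ksas.items():
        for ksa in ksas:
            ksa_to_tasks[ksa].append(task)
    return {ksa: sorted(tasks) for ksa, tasks in ksa_to_tasks.items()}


ksa_to_tasks = invert_links(task_to_ksas)
for ksa, tasks in sorted(ksa_to_tasks.items()):
    print(f"KSA #{ksa}: tasks {tasks}")
```

The length of each task list gives a rough count of how many tasks rely on a KSA, which is one input to judging how important or critical that KSA is.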
SO11: When you can’t use content validity according to the Uniform Guidelines • When assessing mental processes, psychological constructs, or personality traits that cannot be directly observed, but are only inferred • You cannot use content validity to justify a test for judgment, integrity, dependability, extroversion, flexibility, motivation, conscientiousness, adaptability, or any personality characteristic • The reason for that is that you are basing your job analysis on expert judgment - and judgment is only going to be reliable if you are dealing with concrete KSAs such as mechanical ability, arithmetic ability or reading blueprints • The more abstract the KSA, the less reliable judgment becomes • If you can’t see it, if you can’t observe it, then the leap from the task statements to the KSAs can result in a lot of error (text mentions three; I am having you learn the first one and one I added in the SOs -- these are the two that are most violated in practice; the second one is relevant to BOTH content and criterion-related so shouldn’t be listed under when you can’t use content validity: cannot test for KSAs that can be learned on the job)
SO11: When you can’t use content validity according to the Uniform Guidelines, cont. • When selection is done by ranking test scores or banding them (from U1) • If you rank order candidates based on their test scores and select on that basis, you cannot use content validity - you must use criterion-related validity • If you band scores together, so those who get a score in a specified range of scores are all considered equally qualified, you cannot use content validity - you must use criterion-related validity • Why? If you use ranking or banding, you must be able to prove that individuals who score higher on the test will perform better on the job - the only way to do that is through the use of statistics. The only appropriate (and legally acceptable) cut-off score procedure to use is a pass/fail system where everyone above the cut-off score is considered equally qualified (only relevant if adverse impact)
Criterion-related validity studies: Concurrent vs. predictive • SO13A: Concurrent validity Administer the predictor to current employees and correlate scores with measures of job performance. Concurrent in the sense that you have collected both measures at the same time for current employees • SO18A: Predictive validity Administer the predictor to applicants, hire the applicants, and then correlate scores with measures of job performance collected 6-12 months later. Predictive in the sense that you do not have measures of job performance when you administer the test - you collect them later (comparison of the two, SO13A, describe concurrent validity; SO18A, describe predictive validity)
Predictive Validity: Three basic ways to do it • Pure predictive validity: by far the best Administer the test to applicants and randomly hire • Current system: next best, more practical Administer the test to applicants, use the current selection system to hire (NOT the test) • Use test to hire: bad, bad, bad both professionally and legally Administer the test, and use the test scores to hire applicants (going to come back to these and explain the evaluations; text lists the third as an approach! Click: NO!!)
SO13B: Steps for conducting a concurrent validity study • Job analysis: Absolutely a legal requirement • Discrepancy between law and profession (learn for exam) • Law requires a job analysis (if adverse impact & challenged) • Profession does not as long as the test scores correlate significantly with measures of job performance • Determine KSAs and other relevant requirements from the job analysis, including essential functions for purposes of ADA • Select or write test based on KSAs (learn for exam) • May select an off-the-shelf test or • Write/construct one
SO13B: Steps for conducting a concurrent validity study • Select or develop measures for job performance • Sometimes a BIG impediment because organizations often do not have good measures of performance • Administer test to current employees and collect job performance measures for them • Correlate the test scores with the job performance measures • (SO14: add this step) Determine whether the correlation is statistically significant at the .05 level. If it is, you can then use the test to select future job applicants
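The last two steps (correlate, then check significance at the .05 level) can be sketched in Python as follows. Everything here is illustrative: the scores for 12 employees are made up (far fewer than the 50-100 recommended), and the significance check uses a permutation test as a stand-in, since the slides do not say which test of significance to use.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data for 12 current employees (made up for illustration).
test_scores = [42, 55, 61, 48, 70, 66, 52, 75, 58, 63, 45, 72]
performance = [3.1, 3.8, 4.0, 3.0, 4.6, 4.1, 3.5, 4.8, 3.6, 4.2, 3.3, 4.4]

r = pearson_r(test_scores, performance)
p = permutation_p_value(test_scores, performance)
is_significant = p < 0.05
print(f"r = {r:.2f}, p = {p:.4f}, significant at .05: {is_significant}")
```

The permutation test asks how often a correlation this large would appear if test scores and performance were paired at random; a parametric test of the correlation coefficient would serve the same purpose.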
SO15A: Advantage of concurrent validity over predictive validity • Because you are using the test data and performance data from current employees, you can conduct the statistical validation study quickly – in a relatively short period of time • Remember, that with predictive validity, you must hire applicants and then wait 6-12 months to obtain measures of job performance (post-training, after they have learned the job)
SO15B&C: The basic reason that accounts for all of the weaknesses with concurrent validity • All of the weaknesses have to do with differences between your current employees and applicants for the job • You are conducting your study with one sample of the population (your employees) and assuming conceptually that your applicants are from the same population • However, your applicants may not be from the same population - they may differ in important ways from your current employees • Ways that would cause them (as a group) to score differently on the test or perform differently on the job, affecting the correlation (job relatedness) of the test *The first point is related to B, the other points are related to and essential to C. (text lists several weaknesses and all of them really relate to one issue; dealing with inferential statistics here)
SO15D: Some specific differences • Job Tenure: If your current employees have been on the job a long time, it is likely to affect both their test scores and job performance measures • Age & Education: Baby boomers vs. Generation Xers vs. millennials; high school vs. college vs. graduate degree • Different motivational level: employees already have a job, thus they may not be as motivated to perform well on the test; on personality measures, applicants may be more motivated to alter their responses to make themselves look good • Exclusiveness of current employees: sample doesn’t include those who were rejected, those who were fired, those who left the organization, and employees who were promoted which can affect both test and performance scores (SO asks you to learn any three)
SO16: Restriction in range • This is the term used for the statistical/mathematical reason why the differences between your current employees and applicants affect validity • It also explains, from the last unit, why reliability is generally higher when • Your sample consists of individuals who have greater differences in the ability for which you are testing • High school students, community college students, vs. engineering majors in college who take a math test • The questions are moderately difficult – about 50% of test takers answer the questions correctly – rather than when the questions are very easy or very difficult
SO16: Restriction in range • With criterion-related validity studies the ultimate proof that your selection test is job related is that the correlation between the test scores and job performance measures is statistically significant • A high positive correlation tells you • People who score well on the test also perform well • People who score middling on the test are also middling performers • People who score poorly on the test also perform poorly on the job • In order to obtain a strong correlation you need • People who score high, medium, and low on the test • People who score high, medium, and low on the performance measure (before really understanding the weaknesses related to concurrent validity and why pure predictive validity is the most sound type of validation procedure, you need to understand what “restriction in range” is and how it affects correlation coefficient; related to some of the material from the last unit on reliability - so if you understood it in that context, this is the same conceptual issue)
SO16: Restriction in range, cont. • That is, you need a range of scores on BOTH the test and the criterion measure in order to get a strong correlation • If you only have individuals who score about the same on the exam, regardless of whether some perform well, middling, and poorly, you will get a zero correlation • Similarly if you have individuals who score high, medium, and low on the test, but they all perform reasonably the same, you will get a zero correlation • Any procedure/factor that decreases the range of scores on either the test or the performance measure • Reduces the correlation between the two and, hence, • Underestimates the true relationship between the test and job performance • That is, you may conclude that your test is NOT valid, when in fact, it may be
SO16: Restriction in range, cont. • Restriction in range is the technical term for the decrease in the range of scores on either or both the test and criterion • Concurrent validity tends to restrict the range of scores on BOTH the test and criterion, hence underestimating the true validity of a test (stress the either or both; cont on next slide)
SO16: Restriction in range, cont. Also related to SO17A&B • Why? You are using current employees in your sample • Your current employees have not been fired because of poor performance • Your current employees have not voluntarily left the company because of poor performance • Your current employees have been doing the job for a while and thus are more experienced • All of the above would be expected to • Result in higher test scores than for the population of applicants • Result in higher performance scores than for the population • Thus, restricting the range of scores on both the test and the performance criterion measure (diagrams on next slide)
SO16: Restriction in range, cont. [Figure: two scatterplots, each with Test Scores (low to high) on the x-axis and Performance (low to high) on the y-axis] • Top diagram • No restriction in range • Strong correlation • Bottom diagram • Restriction in range • Test scores and • Performance scores • Zero correlation (extreme example, but demonstrates point - concurrent validity is likely to restrict range on both, underestimating true validity)
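The effect the two diagrams illustrate can be reproduced with a small simulation. The sketch below is hypothetical throughout: it generates an applicant pool in which the test is built to relate to performance, then keeps only the people who are high on both measures (a rough stand-in for current employees) and recomputes the correlation. The cut-offs and the strength of the relationship are arbitrary assumptions.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


rng = random.Random(42)

# Simulated applicant pool: performance depends partly on the test score.
n = 2000
test = [rng.gauss(50, 10) for _ in range(n)]
perf = [0.6 * (t - 50) / 10 + 0.8 * rng.gauss(0, 1) for t in test]

r_full = pearson_r(test, perf)

# "Current employees": only people above a cut-off on BOTH measures remain.
incumbents = [(t, p) for t, p in zip(test, perf) if t > 55 and p > 0]
r_restricted = pearson_r([t for t, _ in incumbents], [p for _, p in incumbents])

print(f"full range:       r = {r_full:.2f} (n = {n})")
print(f"restricted range: r = {r_restricted:.2f} (n = {len(incumbents)})")
```

The restricted correlation comes out well below the full-range one even though the underlying relationship is identical. It does not drop all the way to zero here; the bottom diagram is the extreme case.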
SO18: Predictive validity • SO18A: Predictive validity (review) Administer the predictor to applicants, hire the applicants, and then correlate scores with measures of job performance collected 6-12 months later. Predictive in the sense that you do not have measures of job performance when you administer the test - you collect them later; hence, you can determine how well your test actually predicts future performance
SO18B: Steps for a predictive validity study • Job analysis: Absolutely a legal requirement • Determine KSAs and other relevant requirements from the job analysis, including the essential functions for purposes of ADA • Select or write test based on KSAs* • You may select an off-the-shelf test or • Write/construct one • Select or develop measures for job performance *Learn this point for the exam (first four steps are exactly the same as for a concurrent validity study)
SO18B: Steps for a predictive validity study • Administer the test to job applicants and select randomly or using the existing selection system • Do NOT use the test scores to hire applicants (I’ll come back to this later) • After a suitable time period, 6-12 months, collect job performance measures • Correlate the test scores with the performance measures • (SO18B: add this step) Determine whether the correlation is statistically significant and if it is, your test is valid
SO19: Two practical (not professional) weaknesses of predictive validity • Time it takes to validate the test • Need appropriate time interval after applicants are hired before collecting job performance measures • If the organization only hires a few applicants per month, it may take months or even a year or more to obtain a large enough sample to conduct a predictive validity study (N=50-100)
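The time cost can be estimated with simple arithmetic. The sketch below assumes a hypothetical hiring rate of 4 people per month and ignores attrition; the sample sizes (50-100) and the waiting period (6-12 months) come from the slides.

```python
from math import ceil


def months_to_validate(target_n, hires_per_month, wait_months):
    """Months until the last needed hire has been on the job for wait_months."""
    months_hiring = ceil(target_n / hires_per_month)
    return months_hiring + wait_months


# Hypothetical hiring rate of 4 per month; N and waiting period from the slides.
best_case = months_to_validate(target_n=50, hires_per_month=4, wait_months=6)
worst_case = months_to_validate(target_n=100, hires_per_month=4, wait_months=12)
print(f"best case: {best_case} months, worst case: {worst_case} months")
```

Under these assumptions the study takes between roughly a year and a half and three years, and any new hires who leave before their performance is measured would extend it further.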
SO19: Two practical (not professional) weaknesses of predictive validity • Very, very difficult to get managers to ignore the test data (politically very difficult) • Next to impossible to get an organization to randomly hire - some poor employees ARE going to be hired • Also difficult to convince them to hire using the existing selection system without using the test score (but much easier than getting them to randomly hire and doable) (I don’t blame them; it would be like us randomly accepting students into the graduate program)
SO20A&B: Predictive validity designs • Figure 5.5 lists 5 types of predictive validity designs • Follow-up: Random selection (pure predictive validity) • Best design • No problems whatsoever from a measurement perspective; completely uncontaminated from a professional perspective • Follow-up: Use present system to select • OK and more practical, but • It will underestimate validity if your current selection system is valid; and the more valid it is the more it will underestimate the validity of your test • And, why will it underestimate the validity? (answer not on slide)
SO20C: Predictive validity, selection by scores • Select by test score: Do NOT do this!!! • Professional reason: • If your selection procedure is job related, it will greatly underestimate your validity - and, the more job related the selection procedure is, the more it will underestimate validity. • In fact, you are likely to conclude that your test is not valid when in fact it is • Why? If your test is valid, you are severely restricting the range on both your test and your job performance measures! (professional and legal reasons not to do this)
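The three ways of running a predictive study can be compared with a simulation. This is a sketch under invented assumptions: the applicant pool, the strength of the test, and the strength of the existing selection system are all made up, and 30% of applicants are hired under each rule. The observed validity is the correlation between test score and later performance among the people actually hired.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def validity_among(hired):
    """Correlation between test score and later performance for the hired group."""
    return pearson_r([a["test"] for a in hired], [a["perf"] for a in hired])


rng = random.Random(7)

# Hypothetical applicants: test, current system, and performance all reflect ability.
applicants = []
for _ in range(20000):
    ability = rng.gauss(0, 1)
    applicants.append({
        "test": 0.8 * ability + 0.6 * rng.gauss(0, 1),
        "current": 0.6 * ability + 0.8 * rng.gauss(0, 1),
        "perf": 0.7 * ability + 0.7 * rng.gauss(0, 1),
    })

n_hired = 6000  # hire 30% of applicants under each rule

random_hires = rng.sample(applicants, n_hired)
current_hires = sorted(applicants, key=lambda a: a["current"], reverse=True)[:n_hired]
test_hires = sorted(applicants, key=lambda a: a["test"], reverse=True)[:n_hired]

r_random = validity_among(random_hires)
r_current = validity_among(current_hires)
r_by_test = validity_among(test_hires)

print(f"random hiring:         r = {r_random:.2f}")
print(f"current system hiring: r = {r_current:.2f}")
print(f"hiring by test score:  r = {r_by_test:.2f}")
```

Hiring by test score removes the low scorers from the sample, so the observed correlation falls well below the value obtained with random hiring. Hiring with a valid existing system restricts the range indirectly, because it screens on something related to the test, so the loss is typically smaller.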
SO20C: Predictive validity, selection by scores • Legal reason: • If adverse impact occurs you open yourself up to an unfair discrimination lawsuit • You have adverse impact, but you do not know whether the test is job related. There is a caveat (NFE): Some courts have ruled that adverse impact is OK if a validation study is in progress. However, I see this as being way too risky legally (particularly given the technical problems with this method).
SO20: NFE, Further explanation of types of predictive validity studies • Hire, then test and later correlate test scores and job performance measures • If you randomly hire, this is no different than pure predictive validity: #1 previously, Follow-up: Random selection • If you hire based on current selection system, this is no different than #2 previously, Follow-up: Select based on current system (one more slide on this)
SO20: NFE, Further explanation of types of predictive validity studies • Personnel file research - applicants are hired and their personnel records contain test scores or other information that could be used as a predictor (e.g., perhaps from a formal training program). At a later date, job performance scores are obtained.
For exam: Rank order of criterion-related validity studies in terms of professional measurement standards 1. Predictive validity (pure) - randomly hire 2.5. Predictive validity – use current selection system 2.5. Concurrent validity 4. Predictive validity – use test scores to hire
Which is better: Predictive vs. concurrent, research results (NFE) • Data that exist suggest that: • Concurrent validity is just as good as predictive validity for ability tests (most data) • May not be true for other types of tests such as personality and integrity tests • Studies have shown differences between the two for these types of tests - so proceed with caution! • Perhaps not too surprising – as discussed earlier, applicants may falsify their answers more to look better than current employees (Conceptually, predictive validity is better; it has more fidelity with, is more similar to, the actual selection procedure: test applicants, select, and see how well they do on the job later)
SO21: Sample size needed for a criterion-related validity study (review) • Large samples are necessary • The text indicates that several hundred or more employees are often necessary • Dickinson maintains that a sample of 50-100 is usually adequate - learn Dickinson’s number • What do companies do if they do not have that many employees? • They use content validity • They could possibly also use validity generalization, but even though this would be professionally acceptable, at the current time it is still legally risky
SO23: NFE, Construct validity • Every selection textbook covers construct validity • I am not covering it for reasons indicated in the SOs, but will talk about it at the end of class if I have time • Basic reason for not covering it is that while construct validity is highly relevant for test construction, very, very few organizations use this approach - it’s too time consuming and expensive • First, the organization develops a test and determines whether it is really measuring what it is supposed to be measuring • Then, they determine whether the test is job related
SO27: Validity generalization, what it is • Validity generalization is considered to be a form of criterion-related validity, but you don’t have to conduct a “local” validity study, that is, you don’t have to conduct a validity study in your organization using your employees • Rather you take validity data from other organizations for the same or very similar positions and use those data to justify the use of the selection test(s) • Common jobs: computer programmers and systems analysts, set-up mechanics, clerk typists, sales representatives, etc. (I am skipping to SO27 for the moment; SOs 24-26 relate to statistical concepts about correlation; the organization of this chapter is just awkward. I want to present all of the validity procedures together, and then compare them with respect to when you should/can use one or the other. Then, I’ll return to SOs 24-26: cont on next slide)
SO27: Validity generalization, what it is • Assumption is that those data will generalize to your position and organization • Thus, you can use this approach if you have a very small number of employees and/or applicants* *Note this point well