
PhD Research Seminar Series: Reliability and Validity in Tests and Measures



  1. PhD Research Seminar Series: Reliability and Validity in Tests and Measures Dr. K. A. Korb, University of Jos

  2. Outline • Reliability • Theory of Reliability • Split-Half Reliability • Test-Retest Reliability • Alternate Forms Reliability • Inter-Rater Reliability • Validity • Construct Validity • Criterion Validity • Content Validity • Face Validity

  3. Overview • Test developer: The person who created the test • Test user: The person who administers the test • Test taker: The person who takes the test

  4. Reliability: Consistency of results [Target diagrams labeled Reliable, Reliable, and Unreliable illustrate consistent versus scattered results]

  5. Reliability Theory • Actual score on test = True score + Error • True score: The hypothetical error-free score on the test • The reliability coefficient is the ratio of true-score variance to the total variance of test scores • In other words, as the error in testing decreases, the reliability increases
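In the standard classical test theory notation (a textbook formulation, not specific to this seminar), with X the observed score, T the true score, and E the error, the slide's model and reliability coefficient can be written as:

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
           = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

Because true scores and errors are assumed uncorrelated, the total variance is the sum of the two components, so the coefficient approaches 1 as error variance shrinks, which is the claim in the last bullet above.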

  6. Reliability: Sources of Error • Error in Test Construction • Error in item sampling: Results from items that measure more than one construct in the same test • For example: A test with items assessing both reading and math ability will have lower reliability than a test that assesses just reading • Error in Test Administration • Test environment: Room temperature, amount of light, noise, etc. • Test-taker variables: Illness, amount of sleep, test anxiety, etc. • Examiner-related variables: Absence of examiner, examiner’s demeanor, etc. • Error in Test Scoring • Scorer: With subjectively marked assessments, different scorers may give different scores to the same responses

  7. Reliability: Error due to Test Construction • Measured by Split-Half Reliability: Determines how consistently your measure assesses the construct of interest. • A low split-half reliability indicates poor test construction. • If your measure assesses multiple constructs, split-half reliability will be considerably lower. • Separate the constructs that you are measuring into different sections of the questionnaire and calculate the reliability separately for each construct. • If you get a low reliability coefficient, then your measure is probably measuring more constructs than it is designed to measure. • Revise your measure to focus more directly on the construct of interest. • When validating a measure, you will most likely calculate the split-half reliability of your instrument.

  8. Reliability: Error due to Test Construction • Calculating Split-Half Reliability • If you have dichotomous items (e.g., right-wrong answers) as you would with multiple choice exams, calculate the KR-20. • If you have a Likert scale, essays, or other types of items, use the Spearman-Brown formula. • For a step-by-step example of calculating the Split-Half Reliability, see the associated presentation entitled Calculating Reliability of Quantitative Measures.
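For readers who want to see the mechanics, below is a minimal Python sketch of both calculations named on this slide: an odd-even split-half reliability with the Spearman-Brown correction, and KR-20 for dichotomous items. The function names and data matrices are invented for illustration, and formula conventions (e.g., the variance denominator in KR-20) vary slightly across texts; the step-by-step presentation mentioned on the slide remains the authoritative procedure.

```python
import numpy as np

def split_half_reliability(items):
    """Odd-even split-half reliability with the Spearman-Brown correction.
    items: 2-D array-like, rows = examinees, columns = item scores."""
    items = np.asarray(items, dtype=float)
    odd = items[:, 0::2].sum(axis=1)        # total on odd-numbered items
    even = items[:, 1::2].sum(axis=1)       # total on even-numbered items
    r_half = np.corrcoef(odd, even)[0, 1]   # Pearson r between the two halves
    return 2 * r_half / (1 + r_half)        # Spearman-Brown step-up

def kr20(items):
    """KR-20 for dichotomous (0 = wrong, 1 = right) items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                      # number of items
    p = items.mean(axis=0)                  # proportion correct per item
    item_var = (p * (1 - p)).sum()          # summed item variances
    total_var = items.sum(axis=1).var()     # variance of total scores
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical data: 5 examinees on 6 Likert items (for split-half)
likert = [[4, 5, 4, 4, 5, 4],
          [2, 1, 2, 2, 1, 2],
          [3, 3, 4, 3, 3, 3],
          [5, 5, 5, 4, 5, 5],
          [1, 2, 1, 2, 2, 1]]
print(round(split_half_reliability(likert), 2))

# Hypothetical data: 5 examinees on 6 multiple-choice items (for KR-20)
mcq = [[1, 1, 1, 0, 1, 1],
       [0, 0, 1, 0, 0, 0],
       [1, 0, 1, 1, 0, 1],
       [1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 1, 0]]
print(round(kr20(mcq), 2))
```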

  9. Reliability: Error due to Test Administration • Test-Retest Reliability: Determines how much error in a test score is due to problems with test administration. • To calculate: • Administer the same test to the same participants on two different occasions, perhaps a week or two apart. • Correlate the test scores of the two administrations of the same test using Pearson’s Product Moment Correlation.
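The correlation step is the same Pearson computation used for the parallel-forms and inter-rater procedures on the next two slides, so one sketch covers all three. The scores below are invented for illustration, and scipy is assumed:

```python
from scipy.stats import pearsonr

# Hypothetical total scores for the same six participants
time1 = [55, 42, 68, 75, 50, 61]   # first administration
time2 = [53, 45, 70, 72, 48, 63]   # second administration, two weeks later

r, p = pearsonr(time1, time2)      # r is the test-retest reliability estimate
print(f"test-retest r = {r:.2f}")
```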

  10. Reliability: Error due to Test Construction with Two Forms of the Same Measure • Parallel Forms Reliability: Determines the similarity of two different versions of the same measure. • To calculate: • Administer the two tests to the same participants within a short period of time. • Correlate the test scores of the two tests using Pearson’s Product Moment Correlation.

  11. Reliability: Error due to Test Scoring • Inter-Rater Reliability: Determines how closely two different raters mark the same assessment. • To calculate: • Give the exact same test results from one test administration to two different raters. • Correlate the two markings from the different raters using Pearson’s Product Moment Correlation.

  12. Validity: Measuring what is supposed to be measured [Target diagrams labeled Valid, Invalid, and Invalid illustrate measures that do or do not hit the intended construct]

  13. Validity • Three types of validity: • Construct validity: Measure the appropriate psychological construct • Criterion validity: Predict appropriate outcomes • Content validity: Adequate sample of content • Each type of validity should be established for all psychological tests.

  14. Construct Validity • Definition: Appropriateness of inferences drawn from test scores regarding an individual’s standing on the psychological construct of interest • For example, a test is developed to measure Reading Ability. Once the test is administered to students, does their score on the test accurately reflect their true reading ability? • Two considerations: • Construct underrepresentation • Construct-irrelevant variance

  15. Construct Validity • Construct underrepresentation: A test does not measure all of the important aspects of the construct. • For example, a test of academic self-efficacy (perceived effectiveness in academics) might measure self-efficacy only in math and science, thus ignoring other important academic subjects. • Construct-irrelevant variance: Test scores are affected by other, unrelated processes. • For example, a test of statistical knowledge that requires complex calculations is likely influenced by construct-irrelevant variance. In addition to measuring statistical knowledge, the test is also measuring calculation ability.

  16. Sources of Construct Validity Evidence • Homogeneity: The test measures a single construct • Evidence: High internal consistency, as calculated by split-half reliability • Convergence: The test is related to other measures of the same construct and related constructs • Evidence: High correlations with other measures – same as Criterion Validity • Theory: The test behaves according to theoretical propositions about the construct • Evidence from changes in test scores with age: Scores on the measure should change with age as predicted by theory. • For example, a person’s intelligence scores should increase as that person gets older because theories of intelligence predict increases with age. • Evidence from treatments: Scores on the measure change between pretest and posttest as predicted by theory. • For example, scores on a test of Knowledge of Nigerian Government should significantly increase after a course on the Nigerian Government.

  17. Criterion Validity • Definition: Correlation between the measure and a criterion. • Criterion: Another accepted measure of the construct, or a measure of another construct similar in nature. • A criterion can be any standard with which your test should be related • Examples: • Behavior (e.g., misbehavior in class, teacher’s interactions with students, days absent from school) • Other test scores (e.g., standardized test scores) • Ratings (e.g., teachers’ ratings of helpfulness) • Psychiatric diagnosis (e.g., depression, schizophrenia)

  18. Criterion Validity • Three types: • Convergent validity: High correlations with measures of similar constructs taken at the same time. • Divergent validity: Low correlations with measures of different constructs taken at the same time. • Predictive validity: High correlation with a criterion measured in the future.

  19. Criterion Validity • Example: You developed an essay test of science reasoning to admit students into the science programme at the university. • Convergent Validity: Your test should have high correlations with other science tests, particularly well established science tests. • Divergent Validity: Your test should have low correlations with measures of writing ability because your test should only measure science reasoning, not writing ability. • Predictive Validity: Your test should have high correlations with future grades in science courses because the purpose of the test is to determine who will do well in the science programme at the university.

  20. Criterion Validity Example • High correlations with other measures of science ability indicate good convergent validity. • Low correlations with measures unrelated to science ability indicate good divergent validity. • High correlations with future measures of science ability indicate good predictive validity.

  21. Content Validity • Definition: The test samples the entire domain of the construct it was designed to measure • For example: • The first chart represents the proportion of class time spent on each maths topic • The second chart represents the proportion of test questions on each maths topic • This test does NOT demonstrate content validity because the proportion of test questions does not match the proportion of coverage in class.

  22. Content Validity • For academic tests, a test is considered content valid when the proportion of material covered by a test approximates the proportion of material covered in a class. • This maths test demonstrates good content validity because the proportion of test questions on each topic matches the proportion of time spent in class on each topic.

  23. Content Validity • Content validity tends to be an important consideration ONLY for achievement tests • To assess: • Gather a panel of judges • Give the judges a table of specifications of the amount of content covered in the domain • Give the judges the measure • The judges draw a conclusion as to whether the proportion of content covered on the test matches the proportion of content in the domain (a simple version of this comparison is sketched below).
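To make the judges' comparison concrete, here is a minimal Python sketch that contrasts the proportion of class time per topic with the proportion of test questions per topic. The topics, percentages, and the 5% tolerance are all hypothetical choices for the example, not part of the seminar's procedure:

```python
# Hypothetical table of specifications vs. actual test coverage
class_time = {"Algebra": 0.40, "Geometry": 0.30,
              "Statistics": 0.20, "Trigonometry": 0.10}
test_items = {"Algebra": 0.35, "Geometry": 0.30,
              "Statistics": 0.25, "Trigonometry": 0.10}

for topic in class_time:
    gap = test_items[topic] - class_time[topic]
    flag = "OK" if abs(gap) <= 0.05 else "review"   # arbitrary 5% tolerance
    print(f"{topic:12s} class {class_time[topic]:.0%}  "
          f"test {test_items[topic]:.0%}  -> {flag}")
```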

  24. Face Validity • Face validity addresses whether the test appears to measure what it purports to measure. • To assess: Ask test users and test takers to evaluate whether the test appears to measure the construct of interest. • Face validity is rarely of interest to test developers and test users. • The only instance where face validity is of interest is to instill confidence in test takers that the test is worthwhile. • Face validity is NOT a consideration for educational researchers. • Face validity CANNOT be used to determine the actual interpretive validity of a test.

  25. Concluding Advice • The best way to ensure that the measures you use are both reliable and valid is to use a measure that another researcher has developed and validated • This will assist you in three ways: • You can confidently report that you have accurately measured the variables you are studying. • By using a measure that has been used before, your study is closely tied to previous research in your field, a key consideration in establishing the significance of your study. • It saves you the time and energy of developing your own measure.

  26. Finding Pre-Existing Measures • Information on how to find pre-existing measures: • http://www.apa.org/science/faq-findtests.html#printeddirec • Online directory of pre-existing measures: • http://www.ets.org/testcoll/ • Type the construct you want to measure in the empty box and click the Search button. • Find the test that is most relevant to your purposes. • When you click on the measure name in blue, if it has a journal article listed in the Availability category, the measure is published in that journal article. • Some tests can also be ordered from the ETS Tests collection for about N3000 and then downloaded to your computer. • You can also try googling the name of the test to determine if somebody else has published the measure on the internet.

  27. Websites for Pre-Existing Measures • Personality variables: International Personality Item Pool • http://ipip.ori.org/ipip/ • Motivation constructs: Self-Determination Theory • http://www.psych.rochester.edu/SDT/ • Motivation constructs: Students’ goal orientations • http://www.umich.edu/~pals/
