
Reliability & Validity



Presentation Transcript


  1. Reliability & Validity

  2. What is Reliability? • Reliability: Consistency and dependability. • If a measurement device or procedure consistently assigns the same score to individuals or objects with equal values, the device is considered reliable. • Researchers must establish the reliability of their measurement devices in order to be certain that they are obtaining a systematic and consistent record of the variation in X and Y.

  3. Types of Reliability • There are several types: • Test-retest reliability and alternate forms reliability • Inter-item reliability and internal consistency • Split-half reliability • Inter-rater reliability • Scorer reliability
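Split-half reliability appears in the list above but gets no slide of its own. A minimal sketch in Python, with invented item responses: correlate each person's total on the odd-numbered items with their total on the even-numbered items, then apply the Spearman-Brown correction to estimate the reliability of the full-length test.

```python
# Split-half reliability sketch (all data invented for illustration).

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(rows):
    """rows: one list of item scores (1 = correct, 0 = incorrect) per person."""
    odd_totals = [sum(r[0::2]) for r in rows]   # items 1, 3, 5, ...
    even_totals = [sum(r[1::2]) for r in rows]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    return 2 * r_half / (1 + r_half)            # Spearman-Brown correction

answers = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0],
]
print(round(split_half_reliability(answers), 3))
```

The Spearman-Brown step corrects for the fact that each half is only half as long as the real test, since shorter tests are less reliable.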

  4. Test-retest Reliability • Measure the scores twice with the same instrument; a reliable measure should produce very similar scores. • Examples: IQ tests typically show high test-retest reliability. The reliability of a bathroom scale can be tested by recording your weight 2-3 times within a minute or two.
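Test-retest reliability is usually quantified as the correlation between the two administrations. A minimal sketch in Python, using invented IQ scores for five people tested twice:

```python
# Test-retest reliability as a Pearson correlation (scores are invented).

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical IQ scores for the same five people, tested a month apart.
first_test = [100, 110, 95, 120, 105]
second_test = [102, 108, 97, 118, 107]

# An r close to 1 indicates good test-retest reliability.
print(round(pearson_r(first_test, second_test), 3))
```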

  5. Alternate Forms Reliability • Test-retest procedures are not useful when participants can recall their previous responses and simply repeat them upon retesting. • In such cases, where administering the exact same test is not a good test of reliability, we may use alternate forms reliability: as the name implies, two or more versions of the test are constructed that are equivalent in content and level of difficulty. • Professors use this technique to create makeup or replacement exams, because students may already know the questions from the earlier exam.

  6. Inter-item Reliability • Inter-item reliability (also called internal consistency): The degree to which different items measuring the same variable attain consistent results. • Scores on different items designed to measure the same construct should be highly correlated. • Example: Math tests often ask you to solve several examples of the same type of problem. Your scores on these questions will normally represent your ability to solve this type of problem, and the test would have high inter-item reliability.

  7. Inter-rater Reliability • When observers must use their own judgment to interpret the events they observe (including live or videotaped behaviors and written answers to open-ended interview questions), scorer reliability must be measured. • Have different observers take measurements of the same responses; the degree of agreement between their measurements is called inter-rater reliability. • Their results can be compared statistically and represent the scorers' reliability.
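One common statistic for comparing two raters is Cohen's kappa, which measures agreement beyond what chance alone would produce. A minimal sketch in Python, with invented behavior codes from two observers watching the same animal:

```python
# Cohen's kappa for two raters (labels are invented for illustration).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Observed agreement: fraction of trials where the raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability both raters pick the same code at random,
    # given how often each rater used each code.
    ca, cb = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum((ca[l] / n) * (cb[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

a = ["play", "rest", "play", "feed", "play", "rest"]
b = ["play", "rest", "feed", "feed", "play", "rest"]
print(round(cohens_kappa(a, b), 3))
```

Kappa of 1 means perfect agreement and 0 means agreement no better than chance, which is why it is preferred over raw percent agreement when some codes are much more common than others.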

  8. What is Validity? • A measure is valid if it measures what it is supposed to measure, and does so cleanly, without accidentally including other factors. • Most experiments are designed to measure hypothetical constructs such as intelligence, learning, or love. Because these hypothetical constructs cannot be measured directly, the experimenter must create an operational definition of the dependent variable. • A valid measure is one that measures the hypothetical construct accurately (such as intelligence) without being influenced by other factors (such as motivation).

  9. Types of Validity • Validity: actually studying the variables that we wish to study. Several types: • Construct validity • Face validity • Content validity • Criterion validity, which has two types: – Predictive validity – Concurrent validity

  10. Construct Validity • Do my dependent variables actually measure the hypothetical construct that I want to test? • Does my IQ test really measure IQ, and nothing else? • Do my procedures actually measure learning (without being influenced by motivation)? • Does my personality test really measure personality traits without including fatigue?

  11. Face Validity • The consensus (usually by experts in the field) that a measure represents a particular concept. It is the least stringent type of validity. Because most psychological variables require indirect measures (like the intelligence example before), the validity of an operational definition may not be self-evident. • Does rate of eating really reflect hunger? In rats, does the rate of lever pressing actually measure learning? • Does talking measure extroversion? • Does GPA or SAT score really reflect intelligence?

  12. Comparing Face Validity with Construct Validity • Face validity: The consensus that a measure represents a particular concept – the face value of the measure. (Would a 130-pound 5’3” college student be a good football or basketball player?) • Construct validity: The accuracy with which a measure represents the particular concept, without influence of additional factors. Construct validity implies that other operational definitions of the same construct will yield correlated results.

  13. Content Validity • Does the content of our measure fairly reflect the content of the thing we are measuring? • Example: Do the questions on an exam accurately reflect what you have learned in the course, or were the exam questions sampled from only a sub-section of the material? • A test to measure your knowledge of mathematics should not be limited to addition problems, nor should it include questions about French literature. • It should cover the entire range of appropriate math problems you are trying to measure.

  14. Criterion Validity • A powerful indicator of the validity of a measure is its ability to accurately predict performance on other, independent outcome measures (referred to as criterion measures). • The extent to which your SAT score predicts your college GPA is an indication of the SAT’s criterion validity. • There are two approaches to criterion validity: Concurrent validity and Predictive validity.

  15. Concurrent vs. Predictive Validity • In concurrent validity, the SAT test scores and criterion measures (high school GPA) are obtained at roughly the same time (concurrent). • If the SAT shows high concurrent validity, it will be highly correlated with GPA obtained at the same time the SAT is taken. • Predictive validity, however, would be high if your SAT score accurately predicted your college GPA, which is obtained long after taking the SAT.
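Criterion validity of either kind comes down to a correlation between the measure and its criterion. A minimal sketch of the predictive case in Python, with invented SAT scores and the same students' college GPAs obtained years later:

```python
# Predictive (criterion) validity as predictor-criterion correlation
# (all scores are invented for illustration).

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

sat_scores = [1100, 1350, 980, 1500, 1220]   # predictor, taken first
college_gpa = [3.0, 3.4, 2.6, 3.8, 3.2]      # criterion, obtained later

# A high correlation here would indicate high predictive validity.
print(round(pearson_r(sat_scores, college_gpa), 3))
```

Concurrent validity would be computed the same way, the only difference being that the criterion (e.g. high school GPA) is collected at roughly the same time as the predictor.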
