5. Evaluation of measuring tools: reliability

Psychometrics, Group A (English)

After evaluating the quality of the test items and eliminating those that are not adequate, we must evaluate the overall quality of the test. • In this chapter we discuss the reliability and accuracy of the measure, trying to answer the question: “To what extent are the scores obtained by subjects on the test affected by measurement error?”

The problem of measurement error

Measurement error is the difference between the empirical score obtained by a subject on a test and his or her true score. • Objectives: • To build tests that produce the smallest possible measurement error. • To ensure that the obtained score gives as much real information as possible about the characteristic under study. • There are other errors, random ones, which are studied through the analysis of reliability.

Types of measurement errors

Measurement error: the difference between the empirical score of a subject and his or her true score. • It gives an individual measure of the accuracy of the test. • Standard error of measurement: the standard deviation of the measurement errors. It is a group measure because it is calculated over all subjects in the sample. • Estimation error of the true score: the difference between the true score of a subject and the true score predicted by the regression model. • Standard error of estimation of the true score: the standard deviation of the estimation errors.

Substitution error: the difference between the score obtained by a subject on a test and the score obtained on a parallel test. It is the error made when the scores on test X1 are replaced by those from a parallel test X2. • Standard error of substitution: the standard deviation of the substitution errors. • Prediction error: the difference between the score obtained by a subject on a test (X1) and the score predicted for that same test (X1') from a parallel test X2. • Standard error of prediction: the standard deviation of the prediction errors.
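
The four standard errors can be computed from the standard deviation of the empirical scores and the reliability coefficient. This is a minimal sketch that assumes the usual classical test theory expressions for each of them; the function name and the values of `s_x` and `r_xx` are hypothetical and chosen only for illustration.

```python
import math

def standard_errors(s_x, r_xx):
    """Group-level standard errors from the SD of the empirical scores (s_x)
    and the reliability coefficient (r_xx), under classical test theory."""
    return {
        "measurement": s_x * math.sqrt(1 - r_xx),
        "estimation": s_x * math.sqrt(r_xx * (1 - r_xx)),
        "substitution": s_x * math.sqrt(2 * (1 - r_xx)),
        "prediction": s_x * math.sqrt(1 - r_xx ** 2),
    }

# Hypothetical test: SD of 10 points and reliability of 0.84.
errors = standard_errors(s_x=10.0, r_xx=0.84)
for name, value in errors.items():
    print(f"standard error of {name}: {value:.2f}")
```

For any reliability strictly between 0 and 1 these expressions order the errors as estimation < measurement < prediction < substitution.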

The linear model of Spearman

It helps us estimate the amount of error affecting the empirical scores and the true level of subjects on the characteristic under study. • X (empirical score) = V (true level) + E (measurement error)

A) E = X - V • B) E(e) = 0 • C) • D) Cov(V, E) = 0 • E) • F) Cov(X, V) = Var(V) • G) • H)

Interpretation of the reliability coefficient

The reliability coefficient can be interpreted as: • The correlation between the empirical scores obtained by a sample of subjects on two parallel forms of the test. • The ratio between the variance of the true scores and the variance of the empirical scores. • As this ratio increases, the measurement error decreases. • Reliability index: the correlation between empirical and true scores, $r_{xv} = \sqrt{r_{xx'}}$.
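
The linear model and the variance-ratio reading of reliability can be checked on a toy data set. The sketch below uses hypothetical true scores and errors, constructed so that the errors have mean zero and are uncorrelated with the true scores; in practice true scores are never observed, so this only illustrates the algebra.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance; cov(a, a) is the population variance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# Hypothetical values: errors average zero and do not covary with V.
true_scores = [1, 2, 3, 4]
errors = [1, -1, -1, 1]
observed = [v + e for v, e in zip(true_scores, errors)]  # X = V + E

var_v = cov(true_scores, true_scores)
var_e = cov(errors, errors)
var_x = cov(observed, observed)

# Reliability as the ratio of true-score variance to empirical variance.
reliability = var_v / var_x
# Reliability index: correlation between empirical and true scores.
reliability_index = cov(observed, true_scores) / math.sqrt(var_x * var_v)

print("Var(X) =", var_x, " Var(V) + Var(E) =", var_v + var_e)
print("Cov(X, V) =", cov(observed, true_scores), " Var(V) =", var_v)
print("reliability =", round(reliability, 3),
      " index =", round(reliability_index, 3))
```

With these numbers the empirical variance splits into true and error variance, Cov(X, V) equals Var(V), and the reliability index equals the square root of the reliability coefficient.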

Factors that affect reliability

TEST LENGTH • If we increase the length of the test by adding parallel items: • We obtain more information about the attribute under study. • The error in estimating a subject's true score is lower. • Therefore, reliability increases.
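
The effect of test length is usually quantified with the general Spearman-Brown formula, $R = \frac{n\,r}{1 + (n - 1)\,r}$, where $n$ is the factor by which the test is lengthened with parallel items. A minimal sketch, with a hypothetical starting reliability of 0.60:

```python
def lengthened_reliability(r_xx, n):
    """Reliability of a test made n times longer by adding parallel items
    (general Spearman-Brown formula)."""
    return n * r_xx / (1 + (n - 1) * r_xx)

# Hypothetical test with reliability 0.60, kept as is, doubled and tripled.
for n in (1, 2, 3):
    print(n, round(lengthened_reliability(0.60, n), 3))
```

Values of `n` below 1 describe a shortened test, for which the same formula gives a lower reliability.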

SAMPLE VARIABILITY • The reliability coefficient can vary depending on the homogeneity of the group. • The more homogeneous the group, the lower the reliability coefficient. • * We assume that the standard error of measurement of a test remains constant regardless of the variability of the group to which it is applied.
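
If the standard error of measurement is the same in two groups, then $S_1^2 (1 - r_{11}) = S_2^2 (1 - r_{22})$, and the reliability in the second group follows from the reliability in the first and the two score variances. The sketch below applies that relation; the function name and the numbers are hypothetical.

```python
def reliability_in_new_group(r_11, var_1, var_2):
    """Reliability expected in group 2, given the reliability in group 1 and
    the score variances of both groups, assuming equal error variance."""
    return 1 - (var_1 / var_2) * (1 - r_11)

# Hypothetical case: group 2 is more homogeneous (smaller score variance).
r_22 = reliability_in_new_group(r_11=0.80, var_1=25.0, var_2=16.0)
print(round(r_22, 4))
```

Because the second group has the smaller variance, its expected reliability is lower than the 0.80 of the first group.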

Reliability as equivalence and stability of measures Coefficient of reliability or equivalence

A test must meet two requirements: • It should measure the characteristic it is intended to measure (be valid). • The empirical scores obtained by applying the test should be: • Accurate (free of error), and • Stable: when a trait or characteristic is evaluated with the same test at different times and under conditions as similar as possible, similar results must be obtained if the trait has not changed (reliability of the test).

a) Parallel forms method • 1. Build two parallel forms of the test, X and X'. • 2. Apply both forms to a sample of subjects representative of the population targeted by the test. • 3. Calculate Pearson's correlation between X1 and X2, the scores obtained by the subjects on each form of the test. • Advantage: if both forms are applied at the same time, there is greater control over the conditions of application. • Disadvantage: it is difficult to build two parallel forms.

b) Test-retest method • 1. Apply the same test on two separate occasions to the same sample of subjects. • 2. Calculate the correlation between X1 and X2, the scores obtained by the subjects on each application of the test. • Advantage: it does not require different forms of the same test. • Disadvantage: the results may be influenced by memory, the time interval between applications, and the attitude of the subject.
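
Both the parallel forms method and the test-retest method reduce to a Pearson correlation between two sets of scores from the same subjects. A minimal sketch, where `x1` and `x2` are hypothetical scores (two forms, or two applications of the same test):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores of six subjects on X1 and X2.
x1 = [10, 12, 15, 9, 14, 11]
x2 = [11, 13, 14, 8, 15, 10]

r_xx = pearson(x1, x2)  # reliability estimate
print(round(r_xx, 2))
```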

Reliability as internal consistency

Methods to estimate the reliability of a test that require only one application: • A) Based on the division of the test into two parts: • Spearman-Brown • Rulon • Guttman-Flanagan • B) Based on the covariation of items: • Cronbach's alpha coefficient

a) Methods based on the division of the test into two parts • The estimation of reliability is not affected by the factors discussed above, and time and effort are saved. • 1. Apply the test to a sample of subjects. • 2. Once the scores have been obtained, divide the test into two parts, calculate the correlation between the scores obtained by the subjects on both parts, and apply a correction formula. • The parts should be similar in difficulty and content.

Spearman-Brown • The two parts must be parallel, so we should check the assumptions of parallelism: the true scores of the subjects are the same on both parts, and the variance of the measurement errors is the same on both parts.
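
For two parallel halves, the Spearman-Brown correction takes the correlation between the halves, $r_{12}$, and returns the reliability of the whole test as $\frac{2\,r_{12}}{1 + r_{12}}$. The sketch below applies it to hypothetical even-item and odd-item scores; the data and function names are illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(half_1, half_2):
    """Spearman-Brown corrected reliability from the scores on two halves."""
    r_12 = pearson(half_1, half_2)
    return 2 * r_12 / (1 + r_12)

# Hypothetical scores of six subjects on the even and odd items.
even = [6, 8, 5, 9, 7, 4]
odd = [5, 9, 6, 8, 6, 3]

r_half = pearson(even, odd)
r_full = split_half_reliability(even, odd)
print(round(r_half, 3), round(r_full, 3))
```

The corrected value is higher than the correlation between halves because it refers to a test twice as long as each half.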

Spearman-Brown. Example • We have applied a 20-item numerical aptitude test to a sample of 6 subjects. The table shows the scores obtained on the even items (X1) and the odd items (X2). Calculate the reliability coefficient assuming that the two parts of the test are parallel.

Rulon and Guttman-Flanagan • These are applied when the two parts are not strictly parallel but can be considered tau-equivalent (the true scores of the subjects are the same on both parts, but the error variances are not necessarily equal) or essentially tau-equivalent (the true score of each subject on one part is equal to the true score on the other part plus a constant).

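Rulon and Guttman-Flanagan • Rulon: $r_{xx'} = 1 - \frac{S_d^2}{S_x^2}$, where $S_d^2$ is the variance of the differences between the scores on the two halves and $S_x^2$ is the variance of the total scores. • Guttman-Flanagan: $r_{xx'} = 2\left(1 - \frac{S_1^2 + S_2^2}{S_x^2}\right)$, where $S_1^2$ and $S_2^2$ are the variances of the scores on the even and odd halves.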

Rulon and Guttman-Flanagan. Example • The following table shows the scores of the participants on the even and odd items of a test. Obtain the reliability coefficient using the Rulon and Guttman-Flanagan methods.

Rulon. Example • 1. Calculate the variance of the differences between the scores on the two halves (1-14). • 2. Then apply the Rulon formula to obtain the reliability coefficient (6.1).

Guttman-Flanagan. Example • We obtain the same result without computing difference scores.
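
The agreement between the two methods can be checked numerically. The sketch below assumes the usual forms of both coefficients, Rulon as $1 - S_d^2 / S_x^2$ and Guttman-Flanagan as $2\left(1 - (S_1^2 + S_2^2) / S_x^2\right)$; the scores are hypothetical and are not the table from the example.

```python
def variance(values):
    """Population variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def rulon(half_1, half_2):
    """Rulon coefficient: based on the variance of the difference scores."""
    total = [a + b for a, b in zip(half_1, half_2)]
    diff = [a - b for a, b in zip(half_1, half_2)]
    return 1 - variance(diff) / variance(total)

def guttman_flanagan(half_1, half_2):
    """Guttman-Flanagan coefficient: based on the variances of the halves."""
    total = [a + b for a, b in zip(half_1, half_2)]
    return 2 * (1 - (variance(half_1) + variance(half_2)) / variance(total))

# Hypothetical scores of six subjects on the even and odd items.
even = [6, 8, 5, 9, 7, 4]
odd = [5, 9, 6, 8, 6, 3]

r_rulon = rulon(even, odd)
r_gf = guttman_flanagan(even, odd)
print(round(r_rulon, 3), round(r_gf, 3))
```

Both expressions simplify to four times the covariance between the halves divided by the total variance, which is why they always coincide.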

Methods based on the covariation of items • There are different ways to divide a test into two parts, and each one yields its own reliability coefficient. This problem can be solved by analyzing the covariation among the items. • Methods: • Cronbach's alpha coefficient and its particular cases, KR20 and KR21 of Kuder-Richardson (1937).

b) Method based on the covariation of items • It requires analyzing the variances and covariances of the subjects' responses to the items. • It is an estimate of the internal consistency of the test items. • Cronbach's alpha coefficient. • It is based on the mean correlation among all the test items.
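
Cronbach's alpha is commonly computed as $\alpha = \frac{n}{n - 1}\left(1 - \frac{\sum S_h^2}{S_x^2}\right)$, where $n$ is the number of items, $S_h^2$ the variance of item $h$ and $S_x^2$ the variance of the total scores. A minimal sketch with a hypothetical score matrix (one row per subject, one column per item):

```python
def variance(values):
    """Population variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per subject
    and one column per item."""
    n_items = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of four subjects to three items.
scores = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
]

alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

The same kind of variance (population or sample) must be used for the items and for the total, since alpha depends only on their ratio.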

Cronbach's alpha. Example • We have applied a visual perception test to 6 subjects. The table shows the scores of the subjects on each of the five test items. Calculate the reliability coefficient of the test.

Cronbach's alpha coefficient. Unbiased estimator • Alpha is only an estimate of, or approximation to, the actual value of the reliability coefficient. A more accurate approximation is given by the following formula: $\hat{\alpha}_u = \frac{(N - 3)\,\hat{\alpha} + 2}{N - 1}$, where $N$ is the number of subjects. • In practice, with 100 subjects or more the differences are not significant.

Cronbach's alpha coefficient. Unbiased estimator. Examples • Example 1: In a sample of 150 participants, a test has α = 0.75. What is the value of the unbiased estimator of alpha? • Example 2: In a sample of 20 participants, a test has α = 0.75. What is the value of the unbiased estimator of alpha? • Which example shows the larger difference between alpha and its unbiased estimator? Why?

Cronbach's alpha coefficient. Unbiased estimator. Examples • Example 1: the unbiased estimate is approximately 0.753. • Example 2: the unbiased estimate is approximately 0.776. • Alpha and its unbiased estimator differ more in Example 2 (0.75 vs. 0.78) because the sample size is smaller.
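
The two examples can be reproduced with a few lines of code. This sketch assumes the small-sample correction $\frac{(N - 3)\,\hat{\alpha} + 2}{N - 1}$, with $N$ the number of subjects; this form reproduces the 0.78 reported for Example 2, and the function name is hypothetical.

```python
def unbiased_alpha(alpha, n_subjects):
    """Small-sample correction of alpha, assuming the form
    ((N - 3) * alpha + 2) / (N - 1)."""
    return ((n_subjects - 3) * alpha + 2) / (n_subjects - 1)

example_1 = unbiased_alpha(0.75, 150)
example_2 = unbiased_alpha(0.75, 20)

print("Example 1:", round(example_1, 3), "difference:", round(example_1 - 0.75, 3))
print("Example 2:", round(example_2, 3), "difference:", round(example_2 - 0.75, 3))
```

The correction shrinks as N grows, which is why the difference is larger with 20 participants than with 150.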

Cronbach's alpha coefficient. Inferences • Alpha provides an estimate of the reliability coefficient of a test from the sample to which it was applied, but sometimes we are interested in further questions: • Given the value obtained in the sample, can alpha take a particular value in the population? • Is there a significant difference between the alpha values of two independent samples? • Is there a significant difference between two alpha values obtained in the same sample? • These questions are addressed by the sampling theory of coefficient alpha (Feldt, 1965; Kristof, 1963).

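Given the value obtained in the sample, can alpha take a particular value in the population? • Kristof (1963) and Feldt (1965) propose the following statistic, based on the F distribution: $F = \frac{1 - \alpha}{1 - \hat{\alpha}}$, with $(N - 1)$ and $(n - 1)(N - 1)$ degrees of freedom, where $\alpha$ is the population value under the null hypothesis, $\hat{\alpha}$ the sample value, $N$ the number of subjects and $n$ the number of items.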

Example • After applying a 35-item spatial perception test to a sample of 60 students, an α of 0.83 was obtained. Is this coefficient statistically significant? (Confidence level: 95%.)

F = 1 / (1 - 0.83) = 5.88 > 1.47: the null hypothesis is rejected. The alpha coefficient is statistically significant.
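
The statistic for this example can be reproduced as follows. The sketch assumes the form $F = \frac{1 - \alpha_0}{1 - \hat{\alpha}}$ with $(N - 1)$ and $(n - 1)(N - 1)$ degrees of freedom; the critical value 1.47 is taken from the example as given and is not computed here, and the function name is hypothetical.

```python
def alpha_f_statistic(alpha_hat, n_subjects, n_items, alpha_0=0.0):
    """F statistic for H0: alpha = alpha_0, with its degrees of freedom."""
    f_value = (1 - alpha_0) / (1 - alpha_hat)
    df_1 = n_subjects - 1
    df_2 = (n_items - 1) * (n_subjects - 1)
    return f_value, df_1, df_2

f_value, df_1, df_2 = alpha_f_statistic(alpha_hat=0.83, n_subjects=60, n_items=35)
critical_value = 1.47  # value reported in the example, not computed here

print(round(f_value, 2), df_1, df_2)
print("reject H0:", f_value > critical_value)
```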

Is there a significant difference between the alpha values of two independent samples? • Feldt (1969) proposed the statistic $W = \frac{1 - \hat{\alpha}_1}{1 - \hat{\alpha}_2}$, which follows an F distribution with (N1 - 1) and (N2 - 1) degrees of freedom and allows us to test • H0: α1 = α2

Example • We applied a reasoning test to a sample of 121 participants and obtained an alpha of 0.55. The same test was applied to another sample of 61 participants, obtaining an alpha of 0.62. Is there a significant difference between the two coefficients? (Confidence level: 95%.)

W = (1 - 0.55) / (1 - 0.62) = 1.18 < 1.7: the null hypothesis is not rejected. The difference between the two coefficients is not statistically significant.
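
A short sketch of this comparison, assuming the form $W = \frac{1 - \hat{\alpha}_1}{1 - \hat{\alpha}_2}$ with $(N_1 - 1)$ and $(N_2 - 1)$ degrees of freedom. The critical value 1.7 is the one reported in the example and is not computed here; the function name is hypothetical.

```python
def feldt_w(alpha_1, n_1, alpha_2, n_2):
    """W statistic for H0: alpha_1 = alpha_2 in two independent samples."""
    w_value = (1 - alpha_1) / (1 - alpha_2)
    return w_value, n_1 - 1, n_2 - 1

w_value, df_1, df_2 = feldt_w(alpha_1=0.55, n_1=121, alpha_2=0.62, n_2=61)
critical_value = 1.7  # value reported in the example, not computed here

print(round(w_value, 2), df_1, df_2)
print("reject H0:", w_value > critical_value)
```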

Is there a significant difference between two alpha values obtained in the same sample (dependent samples)? • Feldt (1969) proposed the statistic t, which follows a t distribution with (N - 2) degrees of freedom: $t = \frac{(\hat{\alpha}_1 - \hat{\alpha}_2)\sqrt{N - 2}}{\sqrt{4(1 - \hat{\alpha}_1)(1 - \hat{\alpha}_2)(1 - r_{12}^2)}}$, where $r_{12}$ is the correlation between the scores on the two tests.

Example • We apply two visual perception tests to a sample of 125 participants. The correlation between the scores on the two tests is 0.7. The alpha coefficients were 0.75 and 0.84, respectively. Is the difference between these values statistically significant? (Confidence level: 95%.)

3.5 > 1.98: the null hypothesis is rejected. The difference between the coefficients is statistically significant.
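
The value 3.5 can be reproduced with the sketch below, which assumes the form $t = \frac{|\hat{\alpha}_1 - \hat{\alpha}_2|\sqrt{N - 2}}{\sqrt{4(1 - \hat{\alpha}_1)(1 - \hat{\alpha}_2)(1 - r_{12}^2)}}$ with $N - 2$ degrees of freedom. The absolute value is an assumption made here so that the result is positive regardless of the order of the two coefficients; the critical value 1.98 is the one reported in the example.

```python
import math

def dependent_alpha_t(alpha_1, alpha_2, r_12, n_subjects):
    """t statistic for H0: alpha_1 = alpha_2 when both tests are applied
    to the same sample; r_12 is the correlation between the two tests."""
    numerator = abs(alpha_1 - alpha_2) * math.sqrt(n_subjects - 2)
    denominator = math.sqrt(4 * (1 - alpha_1) * (1 - alpha_2) * (1 - r_12 ** 2))
    return numerator / denominator, n_subjects - 2

t_value, df = dependent_alpha_t(alpha_1=0.75, alpha_2=0.84, r_12=0.7, n_subjects=125)
critical_value = 1.98  # value reported in the example, not computed here

print(round(t_value, 2), df)
print("reject H0:", t_value > critical_value)
```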

Cronbach's alpha coefficient. Particular cases • The Kuder-Richardson (1937) formulas are particular cases of alpha for estimating the reliability of a test with dichotomous items. In that case the variance of item h is $S_h^2 = p_h q_h$, where $p_h$ is the proportion of correct answers and $q_h$ the proportion of errors. • Alpha can then be written as KR20, $KR_{20} = \frac{n}{n - 1}\left(1 - \frac{\sum p_h q_h}{S_x^2}\right)$, or as KR21 when the items have equal difficulty (that is, the same proportion of correct answers), $KR_{21} = \frac{n}{n - 1}\left(1 - \frac{n\,p\,q}{S_x^2}\right)$. • Where: n = number of items, p = proportion of correct answers, q = proportion of wrong answers, $S_x^2$ = variance of the total scores.

Kuder-Richardson. Example • Suppose a test of 6 items answered by 6 subjects. • 1. First, calculate the variance of each item; because the items are dichotomous, it is p*q. • 2. Then apply KR20. The value is .82.
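
The two steps can be sketched in code. The response matrix below is hypothetical (5 subjects, 4 items) and is not the one from the example, so it does not give .82; KR21 is included for comparison, using the mean proportion of correct answers.

```python
def variance(values):
    """Population variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def kr20(responses):
    """KR20 for a 0/1 matrix with one row per subject and one column per item."""
    n_subjects, n_items = len(responses), len(responses[0])
    p = [sum(row[j] for row in responses) / n_subjects for j in range(n_items)]
    sum_pq = sum(p_h * (1 - p_h) for p_h in p)  # item variances are p*q
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum_pq / total_var)

def kr21(responses):
    """KR21: same as KR20 but treating all items as equally difficult."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    p_mean = sum(totals) / len(totals) / n_items
    return n_items / (n_items - 1) * (
        1 - n_items * p_mean * (1 - p_mean) / variance(totals)
    )

# Hypothetical dichotomous responses (1 = correct, 0 = wrong).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(responses), 3), round(kr21(responses), 3))
```

When the items differ in difficulty, as in this matrix, KR21 is lower than KR20.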

COEFFICIENTS BASED ON THE FACTOR ANALYSIS OF ITEMS • Carmines' theta (θ) and omega (Ω) coefficients are two indicators of internal consistency based on the factor analysis of the items. Usually, for the same data set, it can be verified that α ≤ θ ≤ Ω.
