
PSYCHOMETRICS

PSYCHOMETRICS. Unit 7: Norming, standardization and score equating. Salvador Chacón Moscoso, Susana Sanduvete Chaves. 1. Score assignment. 2. Establishing cut-off points in criterion-referenced tests. 2.1. Test-centered procedures. 2.2. Person-centered procedures.


Presentation Transcript


  1. PSYCHOMETRICS. Unit 7: Norming, standardization and score equating. Salvador Chacón Moscoso, Susana Sanduvete Chaves. We thank Francisco Pablo Holgado Tello for his invaluable collaboration in preparing this material.

  2. 1. Score assignment. 2. Establishing cut-off points in criterion-referenced tests. 2.1. Test-centered procedures. 2.2. Person-centered procedures. 2.3. Compromise procedures. 3. Transforming scores in norm-referenced tests. 3.1. Linear. 3.2. Nonlinear. 3.3. Chronological norms. 3.4. Establishment and types of norms. 4. Score equating. 4.1. Definition of equating and related terms. 4.2. Equating designs. 4.3. Equating methods. 5. Preparing the documents that accompany the test. 5.1. Test manual. 6. Assessment of Classical Test Theory (CTT). 7. Bibliography.

  3. SCORE ASSIGNMENT. Once the set of items has been administered, the item scores are combined into a single score that reflects the subject's position on the test. For tests made up of dichotomous items, several situations must be considered when assigning the total score: a) the subject knows the correct answer; b) the subject does not know the correct answer and b.1) omits the item, b.2) answers incorrectly, or b.3) chooses the correct answer by chance.

  4. 1. Scoring tests whose items have no response options (completion, short answer, ...): add the item scores (1 for a hit, 0 for a miss): $X_a = \sum_{i=1}^{n} x_{ai}$, where $X_a$ is the total score of subject $a$, $x_{ai}$ is that subject's response to item $i$, and $n$ is the number of items. All items carry the same weight, and blank or omitted items are not scored.

  5. 2. Scoring tests with multiple-choice or true/false items: there are individual differences in the tendency to omit or to answer items whose answer is not known → this adds variation to the observed scores that is unrelated to the trait the test measures.

  6. 2.1. Correction penalizing errors: $X_c = A - \frac{E}{k-1}$, where $X_c$ is the corrected score, A is the number of hits, E is the number of errors, and k is the number of alternatives per item.

  7. 2.2. Correction crediting omissions: the subject is credited with the additional hits that would be expected had he or she answered the omitted items at random: $X_c = A + \frac{B}{k}$, where A is the number of hits, B is the number of items left blank, and k is the number of alternatives per item.
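
The two corrections can be sketched in a few lines. This is only an illustration, assuming the standard correction-for-guessing forms X_c = A - E/(k - 1) and X_c = A + B/k; the function names and the example examinee are hypothetical.

```python
def score_penalizing_errors(hits: int, errors: int, k: int) -> float:
    """Corrected score X_c = A - E / (k - 1): each error costs 1/(k-1) of a hit."""
    return hits - errors / (k - 1)


def score_crediting_omissions(hits: int, blanks: int, k: int) -> float:
    """Corrected score X_c = A + B / k: each blank earns the chance rate 1/k."""
    return hits + blanks / k


# Hypothetical examinee on a 70-item test with 4 options per item:
# 40 hits, 15 errors, 15 items left blank.
print(score_penalizing_errors(40, 15, 4))    # 35.0
print(score_crediting_omissions(40, 15, 4))  # 43.75
```

Both corrections rest on the assumption that an examinee who does not know an item answers completely at random.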

  8. Empirical studies show that subjects do not behave as the correction formulas assume; that is, subjects who do not know an item usually do not respond completely at random. The formulas do not take into account: - the partial knowledge subjects have about some items; - individual differences in risk-taking behavior when responding to the items.

  9. Credit for partial knowledge. Among subjects who obtain the same score on a multiple-choice test there may therefore be different degrees of knowledge. Scoring that reflects this is called credit for partial knowledge. Crocker and Algina (1986) propose different methods to account for it: 1. Confidence weighting. 2. Answer-until-correct. 3. Option weighting.

  10. 1. Confidence weighting: the subject chooses the option believed to be correct and assigns a value according to the degree of confidence in that response → subjects who choose the same answer can obtain different scores depending on the confidence they have expressed. 2. Answer-until-correct: an option is selected and feedback is received. If the answer is correct, the subject moves on to the next item; if it is wrong, a new choice is made → to score the test, the total number of responses made by the subject is subtracted. 3. Option weighting: the options vary in their degree of correctness; subjects who choose the most correct option show a higher level than examinees who choose less correct ones → expert judgment is often used to determine the weights. In general, these methods have shown improvements in the validity of the tests.

  11. INTERPRETING SCORES: NORM-REFERENCED (NRT) AND CRITERION-REFERENCED (CRT) TESTS. A company wants to fill a position by promotion and administers a test of 70 multiple-choice items. An employee obtains 40 points. With these data we might ask: 1. Is this subject's performance adequate for the position, or are there more competent individuals? 2. Should the subject take an intensive course to adapt to the new position? Knowing only that the subject obtained 40 points out of 70, we cannot answer either question. To do so we would: 1. Select a representative sample, build a frequency distribution, and locate our subject within the group to determine whether there are more competent subjects. 2. Establish a criterion to decide whether the subject is above it and is therefore promoted, or below it and therefore has to go to training. The subject's score is the same in both cases (40). However, the interpretation needed to answer each question is very different: in one case it is NORM-referenced, and in the other CRITERION-referenced.

  12. ESTABLISHING THE CUT-OFF POINT IN CRITERION-REFERENCED TESTS (CRT)

  13. CRITERION-REFERENCED TESTS. One of the main functions of a test is to provide data for decision making → a CUT-OFF POINT is needed to decide on the performance of the subjects. Cut-off point or standard: a point on the score scale used to classify subjects into two categories that represent different levels of competence in a domain. 1. Test-centered methods: expert judgments about the test items. 2. Person-centered methods: judges' opinions about the competence of individuals. 3. Compromise procedures: combine absolute criteria (as above) with relative criteria.

  14. Test-centered procedures. Nedelsky method. 1.1. The Nedelsky (1954) method is widely used in minimum-competency tests. 1. Identify a population of judges and select a sample. 2. Each judge must define a minimally competent subject and predict that examinee's behavior on each item, indicating which options he or she would eliminate. 3. For each item, the judge records the reciprocal of the number of alternatives that remain. For example, for an item with three alternatives of which one has been eliminated, the reciprocal is 1/2. 4. For each judge, the reciprocals of all the items are summed to yield that judge's expected test score. 5. The values obtained from all judges are averaged, and this value is taken as the initial value of the standard. The method assumes that minimally competent examinees choose at random among the alternatives they cannot immediately identify as incorrect, which is questionable.
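
The Nedelsky computation reduces to summing reciprocals per judge and averaging across judges. The sketch below is illustrative only; the function name and the judge data are hypothetical.

```python
from statistics import mean


def nedelsky_cutoff(remaining_by_judge):
    """Average, over judges, of the sum over items of 1 / (alternatives remaining).

    remaining_by_judge: one list per judge; each entry is the number of
    alternatives that judge thinks a minimally competent examinee could
    NOT eliminate on that item (always at least 1, the correct option).
    """
    judge_totals = [sum(1 / r for r in remaining) for remaining in remaining_by_judge]
    return mean(judge_totals)


# Two hypothetical judges rating a 4-item test.
judges = [
    [2, 1, 4, 2],  # judge 1: 1/2 + 1 + 1/4 + 1/2 = 2.25
    [2, 2, 4, 1],  # judge 2: 1/2 + 1/2 + 1/4 + 1 = 2.25
]
print(nedelsky_cutoff(judges))  # 2.25
```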

  15. Test-centered procedures. Angoff method. 1.2. The Angoff (1971) method introduces a variation on the Nedelsky method. 1. Identify a population of judges and select a sample. 2. Each judge must define what minimal competence means to him or her and reach a consensus with the other judges. 3. Each judge considers each test item and decides the probability that a minimally competent examinee answers it correctly (an a priori estimate of its difficulty). 4. To obtain the cut-off, the probabilities are summed over items and averaged across judges. It is the preferred, most researched and most recommended method.
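
As a rough illustration of the Angoff computation, the sketch below sums each judge's item probabilities and averages the totals. The function name and the ratings are hypothetical.

```python
from statistics import mean


def angoff_cutoff(probabilities_by_judge):
    """Average, over judges, of the summed item probabilities.

    probabilities_by_judge: one list per judge with the estimated probability
    that a minimally competent examinee answers each item correctly.
    """
    judge_totals = [sum(probs) for probs in probabilities_by_judge]
    return mean(judge_totals)


# Two hypothetical judges rating a 3-item test.
ratings = [
    [0.5, 0.75, 0.25],  # judge 1 expects 1.5 items correct
    [0.5, 0.5, 0.5],    # judge 2 expects 1.5 items correct
]
print(angoff_cutoff(ratings))  # 1.5
```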

  16. Test-centered procedures. Ebel method. 1.3. Ebel (1972) proposed a procedure similar to Angoff's, but one that considers both the relevance of the item (essential, important, acceptable, questionable) and its level of difficulty (easy, medium, hard). This yields a two-dimensional table in which each item is categorized. 1. Identify a population of judges and select a sample. 2. Classify each test item into a cell of the table and count the number of items in each cell. 3. Each judge assigns to each cell the percentage of its items that a minimally competent subject could answer correctly.

  17. Test-centered methods: 1.3. Ebel (1972) method. Relevance (essential, important, acceptable, questionable); difficulty level (easy, medium, hard) of the item. Each item is categorized in a two-dimensional table. The cut-off is determined according to the following formula:
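
The formula on this slide did not survive in the transcript. A minimal sketch, assuming the usual reading of Ebel's procedure (for each judge, multiply the number of items in each cell by the proportion judged answerable, sum over cells, then average across judges), could look like this; all names and data are hypothetical.

```python
from statistics import mean


def ebel_cutoff(items_per_cell, percent_by_judge):
    """Assumed Ebel cut-off: mean over judges of sum(items in cell * percent / 100)."""
    judge_totals = [
        sum(items_per_cell[cell] * pct[cell] / 100 for cell in items_per_cell)
        for pct in percent_by_judge
    ]
    return mean(judge_totals)


# Hypothetical 8-item test using only two cells of the relevance x difficulty table.
items_per_cell = {("essential", "easy"): 4, ("important", "medium"): 4}
percent_by_judge = [
    {("essential", "easy"): 100, ("important", "medium"): 50},  # judge 1: 4 + 2 = 6
    {("essential", "easy"): 75, ("important", "medium"): 50},   # judge 2: 3 + 2 = 5
]
print(ebel_cutoff(items_per_cell, percent_by_judge))  # 5.5
```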

  18. Person-centered procedures. Borderline-group method. 2.1. Borderline-group method (Zieky and Livingston, 1977): when there is clear evidence of who is considered competent and who is not, the test and the cut-off can be assessed. 1. Identify a population of judges and select a sample. It is essential that the judges be able to assess the performance level of the subjects. 2. The judges are asked to define three categories: competent, borderline, and inadequate or incompetent. 3. The judges evaluate the examinees and, based on other information, identify those who belong to the "borderline" group. 4. After the subjects have been assigned to groups, the test is administered. 5. The median of the scores obtained by the subjects in the "borderline" category is calculated, and that value is taken as the standard or cut-off. The method stands out for its simplicity. The main criticism concerns the ability attributed to the judges to evaluate the subjects.
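
The final step is just a median. A minimal sketch with hypothetical scores for the examinees the judges placed in the "borderline" category:

```python
from statistics import median

# Hypothetical test scores of the examinees judged to be borderline.
borderline_scores = [18, 22, 25, 27, 30]

cutoff = median(borderline_scores)
print(cutoff)  # 25
```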

  19. Person-centered procedures. Contrasting-groups method. 2.2. Contrasting-groups method (Zieky and Livingston, 1977): the judgments are based on the performance of the subjects examined. The first three phases are the same as in the borderline-group method (selecting the judges, defining the categories, and classifying the subjects). The subjects then take the test, and the standard is established from the performance of the "competent" and "incompetent" groups. The score that best discriminates between the two groups is established as the cut-off.

  20. Person-centered methods. 2.2. Contrasting-groups method (Zieky and Livingston, 1977): the judgments are based on the performance of the subjects examined. [Figure: overlapping score distributions of the incompetent and competent groups; x-axis: test scores; y-axis: subjects; the standard is marked at the intersection, with the false-negative and false-positive regions on either side.] The cut-off is given by the intersection of the two distributions (incompetent and competent) → both types of error occur: competent subjects who fail the test, and incompetent subjects who pass it.
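
Locating the intersection graphically is hard to reproduce in code, so the sketch below swaps in a simpler stand-in: it tries every observed score as a cut-off and keeps the one with the fewest misclassifications (false negatives plus false positives). This is an assumption made for illustration, not the slide's procedure; names and data are hypothetical.

```python
def contrasting_groups_cutoff(competent, incompetent):
    """Return the observed score that minimizes total misclassifications.

    A subject passes when score >= cutoff. Ties are resolved in favor of
    the lowest candidate score.
    """
    candidates = sorted(set(competent) | set(incompetent))

    def errors(cutoff):
        false_negatives = sum(s < cutoff for s in competent)     # competent but fail
        false_positives = sum(s >= cutoff for s in incompetent)  # incompetent but pass
        return false_negatives + false_positives

    return min(candidates, key=errors)


# Hypothetical scores for subjects the judges classified beforehand.
competent = [24, 28, 30, 33, 35]
incompetent = [15, 18, 20, 22, 26]
print(contrasting_groups_cutoff(competent, incompetent))  # 24
```

With a cut-off of 24, only the incompetent subject who scored 26 is misclassified.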

  21. Compromise procedures. 3. Compromise procedures: the previous methods are based on absolute standards, since the judges establish a minimum to pass regardless of how the group performs. Compromise procedures combine absolute and relative information and try to reach a compromise between both types of data. 3.1. Beuk (1984) method: the judges answer two questions: a) the minimum percentage of test items that people must answer correctly to pass → absolute data; b) the percentage of people who will pass the test → relative data. Finally, using the empirical test data, a compromise cut-off is established.

  22. Compromise procedures. Beuk method. 1. Two axes are drawn: the percentage of items that must be answered correctly to pass the test (first question) on the abscissa, and the percentage of subjects who reach that minimum score (second question) on the ordinate. 2. The mean of the judges' responses to each question is calculated, giving point A. 3. The empirical distribution of the subjects' test scores is obtained. 4. Point A' is obtained by drawing through A a straight line with slope Sy/Sx (the standard deviations of the judges' responses to the two questions) until it meets the empirical distribution. 5. To obtain the cut-off, A' is projected onto the abscissa. [Figure: x-axis: percentage of items (0 to 100); y-axis: percentage of people (0 to 100); points A and A' and the empirical distribution.]
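
The graphical steps can be approximated numerically. The sketch below assumes that the empirical distribution is the percentage of examinees scoring at or above each candidate cut-off, and it scans cut-offs in steps of 0.1 percentage points to find where the line through A comes closest to that curve. The function name, the grid search and all data are hypothetical.

```python
from statistics import mean, stdev


def beuk_cutoff(judge_item_pct, judge_pass_pct, examinee_pct_scores):
    """Approximate Beuk compromise cut-off (as a percentage of items correct)."""
    x_bar, y_bar = mean(judge_item_pct), mean(judge_pass_pct)   # point A
    slope = stdev(judge_pass_pct) / stdev(judge_item_pct)       # Sy / Sx

    def pass_rate(cutoff):
        # Empirical curve: percentage of examinees at or above the cut-off.
        passed = sum(s >= cutoff for s in examinee_pct_scores)
        return 100 * passed / len(examinee_pct_scores)

    def gap(cutoff):
        line = y_bar + slope * (cutoff - x_bar)
        return abs(pass_rate(cutoff) - line)

    candidates = [c / 10 for c in range(0, 1001)]  # 0.0, 0.1, ..., 100.0
    return min(candidates, key=gap)                # abscissa of A'


# Hypothetical data: two judges and ten examinees (scores as % of items correct).
cutoff = beuk_cutoff(
    judge_item_pct=[60, 70],
    judge_pass_pct=[70, 90],
    examinee_pct_scores=[40, 50, 55, 60, 65, 70, 75, 80, 85, 90],
)
print(cutoff)  # 60.0
```

Here A = (65, 80) and the slope is 2, so the line meets the empirical curve at 60% of items, where 70% of examinees pass.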

  23. TRANSFORMING SCORES FOR INTERPRETATION IN NORM-REFERENCED TESTS (NRT). The interpretation of scores makes sense when they are compared with the scores obtained by the other subjects in the sample. Once the subjects' test scores have been obtained, the raw scores are often converted into other scores to make them easier to understand. Objective: to express the raw scores so that they indicate the subject's location in the group. - Linear transformations: standard scores; derived standard scores. - Nonlinear transformations: percentiles; normalized standard scores; derived normalized scores.

  24. Linear transformations. 1. Standard scores: once the test has been administered to the whole sample, the mean and standard deviation are calculated; from these the standard scores (mean 0, standard deviation 1) are obtained: $Z_x = \frac{X - \bar{X}}{S_x}$, where $\bar{X}$ is the sample mean, $X$ is the raw score, and $S_x$ is the standard deviation of the distribution → the standard score indicates how many standard deviations the subject's score lies from the mean. It represents a change in the origin of the scale (the mean) and in the unit of measurement (the standard deviation).

  25. 2. Derived standard scores: once the scores have been converted to standard scores, they can be transformed to a scale with a mean and standard deviation set by the user → this makes it possible to avoid negative and decimal values. The transformation can be expressed as $Y = a + b Z_x$, where $Y$ is the derived standard score, $a$ is the mean of the scores on the new scale, $b$ is the standard deviation of the new scale, and $Z_x$ is the standard score on the original scale.
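
Both linear transformations can be sketched together. The example assumes the population formula for the standard deviation (dividing by n) and uses a = 50, b = 10 purely as an illustrative target scale; the function names and data are hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(raw):
    """Z = (X - mean) / SD for every raw score, with SD computed over n."""
    m, s = mean(raw), pstdev(raw)
    return [(x - m) / s for x in raw]


def derived_scores(z_scores, a, b):
    """Y = a + b * Z: rescale standard scores to mean a and SD b."""
    return [a + b * z for z in z_scores]


raw = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical raw scores: mean 5, SD 2
z = standard_scores(raw)
y = derived_scores(z, a=50, b=10)     # illustrative scale with mean 50, SD 10
print(z[-1], y[-1])  # 2.0 70.0
```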

  26. Widely used derived standard scales are:

  27. Nonlinear transformations. As shown, derived linear transformations avoid the problem of negative numbers and decimals. However, administering a test to two different samples yields different distributions, so caution is needed when comparing a subject's scores relative to a particular sample. For this we can use nonlinear transformations, which alter the shape of the original distributions: - Percentile ranks. - Normalized standard scores. - Derived normalized scores.

  28. 1. Percentile rank: the percentage of cases in the normative group that fall below a given test score. Percentile ranks are widely used for communicating results. They constitute an ordinal scale, which implies that: - in different regions of the scale, a difference of 1 point corresponds to different magnitudes; - gains or losses for a subject, as well as comparisons between subjects, cannot be analyzed in percentile ranks; - arithmetic and statistical calculations, such as means or group comparisons, should not be applied to them.
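
A minimal sketch of the definition, counting only cases strictly below the score (some textbooks also add half of the tied cases; that variant is not shown). The normative sample is hypothetical.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the normative group scoring strictly below `score`."""
    below = sum(s < score for s in norm_scores)
    return 100 * below / len(norm_scores)


norm_group = [10, 12, 12, 15, 18, 20, 21, 25, 27, 30]  # hypothetical normative sample
print(percentile_rank(20, norm_group))  # 50.0
```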

  29. 2. Normalized standard scores: obtained from percentiles. They are defined as the standard score that corresponds to a subject's empirical score on a test under the assumption that the distribution is normal. 1. Determine the percentile rank corresponding to each raw score. 2. Assuming that the variable is normally distributed, look up in the normal table the z value corresponding to each percentage.
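
The table lookup in step 2 can be replaced by the inverse normal CDF from Python's standard library. A minimal sketch; the function name is hypothetical, and percentile ranks of exactly 0 or 100 are not valid inputs because `inv_cdf` requires a probability strictly between 0 and 1.

```python
from statistics import NormalDist


def normalized_z(percentile_rank: float) -> float:
    """z value of the standard normal that leaves `percentile_rank` % below it."""
    return NormalDist().inv_cdf(percentile_rank / 100)


print(normalized_z(50))              # 0.0
print(round(normalized_z(97.5), 2))  # 1.96
```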

  30. 3. Derived normalized scores (stanines or enneatypes): normalized standard scores have the disadvantage of taking negative and decimal values. This can be overcome by transforming the normalized standard scores into derived normalized scores. Stanines: the scale values are positive integers, with nine units (1 to 9). They are a linear transformation of the normalized standard scores with mean 5 and standard deviation 2: $E = 5 + 2Z_n$.
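
A short sketch of E = 5 + 2Zn. Rounding to the nearest integer and clamping to the 1 to 9 range are assumptions added here so that the result is always one of the nine stanine values; the function name is hypothetical.

```python
def stanine(z_normalized: float) -> int:
    """E = 5 + 2 * Zn, rounded and limited to the range 1..9 (assumed handling)."""
    value = round(5 + 2 * z_normalized)
    return max(1, min(9, value))


print([stanine(z) for z in (-3.0, -1.0, 0.0, 1.0, 3.0)])  # [1, 3, 5, 7, 9]
```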

  31. Chronological norms. Mental age (Binet and Simon): - Samples of children of different ages are selected. - They take a test (e.g., of intelligence). - Each age group is assigned the mean score it obtained. IQ: the subject's mental age (MA) is calculated and divided by chronological age (CA): $IQ = \frac{MA}{CA} \times 100$. IQ < 100 → intellectual development below that corresponding to the subject's age. IQ = 100 → mental and chronological age coincide. IQ > 100 → intellectual development above that corresponding to chronological age.

  32. Disadvantages: A year of mental age does not have the same meaning throughout a child's development. As age increases, differences in cognitive development shrink and become homogeneous → IQ discriminates poorly among adults, for example.

  33. ESTABLISHMENT AND TYPES OF NORMS. Norms are a description of the subject's position relative to the normative group. - The normative group should be representative of the population and large enough to provide dependable estimates. - The normative group must be homogeneous (all subjects are members of the target population). - All descriptive statistics are presented as normative data. As for the type of scale, the most common is the percentile rank.

  34. Types of norms: National norms: the most common type, based on nationally representative samples. Local norms: based on subpopulations defined by limited geographical units, such as autonomous communities. User norms and convenience norms: based on the scores of the individuals to whom the test is administered over a period of time, without sampling considerations. They are very common in tests used in personnel selection. On many other occasions they are based on groups of subjects that are simply accessible to the test constructor.

  35. SCORE EQUATING. Equating the scores of two or more forms of a test means establishing a correspondence between their scores, so that the scores of any one form can be expressed in terms of another. That is, we seek a transformation that expresses the scores of a test X in the units of another test Y: Y* = f(X)

  36. Conditions for equating: 1. The tests measure the same construct, and with the same reliability. 2. For each group of examinees of identical ability, the conditional frequency distribution on test Y after transformation (Y*) is the same as the conditional frequency distribution on test X. 3. Population invariance: the transformation is the same regardless of the group from which it is obtained. 4. Symmetry: the transformation is invertible, i.e. the results are the same whether obtaining Y* = f(X) or X* = f(Y). It is unlikely that all four conditions are met. Although it is theoretically possible to construct two forms that measure the same construct and are equally reliable, they are unlikely to be so at all ability levels.

  37. Horizontal versus vertical equating: horizontal equating applies when both tests are equal in difficulty. This is not always the case, however, and we then face the problem of vertical equating, i.e. the tests measure the same trait but at different levels of difficulty.

  38. Equating designs. 1. Single-group designs: both forms of the test are administered to the same group of subjects (one after the other). Drawback: differences due to fatigue or to the order of administration. To avoid this, a counterbalanced single-group design can be used: the sample is divided into two subgroups and both forms are administered to each, but in reverse order.

  39. 2. Equivalent-groups designs: two samples of subjects are drawn at random from the population, and one form of the test is administered to each sample. That is, each form of the test is administered to a different group of subjects. Advantage: the effects of fatigue, learning and order of administration are avoided. Disadvantages: both groups must be equivalent in the ability the test measures, and large sample sizes are required.

  40. 3. Nonequivalent-groups designs with common items, or anchor designs (among the most widely used): each sample takes only one form of the test, with the peculiarity that a common test (Z, the anchor test) is administered to both samples and establishes the equivalence between the tests being equated. - Internal anchor: the common items are interspersed among the other items (and count toward the total score) → anchor items. - External anchor: the common items form a separate test (not counted in the total score) → anchor test. As for the number of common items: at least 20% of the total test.

  41. Equating methods: once the data have been obtained through the designs seen above, equivalent scores must be obtained using different statistical methods. 1. Mean method: the means of the tests to be equated are matched. Let X and Y be two different tests; for every score on X we can establish that $X^* = X - \bar{X} + \bar{Y}$, where $X^*$ is the score on test Y equivalent to a score on test X, $X$ is the score on test X, and $\bar{X}$ and $\bar{Y}$ are the means of tests X and Y, respectively.

  42. 2. Linear method: based on equating those raw scores that have the same standard score. That is, a given score on Y is equivalent to one on X if both have the same Z score, $Z_X = Z_Y$, which gives $X^* = \frac{S_y}{S_x}(X - \bar{X}) + \bar{Y}$, where $X^*$ is the score on test Y equivalent to a score on test X, $X$ is the score on test X, $\bar{X}$ and $\bar{Y}$ are the means of tests X and Y, respectively, and $S_x$ and $S_y$ are the standard deviations of X and Y, respectively.
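
The mean and linear methods can be sketched as follows, assuming the standard forms X* = X - mean(X) + mean(Y) and X* = mean(Y) + (Sy/Sx)(X - mean(X)). The function names and the summary statistics are hypothetical.

```python
def mean_equate(x, mean_x, mean_y):
    """Mean method: shift the score by the difference between the two means."""
    return x - mean_x + mean_y


def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Linear method: match standard scores, Z_X = Z_Y, and solve for the Y score."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)


# Hypothetical summary statistics: test X (mean 25, SD 5), test Y (mean 28, SD 10).
print(mean_equate(30, mean_x=25, mean_y=28))                        # 33
print(linear_equate(30, mean_x=25, sd_x=5, mean_y=28, sd_y=10))     # 38.0
```

A score of 30 on X lies one standard deviation above its mean, so the linear method maps it to one standard deviation above the mean of Y.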

  43. Had we used a single-group design, in which both tests are administered to the same subjects, the expression would be: Finally, had we used an anchor design, the expression would be:

  44. 3. Equipercentile method: the most common method; it consists of equating those scores whose percentile ranks are equal. 1. For each test, calculate the percentile ranks that correspond to each of its scores. 2. Plot the two percentile distributions. 3. Obtain the equivalent scores on the two tests (X and Y) from the plot.
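
The graphical steps can be approximated without a plot. The sketch below is a simplification: for a given score on X it returns the observed score on Y whose percentile rank is closest, whereas operational equating typically interpolates and smooths the distributions. Function names and data are hypothetical.

```python
def percentile_rank(score, scores):
    """Percentage of cases scoring strictly below `score`."""
    return 100 * sum(s < score for s in scores) / len(scores)


def equipercentile_equate(x, scores_x, scores_y):
    """Observed Y score whose percentile rank is closest to that of x on test X."""
    target = percentile_rank(x, scores_x)
    candidates = sorted(set(scores_y))
    return min(candidates, key=lambda y: abs(percentile_rank(y, scores_y) - target))


scores_x = [10, 12, 14, 16, 18]  # hypothetical scores on test X
scores_y = [20, 23, 26, 29, 32]  # hypothetical scores on test Y
print(equipercentile_equate(14, scores_x, scores_y))  # 26
```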

  45. PREPARING THE DOCUMENTS THAT ACCOMPANY THE TEST. TEST MANUAL: the test constructor needs to give users the information they require to give proper meaning to the scores a subject obtains on the test → PREPARING THE TEST MANUAL. According to Yela (1984), the information to include is the same as in a scientific report: 1. Test specification. 2. Test description. 3. Justification. 4. References.

  46. 1. Test specification: name and classification of the test (type of construct it assesses); type of material (printed and manipulative); and method of administration (individual or group). 2. Test description: 2.1. An introduction explaining the aim of the test, its relation to other tests, and its background and development. 2.2. The field of application (the psychological aspects to be studied). 2.3. Administration instructions and the time allowed for each part. 2.4. How to score the test, including answer keys and annotated examples. 3. Justification: the quantitative data that justify the use of the test (reliability, validity and norming). 4. Bibliographic references. Example: the BFQ-N manual.

  47. ASSESSMENT OF CLASSICAL TEST THEORY (CTT). The procedures cited were developed under CTT, which has been and remains one of the most influential psychometric models in the field of psychological measurement. - Advantages: simplicity, clarity and flexibility of its concepts; from minimal assumptions, it provides solutions to a wide range of measurement problems. - Limitations: the assumptions cannot be verified empirically; the assumption of constant measurement error across ability levels is implausible; the properties of the test and the scores of the subjects are not invariant; the conception of measurement error is undifferentiated.

  48. BIBLIOGRAPHY 1. Barbero, I., García, E., Vila, E., y Holgado, F.P. (2010). Psicometría: Problemas resueltos. Madrid: Sanz y Torres. A book of exercises and problems with worked solutions. It lets students round out, from an applied point of view, the concepts and contents covered in the theoretical part, and acquire the skills needed to solve problems. 2. Barbero, I. (Coord.), Vila, E. y Holgado, F.P. (2010). Psicometría. Madrid: Sanz y Torres. Chapter 9 can be used to prepare the contents related to the assignment and transformation of scores. 3. Martínez Arias, R. (1995). Psicometría: Teoría de los Tests Psicológicos y Educativos. Madrid: Síntesis. Chapter 20 covers the main aspects of score assignment and score transformations, and presents the main equating procedures, all with abundant examples.
