Using the IRT and Many-Facet Rasch Analysis for Test Improvement Desislava Dimitrova, Dimitar Atanasov New Bulgarian University “ALIGNING TRAINING AND TESTING IN SUPPORT OF INTEROPERABILITY” BILC Seminar, 10-15 October 2010, Varna
Outline • Examination procedure • Main concepts and observations • The socio-cognitive test validation framework (Cyril Weir, 2005) and its criteria • Scoring validity for the listening and reading parts of the test • Scoring validity for the essay
Test structure 1. Listening paper: two tasks • 15 MCQ 2. Reading paper: five tasks • 6 items matching response format • 10 items bank-cloze response format • 10 items open-cloze response format • 16 items short-answer response format • 2 open-ended questions • 5 MCQ 3. Essay: 180-220 words
Too much? • The concept of communicative language ability (CEFR) • The concept of test usefulness (Bachman) • The concept of justifying the use of language assessments in the real world (Bachman) • The concept of validity • The Code of Practice (ALTE*, for example) *Association of Language Testers in Europe
Statements The NBU exam is high-stakes. The NBU exam is criterion-oriented. The NBU exam is ‘independent’. Evidence for test validation had not been established, BUT there was a routine practice for the test development process and test administration.
The Socio-cognitive Framework for test validation, Cyril Weir (2005) Test taker characteristics and: Context validity Theory-based validity Scoring validity Consequential validity Criterion-related validity
“Before-the-test event” Context validity Theory-based validity “After-the-test event” Scoring validity Consequential validity Criterion-related validity
Scoring validity for the listening and reading parts of the test is established by: • Item analysis • Internal consistency • Error of measurement • Marker reliability Not just looking at them! Investigate, discuss, learn and make decisions! (A sketch of two of these statistics follows below.)
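A minimal Python sketch of two of the scoring-validity statistics named above: Cronbach's alpha for internal consistency and the standard error of measurement. It assumes a dichotomously scored response matrix (rows = test takers, columns = items); the simulated data and function names are illustrative only and are not the analysis actually run on the NBU exam.

import numpy as np

def cronbach_alpha(scores):
    """Internal consistency for a persons-by-items matrix of 0/1 scores."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def standard_error_of_measurement(scores):
    """SEM = standard deviation of total scores * sqrt(1 - reliability)."""
    reliability = cronbach_alpha(scores)
    sd_total = scores.sum(axis=1).std(ddof=1)
    return sd_total * np.sqrt(1 - reliability)

# Illustrative run on simulated data: 200 test takers, 15 listening MCQs,
# generated with a simple ability factor so the items correlate.
rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))              # one ability value per test taker
difficulty = rng.normal(size=(1, 15))            # one difficulty value per item
p_correct = 1 / (1 + np.exp(-(ability - difficulty)))
scores = (rng.random((200, 15)) < p_correct).astype(int)
print(cronbach_alpha(scores))
print(standard_error_of_measurement(scores))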
Analysis 3-parameter IRT model Advantages • Item parameter estimates are independent of the group of examinees used • Test taker ability estimates are independent of the particular set of items used Item parameters: degree of difficulty and discrimination, used to specify the content (see the sketch below)
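For reference, the sketch below writes out the 3-parameter logistic (3PL) item response function assumed by this analysis: P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))), where a is the discrimination, b the difficulty and c the pseudo-guessing parameter. The parameter values are hypothetical, not estimates from the NBU exam.

import numpy as np

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.
    theta: ability (logits); a: discrimination; b: difficulty; c: guessing."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item: moderately discriminating, slightly difficult,
# with a 20% chance of guessing the correct option.
theta = np.linspace(-3, 3, 7)                    # ability scale in logits
print(p_correct_3pl(theta, a=1.2, b=0.5, c=0.2))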
Possible decisions • Remedial procedures • Classroom assessment • Certification decision only
Scoring validity for writing is established by: • Criteria/rating scale • Rating procedures: rater training, standardization, rating conditions, rating moderation, statistical analysis of the raters • Grading (A sketch of the underlying many-facet Rasch model follows below.)
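The many-facet Rasch analysis named in the title treats the rater as an explicit facet: the log-odds of a script stepping from score category k-1 to k is modelled as person ability minus rater severity minus task difficulty minus the category threshold. The sketch below only illustrates that decomposition with made-up parameter values; operational estimates would come from dedicated software such as FACETS.

import numpy as np

def rating_probabilities(ability, rater_severity, task_difficulty, thresholds):
    """Probability of each score category 0..K for one person-rater-task
    combination under a rating-scale many-facet Rasch model."""
    thresholds = np.asarray(thresholds, dtype=float)
    # Log-odds of stepping from category k-1 to k at each threshold.
    step_logits = ability - rater_severity - task_difficulty - thresholds
    # Category k gets the sum of the first k step logits (empty sum for k = 0).
    numerators = np.exp(np.concatenate(([0.0], np.cumsum(step_logits))))
    return numerators / numerators.sum()

# Hypothetical example: a mid-ability writer rated by a slightly severe rater.
probs = rating_probabilities(ability=0.8, rater_severity=0.3,
                             task_difficulty=0.0,
                             thresholds=[-1.5, -0.5, 0.5, 1.5])
print(probs)          # probabilities for scores 0..4; they sum to 1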
Conclusion for the essay: Good: • Two raters • Analytic writing scale • Rubrics and input Negative: • The score depends on the raters • No task-specific scale • No standardization
It is now a fact that we will continue our work on: • item writers' training • content and statistical specification of the items • test review and test revision
Sharing: • Investigation (small steps towards “strong” validity) • Comparison (of the language ability of the same population at the same level) • Cooperation (in research projects)
Thank you New Bulgarian University www.nbu.bg