ITEM ANALYSIS
Construction of Tests and Analysis
Characteristics of a Good Test • Validity: • It refers to the appropriateness or truthfulness of a tool. A tool is valid if it measures what it is supposed to measure. • Reliability: • It refers to the trustworthiness or consistency of measurement of a tool, whatever it measures.
Objectivity: • Refers to the absence of subjective bias in the interpretation of responses obtained by a tool. • Economy: • The test should be simple and quick to administer, saving money and time.
Practicability or Feasibility: • The test should not require special infrastructure such as a dark room or a one-way observation room.
Trial Test • Trial testing involves time and resources. • Prepare a content analysis and a blueprint. • Review each item before trial testing.
Content Analysis • Which area of the curriculum is selected? • Are there significant sections in the content? • Are there significant subdivisions in the content? • Which of the representative areas should be included?
Blueprint • Title • Fundamental purpose • The aspects of the curriculum covered • For whom the test is constructed • Time, date, who will administer and who will score • Weightage for recall, comprehension and reflective thinking
Item Revision • Dependable inferences can be made about the choice of content. • All important parts of the curriculum are addressed. • Achievement over the whole range is assessed.
How to review? • Is the item clear in expression? • Are the items expressed in the simplest possible language? • Are there unintended clues to the correct answer? • Is the format reasonably consistent? • Is there a single, clearly correct answer for each item? • Is the type of item appropriate to the information required? • Are there enough items to provide adequate coverage of the behaviour to be assessed?
Purpose of the Trial Test • Establishes the difficulty of each item. • Identifies distractors that do not appear plausible. • Suggests the number of items to be included in the final test. • Establishes the contribution of each item to the discrimination between low- and high-achieving candidates. • Checks the adequacy of the administration instructions. • Identifies misconceptions held by the students through analysis of their responses.
Choosing a Sample • A sample of 100 to 150 students of varied ability may be selected. • Approximately equal numbers of male and female students. • Judgment sampling technique, drawn from the target group.
Try-out of the Test • The test is administered to a representative sample chosen from the target population for whom the test is intended, and scored. This pilot study is useful for the following: • To identify weak or defective items and to reveal needed improvements. • To determine the difficulty level and discriminating power of each individual item, so that a selection of items may be made.
To provide the data needed to determine an appropriate time limit for the final test. • To standardize the instructions and procedures. • To know how to organize the items. • To decide the proper format.
Scoring the Trial Test • Scoring needs training. • It must not depend on the scorer's personal judgment. • Refer to the scoring key. • Mechanical scoring is recommended to maintain accuracy.
Arranging Pupils • After scoring the trial test, individuals are placed in order from the highest total score to the lowest.
Indices of Difficulty and Discriminating Power of Items • The top 27% of pupils constitute the high-achieving group and the bottom 27% the low-achieving group. • The indices of discriminating power and difficulty level are computed for each item of the test using the following formulae.
Discriminating power = (Ph − Pl) / U • Difficulty level = (Ph + Pl) / U • Ph = the number of pupils in the high-achieving group who answered the item correctly. • Pl = the number of pupils in the low-achieving group who answered the item correctly. • U = total number of pupils in both groups.
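A minimal Python sketch of the procedure described above (ranking pupils by total score, taking the top and bottom 27%, and applying the two formulae); the function names and data layout are illustrative assumptions, not part of the original slides:

def groups_27(total_scores):
    # Rank pupils from high to low and take the top and bottom 27%.
    ranked = sorted(total_scores, reverse=True)
    k = round(0.27 * len(ranked))
    return ranked[:k], ranked[-k:]

def item_indices(ph, pl, u):
    # ph: pupils in the high group who answered the item correctly
    # pl: pupils in the low group who answered the item correctly
    # u:  total number of pupils in both groups
    difficulty = (ph + pl) / u        # (Ph + Pl) / U
    discrimination = (ph - pl) / u    # (Ph - Pl) / U
    return difficulty, discrimination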
Types of Discriminators • Positive discriminator: more pupils in the high group than in the low group answer correctly (Ph > Pl). • Negative discriminator: more pupils in the low group answer correctly (Ph < Pl). • Non-discriminator: both groups answer correctly in equal numbers (Ph = Pl).
Graphical Analysis of Scores • Acceptable and possibly acceptable correct-answer response patterns. • Unacceptable correct-answer response patterns.
Is this a good item? • Compute the difficulty and discrimination indices for an item administered to 263 pupils, where 74 pupils answered the item correctly: 32 pupils in the upper group and 23 pupils in the lower group passed the item. • Is this a good item?
Is this a good item? • Compute the difficulty and discrimination indices of a test item administered to 84 pupils if 52 test takers answered the item correctly, 20 in the upper group and 12 in the lower group. • Is this a good item?
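Working the two exercises with the formulae above, and assuming (since the slides do not state it) upper and lower groups of 27% each, rounded to whole pupils:

# Exercise 1: 263 pupils -> groups of about 71 each, U = 142
item_indices(32, 23, 142)   # difficulty = 55/142 = 0.39, discrimination = 9/142 = 0.06

# Exercise 2: 84 pupils -> groups of about 23 each, U = 46
item_indices(20, 12, 46)    # difficulty = 32/46 = 0.70, discrimination = 8/46 = 0.17

By the usual convention that a good item must discriminate positively and clearly, the first item's discrimination is very low and it would need revision; the second item discriminates noticeably better.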
Selection of Items • Based on the calculated values of item discrimination and difficulty, appropriate items are chosen for the final form of the standardized test. • Arrange the items in increasing order of difficulty.
Assembly of the Test in the Final Form • Items are first chosen on the basis of discriminating power, and from those, items with appropriate difficulty levels are selected for the final form. • Care should be taken to see that about 50% of the items are of average difficulty, 25% easy, 20% difficult and 5% very difficult.
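A hedged sketch of this two-stage assembly; the discrimination cut-off, the difficulty cut points and the field names are illustrative assumptions, since the slides prescribe only the percentage mix:

def band(p):
    # Higher difficulty index p means an easier item.
    if p >= 0.70: return "easy"
    if p >= 0.40: return "average"
    if p >= 0.20: return "difficult"
    return "very difficult"

def assemble(items, n_final, min_disc=0.20):
    quotas = {"average": 0.50, "easy": 0.25, "difficult": 0.20, "very difficult": 0.05}
    keep = [it for it in items if it["discrimination"] >= min_disc]   # stage 1
    final = []
    for b, share in quotas.items():                                   # stage 2
        pool = [it for it in keep if band(it["difficulty"]) == b]
        final.extend(pool[:round(share * n_final)])
    # Increasing order of difficulty = decreasing difficulty index p.
    return sorted(final, key=lambda it: it["difficulty"], reverse=True)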
A detailed scoring scheme is also to be prepared, • so as to ensure objective evaluation of pupils' responses. • Appropriate instructions and procedures for administering the test also have to be developed and incorporated suitably in the test.
Advantages of Item Analysis • A powerful technique for improving instruction. • Helpful for guidance. • Yields valid measures of instructional objectives. • Gives clues to the nature of misunderstandings and suggests remediation.
Reliability • The stability and trustworthiness of a measure is called reliability. • A reliable measure should be free from error. • Example: the Stanford-Binet IQ. • The score is a good estimate of the child's mental ability.
Methods of Determining Reliability • There are four procedures for computing the reliability coefficient: • Test-retest method • Alternative or parallel forms • Split-half technique • Rational equivalence
Test-Retest Method • Repetition of the test is the simplest method of determining the agreement between two sets of scores. • The test is given and then repeated on the same group, and the correlation is computed between the first and second sets of scores.
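A minimal sketch of the computation using the Python standard library; the score lists are made-up illustrations:

from statistics import correlation   # Python 3.10+

first  = [12, 18, 25, 30, 22, 15, 28, 20]   # scores on the first administration
second = [14, 17, 27, 29, 24, 16, 27, 22]   # same pupils on the retest

r = correlation(first, second)   # Pearson r = test-retest reliability coefficient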
Defects of the Test-Retest Method • If the test is repeated immediately, many subjects will recall their first answers, which tends to increase their scores. • Practice and the confidence induced by familiarity also affect scores. • If the interval is longer (e.g., six months), growth changes will affect the retest. • Because of these defects, the test-retest method is generally less useful than the other methods.
Alternative or Parallel Forms Method • When alternative or parallel forms of a test can be constructed, the correlation between form A and form B may be taken as a measure of the self-correlation of the test. • The alternative-forms method is satisfactory when sufficient time has intervened between the administrations of the two forms to weaken or eliminate memory and practice effects.
When form B of a test follows form A closely, scores on the second form will often be higher because of familiarity. • If such increases are approximately constant (3 to 5 points), the reliability coefficient of the test will not be affected, since the paired A and B scores maintain the same relative positions in the two distributions.
In drawing up alternative test forms, care must be exercised to match the test materials for content, difficulty and form. • When the alternative forms are virtually identical, the reliability estimate will be too high; when they are poorly matched, it will be too low. • An interval of at least two to four weeks should be allowed between administrations of the test.
The Split-Half Method • In this method the test is first divided into two equivalent halves and the correlation found for these half-tests. • From the reliability of the half-test, the self-correlation of the whole test is then estimated by the Spearman-Brown prophecy formula.
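For doubling the test length, the Spearman-Brown prophecy formula is r_whole = 2 × r_half / (1 + r_half), where r_half is the correlation between the two halves. For example, if the halves correlate 0.60, the estimated reliability of the whole test is 2(0.60) / (1 + 0.60) = 0.75.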
The split half method is regarded by many as the best of the methods for measuring test reliability.
Advantage: • All the data for computing reliability are obtained on one occasion, so variations brought about by differences between two testing situations are eliminated.
How to divide? • By alternate statements: odd-numbered items form one half and even-numbered items the other. • This works best when all the items are of approximately equal difficulty.
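A minimal Python sketch of the odd-even split with the Spearman-Brown correction; the small score matrix is a made-up illustration:

from statistics import correlation   # Python 3.10+

# Each row: one pupil's item scores (1 = correct, 0 = wrong).
scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
]

odd  = [sum(row[0::2]) for row in scores]   # half-test 1: items 1, 3, 5, ...
even = [sum(row[1::2]) for row in scores]   # half-test 2: items 2, 4, 6, ...

r_half  = correlation(odd, even)
r_whole = 2 * r_half / (1 + r_half)   # Spearman-Brown prophecy formula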
Method of Rational Equivalence • This method represents an attempt to estimate the reliability of a test free from the objections raised against the methods outlined above. • Two forms of a test are equivalent when the items a and A, b and B, c and C, etc., are interchangeable and when the inter-item correlations are the same for both forms.
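In practice, rational equivalence is computed with the Kuder-Richardson formulae; the best known is KR-20: r = [n / (n − 1)] × [1 − Σpq / σ²], where n is the number of items, p the proportion of pupils passing each item, q = 1 − p, and σ² the variance of the total test scores.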
Errors • Chance Errors: • Many psychological factors affect the reliability coefficient of a test: fluctuations in interest and attention, shifts in emotional attitude, and the differential effects of memory and practice. • Environmental factors such as distractions, noise, interruptions and scoring errors also play a part. All of these are called 'chance errors' or 'errors of measurement'. • They may push scores up or down from the true value.
Constant Errors: • Constant errors work in only one direction: a constant error raises or lowers all of the scores on a test, but it does not affect the reliability coefficient. • Such errors are more easily corrected than chance errors, for example by subtracting two points from a retest score to allow for practice.
Validity • The validity of a test, or of any measuring instrument, depends upon the fidelity with which it measures what it purports to measure. • A test is valid when the performances it measures correspond to the same performances as otherwise independently measured or objectively defined.
Difference between Reliability and Validity • Suppose that a clock is set forward 20 minutes. If the clock is a good timepiece, the time it tells will be reliable (consistent), but it will not be valid as judged by 'standard time'. • Validity is a relative term.