
General look at testing


Presentation Transcript


1. General look at testing
Taking a step backwards

2. Issues of importance within testing
• Categories of tests: different classification methods
• Content vs non-content
• Uses and users of tests
• Assumptions and questions behind the use of tests
• Creating a test from scratch

3. Categories of tests / content
• Mental abilities
- Intelligence: individually administered, e.g., Wechsler Adult Intelligence Scale (WAIS), or group administered
- Memory, spatial, creativity
• Achievement tests, e.g., Scholastic Assessment Test (SAT)
- Batteries: a series of tests, e.g., reading & mathematics
- Single subject
- Certification (vocational)
- Government (standardisation)
- Individual (diagnostic)

4. Categories of tests / content (continued)
• Personality
- Objective, e.g., Minnesota Multiphasic Personality Inventory (MMPI)
- Projective, e.g., Rorschach Inkblot Test
• Interests & attitudes
- Vocational
- Attitude
- Values
• Neuropsychological
- e.g., Luria-Nebraska Neuropsychological Battery (LNNB)

5. Categories of tests / non-content
• Paper and pencil vs. performance
- Paper and pencil: the respondent selects between predefined answers
- Performance: the examinee performs some action and is judged on it
• Speed vs. power
- Speed tests are purely interested in speed
- Power tests probe the limits of knowledge or ability; no time limit is imposed
- Usually both are tested at the same time

6. Categories of tests / non-content
• Individual vs. group testing
• Maximum vs. typical performance
- Ability tests usually want to know about best (maximum) performance
- A personality test asks about typical behaviour: how typically extroverted are you?
• Norm-referenced vs. criterion-referenced performance
- Norm-referenced: only relative performance is considered
- Criterion-referenced: how well did you do relative to predefined criteria?

7. Users of tests
• Professional psychologists: time spent in assessment
- Psychologists working in a mental health setting spend 15-18% of their time (Corrigan et al., '98)
- Over 80% of neuropsychologists spend 5 or more hours/week (Camara et al., '00)
- Educational psychologists: half of the working week (Hutton et al., '00)
- Two-thirds of counseling psychologists use objective measures regularly (Watkins et al., '98)

8. Other uses of tests
• Within education
- To measure performance or to predict future success
• Personnel
- To select the appropriate person, or to select the task to which the person is most suited
• Research
- A test often serves as the operational definition of the dependent variable (DV)

9. Basic assumptions
• Humans must possess recognisable traits which we consider to be important
• Individuals must potentially differ on these traits
• These traits must be quantifiable
• Traits must be stable across time
• Traits must have a relationship with actual behaviour

10. Issues to be concerned about
• How the test was developed
• Reliability
• Validity

11. Constructing a reliable test
• A much more extensive process than the average user realises
• Most personality constructs have already been established, and tests to measure them are readily available; a proliferation of new tests would therefore seem pointless from a theoretical point of view

12. Writing test items
• Question format was covered before; in addition:
• Ensure that all aspects of the construct are dealt with: for anxiety, all the different facets of the construct should be considered
• The test needs to be long enough to be reliable: start with around 30 items and reduce to 20
• Each item should assess only one trait
• Items should be culturally neutral
• Items should not be the same item rephrased (mentioned during FA)

13. Establishing item suitability
• There should not be too many items which are either very easy or very hard
- More than 10% of items with difficulty values (proportion passing) below .2 or above .8 is questionable
• Items should have an acceptable standard deviation; if it is too low, the item is not tapping into individual differences (both checks are sketched below)
• If the test covers different constructs, it is important that an equal number of items refers to each construct
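A minimal sketch of this screening step in Python (NumPy only). The .2/.8 difficulty band follows the slide; the simulated responses and the .3 spread cut-off are illustrative assumptions.

```python
import numpy as np

# Illustrative data: rows = respondents, columns = binary item scores (1 = pass)
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(200, 30))

difficulty = responses.mean(axis=0)   # proportion passing each item
spread = responses.std(axis=0)        # item standard deviation

# Flag items that are very easy or very hard (difficulty < .2 or > .8);
# the test is questionable if more than 10% of items land here
extreme = (difficulty < 0.2) | (difficulty > 0.8)
print(f"{extreme.mean():.0%} of items are extreme")

# Flag items with too little spread to tap individual differences
# (the 0.3 cut-off is an assumed example value, not from the slides)
low_spread = spread < 0.3
print("Items to review:", np.where(extreme | low_spread)[0])
```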

14. Establishing item suitability
• Criterion keying: choosing items based on their ability to differentiate groups
- Atheoretical
- Groups must be well defined
- Interpret liberally, since there will be overlap in the response distributions
• By factor analysis (FA): items with a low loading (< .3) on the intended factor are removed (see the sketch below)
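One way to apply the loading rule, sketched with scikit-learn's FactorAnalysis on simulated data; a one-factor model is assumed here, and only the < .3 cut-off comes from the slide.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulated scale: eight items load on one construct, two are mostly noise
rng = np.random.default_rng(1)
construct = rng.normal(size=200)
items = np.column_stack(
    [construct + rng.normal(scale=0.8, size=200) for _ in range(8)]
    + [rng.normal(size=200) for _ in range(2)]
)
items = (items - items.mean(axis=0)) / items.std(axis=0)  # standardise first

fa = FactorAnalysis(n_components=1).fit(items)
loadings = fa.components_[0]          # loading of each item on the factor

# Remove items whose absolute loading falls below .3
keep = np.abs(loadings) > 0.3
print("Loadings:", np.round(loadings, 2))
print("Items kept:", np.where(keep)[0])
```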

15. Establishing item suitability
• Classical item analysis (sketched below)
- The correlation of each item's score with the score on the whole test (excluding that item) is calculated
- Removing an item with a low item-total correlation improves reliability
- But since reliability is also a product of the number of items, there is a balance
- A point comes where removing a poor item decreases reliability, because reliability depends on both the average inter-item correlation and the number of items in the test
- Each time an item is removed, the item-total correlation of every remaining item must be recalculated, since it changes as items are removed
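A sketch of one pass of this analysis: corrected item-total correlations (each item against the total of the remaining items), recomputed after the weakest item is dropped. The data and the single removal step are illustrative.

```python
import numpy as np

def corrected_item_total(scores: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of all the other items."""
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, i], total - scores[:, i])[0, 1]
        for i in range(scores.shape[1])
    ])

# Nine items tap the construct; the tenth is unrelated (a 'poor' item)
rng = np.random.default_rng(2)
construct = rng.normal(size=300)
scores = np.column_stack(
    [construct + rng.normal(size=300) for _ in range(9)]
    + [rng.normal(size=300)]
)

print(np.round(corrected_item_total(scores), 2))

# Drop the weakest item, then recompute: every correlation changes,
# so the table must be rebuilt after each removal
worst = np.argmin(corrected_item_total(scores))
scores = np.delete(scores, worst, axis=1)
print(np.round(corrected_item_total(scores), 2))
```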

16. Revisiting reliability and validity
• Each scale should assess one psychological construct
• Measurement error means that, for any one item, the psychological construct accounts for only a low percentage of the variation in responses
• Other factors cause most of the variation: age, religious beliefs, sociability, peer-group pressure
• Use several items and this random variation should cancel out, so that the measured variance is due to the underlying construct (simulated below)
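A small simulation of that cancellation argument (the noise levels are assumed, not from the slides): any single item correlates only modestly with the construct, but the 20-item total correlates strongly.

```python
import numpy as np

rng = np.random.default_rng(3)
construct = rng.normal(size=5000)

# Each item = construct + a lot of item-specific noise
items = np.column_stack(
    [construct + rng.normal(scale=2.0, size=5000) for _ in range(20)]
)

r_single = np.corrcoef(items[:, 0], construct)[0, 1]
r_total = np.corrcoef(items.sum(axis=1), construct)[0, 1]
print(f"one item vs construct:      r = {r_single:.2f}")  # around .45
print(f"20-item total vs construct: r = {r_total:.2f}")   # around .91
```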

17. Reliability of a test
• Does not mean temporal stability (that is test-retest reliability, measured through parallel forms)
• Is a measure of the extent to which a scale measures one construct only
• Split-half reliability
• Cronbach's alpha (see the sketch below)
- Influenced by the average correlation between the items and by the number of items in the test
- Boosted by asking the 'same' question twice
- A test should not be used if alpha is below .7
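In its usual form, Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), where k is the number of items. A short sketch on simulated data:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Twenty items, all driven by one construct plus noise
rng = np.random.default_rng(4)
construct = rng.normal(size=300)
scores = np.column_stack([construct + rng.normal(size=300) for _ in range(20)])

print(f"alpha = {cronbach_alpha(scores):.2f}")  # well above the .7 rule of thumb
```

Because alpha rises with the number of items as well as with the average inter-item correlation, a rephrased duplicate item inflates it without adding information, which is exactly the 'boosting' warned about above.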

18. Test validity
• Face validity
• Content validity
• Construct validity
• Predictive validity
