
Unit 2: Test Worthiness and Making Meaning out of Raw Scores







  1. Unit 2: Test Worthiness and Making Meaning out of Raw Scores and Common Assessment Instruments for Today’s World

  2. Test Worthiness: What Does it Take? Four requirements of test worthiness: • Validity: the test measures what it is supposed to measure • Reliability: the score is an accurate measure of the test-taker's true score • Cross-Cultural Fairness: the test is a true reflection of the individual & not a function of cultural bias inherent in the test • Practicality: the test is appropriate for the situation

  3. Correlation Coefficient • Correlation Coefficient: Relationship between two sets of test scores. Ranges from -1.0 to +1.0 • Positive Correlation: Tendency for scores to be related in the same direction • Negative Correlation: Tendency for scores to be related in opposite directions (inverse relationship)
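
The direction and strength of a correlation can be illustrated with a minimal Python sketch (the score lists here are hypothetical, chosen only to show a positive relationship):

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(xs), mean(ys)
    # Average product of deviations (covariance), scaled by both spreads
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

# Hypothetical data: hours studied vs. exam score (scores rise together)
study = [1, 2, 3, 4, 5]
exam = [55, 60, 70, 75, 90]
r = pearson_r(study, exam)
print(round(r, 2))
```

A value near +1.0, as here, indicates a strong positive relationship; reversing one list's direction would drive r toward -1.0.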

  4. Strong Correlation (Relationship) • Indication of Strong Relationship: Correlations near -1.0 or +1.0 indicate a strong relationship • Weak or No Relationship: Correlations near 0 • Scatterplot: Graph showing two or more sets of test scores • Positive correlation: Diagonal line rises from left to right • Negative correlation: Diagonal line falls from left to right

  5. Scatterplots: Positive and Negative Correlation (figures)

  6. Scatterplot: Weak or No Correlation

  7. Coefficient of Determination: Shared Variance • Coefficient of Determination: The proportion of shared variance, i.e., the common factors that account for a relationship • Calculated as the correlation coefficient squared (r²) • Example: A correlation of .85 was found between tests of depression & anxiety. Square .85: .85 x .85 = .7225; .7225 x 100 = 72.25, or about 72% • This shows that anxiety & depression share a large number of factors - but not all factors.
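
The slide's worked example (r = .85 between depression and anxiety measures) reduces to two lines of Python:

```python
# Coefficient of determination: square the correlation coefficient
r = 0.85                   # correlation from the slide's example
shared_variance = r ** 2   # 0.7225
print(f"{shared_variance:.2%} of variance is shared")  # prints 72.25% ...
```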

  8. Test Worthiness: Validity • Validity: The degree to which a test measures what it’s supposed to measure • Forms of Validity: • Content Validity • Criterion-related Validity • Concurrent Validity • Predictive Validity • Construct Validity • Experimental Design Validity • Convergent Validity • Discriminant Validity

  9. Validity: Content Validity • Content Validity: The content of the test is appropriate for what the test intends to measure • Face Validity: The superficial appearance of the test. A valid test may or may not have face validity. *Face validity is not a true measure of validity

  10. Validity: Criterion-related Validity • Criterion-related Validity: Relationship between test scores and another standard • Concurrent Validity: Relationship between test scores & another currently obtainable benchmark • Predictive Validity: Relationship between test scores & a future standard • Standard Error of Estimate: Range where a predicted score might lie • False Positive: A test incorrectly predicts a test-taker will have an attribute or be successful • False Negative: A test incorrectly predicts a test-taker will not have an attribute or be successful

  11. Validity: Construct Validity • Construct Validity: Evidence that an idea or concept is actually being measured by the test (Is the test for intelligence truly measuring intelligence?) • Evidence used to measure construct validity: • a) Experimental design: Using experimentation to show that a test measures a concept • b) Factor analysis: Statistically examining relationship between subscales and larger construct (between individual subject areas and the test as a whole)

  12. Validity: Construct Validity • Convergent Validity: Relationship between a test and other similar tests (highly correlated - say .75 range) • Discriminant Validity: Showing a lack of relationship between a test and tests of unrelated concepts (test between depression and anxiety)

  13. Reliability • Reliability: The degree to which test scores are free from errors of measurement “Perfect world” scenario: Test is well-made, the environment is optimal, & the test taker is at his/her best • Reliability Coefficient: Are test scores consistent and dependable?

  14. Reliability: Measuring Reliability • Test-retest Reliability: Relationship between test scores from one test given at two different administrations to the same people • The closer the two sets of scores, the more reliable the test • Test-retest reliability is more effective in areas that are less likely to change over time

  15. Reliability: Measuring Reliability • Alternate Forms Reliability: Relationship between scores from two similar versions of the same test • Examiner designs alternate, parallel, or equivalent forms of the original test and administers this alternate form as the second test • One of the problems is ensuring that both forms are truly equivalent

  16. Reliability: Internal Consistency • Internal Consistency: Reliability measured statistically by going “within” the test (how scores on individual items relate to each other or the test as a whole) • Types of Internal Consistency: 1) Split-half (odd-even) 2) Cronbach’s Coefficient Alpha 3) Kuder-Richardson

  17. Reliability: Internal Consistency • Split-half Reliability: Correlating one half of a test against the other half • Advantages of Split-half: 1) Having to give only one test 2) Not having to create a separate alternate form • Disadvantages of Split-half: 1) False reliability if the two halves are not parallel or equivalent 2) Makes the test half as long (shortening a test may decrease reliability)

  18. Reliability: Internal Consistency • Spearman-Brown Formula: Mathematical compensation for the shortened test length created by splitting the test in half • Spearman-Brown reliability = 2r_hh / (1 + r_hh), where r_hh is the split-half reliability estimate *If a test manual states that split-half reliability was used, check whether the Spearman-Brown formula was applied. If not, the test may be more reliable than is noted.
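
The Spearman-Brown correction is a one-line function; the .70 input below is a hypothetical split-half estimate:

```python
def spearman_brown(r_hh):
    """Correct a split-half reliability estimate for full test length."""
    return 2 * r_hh / (1 + r_hh)

# A split-half estimate of .70 corrects upward for the full-length test
print(round(spearman_brown(0.70), 2))
```

Note that the corrected value is always at least as large as the split-half estimate (they are equal only at 0 and 1), which is why an uncorrected split-half figure understates reliability.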

  19. Reliability: Internal Consistency • Cronbach’s Coefficient Alpha and Kuder-Richardson: • Methods that attempt to estimate the reliability of all the possible split-half combinations by correlating the scores for each item on the test with the total score on the test and finding the average correlation for all of the items Kuder-Richardson can only be used with tests that have right and wrong answers (achievement) Coefficient Alpha can be used with tests with various types of responses (rating scales)
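
Cronbach's alpha can be sketched from its standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The item data below is hypothetical (a 3-item rating scale answered by 4 respondents):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score columns (one list per item)."""
    k = len(items)
    item_vars = sum(pvariance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

# Hypothetical 3-item scale; each inner list holds one item's 4 responses
items = [[3, 4, 2, 5],
         [2, 4, 3, 5],
         [3, 5, 2, 4]]
print(round(cronbach_alpha(items), 2))
```

Because the formula works from item variances and total-score variance rather than right/wrong scoring, it accommodates rating scales, as the slide notes.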

  20. Reliability: Item Response Theory • Item Response Theory: Examines each item individually for its ability to measure the trait being examined • Item Characteristic Curve: Assumes that as people’s abilities increase, their probability of answering an item correctly increases

  21. Reliability: Item Characteristic Curve • If the "S" curve flattens out: Less ability to discriminate or provide a range of probabilities of a correct or incorrect response • If the "S" curve is steep: The item differentiates strongly across ability levels (Figure: probability of a correct answer, 0.0 to 1.0, plotted against IQ/ability, 55 to 145)

  22. Cross-cultural Fairness • Cross-cultural Fairness: Degree to which cultural background, class, disability, and gender do not affect test results • Tests must be carefully selected to prevent bias • Test scores must be interpreted in light of the cultural, ethnic, disability, or linguistic factors that may impact scores

  23. Practicality • Practicality: Feasibility considerations in test selection and administration • Major Practical Concerns: 1) Time: Amount of time to administer 2) Cost: Budgeting issues 3) Format: Print, type of questions 4) Readability: Understandability 5) Ease of Administration, Scoring, & Interpretation

  24. Selecting & Administering a Good Test 1) Determine goals of your client 2) Choose instrument to reach client goals 3) Access information about possible instruments, e.g., source books on testing (Buros Mental Measurements Yearbook, Tests in Print) 4) Examine validity, reliability, cross-cultural fairness, & practicality of the possible instruments 5) Choose an instrument wisely

  25. Unit 2: Statistical Concepts Making Meaning Out of Raw Scores

  26. Raw Scores are Meaningless • Raw Scores: Untreated score before manipulation or processing • Norm Group Comparisons Are Helpful: 1) Tells us relative position within the norm group 2) Allows us to compare the results among test-takers 3) Allows us to compare test results on two or more different tests taken by the same person

  27. Procedures for Normative Comparisons • Frequency Distribution: List of scores & number of times a score occurred • Orders a set of scores from highest to lowest & lists corresponding frequency of each score • Allows identification of most frequent scores and helps identify where an individual’s score falls relative to the rest of the group
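
Building a frequency distribution is straightforward in Python; the raw scores below are hypothetical:

```python
from collections import Counter

# Hypothetical raw scores from one administration
scores = [85, 90, 85, 70, 90, 85, 100, 70, 85]

# Frequency distribution: each score with the number of times it occurred,
# listed from highest score to lowest
freq = Counter(scores)
for score in sorted(freq, reverse=True):
    print(score, freq[score])
```

Reading the output top-down shows at a glance which scores are most frequent and where any individual's score falls relative to the group.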

  28. Histograms & Frequency Polygons • Histogram: Bar graph of class intervals & frequency of a set of scores Class Intervals: Grouping scores by a pre-determined range • Frequency Polygon: Line graph of class intervals & frequency of a set of scores

  29. Cumulative Distributions (Ogive Curve) • Cumulative Distribution: Line graph to examine percentile rank of a set of scores • Applications: Good for conveying information about percentile rank

  30. Normal Curves & Skewed Curves • Normal Curve: Bell-shaped curve that human traits tend to fall along • Predictable pattern that occurs whenever we measure human traits and abilities • Skewed Curves: Test scores that do not fall along a normal curve • Negatively Skewed Curve: Majority of scores at the upper end • Positively Skewed Curve: Majority of scores at the lower end

  31. Measures of Central Tendency • Central Tendency: Give you a sense of how close a score is to the middle of the distribution • Three Measures of Central Tendency: 1) Mean: Arithmetic average of all scores: add all scores and divide by # of scores 2) Median: Middle score: 50% fall above; 50% fall below 3) Mode: Most frequently occurring score *In a skewed distribution, median is a better measure of central tendency.
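
Python's standard library computes all three measures directly; the score set below is hypothetical:

```python
from statistics import mean, median, mode

scores = [70, 80, 80, 90, 100]  # hypothetical test scores
print(mean(scores), median(scores), mode(scores))
```

Here the mean (84) sits above the median (80), the pattern expected in a positively skewed set, which is why the median is preferred for skewed distributions.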

  32. Measures of Variability • Measures of Variability: How much scores vary in a distribution • Three Measures of Variability: 1) Range: Difference between highest & lowest score plus 1 2) Interquartile Range: Middle 50% of scores around the median 3) Standard Deviation: How scores vary from the mean

  33. Measures of Variability: Range • Range: Tells you the distance from the highest to lowest score • Calculated by subtracting the lowest score from the highest score and adding 1

  34. Measures of Variability: Interquartile Range • Interquartile Range: Provides the range of the middle 50% of scores around the median • Useful with skewed curves because it offers a more representative picture of where a large percentage of scores fall • Calculate: Subtract the score that is 1/4 of the way from the bottom from the score that is 3/4 of the way from the bottom & divide by 2. Next, add this number to, and subtract it from, the median

  35. Measures of Variability: Standard Deviation • Standard Deviation: Describes how scores vary from the mean • In all normal curves, the percentage of scores between standard deviation units is the same • About 99.7% of people fall within the first three standard deviations *Adequate scores are in the "eye of the beholder"
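
The range (using the slide's add-1 convention) and standard deviation can be sketched together; the score set is hypothetical:

```python
from statistics import pstdev

scores = [55, 60, 65, 70, 70, 75, 80, 85]  # hypothetical test scores

# Range: highest minus lowest, plus 1 (inclusive convention from the slide)
rng = max(scores) - min(scores) + 1

# Standard deviation: typical distance of scores from the mean
sd = pstdev(scores)
print(rng, round(sd, 2))
```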

  36. Common Assessments: Situation Specific • Developmental Disabilities: Impairment in Cognitive, Communication, Social/Emotional, & Adaptive (daily living skills) Functioning • Assessments Used: 1) Bayley Scales of Infant Development 2) Wechsler Preschool & Primary Scales of Intelligence, 3rd Edition 3) Wechsler Intelligence Scale for Children, 4th Ed. 4) Autism Diagnostic Observation Schedule 5) Vineland Adaptive Behavior Scale, 2nd Ed.

  37. Common Assessments: Situation Specific • Learning Disabilities: Disorders that affect a broad range of academic & functional skills, e.g., speaking, listening, reading, writing, spelling, & completing math calculations. Deficit in one or more ways the brain processes information • Assessments 1) Wechsler Preschool & Primary Scale of Intelligence 2) Wechsler Intelligence Scale for Children, 4th Ed. 3) Wechsler Adult Intelligence Scale, 3rd Ed. 4) Wechsler Individual Achievement Test, 2nd Ed.

  38. Learning Disabilities Assessments,Continued 5) Wechsler Memory Scale, 3rd Ed. 6) Woodcock-Johnson Test of Achievement, 3rd Ed. 7) Comprehensive Test of Phonological Processing 8) Attention Deficit Disorder Evaluation Scale (Home, Self-report, & School version) 9) Beck Depression Inventory, 2nd Ed. 10) Beck Anxiety Inventory

  39. Common Assessments: Situation Specific • Attention Deficit/Hyperactivity Disorder 1) Wechsler Intelligence Scale for Children, 4th Ed. 2) Processing Speed Index 3) Wechsler Adult Intelligence Scale, 3rd Ed. 4) Woodcock-Johnson Test of Achievement, 3rd Ed. 5) Understanding Directions Subtest 6) Attention Deficit Disorder Evaluation Scale (Home, Self-report, & School version) 7) Behavior Assessment System for Children, 2nd Ed. (Parent report, Teacher report, Self-report)

  40. Common Assessments: Situation Specific • Gifted and Talented Evaluation: Individuals who are so gifted or advanced, they need special provisions to meet their educational needs • Assessments 1) Wechsler Preschool & Primary Scale of Intelligence (3rd Ed.) 2) Wechsler Intelligence Scale for Children, 4th Ed. 3) Wechsler Adult Intelligence Scale, 3rd Ed.
