Automated Scoring for Next Generation Assessments
Karen Lochbaum, November 17, 2011
Next Generation Assessments: Desired Features
• Align to Common Core State Standards
• Capture student performance and application of higher-order skills
• Track growth toward college readiness
• Shift to digital technologies
• Immediate feedback to inform instruction
Automated scoring is a key lever for implementing performance assessments reliably and affordably at scale.
Benefits of Automated Scoring
• Immediacy & Efficiency
  • Evaluate responses in seconds
  • Reduce score turnaround time
  • Give students and teachers instant feedback
  • Reduce costs
• Accuracy
  • Trained on the collective wisdom of many skilled human scorers
• Consistency, Objectivity
  • Can detect off-topic, inappropriate and “odd” responses
Common Core State Standards: Key Features
• Reading: Text complexity and the growth of comprehension
• Language: Conventions, effective use, and vocabulary
• Writing: Diverse text types, responding to readings and research
• Speaking and Listening: Flexible communication and collaboration in a variety of settings
• Mathematics: Conceptual understanding and authentic problem solving
Text Complexity
• ACT’s Reading Between the Lines report
  • Too few students are prepared for post-high-school reading
  • Proficiency in understanding complex texts is key
• How can we best measure the complexity of texts?
• How can we track student progress?
Pearson’s New Reading Maturity Metric
• Traditional approaches: surface-level features
  • Sentence length, word length, word lists
• New approach: deep analysis of meaning and substance
  • Simulates how individual words gradually develop their unique meanings
  • About 30% more accurate than traditional readability formulas
  • Identifies the most difficult and most important words
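The “traditional approaches” on this slide estimate difficulty from surface features alone. For comparison, here is a minimal Python sketch of one well-known formula of that kind, the Flesch-Kincaid Grade Level. It is not Pearson’s Reading Maturity Metric, and the vowel-group syllable counter is a rough heuristic assumed for illustration.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, treat a final 'e' as silent."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # e.g. "make" -> 1 syllable
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: depends only on sentence length and word length."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / sentences
            + 11.8 * syllables / len(words)
            - 15.59)

print(round(flesch_kincaid_grade("The cat sat. The dog ran."), 2))  # -2.62
```

Nothing in the formula looks at meaning, so two passages with similar sentence and word lengths receive similar grades however demanding their ideas are; that is the gap the slide’s “new approach” is aimed at.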
Demo: Common Core Texts, Appendix B, Grade 9-10 Informational Texts, Language Arts
Wiesel, Elie. “Hope, Despair and Memory.” Nobel Lectures in Peace 1981–1990. Singapore: World Scientific, 1997. (1986)
Writing Performance Tasks
You advise Pat Williams, the president of DynaTech, a company that makes precision electronic instruments and navigation equipment. Sally Evans, a member of DynaTech’s sales force, recommends that DynaTech buy a small private plane (a SwiftAir 235) that she and other members of the sales force could use to visit customers. Pat was about to approve the purchase when there was an accident involving a SwiftAir 235.
Document Library
• Newspaper article about the accident
• Federal Accident Report on in-flight breakups in single-engine planes
• Internal correspondence (Pat’s email to you and Sally’s email to Pat)
• Charts relating to SwiftAir’s performance characteristics
• Excerpt from a magazine article comparing the SwiftAir 235 to similar planes
• Pictures and descriptions of SwiftAir Models 180 and 235
Example from the Collegiate Learning Assessment
Writing: Tacit Leadership Knowledge Scenarios
You are a new platoon leader who takes charge of your platoon when it returns from a lengthy combat deployment. All members of the platoon are war veterans, but you did not serve in the conflict. In addition, you failed to graduate from Ranger School. You are concerned about building credibility with your soldiers. What should you do?
Writing: Automated assessment of diagnostic skills
National Board of Medical Examiners
Science
• Use the technical passage 'Green Ocean Machine' to answer the following.
• The passage states that “the new green partner [alga] seems to provide Hatena with most of its energy needs.”
• Describe the process that enables organisms to use energy from light to make food. In your description, be sure to include:
  • the specialized features needed to produce food
  • the substances needed to produce food
  • the substances produced during this process
Example from the Maryland School Assessment
Speaking & Listening
[Figure: speech analysis showing waveform, spectrum, words and segmentation]
Spoken Item Types: Oral Reading Fluency Demonstration
• Assessment
  • Oral reading rate
  • Accuracy
  • Expressiveness
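Oral reading rate and accuracy are commonly summarized as words correct per minute (WCPM) and the percentage of words read correctly. The slide does not say how the demonstration computes them, so this Python sketch assumes those standard definitions; the helper name and its inputs are hypothetical.

```python
def oral_reading_scores(words_attempted: int, errors: int, seconds: float) -> dict:
    """Hypothetical helper: words correct per minute and percent accuracy."""
    if seconds <= 0 or words_attempted <= 0:
        raise ValueError("need a positive reading time and at least one word attempted")
    correct = words_attempted - errors
    return {
        "wcpm": correct * 60 / seconds,                   # oral reading rate
        "accuracy_pct": 100 * correct / words_attempted,  # reading accuracy
    }

print(oral_reading_scores(words_attempted=120, errors=6, seconds=60))
# {'wcpm': 114.0, 'accuracy_pct': 95.0}
```

Expressiveness (prosody) is not covered by this arithmetic; it would need acoustic features of the recording.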
Passage 1
A boy named Tom was at the bus stop. He was waiting for the school bus. There was no one there, but him. The bus was late. Tom began to talk to himself. “Maybe the bus forgot me,” he said. Then, Tom heard a dog barking. He looked up and saw his dog Spot running down the road. Spot ran to Tom. He was so happy to see Tom that he jumped into Tom’s arms. Just then, Tom heard the bus coming. He didn’t have time to take Spot home. There was no time to think. Tom grabbed Spot and hid him under his coat. The bus pulled up to Tom’s bus stop. Tom got on the bus and went to the back. His friend, Jack, had saved a seat for him. Just as Tom sat down a little yelp came from under his coat. “What do you have under there, Tom?” Jack asked. “If I tell you, do you promise not to tell?” replied Tom. “You bet! I’m your best friend, aren’t I?” asked Jack. Tom told Jack what had happened. He asked his friend what he should do. Jack had an idea. “You can tell the teacher you have something very cool for show and tell. Then, you could call your mom and have her come and pick up Spot.” Tom decided that’s what he would do. His teacher was surprised. His mom was mad, but Spot was very happy.
Describe a picture or graph
• Assessment
  • Vocabulary
  • Language Use
  • Pronunciation
  • Fluency
Hear sentences and repeat them
“What are you going to do this weekend?”
“It wasn’t bad, but it wasn’t good either.”
“The dog was barking all night long.”
• Assessment
  • Sentence mastery
  • Pronunciation
  • Fluency
Sentence Builds
after the movie ended…
went home…
we all…
• Assessment
  • Sentence mastery
  • Pronunciation
  • Fluency
Story Retelling
[Figure: score report showing Overall RETELL, Comprehension, Sentence Mastery, Fluency and Pronunciation scores]
Progressive rubrics check for both conceptual understanding and ability to execute.
Here, the student is asked to find an expression for the area in the figure.
Highlighted feedback shows how MathQuery correlates the student’s response with elements of the problem.
Automated Scoring Approach
• Learn to score from several hundred human-scored responses
  • Trained on their collective wisdom
• Measure the content and quality of responses by determining
  • the features that human scorers evaluate when scoring a response
  • how those features are weighted and combined to produce scores
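One simple way to learn how features are weighted and combined from human-scored responses is an ordinary least-squares regression of the human scores on the feature values. The Python sketch below assumes that model, a hypothetical 1-6 rubric, and invented feature names; it is not a description of how IEA itself combines its features.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def fit_weights(features, human_scores):
    """Least-squares weights (intercept first) via the normal equations X^T X w = X^T y."""
    X = [[1.0] + list(row) for row in features]  # prepend an intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, human_scores)) for i in range(p)]
    return solve(xtx, xty)

def score_response(weights, feature_row, low=1, high=6):
    """Predict a score and clamp it to a hypothetical 1-6 rubric.
    Note: Python's round() sends exact .5 ties to the nearest even integer."""
    raw = weights[0] + sum(w * f for w, f in zip(weights[1:], feature_row))
    return min(max(round(raw), low), high)

# Hypothetical features per response: [content similarity, mechanics rating]
train_x = [[0.2, 1.0], [0.4, 2.0], [0.6, 2.0], [0.8, 3.0], [0.9, 4.0]]
train_y = [1, 2, 3, 4, 5]
w = fit_weights(train_x, train_y)
print(score_response(w, [0.7, 3.0]))
```

A real system would use far more responses than features and would check the fitted model against held-out human scores before trusting it.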
Other Features of IEA (Intelligent Essay Assessor)
• Uses non-coachable measures
  • No counts of total words, syllables, characters, etc.
  • No trigger surface features such as “thus” or “therefore”
• Detects larding of big words
• Knows when it doesn’t know
  • Detects off-topic or highly unusual essays, non-standard language constructions, essays that are too long or too short …
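“Knows when it doesn’t know” means some responses are routed to a human instead of being scored automatically. A minimal Python sketch, assuming two hypothetical checks: a length window taken from the training responses, and bag-of-words cosine similarity to the nearest training response with an arbitrary 0.2 threshold. Length is used only to flag, never to score, in keeping with the slide’s point that word counts are not scoring features. IEA’s actual detectors are not described in the source.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts (simple letters-and-apostrophes tokenization)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity of two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def advisory_flags(response, training_responses, min_similarity=0.2):
    """Return reasons to send a response to a human scorer (empty list: score it)."""
    words = bag_of_words(response)
    n_words = sum(words.values())
    train_bags = [bag_of_words(t) for t in training_responses]
    train_lengths = [sum(b.values()) for b in train_bags]
    flags = []
    if n_words < min(train_lengths):
        flags.append("too short")
    elif n_words > max(train_lengths):
        flags.append("too long")
    if max(cosine(words, b) for b in train_bags) < min_similarity:
        flags.append("off-topic")
    return flags

training = [
    "Plants use chlorophyll in chloroplasts to capture light energy.",
    "Photosynthesis uses carbon dioxide and water and releases oxygen and glucose.",
]
print(advisory_flags("My favorite football team won the big game last night.", training))
# ['off-topic']
```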
Content-Based Scoring
• Uses Latent Semantic Analysis (LSA) to capture the “meaning” of language
• LSA knows that
  “Surgery is often performed by a team of doctors.”
  “On many occasions, several physicians are involved in an operation.”
  mean about the same thing even though they share no words
• Enables evaluating the content of what is written rather than just matching keywords
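A toy Python sketch of the LSA idea, assuming NumPy is available: build a term-by-document count matrix, keep the top k dimensions of its singular value decomposition, and compare documents by cosine similarity in that reduced space. The seven “documents” are invented to echo the slide’s example; production LSA also applies term weighting (such as log-entropy) and is trained on a large corpus, both omitted here.

```python
import numpy as np

def lsa_document_vectors(docs, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))  # term-by-document matrix
    for j, doc in enumerate(docs):
        for w in doc.split():
            counts[row[w], j] += 1
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one k-dimensional row per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = [
    "surgery team doctors", "operation several physicians",
    "doctors physicians hospital", "surgery operation hospital",
    "stock market prices", "market trading prices", "stock trading volume",
]
vecs = lsa_document_vectors(docs, k=2)
print(round(cosine(vecs[0], vecs[1]), 3))       # 1.0
print(round(abs(cosine(vecs[0], vecs[4])), 3))  # 0.0
```

On raw word counts the first two documents have cosine 0 because they share no words. After reduction to k=2, each topic cluster collapses onto one latent dimension, so the two are linked through the words they co-occur with. A larger k keeps more distinctions between documents.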
Spoken Assessments
[Figure: speech analysis showing waveform, spectrum, words and segmentation]
Example: Native Speaker
REPEAT: “New York City is famous for its ethnic diversity.”
Pronunciation: 8.7
Fluency: 8.1
Accuracy: 0 word errors
Example: Learner
REPEAT: “New York City is famous for its ethnic diversity.”
Pronunciation: 5.9
Fluency: 3.3
Accuracy: 1 word error (insertion)
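The accuracy line counts word errors, which come in three kinds: substitutions, deletions and insertions. A standard way to count them is to align the recognized words with the prompt using a word-level edit distance. The Python sketch below assumes that approach and a simple lowercase, letters-only normalization; the learner transcript in the usage line is invented, since the slide does not say which word was inserted.

```python
import re

def word_errors(reference: str, hypothesis: str) -> dict:
    """Count substitutions, deletions and insertions via word-level edit distance."""
    ref = re.findall(r"[a-z']+", reference.lower())
    hyp = re.findall(r"[a-z']+", hypothesis.lower())
    n, m = len(ref), len(hyp)
    # cost[i][j] = fewest edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Walk back through the table to classify each edit.
    counts = {"substitutions": 0, "deletions": 0, "insertions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                counts["substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deletions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1
    return counts

prompt = "New York City is famous for its ethnic diversity."
print(word_errors(prompt, "New York City is is famous for its ethnic diversity"))
# {'substitutions': 0, 'deletions': 0, 'insertions': 1}
```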
Performance Comparison: Pronunciation, Fluency, Accuracy
[Figure: native speaker, 3.026 seconds; learner, 5.502 seconds]
Keys to Success
Design for automated scoring from the START!
Keys to Success
• Item Development
  • Clear specification of performance, skills, and assessment
  • Optimize for scoring effectiveness
• Item Delivery
  • Input and capture of student responses
• Field Test and Human Scoring
  • Representative samples
  • Double scoring with resolution
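“Double scoring with resolution” typically means each field-test response is scored independently by two readers, and a third reader resolves cases where they disagree by too much. The slide gives no rule, so this Python sketch assumes a hypothetical policy: scores at most one point apart are averaged; otherwise a resolution score is averaged with whichever original score it is closer to.

```python
from typing import Optional

def resolve_score(first: int, second: int, third: Optional[int] = None, max_gap: int = 1) -> float:
    """Combine two independent human reads under a hypothetical resolution policy."""
    if abs(first - second) <= max_gap:
        return (first + second) / 2  # adjacent agreement: average the two reads
    if third is None:
        raise ValueError("reads differ by more than max_gap; a resolution read is required")
    # Pair the resolution read with whichever original read it is closer to
    # (ties go to the first read).
    closer = min((first, second), key=lambda s: abs(s - third))
    return (closer + third) / 2

print(resolve_score(3, 4))     # 3.5
print(resolve_score(2, 5, 4))  # 4.5
```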
Keys to Success
• Psychometrics
  • Automated scoring performance as part of field-test item evaluation
• Operational Scoring & Monitoring
  • Requirements vary with the nature of the assessment and the acceptable performance criteria
  • Automated scoring in combination with human scoring
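Evaluating automated scoring during field testing, and monitoring it in operation, means comparing machine scores with human scores on the same responses. The slide does not name a statistic; quadratic weighted kappa is one widely used agreement measure for this, and the Python sketch below computes it for integer scores on an assumed score range.

```python
def quadratic_weighted_kappa(a, b, min_score, max_score):
    """Agreement between two lists of integer scores; 1.0 means perfect agreement."""
    k = max_score - min_score + 1
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - min_score][y - min_score] += 1
    n = len(a)
    hist_a = [sum(row) for row in observed]                             # rater A marginals
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B marginals
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # larger disagreements cost more
            expected = hist_a[i] * hist_b[j] / n  # counts expected by chance
            num += weight * observed[i][j]
            den += weight * expected
    return 1.0 - num / den

human = [1, 2, 3]
machine = [1, 2, 2]
print(round(quadratic_weighted_kappa(human, machine, 1, 3), 3))  # 0.667
```

One common check is to compare the machine-human value with the agreement between two human readers on the same responses.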