
Automatic Assessment of Spoken Modern Standard Arabic

NAACL, Boulder, Colorado, 5 June 2009. Pearson Knowledge Technologies, Palo Alto, California. Jian Cheng, Jared Bernstein, Ulrike Pado, Masa Suzuki.


Presentation Transcript


  1. Automatic Assessment of Spoken Modern Standard Arabic • NAACL, Boulder, Colorado, 5 June 2009 • Pearson Knowledge Technologies, Palo Alto, California • Jian Cheng, Jared Bernstein, Ulrike Pado, Masa Suzuki

  2. Outline • Pearson Knowledge Technologies • How Versant tests operate • Versant Arabic Test (development) • Validation evidence • Predictive accuracy

  3. Pearson Knowledge Tech. (PKT) • KAT + Ordinate are now PKT • KAT ≈ {LSA, Essay Scoring, Write-to-Learn, PTE, etc.} • Ordinate ≈ {Versant, ORF for NCES, VersaReader, PTE, etc.} • PKT is part of Pearson • Pearson ≈ {FT, Economist, Penguin, Longman, PsychCorp, … etc.} • PearsonKT is in Boulder, Colorado and Palo Alto, California.

  4. Test delivery (diagram) • Labels: Scoring system; Database (tests, prompts, responses); Delivery Interface; Communication Network; speech; report • Languages: English, Arabic, Dutch, Spanish • Locations: California, Anywhere

  5. How Versant tests operate • Example utterance: “The train’s been delayed by one hour” • Diagram labels: Test Delivery Server, Versant Database, Scoring

  6. Versant Arabic Test • DLI purpose • ~1000 students at DLI need predictive speaking tests • Requirements • Accurate test of Arabic listening & speaking • Convenient to use at DLI and worldwide (ILR is costly) • Suitable for repeated formative testing • High peak capacity for mass screening

  7. Construct Comparison • OPI Construct: Oral proficiency, as manifest in an Oral Proficiency Interview, is compatible with communicative competence as reflected in the functional level and/or complexity of content accurately produced. • Versant Construct: Facility in spoken language: the ability to understand spoken language and speak appropriately in response at a conversational pace on everyday topics.

  8. Versant Arabic Test: Test Structure • Part A: Reading • Part B: Repeat 1 • Part C: Short Answers • Part D: Sentence Builds • Part E: Repeat 2 • Part F: Passage Retelling

  9. Scoring (diagram) • Tasks: Read, Repeat Sentence 1, SAQ, Sentence Build, Repeat Sentence 2, Passage • Subscores: Fluency, Sentence Mastery, Vocabulary, Pronunciation • Weights shown: 20%, 30%, 30%, 20% • Scoring paths: Human Scoring, Versant Scoring
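The weighting on this slide can be sketched as a simple weighted sum of the four subscores. The transcript does not preserve which weight belongs to which subscore, so the pairing below is an assumption, as are the function name `overall_score`, the dictionary keys, and the example values.

```python
# ASSUMPTION: the slide lists weights 20/30/30/20 and four subscore names,
# but the transcript does not say which weight goes with which subscore.
# The pairing below is illustrative only.
WEIGHTS = {
    "sentence_mastery": 0.30,
    "vocabulary": 0.20,
    "fluency": 0.30,
    "pronunciation": 0.20,
}


def overall_score(subscores):
    """Combine four subscores into one overall score by weighted sum."""
    missing = set(WEIGHTS) - set(subscores)
    if missing:
        raise ValueError(f"missing subscores: {sorted(missing)}")
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)


# Hypothetical subscores for one test taker (not data from the talk).
example = {
    "sentence_mastery": 60.0,
    "vocabulary": 50.0,
    "fluency": 40.0,
    "pronunciation": 50.0,
}
print(overall_score(example))
```

Because the weights sum to 1, the overall score stays on the same scale as the subscores.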

  10. How Versants are developed (1) (diagram) • Labels: Test Spec; Native Test Developers; Item Text; Recorded Items; Arabic Natives; Arabic Learners; Native Scribes (transcripts); Native Judges (scale scores); Scale Estimates; Ordinate System; Versant Scores; Concurrent ILR Interviews; ILR Scores • Validation (Versant Arabic Test): Internal criteria, External criteria

  11. Arabic Challenges: Voweling • kutubu al-waladi: the books of the boy • kataba al-waladu: wrote the boy (subject), i.e. "the boy wrote" • No disambiguating short vowels are written • Vowels carry phonetic information • Vowels carry grammar information
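The voweling problem on this slide can be illustrated with a minimal sketch: once the short-vowel diacritics are removed, the fully vowelled forms of kutubu ("books") and kataba ("wrote") collapse to the same written string. The function name `strip_short_vowels` is hypothetical, and the Unicode range used (U+064B to U+0652) is an assumption about which marks to treat as diacritics.

```python
# Arabic short-vowel and related marks (fathatan .. sukun), U+064B to U+0652.
# ASSUMPTION: this range stands in for "the vowels that are not written".
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_short_vowels(text):
    """Return the text as normally written, without short-vowel marks."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


# kaf + damma, ta + damma, ba + damma  ->  kutubu ("books")
kutubu = "\u0643\u064F\u062A\u064F\u0628\u064F"
# kaf + fatha, ta + fatha, ba + fatha  ->  kataba ("wrote")
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"

print(kutubu != kataba)                                          # distinct when vowelled
print(strip_short_vowels(kutubu) == strip_short_vowels(kataba))  # identical when unvowelled
```

A recognizer or scorer working from ordinary unvowelled text therefore has to recover both the pronunciation and the grammatical role from context.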

  12. Complex Morphology • li + ziyaarat + naa: for + visit + of us, i.e. "for our visit" • Complicates lexicon lookup, frequency estimates… • “Short” Arabic items are harder than English items with the same number of words
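To show why a single written word can hide several lexical units, here is a toy clitic splitter over transliterated text. It is only a sketch: the clitic lists, the function name `segment`, and the token "liziyaaratnaa" (the slide's three pieces concatenated, ignoring any case vowel) are assumptions, not the system described in the talk.

```python
# ASSUMPTION: tiny illustrative clitic inventories in transliteration.
PROCLITICS = ("li",)   # "for"
ENCLITICS = ("naa",)   # "our / of us"


def segment(token):
    """Split one transliterated token into [proclitic, stem, enclitic] parts."""
    prefix = next(
        (p for p in PROCLITICS if token.startswith(p) and len(token) > len(p)),
        "",
    )
    rest = token[len(prefix):]
    suffix = next(
        (s for s in ENCLITICS if rest.endswith(s) and len(rest) > len(s)),
        "",
    )
    stem = rest[: len(rest) - len(suffix)] if suffix else rest
    return [part for part in (prefix, stem, suffix) if part]


print(segment("liziyaaratnaa"))  # one token, three lexical units
print(segment("lisaan"))         # "tongue": naive stripping over-segments it
```

The second call shows the other half of the difficulty: naive affix stripping also splits words that merely begin with the same letters, which is one reason lexicon lookup and frequency estimates are harder than for English.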

  13. Development & Run-time Processes Compilation of expectation and runtime flow

  14. Training data sources Prompt Voices and Training Samples

  15. Validation Criteria • Reliability: Scores are consistent • Validity: Native and non-native speakers should be clearly distinct • MSA and dialect speakers should be distinct (since we’re testing MSA) • Machine scores should predict human scores

  16. Reliability

  17. Native ~ Non-Native Scores

  18. Natives by Countries

  19. Educated ~ Uneducated Speakers (plot: cumulative density vs. Arabic Overall Score)

  20. Machine – Human Comparison

  21. How Versants Compare to OPIs (scatter plot: ILR OPI Score in logits vs. Versant Arabic Overall Score; N = 118, r = 0.87)

  22. Spanish & English: Versant ~ Human • Spanish: N = 37, r = 0.92 (plot: ILR OPI Score in logits vs. Versant Spanish Score) • English: N = 151, r = 0.86
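The r values on these comparison slides are correlation coefficients between machine and human scores. Assuming they are Pearson correlations (the slides do not say which coefficient was used), the computation can be sketched as below. The function name `pearson_r` and the paired scores are hypothetical and are not data from the talk.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired scores for five test takers (illustration only).
machine_scores = [32.0, 41.0, 55.0, 63.0, 78.0]
human_scores = [-1.2, -0.4, 0.1, 0.9, 1.6]
print(round(pearson_r(machine_scores, human_scores), 2))
```

The two scales need not match (here a score scale against logits), because the coefficient is unchanged by linear rescaling of either variable.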

  23. Summary • Versant Arabic Test (VAT) is in operation • Based on a large and wide body of transcribed spoken material • VAT is available on demand • Returns consistent, accurate scores that reflect real-time skills with MSA • VAT can triage or screen for OPI tests

  24. النهاية (The End) • Thanks to Waheed Samy, Naima Bousofara Omar, Eli Andrews, Mohamed Al-Saffar, Nazir Kikhia, Rula Kikhia, and Linda Istanbulli for item development and data collection/transcription in Arabic, and to Andy Freeman for providing diacritic markings.
