Children’s Oral Reading Corpus (CHOREC): Description & Assessment of Annotator Agreement • L. Cleuren, J. Duchateau, P. Ghesquière, H. Van hamme • The SPACE project
Presentation Overview • The SPACE project • Development of a reading tutor • Development of CHOREC • Annotation procedure • Annotation agreement • Conclusions
1. The SPACE project • SPACE = SPeech Algorithms for Clinical & Educational applications • http://www.esat.kuleuven.be/psi/spraak/projects/SPACE • Main goals: • Demonstrate the benefits of speech-technology-based tools for: • An automated reading tutor • A recognizer for pathological speech (e.g. dysarthria) • Improve automatic speech recognition and speech synthesis so that they can be used in these tools
2. Development of a reading tutor • Main goals: • Computerized assessment of word decoding skills • Computerized training for slow and/or inaccurate readers • Accurate speech recognition is needed to reliably detect reading errors
3. Development of CHOREC • To improve the recognizer’s reading error detection, CHOREC is being developed: the Children’s Oral Reading Corpus, a Dutch database of recorded, transcribed, and annotated children’s oral readings • Participants: • 400 Dutch-speaking children • 6-12 years old • without (n = 274, regular schools) or with (n = 126, special schools) reading difficulties
3. Development of CHOREC (b) • Reading material: • existing REAL WORDS • non-existing but pronounceable words (i.e. PSEUDOWORDS) • STORIES • Recordings: • 22050 Hz, 2 microphones • 42 GB or 130 hours of speech
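As a rough consistency check (a back-of-the-envelope sketch, assuming 16-bit linear PCM, which the slide does not state), the quoted duration and size match:

```python
# Back-of-the-envelope storage estimate for the CHOREC recordings.
# Assumption (not on the slide): 16-bit linear PCM samples.
hours = 130
sample_rate = 22050      # Hz
microphones = 2          # two parallel recordings per session
bytes_per_sample = 2     # 16-bit PCM (assumed)

total_bytes = hours * 3600 * sample_rate * microphones * bytes_per_sample
print(f"{total_bytes / 1e9:.1f} GB")  # ~41.3 GB, close to the quoted 42 GB
```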
4. Annotation procedure • Segmentations, transcriptions and annotations by means of PRAAT (http://www.Praat.org)
4. Annotation procedure (b) • Pass 1 → p-files: • Orthographic transcription • Broad-phonetic transcription • Utterances made by the examiner • Background noise • Pass 2 → f-files (only for words that contain reading errors or hesitations): • Reading strategy labeling • Reading error labeling • (A sketch of this two-pass tier layout follows below.)
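The sketch below makes the two-pass layout concrete. It is only an illustration: the tier names, labels, and data structures are assumptions, and CHOREC itself stores these annotations as Praat TextGrid tiers rather than Python objects. The point is simply that an f-file is a p-file extended with reading-strategy and reading-error tiers.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Interval:
    """One labelled stretch of time on an annotation tier (as in a Praat TextGrid)."""
    start: float   # seconds
    end: float     # seconds
    label: str

@dataclass
class Tier:
    name: str
    intervals: List[Interval] = field(default_factory=list)

@dataclass
class Annotation:
    """A pass-1 p-file or pass-2 f-file for one recording."""
    tiers: List[Tier]

def make_f_file(p_file: Annotation, strategy: Tier, errors: Tier) -> Annotation:
    """Pass 2 extends a pass-1 annotation with reading-strategy and
    reading-error tiers (only needed for words with errors or hesitations)."""
    return Annotation(tiers=p_file.tiers + [strategy, errors])

# Hypothetical pass-1 content for a short utterance (labels are invented).
p01 = Annotation(tiers=[
    Tier("orthographic",     [Interval(0.0, 0.4, "als"), Interval(0.4, 0.8, "zoekt")]),
    Tier("broad-phonetic",   [Interval(0.0, 0.4, "als"), Interval(0.4, 0.8, "zukt")]),
    Tier("examiner",         []),
    Tier("background-noise", []),
])

f01 = make_f_file(
    p01,
    strategy=Tier("reading-strategy", [Interval(0.0, 0.4, "sounding-out")]),
    errors=Tier("reading-error",      [Interval(0.0, 0.4, "substitution")]),
)
print([t.name for t in f01.tiers])
```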
4. Annotation procedure (c) Expected: Els zoekt haar schoen onder het bed. [Els looks for her shoe under the bed.] Observed: Als (says ‘something’) zoekt haar sch…schoen onder bed. [Als (says ‘something’) looks for her sh…shoe under bed.]
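To show how such an observed reading relates to word-level error labels, here is a minimal word-alignment sketch. It is not the CHOREC labelling scheme: the restart "sch…schoen" is simply treated as an extra token and the hesitation ("says ‘something’") is omitted. The alignment only illustrates how substitutions (Els → Als), insertions (sch), and deletions (het) can be recovered by comparing the target and observed word sequences.

```python
def align(target, observed):
    """Word-level Levenshtein alignment of the target text with an observed
    reading, returning (operation, target_word, observed_word) triples."""
    n, m = len(target), len(observed)
    # cost[i][j] = minimal edit cost to align target[:i] with observed[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (target[i - 1] != observed[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back through the cost matrix to recover the alignment.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (target[i - 1] != observed[j - 1])):
            label = "correct" if target[i - 1] == observed[j - 1] else "substitution"
            ops.append((label, target[i - 1], observed[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("deletion", target[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, observed[j - 1]))
            j -= 1
    return list(reversed(ops))

if __name__ == "__main__":
    expected = "Els zoekt haar schoen onder het bed".split()
    observed = "Als zoekt haar sch schoen onder bed".split()
    for op in align(expected, observed):
        print(op)
```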
5. Annotation agreement • Quality of annotations relies heavily on various annotator characteristics (e.g. motivation) and external influences (e.g. time pressure). • Analysis of inter- and intra-annotator agreement to measure quality of annotations • INTER: triple p-annotations by 3 different annotators for 30% of the corpus (p01, p02, p03) • INTRA: double f-annotations by the same annotator for 10% of the corpus (f01, f01b, f02)
5. Annotation agreement (b) • Remark about the double f-annotations: • f01 = p01 + reading strategy & error tiers • f01b = f01 with the reading strategy & error tiers removed and then re-annotated • f02 = p02 + reading strategy & error tiers • Agreement metrics: • Percentage agreement + 95% CI • Kappa statistic + 95% CI • (A minimal computation sketch follows below.)
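The sketch below shows how these two metrics could be computed for a pair of annotation tiers aligned word by word. The labels are hypothetical and the percentile bootstrap is only one way to obtain a 95% CI; the slides do not specify the exact procedure used for CHOREC.

```python
import random
from collections import Counter

def percent_agreement(a, b):
    """Fraction of aligned units on which two annotations carry the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two annotations."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Agreement expected by chance, given each annotator's label frequencies.
    p_e = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    if p_e == 1:
        return 1.0  # both annotations consist of one identical label
    return (p_o - p_e) / (1 - p_e)

def bootstrap_ci(a, b, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI over the aligned annotation units."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

if __name__ == "__main__":
    # Hypothetical per-word labels from two annotators.
    ann1 = ["ok", "ok", "sub", "ok", "hes", "ok", "ok", "sub", "ok", "ok"]
    ann2 = ["ok", "ok", "sub", "ok", "ok",  "ok", "ok", "sub", "ok", "ok"]
    print("% agreement:", percent_agreement(ann1, ann2),
          bootstrap_ci(ann1, ann2, percent_agreement))
    print("kappa:", round(cohen_kappa(ann1, ann2), 3),
          bootstrap_ci(ann1, ann2, cohen_kappa))
```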
5. Annotation agreement (c) • Overall high agreement! κ: 0.717–0.966; %: 86.37–98.64 • (OT = orthographic transcription, PT = (broad-)phonetic transcription, RSL = reading strategy labeling, REL = reading error labeling) • INTER: • κ: PT > RED * • %: RED > OT > PT * • INTRA: • κ: RSL > REL *; (1) > (2) * • %: RSL > REL * for (1), RSL < REL * for (2); (1) > (2) * • * p < .05
5. Annotation agreement (d) • Overall high agreement! κ: 0.706–0.971; %: 82.18–98.72 • When looking at % agreement scores: • regular > special * (except for the f01-f01b comparison) • However, when looking at kappa values, there are no systematic or significant differences: • RED: regular < special * • PT: regular > special * • RSL (2): regular > special * • * p < .05
5. Annotation agreement (e) • Overall substantial agreement! κ: 0.575–0.966; %: 68.45–99.32 • (S = stories, RW = real words, PW = pseudowords) • When looking at % agreement scores: • S > RW > PW * • However, when looking at kappa values: • Best agreement is always found for S (except for RSL: no significant difference, or RW > S * in case of (2)) • No systematic or significant differences between RW and PW: • RED: RW < PW * • PT: RW > PW * • RSL: RW = PW • REL: RW > PW (1) or RW = PW (2) • * p < .05
5. Annotation agreement (f) • Remarkable finding: systematic differences in % agreement disappear when looking at kappa values! • Explanation: the differences go hand in hand with differences in the number of reading errors made: • children from special schools make more errors than children from regular schools • pseudowords are harder to read than real words, which are in turn harder to read than words embedded in a text • When errors are rare, the dominant ‘correct’ label already yields high agreement by chance; percentage agreement does not correct for this, whereas kappa does (see the illustration below) → Kappa is better suited to assess annotation quality
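A small invented example illustrates the explanation above: when errors are rare, most of the raw agreement is already expected by chance, so percentage agreement is inflated for the low-error condition while kappa is not. The numbers are made up for illustration only.

```python
from collections import Counter

def percent_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Few reading errors (cf. regular schools, words in stories):
# annotator 1 flags 5 errors in 100 words; annotator 2 misses 2 of them.
skew1 = ["correct"] * 95 + ["error"] * 5
skew2 = ["correct"] * 95 + ["error", "correct", "error", "correct", "error"]

# Many reading errors (cf. special schools, pseudowords):
# raw agreement is lower, but errors are no longer rare.
bal1 = ["correct"] * 50 + ["error"] * 50
bal2 = ["correct"] * 48 + ["error"] * 2 + ["correct"] * 3 + ["error"] * 47

print(percent_agreement(skew1, skew2), round(cohen_kappa(skew1, skew2), 2))  # 0.98 0.74
print(percent_agreement(bal1, bal2), round(cohen_kappa(bal1, bal2), 2))      # 0.95 0.9
```

Raw agreement favours the low-error condition (0.98 vs 0.95), but kappa reverses the ordering (about 0.74 vs 0.90) because it discounts the agreement expected by chance under the skewed label distribution.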
6. Conclusions • The SPACE project • SPeech Algorithms for Clinical and Educational applications • http://www.esat.kuleuven.be/psi/spraak/projects/SPACE • CHOREC • Dutch database of recorded, transcribed, and annotated children’s oral readings • Assessment of annotator agreement • High overall agreement → reliable annotations • Kappa better suited to assess annotation quality