Measuring Language Development in Children: A Case Study of Grammar Checking in Child Language Transcripts
The 6th Workshop on Innovative Use of NLP for Building Educational Applications

Khairun-nisa Hassanali and Yang Liu
{nisa, yangl}@hlt.utdallas.edu
The University of Texas at Dallas

1. Summary
• Grammatical errors are analyzed in child language transcripts
• Focus on automatic detection of 6 types of grammatical errors using rule-based and statistical systems
• The statistical systems outperform the rule-based systems in most cases

2. The Larger Problem
• Measuring language development in children
• Measures such as the Index of Productive Syntax measure language competence but don’t take into account a child’s grammar deficiencies
• Automatic grammar checking will allow clinicians to analyze a child’s grammar deficiencies in addition to competence
• Given a child language transcript, answer the following question: Does the child make more grammatical mistakes than an average Typically Developing (TD) child?

3. Grammatical Errors
• Used the Paradise Data Set
• 677 transcripts (623 TD children, 54 Language Impaired (LI) children)
• 108,711 utterances, 394,290 words with a mean length of utterance of 3.64
• Annotated transcripts for 10 types of grammatical errors
• Found that more LI children than TD children made a given grammatical mistake at least once

4. Automatic Grammar Checking
• Created rule-based and statistical systems, using 2 sets of features, to detect the following 6 types of errors: misuse of the -ing participle, missing copulae, subject-auxiliary agreement, missing verb, wrong verb usage, and missing infinitive marker “to”
• Focused on verb-related errors, since LI children have more problems with verb usage than TD children
• Constructed one rule-based classifier, one alternating decision tree classifier, and one naïve Bayes classifier for each error category
• Rule-based classifiers were constructed using regular expressions based on parse tree structure
• Alternating decision tree classifiers used rules as features
• Naïve Bayes classifiers used a variety of other features, such as bigrams, skip bigrams, and other syntactic features, depending on the error category
• Serially applied all the classifiers to detect grammatical errors

5. Experimental Results
• Performed 10-fold cross-validation using the naïve Bayes and alternating decision tree classifiers from WEKA
• Used the alternating decision tree classifier from the WEKA toolkit with rules as features

6. Conclusion
• Automatically detected grammatical errors in child language transcripts
• In all cases, recall was higher than 84%
• Classifiers that used features other than rules performed best, with an F1-measure of 0.967
• LI children made more grammatical mistakes than TD children in most error categories

7. Future Work
• Use the grammatical errors as features for detecting language impairment
• Enhance the system to detect other grammatical errors, such as missing articles
• Create a language development score that takes into account the grammatical errors made by a child
• Take into account dialect-specific errors for grammar checking
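
The poster states that the rule-based classifiers were built from regular expressions over parse tree structure, but it does not list the rules themselves. As an illustration only, the sketch below shows what one such rule might look like for the missing infinitive marker “to”. The pattern, the verb list, the function name `has_missing_to`, and the assumption of Penn Treebank style bracketed parses are all hypothetical and are not taken from the authors' system.

```python
import re

# Hypothetical rule (not from the poster): a verb such as "want" that is
# immediately followed by a bare base-form verb phrase, i.e. "(VP (VB ..."
# with no "(TO to)" node in between. Assumes Penn Treebank style
# bracketed parses written on a single line.
MISSING_TO = re.compile(
    r"\(VB[DPZ]? (?:want|wants|wanted|need|needs|needed|try|tries|tried)\)"
    r"\s*\(VP\s*\(VB "
)


def has_missing_to(parse: str) -> bool:
    """Return True if the bracketed parse matches the missing-'to' pattern."""
    return MISSING_TO.search(parse) is not None


# "I want go home" (infinitive marker missing)
flagged = "(S (NP (PRP I)) (VP (VBP want) (VP (VB go) (NP (NN home)))))"
# "I want to go home" (well formed)
clean = "(S (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB go) (NP (NN home)))))))"

print(has_missing_to(flagged), has_missing_to(clean))
```

A real rule set would need to cope with parser errors on child speech, which is one plausible reason the poster also feeds rules into a statistical classifier as features.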
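
The naïve Bayes classifiers are described as using bigrams and skip bigrams among their features. The poster does not define the skip window, so the sketch below assumes one common definition: ordered token pairs with at most `max_skip` intervening tokens, adjacent pairs included. The function names and the default window are illustrative assumptions.

```python
def bigrams(tokens):
    """Adjacent token pairs, in order."""
    return list(zip(tokens, tokens[1:]))


def skip_bigrams(tokens, max_skip=2):
    """Ordered token pairs with at most `max_skip` tokens between them.

    Assumed definition (the poster does not specify one); with
    max_skip=0 this reduces to ordinary bigrams.
    """
    pairs = []
    for i, left in enumerate(tokens):
        # j ranges over positions up to max_skip + 1 places to the right of i
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


utterance = ["he", "going", "home"]
print(bigrams(utterance))
print(skip_bigrams(utterance, max_skip=1))
```

Skip bigrams let a feature such as ("he", "home") survive even when the token between them varies, which may help on short, sparse child utterances.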
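
The results are reported as recall and F1-measure. For readers less familiar with these metrics, the sketch below computes precision, recall, and F1 from binary gold and predicted labels (1 = utterance contains the error). The function name and the toy labels are illustrative; they are not data from the poster.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall, and F1 for binary labels (truthy = error present)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # errors correctly flagged
    fp = sum(1 for g, p in pairs if not g and p)    # false alarms
    fn = sum(1 for g, p in pairs if g and not p)    # missed errors
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels for five utterances, for illustration only.
gold = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
print(precision_recall_f1(gold, predicted))
```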