Detecting Missing Hyphens in Learner Text

Aoife Cahill, Susanne Wolff, Nitin Madnani (Educational Testing Service); Martin Chodorow (Hunter College and the Graduate Center); ACL 2013

Outline • Introduction • Baselines • System Description • Evaluation • Conclusions

Introduction Missing Hyphens: • (1) Schools may have more after school sports. • (2) I went to the dentist after school today. • (3) My father like play basketball with me.


Baselines • Hyphenated form is listed in the Collins Dictionary • Hyphenated form occurs more than 1,000 times in Wikipedia • Probability of the hyphenated form, as estimated from Wikipedia, is greater than 0.66
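The three baselines can be read as context-free lookups on a candidate word pair. The sketch below is one possible reading, not the authors' implementation: the dictionary entries and counts are made-up toy values, the helper names are hypothetical, and estimating the probability as a relative frequency (hyphenated count over hyphenated plus open-form count) is an assumption.

```python
# Toy resources. All entries and counts are hypothetical, not from the slides.
DICTIONARY = {"well-known", "after-school"}  # stand-in for the Collins Dictionary
WIKI_COUNTS = {
    # hyphenated form: (count of hyphenated form, count of open form)
    "after-school": (1500, 4000),
    "make-up": (3000, 500),
    "high-school": (200, 9000),
}


def in_dictionary(w1, w2):
    """Baseline 1: the hyphenated form is a dictionary entry."""
    return f"{w1}-{w2}" in DICTIONARY


def frequent_in_wikipedia(w1, w2, min_count=1000):
    """Baseline 2: the hyphenated form occurs more than min_count times."""
    hyphenated, _ = WIKI_COUNTS.get(f"{w1}-{w2}", (0, 0))
    return hyphenated > min_count


def probably_hyphenated(w1, w2, threshold=0.66):
    """Baseline 3: relative frequency of the hyphenated form exceeds threshold."""
    hyphenated, open_form = WIKI_COUNTS.get(f"{w1}-{w2}", (0, 0))
    total = hyphenated + open_form
    return total > 0 and hyphenated / total > threshold
```

None of these functions looks at the surrounding sentence, so each gives the same answer for "after school" in both example sentences from the introduction.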

System Description Learner text: Schools may have more after school sports.

System Description Model: Logistic regression model Probability: Predict a missing hyphen error only when the probability of the prediction is > 0.99
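The thresholding step can be sketched as follows. This is a minimal illustration, assuming binary features and hand-set weights; the feature names, weights, and bias are hypothetical, since the slides do not list the model's features.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_missing_hyphen(features, weights, bias, threshold=0.99):
    """Return (flag, probability) for one candidate word pair.

    The pair is flagged only when the probability exceeds the threshold.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    p = sigmoid(z)
    return p > threshold, p


# Hypothetical weights, for illustration only.
WEIGHTS = {"next_word_is_noun": 4.0, "prev_word=more": 2.5}
BIAS = -1.0

strong = {"next_word_is_noun": 1, "prev_word=more": 1}  # score 5.5
weak = {"next_word_is_noun": 1}                         # score 3.0
```

With these toy values the `strong` context is flagged and the `weak` one is not, even though its probability is above 0.95. A threshold of 0.99 trades recall for precision.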

System Description SJM-trained: - San Jose Mercury News corpus - For training, hyphenated words are automatically split (e.g. well-known becomes well known) - The training data contains 1% of the positive examples and 3% of the negative examples
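Splitting hyphenated words to obtain labelled training pairs might look like the sketch below. It assumes whitespace-tokenized input and handles only two-part alphabetic hyphenated words; the function names are hypothetical, and the sampling of positive and negative examples is not shown.

```python
import re

# Matches two alphabetic parts joined by a single hyphen, e.g. "well-known".
HYPHENATED = re.compile(r"^[A-Za-z]+-[A-Za-z]+$")


def split_hyphens(tokens):
    """Return (split_tokens, positive_positions).

    positive_positions holds each index i such that the pair
    (split_tokens[i], split_tokens[i + 1]) was hyphenated in the source.
    """
    out, positives = [], set()
    for tok in tokens:
        if HYPHENATED.match(tok):
            left, right = tok.split("-")
            positives.add(len(out))
            out.extend([left, right])
        else:
            out.append(tok)
    return out, positives


def labelled_pairs(tokens):
    """Label every adjacent pair: True if a hyphen was removed between them."""
    out, positives = split_hyphens(tokens)
    return [((out[i], out[i + 1]), i in positives) for i in range(len(out) - 1)]
```

For the tokens of "a well-known author spoke", only the pair ("well", "known") is labelled positive; every other adjacent pair is a negative example.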

System Description Negative examples selected: Only contexts that occur more than 20 times are selected during training.
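A frequency filter of this kind can be sketched with a counter. Representing a context as a word-pair tuple is an assumption made for illustration; the slides do not define what a context contains, and `select_negatives` is a hypothetical name.

```python
from collections import Counter


def select_negatives(negative_contexts, min_count=20):
    """Keep only negative contexts that occur more than min_count times."""
    counts = Counter(negative_contexts)
    return [ctx for ctx in negative_contexts if counts[ctx] > min_count]
```

A context seen 21 times is kept in full, while one seen exactly 20 times is dropped, because the comparison is strict.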

System Description Wiki-revision-trained: - Wikipedia articles


System Description Combined: - Combine both data sources

Evaluation • Artificial Data: - Brown corpus - 24,243 sentences - 2,072 hyphenated words


Evaluation Evaluation 1 • Learner Text: - CLC-FCE - The corpus contains 1,244 exam scripts - A total of 173 instances of missing hyphen errors


Evaluation Inspection of the 131 true positives for the learner data reveals that 87 of these are cases of a single type, the word “make-up”.
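To make the skew concrete, the share of true positives accounted for by this single type follows directly from the two counts on the slide. The variable names below are illustrative only.

```python
true_positives = 131  # true positives on the learner data (from the slide)
make_up_cases = 87    # of which instances of "make-up" (from the slide)

share = make_up_cases / true_positives    # fraction due to one word type
other = true_positives - make_up_cases    # true positives of all other types

print(f"{share:.1%} of true positives are 'make-up'; {other} are other types")
```

About two thirds of the true positives come from one word, so performance on this data set depends heavily on how a system handles that single type.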

Evaluation Evaluation 2 • Learner Text: - A data set of 1,000 student GRE and TOEFL essays - Drawn from 295 prompts - Ranging in length from 1 to 50 sentences - Average of 378 words per essay

Evaluation Learner Text (Cont.): - Manually inspected a random sample of 100 instances where each system detected a missing hyphen - Two native English speakers judged the instances - Used the Chicago Manual of Style as a guide - High agreement


Conclusions 1) Automatically detecting missing hyphen errors in learner text 2) The classifiers generally performed better than the baseline systems 3) Taking context into account when detecting the errors is important.
