A Report on the First Native Language Identification Shared Task
Joel Tetreault, Nuance Communications
Daniel Blanchard, Educational Testing Service
Aoife Cahill, Educational Testing Service
Native Language Identification • Task of automatically identifying a speaker’s first language based solely on the speaker’s writing in another language • Applications: • Authorship profiling (Estival et al., 2007) • Education: more targeted feedback to language learners (Leacock et al., 2010)
Sample Essay 1 No risk no fun I agree the statement "Successful people try new things and take risk".In my mind it is so, to. When you thing you like do new stuff you need a liddelbit the kick. That is the big point what I need. For exsample I like to go to a big city like New York. I was never in this town I dont no from the city. But I like go to the city. Thats fun I stay every time for proplems. I need eat a hood offer my head. The ather side I can go dow. I dont gat waht I need…Next exsample the wall street you put money in funds, well you this make a good job. Dont for get the risk look like lose money. German
Sample Essay 2 For example, if you take a look at an ordinary school, you have different teachers for every subject. Your calculus teacher is different than your literature teacher. Each teacher must specialize in a specific subject in order to convey suffiecient and proper information to the students. However, that doesn't mean that the teacher is narrow-minded and has a limited perspective in life because to specialize in one subject doesn't hinder you or stop you from exploring other subjects. Arabic
Motivation • Lots of work in NLI, but it has been hard to compare different approaches: • ICLEv2 (Granger et al., 2009): the de facto train/test corpus is small and has NLI-unfriendly idiosyncrasies • No consensus on evaluation: • Which L1s / how many L1s? • Train/test splits? • Best features?
Contributions • Goal: unify the community and help the field progress • Provide a larger, more NLI-friendly corpus that improves upon ICLEv2 • Common evaluation framework: • Everyone evaluates using the same train/dev/test splits and the same L1s • Corpus and scripts to be made public to further promote the field
Outline • Prior Work • Data • Shared Task Overview • Results • NLI Shared Task in the Future
Prior Work • NLI is typically treated as a classification task • Koppel et al. (2005): POS n-grams, content and function words, spelling and grammatical errors • Syntactic features (Wong and Dras, 2011) • Tree Substitution Grammars (Swanson and Charniak, 2012) • Adaptor Grammars (Wong et al., 2012) • Data size effects (Brooke and Hirst, 2012) • Word n-grams (Bykh and Meurers, 2012) • Language models and ensemble classifiers (Tetreault et al., 2012)
Data: TOEFL11 Corpus • 12,100 essays from the ETS Test of English as a Foreign Language (TOEFL) • 11 L1s: • Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean, Spanish, Telugu, Turkish • 900 train / 100 dev / 100 test essays per L1 • Sampled for equal representation of L1s across topics as much as possible • Includes a 3-tier proficiency level • Public release via LDC this summer?
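For orientation, the splits work out to 900/100/100 essays per L1, i.e. 11 × 1,100 = 12,100 essays in total. The snippet below is a minimal sanity-check sketch over a hypothetical metadata file; the file name and column names are assumptions for illustration, not the corpus's actual index format.

```python
# Sanity-check sketch for the per-L1 splits (900 train / 100 dev / 100 test each).
# The index file name and column names here are assumptions, not the corpus's
# actual metadata format.
import csv
from collections import Counter

expected = {"train": 900, "dev": 100, "test": 100}

counts = Counter()
with open("toefl11_index.csv", newline="") as f:   # hypothetical index file
    for row in csv.DictReader(f):                  # assumed columns: L1, split
        counts[(row["L1"], row["split"])] += 1

for (l1, split), n in sorted(counts.items()):
    assert n == expected[split], f"{l1}/{split}: expected {expected[split]}, got {n}"

print("total essays:", sum(counts.values()))       # 11 x (900 + 100 + 100) = 12,100
```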
Shared Task Description: 3 Sub-tasks • Closed-Training: 11-way classification task using only TOEFL11-TRAIN and DEV • Open-Training-1: use of any amount or type of training data excluding TOEFL11 • Open-Training-2: use of any amount or type of training data combined with TOEFL11 * All sub-tasks use TOEFL11-TEST for the final evaluation set
Shared Task Description • Each team was allowed to submit up to 5 different systems per task • Teams submitted a CSV file for each system to the NLI organizers • An evaluation script automatically compares each prediction file to the gold standard and creates a performance report and contingency tables
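The scoring logic is simple enough to sketch. The snippet below is a minimal re-creation of that kind of evaluation, not the organizers' actual script; the file names and the one-prediction-per-row CSV layout are assumptions.

```python
# Minimal sketch of the kind of scoring the shared-task evaluation performs.
# NOTE: file names and the (essay_id, L1) CSV layout are assumptions, not the
# official submission format.
import csv
from collections import Counter, defaultdict

def load_labels(path):
    """Read a CSV of (essay_id, L1) rows into a dict."""
    with open(path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f)}

def score(gold_path, pred_path):
    gold = load_labels(gold_path)
    pred = load_labels(pred_path)
    correct = sum(1 for eid, l1 in gold.items() if pred.get(eid) == l1)
    accuracy = correct / len(gold)
    # Contingency (confusion) table: rows = gold L1, columns = predicted L1.
    confusion = defaultdict(Counter)
    for eid, l1 in gold.items():
        confusion[l1][pred.get(eid, "NONE")] += 1
    return accuracy, confusion

if __name__ == "__main__":
    acc, table = score("toefl11_test_gold.csv", "team_system1_predictions.csv")
    print(f"Accuracy: {acc:.4f}")
```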
Closed Sub-Task • See Table 3 of the report for full results • No statistically significant differences between the top 5 teams
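A common way to compare two classifiers on the same test set is McNemar's test, sketched below as an illustration of this kind of pairwise significance testing; it is not necessarily the exact test used in the report, and the variable names are placeholders.

```python
# Sketch of McNemar's test for comparing two systems' predictions on the same
# test set -- an illustration of pairwise significance testing, not necessarily
# the exact procedure used in the report.
from scipy.stats import chi2

def mcnemar(gold, pred_a, pred_b):
    """gold, pred_a, pred_b: lists of L1 labels aligned by test essay."""
    b = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a == g and p != g)
    c = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a != g and p == g)
    # Continuity-corrected McNemar statistic, chi-square with 1 degree of freedom.
    stat = (abs(b - c) - 1) ** 2 / (b + c) if (b + c) > 0 else 0.0
    return stat, chi2.sf(stat, df=1)

# stat, p = mcnemar(gold_labels, team1_preds, team2_preds)
# A p-value above 0.05 would indicate no significant difference between the two systems.
```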
Open Sub-tasks • Challenge: finding new training data to cover each L1 • Data sources for HIN & TEL: • ICNALE Pakistani essays used for HIN (TUE team) • Bilingual blogs (TOR & TUE teams)
Discussion of Approaches • Machine Learning • SVMs were overwhelmingly the most popular approach • 4 teams also tried ensemble classifiers • String kernels over character-level n-grams (BUC)
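To make the dominant recipe concrete, here is an illustrative linear-SVM baseline over character n-grams using scikit-learn, in the spirit of the n-gram and string-kernel systems; the hyperparameters are placeholders, not any participating team's settings.

```python
# Illustrative linear-SVM baseline over character n-grams; hyperparameters are
# placeholders, not any team's actual configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

nli_baseline = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), sublinear_tf=True),
    LinearSVC(C=1.0),
)

# essays, l1_labels = load_toefl11_train()   # hypothetical loader
# nli_baseline.fit(essays, l1_labels)
# predictions = nli_baseline.predict(test_essays)
```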
Discussion of Approaches • Features • N-grams: word, POS, character, function word • Syntactic features: dependencies, TSGs, CF productions, Adaptor Grammars • Spelling features • 4 of the top 5 teams used n-grams of length at least 4; some went up to 9-grams • 2 of the top 10 teams used syntactic features
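Most systems combined several of these feature families in a single classifier. A minimal sketch of stacking two of them (word and character n-grams) with scikit-learn's FeatureUnion is shown below; POS, syntactic, and spelling features would be added analogously, and the settings are illustrative only.

```python
# Sketch of combining two of the feature families listed above (word n-grams and
# character n-grams) in one classifier. Settings are illustrative, not any
# team's configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

features = FeatureUnion([
    ("word_ngrams", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char_ngrams", TfidfVectorizer(analyzer="char", ngram_range=(2, 5))),
])
model = make_pipeline(features, LinearSVC())
```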
Future of NLI Shared Task • Ideas to expand the scope of the task • Use a new set of TOEFL essays for the test set • Expand genres: blogs? Tweets? • Number of L1s • Try a different L2 • ItaliaNLP – preparing an Italian NLI corpus with CNR Pisa • Also a corpus of learner Finnish with L1 labels (Turku University) • Add Slavic languages • Logistics • Hold another shared task in 2014? Or 2015? • Merge with the PAN Shared Task? • Tell us your thoughts!
Acknowledgments • Derrick Higgins (ETS) • ETS TOEFL • Patrick Houghton (ETS) • BEA8 Organizers • All the NLI Participants!
Questions? nlisharedtask2013@gmail.com http://www.nlisharedtask2013.org/