CAS-IA System Description Jinhua Du CNGL July 23, 2008
Outline • Hardware in IA • Pre-process & Data • MT System Configuration for Evaluation • Achievements • Conclusions
Hardware • Machines • Parallel Computing • Condor • Grid Computing Module developed by ASR group
Pre-process & Data • Pre-processing • encoding conversion & filtering • punctuation and number conversion (full-width -> half-width, etc.) • case conversion (only the first letter of the sentence-initial word), abbreviation processing • Chinese word segmentation (ICT or IA tool), English tokenization • Data for NIST • Parallel: 3.4M (up to 10M if the UN corpus is added) • Monolingual: 3.4M + 9.6M (Gigaword 1 & 2) + 1.4M (Gigaword 3) = 14.4M • Data for IWSLT • Parallel: BTEC (20K or 40K); LDC • Monolingual: BTEC; Gigaword • Data Filter: keep only highly correlated data; very important for the spoken-language evaluation (more and better data, better performance)
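The punctuation/number conversion and case conversion steps above can be sketched as follows. This is a minimal illustration, not the CAS-IA tool itself: the function names `to_half_width` and `lowercase_initial` are hypothetical, and reading "case conversion" as lowercasing the first letter of the sentence is an assumption.

```python
def to_half_width(text: str) -> str:
    """Map full-width ASCII variants (U+FF01..U+FF5E) and the
    ideographic space (U+3000) to their half-width equivalents."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:    # full-width '!' .. '~'
            out.append(chr(code - 0xFEE0))
        else:                             # leave Chinese characters etc. untouched
            out.append(ch)
    return "".join(out)


def lowercase_initial(sentence: str) -> str:
    """Lowercase only the first character of the sentence
    (assumed meaning of the slide's 'case conversion')."""
    return sentence[:1].lower() + sentence[1:]


if __name__ == "__main__":
    print(to_half_width("ＮＩＳＴ　２００８，"))
    print(lowercase_initial("The data is filtered."))
```

Only the full-width ASCII range is mapped here; other conversions hinted at by "etc." on the slide (for example Chinese-specific punctuation) would need their own table.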
System Configuration • Modules • Pre-processing • Alignment Post-processing & Model Generation • Decoding & MER Training • System Combination & Post-processing
Achievements (zh-en) • The 3rd MT Symposium in China (rank 3) • Limited (830K pairs) • Unlimited (3M pairs)
Achievements (zh-en) • NIST MT Eval. 2008
Achievements (zh-en) • IWSLT2008 • More systems to be combined • 2 PB systems developed by CASIA • Moses • SAMT (CMU) • Hierarchical PB • BTG-based system (Xiong) • Better performance
Conclusions • More and better data, better performance • System combination is very helpful for improving performance • Evaluation is different from theoretical research: empirical methods and tricks are usually more effective • For a better rank, prepare in advance and build a temporary team for the evaluation • Evaluation is a horrible thing for students: more time, more energy and no paper (a joke, but true) • Develop systems for application purposes