
CAS-IA System Description

Jinhua Du, CNGL. July 23, 2008.


Presentation Transcript


  1. CAS-IA System Description Jinhua Du CNGL July 23, 2008

  2. Outline • Hardware in IA • Pre-process & Data • MT System Configuration for Evaluation • Achievements • Conclusions

  3. Hardware • Machines • Parallel Computing • Condor • Grid Computing Module developed by ASR group

  4. Pre-process & Data
     • Pre-processing
       • Encoding conversion & filtering
       • Punctuation and number conversion (full-width -> half-width, etc.)
       • Case conversion (only the first letter of the sentence-initial word), abbreviation processing
       • Chinese word segmentation (ICT or IA tool), English tokenization
     • Data for NIST
       • Parallel: 3.4M (up to 10M if the UN corpus is added)
       • Monolingual: 3.4M + 9.6M (Gigaword 1 & 2) + 1.4M (Gigaword 3) = 14.4M
     • Data for IWSLT
       • Parallel: BTEC (20K or 40K); LDC
       • Monolingual: BTEC; Gigaword
     • Data filter: keep only highly correlated data; this is very important for the spoken-language evaluation (the more good data, the better the performance)
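The punctuation/number and case conversion steps on slide 4 can be illustrated with a minimal Python sketch. This is not the CAS-IA tool itself: the function names `to_half_width` and `lowercase_initial` are hypothetical, and the slide does not state the direction of the case conversion, so lowercasing the sentence-initial letter is an assumption here.

```python
def to_half_width(text: str) -> str:
    """Map full-width ASCII variants (U+FF01..U+FF5E) and the
    ideographic space (U+3000) to their half-width equivalents."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:               # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:   # full-width '！' .. '～'
            out.append(chr(code - 0xFEE0))
        else:                            # everything else is left untouched
            out.append(ch)
    return "".join(out)


def lowercase_initial(sentence: str) -> str:
    """Lowercase only the first character of the sentence (assumed direction);
    the rest of the sentence keeps its original casing."""
    return sentence[:1].lower() + sentence[1:]


if __name__ == "__main__":
    print(to_half_width("ＡＢＣ１２３，"))                 # ABC123,
    print(lowercase_initial("The cat sat on Monday ."))  # the cat sat on Monday .
```

Note that CJK-specific punctuation such as "。" (U+3002) lies outside the full-width ASCII range and is left unchanged by this sketch; a real pipeline would need an explicit mapping for such characters.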

  5. System Configuration • Modules • Pre-processing • Alignment post-processing & model generation • Decoding & MER (minimum error rate) training • System combination & post-processing

  6. Achievements (zh-en) • The 3rd MT Symposium in China (rank 3) • Limited (830K pairs) • Unlimited (3M pairs)

  7. Achievements (zh-en) • NIST MT Eval. 2008

  8. Achievements (zh-en)
     • IWSLT 2008
       • More systems to be combined
         • 2 phrase-based (PB) systems developed by CASIA
         • Moses
         • SAMT (CMU)
         • Hierarchical PB
         • BTG-based system (Xiong)
       • Better performance
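The slides do not say how the outputs of these systems were combined. As one simple illustration of the general idea, the sketch below performs sentence-level consensus selection: among the candidate translations of one source sentence, it picks the one most similar on average to the others. The similarity measure (a Dice coefficient over unigram counts) and the names `overlap` and `select_consensus` are assumptions made for this example, not the CAS-IA method.

```python
from collections import Counter


def overlap(a: list[str], b: list[str]) -> float:
    """Dice coefficient over unigram multisets of two token lists."""
    if not a and not b:
        return 1.0
    common = sum((Counter(a) & Counter(b)).values())
    return 2.0 * common / (len(a) + len(b))


def select_consensus(hypotheses: list[str]) -> str:
    """Return the hypothesis with the highest average similarity to the others."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    tokenized = [h.split() for h in hypotheses]

    def score(i: int) -> float:
        others = [overlap(tokenized[i], t)
                  for j, t in enumerate(tokenized) if j != i]
        return sum(others) / len(others) if others else 0.0

    best = max(range(len(hypotheses)), key=score)
    return hypotheses[best]


if __name__ == "__main__":
    # Hypothetical outputs of three systems for the same source sentence.
    outputs = [
        "i would like a room",
        "i want a room",
        "i would like one room please",
    ]
    print(select_consensus(outputs))  # i would like a room
```

Selection of this kind can only return one of the existing outputs; word-level combination schemes, which merge parts of several hypotheses, are more involved and are not shown here.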

  9. Conclusions • Better data, better performance • System combination is very helpful for improving performance • Evaluation is different from theoretical research: empirical methods and tricks are usually more effective • For a better rank, prepare in advance and build a temporary team for the evaluation • Evaluation is a horrible thing for students: more time, more energy and no paper (a joke, but true) • Develop systems for application purposes

  10. Thanks
