Answering Spanish Questions from English Documents
Abdessamad Echihabi, Douglas W. Oard, Daniel Marcu, Ulf Hermjakob
USC Information Sciences Institute
CLEF 2003
Outline
• Development collection
• Choosing a cross-language approach
• TextMap-TMT architecture
• What did we learn?
Cross-Language QA
• Evaluation conditions:
  • English documents (LA Times)
  • 200 questions (we chose Spanish)
  • Exact answers
• ISI development collection:
  • English documents (TREC-2003 QA track)
  • 100 Spanish questions (translated from TREC)
  • Answer patterns
Design Space
• Architecture:
  • Question translation + English QA (see the sketch below)
  • Document translation + Spanish QA
  • Mix of language-specific and translation components
• Translation approaches:
  • Statistical MT, trained on European Parliament proceedings
  • Transfer-based MT (Systran on the Web)
  • Human translation (as an upper bound)
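To make the first architecture concrete, here is a minimal Python sketch of the question-translation route: translate the Spanish question, then answer it with an ordinary English QA step. Both functions (translate_question, answer_in_english) are hypothetical toy stand-ins, not the actual TextMap-TMT components; real statistical MT and rule-based QA are far more sophisticated.

```python
# Sketch of the "question translation + English QA" architecture.
# TOY_LEXICON, translate_question, and answer_in_english are illustrative
# stand-ins, not ISI's actual components.

TOY_LEXICON = {
    "cuándo": "when", "se": "", "convirtió": "became",
    "en": "a", "estado": "state", "alaska": "Alaska",
}

def translate_question(spanish_question: str) -> str:
    """Toy word-for-word gloss standing in for statistical MT."""
    words = spanish_question.strip("¿?").lower().split()
    glossed = [TOY_LEXICON.get(w, w) for w in words]
    return " ".join(w for w in glossed if w)

def answer_in_english(question: str, sentences: list[str]) -> str:
    """Toy monolingual QA: return the sentence with the largest word overlap."""
    q = set(question.lower().split())
    return max(sentences, key=lambda s: len(q & set(s.lower().split())))

corpus = [
    "Alaska became a state on January 3, 1959.",
    "The LA Times is published in Los Angeles.",
]
print(translate_question("¿Cuándo se convirtió Alaska en estado?"))
# -> "when became Alaska a state" (word order is wrong; real MT reorders)
print(answer_in_english("when became Alaska a state", corpus))
# -> "Alaska became a state on January 3, 1959."
```

The sketch makes the key design trade-off visible: translation errors (here, bad word order) propagate directly into the QA step, which is why the choice among question translation, document translation, and mixed architectures matters.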
TextMap-TMT Architecture
[Architecture diagram: question-type labels (DATE, ISLAND, BASEBALL-SPORTS-TEAM); translation rules (e.g., cuánto → whichever → "how many"); reformulated web queries ("Alaska became a state on" OR "Alaska became a state in" OR ...); ranked answer candidates (Alaska (49), state (7), {January 3 1959} OR {1867} OR {1959} (3)); one component marked optional.]
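One idea visible in the diagram is query reformulation: expanding the translated question into several exact phrases likely to appear next to the answer, OR-ed into a single web query. A minimal sketch, assuming a hypothetical generate_reformulations helper (not ISI's actual code):

```python
# Sketch of the query-reformulation idea from the architecture slide:
# build literal answer-bearing phrases and OR them into one web query.

def generate_reformulations(subject: str, verb_phrase: str,
                            prepositions: list[str]) -> list[str]:
    """Build exact phrases such as "Alaska became a state on" / "... in"."""
    return [f'"{subject} {verb_phrase} {prep}"' for prep in prepositions]

phrases = generate_reformulations("Alaska", "became a state", ["on", "in"])
query = " OR ".join(phrases)
print(query)
# "Alaska became a state on" OR "Alaska became a state in"
```

Matching such exact phrases in web text lets the system read off the answer (a DATE candidate like "January 3 1959") from whatever immediately follows the phrase, which is what the candidate counts in the diagram reflect.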
Official Results
[Results chart: accuracy under three conditions — Top-1 validated, Top-1 not validated, Top-3 not validated]
Lessons Learned
• Cross-Language QA is a tractable problem
  • Better than 25% @ top 1, almost 40% @ top 3!
• Our best MT systems are statistical
  • But our best QA systems are heavily rule-based
• Virtually every component needs to be redone
  • As complex as making a new monolingual system
• Strong synergy with CLIR is possible
  • Web search, local collection search