SMS Based FAQ Retrieval Task at FIRE 2011
Danish Contractor, IBM Research India
Ankush Mittal, College of Engineering Roorkee
Deepak P, IBM Research India
L Venkata Subramaniam, IBM Research India

Agenda
• Motivation and Overview
• Dataset
• Participants
• Evaluation and Final Scores
India’s Education Pyramid and Information Access Patterns
• Internet users: 70 million
• Mobile phone users: 800 million
The mobile phone is the preferred information device for Indians.
SMS Based FAQ Retrieval Task
FAQ Database
• Q: Which insurance policies are available for cancer patients
  A: LIC has some insurance policies for cancer patients
• Q: What are the rates for roaming within India
  A: Average Roaming rates on prepaid connections are 60 Paise per minute
SMS Question: wht r d policis avlbl 4 cancar pasaints
SMS Answer: LIC has some insurance policies for cancer patients
The goal is to find the Question Q* in the FAQ database that best matches the SMS S.
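The matching goal on this slide can be illustrated with a minimal sketch. It is not the method of any participating team: it simply scores each FAQ question by how closely its words resemble the SMS tokens, using character-level similarity from Python's standard `difflib`. The function names `token_similarity`, `score`, and `best_match` are hypothetical, and the SMS is written with spaces between its words, which is an assumption about the original slide.

```python
from difflib import SequenceMatcher


def token_similarity(sms_token, faq_token):
    """Character-level similarity in [0, 1] between two tokens."""
    return SequenceMatcher(None, sms_token, faq_token).ratio()


def score(sms, question):
    """Average, over SMS tokens, of the best similarity to any question token.

    Illustrative scoring only (an assumption, not the task's official method).
    """
    sms_tokens = sms.lower().split()
    question_tokens = question.lower().split()
    if not sms_tokens or not question_tokens:
        return 0.0
    total = sum(
        max(token_similarity(s, q) for q in question_tokens)
        for s in sms_tokens
    )
    return total / len(sms_tokens)


def best_match(sms, faq_questions):
    """Return the FAQ question Q* with the highest score for the SMS S."""
    return max(faq_questions, key=lambda q: score(sms, q))


faq_questions = [
    "Which insurance policies are available for cancer patients",
    "What are the rates for roaming within India",
]
sms = "wht r d policis avlbl 4 cancar pasaints"
print(best_match(sms, faq_questions))
```

A real system would also need a way to reject out-of-domain SMSes, for example by thresholding the best score; this sketch always returns some question.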
FAQ Retrieval Task
Retrieve the best FAQ for a given SMS query.
• Task 1: Same Language Retrieval. SMS in L1, FAQ in L1 (English, Hindi, Malayalam)
• Task 2: Cross Language Retrieval. SMS in L1, FAQ in L2 (English SMS, Hindi FAQ)
• Task 3: Multi Lingual Retrieval. SMS in L1/L2/L3, FAQ in L1/L2/L3 (English/Hindi/Malayalam SMS/FAQ)
Details of Dataset
• FAQs
  • Collected from online resources, both govt. and private sector
  • Three languages: English, Hindi, Malayalam
• FAQ Categories
  • Health
  • Telecom
  • Insurance
  • Railway booking
  • …
• SMSes
  • Collected from mobile-savvy college students, online sources, and by manually perturbing questions to include common forms of noise-induced variations
  • Three languages: English, Hindi, Malayalam
  • Both in-domain and out-of-domain
  • An SMS could match a FAQ in the same language, in another language, or not at all
Dataset
• Training Dataset Release (May 2011 and July 2011)
  • FAQ Dataset: FAQs in three languages
  • Training SMS: SMSes in three languages
• Test Data Release (August 2011)
  • Test SMS: SMSes in three languages
• Submissions by teams (Sept 2011)
  • Top 5 FAQs for each SMS
Participating Teams
• Univ of Iowa (Sanmitra Bhattacharya, Hung Tran and Padmini Srinivasan)
• BUAP Mexico (Darnes Vilariño Ayala, David Pinto, Saúl León Silverio, Esteban Castillo and Mireya Tovar Vidal)
• DCE Delhi (Arpit Gupta)
• IIIT Hyderabad (Aditya Mogadala, Bhupal Reddy and Vasudeva Varma)
• DAIICT Gandhinagar (Khushboo Singhal, Smita Kumari and Gaurav Arora)
• DTU Delhi (Anwar Shaikh, Rajiv Ratn Shah, Mukul Jain, Mukul Rawat and Manoj Kumar)
• Jadavpur Univ and IPN Mexico (Partha Pakray, Soujanya Poria, Sivaji Bandyopadhyay and Alexander Gelbukh)
• DCU Dublin (Deirdre Hogan, Paul Ferguson, Hongyi Wang, Johannes Leveling and Cathal Gurrin)
• MSRIT Bangalore (Vinayaka Dj)
• TCS Mumbai (Arijit De)
• SASTRA Thanjavur (Ashish Raste, Venkata Narasimhan A and Santhosh Bargav)
• RVCE Bangalore (Nishit Shivhre)
• IIIT Delhi (Tanushree Mishra)
Evaluation
• Participants submit the top 5 FAQs for each SMS
• Accuracy and MRR (mean reciprocal rank) based evaluation
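As a rough illustration of the two measures, the sketch below computes accuracy and MRR over top-5 submissions. The slides do not spell out the exact definitions, so two assumptions are made here: accuracy counts an SMS as correct only when the top-ranked FAQ is the right one, and MRR averages 1/rank of the correct FAQ within the submitted list (0 when it is absent). The names `reciprocal_rank` and `evaluate`, and the IDs in the example, are hypothetical; handling of out-of-domain SMSes is not modeled.

```python
def reciprocal_rank(ranked_ids, correct_id):
    """Return 1/rank of the correct FAQ in the ranked list, or 0.0 if absent."""
    for rank, faq_id in enumerate(ranked_ids, start=1):
        if faq_id == correct_id:
            return 1.0 / rank
    return 0.0


def evaluate(submissions, gold, k=5):
    """Compute (accuracy, MRR) over all SMSes listed in `gold`.

    submissions: dict mapping SMS id -> ranked list of FAQ ids (best first)
    gold:        dict mapping SMS id -> correct FAQ id
    Accuracy here means top-1 correctness (an assumption, see lead-in).
    """
    if not gold:
        return 0.0, 0.0
    top1_hits = 0
    rr_total = 0.0
    for sms_id, correct_id in gold.items():
        ranked = submissions.get(sms_id, [])[:k]  # only the top k are scored
        if ranked and ranked[0] == correct_id:
            top1_hits += 1
        rr_total += reciprocal_rank(ranked, correct_id)
    n = len(gold)
    return top1_hits / n, rr_total / n


# Hypothetical example: three SMSes with their correct FAQ ids.
gold = {"s1": "f1", "s2": "f2", "s3": "f3"}
submissions = {
    "s1": ["f1", "f4", "f5"],  # correct at rank 1
    "s2": ["f9", "f2"],        # correct at rank 2
    "s3": ["f7", "f8"],        # correct FAQ not returned
}
accuracy, mrr = evaluate(submissions, gold)
print(accuracy, mrr)
```

In this example one of three SMSes is answered correctly at rank 1, and the reciprocal ranks are 1, 1/2, and 0.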
Team-Task Matrix
• 13 Teams
• 72 Runs
• 9 sub-tasks
• ✔ = score above median
Monolingual Task: English SMS – English FAQ
• SMS: 728 in-domain, 2677 out-of-domain
• FAQs: 7251
• High Score: 0.83
• Median: 0.14
• Chart labels: (508,2307) (396,1940) (432,1512) (553,871) (473,118) (506,19) (391,75) (415,0) (0,225) (12,58) (0,29) (0,0) (0,0)

Monolingual Task: Hindi SMS – Hindi FAQ
• SMS: 200 in-domain, 124 out-of-domain
• FAQs: 1994
• High Score: 0.62
• Median: 0.53
• Chart labels: (198,3) (111,80) (186,0) (171,2) (165,0) (153,0) (0,119)

Monolingual Task: Malayalam SMS – Malayalam FAQ
• SMS: 50 in-domain, 0 out-of-domain
• FAQs: 681
• High Score: 0.94
• Median: 0.90
• Chart labels: (47,0) (46,0) (44,0) (39,2)

Crosslingual Task: English SMS – Hindi FAQ
• SMS: 37 in-domain, 3368 out-of-domain
• FAQs: 1994
• High Score: 0.65
• Median: 0.0499
• Chart labels: (2,2206) (5,182) (0,170) (4,159) (2,40)

Multilingual: English SMS – English/Hindi/Malayalam FAQ
• SMS: 724 in-domain, 2681 out-of-domain
• FAQs: 9926
• High Score: 0.52
• Median: 0.15
• Chart labels: (424,1336) (504,17) (356,25)

Multilingual: Hindi SMS – English/Hindi/Malayalam FAQ
• SMS: 200 in-domain, 124 out-of-domain
• FAQs: 9926
• High Score: 0.57
• Median: 0.51
• Chart labels: (103,83) (165,0) (113,0)

Multilingual: Malayalam SMS – English/Hindi/Malayalam FAQ
• SMS: 50 in-domain, 0 out-of-domain
• FAQs: 9926
• High Score: 0.88
• Chart labels: (44,0) (32,0)
Concluding Remarks
• The mobile phone is the preferred information device for Indians
• SMS is the preferred mode
• The FAQ Retrieval task encourages research in building systems that enable access to information in FAQ databases using SMS queries
• The results are encouraging
Thank You!