INAOE at QAST 2009: Evaluating the usefulness of a phonetic codification of transcriptions. Alejandro Reyes-Barragán Luis Villaseñor-Pineda Manuel Montes-y-Gómez Laboratory of Language Technologies National Institute of Astrophysics, Optics and Electronics Tonantzintla, Mexico
INAOE at QAST 2009: Evaluating the usefulness of a phonetic codification of transcriptions • Alejandro Reyes-Barragán, Luis Villaseñor-Pineda, Manuel Montes-y-Gómez • Laboratory of Language Technologies, National Institute of Astrophysics, Optics and Electronics, Tonantzintla, Mexico • mmontesg@inaoep.mx • http://ccc.inaoep.mx/~mmontesg
Our previous work • We worked on the task of spoken document retrieval (SDR): searching a collection of automatic speech transcriptions for information relevant to general user queries. • The main challenge is to reduce the impact of transcription errors on retrieval accuracy. • Current automatic speech recognition (ASR) systems have error rates that vary from 20% to 40%.
Our idea for SDR • A new document representation based on a phonetic codification of automatic transcriptions. • Characterize words with similar pronunciations through the same phonetic code • Use Soundex codes to enrich the representation of transcriptions. • An example: • Unix Sun Workstation → (U52000 S30000 W62300) • Unique some workstation → (U52000 S30000 W62300)
The enriched representation • We eliminated stop words and high-frequency phonetic codes from the enriched representation. • Queries were represented in the same way. • Example query: "Actions of Raoul Wallenberg" → {actions, raoul, wallenberg, A23520, R40000, W45162}
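The enriched representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `enrich`, the whitespace tokenization, and the parameter names are hypothetical, and the phonetic encoder is passed in as a callable. The demonstration simply looks up the three codes given in the example query above.

```python
from typing import Callable, Iterable, List


def enrich(text: str,
           encode: Callable[[str], str],
           stop_words: Iterable[str] = (),
           frequent_codes: Iterable[str] = ()) -> List[str]:
    """Return the remaining words followed by their phonetic codes.

    Assumptions (not from the slides): lowercase whitespace tokenization,
    and explicit sets of stop words and high-frequency codes to discard.
    """
    stop = {w.lower() for w in stop_words}
    banned = set(frequent_codes)
    # Drop stop words first, then encode only the remaining words.
    words = [w for w in text.lower().split() if w not in stop]
    # Drop codes that were flagged as too frequent to be discriminative.
    codes = [c for c in (encode(w) for w in words) if c not in banned]
    return words + codes


# Codes copied from the example query on the slide; a real system would
# compute them with a Soundex-style encoder instead of a lookup table.
SLIDE_CODES = {"actions": "A23520", "raoul": "R40000", "wallenberg": "W45162"}

query = enrich("Actions of Raoul Wallenberg",
               SLIDE_CODES.__getitem__,
               stop_words={"of"})
print(query)
# ['actions', 'raoul', 'wallenberg', 'A23520', 'R40000', 'W45162']
```

Keeping the encoder as a parameter means the same routine could be applied to both transcriptions and queries, which is what the slide describes.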
Results in SDR • Comparing our results against those from the English monolingual task of CL-SR 2007
Our participation at QAST 2009 • An extension of our previous work. • Aimed to evaluate the usefulness of employing a phonetic codification in the task of question answering (QA) in speech transcriptions. • Our goal was to improve the retrieval of relevant passages for each question and, therefore, the final answer accuracy. • We applied very simple techniques for question classification and answer extraction.
Evaluation of passage retrieval (questions having the answer in the first five passages) • Task T1a – Written questions • Task T1b – Spontaneous oral questions
Results for manual transcriptions • In Task T1a, the inclusion of phonetic information was not really advantageous; it produced only a slight improvement. • In Task T1b, where the questions were transcriptions, it was possible to observe an improvement by using the phonetic codification.
Preliminary Conclusions Results indicate that phonetic codes had, in general, no impact on the answer accuracy. Given that phonetic codes improved the passage retrieval, we may conclude that our answer extraction method is inadequate. We obtained better results using manual transcriptions because the NER was accurate.
Thank you! Manuel Montes y Gómez Language Technologies Laboratory National Institute of Astrophysics, Optics and Electronics Tonantzintla, México mmontesg@inaoep.mx http://ccc.inaoep.mx/~mmontesg
Soundex codification • (1) Capitalize all letters in the word and drop all punctuation marks. • (2) Retain the first letter of the word. • (3) Change all occurrences of the following letters to '0' (zero): 'A', 'E', 'I', 'O', 'U', 'H', 'W', 'Y'. • (4) Change letters from the following sets into the given digit: • 1 = 'B', 'F', 'P', 'V' • 2 = 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z' • 3 = 'D', 'T' • 4 = 'L' • 5 = 'M', 'N' • 6 = 'R' • (5) Collapse each run of equal adjacent digits in the string resulting from step (4) into a single digit. • (6) Remove all zeros from the string resulting from step (5). • (7) Pad the string resulting from step (6) with trailing zeros and return only the first six positions. The output code has the form <uppercase letter> <digit> <digit> <digit> <digit> <digit>.
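The steps above can be sketched in Python as follows. This is a literal reading of the slide, not the authors' code: the function name `soundex6` is hypothetical, characters outside A–Z are simply dropped, and the first letter is kept as-is without comparing its own digit to the one that follows (some Soundex variants do compare it). Under this reading the sketch reproduces the codes of the query example (A23520, R40000, W45162); the authors' implementation may differ in details, so other codes shown in the slides are not guaranteed to match.

```python
import re

# Letter-to-digit table from steps (3) and (4).
GROUPS = {
    "0": "AEIOUHWY",
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
DIGIT_FOR = {letter: digit
             for digit, letters in GROUPS.items()
             for letter in letters}


def soundex6(word: str) -> str:
    """Six-character phonetic code following the seven steps on the slide."""
    # Step 1: capitalize and drop anything that is not a letter A-Z
    # (dropping non-ASCII letters is an assumption of this sketch).
    letters = re.sub(r"[^A-Z]", "", word.upper())
    if not letters:
        return ""
    # Step 2: retain the first letter unchanged.
    first, rest = letters[0], letters[1:]
    # Steps 3 and 4: map every remaining letter to its digit.
    digits = "".join(DIGIT_FOR[ch] for ch in rest)
    # Step 5: collapse runs of equal adjacent digits into one digit.
    digits = re.sub(r"(\d)\1+", r"\1", digits)
    # Step 6: remove the zeros that stood for vowels, H, W and Y.
    digits = digits.replace("0", "")
    # Step 7: pad with trailing zeros and keep the first six positions.
    return (first + digits + "00000")[:6]


for w in ("Unix", "Unique", "actions", "raoul", "wallenberg"):
    print(w, soundex6(w))
# Unix U52000
# Unique U52000
# actions A23520
# raoul R40000
# wallenberg W45162
```

Because zeros are removed only after the collapse in step (5), two equal digits separated by a vowel are both kept, which is why "wallenberg" retains a single 4 but still reaches five digits.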