
INAOE at QAST 2009: Evaluating the usefulness of a phonetic codification of transcriptions

Alejandro Reyes-Barragán, Luis Villaseñor-Pineda, Manuel Montes-y-Gómez. Laboratory of Language Technologies, National Institute of Astrophysics, Optics and Electronics, Tonantzintla, Mexico


Presentation Transcript


  1. INAOE at QAST 2009: Evaluating the usefulness of a phonetic codification of transcriptions. Alejandro Reyes-Barragán, Luis Villaseñor-Pineda, Manuel Montes-y-Gómez. Laboratory of Language Technologies, National Institute of Astrophysics, Optics and Electronics, Tonantzintla, Mexico. mmontesg@inaoep.mx http://ccc.inaoep.mx/~mmontesg

  2. Our previous work We worked on the task of spoken document retrieval (SDR): searching a collection of automatic speech transcriptions for information relevant to general user queries. The main challenge is to reduce the impact of transcription errors on retrieval accuracy. Current Automatic Speech Recognition (ASR) systems have error rates that vary from 20% to 40%.

  3. Our idea for SDR • A new document representation based on a phonetic codification of automatic transcriptions. • Characterize words with similar pronunciations through the same phonetic code • Use Soundex codes to enrich the representation of transcriptions. • An example: • Unix Sun Workstation → (U52000 S30000 W62300) • Unique some workstation → (U52000 S30000 W62300)

  4. The enriched representation • We eliminated stop words and high-frequency phonetic codes from the enriched representation. • Queries were represented in the same way. • Example query: Actions of Raoul Wallenberg • {actions, raoul, wallenberg, A23520, R40000, W45162}
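As a rough illustration of the enriched representation described on this slide, the sketch below appends phonetic codes to the retained words. It is not the authors' implementation: the stop-word list, the `max_doc_ratio` cutoff for "high-frequency" codes, and the names `tokenize`, `frequent_codes` and `enrich` are all assumptions made for the example. The encoder is passed in as a function, and the usage lines stand in for it with the three codes printed on the slide.

```python
from collections import Counter
from typing import Callable, Collection, List, Set

# Toy stop-word list (assumption; the slides do not give the real one).
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text: str) -> List[str]:
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(".,;:!?\"'()") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def frequent_codes(coded_docs: List[List[str]], max_doc_ratio: float) -> Set[str]:
    """Codes found in more than max_doc_ratio of the documents (hypothetical cutoff)."""
    doc_freq = Counter(code for doc in coded_docs for code in set(doc))
    return {c for c, n in doc_freq.items() if n / len(coded_docs) > max_doc_ratio}


def enrich(text: str, encode: Callable[[str], str],
           skip_codes: Collection[str] = ()) -> List[str]:
    """Return the kept words followed by their phonetic codes."""
    words = tokenize(text)
    codes = [encode(w) for w in words]
    return words + [c for c in codes if c not in skip_codes]


# Codes copied from the slide, standing in for a real Soundex encoder.
slide_codes = {"actions": "A23520", "raoul": "R40000", "wallenberg": "W45162"}
query = enrich("Actions of Raoul Wallenberg", slide_codes.__getitem__)
print(query)
# ['actions', 'raoul', 'wallenberg', 'A23520', 'R40000', 'W45162']
```

Passing the encoder as an argument keeps the sketch independent of any particular phonetic scheme; documents and queries would go through the same `enrich` call so that both sides share one vocabulary.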

  5. Results in SDR • Comparing our results against those from the English monolingual task of CL-SR 2007

  6. Our participation at QAST 2009 • An extension of our previous work • Aimed to evaluate the usefulness of employing a phonetic codification in the task of QA in speech transcriptions. • Our goal was to improve the retrieval of relevant passages for each question, and, therefore, the final answer accuracy • We applied very simple techniques for question classification and answer extraction.

  7. Architecture of our system

  8. Evaluation of passage retrieval (Questions having the answer in the first five passages) • Task T1a – Written questions • Task T1a – Spontaneous Oral questions
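The measure named on this slide, questions having the answer in the first five passages, can be sketched as a simple count. The slides do not say how the presence of an answer was judged, so the case-insensitive substring match below is an assumption, as are the function name `answered_in_top_k` and the toy data.

```python
from typing import Dict, List


def answered_in_top_k(
    ranked_passages: Dict[str, List[str]],
    gold_answers: Dict[str, List[str]],
    k: int = 5,
) -> int:
    """Count questions whose gold answer occurs in one of the first k passages."""
    hits = 0
    for qid, passages in ranked_passages.items():
        answers = [a.lower() for a in gold_answers.get(qid, [])]
        top_k = [p.lower() for p in passages[:k]]
        # Assumption: an answer counts as found on a plain substring match.
        if any(a in p for a in answers for p in top_k):
            hits += 1
    return hits


# Hypothetical toy data: q1 is answered at rank 1, q2 only at rank 6.
ranked = {
    "q1": ["a passage about Raoul Wallenberg", "an unrelated passage"],
    "q2": ["filler"] * 5 + ["the answer is Budapest"],
}
gold = {"q1": ["Wallenberg"], "q2": ["Budapest"]}
print(answered_in_top_k(ranked, gold))       # 1
print(answered_in_top_k(ranked, gold, k=6))  # 2
```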

  9. Results for manual transcriptions • In Task T1a, the inclusion of phonetic information was not really advantageous; it produced only a slight improvement. • In Task T1b, where questions were transcriptions, it was possible to observe an improvement by using the phonetic codification.

  10. Results for automatic transcriptions

  11. Preliminary Conclusions Results indicate that phonetic codes had, in general, no impact on the answer accuracy. Given that phonetic codes improved the passage retrieval, we may conclude that our answer extraction method is inadequate. We obtained better results using manual transcriptions because the NER was accurate.

  12. Thank you! Manuel Montes y Gómez Language Technologies Laboratory National Institute of Astrophysics, Optics and Electronics Tonantzintla, México mmontesg@inaoep.mx http://ccc.inaoep.mx/~mmontesg

  13. Soundex codification • (1) Capitalize all letters in the word and drop all punctuation marks. • (2) Retain the first letter of the word. • (3) Change all occurrences of the following letters to '0' (zero): 'A', 'E', 'I', 'O', 'U', 'H', 'W', 'Y'. • (4) Change letters from the following sets into the given digit: • 1 = 'B', 'F', 'P', 'V' • 2 = 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z' • 3 = 'D', 'T' • 4 = 'L' • 5 = 'M', 'N' • 6 = 'R' • (5) Collapse each run of equal adjacent digits in the string resulting from step (4) into a single digit. • (6) Remove all zeros from the string that results from step (5). • (7) Pad the string resulting from step (6) with trailing zeros and return only the first six positions. The output code will be of the form <uppercase letter> <digit> <digit> <digit> <digit> <digit>.
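The steps listed on this slide can be sketched in a few lines of Python. This is one reading of the slide rather than the authors' code: it assumes that the first letter is set aside before the digit mapping (so it is never merged with the digit that follows it), that equal adjacent digits are collapsed into one, and that characters outside A-Z are dropped along with punctuation. The name `soundex6` is made up for the example.

```python
import string

# Digit classes from the slide; vowels and H, W, Y map to '0'.
_GROUPS = {
    "0": "AEIOUHWY",
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
_DIGIT = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex6(word: str) -> str:
    """Six-position Soundex code following the steps listed on the slide."""
    # Step 1: upper-case and keep only the letters A-Z.
    letters = [c for c in word.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    # Step 2: retain the first letter unchanged.
    first, rest = letters[0], letters[1:]
    # Steps 3-4: map every remaining letter to its digit.
    digits = [_DIGIT[c] for c in rest]
    # Step 5: collapse runs of identical adjacent digits into one.
    collapsed = [d for i, d in enumerate(digits) if i == 0 or d != digits[i - 1]]
    # Step 6: drop the zeros.
    kept = [d for d in collapsed if d != "0"]
    # Step 7: pad with trailing zeros and keep six positions.
    return (first + "".join(kept) + "00000")[:6]


# The query terms from slide 4.
print([soundex6(w) for w in ["actions", "raoul", "wallenberg"]])
# ['A23520', 'R40000', 'W45162']
```

Under these assumptions the function reproduces the three codes shown for the example query on slide 4.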
