INAOE at QAST 2009: Evaluating the usefulness of a phonetic codification of transcriptions

Alejandro Reyes-Barragán, Luis Villaseñor-Pineda, Manuel Montes-y-Gómez
Laboratory of Language Technologies, National Institute of Astrophysics, Optics and Electronics
Tonantzintla, Mexico
mmontesg@inaoep.mx
http://ccc.inaoep.mx/~mmontesg

Our previous work • We worked on the task of spoken document retrieval (SDR): searching for information relevant to general user queries in a collection of automatic speech transcriptions. • The main challenge is to reduce the impact of transcription errors on retrieval accuracy. • Current Automatic Speech Recognition (ASR) systems have error rates ranging from 20% to 40%.

Our idea for SDR • A new document representation based on a phonetic codification of the automatic transcriptions. • Words with similar pronunciations are characterized by the same phonetic code. • Soundex codes are used to enrich the representation of the transcriptions. • An example: • Unix Sun Workstation → (U52000 S50000 W62335) • Unique some workstation → (U52000 S50000 W62335)

The enriched representation • We eliminated stop words and high-frequency phonetic codes from the enriched representation. • Queries were represented in the same way, for example: • Actions of Raoul Wallenberg → {actions, raoul, wallenberg, A23520, R40000, W45162}
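The enrichment step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the six-position Soundex variant follows the codification described in this presentation, while the tokenizer (ASCII letters only), the stop-word list, and the document-frequency threshold `max_df` that decides which phonetic codes count as "high-frequency" are assumptions. The names `soundex6`, `tokenize`, `frequent_codes`, and `enrich` are hypothetical.

```python
import re
from collections import Counter

# Soundex digit classes as listed in the codification slide (vowels, H, W, Y -> '0').
_DIGIT = {ch: digit
          for letters, digit in [("AEIOUHWY", "0"), ("BFPV", "1"),
                                 ("CGJKQSXZ", "2"), ("DT", "3"), ("L", "4"),
                                 ("MN", "5"), ("R", "6")]
          for ch in letters}


def soundex6(word: str) -> str:
    """Six-position Soundex code: first letter plus five digits."""
    letters = re.sub(r"[^A-Z]", "", word.upper())
    if not letters:
        return ""
    digits = [_DIGIT[c] for c in letters[1:]]
    # Collapse runs of equal adjacent digits, then drop the zeros.
    collapsed = [d for i, d in enumerate(digits) if i == 0 or d != digits[i - 1]]
    body = "".join(d for d in collapsed if d != "0")
    return (letters[0] + body + "00000")[:6]


def tokenize(text: str, stopwords: set[str]) -> list[str]:
    """Lower-cased ASCII word tokens with stop words removed (assumed tokenizer)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]


def frequent_codes(docs: list[str], stopwords: set[str], max_df: float = 0.5) -> set[str]:
    """Codes found in more than max_df of the documents (hypothetical threshold)."""
    if not docs:
        return set()
    df = Counter()
    for doc in docs:
        # Count each code at most once per document (document frequency).
        df.update({soundex6(w) for w in tokenize(doc, stopwords)})
    return {code for code, n in df.items() if n / len(docs) > max_df}


def enrich(text: str, stopwords: set[str], frequent=frozenset()) -> list[str]:
    """Words of the text followed by their phonetic codes, minus frequent codes."""
    words = tokenize(text, stopwords)
    codes = [soundex6(w) for w in words]
    return words + [c for c in codes if c not in frequent]
```

With a stop-word list that contains "of", `enrich("Actions of Raoul Wallenberg", stopwords)` returns the words actions, raoul, wallenberg followed by A23520, R40000, W45162, and the two phrases of the SDR example receive identical phonetic codes. Queries and transcriptions would be passed through the same function so that both sides share one vocabulary.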

Results in SDR • Comparison of our results against those of the English monolingual task of CL-SR 2007

Our participation at QAST 2009 • An extension of our previous work • Aimed at evaluating the usefulness of a phonetic codification for question answering (QA) on speech transcriptions. • Our goal was to improve the retrieval of relevant passages for each question and, therefore, the final answer accuracy. • We applied very simple techniques for question classification and answer extraction.

Architecture of our system

Evaluation of passage retrieval (questions having the answer in the first five passages) • Task T1a – Written questions • Task T1b – Spontaneous oral questions
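A minimal sketch of the passage-retrieval measure named on this slide: the fraction of questions whose answer appears in one of the first five retrieved passages. The function name `coverage_at_k` and the matching rule (a case-insensitive substring test against a single reference answer string) are assumptions for illustration and may differ from how the evaluation was actually carried out.

```python
def coverage_at_k(ranked_passages: list[list[str]], answers: list[str], k: int = 5) -> float:
    """Fraction of questions whose answer occurs in one of their first k passages.

    ranked_passages: one ranked list of passage texts per question.
    answers: one reference answer string per question.
    Matching is a case-insensitive substring test (an assumption).
    """
    if len(ranked_passages) != len(answers):
        raise ValueError("need exactly one passage list per question")
    if not answers:
        return 0.0
    hits = sum(
        any(ans.lower() in p.lower() for p in passages[:k])
        for passages, ans in zip(ranked_passages, answers)
    )
    return hits / len(answers)
```

Multiplying the result by the number of questions gives the raw count of questions answered within the top five passages, which is the quantity compared between the runs with and without phonetic codes.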

Results for manual transcriptions • In Task T1a, including phonetic information was not particularly advantageous; it produced only a slight improvement. • In Task T1b, where the questions were transcriptions of spoken questions, using the phonetic codification yielded an observable improvement.

Results for automatic transcriptions

Preliminary Conclusions • The results indicate that, in general, phonetic codes had no impact on answer accuracy. • Given that phonetic codes improved passage retrieval, we may conclude that our answer extraction method is inadequate. • We obtained better results using manual transcriptions because named entity recognition (NER) was accurate on them.

Thank you!
Manuel Montes y Gómez
Language Technologies Laboratory, National Institute of Astrophysics, Optics and Electronics
Tonantzintla, México
mmontesg@inaoep.mx
http://ccc.inaoep.mx/~mmontesg

Soundex codification
(1) Capitalize all letters in the word and drop all punctuation marks.
(2) Retain the first letter of the word.
(3) Change all occurrences of the following letters to '0' (zero): 'A', 'E', 'I', 'O', 'U', 'H', 'W', 'Y'.
(4) Change letters from the following sets into the given digit: 1 = 'B', 'F', 'P', 'V'; 2 = 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z'; 3 = 'D', 'T'; 4 = 'L'; 5 = 'M', 'N'; 6 = 'R'.
(5) In the string resulting from step (4), collapse each run of equal adjacent digits into a single digit.
(6) Remove all zeros from the string resulting from step (5).
(7) Pad the string resulting from step (6) with trailing zeros and return only the first six positions. The output code has the form <uppercase letter> <digit> <digit> <digit> <digit> <digit>.
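The seven steps above can be traced with the following Python sketch, which records the string produced after each step. It is an illustrative reading of the slide, not the authors' code: step (5) is implemented as collapsing runs of equal adjacent digits, the reading that reproduces the codes A23520 (actions) and W45162 (Wallenberg) given for the query example. `soundex_steps` is a hypothetical name.

```python
import re

# Digit classes from steps (3) and (4); vowels plus H, W, Y map to '0'.
DIGIT = {ch: digit
         for letters, digit in [("AEIOUHWY", "0"), ("BFPV", "1"),
                                ("CGJKQSXZ", "2"), ("DT", "3"), ("L", "4"),
                                ("MN", "5"), ("R", "6")]
         for ch in letters}


def soundex_steps(word: str) -> dict[int, str]:
    """Return the intermediate string after each step (1)-(7) for one word."""
    # (1) Capitalize and drop punctuation (anything that is not A-Z).
    s1 = re.sub(r"[^A-Z]", "", word.upper())
    if not s1:
        raise ValueError("word contains no letters A-Z")
    # (2) Retain the first letter; only the remaining letters are encoded.
    first, rest = s1[0], s1[1:]
    # (3) Vowels and H, W, Y become '0'.
    s3 = "".join("0" if DIGIT[c] == "0" else c for c in rest)
    # (4) Remaining consonants become their class digit ('0' is left as is).
    s4 = "".join(DIGIT.get(c, c) for c in s3)
    # (5) Collapse runs of equal adjacent digits into one digit.
    s5 = "".join(d for i, d in enumerate(s4) if i == 0 or d != s4[i - 1])
    # (6) Remove the zeros.
    s6 = s5.replace("0", "")
    # (7) Pad with trailing zeros and keep the first six positions.
    s7 = (first + s6 + "00000")[:6]
    return {1: s1, 2: first, 3: first + s3, 4: first + s4,
            5: first + s5, 6: first + s6, 7: s7}
```

For instance, the trace for "Wallenberg" passes through W044051062 after step (4) and W04051062 after step (5), and ends with W45162. "Unix" and "Unique" both end with U52000, which is exactly the collision the enriched representation relies on.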
