Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS
PUT RAW DATA NOW and then LINK DATA • http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
PUT RAW DATA NOW • Text • Data (numbers, statistics) • Data (audio, video)
LINKED DATA • Information is in the relationships between data • Find the relationships between them
Proposal • Information Extraction in radio and television documents • Industrial Partners: • CEDROM-SNi • Irosoft • Universities and Research Centres: • CRIM • ÉTS • INRS-EMT • McGill • NSERC Strategic Project Proposal
Process Raw Audio Data • Automatic Speech Recognition (ASR) • Parsing • Indexation • [Diagram: ASR, Parsing, Indexation]
Closed-captioning / Subtitling • VoiceWriter
Closed-captioning / Subtitling • Done with the help of a VoiceWriter who: • Respeaks • Adds punctuation • Selects the proper dictionary • Does not speak during advertising • Summarizes information when more than one speaker talks at the same time or when the speech rate is too fast • Translates
How to process raw audio data? • Audio Diarization • Speaker Diarization • Speaker Recognition • Speaker Role • Punctuation • Structural Segmentation • Topic Recognition • ASR • Parsing • Indexation
Audio Diarization • Aims to segment an audio recording into acoustically homogeneous parts • Distinguish between speech and music • Distinguish between advertising and news
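As a minimal sketch of the segmentation step, assume a hypothetical upstream classifier has already labelled each fixed-length frame (for example "speech" or "music"). The helper `frames_to_segments` and the one-second frame length are illustrative assumptions, not part of the original slides; the sketch only shows how per-frame labels collapse into acoustically homogeneous segments.

```python
from itertools import groupby


def frames_to_segments(labels, frame_sec=1.0):
    """Collapse per-frame class labels into (label, start_sec, end_sec) segments.

    `labels` is assumed to come from some frame-level classifier (hypothetical);
    consecutive frames with the same label are merged into one segment.
    """
    segments = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, index * frame_sec, (index + length) * frame_sec))
        index += length
    return segments


# Six one-second frames: music, then speech, then music again.
frame_labels = ["music", "music", "speech", "speech", "speech", "music"]
for label, start, end in frames_to_segments(frame_labels):
    print(f"{start:5.1f}s - {end:5.1f}s  {label}")
```

The same grouping applies to other class pairs, such as advertising versus news, once frame-level labels are available.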
Speaker Diarization • Aims to segment a speech signal into its speaker turns (who spoke when)
Speaker Role • In broadcast news, most speech comes from anchors and reporters. The remainder comes from excerpts of quotations or interviews, referred to as sound bites. • Detecting speaker role is important to improve: • the acoustic speech recognizer • information extraction
Punctuation • Some language analysis tasks, such as parsing and entity extraction, need punctuation (periods and commas) in order to work properly.
Structural Segmentation • Sentence, paragraph, and story segmentation are important for speech understanding applications, starting at the basic level with parsing and information extraction. • This problem is absent in text processing but has to be solved in speech processing.
Topic Spotting • Aims to identify the topic of a speech signal. It is useful for adapting the different components of the system as well as for adding metatags to a speech signal. • Example: La belle ferme le voile • La: the, her • Belle: beautiful, beauty • Ferme: farm, closes • Le: the, his • Voile: veil, blocks the view • Two hypothetical translations: • The veil is closed by the beauty • The beautiful farm blocks his view
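One very simple way to picture topic spotting is keyword counting over the surrounding context. The sketch below is an assumption made for illustration, not the method used in the project: the topic names, the keyword sets, and the function `spot_topic` are all hypothetical. The idea is that once the context points to a topic, an ambiguous word such as "ferme" can be read accordingly.

```python
from collections import Counter

# Hypothetical topics and keyword lists, chosen only for illustration.
TOPIC_KEYWORDS = {
    "agriculture": {"farm", "harvest", "cattle", "tractor"},
    "fashion": {"veil", "dress", "designer", "runway"},
}


def spot_topic(text, topic_keywords=TOPIC_KEYWORDS):
    """Return the topic whose keywords occur most often in `text`, or None."""
    words = Counter(text.lower().split())
    scores = {
        topic: sum(words[keyword] for keyword in keywords)
        for topic, keywords in topic_keywords.items()
    }
    best = max(scores, key=scores.get)
    # No keyword matched at all: do not guess a topic.
    return best if scores[best] > 0 else None


context = "the harvest was late and the cattle left the farm"
print(spot_topic(context))
```

A production system would more likely use a statistical classifier; keyword counts are only the simplest instance of the idea.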
How to improve Information Extraction from speech? By improving ASR components
Automatic Speech Recognizer • Performance drops with: • Out-of-vocabulary words (lexical models) • Multiple users (acoustic models) • Multiple microphones (acoustic models) • Multiple topics (language models) • Cross-talk, i.e. overlapping speakers (all models)
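The out-of-vocabulary problem can be quantified as the fraction of spoken words missing from the recognizer's lexicon. The sketch below assumes a lexicon represented as a plain set of lowercase words; the function name `oov_rate` and the sample data are illustrative only.

```python
def oov_rate(transcript_words, lexicon):
    """Fraction of words in the transcript that are absent from the lexicon."""
    if not transcript_words:
        return 0.0
    missing = [word for word in transcript_words if word.lower() not in lexicon]
    return len(missing) / len(transcript_words)


# Toy lexicon (assumption): a real ASR lexicon holds tens of thousands of entries.
lexicon = {"the", "storm", "hit", "coast"}
reference = "the tsunami hit the coast".split()
print(f"OOV rate: {oov_rate(reference, lexicon):.2f}")
```

Because a recognizer cannot output a word it does not know, each out-of-vocabulary word is necessarily misrecognized, which is why keeping the lexicon current matters.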
How to improve Information Extraction from speech? • More data are better data. • More similar data are better data. Similar in terms of: • Topic • Time period: data from the same period and, specifically, more recent data • Example: Japan • Predicting what will happen and who will speak.
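One common way to favour similar and recent data in a language model is linear interpolation: a model estimated on a small, recent, on-topic corpus is mixed with a model estimated on a large general corpus. The slides do not say which adaptation method was used, so the sketch below is an assumption; it uses unigram models for brevity, and the weight `lam` and both toy corpora are hypothetical.

```python
from collections import Counter


def unigram_probs(words):
    """Maximum-likelihood unigram probabilities for a list of words."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(p_recent, p_general, lam=0.5):
    """Mix two unigram models: lam * recent + (1 - lam) * general."""
    vocabulary = set(p_recent) | set(p_general)
    return {
        word: lam * p_recent.get(word, 0.0) + (1 - lam) * p_general.get(word, 0.0)
        for word in vocabulary
    }


# Toy corpora (assumptions): "general" is large and old, "recent" is small and new.
general = unigram_probs("a a b b".split())
recent = unigram_probs("b c".split())
mixed = interpolate(recent, general, lam=0.5)
print(sorted(mixed.items()))
```

The word "c" is unseen in the general corpus but receives probability mass from the recent one, which is the effect wanted when a new name or place suddenly enters the news.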
More data are better data • Use the huge amount of information on the web • Use supercomputer infrastructure in order to model it in a reasonable time: • Compute Canada infrastructure: CLUMEQ • Clusters of university computers
More similar data are better data • Exploiting redundancies across media: • Anchor speech is predominant. • Reporters often appear at specific times, day after day. • Advertisements appear (and repeat) near specific time slots, day after day. • The same news is often reused from one medium to another.
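The day-after-day regularities listed above can be turned into a prior over what to expect at a given time. The sketch below is a hypothetical illustration: it assumes previous broadcasts were logged as (minute of the hour, segment label) pairs and estimates, for each minute, how often each label occurred. The function name `slot_priors` and the sample log are assumptions.

```python
from collections import Counter, defaultdict


def slot_priors(history):
    """Estimate P(label | minute) from (minute, label) pairs of past broadcasts."""
    counts = defaultdict(Counter)
    for minute, label in history:
        counts[minute][label] += 1
    return {
        minute: {label: n / sum(labels.values()) for label, n in labels.items()}
        for minute, labels in counts.items()
    }


# Hypothetical log covering three previous days of the same programme.
history = [
    (0, "anchor"), (0, "anchor"), (0, "advertising"),
    (12, "advertising"), (12, "advertising"), (12, "advertising"),
]
priors = slot_priors(history)
print(priors[12])
```

Such a prior could, for instance, bias audio diarization toward "advertising" at minute 12 before any acoustic evidence is examined.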
And then ... • Audio Diarization • Speaker Diarization • Speaker Recognition • Speaker Role • Punctuation • Structural Segmentation • Topic Recognition • ASR • Parsing • Indexation