Recent Advances in Speech Translation Systems
Data Collection in Nespole!
Goals, procedures and tools
Susanne Burger (Carnegie Mellon University)
Erica Costantini (University of Trieste)
Why data collection?
New Idea: Learning by Data
• Speech Material:
  - Domain, concept, vocabulary
  - Style (human-machine conversation)
  - Quality (robustness)
  - ...
• Information about Users:
  - Acceptance
  - Usage
  - Behavior
  - Wish-list
  - Problem solving
  - ...
• System Information (Dry Run):
  - Stability
  - Speed
  - Bugs
  - ...
J.T. Hackos, J.C. Redish, User and Task Analysis for Interface Design, J. Wiley & Sons, 1998.
Learning by Data
1. Mass data from scratch: artificial scenario / environment / setup, Wizard of Oz, cooperative user / actor
2. Data collection through usage of a beta system with increasing realism: user study
Data Corpus / User-study Data: Analysis, Development, Training, Testing, Evaluation
Beta-System
Data Collection: Planning
• Who are the “Data Customers”? Nespole!:
  - ASR
  - MT
  - Synthesis
  - Interface Development
  - ...
• Customer Needs? Nespole!:
  - Audio / Video
  - Transcription (levels of transcription)
  - Segmentation
• Data Usage? Nespole!:
  - Analysis
  - Development
  - Training
  - Testing
  - Evaluation
• Type of Collection? Nespole!:
  - Mass Data Collection
  - Specific features
  - User study
• Time and Budget
Nespole! Data Collection
IDEA: NEgotiation through SPOken Language in E-commerce
Mass-Data Collection (Showcase 1): travel scenario / H323 setup, monolingual, cooperative users
Data Corpus: Analysis, Development, Training, Testing, Evaluation
Beta System (Nespole Showcase1-System): travel + multimodality, MT, unseen users, multimodal experiment
Data Collection Procedure
Example: Mass-Data Collection (Showcase 1)
Monolingual data collection for system development
“Assembly Line”: Scen./Topic, Recording, Data, Participants, Environment, Equipment
Scenarios
• Scenario: a “story” about users, their work, their environment, how they do tasks, the tasks they need to do, and all combinations of these elements (*).
• Scenario in Nespole!: detailed description of:
  - the customers’ features (age, marital status, …);
  - the destination of the travel;
  - the objectives and preferences for the holiday (accommodation, sport activities, cultural events, …)
(*) J. M. Carroll, Ed., Scenario-Based Design: Envisioning Work and Technology in System Development, New York, J. Wiley & Sons, 1995.
Scenario example
Situation (Winter Holidays in Val di Fiemme):
• choose your vacation starting date after December 10th
• you want to stay there for (a weekend, 1 week, 2 weeks)
• you have 2 children (choose 2 ages between 2 and 11) and wife/husband
• you want to travel by car and park it at the hotel
• you already know the road to Val di Fiemme
• you want accommodation in ** or *** hotels in Val di Fiemme with bed & breakfast
• choose two hotels among: Latemar in Molina, Bellavista in Cavalese, Excelsior in Cavalese, Lagorai in Cavalese, Belvedere in Panchia, Bellaria in Predazzo, Cimon in Predazzo, Erica in Tesero, Lucia in Tesero, Montanara in Ziano, Zanon in Ziano
• you want to practice a winter sport (choose your favorite winter sport among the following: downhill skiing, cross-country skiing/snowshoeing, ice skating, snowboarding)
Scenario example
Things to ask for:
• prices and how far in advance to book
• types of ski-lifts nearby and their distance from hotel
• existence of cross-country trails and ice skating areas
• details about favorite winter sport (exact location, prices, possibility of renting equipment)
• type of parking facilities for the car
• possibility of eating in the hotel and prices of dinner and late supper
• daycare and activities for children in the hotel
• special prices for children
Scenario definition in Nespole!
Example: Showcase 1
• analysis of 5000 e-mail messages (in four languages);
• clustering of the e-mails on the basis of the request type;
• selection of e-mails concerning requests which could be discussed in a phone call;
• construction of 21 scenarios;
• selection of 5 scenarios* among the 21 (done by the APT tourist board office manager)
* http://www.is.cs.cmu.edu/nespole/datacoll.html
Participants
CUSTOMERS:
AGENTS: Italian professional agents working at the Trentino tourist office APT
Environment
• APT (agent’s site, Italy) records the English client via H323 connection and the Italian agent via headset
• CMU (client’s site, USA) records the Italian agent via H323 connection and the English client via headset
Recorded files (stereo .wav): H323 English customer + local agent; H323 agent + local English customer
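The slide says that each site records the remote speaker (via H323) and the local speaker (via headset) into a stereo .wav file. A minimal sketch of how such a file could be separated into one mono signal per speaker is given below. It assumes that each speaker occupies one channel of the stereo file, which the slides do not state explicitly; the function names `split_stereo` and `split_wav` are hypothetical and not part of any Nespole! tool.

```python
import wave


def split_stereo(frames: bytes, sample_width: int = 2):
    """Split interleaved stereo PCM data into (left, right) mono byte strings."""
    frame_size = 2 * sample_width  # one sample per channel in every frame
    if len(frames) % frame_size:
        raise ValueError("incomplete stereo frame")
    left, right = bytearray(), bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + sample_width]
        right += frames[i + sample_width:i + frame_size]
    return bytes(left), bytes(right)


def split_wav(stereo_path, left_path, right_path):
    """Write the two channels of a stereo .wav file to two mono .wav files.

    Assumption: channel 1 and channel 2 each carry one speaker.
    """
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a stereo file")
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    for path, data in zip((left_path, right_path), split_stereo(frames, width)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(data)
```

Which channel holds the agent and which the customer would have to be taken from the recording setup at each site.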
Equipment
Hardware: PC Pentium 200 and up
Software: Windows NT or Win 98, Total Recorder, NetMeeting 3.01
Microphone: Headset or close microphone
Environment: Quiet office
Recording: LTI Data Collection Database
Oracle database, accessible online, containing detailed information and descriptions about the meetings recorded, demographics of the speakers, transcriptions and audio files (currently with two separate interfaces: one to enter data into the database and one to retrieve data from it)
2 stereo wav files
Spr protocol
Rpr protocol
video tapes
(200 collected dialogues)
File naming conventions (example from Nespole!)
Why? Confusion with parallel recordings; different types of files concerning the same recording; different languages, types of scenario, locations; stereo vs. mono files, etc.
Audio Data: Transcription process, Transcription Conventions, Transcription Tool, TRL Files, MAR Files, Voc Lists
...
m054_1_0575_QXE_00: if it was , I don't know , in the beginning of the century , I would think so , but .
m054_5_0576_MTY_00: yeah , I mean , +/we d=/+ we don't know a lot <B> about anything .
m054_4_0577_ZMW_00: but +/even/+ I think even if they would have known a little bit more . <B> think about all these chicken farms or things like all this <B> +/k=/+ kind of really <B> terrible <B> behavior against animals , anyway . <B> so , +/I/+ +/I don't think/+ <B> <hes> +/I th=/+ I think as soon as some financial or land things or things like this <B> came into the game , <B> they don't think anymore <Laugh> about <B> animal behavior . this is +/ku=/+ just <B> <Noise> secondary% .
m054_3_0578_AAH_00: <hm>
m054_5_0579_MTY_00: right . <B>
m054_4_0580_ZMW_00: so , <B> this...
Transcription (trl) Conventions
• Verbmobil II:
  - we are familiar with VMB and we have appropriate tools
  - BAS Partitur format
  - finite/closed system (parsing, filtering, converting)
  - line oriented, no formats (one line/turn)
  - turn oriented (turn-IDs contain full identification)
  - time stamps and trl are in different files linked by turn-ID
  - (http://www.is.cs.cmu.edu/trl_conventions/)
S. Burger, L. Besacier, P. Coletti, F. Metze and C. Morel, “The NESPOLE! VoIP Dialogue Database”, in Proc. of Eurospeech 2001, Aalborg, Denmark.
Content
- words
  Orthography:
  - orthographic rules as long as they are non-ambiguous
  - no capitalization in sentence-initial position
  - vocabulary lists to keep vocabulary spelled the same
• word tags
• non-grammatical phrases
• broken words
• interrupted words
• acoustically hard to understand
• pauses and breathing
• filled pauses
• acoustically not understandable
• human noise
• elements
- rules
• capitalization
• punctuation
• white space
• turn-end
• syntax
<*tENG>   Foreign Language Turn (JAP, GER, ..)
;..   Global Comment
..'..   Apostrophe (reduced word)
..-.. (--)   Hyphen (compound word)
$..   Spelled Letter
~..   Name
#..   Number
*..   Neologism/Mispronunciation
<*XXX..   Foreign Word (FRA, ITA, ..)
...<L>.. / ..<Z>..   Lengthening
..%   Poorly Intelligible
..=   Articulated Break-off
.._   Interruption of a Word, Left Fragment
_..   Interruption of a Word, Right Fragment
<T_>..   Technical Interruption of a Word, Beginning
..<_T>   Technical Interruption of a Word, End
<*T>   Technical Interruption of a Turn
<*T>t   Technical Break-off of a Turn
<!n ..>   Comment on Pronunciation
. / ? / ,   Punctuation
+/..   Beginning of a Repetition/Correction
../+   End of a Repetition/Correction
-/..   Beginning of a False Start
../-   End of a False Start
<B> / <A>   Respiration
<uh> / <"ah>   Filled Pause (Hesitation)
<uhm> / <"ahm>   Filled Pause (Hesitation)
<hm>   Filled Pause (Hesitation)
<hes> / <h"as>   Filled Pause (Hesitation)
<%>   Unidentifiable Sound Production
<Smack> / <Schmatzen>   Nonverbal Articulatory Sound (sound: smacking)
<Swallow> / <Schlucken>   Nonverbal Articulatory Sound (sound: swallowing)
<Throat> / <R"auspern>   Nonverbal Articulatory Sound (sound: clearing one's throat)
<Cough> / <Husten>   Nonverbal Articulatory Sound (sound: cough)
<Laugh> / <Lachen>   Nonverbal Articulatory Sound (sound: laughing)
<Noise> / <Ger"ausch>   Nonverbal Articulatory Sound (other sounds)
<#Click> / <#Klicken>   Technical Noise
<#Ring> / <#Klingeln>   Technical Noise
<#Knock> / <#Klopfen>   Technical Noise
<#Mtouch> / <#Mikrobe>   Technical Noise
<#Mwind> / <#Mikrowind>   Technical Noise
<#Rustle> / <#Rascheln>   Technical Noise
<#Squeak> / <#Quietschen>   Technical Noise
<#>   Technical Noise
<P>   Pause during Speech
@n..   Active Interference by a Speaker
..n@   Passively Interfered Speaker
<@n..   Active Interference by Acoustic Events
..n@>   Passive Interference of Acoustic Events
<:<..> ..   Beginning of Noise Interference
..:>   End of Noise Interference
<;..>   Local Comment
!KEY!..   Code Word
<PP>   Scenario Caused Pause
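Because the conventions form a closed system meant for parsing, filtering and converting, a transcribed turn can be reduced mechanically to its fluent word sequence. The sketch below illustrates this for a small subset of the markers listed above. The function name `plain_words` is hypothetical, and the choice to drop repetitions, false starts and broken-off words entirely (rather than keep them) is an assumption made for this example, not a rule taken from the slides.

```python
import re

REPETITION = re.compile(r"\+/.*?/\+")   # +/ ... /+  repetition/correction
FALSE_START = re.compile(r"-/.*?/-")    # -/ ... /-  false start
TURN_BREAK = re.compile(r"<\*T>t?")     # <*T> and <*T>t  technical interruption
TAG = re.compile(r"<[^<>]*>")           # <B>, <hes>, <Laugh>, <P>, ...
PUNCTUATION = {".", ",", "?"}


def plain_words(text: str) -> str:
    """Return the words of one transcribed turn without markup.

    Only a subset of the conventions is handled (assumption for this sketch).
    """
    for pattern in (REPETITION, FALSE_START, TURN_BREAK, TAG):
        text = pattern.sub(" ", text)
    words = []
    for token in text.split():
        # skip punctuation tokens and articulated break-offs such as "t="
        if token in PUNCTUATION or token.endswith("="):
            continue
        token = token.rstrip("%")  # keep poorly intelligible words, drop the mark
        if token:
            words.append(token)
    return " ".join(words)
```

Applied to the turn `yeah , I mean , +/we d=/+ we don't know a lot <B> about anything .`, this keeps only the words of the corrected utterance. Markers such as `~`, `#`, `$` or the noise-interference brackets would need extra rules.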
Transcription Tools
Why another tool? Different requirements than before:
- Windows instead of Linux
- Meetings: multiparty transcription
- Transcribers from different backgrounds
At that time (over three years ago) there was no adequate transcription tool.
• We did a study of what the basic requirements would be.
• We asked transcribers what they would find convenient.
• We programmed a beta tool accordingly.
• We are still using this tool (and in the meantime so do several other sites).
• We call it TransEdit.
TransEdit: transcription tool just for transcribers
• MFC program
• Windows text editor
• clickable buttons for transcription elements
• automatic turn naming and counting
• label editor
• parallel display of multiple audio signals
• easy turn segmentation
• lots of listen functions
• easy handling, no research functions
• “home work” but available for universities
• (write to: sburger@cs.cmu.edu)
; CDR: 00.00
; TRV: 00.00
; File: e025at
; Last changes made on 09/29/2000
; Transcriber: VLM
; Comments:
;
e025_1_0000_ITL_00: hello ? <P> can you hear me now ?
e025_2_0001_XYZABC_00: hello .
e025_1_0002_ITL_00: hello% . yeah% .
e025_2_0003_XYZABC_00: <uh> yes , I can .
e025_1_0004_ITL_00: yes , okay . <P> so ?
e025_2_0005_XYZABC_00: -/hi I would like/- <P> yes ?
e025_1_0006_ITL_00: yes , can you hear me now ?
e025_2_0007_XYZABC_00: <uh> yes , I can .
e025_1_0008_ITL_00: okay . <B> wonderful . <Laugh> <B> <P> <Smack> <B> so , can I help you ? <B>
e025_2_0009_XYZABC_00: -/all right I would like/- <uh> yes , madam . I would like to schedule a winter vacation <P> in the north of Italy .
e025_1_0010_ITL_00: <hm> <B>
e025_1_0011_ITL_00: yes . <B> would you like t= <*T>t
e025_1_0012_ITL_00: yes . would you like to come here% in summer or during winter ?
e025_2_0013_XYZABC_00: <uh> in winter please .
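The file above follows the "one line per turn" layout, with `;` comment lines as a header and a turn-ID in front of each turn. A minimal sketch of reading such a file is shown below. The slides do not define the parts of a turn-ID, so the field names used here (`dialogue`, `channel`, `counter`, `speaker`, `suffix`) are guesses based on the examples, and `parse_trl` is a hypothetical helper, not one of the project's tools.

```python
import re
from dataclasses import dataclass

# Pattern inferred from IDs such as e025_1_0000_ITL_00 (assumption)
TURN_RE = re.compile(r"^([a-z]\d+)_(\d+)_(\d{4})_([A-Z]+)_(\d{2}):\s*(.*)$")


@dataclass
class Turn:
    dialogue: str   # e.g. "e025"
    channel: int    # e.g. 1
    counter: int    # running turn number, e.g. 0
    speaker: str    # speaker code, e.g. "ITL"
    suffix: str     # trailing two digits, meaning not given in the slides
    text: str       # transcription with markup


def parse_trl(lines):
    """Return (header, turns) for the lines of one TRL file."""
    header, turns = {}, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";"):
            # header comment; keep only "key: value" style entries
            key, sep, value = line[1:].partition(":")
            if sep:
                header[key.strip()] = value.strip()
            continue
        match = TURN_RE.match(line)
        if match is None:
            raise ValueError(f"not a turn line: {line!r}")
        dialogue, channel, counter, speaker, suffix, text = match.groups()
        turns.append(Turn(dialogue, int(channel), int(counter), speaker, suffix, text))
    return header, turns


SAMPLE = [
    "; File: e025at",
    "; Last changes made on 09/29/2000",
    "; Transcriber: VLM",
    "e025_1_0000_ITL_00: hello ? <P> can you hear me now ?",
    "e025_2_0001_XYZABC_00: hello .",
]
header, turns = parse_trl(SAMPLE)
```

Since time stamps are kept in a separate file linked by turn-ID, the parsed ID would be the natural key for joining the two files.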
Data transcription process
• first pass transcription (but not rough ..)
• close check and correction by another transcriber
• marker file and trl file cross-check
• spell-checking
• automatic convention check
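The slides do not describe what the automatic convention check verifies. As an illustration only, the sketch below checks three things that can be derived from the conventions: turn-IDs are well formed, turn counters run consecutively, and paired markers (`+/ ... /+`, `-/ ... /-`, angle brackets) are balanced within a turn. The function `check_conventions` and the ID pattern are assumptions made for this example.

```python
import re

# Turn-ID pattern inferred from examples such as e025_1_0000_ITL_00 (assumption)
TURN_RE = re.compile(r"^[a-z]\d+_\d+_(\d{4})_[A-Z]+_\d{2}:\s*(.*)$")
PAIRED = (("+/", "/+", "repetition"), ("-/", "/-", "false start"))


def check_conventions(lines):
    """Return a list of (line number, message) for suspicious lines."""
    problems = []
    expected = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank lines and header comments are not checked
        match = TURN_RE.match(line)
        if match is None:
            problems.append((lineno, "malformed turn ID"))
            continue
        counter, text = int(match.group(1)), match.group(2)
        if expected is not None and counter != expected:
            problems.append((lineno, f"turn counter {counter}, expected {expected}"))
        expected = counter + 1
        for opener, closer, name in PAIRED:
            if text.count(opener) != text.count(closer):
                problems.append((lineno, f"unbalanced {name} markers"))
        if text.count("<") != text.count(">"):
            problems.append((lineno, "unbalanced angle brackets"))
    return problems
```

A check like this only flags candidates for a human to review; it cannot tell whether a marker was the right one to use.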
Medical scenarios development
Doctors; Analysis of medical databases; Definition of some scripts; Pre-tests; Scenarios; Data collection