RECURRENT NEURAL NETWORK LANGUAGE MODELING FOR CODE-SWITCHING CONVERSATIONAL SPEECH
Heike Adel, Ngoc Thang Vu, Franziska Kraus, Tim Schlippe, Haizhou Li, Tanja Schultz
heike.adel@student.kit.edu, thang.vu@kit.edu
ICASSP 2013 – The 38th International Conference on Acoustics, Speech, and Signal Processing

1. Overview
• Goal:
• Prediction of code-switches based on textual features (words and POS tags)
• Extended structure of recurrent neural networks for Code-Switching
• => 10.8 % (2 %) relative improvement in terms of perplexity (WER) on the SEAME development set and 16.9 % (2.7 %) relative on the evaluation set
• What is Code-Switching (CS)?
• Code-Switching speech is defined as speech that contains more than one language. It is a common phenomenon in multilingual communities.

2.1 The SEAME Corpus [D.C. Lyu et al., 2011]
• SEAME = South East Asia Mandarin-English
• Conversational Mandarin-English Code-Switching speech corpus
• Temporarily provided as part of a joint research project by NTU and KIT
• About 63 hours of audio data and their transcriptions
• Four language categories: English, Mandarin, particles (Singaporean and Malaysian discourse particles) and others (other languages)
• Average number of code-switches per utterance: 2.6; very short monolingual segments
• => a challenging bilingual task

2.2 Code-Switching Analyses of the Corpus
• Prediction of code-switches from trigger words and trigger POS tags
• [Figures: Mandarin trigger words, Mandarin trigger POS, English trigger words, English trigger POS]
• POS tagging for CS speech ("matrix language" = Mandarin, "embedded language" = English): the CS text is split into English language islands (> 2 embedded words), which are tagged with an English POS tagger, and the remaining text, which is tagged with a Mandarin POS tagger; both outputs are merged in a post-processing step
• [Figure: effort (# rules) and quality using cross-lingual rules]

3. Recurrent Neural Network Language Model (RNNLM) for Code-Switching
• Input: word vector w(t) and feature vector f(t) containing POS tags
• Hidden layer: vector s(t) containing the state of the network
• Output: vector c(t) with the probabilities for each language and vector y(t) with the probabilities for each word given its language
• U1, U2, V, W: weights of the connections between the layers
• Training with back-propagation through time (BPTT)
• Computation of the probabilities: the hidden state is derived from the current word, its features and the previous state; the language and word probabilities are derived from the hidden state
• Relation to the CS task: words and features are used to determine not only the next word but also the next language (OF: output factorization, FI: feature integration); a minimal sketch of this forward step is given after Section 4

4. Experiments and Results
• Perplexity evaluation and rescoring experiments
• Rescoring of 100-best lists of our CS-ASR system [Vu, 2012] with different settings for the language model weight (lz) and the word insertion penalty (lp); see the rescoring sketch below
• RNNLM and the 3-gram LM of the ASR system are weighted equally (λ = 0.5)
• Performance measure: Mixed Error Rate (MER), i.e. word error rates for English segments and character error rates for Mandarin segments; see the MER sketch below
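Illustrative sketch for Section 3: the NumPy code below is a minimal, non-authoritative rendering of one forward step of the factorized CS-RNNLM described above, in which the word vector w(t) and the POS feature vector f(t) feed the hidden state s(t), which in turn yields language probabilities c(t) and word probabilities y(t). The class name FactorizedRNNLM, the split of V into Vc and Vy, the sigmoid/softmax activations and the vocab_by_language mapping are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class FactorizedRNNLM:
    """Sketch of one forward step of a CS-RNNLM with feature integration (FI)
    and output factorization (OF). Names and activations are assumptions."""

    def __init__(self, vocab_size, feat_size, hidden_size, n_languages, seed=0):
        rng = np.random.default_rng(seed)
        self.U1 = rng.uniform(-0.1, 0.1, (hidden_size, vocab_size))   # word w(t) -> hidden
        self.U2 = rng.uniform(-0.1, 0.1, (hidden_size, feat_size))    # POS features f(t) -> hidden
        self.W  = rng.uniform(-0.1, 0.1, (hidden_size, hidden_size))  # recurrent s(t-1) -> hidden
        self.Vc = rng.uniform(-0.1, 0.1, (n_languages, hidden_size))  # hidden -> language c(t)
        self.Vy = rng.uniform(-0.1, 0.1, (vocab_size, hidden_size))   # hidden -> word y(t)
        self.s  = np.zeros(hidden_size)

    def step(self, w, f, vocab_by_language):
        """w: one-hot word vector, f: POS feature vector,
        vocab_by_language: language id -> array of word indices."""
        # s(t) from the current word, its features and the previous state
        self.s = 1.0 / (1.0 + np.exp(-(self.U1 @ w + self.U2 @ f + self.W @ self.s)))
        c = softmax(self.Vc @ self.s)              # P(next language | history)
        lang = int(np.argmax(c))
        idx = vocab_by_language[lang]              # words belonging to that language
        y = np.zeros(len(w))
        y[idx] = softmax((self.Vy @ self.s)[idx])  # P(next word | language, history)
        return c, y
```

In this reading, the probability of the next word factorizes as c[lang] * y[word], i.e. into a language part and a word-given-language part; BPTT training is omitted from the sketch.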
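Illustrative sketch for the rescoring in Section 4: the poster states that 100-best lists are rescored with varying language model weight (lz) and word insertion penalty (lp), and that the RNNLM and the 3-gram LM are weighted equally (λ = 0.5). The exact combination formula is not given on the poster; the sketch assumes the common setup of linear per-word interpolation of the two LM probabilities, and the Hypothesis fields and the rescore function are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Hypothesis:
    words: List[str]       # hypothesis word sequence
    acoustic: float        # acoustic log-score from the ASR decoder
    p_ngram: List[float]   # per-word probabilities under the 3-gram LM
    p_rnn: List[float]     # per-word probabilities under the CS-RNNLM

def rescore(nbest: List[Hypothesis], lz: float, lp: float, lam: float = 0.5) -> Hypothesis:
    """Re-rank an n-best list: linearly interpolate the two LMs per word with
    weight lam, scale the LM log-probability by lz, add a per-word penalty lp."""
    def total_score(h: Hypothesis) -> float:
        logp_lm = sum(math.log(lam * pr + (1.0 - lam) * pn)
                      for pr, pn in zip(h.p_rnn, h.p_ngram))
        return h.acoustic + lz * logp_lm + lp * len(h.words)
    return max(nbest, key=total_score)
```

A grid over (lz, lp), as described on the poster, would simply call rescore once per setting and keep the combination that minimizes the error rate on the development set.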
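Illustrative sketch for the Mixed Error Rate (MER): the poster defines MER as word error rates for English segments and character error rates for Mandarin segments. One common way to implement this is to keep English words as tokens and split Mandarin words into characters before computing a standard edit distance; the sketch below follows that reading, and the CJK range check is a crude, purely illustrative heuristic rather than the authors' procedure.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def to_mixed_tokens(tokens):
    """English words stay whole; Mandarin words are split into characters."""
    out = []
    for tok in tokens:
        if any('\u4e00' <= ch <= '\u9fff' for ch in tok):  # crude CJK check
            out.extend(tok)
        else:
            out.append(tok)
    return out

def mixed_error_rate(ref_tokens, hyp_tokens):
    ref = to_mixed_tokens(ref_tokens)
    hyp = to_mixed_tokens(hyp_tokens)
    return edit_distance(ref, hyp) / len(ref)
```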