Multilingual HLT in Europe and the development of ASR

Multilingual HLT in Europe and the development of ASR. Louis C.W. Pols, Institute of Phonetic Sciences, University of Amsterdam, The Netherlands. Keynote at PRASA2001, Franschhoek, South Africa, 30 Nov. 2001.

Presentation Transcript


  1. Multilingual HLT in Europe and the development of ASR • Louis C.W. Pols, Institute of Phonetic Sciences, University of Amsterdam, The Netherlands • PRASA2001 – Franschhoek, South Africa, 30 Nov. 2001, keynote

  2. Some history • Liesbeth Botha spent half a year at our institute during the second half of 1996 • ever since, the possible organization of a workshop or a major conference in South Africa has been considered • (cancelled) AST Workshop on ‘Human Language Technologies for E-Governance in a Multilingual Society’, Stellenbosch • PRASA2001 – Franschhoek, 29-30 Nov., incl. Speech Processing and AST project • I always wanted to visit South Africa! PRASA2001 - Franschhoek

  3. Overview • Multilingual Europe (vs. Multilingual South Africa) • EU Framework Programs; Human Language Technology (HLT) • Other (European) programs and organizations • ISCA • Dutch speech database initiatives (vs. AST) • Speech science and technology; ASR development • Academia (knowledge) and industry (applications) • Conclusions

  4. Multilingual Europe • Europe (West, Central, East): EU countries; candidate EU countries; Schengen countries (no internal border control); Euro countries (300 M people) • many nations and even more languages • multilingual community and (open) market • e-commerce, telebanking, infokiosk, etc.

  5. EU Framework Program FP5 • Human Language Technologies RTD (HLT), http://www.hltcentral.org/ • part of Information Society Technologies (IST), Key Action III (Multimedia Contents and Tools) • part of the fifth Framework Program ’98-’02 (FP5) • IST 3600 M€ (26.5% of FP5); HLT 125 M€ • HLT: Multilingual communication; Natural Interactivity; Cross-lingual information management; Support & Accompanying Measures

  6. 6th Framework Program • FP6 (’02-’06): the way forward • proposal published Feb. 2001 • one of 7 priority themes: Information Society Technologies • also networks of excellence • IST budget 3600 M€

  7. Complaints from academia • too application- and user-oriented • little room for research (the Commission's reaction: it is time for HLT to show its usefulness!), but ... the pendulum swings! • speech data not freely available (only with delay and at (high) cost via ELRA) • still: several very interesting projects • we participated before (SAM, EuroCocosda, somewhat in SpeechDat) but barely do anymore; (KPN Research and) Nijmegen University still do

  8. Some HLT ‘speech’ projects • C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages (1/01, 36 mo) • CORETEX: Improving Core Speech Recognition Technology (4/00, 36 mo) • I-EYE: Interacting with Eyes: Gaze Assisted Access to Information in Multiple Languages (1/00, 30 mo) • NESPOLE!: NEgotiating through SPOken Language in E-commerce (1/00, 30 mo) • SIRIDUS: Specification, Interaction and Reconfiguration In Dialogue Understanding Systems (1/00, 36 mo) • SMADA: Speech Driven Multimodal Automatic Directory Assistance (1/00, 36 mo) (finalizing ITRW ‘Advanced ASR for Telecom Applications’, Nov. 2002, Avignon) • SPEECON: Speech Driven Interfaces for Consumer Applications (2/00, 24 mo)

  9. Some ‘past’ HLT projects • ARISE: Automatic Railway Information Systems for Europe (10/96, 24 mo) • CAVE: Caller Verification in Bank and Telecommunication (11/95, 24 mo) • EAGLES: Expert Advisory Group on Language Engineering Standards (11/97, 24 mo) • ELRA: European Language Resources Association (9/95, 50 mo) • ELSE: Evaluation in Language and Speech Engineering (1/98, 16 mo) • SPEECHDAT: Speech Databases for Creation of Voice Driven Teleservices (3/96, 34 mo) • SPEECHDAT-CAR (3/98, 30 mo) + variants • VODIS: Advanced Speech Technologies for Voice-operated Driver Information Systems (11/95, 43 mo)

  10. Some HLT ‘support’ projects • CLASS: Collaboration in Language and Speech Science and technology (Int. WS on ‘Information Presentation and Natural Multimodal Dialogue’, Verona, Italy, Dec. 14-15, 2001) • ELSNET-HLT: The European Network of Excellence in Human Language Technologies • HOPE: HLT Opportunity Promotion in Europe, Euromap • ISLE-HLT: Int. Standards for Language Engineering (Eagles follow-up), incl. I/O Meta Data Initiative (IMDI), see also COREX

  11. eContent • eContent is part of the eEurope initiative • European Digital Content on the Global Networks, ’01-’05, 100 M€, 1st call 3/2001 • Action Line 2 (AL2) addresses the intersection of the content and language industries, more specifically the design, production and distribution of high-quality European digital content for the global networks in an increasingly multilingual and multicultural socio-economic environment • http://www.hltcentral.org/econtent/

  12. MLIS • Multilingual Information Society Program • Supporting the creation of a framework of services for European language resources • Encouraging the use of language technologies, resources and standards • Promoting the use of advanced language tools in the Community and Member States public sector • one call in June ’99, 15 M€, some 30 projects • e.g. NL-TRANSLEX: Machine Translation for Dutch and English/French/German

  13. INTAS • International Association for the promotion of co-operation with scientists from the New Independent States of the former Soviet Union (NIS) • established June 1993 • Open + Thematic Call 2000 (budget 16 M€) • max. budget 150 k€/project (max. 30 k€/NIS partner) • INTAS 915 ‘Spontaneous Speech of Typologically Unrelated Languages (Russian, Finnish and Dutch): Comparison of Phonetic Properties’ (90 k€, 7/01, 36 mo)

  14. Euromap • HLT Opportunity Promotion in Europe (HOPE) (2/00, 24 mo, 8 national focus points): to raise awareness of the benefits of human language technologies (HLT) with companies, organizations and users; to accelerate technology transfer from the research base to the market; to stimulate community building in specific domains (tourism and e-commerce) • General: http://www.hltcentral.org/euromap/ • Dutch site: http://www.taalunieversum.org/tst/en/

  15. European Language Resources Association • A non-profit organization to promote the creation, verification, and distribution of language resources. • US counterpart: LDC • 173 resources sold in 2000. • organizer of LREC conferences (third one in May 2002 in Las Palmas, Spain) • speech & related resources ~200 • written resources ~145 • terminological resources • tools and software • http://www.icp.grenet.fr/ELRA/home.html

  16. ELSNET • European Network of Excellence in Human Language Technologies • one of the ~20 networks within FP5 • Transfer of knowledge and expertise; Shared goals; Evaluation; Shared language resources; Promotion of best practice; Interoperability by means of standardization • yearly Elsnet Summer Schools: July 15-26, 2002 Odense, Denmark, ‘Evaluation and Assessment of Text and Speech Systems’ • Newsletter Elsnews; http://www.elsnet.org

  17. COCOSDA (slide annotation: Justus Roux) • International organization for coordinating the globalized efforts in spoken language resources and speech technology evaluation • yearly, jointly with Eurospeech and ICSLP, since Chiavari, Italy, Sept. ’91 (Eurospeech ’91) and before; Oriental Cocosda • topic domains: Evaluation of Speech Understanding and Dialogue Systems (W. Minker); Multi-modal corpora (S. Nakamura); Corpus Annotation Tools (S. Bird); Local Languages (D. Gibbon) • regional programs (Europe; Asia; Oceania; Africa; Latin America) • data center representatives (LDC, S. Bird; ELRA, K. Choukri) • http://www.itl.atr.co.jp/cocosda

  18. COCOSDA matrix

  19. COST • European Cooperation in the field of Scientific and Technical Research (~60 k€ per action, for additional costs only): • COST 249: Continuous Speech Recognition over the Telephone (19 countries; start 5/94; 6 yrs; final report) • COST 250: Speaker Recognition in Telephony • COST 258: The Naturalness of Synthetic Speech • COST 277: Nonlinear Speech Processing • COST 278: Spoken Language Interaction in Telecommunication • http://cost.cordis.lu/src/home.cfm

  20. EURESCOM • the European Institute for Research and Strategic Studies in Telecommunications • 20 shareholders from 19 European countries (major European network operators and service providers) • e.g. MUST - MUltimodal, multilingual information Services with small mobile Terminals (P1104)

  21. ISCA • European Speech Communication Association (ESCA) founded in ’88 • from ESCA to ISCA at Eurospeech ’99 in Budapest • membership organization • organizer of Eurospeech/ICSLP - Interspeech • organizer of specialized workshops (ITRWs) • Special interest groups (SIGs) • Speech Communication Journal (http://www.elsevier.com/locate/specom) • http://www.isca-speech.org/

  22. Eurospeech-ICSLP-Interspeech • odd years: Eurospeech (in Europe); even years: ICSLP (elsewhere) • 1: Paris ’89, Kobe ’90 • 2: Genoa ’91, Banff ’92 • 3: Berlin ’93, Yokohama ’94 • 4: Madrid ’95, Philadelphia ’96 • 5: Rhodes ’97, Sydney ’98 • 6: Budapest ’99, Beijing ’00 • 7: Aalborg ’01, Denver ’02 • 8: Geneva ’03, Seoul ’04 • 9: Lisbon ’05, ?? ’06 • (the slide marks entries as past or future)

  23. ISCA SIGs • Speech Synthesis - SynSig • Audio Visual Speech - AVISA • Speech And Language Technology for MInority Languages - SALTMIL • Integration of Speech Technology in (Language) Learning - InSTIL • SPeaker and Language Characterization - SPLC • Education in the Field of Speech Communication - EduSIG • Speech Prosody - SProSIG • Dialogue Processing - SigDial (also within ACL) • Groupe Francophone de la Communication Parlée - GFCP

  24. ISCA ITRWs (forthcoming) • Prosody in Speech Recognition and Understanding - Prosody 2001, Molly Pitcher Inn, Red Bank, NJ, October 22-24, 2001 • TIPS - Temporal Integration in the Perception of Speech, Aix-en-Provence, France, 8-10 April 2002 • Multi-Modal Dialogue in Mobile Environments, Kloster Irsee, Germany, June 17-21, 2002 • Advanced ASR for Telecom Applications, Palais des Papes, Avignon, France, November 27-29, 2002 • Supported but not organized by ISCA: • 2001 International Workshop on Automatic Speech Recognition and Understanding, Madonna di Campiglio (Trento), Italy, December 9-13, 2001 • Speech Prosody 2002, Aix-en-Provence, France, 11-13 April 2002

  25. IEEE • IEEE Signal Processing Society: MMSP’01, Workshop on Multimedia Signal Processing, Cannes, France, October 3-5, 2001; ASRU’01, Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio (Trento), Italy, December 9-13, 2001; 2002 International Workshop on Multimedia Signal Processing, US Virgin Islands, December 9-11, 2002 • IEEE Trans. on Signal Processing / Speech and Audio Processing / Multimedia / Neural Networks • http://www.ieee.org/

  26. DARPA NIST • DARPA Projects and Yearly evaluations • CSR (Continuous Speech Recognition); • LVCSR (Large Vocabulary Conversational Speech Recognition); • ATIS (Air Travel Information System); • Language Recognition (Identification and Verification); • Speaker Recognition (Identification and Verification)

  27. NATO-ASI • ASI = Advanced Study Institute • many different domains • certain restrictions on NATO vs. non-NATO participants, free registration, some funding • Dynamics of Speech Production and Perception, Il Ciocco, Italy, June 23 – July 6, 2002 • send application before Jan. 15, 2002 to asi2001@ebire.org • Organizing Committee: Pierre L. Divenyi & Klára Vicsi

  28. European national programs • German Verbmobil; SmartKom (since 9/99); Bavarian Archive for Speech Signals (BAS) • Spoken Dutch Corpus • French AUP • Swedish Centre for Speech Technology (CTT); Swedish National Graduate School in Language Technology (GSLT)

  29. Dutch speech database initiatives • Speech Processing Expertise Center SPEX • 5,000 speakers Polyphone • 1,000 speakers SpeechDat + variants • NWO Priority program TST-OVIS (public transportation information system over telephone) • 1,000 hrs CGN (Dutch-Flemish) • 5.5 hrs ‘open source’ IFA-corpus • TST Platform • ToDI (Transcription of Dutch Intonation)

  30. Spoken Dutch Corpus • 4.6 M€, 5 yrs, 10 M words, ~ 1000 hrs of speech • Corpus design and compilation • Recording and digitization • Orthographic transcription (all) • Lemmatization and POS tagging (all) • Lexicon link-up (all) • Broad phonetic transcription (1 M) • Word segmentation (1 M) • Syntactic annotation (1 M) • Prosodic annotation (250 k) • Development of exploitation software COREX • http://lands.let.kun.nl/cgn/home.htm
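
The headline figures on the Spoken Dutch Corpus slide imply a few ratios that are easy to work out. The sketch below takes the approximate slide values (10 M words, ~1000 hrs, 4.6 M€) at face value. The variable names are illustrative only, and since the budget covers the whole project, the per-word figure is a rough overall cost rather than a transcription price.

```python
# Approximate figures as given on the slide.
total_words = 10_000_000
total_hours = 1_000
budget_eur = 4_600_000

# Annotation layers that cover only part of the corpus (word counts from the slide).
partial_layers = {
    "broad phonetic transcription": 1_000_000,
    "word segmentation": 1_000_000,
    "syntactic annotation": 1_000_000,
    "prosodic annotation": 250_000,
}

words_per_hour = total_words / total_hours
words_per_minute = words_per_hour / 60
eur_per_word = budget_eur / total_words
coverage = {name: n / total_words for name, n in partial_layers.items()}

print(f"{words_per_hour:.0f} words/hour, {words_per_minute:.0f} words/minute")
print(f"{eur_per_word:.2f} EUR per word overall")
for name, share in coverage.items():
    print(f"{name}: {share:.1%} of the corpus")
```

On these numbers the richer annotation layers cover at most a tenth of the corpus, while orthographic transcription, POS tagging and the lexicon link-up cover all of it.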

  31. IFA corpus • 5.5 hrs of high-quality-recorded speech • 4 male and 4 female speakers • more than 30 min. per speaker • various speaking styles per speaker from conversational and read speech, to isolated sentences, words and syllables • everything phonemically segmented & labeled • free access via SQL query language • http://www.fon.hum.uva.nl/IFAcorpus
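
To give a feel for what SQL access to a segmented and labeled corpus makes possible, here is a minimal sketch using Python's built-in sqlite3 module. The table name `segments`, its columns, and the four rows are entirely hypothetical placeholders, not the real IFA corpus schema or data; only the idea of querying phoneme segments by speaking style comes from the slide.

```python
import sqlite3

# Hypothetical schema: one row per labeled phoneme segment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments ("
    "speaker TEXT, style TEXT, phoneme TEXT, start_s REAL, end_s REAL)"
)

# Made-up placeholder rows, only there so the query has something to return.
rows = [
    ("F1", "read", "a", 0.10, 0.20),
    ("F1", "read", "a", 0.50, 0.62),
    ("F1", "conversational", "a", 1.00, 1.07),
    ("M1", "conversational", "a", 2.00, 2.09),
]
conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)", rows)

# Count and mean duration of one phoneme, grouped by speaking style.
query = """
    SELECT style, COUNT(*) AS n, AVG(end_s - start_s) AS mean_dur_s
    FROM segments
    WHERE phoneme = ?
    GROUP BY style
    ORDER BY style
"""
result = conn.execute(query, ("a",)).fetchall()
for style, n, mean_dur_s in result:
    print(f"{style}: n={n}, mean duration={mean_dur_s * 1000:.0f} ms")
```

A query of this shape, comparing segment durations across speaking styles, is the kind of question a fully segmented multi-style corpus is meant to answer.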

  32. Speech science and speech technology • we should try to bridge that gap • see my keynotes at ICPhS ’99 and Eurospeech ’01: “Flexible, robust and efficient human speech processing versus present-day speech technology” and “Acquiring and implementing phonetic knowledge” • we have to understand each other in order to be able to communicate and to contribute • probabilistic vs. knowledge driven • adding (multiple) knowledge (sources) to improve performance • much knowledge in speech databases

  33. Phonetics ⇔ Speech Technology

  34. Do recognizers need intelligent ears? • intelligent ears ⇔ front-end pre-processor • only if it improves performance • humans are generally better speech processors than machines; perhaps system developers can learn from human behavior • robustness at stake (noise, reverberation, incompleteness, restoration, competing speakers, variable speaking rate, context, dialects, non-nativeness, style, emotion)

  35. What is (phonetic) knowledge? • phonetic textbook knowledge • probabilistic knowledge from databases • fixed set of features vs. adaptable set • trading relations, selectivity • knowledge of the world, expectation • global vs. detailed

  36. How good is human/machine speech recognition?

  37. Human vs. machine (ASR) • machine surprisingly good for certain tasks • machine could be better for many others • robustness, outliers • what are the limits of human performance? • in noise • for degraded speech • missing information (trading)

  38. Human word intelligibility vs. noise (figure) • figure annotations: humans start to have some trouble; recognizers do have trouble!

  39. Robustness to degraded speech • speech = time-modulated signal in frequency bands • relatively insensitive to (spectral) distortions • prerequisite for digital hearing aid • modulating spectral slope: -5 to +5 dB/oct, 0.25-2 Hz • temporal smearing of envelope modulation • ca. 4 Hz max. in modulation spectrum ⇒ syllable • LP>4 Hz and HP<8 Hz little effect on intelligibility • spectral envelope smearing • for BW>1/3 oct masked SRT starts to degrade
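
The remark about a maximum near 4 Hz in the modulation spectrum can be illustrated with a toy calculation. The sketch below is not a speech analysis: it builds a synthetic band envelope modulated at 4 Hz (roughly a syllable rate) and evaluates its Fourier magnitude at a few modulation frequencies. The function name, the 100 Hz envelope frame rate and the test frequencies are assumptions made for illustration.

```python
import cmath
import math


def modulation_spectrum(envelope, frame_rate_hz, freqs_hz):
    """Magnitude of the mean-removed envelope's DFT at the given modulation frequencies."""
    n = len(envelope)
    mean = sum(envelope) / n
    spectrum = {}
    for f in freqs_hz:
        acc = sum(
            (e - mean) * cmath.exp(-2j * math.pi * f * k / frame_rate_hz)
            for k, e in enumerate(envelope)
        )
        # Scale so that a pure sinusoid of amplitude A yields A.
        spectrum[f] = abs(acc) * 2 / n
    return spectrum


# Synthetic envelope: 2 s sampled at 100 frames/s, modulated at 4 Hz with depth 0.5.
frame_rate_hz = 100
envelope = [
    1.0 + 0.5 * math.sin(2 * math.pi * 4 * k / frame_rate_hz) for k in range(200)
]

spec = modulation_spectrum(envelope, frame_rate_hz, [1, 2, 4, 8, 16])
peak = max(spec, key=spec.get)
print(f"peak modulation frequency: {peak} Hz")
```

With a real speech envelope the spectrum would be broad rather than a single line, but the same computation per frequency band is what a modulation-spectrum plot summarizes.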

  40. Robustness to degraded speech and missing information • partly reversed speech (Saberi & Perrott, Nature, 4/99) • fixed duration segments time reversed or shifted in time: perfect sentence intelligibility up to 50 ms (demo: every 50 ms reversed; original) • low frequency modulation envelope (3-8 Hz) vs. acoustic spectrum • syllable as information unit? (S. Greenberg) • gap and click restoration (Warren) • gating experiments
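
The local time-reversal manipulation mentioned above is simple to state in code: cut the signal into consecutive fixed-duration segments and play each one backwards, keeping the segments in their original order. The sketch below shows only that operation on a plain list of samples; the function name and parameters are illustrative, and it does not reproduce the stimuli of the original experiment.

```python
def reverse_segments(samples, sample_rate_hz, segment_ms):
    """Time-reverse each consecutive fixed-duration segment; segment order is kept."""
    seg_len = max(1, round(sample_rate_hz * segment_ms / 1000))
    out = []
    for start in range(0, len(samples), seg_len):
        # A shorter final segment is reversed as-is.
        out.extend(reversed(samples[start:start + seg_len]))
    return out


# Tiny demonstration: at 1000 Hz, a 4 ms segment is 4 samples long.
demo = reverse_segments(list(range(10)), sample_rate_hz=1000, segment_ms=4)
print(demo)  # [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
```

For speech sampled at 16 kHz, the 50 ms segments quoted on the slide would correspond to `segment_ms=50`, i.e. 800 samples per segment. Applying the function twice with the same settings restores the original signal.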

  41. Desired pre-processor characteristics in ASR • basic sensitivity for stationary and dynamic sounds • robustness to degraded speech • rather insensitive to spectral and temporal smearing • robustness to noise and reverberation • filter characteristics • are BP, PLP, MFCC, RASTA, TRAPS good enough? • lateral inhibition (spectral sharpening); dynamics • what can be neglected? • non-linearities, limited dynamic range, active elements, co-modulation, secondary pitch, etc.

  42. Caricature of present-day speech recognizers • fixed pre-processor, fixed features • trained with a variety of speech input • much global information, but ... no interrelations • monaural, uni-modal input • pitch extractor generally not operational • performs well on average behavior • but ... does poorly on any type of outlier (OOV, non-native, fast or whispered speech, other communication channel, new topic, new speaker) • neglects lots of useful (phonetic) information • heavily relies on language model

  43. Useful information: durational variability (figure, adopted from Wang, 1998) • durations in ms: overall average = 95; normal rate = 95; primary stress = 104; word final = 136; utterance final = 186

  44. Academia (knowledge) and industry (applications) • what do industry and universities expect from each other? (panel discussion at Eurospeech ’01) • proper education and training ⇒ E-masters • good exchange between academia & industry • participation in joint projects ⇒ speech DB • adapt to requirements ⇒ CAIP Symposium • open source approach ⇒ Linux, praat, HTK • complaints: sometimes bad management and high risk (puts HLT in bad spotlight, e.g. L&H)

  45. Information Technology for Homeland Security • Center for Advanced Information Processing, CAIP Symposium, Rutgers Univ., Nov. 29 • “subsequent to events of Sept. 11, CAIP modified its traditional Annual Research Review” • “Symposium identifies issues in Homeland Security and encourages research, particularly with university-industry cooperation” • e.g., biometric and voice identification; fusing voice and face data; multimodal interfaces for asset deployment; face-tracking for identification; microphone array for speaker tracking

  46. E-masters in Language and Speech • Course content: • Theoretical Linguistics • Natural Language Processing • Phonetics and Phonology • Cognitive models for speech language processing • Speech signal processing • Pattern recognition • Language engineering applications • http://www.cstr.ed.ac.uk/euromasters/

  47. Conclusions • collecting speech corpora in national languages (as in SA) is an excellent basis, both for research and for applications • combine industrial and academic skills • make proper use of experiences elsewhere • that’s why we are all here at this workshop! • good luck and thank you for your attention
