Multilingual HLT in Europe and the development of ASR Louis C.W. Pols Institute of Phonetic Sciences University of Amsterdam The Netherlands PRASA2001 – Franschhoek, South Africa 30 Nov. 2001, keynote

Some history
• Liesbeth Botha spent half a year at our institute during the second half of 1996
• ever since, the possible organization of a workshop or a major conference in South Africa has been considered
• (cancelled) AST Workshop on ‘Human Language Technologies for E-Governance in a Multilingual Society’, Stellenbosch
• PRASA2001 – Franschhoek, 29-30 Nov., incl. Speech Processing and AST project
• I always wanted to visit South Africa!
PRASA2001 - Franschhoek

Overview
• Multilingual Europe (vs. Multilingual South Africa)
• EU Framework Programs; Human Language Technology (HLT)
• Other (European) programs and organizations
• ISCA
• Dutch speech database initiatives (vs. AST)
• Speech science and technology; ASR development
• Academia (knowledge) and industry (applications)
• Conclusions

Multilingual Europe
• Europe (West, Central, East)
  • EU countries
  • candidate EU countries
  • Schengen countries (no internal border controls)
  • Euro countries (300 M people)
• many nations and even more languages
• multilingual community and (open) market
• e-commerce, telebanking, infokiosk, etc.

EU Framework Program FP5
• Human Language Technologies RTD (HLT) http://www.hltcentral.org/
• part of Information Society Technologies (IST), Key Action III (Multimedia Contents and Tools)
• part of fifth Framework Program ’98-’02 (FP5)
• IST 3600 M€ (26.5% of FP5); HLT 125 M€
• HLT:
  • Multilingual communication
  • Natural Interactivity
  • Cross-lingual information management
  • Support & Accompanying Measures

6th Framework Program
• FP6 (’02-’06): the way forward
• proposal published Feb. 2001
• one of 7 priority themes: Information Society Technologies
• also networks of excellence
• IST budget 3600 M€

Complaints from academia
• too application and user oriented
• little room for research (reaction of the Commission: it is time for HLT to show its usefulness!), but .... the pendulum swings!
• speech data not freely available (only with delay and at (high) costs via ELRA)
• still: several very interesting projects
• we participated before (SAM, EuroCocosda, somewhat in SpeechDat) but barely anymore; (KPN Research and) Nijmegen University still do

Some HLT ‘speech’ projects
• C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages (1/01, 36 mo)
• CORETEX: Improving Core Speech Recognition Technology (4/00, 36 mo)
• I-EYE: Interacting with Eyes: Gaze Assisted Access to Information in Multiple Languages (1/00, 30 mo)
• NESPOLE!: NEgotiating through SPOken Language in E-commerce (1/00, 30 mo)
• SIRIDUS: Specification, Interaction and Reconfiguration In Dialogue Understanding Systems (1/00, 36 mo)
• SMADA: Speech Driven Multimodal Automatic Directory Assistance (1/00, 36 mo) (finalizing ITRW ‘Advanced ASR for Telecom Appl.’, Nov. 2002, Avignon)
• SPEECON: Speech Driven Interfaces for Consumer Applications (2/00, 24 mo)

Some ‘past’ HLT projects
• ARISE: Automatic Railway Systems for Europe (10/96, 24 mo)
• CAVE: Caller Verification in Bank and Telecommunication (11/95, 24 mo)
• EAGLES: Expert Advisory Group on Language Engineering Standards (11/97, 24 mo)
• ELRA: European Language Resources Association (9/95, 50 mo)
• ELSE: Evaluation in Language and Speech Engineering (1/98, 16 mo)
• SPEECHDAT: Speech Databases for Creation of Voice Driven Teleservices (3/96, 34 mo)
• SPEECHDAT-CAR (3/98, 30 mo) + variants
• VODIS: Advanced Speech Technologies for Voice-operated Driver Information Systems (11/95, 43 mo)

Some HLT ‘support’ projects
• CLASS: Collaboration in Language and Speech Science and technology (Int. WS on ‘Information Presentation and Natural Multimodal Dialogue’, Verona, Italy, Dec 14-15, 2001)
• ELSNET-HLT: The European Network of Excellence in Human Language Technologies
• HOPE: HLT Opportunity Promotion in Europe, Euromap
• ISLE-HLT: Int. Standards for Language Engineering (Eagles follow-up), incl. I/O Meta Data Initiative (IMDI), see also COREX

eContent
• eContent is part of the eEurope initiative
• European Digital Content on the Global Networks, ’01-’05, 100 M€, 1st call 3/2001
• Action Line 2 (AL2) addresses the intersection of the content and language industries, more specifically the design, production and distribution of high-quality European digital content for the global networks in an increasingly multilingual and multicultural socio-economic environment
• http://www.hltcentral.org/econtent/

MLIS
• Multilingual Information Society Program
• Supporting the creation of a framework of services for European language resources
• Encouraging the use of language technologies, resources and standards
• Promoting the use of advanced language tools in the Community and Member States public sector
• one call in June ’99, 15 M€, some 30 projects
• e.g. NL-TRANSLEX: Machine Translation for Dutch and English/French/German

INTAS
• International Association for the promotion of co-operation with scientists from the New Independent States of the former Soviet Union (NIS)
• established June 1993
• Open + Thematic Call 2000 (budget 16 M€)
• max budget 150 k€/project (max 30 k€/NIS partner)
• INTAS 915 ‘Spontaneous Speech of Typologically Unrelated Languages (Russian, Finnish and Dutch): Comparison of Phonetic Properties’ (90 k€, 7/01, 36 mo)

Euromap
• HLT Opportunity Promotion in Europe (HOPE) (2/00, 24 mo, 8 national focus points)
  • to raise awareness of the benefits of human language technologies (HLT) among companies, organizations and users
  • to accelerate technology transfer from the research base to the market
  • to stimulate community building in specific domains (tourism and e-commerce)
• General: http://www.hltcentral.org/euromap/
• Dutch site: http://www.taalunieversum.org/tst/en/

European Language Resources Association
• A non-profit organization to promote the creation, verification, and distribution of language resources
• US counterpart: LDC
• 173 resources sold in 2000
• organizer of LREC conferences (third one in May 2002 in Las Palmas, Spain)
• speech & related resources ~200
• written resources ~145
• terminological resources
• tools and software
• http://www.icp.grenet.fr/ELRA/home.html

ELSNET
• European Network of Excellence in Human Language Technologies
• one of the ~20 networks within FP5
• Transfer of knowledge and expertise; Shared goals; Evaluation; Shared language resources; Promotion of best practice; Interoperability by means of standardization
• yearly Elsnet Summer Schools: July 15-26, 2002, Odense, Denmark, ‘Evaluation and Assessment of Text and Speech Systems’
• Newsletter Elsnews; http://www.elsnet.org

COCOSDA (Justus Roux)
• International organization for coordinating the globalized efforts in spoken language resources and speech technology evaluation
• yearly meetings, jointly with Eurospeech and ICSLP, since Chiavari, Italy, Sept. ’91 (Eurospeech ’91) and before; Oriental Cocosda
• topic domains
  • Evaluation of Speech Understanding and Dialogue Systems (W. Minker)
  • Multi-modal corpora (S. Nakamura)
  • Corpus Annotation Tools (S. Bird)
  • Local Languages (D. Gibbon)
• regional programs (Europe; Asia; Oceania; Africa; Latin America)
• data center representatives (LDC, S. Bird; ELRA, K. Choukri)
• http://www.itl.atr.co.jp/cocosda

[Figure: COCOSDA matrix]

COST
• European Cooperation in the field of Scientific and Technical Research (~60 k€ per action, for additional costs only):
  • COST 249: Continuous Speech Recognition over the Telephone (19 countries; start 5/94; 6 yrs; final report)
  • COST 250: Speaker Recognition in Telephony
  • COST 258: The Naturalness of Synthetic Speech
  • COST 277: Nonlinear Speech Processing
  • COST 278: Spoken Language Interaction in Telecommunication
• http://cost.cordis.lu/src/home.cfm

EURESCOM
• the European Institute for Research and Strategic Studies in Telecommunications
• 20 shareholders from 19 European countries (major European network operators and service providers)
• e.g. MUST - MUltimodal, multilingual information Services with small mobile Terminals (P1104)

ISCA
• European Speech Communication Association (ESCA) founded in ’88
• from ESCA to ISCA at Eurospeech’99 in Budapest
• membership organization
• organizer of Eurospeech/ICSLP - Interspeech
• organizer of specialized workshops (ITRWs)
• Special interest groups (SIGs)
• Speech Communication Journal (http://www.elsevier.com/locate/specom)
• http://www.isca-speech.org/

Eurospeech-ICSLP-Interspeech
No.  Eurospeech (odd years, in Europe)  ICSLP (even years, elsewhere)
1    Paris ’89                          Kobe ’90
2    Genoa ’91                          Banff ’92
3    Berlin ’93                         Yokohama ’94
4    Madrid ’95                         Philadelphia ’96
5    Rhodes ’97                         Sydney ’98
6    Budapest ’99                       Beijing ’00
7    Aalborg ’01                        Denver ’02
8    Geneva ’03                         Seoul ’04
9    Lisbon ’05                         ?? ’06
Past: through Aalborg ’01; future: from Denver ’02.

ISCA SIGs
• Speech Synthesis - SynSig
• Audio Visual Speech - AVISA
• Speech And Language Technology for MInority Languages - SALTMIL
• Integration of Speech Technology in (Language) Learning - InSTIL
• SPeaker and Language Characterization - SPLC
• Education in the Field of Speech Communication - EduSIG
• Speech Prosody - SProSIG
• Dialogue Processing - SigDial (also within ACL)
• Groupe Francophone de la Communication Parlée - GFCP

ISCA ITRWs (forthcoming)
• Prosody in Speech Recognition and Understanding - Prosody 2001, Molly Pitcher Inn, Red Bank, NJ, October 22-24, 2001
• TIPS - Temporal Integration in the Perception of Speech, Aix-en-Provence, France, 8-10 April 2002
• Multi-Modal Dialogue in Mobile Environments, Kloster Irsee, Germany, June 17-21, 2002
• Advanced ASR for Telecom Applications, Palais des Papes, Avignon, France, November 27-29, 2002
Supported but not organized by ISCA:
• 2001 International Workshop on Automatic Speech Recognition and Understanding, Madonna di Campiglio (Trento), Italy, December 9-13, 2001
• Speech Prosody 2002, Aix-en-Provence, France, 11-13 April 2002

IEEE
• IEEE Signal Processing Society
  • MMSP’01, Workshop on Multimedia Signal Processing, Cannes, France, October 3-5, 2001
  • ASRU’01, Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio (Trento), Italy, December 9-13, 2001
  • 2002 International Workshop on Multimedia Signal Processing, US Virgin Islands, December 9-11, 2002
• IEEE Trans. on Signal Processing / Speech and Audio Processing / Multimedia / Neural Networks
• http://www.ieee.org/

DARPA NIST
• DARPA projects and yearly evaluations
  • CSR (Continuous Speech Recognition)
  • LVCSR (Large Vocabulary Conversational Speech Recognition)
  • ATIS (Air Travel Information System)
  • Language Recognition (Identification and Verification)
  • Speaker Recognition (Identification and Verification)

NATO-ASI
• ASI = Advanced Study Institute
• many different domains
• certain restrictions on NATO vs. non-NATO participants, free registration, some funding
• Dynamics of Speech Production and Perception, Il Ciocco, Italy, June 23 – July 6, 2002
• send application before Jan. 15, 2002 to asi2001@ebire.org
• Organizing Committee: Pierre L. Divenyi & Klára Vicsi

European national programs
• German Verbmobil; SmartKom (since 9/99); Bavarian Archive for Speech Signals (BAS)
• Spoken Dutch Corpus
• French AUP
• Swedish Centre for Speech Technology (CTT); Swedish National Graduate School in Language Technology (GSLT)

Dutch speech database initiatives
• Speech Processing Expertise Center SPEX
• 5,000 speakers Polyphone
• 1,000 speakers SpeechDat + variants
• NWO Priority program TST-OVIS (public transportation information system over telephone)
• 1,000 hrs CGN (Dutch-Flemish)
• 5.5 hrs ‘open source’ IFA-corpus
• TST Platform
• ToDI (Transcription of Dutch Intonation)

Spoken Dutch Corpus
• 4.6 M€, 5 yrs, 10 M words, ~1000 hrs of speech
• Corpus design and compilation
• Recording and digitization
• Orthographic transcription (all)
• Lemmatization and POS tagging (all)
• Lexicon link-up (all)
• Broad phonetic transcription (1 M)
• Word segmentation (1 M)
• Syntactic annotation (1 M)
• Prosodic annotation (250 k)
• Development of exploitation software COREX
• http://lands.let.kun.nl/cgn/home.htm

IFA corpus
• 5.5 hrs of high-quality recorded speech
• 4 male and 4 female speakers
• more than 30 min. per speaker
• various speaking styles per speaker, from conversational and read speech to isolated sentences, words and syllables
• everything phonemically segmented & labeled
• free access via SQL query language
• http://www.fon.hum.uva.nl/IFAcorpus
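
As a rough illustration of what SQL access to a phonemically segmented corpus can look like, the sketch below builds a tiny in-memory SQLite table and asks for the mean segment duration per phoneme and speaking style. The table name `segments`, its columns, and the sample rows are invented for this example; they are not the actual IFA corpus schema or data.

```python
import sqlite3

# Hypothetical schema and made-up rows; NOT the real IFA corpus layout.
# Columns: speaker, speaking style, phoneme label, start time (s), end time (s).
ROWS = [
    ("F1", "read", "a", 0.100, 0.190),
    ("F1", "conversational", "a", 0.500, 0.560),
    ("M1", "read", "a", 1.000, 1.110),
    ("M1", "read", "t", 1.110, 1.160),
]


def mean_durations(rows):
    """Return [(phoneme, style, mean duration in ms)], sorted by phoneme and style."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments (speaker TEXT, style TEXT, phoneme TEXT, "
        "start_s REAL, end_s REAL)"
    )
    conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT phoneme, style, ROUND(AVG(end_s - start_s) * 1000, 1) "
        "FROM segments GROUP BY phoneme, style ORDER BY phoneme, style"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for phoneme, style, duration_ms in mean_durations(ROWS):
        print(phoneme, style, duration_ms)
```

The same kind of grouped query could compare speaking styles or speakers, provided the labels are stored per segment as assumed here.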

Speech science and speech technology
• we should try to bridge that gap
• see my keynotes at ICPhS ’99 and Eurospeech’01:
  • “Flexible, robust and efficient human speech processing versus present-day speech technology”
  • “Acquiring and implementing phonetic knowledge”
• we have to understand each other in order to be able to communicate and to contribute
• probabilistic vs. knowledge driven
• adding (multiple) knowledge (sources) to improve performance
• much knowledge in speech databases

[Figure: Phonetics and Speech Technology]

Do recognizers need intelligent ears?
• intelligent ears: front-end pre-processor
• only if it improves performance
• humans are generally better speech processors than machines; perhaps system developers can learn from human behavior
• robustness at stake (noise, reverberation, incompleteness, restoration, competing speakers, variable speaking rate, context, dialects, non-nativeness, style, emotion)

What is (phonetic) knowledge?
• phonetic textbook knowledge
• probabilistic knowledge from databases
• fixed set of features vs. adaptable set
• trading relations, selectivity
• knowledge of the world, expectation
• global vs. detailed

How good is human/machine speech recognition?

Human vs. machine (ASR)
• machine surprisingly good for certain tasks
• machine could be better for many others
• robustness, outliers
• what are the limits of human performance?
  • in noise
  • for degraded speech
  • missing information (trading)

Human word intelligibility vs. noise
[Figure: human word intelligibility vs. noise; annotations: “humans start to have some trouble”, “recognizers do have trouble!”]

Robustness to degraded speech
• speech = time-modulated signal in frequency bands
• relatively insensitive to (spectral) distortions
  • prerequisite for digital hearing aid
  • modulating spectral slope: -5 to +5 dB/oct, 0.25-2 Hz
• temporal smearing of envelope modulation
  • ca. 4 Hz max. in modulation spectrum: syllable
  • LP>4 Hz and HP<8 Hz little effect on intelligibility
• spectral envelope smearing
  • for BW>1/3 oct masked SRT starts to degrade
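
Temporal smearing of the envelope can be pictured as low-pass filtering the slow amplitude modulations within a frequency band. The sketch below uses a plain moving average over an envelope sampled at a fixed frame rate. The moving-average filter, the function names, and the 100 Hz frame rate are assumptions made for illustration; they are not the filters used in the experiments summarized on the slide.

```python
import math


def smear_envelope(envelope, half_width):
    """Moving-average low-pass filter.

    Each output value is the mean of the input values within `half_width`
    frames on either side (fewer at the edges of the signal).
    """
    if half_width < 0:
        raise ValueError("half_width must be >= 0")
    smeared = []
    for i in range(len(envelope)):
        lo = max(0, i - half_width)
        hi = min(len(envelope), i + half_width + 1)
        smeared.append(sum(envelope[lo:hi]) / (hi - lo))
    return smeared


def modulation_depth(envelope):
    """Peak-to-peak range of an envelope."""
    return max(envelope) - min(envelope)


if __name__ == "__main__":
    frame_rate = 100  # envelope frames per second (assumed)
    # One second of envelope: a slow 4 Hz modulation plus a fast 25 Hz one.
    env = [
        1.0
        + 0.5 * math.sin(2 * math.pi * 4 * n / frame_rate)
        + 0.5 * math.sin(2 * math.pi * 25 * n / frame_rate)
        for n in range(frame_rate)
    ]
    smooth = smear_envelope(env, half_width=2)
    print(modulation_depth(env), modulation_depth(smooth))
```

A wider averaging window removes progressively slower modulations, which is the sense in which heavier smearing starts to eat into the region around 4 Hz.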

Robustness to degraded speech and missing information
• partly reversed speech (Saberi & Perrott, Nature, 4/99)
  • fixed-duration segments time-reversed or shifted in time: perfect sentence intelligibility up to 50 ms (demo: original with every 50 ms reversed)
  • low frequency modulation envelope (3-8 Hz) vs. acoustic spectrum
  • syllable as information unit? (S. Greenberg)
• gap and click restoration (Warren)
• gating experiments
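
The local time-reversal manipulation mentioned above is simple to state in code: cut the signal into consecutive fixed-duration segments and reverse each segment in place. The following is a minimal sketch on a plain list of samples; the function name and the choice to also reverse a shorter final segment are assumptions, not details taken from the cited study.

```python
def reverse_segments(samples, sample_rate, segment_ms):
    """Time-reverse consecutive fixed-duration segments of a signal.

    samples:     list of audio samples
    sample_rate: sampling rate in Hz
    segment_ms:  segment duration in milliseconds
    A final segment shorter than segment_ms is reversed as well.
    """
    seg_len = int(round(sample_rate * segment_ms / 1000))
    if seg_len < 1:
        raise ValueError("segment is shorter than one sample")
    out = []
    for start in range(0, len(samples), seg_len):
        out.extend(reversed(samples[start:start + seg_len]))
    return out


if __name__ == "__main__":
    # Toy signal at 1 kHz, so a 3 ms segment is 3 samples long.
    print(reverse_segments([1, 2, 3, 4, 5, 6, 7], sample_rate=1000, segment_ms=3))
```

At a 16 kHz sampling rate, a 50 ms segment corresponds to 800 samples. Applying the function twice with the same settings restores the original signal.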

Desired pre-processor characteristics in ASR
• basic sensitivity for stationary and dynamic sounds
• robustness to degraded speech
  • rather insensitive to spectral and temporal smearing
• robustness to noise and reverberation
• filter characteristics
  • are BP, PLP, MFCC, RASTA, TRAPS good enough?
  • lateral inhibition (spectral sharpening); dynamics
• what can be neglected?
  • non-linearities, limited dynamic range, active elements, co-modulation, secondary pitch, etc.
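
To make the "filter characteristics" question a little more concrete, the sketch below shows how the centre frequencies of a mel-spaced filterbank, as used in MFCC front-ends, can be derived. It uses one widely used mel formula, 2595 log10(1 + f/700); other variants exist, and the function names, band edges and number of filters are assumptions chosen only for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mel (2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(low_hz, high_hz, n_filters):
    """Centre frequencies (Hz) of n_filters bands equally spaced on the mel scale."""
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (k + 1)) for k in range(n_filters)]


if __name__ == "__main__":
    # Assumed telephone-like band of 0-4000 Hz with 10 filters.
    for centre in mel_center_frequencies(0.0, 4000.0, 10):
        print(round(centre, 1))
```

The centres are closely spaced at low frequencies and spread out towards high frequencies, a coarse imitation of auditory frequency resolution; whether that is "good enough" is exactly the question the slide raises.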

Caricature of present-day speech recognizers
• fixed pre-processor, fixed features
• trained with a variety of speech input
• much global information, but ..... no interrelations
• monaural, uni-modal input
• pitch extractor generally not operational
• performs well on average behavior
• but ..... does poorly on any type of outlier (OOV, non-native, fast or whispered speech, other communication channel, new topic, new speaker)
• neglects lots of useful (phonetic) information
• heavily relies on language model

Useful information: durational variability
[Figure: durations; overall average = 95 ms, normal rate = 95, primary stress = 104, word final = 136, utterance final = 186]
Adapted from Wang (1998)
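
The size of these durational effects is easier to see as ratios relative to the overall average. The short sketch below does that arithmetic on the five values from the slide; it assumes all five numbers are in milliseconds (only the first is labeled so), and the dictionary and function names are made up for this example.

```python
# Values from the slide; assumed to be in milliseconds throughout.
DURATIONS_MS = {
    "overall average": 95,
    "normal rate": 95,
    "primary stress": 104,
    "word final": 136,
    "utterance final": 186,
}


def lengthening_ratios(durations, reference="overall average"):
    """Return each duration divided by the reference duration."""
    ref = durations[reference]
    return {label: value / ref for label, value in durations.items()}


if __name__ == "__main__":
    for label, ratio in lengthening_ratios(DURATIONS_MS).items():
        print(f"{label}: {ratio:.2f}")
```

On these numbers, word-final position lengthens the duration by roughly 43% and utterance-final position by roughly 96% relative to the overall average, which is the kind of systematic variation a recognizer with fixed duration modeling does not exploit.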

Academia (knowledge) and industry (applications)
• what do industry and universities expect from each other? (panel discussion at E’01)
  • proper education and training (E-masters)
  • good exchange between academia & industry
  • participation in joint projects (speech DB)
  • adapt to requirements (CAIP Symposium)
  • open source approach (Linux, praat, HTK)
• complaints: sometimes bad management and high risk (puts HLT in a bad light, e.g. L&H)

Information Technology for Homeland Security
• Center for Advanced Information Processing, CAIP Symposium, Rutgers Univ., Nov. 29
• “subsequent to events of Sept. 11, CAIP modified its traditional Annual Research Review”
• “Symposium identifies issues in Homeland Security and encourages research, particularly with university-industry cooperation”
• e.g., biometric and voice identification; fusing voice and face data; multimodal interfaces for asset deployment; face-tracking for identification; microphone array for speaker tracking

E-masters in Language and Speech
• Course Content:
  • Theoretical Linguistics
  • Natural Language Processing
  • Phonetics and Phonology
  • Cognitive models for speech and language processing
  • Speech signal processing
  • Pattern recognition
  • Language engineering applications
• http://www.cstr.ed.ac.uk/euromasters/

Conclusions
• collecting speech corpora in national languages (as in SA) is an excellent basis, both for research and for applications
• combine industrial and academic skills
• make proper use of experiences elsewhere
• that’s why we are all here at this workshop!
• good luck and thank you for your attention