
Language Technology

Language Technology. Torbjørn Nordgård, Dept of Language and Communication Studies, NTNU; Torbjørn Svendsen, Dept of Electronics and Telecommunications, NTNU; Erik Harborg, SINTEF Information and Communication Technology; Knut Kvale, Telenor Research and Development.

Presentation Transcript


  1. Language Technology • Torbjørn Nordgård, Dept of Language and Communication Studies, NTNU • Torbjørn Svendsen, Dept of Electronics and Telecommunications, NTNU • Erik Harborg, SINTEF Information and Communication Technology • Knut Kvale, Telenor Research and Development

  2. What is Language Technology? • Very broadly: “The branch of information technology which deals with natural language information” • Definition from the “Norwegian Language Bank” report: “Human language technology (HLT) involves simplifying and enhancing communication between people and facilitating the man-machine interface. Such technologies make it easier to utilize modern information technology because they allow users to communicate in the mode they know best – their own oral and written language.” • (NOT: “Branches of linguistics which make use of computers in research”)

  3. Language technology products today (some examples) • Spell checkers • Grammar checkers • Document classification (content-based classification) • Document retrieval • Automatic switchboards • “Screen readers” (text-to-speech) • Natural language interfaces to information sources (e.g. traffic information) • Dialogue systems via telephone (e.g. automatic reservation) • Automatic dictation (e.g. in hospitals) • Machine translation (generally in restricted domains) • New enabling tools for people with disabilities (speech technology, word prediction, etc.) • …

  4. Important research areas • Speech technology • Automatic speech recognition • Text-to-speech synthesis • Speaker recognition • Language recognition • Signal processing • (Statistical) Pattern recognition • Hidden Markov models (HMM), artificial neural networks etc • Dialogue systems • Machine translation (rule-based, statistical, hybrid, …) • Text production (language generation from non-linguistic information sources) • Information retrieval with semantic models • Word sense disambiguation (find the correct meaning of an ambiguous word) • Language modeling • Statistical • Formal language modeling • Syntax • Semantics • Phonology • Discourse analysis • Text understanding • Parsing and generation algorithms • …
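
As an illustration of the "Language modeling • Statistical" item above, here is a minimal sketch of a bigram language model with add-one smoothing. The slides do not describe any particular model; the toy corpus, the function names and the choice of smoothing are assumptions made only for this example.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and bigrams over tokenized sentences.

    Each sentence is padded with <s> and </s> boundary markers.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Toy corpus (hypothetical data, not taken from the presentation).
corpus = [
    ["the", "bus", "leaves", "now"],
    ["the", "bus", "is", "late"],
    ["the", "train", "leaves", "now"],
]
unigrams, bigrams = train_bigram_model(corpus)
p_bus = bigram_probability("the", "bus", unigrams, bigrams)
p_train = bigram_probability("the", "train", unigrams, bigrams)
print(f"P(bus | the) = {p_bus:.3f}, P(train | the) = {p_train:.3f}")
```

A speech recognizer or word predictor could use such probabilities to prefer likely word sequences over unlikely ones; practical systems rely on far larger corpora and more refined smoothing.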

  5. Progress during the last 20 years? • Speech synthesis: • Improved naturalness and intelligibility through the move from rule-based to data-driven methods • Speech recognition: • Statistical modeling (HMM), more robust signal analysis, more training data and more powerful computers -> increased robustness and better performance • Dialogue systems: • Improved performance of individual system components, transition from system-driven to mixed-initiative dialogue strategies • Machine translation: • Slightly better quality; better formal devices, statistical approaches (novel methods) • Document analysis and retrieval: • Formal models, indexing algorithms, computer power, …

  6. Quality of key products today • Quality is in the eye of the beholder • User perception of system quality is the only true measure of quality • Lab tests of speech recognition: 99% accuracy • General dictation for ordinary users: approx. 85% accuracy • Lab tests of machine translation: up to 95% perfect translations • Translation of arbitrary texts by ordinary users: less than 50%
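
The slide does not say how the recognition accuracy figures were measured. One common convention, assumed here purely for illustration, is word accuracy: one minus the word error rate, where errors are the minimum number of word substitutions, insertions and deletions needed to turn the recognizer output into the reference transcript. The sentences below are made up.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two whitespace-tokenized strings."""
    ref, hyp = reference.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))  # distances for an empty reference
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a hypothesis word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference, hypothesis):
    """Return 1 - WER; can be negative if the hypothesis has many insertions."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / ref_len


# Hypothetical reference transcript and recognizer output.
reference = "when does the next bus leave for the city centre"
hypothesis = "when does the next bus leaves for city centre"
accuracy = word_accuracy(reference, hypothesis)
print(f"word accuracy: {accuracy:.0%}")
```

Under this measure, a gap such as the one between 99% and 85% corresponds to roughly one wrong word in a hundred versus one in seven, which helps explain why users perceive the two conditions so differently.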

  7. Long-term projects in Norway • BRAGE (User interface with natural language) • FONEMA (main goal: development of a realistic Norwegian computer voice) • LOGON (Lexicon, word semantics, grammar and translation for Norwegian) • VOCALS (“… communication systems, advanced dialogue management and spoken language technology”) • MOBEL (“Mobile Patient Records”, including ASR and text generation components) • …

  8. Prospects for research in language technology in Norway • NTNU has all relevant disciplines • Informatics • Linguistics • Phonetics • Signal processing • Mathematics and statistics • Broad experience with multi-disciplinary cooperation • Many years of cooperation with external partners • Experience from commercialization of LT • LT embedded in current curricula • Linguistics, signal processing, informatics… • Education program (bachelor) with all relevant LT components • Master program in LT under development

  9. Two products from NTNU: • TUC (“The Understanding Computer”) • Developed by Prof. Tore Amble • Installed as a question-answering system at Team Trafikk in Trondheim (the “Bussorakel” at http://www.team-trafikk.no/ ) • Internet access (English and Norwegian) • SMS access (English and Norwegian) • Spoken dialogue (under development) • Very satisfied customer … • LingDys (“Linguistically based writing tool for dyslectics”) • Developed by Prof. Torbjørn Nordgård, with Prof. Ragnar Thygesen (Stavanger) and Lars Johnsen (Bergen) • Specially designed spell checker • Speech synthesis included (developed by Telenor R&D) • Direct dictionary • Dialect adaptation • Word prediction • Satisfied customers …
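
To make the "Word prediction" feature listed above concrete, here is a minimal sketch of frequency-based prefix completion. It is not a description of how LingDys works; the sample text, the function names and the ranking rule (most frequent first, ties broken alphabetically) are assumptions for this example only.

```python
from collections import Counter


def build_frequency_table(text):
    """Count how often each lower-cased, whitespace-separated word occurs."""
    return Counter(text.lower().split())


def predict_words(prefix, frequencies, limit=3):
    """Suggest up to `limit` known words starting with `prefix`.

    Candidates are ranked by descending frequency, then alphabetically.
    """
    prefix = prefix.lower()
    candidates = [
        (word, count)
        for word, count in frequencies.items()
        if word.startswith(prefix)
    ]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in candidates[:limit]]


# Hypothetical training text standing in for a real corpus.
sample_text = "the bus to the city leaves when the bus is ready but the boat is late"
frequencies = build_frequency_table(sample_text)
suggestions = predict_words("b", frequencies)
print(suggestions)
```

A writing aid would typically combine such frequency counts with context (for example, the previous word) so that suggestions fit the sentence being written.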

  10. The future (?) • Speech technology: • Automatic transcription of colloquial speech • Automated search/indexing in multimedia and audio archives • Synthetic speech capable of conveying emotions and “identity” • Document creation • Intelligent and proactive authoring tools • Multilingual information management (retrieval, indexing, …) • Better machine translation • More language pairs • Better translations on average • Speech-to-speech machine translation (in restricted task domains) • Interaction with various “devices” via natural language (“speaking refrigerators” (?), talk with the computer, talk with your car …)

  11. Prerequisites • Language resources • Speech corpora • Text corpora • Dictionaries • Best availability for English • Few relevant resources for Norwegian, Danish, Swedish … • … high-quality language technology for many languages, but not Norwegian … ? • Cultural issue • … more and more language technology in standard operating systems and office suites, like MS Word. Some of the most interesting tools are not available for Norwegian … (and will not be in the near future)

  12. A Norwegian language technology industry? • LT presupposes knowledge from many areas, including linguistics, phonetics etc. • Existing Norwegian companies are not of the relevant type • However, Norwegians need to “adapt to English”. • Solutions will be developed within existing companies • Opportunities for small companies (like LingIT …) • Size in 15 years?? • The impact of Norwegian LT will be much greater than the number of jobs in the LT industry would suggest • Productivity increase in the public and private sectors

  13. Conclusions • Existing products will improve in the years to come • Some new products will emerge • Natural language interfaces will become more integrated in ICT • Text • Speech • The Norwegian language in this picture?
