SyNTHEMA Speech & Language Technologies: State of the art from an industrial perspective. Carlo Aliprandi, Synthema srl.
Company Profile
Based in Pisa (Italy), SyNTHEMA is a high-technology SME established in 1993 by computer scientists from the IBM Research Center. Since then the company has grown into a leading provider of Language and Semantic solutions, with state-of-the-art technologies for applications such as Enterprise Search, Audio & Text Mining, Technology Watch, Competitive Intelligence, Speech Recognition, Respeaking and Speech Analytics. Building on strong IT research and development, SyNTHEMA has pioneered a number of innovative applications and solutions, used daily by a large number of users for productivity tasks in different markets and industries, including Homeland Security, Intelligence and Law Enforcement, Public Administration and Government, Healthcare and Media.
Structure and activities
• Semantic Technology
• Translation Technology
• Speech Technology
• 30 People (20 IT, 10 Localisation Services)
Natural language
Sources: Ethnologue; Netz-Tipp.de (http://www.netz-tipp.de/languages.html)
Language technologies, some examples
WRITTEN LANGUAGE
• Machine Translation
• Semantics
• Natural-language search
• Information Retrieval
• Question Answering
SPOKEN LANGUAGE
• Speech Recognition (Speech to Text)
• Respeaking
• Automatic Transcription
• Assisted Subtitling
• Speech Understanding
• Dialogue management (avatars, ...)
Semantics
The Italian market offers state of the art for:
• Lemmatisation
• POS Tagging
• MultiWord Detection (MWD)
• Named Entity Recognition (NER)
• Parsing (dependency and constituency)
• Word Sense Disambiguation (WSD)
• Sentiment Analysis (SA)
• Semantic Role Labeling (SRL)
Languages:
Semantics
• Is it a cool topic?
• Bing Microsoft: Powerset (linguistic processor)
• Google: Applied Semantics (ontology, or knowledge base of concepts and their relationships, coupled with a linguistic processing engine)
• Google Squared (structures the unstructured data on web pages)
• Hakia (meaning-based search engine, ontology and semantic lexicon, ontological parser)
• Wolfram Alpha: computational knowledge engine, distilled and revised knowledge, NL query, rich visualisation
• Knowledge engineering, language dependent
• IBM Watson (Jeopardy!)
• While waiting for the killer app, there is a latent demand for “Semantic Search”
Speech Technology
The Italian market offers state of the art for:
• Automatic Speech Recognition
• Automatic Transcription
• Dialogue Systems
• Speech Analytics
Languages:
The evolution of Dictation
• 1st generation (1990-2000): application of ASR products to respeaking
• Players (technology for CSR):
  • IBM ViaVoice, Dragon DNS, L&H Xspeech, Philips FreeSpeech, Kurzweil, Nuance, Loquendo and others (more than 10): tools plugged into existing subtitling solutions
• Technology benefits:
  • Speaker dependent, great accuracy and large accent coverage
  • Large vocabularies available (LVSR)
  • Good accuracy, up to 95-97%
  • Good throughput (up to 170 wpm)
• Some technology limitations:
  • SR mainly designed for dictation
  • SR available for 'general' domains and main languages only
  • Partial coverage of specific domains (news, politics, economy, gossip, ...)
  • Difficulty dealing with Out-of-Vocabulary words
  • Error correction (live and deferred)
  • Improvement of language models
• But main benefits:
  • The technology allows fast training of new (untrained) staff
  • The technology is affordable, with no need for huge investments
  • Well suited to pre-recorded and close-to-live programs
• And main operating limitations:
  • Typically supports a single operator (the respeaker)
  • The respeaker alone has to face a challenging task, with a heavy cognitive overload
  • Poorly suited to live programs (talk shows, interviews, ...)
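Accuracy figures such as the 95-97% quoted above are commonly read as word accuracy, that is, 1 minus the word error rate (WER). The slides do not state which metric is meant, so this reading is an assumption. A minimal Python sketch of the computation, with hypothetical function names and a made-up example sentence:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER (can be negative with many insertions)."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    spoken = "let us go to the news desk now"
    recognised = "lettuce go to the news desk now"
    # One substitution plus one deletion over 8 reference words.
    print(word_error_rate(spoken, recognised))  # 0.25
```

Under this reading, 97% word accuracy at 170 wpm would still leave about 5 wrong words per minute to correct.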
The evolution of Dictation
• 2nd generation (2000-2010)
• Global players: Nuance DNS, Philips SpeechMagic, IBM ViaVoice
• Technology benefits:
  • Speaker dependent, great accuracy and large accent coverage
  • Large vocabularies available (LVSR)
  • Good accuracy, up to 97-99%
  • Good throughput (up to 170 wpm)
• Technology limitations overcome:
  • SR mainly designed for dictation -> adaptation to different speech (conversational speech) -> reduced training time (30' -> 5')
  • SR available for 'general' domains -> development of specific topics (news, politics, ...)
  • Difficulty dealing with OOV words -> pre-analysis of similar texts/scripts -> live management (editing + insertion) of OOV words
  • Error correction (live and deferred) -> live: dual-operator systems (respeaker + corrector)
  • Improvement of language models -> respeaked speech and aligned scripts are saved: error correction improves the language models (e.g. "lettuce" vs "let us")
• Benefits:
  • Suited to 'major' live programs (news, sport events)
• And main operating limitations:
  • The respeaker still has to face a cognitive overload
  • Not completely suited to specific kinds of live programs (chat magazines, talk shows, major political debates, ...)
  • Subtitles are introduced with some delay (5-7' acceptable)
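The pre-analysis of scripts for out-of-vocabulary words mentioned above can be sketched roughly as follows. This is an illustration only: the function names, the tokenisation rule and the sample vocabulary are assumptions, not the way any of the products listed actually works.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def find_oov_words(script_text, vocabulary, min_count=1):
    """Return (word, count) pairs for script words missing from the
    recogniser vocabulary, most frequent first, ties in alphabetical order."""
    known = {word.lower() for word in vocabulary}
    counts = Counter(tokenize(script_text))
    oov = [
        (word, n)
        for word, n in counts.items()
        if word not in known and n >= min_count
    ]
    oov.sort(key=lambda item: (-item[1], item[0]))
    return oov


if __name__ == "__main__":
    # Hypothetical recogniser vocabulary and programme script.
    vocabulary = {"the", "minister", "met", "in", "today", "continues"}
    script = (
        "The minister met the delegation in Pisa today. "
        "The Pisa summit continues."
    )
    print(find_oov_words(script, vocabulary))
    # [('pisa', 2), ('delegation', 1), ('summit', 1)]
```

The resulting list is what an operator would review before going on air, adding the relevant words (names, places, jargon) to the vocabulary in advance.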
The present (and future) of Dictation
• 3rd generation (2010-2015)
• Global player technology for CSR:
  • Nuance DNS (and no others!)
• Emerging providers of new professional technology for SR:
  • New ASR engines for (batch and live) transcription
  • Speaker-independent systems (Nuance Dictate, IBM Attila, ...)
  • SR engines for smartphones and cloud services (Google Speech, Apple, Facebook, ...)
• New emerging interest and applications:
  • Audio alignment and segmentation
  • Audio annotation and indexing for cross-media search
  • Media monitoring
ASR from an Industry perspective: Needs?
Limitations:
• ASR has several limitations because it was designed for dictation applications, so it performs poorly in specific tasks such as subtitling.
• Language coverage may be limited: commercial systems have been developed for the main language markets (e.g. English, Spanish, French, German, ...) and are not available for many languages and dialects.
• Domain coverage may be limited: commercial systems have been developed for general and generic topics.
Data:
• The resources needed to build an ASR technology (raw data, tagged data, models) are not available for several languages.
Needs:
• Needs are different from the market perspective.
SAVAS
• Is ASR good enough for an application task like subtitling?
• Is an IT provider (academy or R&D) sufficient to fulfil market needs (improving operations, new offerings, ...)?
• Reporting is different (vs respeaking):
  • Not real time
  • Typically verbatim (or close to it)
  • Different audience
  • No persistence and visualization boundaries (colors, formatting, audio descriptors, ...)
• Dictation has proved to be a valid alternative for subtitling, taking over from traditional reporting methods
• Traditional reporting methods, like fast keyboarding and stenotyping, were adopted early
• SAVAS brings together Broadcasters, Subtitling Companies, Universities and Companies involved in the industries of Media, Accessibility and LVCSR
Speech Recognition
• Dictation
  • Dictation is the interactive composition of text
  • Medical reports, court and parliamentary proceedings
• Transcription
  • Transcription is transforming speech into text (batch or online)
• Dialogue
  • CRM, device control, navigation, call routing
• Multimedia Mining
  • Audio2text; Text2Audio
Thank you • Q&A • Courtesy of Carlo Aliprandi