Natural Language Processing August 23, 2007. SpeechTEK University Deborah Dahl Conversational Technologies. Description of the Tutorial. An introduction to the principles of natural language processing and the role of natural language processing in current and future speech applications
Natural Language Processing, August 23, 2007. SpeechTEK University. Deborah Dahl, Conversational Technologies
Description of the Tutorial An introduction to the principles of natural language processing and the role of natural language processing in current and future speech applications • 9:00-9:15 Introduction: what is natural language • 9:15-10:15 Part 1: Overview and Principles • 10:15-10:45 (30 minute break) • 10:45-12:00 Part 2: Detailed Examples
Attendees • Backgrounds and goals
Audience and Background • A general technical background. • No natural language processing background will be assumed, but experience developing speech applications would be helpful.
What is Natural Language? Natural language is the kind of language that’s used to communicate between people • Can be spoken, written or gestural (in the case of Sign Languages) • There are several thousand currently spoken human languages
Why are We Interested in Natural Language? • Support for more natural and effective computer-human interactions by accommodating the ways that people already communicate
Natural Language Processing • Natural language understanding • Natural language generation • Machine translation
Goals • Understand what natural language is • Learn about the most common techniques for processing natural language • Their strengths and weaknesses • Understand where natural language processing technology is headed in the future. • Focus is on commercial applications
Topics • What is natural language? • Issues in spoken natural language and how to handle them • Statistical Language Models (SLM's) • Speech grammars with semantic tags • Variability in expression, pronouns, and filling multiple slots from a single utterance • How emerging standards such as EMMA will contribute to more sophisticated future applications • Recent topics in natural language research and how this research may eventually be utilized in future applications
Natural Language Understanding • The task of automatically assigning meaning to language
What natural language processing isn’t • Speech recognition, which turns the sounds of spoken language into the words of written language • Dialog management, which manages a natural language interaction between a user and a computer • Artificial intelligence, which studies how to provide intelligent capabilities to computers
Assigning Meaning to Language • In most applications, the developer decides what the set of possible meanings is • Meanings can be simple or complex • Language can be simple or complex • Current commercial techniques can • Assign simple meanings to simple language • Assign simple meanings to complex language • Research systems can handle more complex meanings and language, but no existing system can handle all meanings and all language for even one human language
Examples of Complex Language • Shakespeare • Religious texts • The United States Constitution • We don’t have to worry about assigning meaning to these texts!
Simple to Slightly More Complex Language • “yes” • “New York” • “call home” • “a red t-shirt, size large” • “I want to go from Philadelphia to New York on Sunday, August 19” • The more complex language becomes, the more we need special techniques to process it
Human Communication Process? (Diagram: Thought in Person A → language → Thought in Person B)
More Realistic Communication Process (Diagram: Thought 1 in Person A → language → a thought somewhat similar to Thought 1 in Person B) • Person A: How should I express this? Is this something I really need to say? What does B already know? Why do I want to express this thought? Do I want to impress B? Might I offend B by saying this? What language should I use? • Person B: Did I hear it right? Did I understand it? Why did Person A say that? Should I believe this? Could A be lying or lacking credibility? If I think A is lying should I say so?
Issues in Natural Language • Variability of expression • Infinite number of meanings that can be expressed • Infinite number of possible sentences in a language • Many ways to say the same thing • The same thing can have different meanings in different contexts
What is a Meaning? • Many approaches to representing meanings in traditional linguistics and philosophy of language • Most widely used commercial representation is as a token or as a set of slot/value pairs (also called “key/value” or “attribute/value” pairs) • Often structured into a set of related slot/value pairs (for example, the fields of a VoiceXML <form>, or a traditional frame)
Tokens • “my printer is printing horizontal bands and everything is printing in blue” → “printer problem” • “I can’t connect to the internet” → “internet problem”
What is a Meaning? Slot/Value Pairs • I want to go from Chicago to New York on August 19 midafternoon on United • Form/frame – airline reservation • Destination: New York • Departure city: Chicago • Departure date: August 19 • Departure time: midafternoon • Airline: United
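The airline example above can be pictured as a plain data structure. The following is a minimal sketch in Python, not any vendor's format; the slot names and the `missing_slots` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical frame for the airline reservation example.
# Slot names are illustrative assumptions, not part of any standard.
frame = {
    "form": "airline reservation",
    "departure_city": "Chicago",
    "destination": "New York",
    "departure_date": "August 19",
    "departure_time": "midafternoon",
    "airline": "United",
}


def missing_slots(frame, required):
    """Return the required slots that have not been filled yet."""
    return [slot for slot in required if not frame.get(slot)]


# A dialog manager could use the unfilled slots to decide what to ask next.
print(missing_slots(frame, ["destination", "departure_date", "return_date"]))
```

Here only `return_date` is reported as missing, because the single utterance filled every other slot.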
Information Available for Extracting Meaning Used by today’s commercial systems • Words of the utterance • Word order • Grammatical endings • Specific grammar for the application • Information about what previous instances of that utterance have meant Used by research systems and people • Prosody (intonation, pauses, loudness, stress, timing) • General information about the language itself (dictionaries, grammars, thesauri) • Context of the utterance • Information about the topic • Facial expressions, gestures
Traditional Tasks in Natural Language Understanding • (Recognition – speech, handwriting, OCR…) • Lexical lookup • Part of speech tagging • Sense disambiguation • Syntactic parsing • Semantic analysis • Pragmatic analysis
Problems with Traditional Approaches • Try to describe the full language and a broad set of meanings • For practical applications, it’s much easier to just write a small grammar for a specific application
Natural Language Tasks in Commercial Speech Systems • (Recognition – speech, handwriting, OCR…) • Lexical lookup (part of recognition) • Part of speech tagging – parts of speech not used • Sense disambiguation – not needed, constrained application • Syntactic parsing – syntactic structure used indirectly • Semantic analysis • Pragmatic analysis } Done in parallel
Extracting Meaning in Commercial Applications • Filling slots by using semantically tagged grammars (CFG’s) • Mapping complex utterances to categories (SLM’s)
Semantically Tagged Grammars • A grammar defines what the recognizer can recognize (recognized strings) • Tags define return values for different recognized strings • Information used: words of the utterance and a special-purpose grammar
Context-Free Grammar Formats • Represent what a speech recognizer can recognize • Example: Request → PoliteWord + Action + Item • (please open the door) • Speech Recognition Grammar Specification (SRGS) (ABNF and XML formats) • Java Speech Grammar Format (JSGF) • Nuance GSL • Microsoft Speech Application Programmer’s Interface (SAPI)
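To make the Request rule concrete, here is a rough Python sketch of how a context-free grammar decides whether a word string is covered. It is not SRGS, JSGF, GSL, or SAPI; the `GRAMMAR` table and the word lists in it (beyond "please open the door") are assumptions made for illustration.

```python
# Toy grammar: Request -> PoliteWord Action Item.
# Word lists other than "please open the door" are illustrative assumptions.
GRAMMAR = {
    "Request": [["PoliteWord", "Action", "Item"]],
    "PoliteWord": [["please"], ["kindly"]],
    "Action": [["open"], ["close"]],
    "Item": [["the", "door"], ["the", "window"]],
}


def match(symbol, words, pos):
    """Return the set of word positions reachable after matching symbol at pos."""
    if symbol not in GRAMMAR:  # terminal: must equal the next word
        return {pos + 1} if pos < len(words) and words[pos] == symbol else set()
    ends = set()
    for expansion in GRAMMAR[symbol]:
        positions = {pos}
        for part in expansion:
            positions = {end for p in positions for end in match(part, words, p)}
        ends |= positions
    return ends


def accepts(utterance):
    """True if the whole utterance is covered by the Request rule."""
    words = utterance.lower().split()
    return len(words) in match("Request", words, 0)


print(accepts("please open the door"))  # True
print(accepts("open the door"))         # False: PoliteWord is required
```

The second call fails because this toy rule makes the polite word mandatory, which hints at why real grammars need optional items and alternatives.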
Semantic Tags • Reduce variability of expression • Assign return values to recognized strings • W3C Semantic Interpretation for Speech Recognition (SISR) • JSGF tags • SAPI tags • IBM ECMAScript tags • Nuance GSL
Capabilities of Tag Formats • Assign tokens to strings (JSGF) • “yeah” → yes • Create key-value pairs (SAPI) • “to chicago” → <destination>ord</destination> • Perform computations (SISR, IBM, GSL) • “three days from now” → August 26, 2007 • “two medium and three large pizzas” → 5 pizzas
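The two computation examples can be worked through in ordinary code. This is a sketch in Python rather than in SISR, IBM, or GSL tag syntax; `NUMBER_WORDS`, `days_from_now`, and `total_pizzas` are hypothetical names, and the reference date is assumed to be the tutorial date, August 23, 2007.

```python
from datetime import date, timedelta

# Small illustrative vocabulary; a real grammar would cover more numbers.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def days_from_now(utterance, today):
    """Resolve '<number> days from now' relative to the given date."""
    first_word = utterance.lower().split()[0]
    return today + timedelta(days=NUMBER_WORDS[first_word])


def total_pizzas(utterance):
    """Add up every number word in the utterance."""
    return sum(NUMBER_WORDS[w] for w in utterance.lower().split() if w in NUMBER_WORDS)


print(days_from_now("three days from now", date(2007, 8, 23)))  # 2007-08-26
print(total_pizzas("two medium and three large pizzas"))        # 5
```

In a tagged grammar the same arithmetic would sit inside the tags attached to the rules that recognize the number words.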
SISR Tags for “yes” and “no”
<rule id="yes">
  <one-of>
    <item>yes</item>
    <item>yeah<tag>yes</tag></item>
    <item><token>you bet</token><tag>yes</tag></item>
    <item xml:lang="fr-CA">oui<tag>yes</tag></item>
  </one-of>
</rule>
<rule id="no">
  <one-of>
    <item>no</item>
    <item>nope</item>
    <item><token>no way</token></item>
  </one-of>
  <tag>no</tag>
</rule>
GSL Token DigitValue [ ([zero oh] one) { return (01) } ...] “oh one” → 01
SISR Slot/Value “I would like a small coca cola and three large pizzas with pepperoni and mushrooms.”
<rule id="order">
  I would like a <ruleref uri="#drink"/>
  <tag>out.drink = new Object(); out.drink.liquid=rules.drink.type; out.drink.drinksize=rules.drink.drinksize;</tag>
  and <ruleref uri="#pizza"/>
  <tag>out.pizza=rules.pizza;</tag>
</rule>
GSL Slot/Value
;GSL 2.0;
ColoredObject:public (Color Object)
Color [
  [red pink] { <color red> }
  [yellow canary] { <color yellow> }
  [green khaki] { <color green> }
]
Object [
  [truck car] { <object vehicle> }
  [ball block] { <object toy> }
  [shirt blouse] { <object clothing> }
]
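The effect of the ColoredObject grammar can be imitated in a few lines of Python. This is only a sketch of the slot-filling behavior, not how a GSL compiler or recognizer works; the `interpret` function is a hypothetical helper.

```python
# Word-to-slot-value tables mirroring the Color and Object rules above.
COLOR = {
    "red": "red", "pink": "red",
    "yellow": "yellow", "canary": "yellow",
    "green": "green", "khaki": "green",
}
OBJECT = {
    "truck": "vehicle", "car": "vehicle",
    "ball": "toy", "block": "toy",
    "shirt": "clothing", "blouse": "clothing",
}


def interpret(utterance):
    """Return slot/value pairs for a 'Color Object' utterance, or None if out of grammar."""
    words = utterance.lower().split()
    if len(words) != 2 or words[0] not in COLOR or words[1] not in OBJECT:
        return None
    return {"color": COLOR[words[0]], "object": OBJECT[words[1]]}


print(interpret("pink blouse"))  # {'color': 'red', 'object': 'clothing'}
print(interpret("blue truck"))   # None: "blue" is not in the grammar
```

Several surface words collapse onto one slot value, which is how tags reduce variability of expression.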
SAPI Slot-Value
<RULE name="elvis">
  <L PROPNAME="artist">
    <P VALSTR="elvis_presley">elvis <O>presley</O></P>
    <P VALSTR="elvis_presley">the king</P>
  </L>
</RULE>
Problems with Tagged Grammars • Hard to maintain when complex • Hard to anticipate all the variations in how someone might say something • Can use wildcards/garbage to ignore parts of utterance • Speech recognition suffers when grammars are too complex • Speech recognition suffers when wildcards are used
Statistical Language Models (SLM’s) • Speech recognition is based on statistical models, not grammars • In commercial systems, natural language processing is a process of classification, relatively coarse meaning extraction • Works well if goal is to extract very simple meanings
Stages in SLM Processing • Ngram speech recognition: probabilities of word sequences, usually 2-3 words • Much more flexible (but less accurate) than a grammar • However, accuracy is not as critical with SLM’s because you don’t have to get every single word right • Text classification: given a text, assign it to categories based on training from previous texts • There are many algorithms for classification
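The two stages can be sketched in Python. This is a toy illustration under stated assumptions, not a production SLM: the n-gram model uses unsmoothed maximum-likelihood bigram estimates, the classifier is naive Bayes with add-one smoothing (one of many possible classification algorithms), and the training sentences and function names are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count bigrams and their left-context words, with sentence boundary markers."""
    bigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def bigram_prob(bigrams, contexts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    return bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0


def train_classifier(examples):
    """examples: list of (text, category) pairs. Returns word and category counts."""
    word_counts, category_counts = defaultdict(Counter), Counter()
    for text, category in examples:
        category_counts[category] += 1
        word_counts[category].update(text.lower().split())
    return word_counts, category_counts


def classify(text, word_counts, category_counts):
    """Naive Bayes with add-one smoothing; returns the most probable category."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total = sum(category_counts.values())
    best, best_score = None, -math.inf
    for category, counts in word_counts.items():
        score = math.log(category_counts[category] / total)
        denominator = sum(counts.values()) + len(vocabulary)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denominator)
        if score > best_score:
            best, best_score = category, score
    return best


# Stage 1: word-sequence probabilities learned from example utterances.
bigrams, contexts = train_bigrams(["i want a large pizza", "i want a small pizza"])
print(bigram_prob(bigrams, contexts, "a", "large"))  # 0.5

# Stage 2: assign recognized text to a category (invented training data).
TRAINING = [
    ("my printer is printing horizontal bands", "printer problem"),
    ("everything is printing in blue", "printer problem"),
    ("i can't connect to the internet", "internet problem"),
    ("the internet connection keeps dropping", "internet problem"),
]
word_counts, category_counts = train_classifier(TRAINING)
print(classify("the printer prints blue bands", word_counts, category_counts))
```

The test utterance contains a word ("prints") never seen in training, yet it is still routed to "printer problem", which illustrates why getting every word right matters less for classification than for a grammar.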
Problems with SLM’s • Less accurate than CFG’s • Expensive to implement and maintain • Require a lot of data for good performance
Tagged Grammars or SLM’s? • Deeply nested menus → SLM’s • Complex applications with many slots to fill and precise meanings needed → grammars • Can combine both approaches in one application • Front-end SLM followed by grammar • Prompt asks specific question to catch most common tasks but has “other” category
Other Combination Approaches • Use SLM technology to recognize but grammar to interpret • Rules combined with SLM’s • Robust parsing • Rules combined with wildcard • I want um make that a large pizza with pepperoni and onions
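One simple way to picture rules combined with wildcards is keyword spotting: look for the words that fill slots and skip everything else. The Python sketch below uses that swapped-in technique as a stand-in for robust parsing in general; the vocabulary tuples and `parse_order` are assumptions for illustration.

```python
import re

# Illustrative slot vocabularies; everything else in the utterance is ignored.
SIZES = ("small", "medium", "large")
TOPPINGS = ("pepperoni", "onions", "mushrooms")


def parse_order(utterance):
    """Fill size and topping slots from the words we know, skipping the rest."""
    words = re.findall(r"[a-z]+", utterance.lower())
    frame = {"size": None, "toppings": []}
    for word in words:
        if word in SIZES:
            frame["size"] = word  # a later size overrides an earlier one
        elif word in TOPPINGS:
            frame["toppings"].append(word)
    return frame


print(parse_order("I want um make that a large pizza with pepperoni and onions"))
# {'size': 'large', 'toppings': ['pepperoni', 'onions']}
```

The hesitation and self-correction ("um make that") are skipped without being modeled, which is the appeal of the approach and also its risk when the skipped words carry meaning.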
Emerging Standards: EMMA • EMMA (Extensible Multi-Modal Annotation) • Developed by the World Wide Web Consortium Multimodal Interaction Working Group • An XML format for representing users’ inputs and the results of processing them
How does EMMA relate to natural language understanding? • EMMA represents the results of a natural language understanding process
EMMA Benefits (1) • EMMA’s standard format lets all kinds of EMMA producers (multimodal modality components) exchange results • handwriting recognizers • speech recognizers • text classifiers • face recognizers • speaker identification and verification • …
EMMA Benefits (2) • Through “<derived-from>”, provides a way for “specialist” processing components to cooperate in processing a single input (Diagram of cooperating components: speech recognition, lexical lookup, part of speech tagging, parsing, semantic analysis, Ngram speech recognition, classification)
EMMA Example – (1) Annotation Elements “from philadelphia to boston and i want a vegetarian meal” <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/"> <emma:info> <application>airline</application> </emma:info> <emma:model> <model class="airline"> <source></source> <destination></destination> <days></days> <meals></meals> </model></emma:model>
EMMA Example – (2) Annotation Attributes <emma:interpretation id="interp5" emma:start="1186519245101" emma:mode="speech" emma:end="1186519248391" emma:confidence="0.03" emma:function="dialog" emma:duration="3290" emma:uninterpreted="false" emma:lang="en-US" emma:verbal="true" emma:dialog-turn="1" emma:tokens="from philadelphia to boston and i want a vegetarian meal " emma:medium="acoustic" emma:process="file://Microsoft Speech Recognizer 8.0 for Windows (English - US), SAPI5, Microsoft">
EMMA Example (3) Application Semantics <source>philadelphia </source> <destination>boston</destination> <meal>vegetarian</meal>
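A consumer of an EMMA result might read the annotations and the application semantics as in the Python sketch below. The document string is a trimmed, well-formed assembly of the three example slides with only a few of the annotation attributes kept; the namespace URI is copied from the slide as shown, and `read_interpretation` is a hypothetical helper, not part of any EMMA toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears on the slide (an assumption that it matches your EMMA producer).
EMMA_NS = "http://www.w3.org/2003/04/emma/"

# Trimmed, well-formed version of the example; most annotation attributes are omitted.
EMMA_DOC = """
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:interpretation id="interp5"
      emma:mode="speech"
      emma:confidence="0.03"
      emma:tokens="from philadelphia to boston and i want a vegetarian meal">
    <source>philadelphia</source>
    <destination>boston</destination>
    <meal>vegetarian</meal>
  </emma:interpretation>
</emma:emma>
"""


def read_interpretation(xml_text):
    """Return (slots, confidence, tokens) from the first emma:interpretation."""
    root = ET.fromstring(xml_text.strip())
    interp = root.find(f"{{{EMMA_NS}}}interpretation")
    # Child elements carry the application semantics as slot/value pairs.
    slots = {child.tag: child.text.strip() for child in interp}
    confidence = float(interp.get(f"{{{EMMA_NS}}}confidence"))
    tokens = interp.get(f"{{{EMMA_NS}}}tokens")
    return slots, confidence, tokens


slots, confidence, tokens = read_interpretation(EMMA_DOC)
print(slots)       # {'source': 'philadelphia', 'destination': 'boston', 'meal': 'vegetarian'}
print(confidence)  # 0.03
```

Because the annotations and the application semantics live in the same document, a dialog manager could, for instance, decide to confirm the slots when the confidence is this low.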
SAPI XML Grammar Examples • Windows Speech Recognition (Vista) • Office 2003 Speech Recognition • Example – music player interface • I’d like to hear Beethoven’s 5th • Please play Brandenburg Concertos by Bach • Play something by Elvis