270 likes | 464 Views
CS60057 Speech &Natural Language Processing. Autumn 2005. Lecture 2 22 July 2005. Today’s slides adapted from Ilyas Cicekli’s slide http://www.cs.ucf.edu/~ilyas/Courses/CAP6640 Martin & Jurafsky’s book. Text Books.
E N D
CS60057Speech &Natural Language Processing Autumn 2005 Lecture 2 22 July 2005 Natural Language Processing
Today’s slides adapted from Ilyas Cicekli’s slide http://www.cs.ucf.edu/~ilyas/Courses/CAP6640 Martin & Jurafsky’s book Natural Language Processing
Text Books • Daniel Jurafsky, and James H. Martin, "Speech and Language Processing", Prentice Hall, 2000. Other References • James Allen, "Natural Language Understanding", Second edition, The Benjamin/Cumings Publishing Company Inc., 1995 • Christopher D. Manning, and Hinrich Schutze, "Foundations of Statistical Natural Language Processing", The MIT Press, 1999. Natural Language Processing
Why Should You Care? • Trends • An enormous amount of knowledge is now available in machine readable form as natural language text • Human-computer communication is becoming increasingly in vogue – dialog based systems. Question answering, conversational agents, Machine translation Information Retrieval, Summarization, Fusion, Natural Language Processing
Ambiguity I made her duck. • How many different interpretations does this sentence have? • What are the reasons for the ambiguity? • The categories of knowledge of language can be thought of as ambiguity resolving components. • How can each ambiguous piece be resolved? • Does speech input make the sentence even more ambiguous? • Yes – deciding word boundaries Natural Language Processing
Ambiguity • Some interpretations of : I made her duck. • I cooked duck for her. • I cooked duck belonging to her. • I created a toy duck which she owns. • I caused her to quickly lower her head or body. • I used magic and turned her into a duck. • duck – morphologically and syntactically ambiguous: noun or verb. • her – syntactically ambiguous: dative or possessive. • make – semantically ambiguous: cook or create. • make – syntactically ambiguous: • Transitive – takes a direct object. => 2 • Di-transitive – takes two objects. => 5 • Takes a direct object and a verb. => 4 Natural Language Processing
Why is NLP difficult? • Because Natural Language is highly ambiguous. • Syntactic ambiguity • The president spoke to the nation about the problem of drug use in the schools from one coast to the other. • has 720 parses. • Ex: • “to the other” can attach to any of the previous NPs (ex. “the problem”), or the head verb 6 places • “from one coast” has 5 places to attach • … Natural Language Processing
Ambiguity in a Bengali/Hindi Sentence • Give examples Some interpretations of: Morphological Ambiguity: Semantic Ambiguity: Natural Language Processing
Why is NLP difficult? • Word category ambiguity • book -->verb? or noun? • Word sense ambiguity • bank --> financial institution? building? or river side? • Words can mean more than their sum of parts • make up a story • Fictitious worlds • People on mars can fly. • Defining scope • People like ice-cream. • Does this mean that all (or some?) people like ice cream? • Language is changing and evolving • I’ll email you my answer. • This new S.U.V. has a compartment for your mobile phone. • Googling, … Natural Language Processing
Resolve Ambiguities • We will introduce models and algorithms to resolve ambiguities at different levels. • part-of-speech tagging -- Deciding whether duck is verb or noun. • word-sense disambiguation -- Deciding whether make is create or cook. • lexical disambiguation -- Resolution of part-of-speech and word-sense ambiguities are two important kinds of lexical disambiguation. • syntactic ambiguity -- her duck is an example of syntactic ambiguity, and can be addressed by probabilistic parsing. Natural Language Processing
Resolve Ambiguities (cont.) I made her duck S S NP VP NP VP I V NP NP I V NP made her duck made DET N her duck Natural Language Processing
Dealing with Ambiguity • Three approaches: • Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels. • Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures. • Syntax proposes/semantics disposes approach • Probabilistic approaches based on making the most likely choices Natural Language Processing
Models to Represent Linguistic Knowledge • Different formalisms (models) are used to represent the required linguistic knowledge. • State Machines -- FSAs, HMMs, ATNs, RTNs • Formal Rule Systems -- Context Free Grammars, Unification Grammars, Probabilistic CFGs. • Logic-based Formalisms -- first order predicate logic, some higher order logic. • Models of Uncertainty -- Bayesian probability theory. Natural Language Processing
Algorithms to Manipulate Linguistic Knowledge • We will use algorithms to manipulate the models of linguistic knowledge to produce the desired behavior. • Most of the algorithms we will study are transducers and parsers. • These algorithms construct some structure based on their input. • Since the language is ambiguous at all levels, these algorithms are never simple processes. • Categories of most algorithms that will be used can fall into following categories. • state space search • dynamic programming Natural Language Processing
Language and Intelligence Turing Test ComputerHuman Human Judge • Human Judge asks tele-typed questions to Computer and Human. • Computer’s job is to act like a human. • Human’s job is to convince Judge that he is not machine. • Computer is judged “intelligent” if it can fool the judge • Judgment of intelligence is linked to appropriate answers to questions from the system. Natural Language Processing
NLP - an inter-disciplinary Field • NLP borrows techniques and insights from several disciplines. • Linguistics: How do words form phrases and sentences? What constraints the possible meaning for a sentence? • ComputationalLinguistics: How is the structure of sentences are identified? How can knowledge and reasoning be modeled? • ComputerScience: Algorithms for automatons, parsers. • Engineering: Stochastic techniques for ambiguity resolution. • Psychology: What linguistic constructions are easy or difficult for people to learn to use? • Philosophy: What is the meaning, and how do words and sentences acquire it? Natural Language Processing
Some Buzz-Words • NLP – Natural Language Processing • CL – Computational Linguistics • SP – Speech Processing • HLT – Human Language Technology • NLE – Natural Language Engineering • SNLP – Statistical Natural Language Processing • Other Areas: • Speech Generation, Text Generation, Speech Understanding, Information Retrieval, • Dialogue Processing, Inference, Spelling Correction, Grammar Correction, • Text Summarization, Text Categorization, Natural Language Processing
Some NLP Applications • Machine Translation – Translation between two natural languages. • Babel Fish translations system, Systran • Information Retrieval – Web search (uni-lingual or multi-lingual). • Query Answering/Dialogue – Natural language interface with a database system, or a dialogue system. • Report Generation – Generation of reports such as weather reports. • Other Applications – • Grammar Checking, Spell Checking, Spell Corrector Natural Language Processing
The Big Picture Source Language Speech Signal Target Language Speech Signal Speech recognition Speech Synthesis Target text Generation Source text Analysis Natural Language Processing
The Reductionist Approach Source Language Analysis Target Language Generation Text Normalization Text Rendering Morphological Analysis Morphological Synthesis POS Tagging Phrase Generation Parsing Role Ordering Semantic Analysis Lexical Choice Discourse Analysis Discourse Planning Natural Language Processing
Natural Language Understanding Words Morphological Analysis Morphologically analyzed words(another step: POS tagging) Syntactic Analysis Syntactic Structure Semantic Analysis Context-independent meaning representation Discourse Processing Final meaning representation Natural Language Processing
Natural Language Generation Meaning representation Utterance Planning Meaning representations for sentences Sentence Planning and Lexical Choice Syntactic structures of sentences with lexical choices Sentence Generation Morphologically analyzed words Morphological Generation Words Natural Language Processing
Natural Language Generation • NLG is the process of constructing natural language outputs from non-linguistic inputs. • the reverse process of NL understanding. • A NLG system may have two main parts: • Discourse Planning -- what will be generated, • Surface Realization -- realizes a sentence from its internal representation. • Lexical Choice • selecting the correct words describing the concepts. Natural Language Processing
Machine Translation • Machine Translation -- converting a text in language A into the corresponding text in language B (or speech). • Different Machine Translation architectures: • interlingua based systems • transfer based systems • How to acquire the required knowledge resources such as mapping rules and bi-lingual dictionary? By hand or acquire them automatically from corpora. • Example Based Machine Translation acquires the required knowledge (some of it or all of it) from corpora. Natural Language Processing
Some statistics (old) • Business e-mail sent per day in the US: 2.1Billion • First class mail per year: 107 Billion • Text on Internet • (2/99): > 6TB • Current: ? • indexed: 16% (Lawrence and Giles, Nature 400, 1999) • Dialog (www.dialog.com): 9 TB • Average college library: 1 TB Natural Language Processing
Languages • Languages: 39,000 languages and dialects (22,000 dialects in India alone) • Top languages: • Chinese/Mandarin (885M), • Spanish (332M), • English (322M), • Bengali (189M), • Hindi (182M), • Portuguese (170M), Russian (170M), Japanese (125M) • Source: www.sil.org/ethnologue, www.nytimes.com • Internet: English (128M), Japanese (19.7M), German (14M), Spanish (9.4M), French (9.3M), Chinese (7.0M) • Usage: English (1999-54%, 2001-51%, 2003-46%, 2005-43%) • Source: www.computereconomics.com Natural Language Processing