Natural Language Processing Artificial Intelligence CMSC 25000 February 28, 2002
Agenda • Why NLP? • Goals & Applications • Challenges: Knowledge & Ambiguity • Key types of knowledge • Morphology, Syntax, Semantics, Pragmatics, Discourse • Handling Ambiguity • Syntactic Ambiguity: Probabilistic Parsing • Semantic Ambiguity: Word Sense Disambiguation • Conclusions
Why Language? • Natural Language in Artificial Intelligence • Language use as a distinctive feature of human intelligence • Infinite utterances from finite means • Diverse languages with fundamental similarities • “Computational linguistics” • Communicative acts • Inform, request, ...
Why Language? Applications • Machine Translation • Question-Answering • Database queries to web search • Spoken language systems • Intelligent tutoring
Knowledge of Language • What does it mean to know a language? • Know the words (lexicon) • Pronunciation, Formation, Conjugation • Know how the words form sentences • Sentence structure, Compositional meaning • Know how to interpret the sentence • Statement, question,.. • Know how to group sentences • Narrative coherence, dialogue
Word-level Knowledge • Lexicon: • List of legal words in a language • Part of speech: • noun, verb, adjective, determiner • Example: • Noun -> cat | dog | mouse | ball | rock • Verb -> chase | bite | fetch | bat • Adjective -> black | brown | furry | striped | heavy • Determiner -> the | that | a | an
Word-level Knowledge: Issues • Issue 1: Lexicon Size • Potentially HUGE! • Controlling factor: morphology • Store base forms (roots/stems) • Use morphological processes to generate/analyze • E.g. dog: dog(s); sing: sings, sang, sung, singing, singer, ... • Issue 2: Lexical ambiguity • rock: N/V; dog: N/V • “Time flies like a banana”
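The base-form strategy above can be sketched in a few lines. This is a toy illustration, not real morphology: the lexicon entries and the single plural rule are assumptions chosen to mirror the slide's examples, including "rock" carrying two parts of speech.

```python
# Base-form lexicon; a word may carry several parts of speech
# (lexical ambiguity, e.g. "rock" as noun or verb).
LEXICON = {
    "dog":  {"Noun", "Verb"},
    "rock": {"Noun", "Verb"},
    "sing": {"Verb"},
    "cat":  {"Noun"},
}

def pluralize(noun):
    """Toy morphological rule: regular English plural from the base form."""
    if noun.endswith(("s", "x", "ch", "sh")):
        return noun + "es"
    return noun + "s"

print(pluralize("dog"))         # dogs
print(pluralize("fox"))         # foxes
print(sorted(LEXICON["rock"]))  # ['Noun', 'Verb']
```

Storing only base forms keeps the lexicon small; surface forms are generated (or analyzed) on demand by the rules.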
Sentence-level Knowledge: Syntax • Language models • More than just words: “banana a flies time like” • Formal vs natural: grammar defines language • Chomsky Hierarchy: • Recursively Enumerable: any rules • Context-Sensitive: AB -> BA • Context-Free: A -> aBc • Regular: S -> aS (regular expressions: a*b*)
Syntactic Analysis: Grammars • Natural vs Formal languages • Natural languages have degrees of acceptability • ‘It ain’t hard’; ‘You gave what to whom?’ • Grammar combines words into phrases • S-> NP VP • NP -> {Det} {Adj} N • VP -> V | V NP | V NP PP
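The phrase-structure rules above can be turned directly into a sentence generator. A minimal sketch, using the slide's grammar with the optional Det/Adj expanded into explicit alternatives and the PP rule omitted for brevity; the word lists are the slide's example lexicon.

```python
import random

# Toy CFG from the slides: nonterminals map to lists of alternative
# right-hand sides; anything not in GRAMMAR is a terminal word.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["N"], ["Det", "N"], ["Det", "Adj", "N"]],
    "VP":  [["V"], ["V", "NP"]],
    "Det": [["the"], ["a"]],
    "Adj": [["black"], ["furry"]],
    "N":   [["cat"], ["mouse"]],
    "V":   [["chased"], ["bit"]],
}

def generate(symbol="S"):
    """Expand a symbol by picking one alternative per nonterminal."""
    if symbol not in GRAMMAR:          # terminal word: emit as-is
        return [symbol]
    expansion = random.choice(GRAMMAR[symbol])
    return [word for sym in expansion for word in generate(sym)]

print(" ".join(generate()))  # e.g. "the black cat chased a mouse"
```

The same rule table, read in the other direction, is what a parser uses to recover structure from a word string.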
Syntactic Analysis: Parsing • Recover phrase structure from sentence • Based on grammar • E.g. [S [NP [Det The] [Adj black] [N cat]] [VP [V chased] [NP [Det the] [Adj furry] [N mouse]]]]
Syntactic Analysis: Parsing • Issue 1: Complexity • Solution 1: Chart parser - dynamic programming • O(n³) in sentence length n • Issue 2: Structural ambiguity • ‘I saw the man on the hill with the telescope’ • Is the telescope on the hill? • Solution 2 (partial): Probabilistic parsing
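The dynamic-programming chart behind that O(n³) bound can be sketched as a CKY-style recognizer. A toy version, assuming a grammar already in Chomsky normal form (binary rules only) and the slide's example lexicon; the three nested loops over span, start point, and split point give the cubic cost.

```python
# Binary rules (A, B, C) mean A -> B C; LEX maps words to preterminals.
BINARY = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]
LEX = {"the": {"Det"}, "cat": {"N"}, "mouse": {"N"}, "chased": {"V"}}

def cky_recognize(words):
    n = len(words)
    # chart[i][j]: set of nonterminals that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEX.get(w, ()))
    for span in range(2, n + 1):          # O(n) span lengths
        for i in range(n - span + 1):     # O(n) start points
            j = i + span
            for k in range(i + 1, j):     # O(n) split points
                for a, b, c in BINARY:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(a)
    return "S" in chart[0][n]

print(cky_recognize("the cat chased the mouse".split()))  # True
```

A full parser would additionally store backpointers in each chart cell so the tree (or all trees, under ambiguity) can be read back out.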
Semantic Analysis • Grammatical ≠ Meaningful • “Colorless green ideas sleep furiously” • Compositional Semantics • Meaning of a sentence built from meanings of its subparts • Associate semantic interpretation with syntactic structure • E.g. Nouns are variables (themselves): cat, mouse • Adjectives: unary predicates: Black(cat), Furry(mouse) • Verbs: multi-place predicates: VP: λx. chased(x, Furry(mouse)) • Sentence: (λx. chased(x, Furry(mouse)))(Black(cat)) • = chased(Black(cat), Furry(mouse))
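The lambda-calculus step above maps directly onto function application in code. A minimal sketch building the slide's string representations, where the VP denotes a function waiting for its subject:

```python
def chased(x, y):
    """Two-place predicate, rendered as a logical-form string."""
    return f"chased({x},{y})"

# VP "chased the furry mouse" denotes: lambda x . chased(x, Furry(mouse))
vp = lambda subj: chased(subj, "Furry(mouse)")

# S: apply the VP meaning to the subject NP meaning "Black(cat)"
print(vp("Black(cat)"))  # chased(Black(cat),Furry(mouse))
```

Composition thus mirrors the parse: each syntactic combination of constituents corresponds to one function application on the semantic side.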
Semantic Ambiguity • Examples: • I went to the bank- • of the river • to deposit some money • He banked • at First Union • the plane • Interpretation depends on • Sentence (or larger) topic context • Syntactic structure
Pragmatics & Discourse • Interpretation in context • Act accomplished by utterance • “Do you have the time?”, “Can you pass the salt?” • Requests with non-literal meaning • Also includes politeness, performatives, etc. • Interpretation of multiple utterances • “The cat chased the mouse. It got away.” • Resolve referring expressions
Natural Language Understanding • Pipeline: Input → Tokenization/Morphology → Parsing → Semantic Analysis → Pragmatics/Discourse → Meaning • Key issues: • Knowledge • How to acquire this knowledge of language? • Hand-coded? Automatically acquired? • Ambiguity • How to determine the appropriate interpretation? • Pervasive, preference-based
Handling Syntactic Ambiguity • Natural language syntax • Varied, has DEGREES of acceptability • Ambiguous • Probability: framework for preferences • Augment original context-free rules: PCFG • Add probabilities to rules: • NP -> N 0.20 | NP -> Det N 0.65 | NP -> Det Adj N 0.10 | NP -> NP PP 0.05 • VP -> V 0.45 | VP -> V NP 0.45 | VP -> V NP PP 0.10 • S -> NP VP 0.85 | S -> S conj S 0.15 • PP -> P NP 1.0
PCFGs • Learning probabilities • Strategy 1: Write (manual) CFG • Use treebank (collection of parse trees) to find probabilities • Strategy 2: Use larger treebank (+ linguistic constraints) • Learn rules & probabilities (inside-outside algorithm) • Parsing with PCFGs • Rank parse trees based on probability • Provides graceful degradation • Can get some parse even for unusual constructions, just with low probability
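Strategy 1 reduces to counting: estimate each rule's probability as its relative frequency among all expansions of the same left-hand side, P(A → β) = count(A → β) / count(A). A sketch on an invented stand-in for a real treebank (the rule occurrences below are toy data):

```python
from collections import Counter

# Every rule occurrence extracted from the trees of a toy "treebank".
treebank_rules = [
    ("S", "NP VP"), ("NP", "Det N"), ("VP", "V NP"), ("NP", "Det N"),
    ("S", "NP VP"), ("NP", "N"), ("VP", "V"),
]

lhs_counts = Counter(lhs for lhs, _ in treebank_rules)   # count(A)
rule_counts = Counter(treebank_rules)                    # count(A -> beta)
prob = {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

print(prob[("NP", "Det N")])  # 2 of 3 NP expansions
```

With this estimator the probabilities for each left-hand side sum to 1 by construction; Strategy 2's inside-outside algorithm instead re-estimates these numbers iteratively from unparsed text.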
Parse Ambiguity • Two parse trees for “I saw the man with the telescope”: • T1, PP attached to VP: [S [NP [N I]] [VP [V saw] [NP [Det the] [N man]] [PP [P with] [NP [Det the] [N telescope]]]]] • T2, PP attached to NP: [S [NP [N I]] [VP [V saw] [NP [NP [Det the] [N man]] [PP [P with] [NP [Det the] [N telescope]]]]]]
Parse Probabilities • T(ree), S(entence), n(ode), R(ule): P(T, S) = ∏n∈T p(r(n)) • P(T1) = 0.85 × 0.2 × 0.10 × 0.65 × 1.0 × 0.65 ≈ 0.007 • P(T2) = 0.85 × 0.2 × 0.45 × 0.05 × 0.65 × 1.0 × 0.65 ≈ 0.002 • Select T1 • Best systems achieve 92-93% accuracy
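The product-of-rules computation can be checked directly. A sketch for the telescope sentence, using one assignment of rule probabilities consistent with the slides' figures (the assignment itself is an assumption); T1 attaches the PP to the VP, T2 to the object NP:

```python
from math import prod

# Rule probabilities: (LHS, RHS) -> p (assumed values matching the slides).
P = {
    ("S", "NP VP"): 0.85, ("NP", "N"): 0.2, ("NP", "Det N"): 0.65,
    ("NP", "NP PP"): 0.05, ("VP", "V NP"): 0.45,
    ("VP", "V NP PP"): 0.10, ("PP", "P NP"): 1.0,
}

# T1: "saw [the man] [with the telescope]" -- PP under VP
t1 = [("S", "NP VP"), ("NP", "N"), ("VP", "V NP PP"),
      ("NP", "Det N"), ("PP", "P NP"), ("NP", "Det N")]
# T2: "saw [the man with the telescope]" -- PP under NP
t2 = [("S", "NP VP"), ("NP", "N"), ("VP", "V NP"), ("NP", "NP PP"),
      ("NP", "Det N"), ("PP", "P NP"), ("NP", "Det N")]

p1 = prod(P[r] for r in t1)   # tree probability = product of its rules
p2 = prod(P[r] for r in t2)
print(f"P(T1) = {p1:.4f}, P(T2) = {p2:.4f}")  # T1 wins
```

Note that T2 pays twice for the attachment: an extra NP expansion plus the rarer NP -> NP PP rule, which is why the VP attachment comes out more probable here.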
Semantic Ambiguity • “Plant” ambiguity • Botanical vs Manufacturing senses • Two types of context • Local: 1-2 words away • Global: a window of several sentences • Two observations (Yarowsky 1995) • One sense per collocation (local) • One sense per discourse (global)
Learn Disambiguators • Initialize small set of “seed” cases • Collect local context information (“collocations”) • E.g. within 2 words of “production”, 1 word of “seed” • Contexts = rules • Build decision list = rules ranked by mutual information • Iterate: label via decision list, collect new contexts • Label all entries in a discourse with the majority sense • Repeat
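The bootstrapping loop above can be sketched in miniature. Everything here is invented for illustration: the contexts, the seed rules, and the senses; ranking uses raw counts as a stand-in for the mutual-information (log-likelihood) scores of the real algorithm, and the one-sense-per-discourse relabeling step is omitted.

```python
from collections import Counter

# Toy data: each example is the set of words near an occurrence of "plant".
examples = [
    {"growth", "tree"}, {"tree", "leaf"}, {"leaf", "green"},
    {"factory", "worker"}, {"worker", "shift"}, {"shift", "steel"},
]
labels = [None] * len(examples)
decision_list = [("tree", "botany"), ("factory", "industry")]  # seed rules

for _ in range(4):  # iterate: label with the list, then grow the list
    for i, ctx in enumerate(examples):
        for word, sense in decision_list:   # first matching rule wins
            if word in ctx:
                labels[i] = sense
                break
    # Collect collocation rules from every example labeled so far,
    # ranked by frequency (stand-in for a mutual-information score).
    counts = Counter((w, s) for ctx, s in zip(examples, labels)
                     if s is not None for w in ctx)
    decision_list = [rule for rule, _ in counts.most_common()]

print(labels)
# ['botany', 'botany', 'botany', 'industry', 'industry', 'industry']
```

Each pass labels a few more examples via newly learned collocations ("leaf", then "shift"), which is exactly the spreading behavior that lets a handful of seeds cover the whole corpus.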
Disambiguate • For each new unlabeled case, • Use decision list to label • > 95% accurate on a set of highly ambiguous words • Also used for accent restoration in e-mail
Natural Language Processing • Goals: Understand and imitate a distinctive human capacity • Myriad applications: MT, Q&A, SLS • Key Issues: • Capturing knowledge of language • Automatic acquisition is the current focus: linguistics + ML • Resolving ambiguity, managing preferences • Apply (probabilistic) knowledge • Effective in constrained environments