Natural Language in AI
Outline • Text-based natural language • Dialogue-based natural language
Methods in Natural Language Processing Methods in NLP fall into two categories of tasks: • NL generation • NL understanding
Natural Language problems
• dialogue-based
  • NL interfaces
  • spoken and written communication
  • uses natural language understanding
  • discourse (any string more than one sentence long)
• text-based
  • text categorization, text generation, information extraction, machine translation
Text-based NL problems • story/text understanding; • information extraction: extracting information from text; • translating documents, manuals, communications; • drafting documents; • summarizing texts; • text generation, categorization or clustering, text DB retrieval, text mining, topic identification;
Text-based Natural Language Topics • Information extraction • Machine translation • Drafting • Text summarization
Information Extraction • Extracts specific types of information from large volumes of unrestricted text; • The IE system must be given domain guidelines that specify what to find and what to extract; • It looks for the portions of the text that are likely to contain the relevant information; • IE systems are not required to fully understand the source text;
Types of IE • Knowledge-based Information Extraction • Machine learning IE • Template-based, Wrappers • Template Mining
Types of IE
Knowledge-based Information Extraction
• uses linguistic patterns to support the interpretation of input texts
Machine learning IE
• uses an inductive learning mechanism to construct a knowledge base of patterns automatically
Types of IE
Template-based, Wrappers
• the IE output is a populated database, which can be used as a case base
• the slot values are strings taken from the source text
• the resulting database works as a template
Template Mining
• well suited to areas “where the text is terse and sentences are unambiguous and declarative in nature”
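The template idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the domain (management changes), the slot names, and the regular expressions are all hypothetical and do not come from any real IE system. It shows the two properties named on the slide: the output is a record of slots, and every slot value is a string copied from the source text.

```python
import re

# Hypothetical template for a "management change" domain. Slot names and
# patterns are invented for illustration only.
TEMPLATE_PATTERNS = {
    "person": re.compile(r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) was (?:named|appointed)"),
    "position": re.compile(r"(?:named|appointed) (?P<position>[a-z ]+?) of"),
    "company": re.compile(r" of (?P<company>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"),
}


def fill_template(text):
    """Return one record whose slot values are substrings of `text`.

    A slot is set to None when its pattern does not match; no attempt is
    made to understand the rest of the sentence.
    """
    record = {}
    for slot, pattern in TEMPLATE_PATTERNS.items():
        match = pattern.search(text)
        record[slot] = match.group(slot) if match else None
    return record


sentence = "Jane Smith was named chief executive of Acme Widgets on Monday."
record = fill_template(sentence)
print(record)
```

Running `fill_template` over many sentences and collecting the records would give the populated database the slide describes. Patterns like these only work on terse, declarative text, which is the limitation the Template Mining bullet points out.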
Relation between IE and NLP Using linguistic patterns: • knowledge-based (represents patterns) • inductive learning based (learns patterns) • template mining (skips parsing) • NLP is needed whenever disambiguation is required, or when negation and word order make a difference in meaning
Text-based Natural Language Topics • Information extraction • Machine translation • Drafting • Text summarization
Can you translate this sentence? Ever since computers were invented, it has been natural to wonder whether they might be able to learn. By Tom Mitchell
List the words you used in the translated sentence and associate them with the corresponding words in the source sentence
Portuguese: Desde que computadores foram inventados tem sido natural imaginar que eles sejam capazes de aprender.
English: Ever since computers were invented it has been natural to wonder whether they might be able to learn.
Online translators • http://babelfish.altavista.com/babelfish/tr • http://world.altavista.com/tr • http://www.systransoft.com/ • What’s wrong with them?
Can you translate this sentence? …cursing my head for things that I've said till I finally died, which started the whole world living…
What works? • The KANT project: Knowledge-based, Accurate Natural-language Translation • founded in 1989 to develop large-scale, practical translation systems for technical documentation • KANT project homepage: http://www.lti.cs.cmu.edu/Research/Kant/
KANT • uses a controlled vocabulary and grammar for each language • uses explicit yet focused semantic models for each technical domain • achieves very high accuracy in translation • supports multilingual document production • has been applied to two domains: electric power utility management and heavy equipment technical documentation
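To make the controlled-vocabulary idea concrete, here is a minimal sketch of a checker that flags words an author would have to rephrase before translation. It is not KANT's actual checker: the vocabulary, the function name `out_of_vocabulary`, and the tokenization are assumptions made for illustration.

```python
import re

# Hypothetical controlled vocabulary for a maintenance manual. A real
# controlled language would also restrict grammar and word senses.
CONTROLLED_VOCABULARY = {
    "close", "open", "check", "the", "valve", "engine",
    "oil", "level", "before", "starting",
}


def out_of_vocabulary(sentence, vocabulary=CONTROLLED_VOCABULARY):
    """Return the words of `sentence` that are not in the controlled vocabulary."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [word for word in words if word not in vocabulary]


print(out_of_vocabulary("Close the valve before starting the engine."))
print(out_of_vocabulary("Shut the valve prior to starting the engine."))
```

The first sentence passes with no flagged words. The second is flagged for "shut", "prior", and "to", so the author would rewrite it within the vocabulary before it reaches the translator.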
Machine Translation • Unrestricted MT is still inadequate. Will that ever change? • Why should MT aim to outperform human translation? • An alternative is to have humans edit the original document into a subset of the original language (canonical form)
Cost of MT • lexicons of 20,000-100,000 words • grammars with 100 to 10,000 rules
Text-based Natural Language Topics • Information extraction • Machine translation • Drafting • Text summarization
Drafting • applications in the legal domain • drafting of wills • petitions for restraining orders • use of rhetorical structure
Text-based Natural Language Topics • Information extraction • Machine translation • Drafting • Text summarization
Text summarization applications • Generate a summary of many documents; • Generate a summary of one document only; • Headline generation;
Text summarization The traditional idea of summarization is to extract sentences and concatenate them. Human beings, by contrast, produce summaries of documents by creating new sentences that capture the most salient pieces of information in the original document, that are grammatical, and that cohere with one another. Given that large collections of text/abstract pairs are available online, it is now possible to envision algorithms that are trained to mimic this process. From Knight, K. and Marcu, D. 2000.
Text summarization steps • Identify most relevant segments; • Apply rules for deleting redundant parts; • Compress/aggregate long sentences; • Assess coherence of segments; • Revise.
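The first two steps above (identify relevant segments, delete redundant parts) can be sketched as a simple extractive summarizer. This is an illustration under stated assumptions, not the method of Knight and Marcu: relevance is approximated by average content-word frequency, redundancy by word overlap, and the stopword list, the `overlap_limit` threshold, and all function names are hypothetical. Compression, coherence assessment, and revision are not attempted.

```python
import re
from collections import Counter

# Small illustrative stopword list, not a standard one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "in", "it", "that", "for", "on"}


def content_words(sentence):
    """Lowercased words of the sentence, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2, overlap_limit=0.6):
    """Pick high-scoring, non-redundant sentences and return them in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Step 1: relevance = average document frequency of the content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    chosen = []
    for candidate in sorted(sentences, key=score, reverse=True):
        words = set(content_words(candidate))
        # Step 2: skip a sentence that mostly repeats one already chosen.
        redundant = any(
            len(words & set(content_words(kept))) / max(len(words), 1) > overlap_limit
            for kept in chosen
        )
        if not redundant:
            chosen.append(candidate)
        if len(chosen) == max_sentences:
            break
    return [s for s in sentences if s in chosen]
```

For example, given two near-identical sentences about machine translation and one about summaries, the sketch keeps one of the duplicates plus the sentence about summaries.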
Dialogue-based natural language
NL Understanding
• Speech recognition: intonation, pronunciation, speed
• Natural Language Processing: syntactic, semantic, pragmatic analysis
Natural Language Generation
• intention, generation, speech synthesis
Speech recognition • analog signal from voice is digitized • identify phonemes produced • template matching attempts to match phonemes from a library of sounds with sounds produced • outcome is a list of phonemes and probabilities • find the words using hidden Markov modeling
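The last step, recovering the most likely hidden sequence from per-frame probabilities, is commonly done with the Viterbi algorithm over a hidden Markov model. The sketch below is a generic textbook Viterbi decoder, not a speech system: the two phoneme states, the template labels `t1` to `t3`, and all probabilities are invented for illustration, and the observation list is assumed to be non-empty.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state sequence, its probability) for the observations."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model: hidden states are phonemes, observations are matched template labels.
STATES = ("iy", "ih")
START_P = {"iy": 0.6, "ih": 0.4}
TRANS_P = {"iy": {"iy": 0.7, "ih": 0.3}, "ih": {"iy": 0.4, "ih": 0.6}}
EMIT_P = {
    "iy": {"t1": 0.5, "t2": 0.4, "t3": 0.1},
    "ih": {"t1": 0.1, "t2": 0.3, "t3": 0.6},
}

path, probability = viterbi(["t1", "t2", "t3"], STATES, START_P, TRANS_P, EMIT_P)
print(path, probability)
```

With these toy numbers the decoder returns the sequence iy, iy, ih with probability 0.01512. A real recognizer would work in log probabilities to avoid underflow and would add a word-level model on top of the phoneme sequence.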
“How to recognize speech” vs. “How to wreck a nice beach” • “Ice cream” vs. “I scream”
Speech Recognition Methods • speech recognition can also be implemented with an inductive method such as neural networks • recognizers handle either individual words or continuous speech • a controlled vocabulary can increase the chances of success, e.g., Jupiter • systems are often limited to one speaker; when multiple speakers must be supported, retraining is often necessary • speech understanding includes speech recognition and understanding of the recognized utterance
Natural Language Understanding - Syntactic Analysis - Parsing - Semantics - Pragmatics
Syntactic analysis • a parser recovers the phrase structure of an utterance, given a grammar (rules of syntax) • parser’s outcome is the structure (groups of words and respective parts of speech) • phrase structure is represented in a parse tree • Parsing is the first step towards determining the meaning of an utterance
Parsing • Parsing: method to analyze a sentence to determine its structure according to the grammar • Grammar: formal specification of the structures allowable in the language
Examples of Symbols in a Grammar • (S) sentence • (NP) noun phrase • (VP) verb phrase • (PP) prepositional phrase • (RelClause) relative clause • (Det) determiner
Grammar rules
S → NP VP
S → VP VP
S → VP PP
S → NP VP VP
VP → V S
VP → V NP
VP → V PP
VP → V Adjective
NP → Noun
NP → Det Noun
NP → Det Adjective N
NP → Adjective N
PP → P Noun
Dictionary entries:
V → ate
NAME → John
Det(art) → the
N → cat
Parsing Tree: “The terrain is insurmountable”
S
  NP
    Article: The
    Noun: terrain
  VP
    Verb: is
    Adjective: insurmountable
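A parse tree like the one for “The terrain is insurmountable” can be produced by a small top-down parser. The sketch below is a minimal backtracking parser, assuming a three-rule grammar and a four-word lexicon read off that tree. The names `GRAMMAR`, `LEXICON`, `parse`, and `parse_sentence` are hypothetical, and the approach would loop forever on left-recursive rules.

```python
# Rules and lexicon taken from the example tree only.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Article", "Noun"]],
    "VP": [["Verb", "Adjective"]],
}
LEXICON = {
    "the": "Article",
    "terrain": "Noun",
    "is": "Verb",
    "insurmountable": "Adjective",
}


def parse(symbol, words, start=0):
    """Yield (tree, next_index) for every way `symbol` can cover words from `start`."""
    if symbol not in GRAMMAR:
        # Part of speech: it must match the category of the next word.
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Extend partial analyses one right-hand-side symbol at a time.
        partials = [([], start)]
        for child in rhs:
            partials = [
                (children + [tree], nxt)
                for children, pos in partials
                for tree, nxt in parse(child, words, pos)
            ]
        for children, end in partials:
            yield (symbol, *children), end


def parse_sentence(sentence):
    """Return all parse trees that cover the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words) if end == len(words)]


print(parse_sentence("The terrain is insurmountable"))
```

The result is a list because a grammar with more alternatives can return several trees for one sentence. An empty list means the sentence is not covered by the grammar.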
• The outcome of syntactic analysis can still be a set of alternative structures with their respective probabilities • Sometimes grammar rules can disambiguate a sentence, e.g. “John set the set of chairs”; sometimes they cannot • The next step is semantic analysis
Semantic analysis • semantics provide a partial representation for meaning • represents the sentence in meaningful parts • uses possible syntactic structures and meaning • builds a parse tree with associated semantics • semantics typically represented with logic
Compositional semantics • The semantics of a phrase is a function of the semantics of its sub-phrases • It does not depend on any other phrase • So, if we know the meaning of sub-phrases, then we know the meaning of the phrases • “A goal of semantic interpretation is to find a way that the meaning of the whole sentence can be put together in a simple way from the meanings of the parts of the sentence.” (Alison, 1997 p. 112)
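Compositionality can be illustrated with a toy interpreter in which the meaning of each node is computed only from the meanings of its children. Everything here is an assumption made for illustration: the tuple-based tree format, the lexicon `LEXICAL_MEANING`, and the choice of strings such as `loves(john, mary)` as a stand-in for logical forms.

```python
# Hypothetical word meanings: names denote constants, verbs denote functions
# that build a logical form once their arguments are supplied.
LEXICAL_MEANING = {
    "John": "john",
    "Mary": "mary",
    "loves": lambda obj: lambda subj: f"loves({subj}, {obj})",
    "sleeps": lambda subj: f"sleeps({subj})",
}


def meaning(tree):
    """Compute the meaning of a tree from the meanings of its sub-trees only."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICAL_MEANING[children[0]]  # leaf: look the word up
    parts = [meaning(child) for child in children]
    if label == "S":  # S -> NP VP: apply the VP meaning to the subject
        subject, predicate = parts
        return predicate(subject)
    if label == "VP" and len(parts) == 2:  # VP -> V NP: apply the verb to its object
        verb, obj = parts
        return verb(obj)
    return parts[0]  # unary rules such as NP -> Name or VP -> V


tree = ("S", ("NP", ("Name", "John")), ("VP", ("V", "loves"), ("NP", ("Name", "Mary"))))
print(meaning(tree))
```

Note that `meaning` never looks outside the sub-tree it is given, which is the point of the second bullet: the meaning of a phrase does not depend on any other phrase.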
Semantic analysis • the transitivity of a verb constrains the meaning assigned to a parse tree (e.g., jump is intransitive, love is transitive) • “John died Mary”: die is intransitive, so is there a period missing after “died”, or was the intended sentence “John dyed Mary”?
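The died/dyed example can be sketched as a lookup in a verb lexicon that records whether each verb takes a direct object. The lexicon `VERB_FRAMES` and the function `interpret` are hypothetical, and treating every verb as strictly transitive or intransitive is a simplification.

```python
# Hypothetical lexicon: does the verb take a direct object?
VERB_FRAMES = {
    "died": "intransitive",
    "dyed": "transitive",
    "jumped": "intransitive",
    "loves": "transitive",
}


def interpret(subject, verb, next_word):
    """Use transitivity to decide whether `next_word` can be the verb's object."""
    if VERB_FRAMES[verb] == "transitive":
        return f"{verb}({subject}, {next_word})"
    return f"{verb}({subject}); '{next_word}' must start a new sentence"


print(interpret("John", "dyed", "Mary"))
print(interpret("John", "died", "Mary"))
```

With "dyed" the following word is read as the object. With "died" it cannot be, which signals either a missing sentence boundary or a misrecognized word.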
Pragmatic analysis • uses context • uses partial representation • includes purpose and performs disambiguation • Where, when, by whom an utterance was said