Artificial Intelligence: Communication by natural language • Fall 2008 • professor: Luigi Ceccaroni
Communication • Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs. • What sets humans apart from other animals and machines is the complex system of structured messages known as natural language. • It enables us to communicate most of what we know about the world.
Natural language processing • In contrast with formal languages, natural languages such as Spanish, French and English have no strict definition. • They are used by a community of speakers. • Natural language processing (NLP) treats natural languages as if they were formal languages, in order to build computational systems able to understand and generate human language in all its forms.
Understanding speech acts • The action of producing language is called a speech act. • Understanding speech acts is much like other understanding problems, such as understanding images or diagnosing illnesses: • we are given a set of ambiguous inputs, • and from them we have to work backwards to decide what state of the world could have created these inputs.
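One common way to make "working backwards" precise, which the slide does not spell out, is Bayesian inference: treat the hidden state of the world s as the cause and the observed input i as its effect, then pick the most probable state given the input. The symbols s and i are our own notation for this sketch.

```latex
\hat{s} = \arg\max_{s} P(s \mid i)
        = \arg\max_{s} \frac{P(i \mid s)\, P(s)}{P(i)}
        = \arg\max_{s} P(i \mid s)\, P(s)
```

The denominator P(i) can be dropped because it is the same for every candidate state s.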
Fundamentals of language • A formal language is defined as a (possibly infinite) set of strings. • Each string is a concatenation of terminal symbols, sometimes called words. • Formal languages such as first-order logic and Java have strict mathematical definitions. • A grammar is a finite set of rules that specifies a language.
Fundamentals of language • Formal languages always have an official grammar, specified in some document. • Natural languages have no official grammar. • Linguists strive to discover properties of the language and then to codify their discoveries in a grammar. • To date, no linguist has succeeded completely.
Fundamentals of language • Descriptive linguists attempt to describe a language as it is. • Prescriptive grammarians try to dictate how a language should be. • They create rules, sometimes printed in style guides, that have little relevance to actual language usage.
Fundamentals of language • Both formal and natural languages associate a meaning, or semantics, with each valid string. • In natural languages, it is also important to understand the pragmatics of a string: • the actual meaning of the string as it is spoken in a given situation. • For example, there are very different ways to say “please”. • The meaning is not just in the words themselves, but in the interpretation of the words in situ.
Fundamentals of language • Most grammar rule formalisms are based on the idea of phrase structure: • Strings are composed of substrings called phrases, which come in different categories. • Examples of the category noun phrase, or NP: • “the king” • “the agent in the corner”
Fundamentals of language • Phrases usually correspond to natural semantic elements from which the meaning of an utterance can be constructed; for example, noun phrases refer to objects in the world. • Categorizing phrases helps us to describe the allowable strings of the language. • Any of these noun phrases can combine with a verb phrase (or VP) such as “is dead” to form a phrase of category sentence (or S).
Fundamentals of language • Without the intermediate notions of NP and VP, it would be difficult to explain why “the king is dead” is a sentence whereas “king the dead is” is not. • Category names such as NP, VP and S are called nonterminal symbols. • Grammars define nonterminals using rewrite rules, such as S → NP VP, which means that an S may consist of any NP followed by any VP.
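To make the idea of a grammar specifying a language concrete, here is a minimal Python sketch. It stores a toy grammar as a table of rewrite rules and enumerates the set of strings the grammar generates. The grammar, the categories beyond S, NP and VP, and the helper `expand` are our own illustrative assumptions. The prepositional phrase is deliberately restricted to P Det N so that the language stays finite.

```python
from itertools import product

# Toy grammar as rewrite rules (hypothetical, for illustration).
# Keys are nonterminals; each alternative is a tuple of symbols.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("Det", "N", "PP")],
    "PP": [("P", "Det", "N")],  # not P NP, so the language stays finite
    "VP": [("V", "Adj")],
    "Det": [("the",)],
    "N": [("king",), ("agent",), ("corner",)],
    "P": [("in",)],
    "V": [("is",)],
    "Adj": [("dead",)],
}


def expand(symbol):
    """Return the set of word tuples that can be derived from symbol."""
    if symbol not in GRAMMAR:  # terminal symbol: it derives only itself
        return {(symbol,)}
    strings = set()
    for rhs in GRAMMAR[symbol]:
        # Combine one derivation of each right-hand-side symbol, in order.
        for parts in product(*(expand(s) for s in rhs)):
            strings.add(tuple(word for part in parts for word in part))
    return strings


# The language is the set of terminal strings derivable from S.
LANGUAGE = {" ".join(words) for words in expand("S")}

print("the king is dead" in LANGUAGE)  # True
print("king the dead is" in LANGUAGE)  # False
```

Membership then reduces to a set lookup. Real grammars are recursive and generate infinite languages, so practical systems parse the input rather than enumerate every string.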
Levels of analysis in NLP • Lexico-morphological • Detecting lexical units and their morphological information • Syntactic • Checking whether a sentence is syntactically valid • Semantic • Building the global meaning of a sentence from the meanings of its individual units and their relations • Pragmatic • Relating a sentence to the line of discussion • Illocutionary • Relating a sentence to the speaker's intentions
Problems in NLP: examples • Lexical ambiguity • “reinventing the front wheel” • “wheel” can be a noun or a verb (resolved by part-of-speech tagging, or POS tagging) • “she saw the bank” • The building of a financial institution? Sloping land? A supply held in reserve for future use? (resolved by word sense disambiguation, or WSD)
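One classic word sense disambiguation technique is the simplified Lesk algorithm. It chooses the sense whose dictionary gloss shares the most content words with the surrounding context. The sketch below assumes hand-written glosses for three senses of “bank” (paraphrasing the slide), a tiny hypothetical stopword list, and no stemming, so it only illustrates the idea.

```python
# Hypothetical glosses for three senses of "bank" (paraphrasing the slide).
SENSES = {
    "bank/finance": "a financial institution that accepts deposits and lends money",
    "bank/land": "sloping land beside a body of water such as a river",
    "bank/reserve": "a supply held in reserve for future use, such as blood",
}

# A tiny, hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "the", "of", "in", "to", "for", "and", "from",
             "that", "such", "as"}


def content_words(text):
    """Lowercase, strip simple punctuation, and drop stopwords."""
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS


def simplified_lesk(context, senses=SENSES):
    """Return the sense whose gloss overlaps most with the context."""
    ctx = content_words(context)
    # Score each sense by the number of content words shared with the context.
    scores = {sense: len(ctx & content_words(gloss))
              for sense, gloss in senses.items()}
    return max(scores, key=scores.get)


print(simplified_lesk("She saw the bank from the river boat"))   # bank/land
print(simplified_lesk("She saw the bank to deposit her money"))  # bank/finance
```

Because matching is on exact word forms, “deposit” in the context does not match “deposits” in the gloss. Adding stemming or lemmatization is the usual fix.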
Problems in NLP: examples • Syntactic ambiguity • “He saw a man on the mountain top with binoculars” • Who has the binoculars? • “The seller of newspapers of the neighborhood” • Does “of the neighborhood” attach to “newspapers” or to “the seller”? (prepositional-phrase attachment, or PP attachment)
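A chart parser makes attachment ambiguity countable. The sketch below is a CYK-style chart that counts the parse trees licensed by a small grammar in Chomsky normal form, rather than building them. The grammar, `BINARY_RULES`, `LEXICON` and `count_parses` are hypothetical illustrations. To keep the grammar tiny, “on the mountain top” is shortened to “on the hill” and “binoculars” is treated directly as a noun phrase.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attached to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attached to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "he": {"NP"}, "saw": {"V"}, "a": {"Det"}, "the": {"Det"},
    "man": {"N"}, "hill": {"N"}, "on": {"P"}, "with": {"P"},
    "binoculars": {"NP"},
}


def count_parses(sentence, start="S"):
    """Count the distinct parse trees of category start for sentence."""
    words = sentence.split()
    n = len(words)
    # chart[i][j][X] = number of trees of category X spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n)]
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[i][i + 1][tag] += 1
    # Fill longer spans from every split point of shorter ones.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("he saw a man with binoculars"))              # 2
print(count_parses("he saw a man on the hill with binoculars"))  # 5
```

With this grammar, the short sentence has 2 parses because the PP can attach to the VP or to the NP. Adding “on the hill” raises the count to 5, since each new PP can attach at several points.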
Problems in NLP: examples • Semantic ambiguity • “He gave the children a cake” • One cake in total, or one for each child? (scope of quantification) • “Colorless green ideas sleep furiously” • A sentence composed by Noam Chomsky in 1957 as an example of a sentence whose grammar is correct but whose meaning is nonsensical. • It was used to show the inadequacy of the then-popular probabilistic models of grammar, and the need for more structured models.
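The two readings of “He gave the children a cake” can be written in first-order logic by changing the order of the quantifiers. This is a simplified sketch: the predicate names Cake, Child and Gave are our own, and collective readings are ignored.

```latex
% One cake for all the children: the existential quantifier takes wide scope
\exists c \, \big[\mathit{Cake}(c) \land \forall x \, \big(\mathit{Child}(x) \Rightarrow \mathit{Gave}(\mathit{he}, x, c)\big)\big]

% One cake per child: the universal quantifier takes wide scope
\forall x \, \big[\mathit{Child}(x) \Rightarrow \exists c \, \big(\mathit{Cake}(c) \land \mathit{Gave}(\mathit{he}, x, c)\big)\big]
```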
Problems in NLP: examples • References, ellipsis, pragmatics • “She gave him a book” • “We gave the monkeys the bananas because they were hungry” • “We gave the monkeys the bananas because they were over-ripe” • The two sentences have the same surface grammatical structure. However, the pronoun they refers to monkeys in the first sentence and to bananas in the second, and it is impossible to tell which without knowledge of the properties of monkeys and bananas.
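The monkeys-and-bananas example can be mimicked with a toy selectional-restriction check. Each predicate requires a semantic type, and the pronoun is resolved to the candidates whose known types satisfy that requirement. The tables `ENTITY_TYPES` and `PREDICATE_REQUIRES` are hypothetical world knowledge written by hand. Real systems need far broader knowledge than this.

```python
# Hypothetical world knowledge, written by hand for this example.
ENTITY_TYPES = {"monkeys": {"animal"}, "bananas": {"fruit"}}
PREDICATE_REQUIRES = {"hungry": "animal", "over-ripe": "fruit"}


def resolve_pronoun(candidates, predicate):
    """Keep the antecedent candidates whose types satisfy the predicate."""
    required = PREDICATE_REQUIRES[predicate]
    return [c for c in candidates if required in ENTITY_TYPES.get(c, set())]


candidates = ["monkeys", "bananas"]  # noun phrases preceding "they"
print(resolve_pronoun(candidates, "hungry"))     # ['monkeys']
print(resolve_pronoun(candidates, "over-ripe"))  # ['bananas']
```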
Problems in NLP: examples • Illocution (Where is the stress? What intentions?) • "I never said she stole my money" (stress on I): someone else said it, but I didn't. • "I never said she stole my money" (stress on never): I simply didn't ever say it. • "I never said she stole my money" (stress on said): I might have implied it in some way, but I never explicitly said it. • "I never said she stole my money" (stress on she): I said someone took it; I didn't say it was she. • "I never said she stole my money" (stress on stole): I just said she probably borrowed it. • "I never said she stole my money" (stress on my): I said she stole someone else's money. • "I never said she stole my money" (stress on money): I said she stole something, but not my money.
Statistical natural-language processing • Statistical NLP uses stochastic, probabilistic and statistical methods to resolve some of the difficulties discussed above. This applies especially to those that arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses. • Methods for disambiguation often involve the use of corpora and Markov models. • Statistical NLP comprises all quantitative approaches to automated language processing, including probabilistic modeling and information theory. • The technology for statistical NLP comes mainly from machine learning and data mining, both of which are fields of artificial intelligence that involve learning from data.
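As an illustration of disambiguation with Markov models, the sketch below runs the Viterbi algorithm over a hidden Markov model. It chooses part-of-speech tags for “reinventing the front wheel”, where “front” and “wheel” are ambiguous. All probabilities in `START`, `TRANS` and `EMIT` are invented for the example, not estimated from a corpus. In practice they are counted from a tagged corpus, and the products are computed in log space to avoid underflow.

```python
# Hypothetical HMM parameters, invented for this example.
START = {"VERB": 0.4, "DET": 0.4, "NOUN": 0.1, "ADJ": 0.1}
TRANS = {  # TRANS[previous][next] = P(next tag | previous tag)
    "VERB": {"DET": 0.6, "NOUN": 0.2, "ADJ": 0.1, "VERB": 0.1},
    "DET": {"NOUN": 0.6, "ADJ": 0.3, "VERB": 0.05, "DET": 0.05},
    "ADJ": {"NOUN": 0.7, "ADJ": 0.1, "VERB": 0.1, "DET": 0.1},
    "NOUN": {"VERB": 0.4, "NOUN": 0.3, "DET": 0.2, "ADJ": 0.1},
}
EMIT = {  # EMIT[tag][word] = P(word | tag); unlisted words get 0
    "VERB": {"reinventing": 0.1, "wheel": 0.01},
    "DET": {"the": 0.5},
    "ADJ": {"front": 0.05},
    "NOUN": {"front": 0.01, "wheel": 0.02},
}
TAGS = list(START)


def viterbi(words):
    """Return the most probable tag sequence for words under the HMM."""
    # best[t][tag] = (prob. of best path ending in tag at t, previous tag)
    best = [{tag: (START[tag] * EMIT[tag].get(words[0], 0.0), None)
             for tag in TAGS}]
    for t in range(1, len(words)):
        column = {}
        for tag in TAGS:
            prev, score = max(
                ((p, best[t - 1][p][0] * TRANS[p][tag]) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[tag] = (score * EMIT[tag].get(words[t], 0.0), prev)
        best.append(column)
    # Follow the back-pointers from the most probable final tag.
    tag = max(TAGS, key=lambda tg: best[-1][tg][0])
    path = [tag]
    for t in range(len(words) - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi("reinventing the front wheel".split()))
# ['VERB', 'DET', 'ADJ', 'NOUN']
```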
Major tasks and applications in NLP • Automatic summarization • Foreign language reading aid • Foreign language writing aid • Information extraction • Information retrieval (IR) • IR is concerned with storing, searching and retrieving information. • It is a separate field within computer science (closer to databases), but IR relies on some NLP methods (for example, stemming). • Some current research and applications seek to bridge the gap between IR and NLP.
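To show why IR borrows stemming from NLP, here is a deliberately crude suffix-stripping stemmer feeding a tiny inverted index. The suffix list, the minimum stem length, and the helpers `crude_stem` and `build_index` are our own assumptions. This is not the Porter stemmer or any library's algorithm, and it freely over-stems and under-stems.

```python
# Hypothetical suffix list, tried longest first; NOT the Porter stemmer.
SUFFIXES = sorted(("ational", "ing", "ed", "es", "ly", "s"), key=len, reverse=True)


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for token in text.split():
            index.setdefault(crude_stem(token.strip(".,")), set()).add(doc_id)
    return index


docs = ["Searching large collections",
        "The system searched the archive",
        "Storing records"]
index = build_index(docs)
print(index.get(crude_stem("searches"), set()))  # {0, 1}
```

Because “Searching”, “searched” and “searches” all reduce to the stem “search”, the query matches both documents even though neither contains the exact query word.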
Major tasks and applications in NLP • Machine translation • Automatically translating from one human language to another. • Named entity recognition (NER) • Given a stream of text, determining which items in the text map to proper names, such as people or places. • Although in English named entities are marked with capitalized words, many other languages do not use capitalization to distinguish named entities.
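The capitalization cue, and its limits, can be seen in a naive baseline. It proposes runs of capitalized tokens as named-entity candidates, skipping the sentence-initial word. The function `capitalized_candidates` is our own illustration of this heuristic, not a real NER system.

```python
def capitalized_candidates(sentence):
    """Group consecutive capitalized tokens (after the first) into candidates."""
    spans, current = [], []
    for i, token in enumerate(sentence.split()):
        word = token.strip(".,;:!?")
        # Sentence-initial capitalization is not evidence, so skip index 0.
        if i > 0 and word[:1].isupper():
            current.append(word)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


print(capitalized_candidates("Yesterday Luigi Ceccaroni taught in Barcelona."))
# ['Luigi Ceccaroni', 'Barcelona']
print(capitalized_candidates("Der Professor sah den Hund in Berlin"))
# ['Professor', 'Hund', 'Berlin']
```

German capitalizes every noun, so the same rule also returns the common nouns “Professor” and “Hund”. In any language, it misses an entity that opens the sentence.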
Major tasks and applications in NLP • Natural language generation • Natural language understanding • Optical character recognition (OCR) • Question answering • Given a human-language question, the task of producing a human-language answer. • The question may be closed-ended (such as "What is the capital of Canada?") or open-ended (such as "What is the meaning of life?").
Major tasks and applications in NLP • Speech recognition • Given a sound clip of a person or people speaking, the task of producing a text transcription of what they say. • (The opposite of text-to-speech.) • Spoken dialogue system • Text simplification • Text-to-speech • Text proofing
Resources • Natural language processing (in Spanish) [http://es.geocities.com/lenguajenatural/] • Introductory book [http://www.gelbukh.com/clbook/] • Resources for text, speech and language processing [http://www.cs.technion.ac.il/~gabr/resources/resources.html] • Natural language processing blog [http://nlpers.blogspot.com/]
Resources • About Opinion, Language, and Blogs [http://opinlab.wordpress.com/] • A comprehensive list of resources, classified by category [http://www.proxem.com/] • ACL Wiki for natural language processing and computational linguistics [http://aclweb.org/aclwiki/index.php?title=Main_Page]
Research and development groups • IBM NLP Research Area [http://domino.watson.ibm.com/comm/research.nsf/pages/r.nlp.html] • Microsoft Research: NLP [http://research.microsoft.com/nlp/] • Language Technologies Institute at Carnegie Mellon University [http://www.lti.cs.cmu.edu/] • Natural Language Group at the Information Sciences Institute [http://www.isi.edu/natural-language/] • Natural Language Generation Group at the Open University [http://mcs.open.ac.uk/nlg/]
Research and development groups • Survey of the State of the Art in Human Language Technology [http://cslu.cse.ogi.edu/HLTsurvey/] • University of Edinburgh Natural Language Processing Group [http://www.iccs.informatics.ed.ac.uk/] • Natural Language and Information Processing Group at the University of Cambridge [http://www.cl.cam.ac.uk/research/nl/] • Stanford Natural Language Processing Group [http://nlp.stanford.edu/] • UPC center for research and technology development on language and speech processing (TALP) [http://www.talp.cat/talp/]