Natural Language Processing CIS 479/579 Bruce R. Maxim UM-Dearborn
Eliza • In 1966 Weizenbaum developed a program that simulates the behavior of a Rogerian, non-directive psychotherapist • The program seemed to be able to understand anything typed in by the user
Eliza • The program was actually fairly “dumb” in modern AI terms • Its “understanding” was the result of programming trickery • Its weaknesses were caused by relying almost exclusively on the premise that the syntax of a sentence captured its semantic meaning
Eliza Algorithm • Keep track of the two most recent entries from the user • Remove all punctuation from these entries and check for duplicate entries • Make some synonym replacements from a list of pairs (e.g. big for huge) • Change pronouns (e.g. I and me to you)
Eliza Algorithm • Search for keywords in the edited entries • If a keyword is found, copy everything following the keyword from the user’s entry • If no keywords are found, generate a non-committal response
Eliza Algorithm • The nature of the non-committal response depends on whether a stored concept (e.g. “mother” or “hate”) exists
if stored concept X exists then
    2/5 of time: “let’s discuss X”
    3/5 of time: “earlier you said X”
else
    give some response like “I see”
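A minimal sketch of the algorithm in Python (the substitution tables, keywords, and response templates below are illustrative stand-ins, not Weizenbaum’s original script):

import random
import re

# Illustrative tables; the original Eliza script was far larger
SYNONYMS = {"huge": "big", "mom": "mother"}
PRONOUNS = {"i": "you", "me": "you", "my": "your", "am": "are"}
CONCEPTS = ("mother", "hate")
KEYWORDS = ("because", "always", "feel")

def normalize(word):
    word = SYNONYMS.get(word, word)      # synonym replacement
    return PRONOUNS.get(word, word)      # pronoun reversal

def respond(entry):
    # Remove punctuation, then apply the substitutions above
    words = [normalize(w) for w in re.sub(r"[^\w\s]", "", entry.lower()).split()]
    # Search for a keyword; if found, echo everything after it
    for kw in KEYWORDS:
        if kw in words:
            return "Why do you say " + " ".join(words[words.index(kw) + 1:]) + "?"
    # Otherwise give a non-committal response, weighted as on the slide
    for concept in CONCEPTS:
        if concept in words:
            if random.random() < 0.4:                     # 2/5 of the time
                return "Let's discuss your " + concept
            return "Earlier you mentioned " + concept     # 3/5 of the time
    return "I see"

print(respond("I feel sad because my mom hates my dog"))
# Why do you say your mother hates your dog?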
Mechanical Translation • Based on syntax (surface structure) of sentence and has no real understanding of sentence meaning (semantics) • Cold War attempts failed with idiomatic speech:
“the spirit is willing but the flesh is weak”
“time flies like an arrow”
“fruit flies like a banana”
Issues in Understanding Natural Language • A large amount of human knowledge is assumed by the person generating a sentence • Language is pattern-based; phonemes, words, and sentences cannot be randomly ordered in normal communication • Language acts are agent-based and agents are embedded in complex social environments
Language Components • Prosody • Deals with the rhythm and intonation of language, hard to formalize • Phonology • Examines sounds combined to form language, important to computerized speech recognition and generation • Morphology • Components that makeup words, prefixes, suffixes, word tense, word number, parts of speech
Language Components • Syntax • Rules for combining words into sentences and the use of these rules to parse and generate sentences • Most easily formalized and so far the most successfully automated component of linguistic analysis • Semantics • Considers meaning of words and sentences and ways meaning is conveyed in natural language expressions
Language Components • Pragmatics • Study of ways in which language is used and its effects on the listener (e.g. the reason “yes” is not a good answer to “do you know what time it is?”) • World knowledge • Includes knowledge of physical world, the conventions of social discourse, and the role of intentions in communication
Stages of Language Analysis • Parsing • Analyzes the syntactic structure of sentences (e.g. identifies components and makes sure sentences are well formed) • Often builds a parse tree • Semantic interpretation • Produces a representation of the meaning of the sentence often represented as semantic network or conceptual graph • May also use frames, conceptual dependency, or predicate logic
Stages of Language Analysis • World knowledge representation • Structures from the knowledge base are used to augment the internal representation of the sentence to allow meaning to be more accurately inferred • Note: these phases are not purely sequential and may proceed concurrently in many systems
Syntax • Many parsers are built assuming that the production rules can be expressed as a context-free grammar
s ::= np vp
np ::= noun | art noun
vp ::= verb | verb np
art ::= a | the
noun ::= man | dog
verb ::= likes | bites
Parse Tree [Figure: parse tree for a sentence such as “the dog bites the man”: s → np vp; np → art noun; vp → verb np]
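A minimal recursive-descent parser for this grammar, sketched in Python (the nested-tuple tree format is an illustrative choice):

ARTICLES = {"a", "the"}
NOUNS = {"man", "dog"}
VERBS = {"likes", "bites"}

def parse_np(tokens):
    # np ::= noun | art noun ; returns (tree, remaining tokens) or None
    if len(tokens) > 1 and tokens[0] in ARTICLES and tokens[1] in NOUNS:
        return ("np", ("art", tokens[0]), ("noun", tokens[1])), tokens[2:]
    if tokens and tokens[0] in NOUNS:
        return ("np", ("noun", tokens[0])), tokens[1:]
    return None

def parse_vp(tokens):
    # vp ::= verb | verb np ; tries the longer rule first
    # (a full parser would backtrack between the alternatives)
    if not tokens or tokens[0] not in VERBS:
        return None
    result = parse_np(tokens[1:])
    if result:
        np_tree, rest = result
        return ("vp", ("verb", tokens[0]), np_tree), rest
    return ("vp", ("verb", tokens[0])), tokens[1:]

def parse_s(tokens):
    # s ::= np vp ; succeeds only if every token is consumed
    result = parse_np(tokens)
    if result:
        np_tree, rest = result
        vp_result = parse_vp(rest)
        if vp_result and vp_result[1] == []:
            return ("s", np_tree, vp_result[0])
    return None

print(parse_s("the dog bites the man".split()))
# ('s', ('np', ('art', 'the'), ('noun', 'dog')),
#       ('vp', ('verb', 'bites'), ('np', ('art', 'the'), ('noun', 'man'))))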
Transition Network Parsers • Represents a grammar as a set of finite state machines or transition networks • Each network corresponds to a single non-terminal in a grammar • Arcs are labeled with either terminal or non-terminal symbols • Each path through the network corresponds to a rule for that non-terminal
Transition Network Parsers • Finding a successful path through the network corresponds to replacing the non-terminal with the RHS of the rule • The parser must find a path from the start symbol to the final symbols • Terminals must match exactly • The network pieces are assembled until the entire sentence is represented
Part of Transition Network [Figure: one network per non-terminal; the s network runs sinit → np → vp → sfinal, the np network runs sinit → art → noun → sfinal (or sinit → noun → sfinal), and the vp network runs sinit → verb → sfinal (or sinit → verb → np → sfinal)]
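A sketch of the idea in Python, with each non-terminal’s network stored as labeled arcs; an arc whose label names another network triggers a recursive call (the state names and word table are illustrative):

WORDS = {"a": "art", "the": "art", "man": "noun", "dog": "noun",
         "likes": "verb", "bites": "verb"}

# Each network is a list of arcs (source state, label, destination state)
NETWORKS = {
    "S":  [("init", "NP", "s1"), ("s1", "VP", "final")],
    "NP": [("init", "art", "n1"), ("init", "noun", "final"), ("n1", "noun", "final")],
    "VP": [("init", "verb", "v1"), ("v1", "NP", "final")],
}
FINAL = {"S": {"final"}, "NP": {"final"}, "VP": {"v1", "final"}}

def traverse(network, tokens, state="init"):
    # Yield the tokens left over after each successful path to a final state
    if state in FINAL[network]:
        yield tokens
    for src, label, dst in NETWORKS[network]:
        if src != state:
            continue
        if label in NETWORKS:                            # non-terminal arc: recurse
            for rest in traverse(label, tokens):
                yield from traverse(network, rest, dst)
        elif tokens and WORDS.get(tokens[0]) == label:   # terminal arc: match a word
            yield from traverse(network, tokens[1:], dst)

def accepts(sentence):
    # Accepted if some path through S consumes every token
    return any(rest == [] for rest in traverse("S", sentence.split()))

print(accepts("the dog bites the man"))  # True
print(accepts("dog the bites"))          # False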
Chomsky Hierarchy • Regular Grammars • Can be recognized using finite state machines • Rules cannot have more than one non-terminal on the right-hand side • Are not powerful enough to represent even programming language syntax
Chomsky Hierarchy • Context-free Grammars • Rules have only one non-terminal on their left-hand side • Rules can have more than one non-terminal on the right-hand side • Can have recursive rules • Can be parsed by transition network parsers or pushdown automata
Chomsky Hierarchy • Context-Sensitive Grammars • Allow more than one non-terminal on their left-hand side • Make it possible to define a context in which the rule can be applied • Must be non-erasing (i.e. the RHS is never shorter than the LHS) • Can be recognized using a linear bounded automaton (i.e. a Turing machine with a finite tape)
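The standard example (not from the slides) of a context-sensitive language is a^n b^n c^n, which no context-free grammar can generate; a sketch of a recognizer, with one textbook context-sensitive grammar for the language in the comments:

# One textbook context-sensitive grammar for a^n b^n c^n; note the
# multi-symbol left-hand sides, which no context-free rule may have:
#   S  -> aSBC | aBC
#   CB -> BC
#   aB -> ab   bB -> bb   bC -> bc   cC -> cc
def accepts_anbncn(s):
    # Recognize a^n b^n c^n for n >= 1, well within a linear
    # bounded automaton's power since only counting is needed
    n = s.count("a")
    return n > 0 and s == "a" * n + "b" * n + "c" * n

print(accepts_anbncn("aabbcc"))   # True
print(accepts_anbncn("aabbbcc"))  # False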
Chomsky Hierarchy • Unrestricted or Recursively Enumerable Grammars • There are no restrictions on the rules • Can be recognized using a Turing machine with an infinite tape • Not very useful for defining the syntax of natural language in AI programming
Context-Sensitive Grammars • Increase the number of rules and non-terminals in a grammar • Obscure the phrase structure of the language more than context-free rules do • By attempting to add more checks for things like agreement and semantic consistency, they lose some of the separation between syntax and semantics
Context-Sensitive Grammars • They still do not address the problem of needing to build a semantic representation of the sentence • The parser only accepts or rejects a sentence based on its syntax
Augmented Transition Network Parser • Add a set of registers to a regular transition network to give the ATN the ability to store partially developed parse trees • Allow conditional execution of arcs (e.g. test before calling) • Attach actions to nodes capable of modifying the data structures returned • In short, the recognizer becomes a full Turing machine
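A minimal sketch of the register-and-test idea in Python: each “arc” here runs a test before being taken (subject-verb number agreement) and an action that fills a register (the feature table and register names are illustrative):

NUMBER = {"dog": "singular", "dogs": "plural",
          "man": "singular", "men": "plural",
          "bites": "singular", "bite": "plural"}

def parse_np_atn(tokens, registers, role):
    # Arc action: store the NP and its number feature in the registers
    if len(tokens) >= 2 and tokens[0] in {"a", "the"} and tokens[1] in NUMBER:
        registers[role] = (tokens[0], tokens[1])
        registers[role + "_number"] = NUMBER[tokens[1]]
        return tokens[2:]
    return None

def parse_s_atn(tokens):
    registers = {}                       # holds the partially developed parse
    tokens = parse_np_atn(tokens, registers, "subject")
    if not tokens:
        return None
    # Arc test: the verb must agree in number with the subject register
    if NUMBER.get(tokens[0]) != registers["subject_number"]:
        return None
    registers["verb"] = tokens[0]        # arc action: fill the verb register
    tokens = parse_np_atn(tokens[1:], registers, "object")
    return registers if tokens == [] else None

print(parse_s_atn("the dog bites the man".split()))
# {'subject': ('the', 'dog'), 'subject_number': 'singular', 'verb': 'bites',
#  'object': ('the', 'man'), 'object_number': 'singular'}
print(parse_s_atn("the dog bite the man".split()))  # None: agreement test fails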
Combining Syntax and Semantic Knowledge • The semantic interpreter constructs its interpretation by beginning at the root node of the structure returned by the ATN parser and traversing the “parse tree” • At each node the semantic interpreter interprets the children recursively and combines the results into a single conceptual graph that is passed up the parse tree • The semantic interpreter makes use of a domain-specific knowledge base to build this conceptual graph
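A sketch of that recursive, bottom-up combination over the nested-tuple trees from the parser sketch earlier; the case-role names and the dict-based graph format are illustrative, not a full conceptual-graph implementation:

# Toy domain-specific knowledge base: each verb's concept type and case roles
KNOWLEDGE_BASE = {
    "bites": {"type": "bite", "roles": ("agent", "patient")},
    "likes": {"type": "like", "roles": ("agent", "patient")},
}

def interpret(node):
    # Interpret the children recursively, then combine the results
    label, *children = node
    if label == "s":                 # subject NP fills the verb's agent role
        np_graph, vp_graph = (interpret(c) for c in children)
        vp_graph[vp_graph["roles"][0]] = np_graph
        return vp_graph
    if label == "vp":
        verb_graph = interpret(children[0])
        if len(children) == 2:       # object NP fills the patient role
            verb_graph[verb_graph["roles"][1]] = interpret(children[1])
        return verb_graph
    if label == "np":
        return {"concept": children[-1][1]}   # the noun is the head concept
    if label == "verb":
        return dict(KNOWLEDGE_BASE[children[0]])
    raise ValueError(label)

tree = ("s", ("np", ("art", "the"), ("noun", "dog")),
             ("vp", ("verb", "bites"), ("np", ("art", "the"), ("noun", "man"))))
print(interpret(tree))
# {'type': 'bite', 'roles': ('agent', 'patient'),
#  'patient': {'concept': 'man'}, 'agent': {'concept': 'dog'}}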
Natural Language Processing Applications • Natural language queries against relational databases • Improving free text web searches • Natural language report generators whose input is data files or reports • Realistic adventure game dialogs • News filters and surveillance tools