LINKS TO AI
Natural Language Understanding (NLU) • NLU systems have been built that exhibit a remarkable degree of practical utility (e.g. a computerized flight-booking system). • Growth in telecommunications gives rise to the demand for NLU systems, e.g. online banking. • There are difficulties in understanding a natural language sentence: • The same thing can be said in two different ways. • The role of a 'with' clause can be ambiguous. • Many words have multiple functions. • Word order matters.
Properties of NLU • NLU depends heavily on the availability of a large database of world knowledge. • Interpretations of sentences need to be revised in the light of further information. (Jane hit the boy with the ball.) • The implementation of NLU on a computer is broken down into a number of levels: • Syntactic level. • Semantic level. • Pragmatic level.
Syntactic Level • The syntactic level is concerned with the way words are structured into phrases and how phrases are structured into sentences. • Grammar rules are typically used to check that a sentence is allowed.
Semantic and Pragmatic Levels • Both levels are to do with meaning. • At the semantic level, the propositional content is extracted. For example: John kicked the ball. Did John kick the ball? Both sentences have the same propositional content, which in Prolog notation can be expressed as: kick(john, ball)
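A tiny Python illustration of the same point; the tuple below is an assumed stand-in for the Prolog fact kick(john, ball), not part of any particular system:

# Both surface forms carry the same propositional content; here it is
# written as a plain Python tuple rather than the Prolog fact kick(john, ball).
sentences = ["John kicked the ball.", "Did John kick the ball?"]
propositional_content = ("kick", "john", "ball")   # identical for both sentences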
Semantic and Pragmatic Levels • Pragmatic analysis deals with the interpretation of a sentence in context. The ‘He’ in the following sentence can only be resolved in context, that is, using the surrounding sentences. He scored a goal. • The intention of a speaker is also context-dependent. Can you make a cup of tea?
Syntactic Analysis • A piece of text is composed of sentences, and each sentence is composed of phrases, which may contain sub-phrases that eventually terminate as words. • An example grammar:
S -> NP VP | NP V
NP -> D AP | D N | NP PP
PP -> P NP
VP -> V NP | V PP
AP -> A AP | A N
where S: sentence, NP: noun phrase, VP: verb phrase, AP: adjective phrase, PP: prepositional phrase, D: determiner, N: noun, V: verb, A: adjective, P: preposition.
Syntactic Analysis • A sentence is parsed to extract the phrase structure and to test that the sentence conforms to the grammar. • A parse can be viewed as a search in which back-up states are maintained on a stack so that recovery from a dead end can be made. • Parsing is also a basic requirement for many aspects of computing, e.g. compiling a program, checking the syntax of a database query, etc.
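A minimal sketch in Python of such a top-down parse with backtracking, assuming a slightly simplified version of the example grammar: the left-recursive rule NP -> NP PP is replaced by NP -> D N PP, since a naive top-down parser cannot handle left recursion (the full grammar would need a chart parser such as Earley's). The lexicon and test sentence are invented for illustration.

# Simplified grammar: non-terminal -> list of alternative productions.
GRAMMAR = {
    "S":  [["NP", "VP"], ["NP", "V"]],
    "NP": [["D", "AP"], ["D", "N"], ["D", "N", "PP"]],
    "PP": [["P", "NP"]],
    "VP": [["V", "NP"], ["V", "PP"]],
    "AP": [["A", "AP"], ["A", "N"]],
}

# Toy lexicon: maps a word to its pre-terminal category.
LEXICON = {
    "the": "D", "a": "D",
    "dog": "N", "ball": "N", "boy": "N",
    "big": "A", "red": "A",
    "kicked": "V", "hit": "V",
    "with": "P",
}

def parse(symbol, words, pos):
    """Try to expand `symbol` starting at words[pos].
    Yields (parse_tree, next_position) for every way the expansion succeeds;
    yielding the alternatives lazily plays the role of the back-up stack."""
    if symbol in LEXICON.values():                    # pre-terminal category
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for production in GRAMMAR.get(symbol, []):        # try each rule in turn
        def expand(children, i, p):
            # Expand the production left to right, threading the position.
            if i == len(production):
                yield (symbol, children), p
                return
            for subtree, q in parse(production[i], words, p):
                yield from expand(children + [subtree], i + 1, q)
        yield from expand([], 0, pos)

def parse_sentence(sentence):
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

if __name__ == "__main__":
    for tree in parse_sentence("the boy kicked the ball"):
        print(tree)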
The symbol connectionist link • Neural networks learn a task by adapting to the stimuli they are fed. • A system based on learning has the opportunity to induce knowledge automatically and to discover task-specific knowledge that cannot easily be prescribed by a set of rules, e.g. how to ride a bicycle. • A net provides a graded response, and graded responses seem more appropriate for everyday tasks. • An ANN can still perform a task even if some part of the neural architecture is damaged.
The symbol connectionist link • Neural nets can be massively parallel architectures. • We communicate using symbols and much knowledge is conveyed in the process of this communication. • Knowledge expressed as rules can be very helpful in various situations. • Knowledge communicated in symbol form can also speed up learning.
The symbol connectionist link • Both the symbol and connectionist paradigms have attractive features for building intelligent systems. • Much research has been (and continues to be) carried out linking traditional AI or symbolic AI with neural nets. • A question of central importance to many connectionists is whether neural nets can be constructed to perform high-level cognitive tasks such as natural language processing and planning.
The symbol connectionist link • The features that are central to the success of the symbol paradigm are compositional structures, structure-sensitive processing and generalization. • Trees used in NLU are compositional structures. A complex task is tackled by decomposing the task into parts. • In a logical expression such as ~(P Λ Q) = (~P V ~Q), if P and Q are replaced by R and S, the operation is still possible, since it is the structural form that matters.
The symbol connectionist link • The arguments in love(animate, object) admit a whole host of objects as suitable constituents of the love relationship (generalization). • Next we will concentrate on connectionist approaches to these three features, which are seen as central to the symbolic domain.
Synthesizing symbols with neural networks • Connectionists have two different aims: • Harness the computational ability of neural networks and are not concerned with biological realism. • Use neural networks in an attempt to explain computational processes within the brain.
Example • Hayward, Tickle, and Diederich focus on a detailed analysis of rule representation in a single connectionist network. The task is to learn which combinations of noun-verb-noun are grammatical. • Rule representation in a connectionist network: if an ANN can decide whether a sentence is grammatically correct, it is in effect applying a grammatical rule to that sentence. NLU is therefore a natural test-bed for the rule-representation capabilities of ANNs.
Synthesizing symbols with neural networks • Connectionist networks have been examined for spoken language analysis due to their support for learning and fault-tolerance.
Recursive Autoassociative Memory (RAAM) • Introduced by J. Pollack (1990). • The purpose of an RAAM is to provide a connectionist representation of symbol structures. • All nodes in a tree should have the same number of emergent branches (the valence).
RAAM • An RAAM is an autoassociative backpropagation network. • The RAAM representation for a structure emerges from the activation values of the hidden layer. • RAAM can be thought of as providing two machines: 1) The first layer of weights provides a machine to construct a representation of any tree (encoder). 2) The second layer of weights provides a machine that takes a constructed representation and reconstructs its parts (decoder).
RAAM • Trees are recursive structures, and RAAM forms its representations in a recursive manner. • D, A, N, V and P are terminal symbols and are presented to the network as vectors. • D: Determiner A: Adjective N: Noun V: Verb P: Preposition • The input and output layers consist of fields. • The dimension of a field is determined by the number of bits of the symbol vectors. • For example D 10000 A 01000 N 00100 V 00010 P 00001
RAAM • The hidden layer provides a compressed representation because the number of units in this layer is less than the number of input units.
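A minimal numpy sketch of the idea for binary trees, assuming two five-bit fields (matching the one-hot symbol codes above) and a five-unit hidden layer; the weight initialization, learning rate and training loop are illustrative, not Pollack's actual settings.

import numpy as np

FIELD = 5          # width of each field = number of bits in a symbol code
HIDDEN = 5         # hidden layer is smaller than the input (2 * FIELD)

rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.1, size=(HIDDEN, 2 * FIELD))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(2 * FIELD, HIDDEN))   # decoder weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(left, right):
    """Compress two child representations into one hidden-layer vector."""
    return sigmoid(W_enc @ np.concatenate([left, right]))

def decode(rep):
    """Reconstruct the two children from a compressed representation."""
    out = sigmoid(W_dec @ rep)
    return out[:FIELD], out[FIELD:]

def train_step(left, right, lr=0.5):
    """One autoassociative backpropagation step: target equals input."""
    global W_enc, W_dec
    x = np.concatenate([left, right])
    h = sigmoid(W_enc @ x)
    y = sigmoid(W_dec @ h)
    err = y - x                               # output error
    d_out = err * y * (1 - y)                 # sigmoid derivative at output
    d_hid = (W_dec.T @ d_out) * h * (1 - h)   # error pushed back to hidden
    W_dec -= lr * np.outer(d_out, h)
    W_enc -= lr * np.outer(d_hid, x)
    return float((err ** 2).sum())

# Terminal symbols as one-hot vectors, as on the slide above.
D = np.array([1, 0, 0, 0, 0.]); N = np.array([0, 0, 1, 0, 0.])

for _ in range(500):
    train_step(D, N)           # learn to reconstruct the pair (D, N)
rep = encode(D, N)             # compressed representation of the tree (D N)
left, right = decode(rep)      # should now approximate D and N

Because the encoder output can itself be fed back in as a child, larger trees are built up recursively from these compressed representations.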
Connectionist representations • Local vs. distributed. Local: Each unit represents an item, e.g. dog, rabbit, wolf. Distributed: Each unit represents a feature, e.g. has-tail, likes-bones. Hence, concepts are represented by more than one unit.
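A small illustrative contrast in Python; the unit meanings and feature sets are invented for the example.

# Local coding: one unit per item; a concept is a single active unit.
local_dog    = [1, 0, 0]        # units stand for: dog, rabbit, wolf
local_rabbit = [0, 1, 0]

# Distributed coding: one unit per feature; a concept is a pattern of
# activity, so similar concepts (dog, wolf) share active units.
#                   has-tail  likes-bones  domesticated
distributed_dog  = [1,        1,           1]
distributed_wolf = [1,        1,           0]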
Connectionist representations • Spatial preservation of structure RAAM appears to generate representations that convey structural information. A single tree will fall into a category according to the phrase rule that the tree conforms to. RAAM provides representations that reveal something of the character of the data: for example, grouping into nouns, verbs, etc., or grouping by meaning.
Connectionist representations • Context When Elman (1990) replaced the word "man" with "zog" in all 10,000 sentences and presented these transformed sentences to a pre-trained network, the word "zog" was shown to exhibit the same spatial relationship as the word "man". • A different representation is formed for the same word appearing in different contexts.
Capabilities of connectionist representations compared to symbolic equivalents • Concatenation. • Transforming propositions into their logical equivalents (not as powerful as symbolic transformation techniques); Niklasson and Sharkey's (1997) experiments also show a high level of generalization. • Understanding different contexts of the same word in different sentences. • Generalization. • Given a sentence and the word categories of its constituents, a parse tree can be produced (RAAM + SRN). • Chalmers (1990) has shown that an active-to-passive transformation is possible using a connectionist system. He used an RAAM to represent the sentences of both forms, and a backpropagation network was then trained to transform the active sentence into its passive form.
SRN for NLU • Elman (1990) has demonstrated that hierarchical categorization can emerge from a sequence. • In one experiment, he used a simple recurrent network to discover lexical classes from word order. • Each word was coded as a 1-bit-in-31 vector. • There were 150 context units. • 10,000 two- and three-word sentences were formed. • There were 27,534 words in the 10,000 sentences. • The 27,534 words were concatenated to create a single input sequence (a 27,534 x 31 bit array). • Each word was fed to the input layer of the SRN along with the previous context. • The task of the network was to predict the next word in the sequence. • There were only 29 unique words. • The 27,534 hidden vectors are expected to be unique because they are superimposed with context information. • A summary vector for each word was calculated by averaging all the hidden vectors activated by that word. • These 29 vectors were then grouped using a hierarchical clustering technique. The clustering showed spatial separation of verbs and nouns. It also revealed a hierarchical arrangement of word categories. • With RAAM-style networks, representations can be formed which, if clustered, will reveal something of the character of the data: for example, grouping into nouns, verbs, etc., or grouping by meaning.
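A minimal numpy sketch of an Elman-style SRN trained to predict the next word: the hidden layer is copied into context units and fed back as extra input at the next time step. The toy vocabulary, layer sizes, training sentences and constants below are invented and much smaller than Elman's (31-bit word codes, 150 context units, 10,000 sentences).

import numpy as np

VOCAB = ["man", "woman", "dog", "eats", "sees", "bread"]
V, H = len(VOCAB), 8

rng = np.random.default_rng(1)
W_in = rng.normal(scale=0.1, size=(H, V + H))   # word input + context -> hidden
W_out = rng.normal(scale=0.1, size=(V, H))      # hidden -> next-word scores

def one_hot(word):
    v = np.zeros(V); v[VOCAB.index(word)] = 1.0
    return v

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max()); return e / e.sum()

def train_sequence(words, lr=0.1):
    """One pass over a word sequence, predicting each next word.
    Context is treated as a fixed extra input (Elman-style truncation)."""
    global W_in, W_out
    context = np.zeros(H)
    for cur, nxt in zip(words[:-1], words[1:]):
        x = np.concatenate([one_hot(cur), context])
        h = sigmoid(W_in @ x)
        p = softmax(W_out @ h)
        d_out = p - one_hot(nxt)                   # cross-entropy gradient
        d_hid = (W_out.T @ d_out) * h * (1 - h)
        W_out -= lr * np.outer(d_out, h)
        W_in -= lr * np.outer(d_hid, x)
        context = h                                # copy hidden to context units

for _ in range(200):
    train_sequence(["man", "eats", "bread"])
    train_sequence(["dog", "sees", "woman"])

After training, the hidden vectors produced for each word can be averaged and clustered, which is how the noun/verb groupings described above are revealed.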
SRN for NLU • Conceptual inferencing, where 'chopstick' is put in the same category as 'spoon' and 'fork' after the network is presented with the following text: "They ate spinach with chopsticks. The chopsticks were thrown along with the rest of the cutlery into the dishwasher." • When Elman replaced the word 'man' with 'zog' in all 10,000 sentences and presented these transformed sentences to a pre-trained network, the new word 'zog' was shown to exhibit the same spatial relationship as the word 'man'. • Networks can produce a different representation for the same word appearing in different contexts. • A can of lemonade. • A bottle of lemonade.
SOM for NLU • Kohonen (1990) generated a set of three-word sentences using a template, in a similar manner to Elman (1990). • These sentences were concatenated, and a context for each word was defined by averaging over all of its immediately preceding and succeeding words. • A vector for each word was then generated by concatenating the word vector code with the surrounding context vector. • The resultant vectors were then used to train a self-organizing map. • The semantic map that emerged showed segregation into nouns, verbs and adverbs, and further semantic organization within these regions. • Kohonen points out: "We clearly see that the contexts 'channel' the word items to memory positions whose arrangement reflects both grammatical and semantic relationships."
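A minimal Python sketch of a one-dimensional self-organizing map; the random data stands in for the word-plus-context vectors described above, and the map size, learning rate and neighbourhood schedule are illustrative rather than Kohonen's.

import numpy as np

rng = np.random.default_rng(2)
DIM, UNITS = 10, 20                       # vector width, number of map units
weights = rng.random((UNITS, DIM))        # one codebook vector per map unit
data = rng.random((100, DIM))             # stand-in for word+context vectors

for t in range(1000):
    x = data[rng.integers(len(data))]
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
    lr = 0.5 * (1 - t / 1000)                                 # decaying learning rate
    radius = max(1, int(5 * (1 - t / 1000)))                  # shrinking neighbourhood
    for u in range(max(0, winner - radius), min(UNITS, winner + radius + 1)):
        weights[u] += lr * (x - weights[u])   # pull the neighbourhood toward x

# Words that occur in similar contexts end up mapped to nearby units,
# giving the kind of semantic map described above.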
Concatenation • Train a feedforward network that takes as input the RAAM representations for BACK and ACHE and produces at its output layer an RAAM representation for BACKACHE.
Parsing • Reilly (1992). • Chalmers (1990) has shown that an active-to-passive transformation is possible using a connectionist system.
Script Processing with DYNASTY • DYNASTY (Dynamic Story Understanding System), devised by Lee et al. (1990), is a modular connectionist system that takes as input a script-based piece of text and produces as output a complete paraphrase. • Given the following input: John entered the Chart-House. John ate the steak. John left a tip. It produces the following output: John entered the Chart-House. The waiter seated John. The waiter brought the menu. John read the menu. John ordered steak. John ate the steak. John paid the bill. John left a tip. John left the Chart-House for home. • DYNASTY centers on distributed semantic representations (DSRs) to represent concepts (for example, milk and man) and propositions (for example, man drinks milk with a straw). • DSRs are generated using extended RAAMs, which are basically RAAMs with a global dictionary to store symbol-DSR pairs. • For example, the symbol 'milk' would be stored with its RAAM representation. • Each of the propositions that milk appears in will also play a part in composing the representation of milk. • A number of modules cooperate to perform the task.
DYNASTY • Each proposition has a case structure. For example, 'human ate food with utensil' has the case structure: • AGENT-ACT-OBJECT-INSTRUMENT • There is a two-way relationship between word-concepts and propositions. • Some of this dependence is captured in the way DSRs are learnt. • A concept encoding network is used to learn word-concepts, and a proposition encoding network is used to learn propositions. • Both networks are RAAMs coupled to a global dictionary. • DYNASTY contains a number of modules: 1) The DSR learner. 2) The event encoder: uses already obtained DSR representations. 3) The script recognizer: recognizes a script from a sequence of events. 4) The backbone generator: the input is a script type and the output is the full list of events in script-role form.
DYNASTY • Word concepts and propositions are presented to the network as triples. • A word concept is structured as: • (word-concept, case-role, proposition-label) • A proposition concept is structured as: • (proposition-label, case-role, word-concept)
DYNASTY • Suppose the word concept to be learnt is milk, and milk appears in the following propositions: P1: The man drinks the milk with a straw. P2: The company delivers milk in a carton. P3: Humans get milk from cows. P4: The man eats bread with milk. Hence, the triples for the word-concept network are as follows: (milk OBJECT p1) (milk OBJECT p2) (milk OBJECT p3) (milk CO-OBJECT p4) And the triples for the proposition p1 are as follows: (p1 ACT drink) (p1 OBJECT milk) (p1 INSTRUMENT straw) (p1 AGENT man)
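The same triples written out as Python tuples, a minimal sketch of the training data fed to the two RAAM-plus-dictionary networks; the symbols follow the slide and nothing beyond it is assumed.

# Word-concept triples for "milk": (word-concept, case-role, proposition-label)
word_concept_triples = [
    ("milk", "OBJECT",    "p1"),
    ("milk", "OBJECT",    "p2"),
    ("milk", "OBJECT",    "p3"),
    ("milk", "CO-OBJECT", "p4"),
]

# Proposition triples for p1: (proposition-label, case-role, word-concept)
proposition_triples_p1 = [
    ("p1", "ACT",        "drink"),
    ("p1", "OBJECT",     "milk"),
    ("p1", "INSTRUMENT", "straw"),
    ("p1", "AGENT",      "man"),
]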
DYNASTY • Script roles: CUSTOMER, RESTAURANT-NAME, FOOD • Instances: John, Jack, Chart-House, Korean-Garden, steak, short-rib • Events: CUSTOMER entered RESTAURANT-NAME waiter seated customer waiter brought menu CUSTOMER read menu CUSTOMER ordered FOOD CUSTOMER ate FOOD CUSTOMER paid bill CUSTOMER left a tip CUSTOMER left RESTAURANT-NAME for home • The first step is to learn DSRs for all the script roles, the instances and the other concepts (menu, entered, etc.).
DYNASTY process for producing the full paraphrase of a subset of text: • Parse the input text into event-triple form. • For each symbol in the event triple, look up the DSR pattern from the global dictionary. • Use the event encoder to construct event representations. • Use the script recognizer to detect the script type. • Generate the complete set of events using the backbone generator. • Decode each event from step 5 into event-triple form (using the event-encoder network, that is, the RAAM decoder). This process will break the event representation into its constituent case-roles and DSRs. • Do the script role binding. • Look up DSRs in the global dictionary and select their associated symbol. • Complete the paraphrase.
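The nine steps can be read as a single function. The Python sketch below is only an outline of the control flow: every module (parser, global dictionary, event encoder, script recognizer, backbone generator, role binder, sentence generator) is passed in as an assumed callable or object, since DYNASTY's real modules are trained connectionist networks rather than ordinary functions.

def paraphrase(text, parse_events, dictionary, event_encoder,
               script_recognizer, backbone_generator, bind_roles, to_sentence):
    """Sketch of the DYNASTY paraphrase pipeline; all arguments are
    hypothetical stand-ins for the trained connectionist modules."""
    triples = parse_events(text)                                   # 1. text -> event triples
    dsr_triples = [[dictionary[s] for s in t] for t in triples]    # 2. symbols -> DSR patterns
    events = [event_encoder.encode(t) for t in dsr_triples]        # 3. build event representations
    script_type = script_recognizer(events)                        # 4. detect the script type
    full_events = backbone_generator(script_type)                  # 5. complete list of script events
    decoded = [event_encoder.decode(e) for e in full_events]       # 6. RAAM-decode into case-roles/DSRs
    bound = bind_roles(decoded, triples)                           # 7. script-role binding
    return [to_sentence(event, dictionary) for event in bound]     # 8-9. DSRs -> symbols, emit paraphrase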
Other Issues • Symbol grounding problem • Cog