A Self-Organizing Neural Network Architecture for Intentional Planning Agents

by: Ah-Hwee Tan (asahtan@ntu.edu.sg) and Budhitama Subagdja (budhitama@ntu.edu.sg)
www.ntu.edu.sg

Presented at the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), Budapest, Hungary, May 10-15, 2009.

Motivation
Towards seamless integration of description-based (BDI) and soft, learning-based (neural) agent architectures.

fusion ART
• Multi-channel growing neural network
• Stable-plastic control and operation
• Maps simple symbols (rules) to neural activations
• Resonance search comprises four operations (see the fusion ART sketch at the end of this poster):
  • Code activation/competition
  • Template matching
  • Readout
  • Template learning

iFALCON
An intentional, plan-based agent architecture composed of layers of fusion ART.

Features
• Deliberation and plan selection
• Sequential/hierarchical execution
• Learning plan descriptions
• Online planning and learning

Deliberation/Execution Cycle (see the cycle sketch at the end of this poster)
• Critic evaluation: Beliefs and Desires are compared.
• Plan selection: search (by resonance) for a node in the Plans field.
• Plan execution: the active Plans node is read out to the Sequencer and Action fields.
• Subgoaling/Backtracking: activate a node in the Buffer and replace the values of Desires with the subgoal; to backtrack, read out the node with the highest activation in the Buffer and replace all corresponding fields.

Planning/Learning
• Allocating a new plan: a new node (plan) is created if no applicable plan can be found.
• Relaxed deliberation: the new node is stored FILO (first-in-last-out) and another plan is searched for with reduced Desires vigilance.
• Plan learning (hierarchical): a successful attempt of the relaxed plan activates the Sequencer following gradient encoding (FILO) after backtracking.
• As long as the learnt goal is not achieved, the above process repeats; once the learnt goal is achieved, the connection between Plans and Sequencer is updated.

Gradient Encoding
A real value indicates a time point or a position in an ordered sequence. To retrieve the sequence, larger activations represent codes that occurred earlier and are hence recalled first (see the gradient-encoding sketch at the end of this poster).
[Figure: gradient encoding variants: Standard, FIFO, FILO]

Case Study
• Blocks world
• Complement coding allows don't-care conditions (see the complement-coding example at the end of this poster)
• Two plan-set configurations: (1) complete plans (primitives and control); (2) incomplete plans (primitives only)

A primitive plan:
    {'goal': ['-bupB', '-bbtmA'],
     'pre': ['-bupA', 'bbtmA', 'bupB'],
     'body': [{'do': ['downA']}],
     'util': [1.0]}

The control plan:
    {'goal': ['-bupA', 'bbtmA', 'bupB', 'bbtmB', 'bupC', '-bbtmC'],
     'pre': [],
     'body': [{'achieve': ['-bupC', '-bbtmC']},
              {'achieve': ['bupC', 'bbtmB']},
              {'achieve': ['bupB', 'bbtmA']}],
     'util': [1.0]}

Learnt plan, with primitive and control plans:
    {'goal': ['-bbtmC', '-bupC'],
     'pre': ['bbtmB', 'bbtmC', '-bbtmA', 'bupC', '-bupB', 'bupA'],
     'body': [{'achieve': ['-bbtmB', '-bupC']},
              {'achieve': ['-bbtmC', '-bupA']}],
     'util': [1.0]}

Learnt plan, with primitive plans only:
    {'goal': ['bbtmB', '-bbtmC', 'bbtmA', 'bupC', 'bupB', '-bupA'],
     'pre': ['bbtmB', 'bbtmC', '-bbtmA', 'bupC', '-bupB', 'bupA'],
     'body': [{'achieve': ['-bupA']},
              {'achieve': ['bbtmA', 'bupB']},
              {'achieve': ['bupC']},
              {'achieve': ['bbtmB', 'bupA']},
              {'achieve': ['-bupC']},
              {'achieve': ['bbtmB', 'bupC']},
              {'achieve': ['bbtmA', 'bupB']}],
     'util': [1.0]}
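The four fusion ART operations listed above follow the standard fusion ART dynamics: a channel-weighted choice function for code activation, winner-take-all competition with mismatch reset, per-channel vigilance for template matching, and learning by fuzzy AND. The poster's equations did not survive extraction, so the sketch below is a minimal NumPy rendering of one resonance-search cycle; the function names and parameter layout are assumptions, not the authors' code.

    import numpy as np

    def fuzzy_and(x, w):
        # Fuzzy AND: component-wise minimum.
        return np.minimum(x, w)

    def resonance_search(xs, nodes, gamma, alpha, rho):
        # One resonance-search cycle over a multi-channel field.
        # xs:    list of K channel input vectors (complement-coded)
        # nodes: list of category nodes, each a list of K weight vectors
        # gamma, alpha, rho: per-channel contribution, choice, and
        #                    vigilance parameters
        # Returns the index of the resonant node, or None on total mismatch.
        T = np.array([sum(g * fuzzy_and(x, w).sum() / (a + w.sum())
                          for x, w, g, a in zip(xs, node, gamma, alpha))
                      for node in nodes])
        for _ in range(len(nodes)):
            J = int(T.argmax())       # code competition: winner take all
            # Template matching: resonance requires the match ratio to
            # reach vigilance rho in every channel.
            if all(fuzzy_and(x, w).sum() / x.sum() >= r
                   for x, w, r in zip(xs, nodes[J], rho)):
                return J
            T[J] = -np.inf            # mismatch reset: suppress J, retry
        return None

    def readout(node):
        # Activity readout: the winner's templates overwrite channel inputs.
        return [w.copy() for w in node]

    def learn(xs, node, beta):
        # Template learning: move each template toward the fuzzy AND of
        # input and template, at per-channel learning rate beta.
        return [(1 - b) * w + b * fuzzy_and(x, w)
                for x, w, b in zip(xs, node, beta)]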
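As a reading aid, the next sketch walks the deliberation/execution cycle over plans written in the dict format of the case study. It is a symbolic stand-in: the set-based matching here is performed in iFALCON by resonance searches over the Beliefs, Desires, Plans, Sequencer, and Buffer fields, and all helper names (holds, select_plan, pursue) are assumptions made for illustration.

    def holds(goal, beliefs):
        # A literal 'x' holds if x is in the belief set; '-x' holds if not.
        return all((lit[1:] not in beliefs) if lit.startswith('-')
                   else (lit in beliefs) for lit in goal)

    def select_plan(plans, desires, beliefs):
        # Plan selection: a plan applies if its goal covers the desires and
        # its precondition holds (stand-in for resonance on the Plans field).
        for plan in plans:
            if set(desires) <= set(plan['goal']) and holds(plan['pre'], beliefs):
                return plan
        return None  # would trigger plan allocation / relaxed deliberation

    def pursue(desires, beliefs, plans, actions, depth=20):
        # Critic evaluation: stop when Beliefs already satisfy Desires.
        if holds(desires, beliefs) or depth == 0:
            return beliefs
        plan = select_plan(plans, desires, beliefs)
        if plan is None:
            return beliefs
        for step in plan['body']:  # plan execution via the Sequencer
            if 'achieve' in step:  # subgoaling: old Desires buffered
                beliefs = pursue(step['achieve'], beliefs, plans,
                                 actions, depth - 1)
            else:                  # primitive action read out to Action field
                add, delete = actions[step['do'][0]]
                beliefs = (beliefs - delete) | add
        return beliefs             # backtracking: caller's desires restored

Here recursion plays the role of the Buffer stack: each 'achieve' step pushes the current desires by calling pursue on the subgoal, and returning from the call is the backtracking readout.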
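The gradient-encoding description can be made concrete in a few lines. The decay factor tau below is an assumption; the point is only that position is carried by activation magnitude, so sorting by activation recovers the order. For a FILO buffer the gradient is simply inverted, so the most recent entry has the highest activation and is read out first.

    def encode(sequence, tau=0.8):
        # Gradient encoding: the item at position i gets activation tau**i,
        # so earlier items carry larger values.
        return {item: tau ** i for i, item in enumerate(sequence)}

    def recall(activations):
        # Retrieval: larger activations were encoded earlier, recalled first.
        return sorted(activations, key=activations.get, reverse=True)

    assert recall(encode(['pickA', 'stackAB', 'pickC'])) == \
        ['pickA', 'stackAB', 'pickC']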
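Finally, the don't-care conditions used in the case study come from complement coding: an attribute a is stored as the pair (a, 1 - a), and an unconstrained attribute as (1, 1), which the fuzzy AND of template matching never penalises. A minimal illustration (the example vectors are made up):

    import numpy as np

    def complement_code(a):
        # Each attribute a is represented by the pair (a, 1 - a).
        a = np.asarray(a, dtype=float)
        return np.concatenate([a, 1.0 - a])

    x = complement_code([1.0, 0.0])   # e.g. 'bupA' true, 'bupB' false
    w = np.ones(4)                    # all (1, 1) pairs: a don't-care template
    # Fuzzy AND with a don't-care template returns x unchanged, so the match
    # ratio |x ^ w| / |x| is 1 and the condition matches any state.
    assert np.minimum(x, w).sum() == x.sum()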