Cobot: A Social Reinforcement Learning Agent
Presented by Deepali Abhyankar
Cobot
• An RL-based agent for LambdaMOO.
• LambdaMOO: a complex, open-ended, multi-user chat environment.
• Cobot interacts with LambdaMOO users and learns to perform interesting and entertaining actions based on the users' feedback.
Actions performed by Cobot
• Proposing conversation topics.
• Introducing users to each other.
• Engaging in common wordplay routines.
• The goal is to perform actions that seem meaningful, useful, or amusing to the users.
Challenges
• Choice of an appropriate state space
• Multiple reward sources
• Inconsistency and drift of user rewards and desires
• Variability in user understanding
• Data sparsity
• Irreproducibility of experiments
LambdaMOO
• Oldest continuously operating MUD (Multi-User Dungeon).
• A series of interconnected rooms.
• Rooms are populated with users and objects, which move between them.
• Users communicate through speech and verbs.
• A large collection of verbs exists. A sample exchange:
(1) Buster is overwhelmed by all these deadlines.
(2) Buster begins to slowly tear his hair out, one strand at a time.
(3) HFh comforts Buster.
(4) HFh [to Buster]: Remember, the mighty oak was once a nut like you.
(5) Buster [to HFh]: Right, but his personal growth was assured. Thanks anyway, though.
(6) Buster feels better now.
• Objects are created by the users themselves, who devise their actions and control access by other users.
Cobot
• Cobot is a software agent that resides in LambdaMOO.
• Connects via telnet.
• From the point of view of the LambdaMOO server, Cobot is a user with all the usual rights and responsibilities.
• Wanders into the Living Room, where he spends most of his time.
• Notes the various events that occur there.
Functions performed by Cobot
• Gathering and reporting social statistics.
• Searching the web to answer specific questions posed to him.
• Topic change: introduce a conversational topic.
• Initiate a roll call: someone who is tired of Monica Lewinsky may emote “TIRED OF LEWINSKY ROLL CALL.” Sympathetic users agree with the roll call.
• Make a comment describing the current social state of the Living Room.
• Introduce two users who have not yet interacted in front of Cobot.
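The introduction function implies some bookkeeping about which users Cobot has already seen interact. The slides do not say how this is stored, so the following is only a minimal sketch under assumptions: the class name `IntroductionTracker`, its methods, and the idea of keeping unordered user pairs in a set are all hypothetical illustrations, not Cobot's actual code.

```python
from itertools import combinations


class IntroductionTracker:
    """Hypothetical bookkeeping for pairs of users seen interacting (an assumption)."""

    def __init__(self):
        # Each entry is an unordered pair of user names.
        self._seen = set()

    def record_interaction(self, a, b):
        """Remember that users a and b interacted in front of the agent."""
        if a != b:
            self._seen.add(frozenset((a, b)))

    def candidate_pair(self, present):
        """Return one pair of present users not yet seen interacting, or None."""
        for a, b in combinations(sorted(present), 2):
            if frozenset((a, b)) not in self._seen:
                return (a, b)
        return None


tracker = IntroductionTracker()
tracker.record_interaction("Buster", "HFh")
print(tracker.candidate_pair(["Buster", "HFh", "Guest"]))
```

Storing pairs as `frozenset` makes the record order-independent, so an interaction from HFh to Buster counts the same as one from Buster to HFh.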
RL State Features
• Cobot maintains a separate state space for each user.
• Cobot can be viewed as running a large number of separate RL processes in parallel, with each process having a different state space.
• The state space for a user consists of a number of features containing statistics about that particular user.
• Each user’s state space is effectively infinite, because some of the state features are real-valued.
• Linear function approximation is used for each user’s policy.
• Cobot’s RL actions are chosen according to a mixture of the policies of the users present.
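The per-user linear policies and their mixture can be sketched roughly as follows. This is an illustration under stated assumptions, not the method from the talk: the action names, the two-feature vectors, the softmax used to turn linear scores into probabilities, and the uniform mixture weights are all hypothetical choices made for the example.

```python
import math
import random

# Hypothetical action names, loosely based on the functions listed earlier.
ACTIONS = ["topic_change", "roll_call", "social_comment", "introduce_users"]


def linear_scores(weights, features):
    """One score per action: dot product of that action's weights with the features."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def softmax(scores):
    """Turn scores into probabilities (assumed here; the slides name no specific rule)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_policy(present_users, weights_by_user, features_by_user):
    """Average the per-user policies of the users present (uniform mixture, assumed)."""
    if not present_users:
        raise ValueError("need at least one present user")
    policies = [
        softmax(linear_scores(weights_by_user[u], features_by_user[u]))
        for u in present_users
    ]
    return [sum(p[i] for p in policies) / len(policies) for i in range(len(ACTIONS))]


def choose_action(probs, rng=random):
    """Sample one action according to the mixed probabilities."""
    return rng.choices(ACTIONS, weights=probs, k=1)[0]


# Made-up weights (one row per action) and real-valued features for two users.
weights_by_user = {
    "Buster": [[0.5, -0.2], [0.1, 0.3], [0.0, 0.0], [-0.4, 0.6]],
    "HFh": [[-0.1, 0.2], [0.7, 0.0], [0.2, 0.2], [0.0, -0.3]],
}
features_by_user = {"Buster": [1.0, 0.25], "HFh": [0.5, 2.0]}

probs = mixture_policy(["Buster", "HFh"], weights_by_user, features_by_user)
print(choose_action(probs))
```

Because each user's policy is a function of real-valued features, the weights generalize across the effectively infinite state space; only the small weight table per user has to be stored and learned.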