Building the Knowledge Base of a Buyer Agent Using Reinforcement Learning Techniques
George Boulougaris, Kostas Kolomvatsos, and Stathes Hadjiefthymiades
Pervasive Computing Research Group, Department of Informatics and Telecommunications, University of Athens, Greece
WCCI – IJCNN 2010, Barcelona, Spain
Outline • Introduction • Market Members • Scenario • Buyer Q-Table • Buyer Purchase Behavior • Results
Introduction • Intelligent Agents • Autonomous software components • Represent users • Learn from their owners • Electronic Markets • Places where entities not known in advance can negotiate over the exchange of products • Reinforcement Learning • A general framework for sequential decision making • Seeks the actions that yield the maximum long-term reward at every state of the world
Market Members • Buyers • Sellers • Middle entities (matchmakers, brokers, market entities) • Intelligent agents may represent each of these entities • Entities have no prior information about the others in the market
Scenario (1/2) • Buyers: • could interact with sellers • could interact with brokers or matchmakers (matchmakers cannot sell products) • want to buy the most appropriate product at the most profitable price • We focus on the interaction between buyers and selling entities (sellers or brokers) • Most research efforts focus only on the reputation of entities • We utilize Q-Learning, which is well suited to deriving actions that lead to the maximum long-term reward (based on a number of parameters) at every state of the world
Scenario (2/2) • The product parameters for each selling entity are: • ID • Time validity • Price • Time availability • Relevance • Each selling entity represents a state that the buyer can be in
Buyer Q-Table (1/3) • The buyer has one Q-Table for each product • Rows represent states and columns represent actions • There are M+1 rows and columns (M is the number of selling entities) • Actions [1..M] represent the transition to the [1..M] entity (row of the Q-Table) • Action M+1 represents the purchase action (from the specific entity) • The transition to another entity corresponds to a 'not-buy-from-this-entity' action
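The layout described above can be sketched as a small helper that allocates one (M+1) × (M+1) table per product; the function name and the use of NumPy are illustrative assumptions, not part of the paper:

```python
import numpy as np

def make_q_table(num_entities):
    """Allocate one (M+1) x (M+1) Q-Table for a single product.

    Rows are states (the selling entity the buyer currently visits).
    Columns 0..M-1 are 'move to entity i' actions (i.e. a
    'not-buy-from-this-entity' decision); the last column is the
    purchase action from the current entity.
    """
    return np.zeros((num_entities + 1, num_entities + 1))

# Example: a market with M = 3 selling entities gives a 4 x 4 table.
q = make_q_table(3)
```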
Buyer Q-Table (2/3) • The buyer takes the following information into consideration in order to build the Q-Table: • Relevancy factor • Price • Response time • Number of transitions • The update follows the standard Q-Learning rule: Q(st, at) ← Q(st, at) + l · [r + γ · maxa Q(st+1, a) − Q(st, at)] where l is the learning rate, r is the reward, γ is the future-reward discount factor, and st and at are the state and the action at time t
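The update rule above is the textbook Q-Learning step and can be sketched directly; the function name and the toy table below are illustrative:

```python
def q_update(Q, s, a, reward, s_next, l=0.1, gamma=0.9):
    """Standard Q-Learning update:
    Q(s,a) <- Q(s,a) + l * (r + gamma * max_a' Q(s',a') - Q(s,a))
    """
    best_next = max(Q[s_next])               # max_a' Q(s', a')
    Q[s][a] += l * (reward + gamma * best_next - Q[s][a])
    return Q

# Toy 2-state, 2-action table; one update from state 0, action 1.
Q = [[0.0, 0.0], [0.0, 0.0]]
q_update(Q, 0, 1, reward=1.0, s_next=1)      # Q[0][1] becomes 0.1
```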
Buyer Q-Table (3/3) • Issues concerning the reward: • it is decremented by 5% when the buyer deals with entities that do not have the product • it is based on: • the reward for the relevancy • the reward for the price • the reward for the response time • the reward for the required transitions • the greater the relevancy, the greater the reward • the smaller the price, the greater the reward • the smaller the response time, the greater the reward • the smaller the number of transitions, the greater the reward
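A minimal sketch of such a composite reward is shown below. The slide only fixes the qualitative directions (relevancy raises the reward; price, response time and transitions lower it) and the 5% decrement; the linear form and the weights here are assumptions for illustration:

```python
def reward(relevancy, price, response_time, transitions,
           has_product=True,
           weights=(0.4, 0.3, 0.2, 0.1)):
    """Hypothetical composite reward. Relevancy contributes positively;
    price, response time and number of transitions contribute negatively.
    The weighting scheme is an assumption, not taken from the paper."""
    w_rel, w_price, w_time, w_hops = weights
    r = (w_rel * relevancy
         - w_price * price
         - w_time * response_time
         - w_hops * transitions)
    if not has_product:
        r *= 0.95   # 5% decrement for entities that lack the product
    return r
```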
Buyer Purchase Behavior • The buyer relies on the Q-Table for the purchase action • There are two phases in its behavior • First Phase • It creates the Q-Table • It uses a specific number of episodes in the training phase • Second Phase • It utilizes the Q-Table for its purchases • It first randomly selects an entity (row) for a specific product • It then repeatedly selects the action with the highest reward • If the best action returns the buyer to a previously visited entity that cannot deliver the product, the purchase is deemed infeasible
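The second phase above can be sketched as a greedy walk over the Q-Table; treating any return to a previously visited entity as an infeasible purchase is a simplification of the slide's condition:

```python
import random

def purchase(Q):
    """Second-phase sketch: start at a random entity (row), then
    repeatedly take the action with the highest Q-value. The last
    column means 'buy from the current entity'; any other action moves
    to that entity. Revisiting an entity signals an infeasible purchase."""
    M = len(Q) - 1                      # number of selling entities
    state = random.randrange(M)         # random initial entity
    visited = {state}
    while True:
        action = max(range(len(Q[state])), key=lambda a: Q[state][a])
        if action == M:                 # buy from the current entity
            return state
        if action in visited:           # loop back: purchase infeasible
            return None
        visited.add(action)
        state = action

# Buying is always best here, so the walk ends at the starting entity.
Q_buy = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
# The two entities point at each other, so the walk cycles and fails.
Q_cycle = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```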
Results (1/4) • We consider a dynamic market where the number and the characteristics of entities are not static • In our experiments we take the following probabilities into consideration: • 2% that a new product becomes available in an entity • 5% that a product is totally new in the market • 5% that a product is no longer available in an entity • 2% that an entity is totally new in the market • 1% that an entity is no longer available for negotiations • We examine the purchases of 400 products in each experiment
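One way to read these probabilities is as independent per-step events in the market simulation; the event names and the independence assumption below are illustrative, not taken from the paper:

```python
import random

# Per-step event probabilities from the experimental setup.
EVENTS = {
    "product_added_to_entity": 0.02,
    "new_product_in_market":   0.05,
    "product_removed":         0.05,
    "new_entity_in_market":    0.02,
    "entity_leaves_market":    0.01,
}

def sample_events(rng=random.random):
    """Independently sample which market changes occur this step."""
    return [name for name, p in EVENTS.items() if rng() < p]
```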
Results (2/4) • Q-Table creation time results
Results (3/4) • Q-Learning reduces the required purchase steps
Results (4/4) • Q-Learning reduces the average price and the average response time as the number of entities increases • Q-Learning leaves the basic parameters unaffected as the number of products increases
Thank you http://p-comp.di.uoa.gr