Modeling of User Behavior in a Matching Task Based on Previous Reward History and Personal Risk Factor
April 1, 2004
Helen Belogolova, Amy Daitch
Project Summary
• Experiment:
  • Subjects given matching task in which they choose between button A and button B
  • Received reward based on predetermined reward functions
• Our Model:
  • Subject's memory decay: leaky integration
  • Personal Risk Factor
  • Cumulative Risk Factor
Method for Modeling the Behavior
• General Method
  • Part I. Exploratory Phase
    • P(A) = 0.5, P(B) = 0.5
    • In the first 10 trials, A and B are chosen with equal probability
  • Part II. Choices Based on Past Reward History
    • Reward function took into account a 40-trial buffer, updated after each trial
    • Vector of rewards weighted based on leaky integrator model with decay parameter d:
      weighted_rewards_vector = [exp(1*d) exp(2*d) … exp(240*d)]' * reward_vector
    • Most recent reward carries the most influence on subject's next move
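The leaky-integrator weighting can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the weights are applied element-wise to a reward history stored oldest first, and the names `leaky_weights`, `weighted_rewards`, and the sample `history` are hypothetical.

```python
import math

def leaky_weights(n_trials, d):
    """Weights exp(1*d), exp(2*d), ..., exp(n_trials*d); index 1 is the oldest trial."""
    return [math.exp(k * d) for k in range(1, n_trials + 1)]

def weighted_rewards(rewards, d):
    """Element-wise product of the weights with the reward history (oldest first).

    Assumption: the slide's product is meant element-wise, so each reward
    is scaled by the weight of its own trial.
    """
    weights = leaky_weights(len(rewards), d)
    return [w * r for w, r in zip(weights, rewards)]

history = [1.0, 0.0, 1.0, 1.0]  # hypothetical rewards, oldest first
print(weighted_rewards(history, d=0.5))
```

With a positive d, later trials get exponentially larger weights, which matches the statement that the most recent reward carries the most influence.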
Method for Modeling the Behavior
• To choose between A and B, we sum the weighted rewards received after A button presses (rewA) and after B button presses (rewB):
  P(A) = rewA/(rewA + rewB)
  P(B) = 1 - P(A)
• Based on these total rewards, the next choice is generated like this:
  if rand(1) < P(A)
      choice A
  else
      choice B
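The choice rule above could look like this in Python. It is a sketch under stated assumptions: the fallback to 0.5 when no reward has been collected yet is our own guard against division by zero and is not described in the slides, and the function names are hypothetical.

```python
import random

def prob_a(rew_a, rew_b):
    """P(A) = rewA / (rewA + rewB)."""
    total = rew_a + rew_b
    if total == 0:
        # Assumption: with no reward history, fall back to the exploratory 0.5.
        return 0.5
    return rew_a / total

def choose(p_a, rng=random):
    """Draw a uniform number in [0, 1) and pick A when it falls below P(A)."""
    return "A" if rng.random() < p_a else "B"

p = prob_a(rew_a=3.0, rew_b=1.0)
print(p, choose(p))
```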
Method for Modeling the Behavior
• Model Accounting for Risk
  • Risk = subject's willingness to deviate from optimal choice based on past trials
  • Personality Risk
    • Constant in experiment, range from 0 to 1
    • Function of personality = willingness to take risks in general
  • Cumulative Risk
    • Range from 0 to 1
    • Increases as the cumulative reward increases:
      cumulative_risk(trial) = cumulative_reward(trial)/max_cumulative_reward
    • Maximum cumulative reward in our case was 6
Method for Modeling the Behavior
• The Personal Risk Factor and the Cumulative Reward Risk Factor, each multiplied by its weight, make up the Total Risk Factor:
  total_risk = personal_risk*personal_risk_weight + cumulative_risk*cumulative_risk_weight
• With the total risk parameter as above, the choice probabilities become:
  P(A) = rewA/(rewA + rewB) - (rewA/(rewA + rewB) - 0.5)*2*total_risk
  P(B) = 1 - P(A)
• The choice of A or B is then generated the same way as in the general model
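The risk formulas can be checked with a small Python sketch. The function names are hypothetical, and the default maximum cumulative reward of 6 is the value quoted in the slides.

```python
def cumulative_risk(cumulative_reward, max_cumulative_reward=6.0):
    """cumulative_risk = cumulative_reward / max_cumulative_reward."""
    return cumulative_reward / max_cumulative_reward

def total_risk(personal_risk, personal_risk_weight, cum_risk, cumulative_risk_weight):
    """Weighted sum of the personal and cumulative risk factors."""
    return personal_risk * personal_risk_weight + cum_risk * cumulative_risk_weight

def risk_adjusted_prob_a(rew_a, rew_b, risk):
    """P(A) shifted away from the reward-matching value by the total risk."""
    base = rew_a / (rew_a + rew_b)
    return base - (base - 0.5) * 2 * risk

for risk in (0.0, 0.5, 1.0):
    print(risk, risk_adjusted_prob_a(3.0, 1.0, risk))
```

For rewA = 3 and rewB = 1 this gives 0.75, 0.5, and 0.25. A total risk of 0 reproduces the general model, 0.5 makes the choice a coin flip, and 1 mirrors the probability to 1 - rewA/(rewA + rewB).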
Results
• Ran the experiment on the model, varying one parameter at a time
• Because decisions are stochastic, ran the experiment several times for each set of parameters to diminish the effects of randomness
• A real subject could produce somewhat different results if the experiment were done more than once, so we ran the experiment on the model many times to see how a subject with certain characteristics would behave
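Averaging over repeated stochastic runs might be organized like this in Python. This is only a sketch: the slides do not give the reward functions or the number of runs, so `coin_flip_subject` is a hypothetical stand-in for the model (it only reproduces the exploratory phase), and `average_choice_rate` is an assumed helper name.

```python
import random

def average_choice_rate(simulate_once, n_runs, seed=0):
    """Per trial, the fraction of runs in which A was chosen."""
    rng = random.Random(seed)  # one seeded generator keeps the batch reproducible
    runs = [simulate_once(rng) for _ in range(n_runs)]
    n_trials = len(runs[0])
    return [sum(run[t] == "A" for run in runs) / n_runs for t in range(n_trials)]

def coin_flip_subject(rng, n_trials=10):
    # Hypothetical stand-in for the model: P(A) = P(B) = 0.5 on every trial.
    return ["A" if rng.random() < 0.5 else "B" for _ in range(n_trials)]

print(average_choice_rate(coin_flip_subject, n_runs=200))
```

Any function that takes a random generator and returns a list of "A"/"B" choices can be passed in place of `coin_flip_subject`.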
Results
• We then plotted the ratio of the subject's button presses within the buffer vs. the trial number and observed that:
  • Varying only personal risk factor: most successful when risk factor very high or very low (same in this experiment)
• Below: personal risk, cumulative risk = 0
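The plotted quantity could be computed along these lines in Python. This sketch assumes that "ratio within the buffer" means the fraction of A presses among the most recent trials, with the 40-trial buffer mentioned earlier as the default window; the slides do not spell out the exact definition, and `press_ratio` is a hypothetical name.

```python
def press_ratio(choices, buffer_size=40):
    """Fraction of A presses within the last `buffer_size` trials, per trial."""
    ratios = []
    for t in range(1, len(choices) + 1):
        window = choices[max(0, t - buffer_size):t]  # shorter at the start
        ratios.append(window.count("A") / len(window))
    return ratios

print(press_ratio(["A", "B", "A", "A"], buffer_size=2))
```

Plotting the returned list against the trial index gives a curve of the kind described above.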
Results
• Varying only cumulative reward risk factor: more successful as cumulative reward risk increases
• Below: cumulative risk = 0.25, personal risk = 0
Results
• Tested decay rates of 0.5, 1.0, and 2.0 while keeping the risk factor at zero
  • The model succeeded the most at a decay rate of 2.0
  • It had the least success at a decay rate of 0.5
• This suggests that the most important rewards to remember are the ones in the immediate past
Comparison of Results With Real Data
• Compared choices of our model with those of the tested subjects
• Cross-correlated the choice vector of the subject (real data) with the choice vectors generated by our model for all of the variations
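One way to cross-correlate two choice vectors is sketched below in Python. The slides do not state the encoding or normalization that was used, so the +1/-1 coding of A/B and the per-lag averaging are assumptions, and `encode` and `cross_correlation` are hypothetical names.

```python
def encode(choices):
    """Assumed coding: A -> +1, B -> -1."""
    return [1.0 if c == "A" else -1.0 for c in choices]

def cross_correlation(x, y, max_lag):
    """Map each lag to the mean of x[i] * y[i + lag] over overlapping trials.

    Requires max_lag to be smaller than the sequence length so that
    every lag has at least one overlapping pair.
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        products = [x[i] * y[i + lag] for i in range(len(x)) if 0 <= i + lag < len(y)]
        result[lag] = sum(products) / len(products)
    return result

subject = encode(list("ABBAAB"))  # hypothetical subject choices
model = encode(list("ABBABB"))    # hypothetical model choices
print(cross_correlation(subject, model, max_lag=2))
```

Under this coding, a value of 1 at lag 0 means the two sequences agree on every trial and -1 means they disagree on every trial.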
Comparison of Results With Real Data
• Observed strong correlation between our subjects and the models with very high or very low personal risk factors
• Below: p-risk, c-risk = 0
Comparison of Results With Real Data
• For the cumulative reward risk parameter, we found that the correlation improved as the parameter increased, with personal risk held constant at zero
• Below: cumulative risk = 0.25
Comparison of Results With Real Data
• Changing the decay rate in the model didn't appear to affect the correlation between model and subject-generated data
• Below: decay rate = 1