Artificial Intelligence in Game Design Lecture 24: Off-Line and Neural Network Learning in Games
Types of Learning in Games • On-line learning • Takes place during the game • Changes game parameters for this particular play of the game • Must be fast and efficient • Simple hill climbing, N-Grams • Off-line learning • Done during the development stage of the game • Used to set game parameters for the final release of the game • Can use complex forms of learning • Neural networks, reinforcement learning
Game Parameters • Most complex games use continuous-valued parameters of some sort in decision making • Probabilities/Fuzzy measures • Coefficients in MinMax heuristics • How do we know what the best values are?
Learning Game Parameters • Learn game parameters from examples • Database of environmental inputs and desired character actions • [Diagram: current parameters → actions indicated by rules → degree of error between actual and desired action → learning element determines how to change parameters in order to decrease error]
Learning Game Parameters • Main difficulty: acquiring examples of desired behavior to learn from • Most learning algorithms require thousands of examples • Must know desired action for each one • One solution: let customers provide examples • Neural network learning in “twenty questions” • http://www.20q.net/ • [Diagram: users interact with the system online → server stores a database of examples submitted by users → learning element adjusts knowledge to reduce error between its actions and user input → “console version” marketed with the final learned rules]
Learning Game Parameters • Can learn by repeatedly playing game against itself • Samuel’s checkers program • Early application of MinMax • Used up to 15 different weight values for heuristic evaluation of board • Basic idea: • Multiple versions of weights generated • All versions played against one another • Weights with most wins considered best
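A minimal sketch of this self-play tournament in Python. The play_game function here is a toy placeholder, not Samuel's actual checkers engine; in a real game it would run a full MinMax search using each side's heuristic weights.

```python
import random

def play_game(weights_a, weights_b):
    """Return 1 if side A wins, 0 if side B wins (toy placeholder).

    Stands in for a real game played with MinMax, where each side
    evaluates boards using its own weight vector.
    """
    ideal = [0.5] * len(weights_a)  # hypothetical "true" weights
    score_a = -sum(abs(w - i) for w, i in zip(weights_a, ideal))
    score_b = -sum(abs(w - i) for w, i in zip(weights_b, ideal))
    return 1 if score_a + random.gauss(0, 0.1) > score_b else 0

NUM_WEIGHTS = 15   # Samuel used up to 15 heuristic weights
POOL_SIZE = 8

# Generate multiple candidate versions of the weights
pool = [[random.uniform(-1, 1) for _ in range(NUM_WEIGHTS)]
        for _ in range(POOL_SIZE)]

# All versions play against one another (round robin)
wins = [0] * POOL_SIZE
for a in range(POOL_SIZE):
    for b in range(POOL_SIZE):
        if a != b:
            result = play_game(pool[a], pool[b])
            wins[a] += result
            wins[b] += 1 - result

# Weights with the most wins are considered best
best = max(range(POOL_SIZE), key=lambda i: wins[i])
print("Best weight vector:", pool[best], "with", wins[best], "wins")
```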
Artificial Neural Networks • Based on the “structure” of the brain • Only known working model of intelligence • Note that an ANN is a very rough approximation of the little we know about the brain • Main components: • Neural units • Approximation of “neurons” • Roughly binary states • “on” or “off”, “1” or “0”, “active” or “inactive” • Connections • Approximation of “synapses” • Connect 2 neural units • [Diagram: two units joined by a connection]
Artificial Neural Networks • Connections have weights • Positive or negative numbers • Positive weight causes unit A to activate unit B • Negative weight causes unit A to deactivate unit B • Example: “mouse behavior” • [Diagram: unit A connects to unit B with weight W; “smell cheese” excites “run towards smell” through a positive weight, while “smell cat” inhibits it through a negative weight]
Network Layers • Input units • Correspond to sensory input (such as “smell cheese”) • Intermediate units • Used for complex interactions • Output units • Correspond to actions taken (such as “run towards smell”) • Human brain contains billions of neurons • Each connected to between 1,000 and 10,000 other neurons via synapses
Perceptron Networks • Early model (1960s) • Very simple representation and learning • Fast and easy to implement • Too simple for some domains • Single layer of inputs, outputs • All units binary (value 0 or 1) • No intermediate units • Input and output layers interconnected with each other • Example: simple orc • [Diagram: inputs “hit points”, “have weapon”, “player strength” connect directly to outputs “run” and “attack”]
Net Input Value • Step 1: Compute net input to output unit Sj from inputs S1 – Sn • Based on activation of input units S1 – Sn (0 or 1) • Based on weights W_ij from inputs S_i to S_j • net_j = Σ_i S_i W_ij • Ideas: inactive units (with value 0) should have no effect on the output; the effect of an active unit should be proportional to the weight between that input and the output • [Diagram: inputs S1 … Si … Sn feed into Sj through weights W1j … Wij … Wnj]
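The net input formula, rendered as a short Python helper (names are illustrative):

```python
def net_input(inputs, weights):
    """net_j = sum over i of S_i * W_ij."""
    return sum(s * w for s, w in zip(inputs, weights))

# Inactive inputs (0) contribute nothing; active inputs contribute
# in proportion to their weight.
print(net_input([1, 0, 1], [0.5, -2.0, 0.25]))  # 0.75
```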
Activation of Outputs • Threshold activation • If net_j > threshold for unit j, then S_j = 1 • If net_j ≤ threshold for unit j, then S_j = 0 • Often represent the threshold as another weight W_bias • Negative weight from a unit which is always active • Makes it easier to learn the threshold • [Diagram: step function plotting S_j against net_j; inputs S1 … Sn plus S_bias (always 1) feed S_j through weights W1j … Wnj and W_bias]
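A sketch of threshold activation with the bias trick, assuming the bias unit is appended as a constant input of 1; folding the threshold into W_bias means the test is simply net_j > 0:

```python
def activate(inputs, weights):
    """Threshold activation with the threshold folded into a bias weight.

    inputs:  input activations, with a constant 1 appended as the bias unit
    weights: one weight per input, the last being W_bias (typically negative)
    """
    net = sum(s * w for s, w in zip(inputs, weights))
    return 1 if net > 0 else 0

# Equivalent to asking "is 0.9 > 0.5?", with threshold 0.5 encoded as W_bias = -0.5
print(activate([0.9, 1], [1.0, -0.5]))  # 1
```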
Problem Format • Neural networks expect input activations in the range 0 to 1 • May need to normalize data to get this • Example: orc • Hit points between 1 and 100 → divide hit points by 100 • Have weapon: 1 if yes, 0 if no • Player strength between 10 and 20 → subtract 10 and divide by 10 • Example values: hit points = 50, have weapon = yes, player strength = 17.5 → (0.5, 1.0, 0.75)
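The same normalization as a small Python helper (the function name is illustrative; the ranges are the slide's):

```python
def normalize_orc(hit_points, have_weapon, player_strength):
    """Map raw game values onto [0, 1] input activations."""
    return (
        hit_points / 100.0,             # hit points: 1-100 -> divide by 100
        1.0 if have_weapon else 0.0,    # weapon: yes/no -> 1/0
        (player_strength - 10) / 10.0,  # strength: 10-20 -> subtract 10, divide by 10
    )

print(normalize_orc(50, True, 17.5))  # (0.5, 1.0, 0.75)
```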
Orc Example • Inputs: hit points = 0.5, have weapon = 1, player strength = 0.75, bias = 1 • Weights to Run: -3 (hit points), -2 (have weapon), 4 (player strength), -2 (bias) • Weights to Attack: 4 (hit points), 5 (have weapon), -3 (player strength), -3 (bias) • net_run = (-1.5) + (-2) + 3 + (-2) = -2.5 → S_run = 0 • net_attack = 2 + 5 + (-2.25) + (-3) = 1.75 → S_attack = 1
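A runnable check of the numbers above, using the weights recoverable from the example (structure and names are illustrative):

```python
def activate(inputs, weights):
    net = sum(s * w for s, w in zip(inputs, weights))
    return net, (1 if net > 0 else 0)

# (hit points, have weapon, player strength, bias)
inputs = (0.5, 1.0, 0.75, 1.0)
run_weights    = (-3, -2,  4, -2)
attack_weights = ( 4,  5, -3, -3)

print(activate(inputs, run_weights))     # (-2.5, 0): don't run
print(activate(inputs, attack_weights))  # (1.75, 1): attack
```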
Perceptron Learning • Key question: Where do weights come from? • Too complex to hand code • Must learn from examples • [Table: sample training set, with input values normalized between 0 and 1]
Perceptron Learning • Given the following: • S_ip: state of input i for example p • S_jp: actual state of output j for example p • t_jp: desired state of output j for example p • How should the weight W_ij be changed to decrease the error between S_jp and t_jp? • [Diagram: S_ip connected to S_jp by weight W_ij]
Perceptron Learning • Key ideas: • If S_jp is correct (S_jp == t_jp), then no change to W_ij • Not broken, so don't fix! • If S_jp == 0 and t_jp == 1, then net_jp is too low • Increase W_ij • If S_jp == 1 and t_jp == 0, then net_jp is too high • Decrease W_ij • If input value S_ip == 0, then W_ij had no effect on net_jp • Scale the change to W_ij in proportion to the magnitude of S_ip
Perceptron Learning • Perceptron learning rule: W_ij = W_ij + η S_ip (t_jp - S_jp) • η = step size • Should be small (usually around 0.1)
Perceptron Learning • Learning involves cycling through the training set many times • Often thousands of cycles • Start with a randomly determined set of weights • Usually between -1 and 1 • Training loop:
While error remains for some example p:
    For example p = 1 to number of examples:
        For all inputs i: assign input i the value S_ip
        For all outputs j: compute the actual output value S_jp
        For all weights W_ij: apply the learning rule to update the weight
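A minimal runnable version of this loop for a single output unit, using the learning rule from the previous slide. The OR-style toy training set is illustrative, not the lecture's data:

```python
import random

# Toy training set: (inputs with bias unit appended, desired output).
# The target here is logical OR, which is linearly separable.
examples = [
    ((0, 0, 1), 0),
    ((0, 1, 1), 1),
    ((1, 0, 1), 1),
    ((1, 1, 1), 1),
]

eta = 0.1                                            # step size
weights = [random.uniform(-1, 1) for _ in range(3)]  # random start in [-1, 1]

for cycle in range(1000):                            # often thousands of cycles
    any_error = False
    for inputs, target in examples:
        net = sum(s * w for s, w in zip(inputs, weights))
        output = 1 if net > 0 else 0
        if output != target:
            any_error = True
            # Perceptron learning rule: W_ij += eta * S_ip * (t_jp - S_jp)
            for i, s in enumerate(inputs):
                weights[i] += eta * s * (target - output)
    if not any_error:                                # every example correct
        break

print("Learned weights:", weights)
```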
Limits of Perceptron Learning • Perceptron learning proven to correctly learn any behavior that can be represented with a single layer of weights • Problem: not all behaviors can be represented this way • Example: exclusive-or of inputs A and B (output 1 when exactly one input is 1) requires: • W_bias ≤ 0 (from A=0, B=0 → 0) • W_A + W_bias > 0 (from A=1, B=0 → 1) • W_B + W_bias > 0 (from A=0, B=1 → 1) • W_A + W_B + W_bias ≤ 0 (from A=1, B=1 → 0) • Adding the two middle inequalities gives W_A + W_B + 2 W_bias > 0; since W_bias ≤ 0, this forces W_A + W_B + W_bias > 0, a contradiction
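An empirical version of the same argument (illustrative): the training loop from the previous sketch never reaches zero error on XOR, however many cycles it is given.

```python
import random

# XOR training set (bias unit appended): not linearly separable.
examples = [
    ((0, 0, 1), 0),
    ((0, 1, 1), 1),
    ((1, 0, 1), 1),
    ((1, 1, 1), 0),
]

eta = 0.1
weights = [random.uniform(-1, 1) for _ in range(3)]

for cycle in range(10000):
    errors = 0
    for inputs, target in examples:
        output = 1 if sum(s * w for s, w in zip(inputs, weights)) > 0 else 0
        errors += output != target
        # No-op when output == target, since (target - output) is 0
        for i, s in enumerate(inputs):
            weights[i] += eta * s * (target - output)
    if errors == 0:
        break

print("Errors after training:", errors)  # always at least 1 for XOR
```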
Limits of Perceptron Learning • Networks with at least two layers of weights proven to represent any behavior defined by some logical proposition • Problem: perceptron learning does not work • No “desired value” for intermediate units • Learning rule undefined without known values for units connected by weight
Back Propagation Learning • Most successful algorithm for multilayered neural networks • Based on gradient descent • Derivative of the error with respect to each weight • E_p = Σ_k (S_kp - t_kp)², summed over all outputs k • ΔW ∝ -∂E_p / ∂W (move each weight downhill in error) • Requires an activation function with a continuous derivative • The threshold function has an infinite derivative at the threshold • Often use the sigmoidal activation function S_j = 1 / (1 + e^(-net_j)) • More biologically plausible anyway • [Plot: sigmoid curve of S_j against net_j]
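The sigmoid and its derivative in Python; the S (1 - S) form of the derivative is what the update equations on the next slide use:

```python
import math

def sigmoid(net):
    """Smooth, differentiable replacement for the threshold function."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_derivative(s):
    """Derivative of the sigmoid, written in terms of its output s."""
    return s * (1.0 - s)

s = sigmoid(1.75)             # the orc's net_attack from earlier
print(s)                      # ~0.852: strongly (but not absolutely) "attack"
print(sigmoid_derivative(s))  # steepest near s = 0.5, flat near 0 and 1
```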
Back Propagation Learning • Equations (written with the same (t - S) error term as the perceptron rule): • Weights W_jk between intermediate and output units: ΔW_jk = η (t_kp - S_kp) S_kp (1 - S_kp) S_jp • Weights W_ij between input and intermediate units: ΔW_ij = η [Σ_k (t_kp - S_kp) S_kp (1 - S_kp) W_jk] S_jp (1 - S_jp) S_ip • Summed over all outputs k because a change to W_ij affects every output through intermediate unit S_jp • [Diagram: S_ip → (W_ij) → S_jp → (W_jk) → S_kp]
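A compact sketch of one back-propagation update for a single hidden layer, matching the two equations above. Network shapes, names, and example values are illustrative placeholders:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def backprop_step(x, t, W_in, W_out, eta=0.1):
    """One back-propagation step on example (x, t).

    x: input activations          t: desired output values
    W_in[i][j]:  weight from input i to hidden unit j
    W_out[j][k]: weight from hidden unit j to output unit k
    """
    n_hidden, n_out = len(W_out), len(t)

    # Forward pass
    hidden = [sigmoid(sum(x[i] * W_in[i][j] for i in range(len(x))))
              for j in range(n_hidden)]
    output = [sigmoid(sum(hidden[j] * W_out[j][k] for j in range(n_hidden)))
              for k in range(n_out)]

    # Output error terms: delta_k = (t_k - S_k) S_k (1 - S_k)
    delta = [(t[k] - output[k]) * output[k] * (1 - output[k])
             for k in range(n_out)]

    # Error reaching each hidden unit, summed over all outputs k
    # (a change to W_ij affects every output through S_j)
    back = [sum(delta[k] * W_out[j][k] for k in range(n_out))
            for j in range(n_hidden)]

    # dW_jk = eta * delta_k * S_j   (intermediate -> output)
    for j in range(n_hidden):
        for k in range(n_out):
            W_out[j][k] += eta * delta[k] * hidden[j]

    # dW_ij = eta * back_j * S_j (1 - S_j) * S_i   (input -> intermediate)
    for i in range(len(x)):
        for j in range(n_hidden):
            W_in[i][j] += eta * back[j] * hidden[j] * (1 - hidden[j]) * x[i]

# Example: one update on a 2-input, 2-hidden, 1-output network
W_in = [[0.5, -0.5], [0.3, 0.8]]
W_out = [[0.7], [-0.2]]
backprop_step([1.0, 0.0], [1.0], W_in, W_out)
```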
Neural Networks and Games • Perceptron learning often sufficient for game development • Only fails for nonlinear behaviors like XOR • Types of problem where multiple “good” things are not good together: • High NPC hit points → attack • Low player strength → attack • High NPC hit points and low player strength → run? • Rare in practice • Works well even if results not perfect • Nonlinear aspects to behavior • Occasional bad data in training set • Still creates weights that work most of the time • Advantage of most neural network algorithms
Neural Networks and Games • What is being learned? • Relative importance of different inputs on actions • Thresholds at which different values should affect behavior • Extremely hard to create a good set of rules by hand if there are a large number of inputs and possible actions • “Are my hit points or the player's strength more important in deciding when to attack?” • “How low should I let my hit points get before running away?” • [Diagram: inputs “hit points”, “have weapon”, “player strength” feeding the network]
Neural Networks and Games • Can also use learning to create a variety of characters • Provide a variety of experiences for players • Makes characters more realistic • Train each with a slightly different set of examples • Much like how we learn from different experiences • “I don't know the meaning of fear (or many other words)!”