Explore the diverse meanings of "learning" and delve into the processes that lead to improving machine performance. Discover different types of learning methods, such as supervised, unsupervised, and reinforcement learning, with real-world applications and motivating problems. Uncover key concepts like learning with a teacher, unsupervised learning, and designing learning systems.
Artificial Intelligence: Neural Networks
What is Learning? The word "learning" has many different meanings. It is used, at least, to describe: • memorizing something • learning facts through observation and exploration • development of motor and/or cognitive skills through practice • organization of new knowledge into general, effective representations
Learning The study of processes that lead to self-improvement of machine performance. It implies the ability to use knowledge to create new knowledge, or to integrate new facts into an existing knowledge structure. Learning typically requires repetition and practice to reduce the difference between actual and desired performance.
What is Learning? Herbert Simon: "Learning is any process by which a system improves performance from experience."
Learning Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience.
Learning & Adaptation • "Modification of a behavioral tendency by experience." (Webster 1984) • "A learning machine, broadly defined as any device whose actions are influenced by past experiences." (Nilsson 1965) • "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon 1983)
Negative Features of Human Learning • It is slow (5-6 years for motor skills, 12-20 years for abstract reasoning) • Inefficient • Expensive • There is no copy process • The learning strategy is often a function of the knowledge available to the learner
Applications of ML Learning to recognize spoken words Learning to drive an autonomous vehicle Learning to classify objects Learning to play world-class backgammon Designing the morphology and control structure of electro-mechanical artefacts
Motivating Problems Handwritten Character Recognition
Motivating Problems Fingerprint Recognition (e.g., border control)
Motivating Problems Face Recognition (security access to buildings etc)
Different kinds of learning… • Supervised learning: • Someone gives us examples and the right answer for those examples • We have to predict the right answer for unseen examples • Unsupervised learning: • We see examples but get no feedback • We need to find patterns in the data • Reinforcement learning: • We take actions and get rewards • Have to learn how to get high rewards
Reinforcement learning • Another learning problem, familiar to most of us, is learning motor skills, like riding a bike. We call this reinforcement learning. • It's different from supervised learning because no-one explicitly tells you the right thing to do; you just have to try things and see what makes you fall over and what keeps you upright.
Learning with a Teacher [figure: block diagram — the Environment sends a state x to both a Teacher, which produces the desired response, and the Learning system, which produces the actual response; their difference is the error signal fed back to the learner] • supervised learning • knowledge represented by a set of input-output examples (xi, yi) • minimize the error between the actual response of the learner and the desired response
Unsupervised Learning [figure: the Environment sends a state directly to the Learning system; there is no teacher] • self-organized learning • no teacher • task-independent quality measure • identify regularities in the data and discover classes automatically
The red and the black • Imagine that we were given all these points, and we needed to guess a function of their x, y coordinates that would have one output for the red ones and a different output for the black ones.
What’s the right hypothesis? • In this case, it seems like we could do pretty well by defining a line that separates the two classes.
Now, what’s the right hypothesis? • Now, what if we have a slightly different configuration of points? We can't divide them conveniently with a line.
Now, what’s the right hypothesis? • But this parabola-like curve seems like it might be a reasonable separator.
Design a Learning System [figure: the learning system drawn as a black box] Step 0: • Let's treat the learning system as a black box
Design a Learning System [figure: sample handwritten digits such as 2, 3, 6, 7, 8, 9] Step 1: Collect Training Examples (Experience). • Without examples, our system will not learn (so-called learning from examples)
Design a Learning System Step 2: Representing Experience • So, what would D be like? There are many possibilities. • Assuming our system is to recognise 10 digits only, then D can be a 10-d binary vector, D = (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9); each element corresponds to one of the digits • Example: X = (1,1,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0, …, 1), a 64-d vector, with D = (0,0,0,0,0,1,0,0,0,0) • X = (1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0, …, 1), a 64-d vector, with D = (0,0,0,0,0,0,0,0,1,0)
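The 10-d target vector D is a one-hot encoding of the digit class, which can be sketched as follows (the helper name `one_hot` is an illustrative choice, not from the slides):

```python
def one_hot(digit, num_classes=10):
    """Encode a digit 0-9 as a 10-d binary target vector D."""
    d = [0] * num_classes
    d[digit] = 1
    return d

# The two target vectors from the slide correspond to the digits 5 and 8:
print(one_hot(5))  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(one_hot(8))  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
```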
Example of supervised learning: classification • We lend money to people • We have to predict whether they will pay us back or not • People have various (say, binary) features: • do we know their Address? do they have a Criminal record? high Income? Educated? Old? Unemployed? • We see examples: (Y = paid back, N = not) +a, -c, +i, +e, +o, +u: Y -a, +c, -i, +e, -o, -u: N +a, -c, +i, -e, -o, -u: Y -a, -c, +i, +e, -o, -u: Y -a, +c, +i, -e, -o, -u: N -a, -c, +i, -e, -o, +u: Y +a, -c, -i, -e, +o, -u: N +a, +c, +i, -e, +o, -u: N • Next person is +a, -c, +i, -e, +o, -u. Will we get paid back?
Learning by Examples Concept: ”days on which my friend Aldo enjoys his favourite water sports” Task: predict the value of ”Enjoy Sport” for an arbitrary day based on the values of the other attributes
Decision trees [figure: a decision tree — the root tests "high Income?"; the no-branch predicts NO; the yes-branch tests "Criminal record?", whose yes-branch predicts NO and whose no-branch predicts YES]
Constructing a decision tree, one step at a time. Training data: +a, -c, +i, +e, +o, +u: Y -a, +c, -i, +e, -o, -u: N +a, -c, +i, -e, -o, -u: Y -a, -c, +i, +e, -o, -u: Y -a, +c, +i, -e, -o, -u: N -a, -c, +i, -e, -o, +u: Y +a, -c, -i, -e, +o, -u: N +a, +c, +i, -e, +o, -u: N [figure: the tree after splitting first on "address?" — each branch holds four examples and is split again on "criminal?", and one resulting subgroup is split once more on "income?"] Address was maybe not the best attribute to start with…
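The observation that address was a poor first split can be checked numerically: score each candidate attribute by how many training examples a one-level split would misclassify if each branch predicted its majority class. This error count is an illustrative simplification; a real tree learner would typically use entropy/information gain instead.

```python
# The eight training examples, features encoded as 1 (+) / 0 (-).
examples = [
    ({'a': 1, 'c': 0, 'i': 1, 'e': 1, 'o': 1, 'u': 1}, 'Y'),
    ({'a': 0, 'c': 1, 'i': 0, 'e': 1, 'o': 0, 'u': 0}, 'N'),
    ({'a': 1, 'c': 0, 'i': 1, 'e': 0, 'o': 0, 'u': 0}, 'Y'),
    ({'a': 0, 'c': 0, 'i': 1, 'e': 1, 'o': 0, 'u': 0}, 'Y'),
    ({'a': 0, 'c': 1, 'i': 1, 'e': 0, 'o': 0, 'u': 0}, 'N'),
    ({'a': 0, 'c': 0, 'i': 1, 'e': 0, 'o': 0, 'u': 1}, 'Y'),
    ({'a': 1, 'c': 0, 'i': 0, 'e': 0, 'o': 1, 'u': 0}, 'N'),
    ({'a': 1, 'c': 1, 'i': 1, 'e': 0, 'o': 1, 'u': 0}, 'N'),
]

def split_errors(attr):
    """Misclassifications if each branch predicts its majority class."""
    errors = 0
    for value in (0, 1):
        labels = [lab for x, lab in examples if x[attr] == value]
        if labels:
            errors += len(labels) - max(labels.count('Y'), labels.count('N'))
    return errors

best = min('acieou', key=split_errors)
print(best, split_errors(best))  # criminal record splits best: only 1 error
print('a', split_errors('a'))    # address leaves 4 errors, a poor first split
```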
Different approach: nearest neighbor(s) • Next person is -a, +c, -i, +e, -o, +u. Will we get paid back? • Nearest neighbor: simply look at the most similar example in the training data and see what happened there +a, -c, +i, +e, +o, +u: Y (distance 4) -a, +c, -i, +e, -o, -u: N (distance 1) +a, -c, +i, -e, -o, -u: Y (distance 5) -a, -c, +i, +e, -o, -u: Y (distance 3) -a, +c, +i, -e, -o, -u: N (distance 3) -a, -c, +i, -e, -o, +u: Y (distance 3) +a, -c, -i, -e, +o, -u: N (distance 5) +a, +c, +i, -e, +o, -u: N (distance 5) • The nearest neighbor is the second example, so predict N • k nearest neighbors: look at the k nearest neighbors and take a vote • E.g., the 5 nearest neighbors have 3 Ys and 2 Ns, so predict Y
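The nearest-neighbour calculation above can be sketched in a few lines, encoding the +/- features as 1/0 in the order a, c, i, e, o, u (the function names are illustrative choices):

```python
def hamming(x, y):
    """Number of features on which two examples disagree."""
    return sum(a != b for a, b in zip(x, y))

# Training data: (features, label), features in the order a, c, i, e, o, u.
train = [
    ((1, 0, 1, 1, 1, 1), 'Y'), ((0, 1, 0, 1, 0, 0), 'N'),
    ((1, 0, 1, 0, 0, 0), 'Y'), ((0, 0, 1, 1, 0, 0), 'Y'),
    ((0, 1, 1, 0, 0, 0), 'N'), ((0, 0, 1, 0, 0, 1), 'Y'),
    ((1, 0, 0, 0, 1, 0), 'N'), ((1, 1, 1, 0, 1, 0), 'N'),
]

def knn_predict(query, k):
    """Majority vote among the k training examples closest to the query."""
    neighbours = sorted(train, key=lambda ex: hamming(ex[0], query))[:k]
    labels = [lab for _, lab in neighbours]
    return max(set(labels), key=labels.count)

query = (0, 1, 0, 1, 0, 1)    # -a, +c, -i, +e, -o, +u
print(knn_predict(query, 1))  # N: the single nearest neighbour (distance 1)
print(knn_predict(query, 5))  # Y: 3 Ys vs 2 Ns among the 5 nearest
```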
Neural Networks • They can represent complicated hypotheses in high-dimensional continuous spaces. • They are attractive as a computational model because they are composed of many small computing units. • They were motivated by the structure of neural systems in parts of the brain. Now it is understood that they are not an exact model of neural function, but they have proved to be useful from a purely practical perspective.
If…then rules If tear production rate = reduced then recommendation = none If age = young and astigmatic = no then recommendation = soft
Approaches to Machine Learning • Numerical approaches • Build numeric model with parameters based on successes • Structural approaches • Concerned with the process of defining relationships by creating links between concepts
Learning methods • Decision rules: • If income < $30,000 then reject • Bayesian network: • P(good | income, credit history, …) • Neural Network: • Nearest Neighbor: • Take the same decision as for the customer in the database that is most similar to the applicant
Classification • Assign object/event to one of a given finite set of categories. • Medical diagnosis • Credit card applications or transactions • Fraud detection in e-commerce • Worm detection in network packets • Spam filtering in email • Recommended articles in a newspaper • Recommended books, movies, music, or jokes • Financial investments • DNA sequences • Spoken words • Handwritten letters • Astronomical images
Problem Solving / Planning / Control • Performing actions in an environment in order to achieve a goal. • Solving calculus problems • Playing checkers, chess, or backgammon • Balancing a pole • Driving a car or a jeep • Flying a plane, helicopter, or rocket • Controlling an elevator • Controlling a character in a video game • Controlling a mobile robot
Another Example: Handwriting Recognition • Background concepts: • Pixel information • Categorisations: • (Matrix, Letter) pairs • Both positive & negative • Task: • Correctly categorise an unseen example into 1 of 26 categories • Positive: • This is a letter S • Negative: • This is a letter Z
History • Roots of work on NN are in: • Neurobiological studies (more than one century ago): • How do nerves behave when stimulated by different magnitudes of electric current? Is there a minimal threshold needed for nerves to be activated? Given that no single nerve cell is long enough, how do different nerve cells communicate with each other? • Psychological studies: • How do animals learn, forget, recognize and perform other types of tasks? • Psycho-physical experiments helped to understand how individual neurons and groups of neurons work. • McCulloch and Pitts introduced the first mathematical model of a single neuron, widely applied in subsequent work.
History • Widrow and Hoff (1960): Adaline • Minsky and Papert (1969): limitations of single-layer perceptrons (and they erroneously claimed that the limitations hold for multi-layer perceptrons) • Stagnation in the 70's: • Individual researchers continue laying foundations • von der Malsburg (1973): competitive learning and self-organization • Big neural-nets boom in the 80's: • Grossberg: adaptive resonance theory (ART) • Hopfield: Hopfield network • Kohonen: self-organising map (SOM)
Applications • Classification: • Image recognition • Speech recognition • Diagnosis • Fraud detection • … • Regression: • Forecasting (prediction on the basis of past history) • … • Pattern association: • Retrieve an image from a corrupted one • … • Clustering: • client profiles • disease subtypes • …
Real Neurons • Cell structures • Cell body • Dendrites • Axon • Synaptic terminals
Non Symbolic Representations • Decision trees can be easily read • A disjunction of conjunctions (logic) • We call this a symbolic representation • Non-symbolic representations • More numerical in nature, more difficult to read • Artificial Neural Networks (ANNs) • A Non-symbolic representation scheme • They embed a giant mathematical function • To take inputs and compute an output which is interpreted as a categorisation • Often shortened to “Neural Networks” • Don’t confuse them with real neural networks (in heads)
Complicated Example: Categorising Vehicles [figure: four vehicle images fed to the function, producing outputs 3, 2, 1, and 1] • Input to function: pixel data from vehicle images • Output: numbers: 1 for a car; 2 for a bus; 3 for a tank
Real Neural Learning • Synapses change size and strength with experience. • Hebbian learning: When two connected neurons are firing at the same time, the strength of the synapse between them increases. • “Neurons that fire together, wire together.”
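A minimal sketch of a Hebbian weight update, assuming the common rule Δw = η·x·y (the learning rate η and the example pattern are arbitrary choices; the slides do not fix these details):

```python
eta = 0.1  # hypothetical learning rate

def hebb_update(w, x, y):
    """Hebbian rule: w_new = w + eta * x * y for each input component.

    A weight grows only when its input x and the output y are active
    together: "neurons that fire together, wire together"."""
    return [wi + eta * xi * y for wi, xi in zip(w, x)]

w = [0.0, 0.0]
# Present the pattern x = (1, 0) with post-synaptic output y = 1 three times:
for _ in range(3):
    w = hebb_update(w, (1, 0), y=1)

# Only the co-active input's weight has strengthened:
print([round(wi, 2) for wi in w])  # [0.3, 0.0]
```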
Neural Network [figure: a network with an input layer, two hidden layers (Hidden 1 and Hidden 2), and an output layer]
Simple Neuron [figure: inputs X1, X2, …, Xn with weights W1, W2, …, Wn feeding a unit f that produces the output]
Neuron Model • A neuron has one or more inputs x1, x2, .., xm • Each input is associated with a weight w1, w2, .., wm • The neuron has a bias b • The net input of the neuron is n = w1 x1 + w2 x2 + … + wm xm + b
Neuron output • The neuron output is y = f (n) • f is called the transfer function
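The net input and output computations can be sketched directly (the example weights, inputs and bias below are arbitrary, and the sigmoid is one possible choice of transfer function f):

```python
import math

def net_input(weights, inputs, bias):
    """n = w1*x1 + ... + wm*xm + b"""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def sigmoid(n):
    """A common transfer function f, squashing n into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

# A hypothetical two-input neuron:
n = net_input([0.5, -1.0], [2.0, 1.0], bias=0.5)
print(n)                     # 0.5
print(round(sigmoid(n), 3))  # 0.622
```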
Transfer Function • We have 3 common transfer functions • Hard limit transfer function • Linear transfer function • Sigmoid transfer function
Exercises • The input to a single-input neuron is 2.0, its weight is 2.3 and the bias is –3. • What is the output of the neuron if it has transfer function as: • Hard limit • Linear • sigmoid
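The exercise can be checked with a short script, assuming the usual definitions of the three transfer functions (hard limit outputting 1 for n ≥ 0 and 0 otherwise; some texts use -1/+1 instead):

```python
import math

def hardlim(n):
    """Hard limit: 1 if the net input is non-negative, else 0."""
    return 1 if n >= 0 else 0

def purelin(n):
    """Linear: the output equals the net input."""
    return n

def logsig(n):
    """Sigmoid: squashes the net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

w, x, b = 2.3, 2.0, -3.0
n = w * x + b                # net input: 2.3 * 2.0 - 3 = 1.6
print(round(n, 2))           # 1.6
print(hardlim(n))            # 1
print(round(purelin(n), 2))  # 1.6
print(round(logsig(n), 3))   # 0.832
```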
Architecture of ANN • Feed-Forward networks: signals travel one way only, from input to output • Feed-Back networks: signals can travel in loops; the output is connected back to the input of the network
Learning Rule • The learning rule modifies the weights of the connections. • The learning process is divided into Supervised and Unsupervised learning
Perceptron [figure: inputs X1, X2, …, Xn with weights W1, W2, …, Wn feeding a single unit f that produces the output] • A network of a single neuron with a hard-limit transfer function
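A minimal perceptron sketch using the classic perceptron learning rule, trained here on logical AND (the learning rate, initial weights, epoch count and the AND task are illustrative choices, not from the slides):

```python
def perceptron_output(weights, bias, x):
    """Single neuron with a hard-limit transfer function."""
    n = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if n >= 0 else 0

# Training data for logical AND: ((x1, x2), target)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, eta = [0.0, 0.0], 0.0, 0.5

for _ in range(10):  # a few passes over the data suffice here
    for x, target in data:
        # Perceptron rule: nudge weights by eta * error * input
        error = target - perceptron_output(w, b, x)
        w = [wi + eta * error * xi for wi, xi in zip(w, x)]
        b += eta * error

print([perceptron_output(w, b, x) for x, _ in data])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this training loop settles on a correct set of weights.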