Instance-Based Learning


Contents
• Motivation
• Eager Learning
• Lazy Learning
• Instance-Based Learning
• k-Nearest Neighbour Learning (k-NN)
• Distance-Weighted k-NN
• Locally Weighted Regression (LWR)
• Case-Based Reasoning (CBR)
• Summary

Motivation: Eager Learning
• THE LEARNING TASK: Approximate a target function $f$ by a hypothesis $h$ on the basis of training examples.
• EAGER LEARNING: As soon as the training examples and the hypothesis space are received, the search for a hypothesis begins.
• Training phase:
  given: training examples $D = \{\langle x_i, f(x_i) \rangle\}$ and hypothesis space $H$
  search: best hypothesis $h \in H$
• Processing phase: for every new instance $x_q$ return $h(x_q)$
• Examples

Motivation: Lazy Algorithms
• LAZY ALGORITHMS:
  • Training examples are simply stored; they lie dormant until needed.
  • Generalisation beyond these examples is postponed until new instances must be classified.
  • Every time a new query instance is encountered, its relationship to the previously stored examples is examined in order to compute the value of the target function for this new instance.

Motivation: Instance-Based Learning
• Instance-based algorithms can establish a new local approximation for every new query instance.
• Training phase:
  given: training sample $D = \{\langle x_i, f(x_i) \rangle\}$
• Processing phase:
  given: instance $x_q$
  search: best local hypothesis $h_{x_q}$
  return: $h_{x_q}(x_q)$
• Examples:
  • Nearest Neighbour Algorithm
  • Distance-Weighted Nearest Neighbour
  • Locally Weighted Regression
  • ...

Motivation: Instance-Based Learning 2
• How are the instances represented?
• How can we measure the similarity of the instances?
• How can $\hat{f}(x_q)$ be computed?

Nearest Neighbour Algorithm
• IDEA: All instances correspond to points in the n-dimensional space $\mathbb{R}^n$. Assign to the new instance the value of its nearest neighbouring instance.
• REPRESENTATION: Let $x = \langle a_1(x), a_2(x), \ldots, a_n(x) \rangle$ be an instance, where $a_r(x)$ denotes the value of the r-th attribute of instance $x$.
• TARGET FUNCTION: discrete-valued or real-valued

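Nearest Neighbour Algorithm 2
• HOW IS THE NEAREST NEIGHBOUR DEFINED? A metric serves as the similarity measure.
• Minkowski distance (based on the $L_p$ norm): $d(x_i, x_j) = \left( \sum_{r=1}^{n} |a_r(x_i) - a_r(x_j)|^p \right)^{1/p}$, where $p \ge 1$
• Euclidean distance ($p = 2$): $d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2}$
• This algorithm never forms an explicit general hypothesis $\hat{f}$ regarding the target function $f$.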
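As an illustration, the two distance measures above can be sketched in Python, assuming each instance is given as an equal-length sequence of numeric attribute values $a_1(x), \ldots, a_n(x)$. The helper names `minkowski` and `euclidean` are hypothetical.

```python
from typing import Sequence


def minkowski(xi: Sequence[float], xj: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p between two attribute vectors."""
    if p < 1:
        raise ValueError("p must be >= 1 for d to be a metric")
    if len(xi) != len(xj):
        raise ValueError("instances must have the same number of attributes")
    # (sum_r |a_r(xi) - a_r(xj)|^p)^(1/p)
    return sum(abs(a - b) ** p for a, b in zip(xi, xj)) ** (1.0 / p)


def euclidean(xi: Sequence[float], xj: Sequence[float]) -> float:
    """Euclidean distance: the special case p = 2."""
    return minkowski(xi, xj, p=2.0)


print(euclidean([0, 0], [3, 4]))       # 5.0
print(minkowski([0, 0], [3, 4], p=1))  # 7.0 (Manhattan distance)
```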

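Nearest Neighbour Algorithm 3
• HOW IS $\hat{f}(x_q)$ FORMED?
• Discrete target function: $f: \mathbb{R}^n \to V$, where $V = \{v_1, \ldots, v_s\}$ is the set of $s$ classes
• Continuous target function: $f: \mathbb{R}^n \to \mathbb{R}$
• Let $x_1$ be the nearest neighbour of $x_q$ ==> $\hat{f}(x_q) \leftarrow f(x_1)$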
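A minimal sketch of the 1-NN rule $\hat{f}(x_q) \leftarrow f(x_1)$, assuming the training set is a list of `(x, f(x))` pairs and using the Euclidean distance via `math.dist` (Python 3.8+). The helper `nearest_neighbour` and the toy data are hypothetical; because the code only copies $f(x_1)$, it serves discrete and continuous targets alike.

```python
import math


def nearest_neighbour(training, xq):
    """Return f(x_1), where x_1 is the stored instance closest to xq.

    training: list of (x, f(x)) pairs, x a tuple of attribute values;
    f(x) may be a class label (discrete f) or a number (continuous f).
    Ties in distance go to the example listed first.
    """
    _, f_x1 = min(training, key=lambda example: math.dist(example[0], xq))
    return f_x1


D = [((0, 0), "A"), ((5, 5), "B")]
print(nearest_neighbour(D, (1, 1)))                                # A
print(nearest_neighbour([((0.0,), 1.0), ((10.0,), 3.0)], (8.0,)))  # 3.0
```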

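k-Nearest Neighbour
• IDEA: If we choose k = 1, the algorithm assigns to $\hat{f}(x_q)$ the value $f(x_i)$, where $x_i$ is the training instance nearest to $x_q$. For larger values of k, the algorithm assigns the most common value among the k nearest training examples.
• HOW CAN $\hat{f}(x_q)$ BE ESTABLISHED? $\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$, where $\delta(a, b) = 1$ if $a = b$ and $\delta(a, b) = 0$ otherwise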
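The majority vote $\arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$ can be sketched with `collections.Counter`. The function `knn_classify` and the toy data set are hypothetical, and the tie-breaking rule is an assumption, since the slides do not specify one. On this data the same query is labelled differently by 1-NN and by 5-NN, which shows how strongly the result can depend on k.

```python
import math
from collections import Counter


def knn_classify(training, xq, k=3):
    """k-NN for a discrete-valued f: majority vote among the k nearest examples."""
    # Sort by Euclidean distance to the query (stable: equal distances keep input order).
    neighbours = sorted(training, key=lambda example: math.dist(example[0], xq))[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common orders equal counts by first occurrence, so on a tied vote the
    # label that appears nearest to xq wins (an assumed tie-breaking rule).
    return votes.most_common(1)[0][0]


D = [((0, 0), "+"), ((0, 1), "+"), ((1, 0), "-"), ((5, 5), "-"), ((6, 5), "-")]
xq = (0.1, 0.3)
print(knn_classify(D, xq, k=1))  # +
print(knn_classify(D, xq, k=5))  # -
```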

k-Nearest Neighbour 2
• Example: [Figure: classification of a query point by 1-NN and by 5-NN; Voronoi diagram]
• Voronoi diagram: the decision surface induced by the 1-nearest-neighbour algorithm for a typical set of training examples. The convex polygon surrounding each training example indicates the region of query points whose classification is completely determined by that training example.

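k-Nearest Neighbour 3
• REFINEMENT: The contribution of each neighbour is weighted according to its distance from the query point: the farther away a neighbour, the smaller its influence.
  $\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$, where $w_i = \frac{1}{d(x_q, x_i)^2}$
• To accommodate the case where the query point $x_q$ exactly matches one of the training instances $x_i$, so that the denominator $d(x_q, x_i)^2$ is zero, we assign $\hat{f}(x_q) = f(x_i)$ in this case.
• Distance weighting for a real-valued target function: $\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$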
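A sketch of distance-weighted k-NN with $w_i = 1 / d(x_q, x_i)^2$, covering both the weighted vote for a discrete target and the weighted mean for a real-valued target, and returning $f(x_i)$ directly when the query coincides with a training instance. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def _k_nearest(training, xq, k):
    """The k stored (x, f(x)) pairs closest to xq, nearest first."""
    return sorted(training, key=lambda example: math.dist(example[0], xq))[:k]


def weighted_knn_classify(training, xq, k=5):
    """Weighted vote: each neighbour adds w_i = 1 / d(xq, x_i)^2 to its class."""
    scores = defaultdict(float)
    for x, label in _k_nearest(training, xq, k):
        d = math.dist(x, xq)
        if d == 0.0:
            return label  # exact match: f_hat(xq) = f(x_i)
        scores[label] += 1.0 / d ** 2
    return max(scores, key=scores.get)


def weighted_knn_regress(training, xq, k=5):
    """Weighted mean sum(w_i f(x_i)) / sum(w_i) over the k nearest neighbours."""
    numerator = denominator = 0.0
    for x, fx in _k_nearest(training, xq, k):
        d = math.dist(x, xq)
        if d == 0.0:
            return fx  # exact match: f_hat(xq) = f(x_i)
        w = 1.0 / d ** 2
        numerator += w * fx
        denominator += w
    return numerator / denominator


D = [((0, 0), "+"), ((0, 1), "+"), ((1, 0), "-"), ((5, 5), "-"), ((6, 5), "-")]
print(weighted_knn_classify(D, (0.1, 0.3), k=5))  # +
```

With these weights the two close "+" examples outvote the three "-" examples, so the weighted 5-NN answer is "+", whereas an unweighted 5-NN majority vote on the same data would return "-".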

Remarks on k-Nearest Neighbour Algorithm
• PROBLEM: The distance between two instances is computed over all attributes, so even irrelevant attributes influence the approximation.
• EXAMPLE: $n = 20$ attributes, but only 2 of them are relevant to the target function.
• SOLUTION: Weight each attribute differently when calculating the distance between two instances, i.e. stretch the axes of the Euclidean space:
  • shorten the axes that correspond to less relevant attributes
  • lengthen the axes that correspond to more relevant attributes
• PROBLEM: How can the weight of each attribute be determined automatically?
  • Cross-validation
  • Leave-one-out cross-validation
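One way to choose the axis stretching factors $z_1, \ldots, z_n$ automatically is to minimise the leave-one-out error of k-NN over a set of candidate factors. The sketch below assumes an exhaustive search over a small, hypothetical grid of factors; this is exponential in n and therefore only an illustration. A factor of 0 removes an attribute from the distance entirely. Helper names and data are hypothetical.

```python
import itertools
import math
from collections import Counter


def loo_error(training, z, k=1):
    """Leave-one-out error rate of k-NN after stretching axis r by the factor z[r]."""
    scaled = [(tuple(zr * a for zr, a in zip(z, x)), label) for x, label in training]
    errors = 0
    for i, (xq, label) in enumerate(scaled):
        rest = scaled[:i] + scaled[i + 1:]  # leave example i out
        neighbours = sorted(rest, key=lambda ex: math.dist(ex[0], xq))[:k]
        predicted = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
        errors += predicted != label
    return errors / len(scaled)


def choose_axis_weights(training, candidates=(0.0, 0.5, 1.0, 2.0), k=1):
    """Try every combination of candidate factors; return the first one with minimal LOO error."""
    n = len(training[0][0])
    return min(itertools.product(candidates, repeat=n),
               key=lambda z: loo_error(training, z, k))


# The first attribute decides the class, the second one is noise.
D = [((0.0, 9.0), "-"), ((0.1, 1.0), "-"), ((0.2, 5.0), "-"),
     ((0.8, 8.0), "+"), ((0.9, 0.0), "+"), ((1.0, 4.0), "+")]
print(choose_axis_weights(D))  # (0.5, 0.0)
```

On this toy data the search shrinks the noisy second axis to 0 and keeps the first one, which is exactly the "shorten irrelevant axes" idea above.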

Remarks on k-Nearest Neighbour Algorithm 2
• ADVANTAGES:
  • The training phase is very fast, since the examples are merely stored.
  • Can learn complex target functions.
  • Robust to noisy training data.
  • Quite effective when a sufficiently large set of training data is provided.
  • Under very general conditions, the following holds asymptotically: $\lim_{|D| \to \infty} P_{1\text{-NN}} \le 2 \, P_{\text{Bayes}}$, where $P$ is the probability of error and $P_{\text{Bayes}}$ that of the optimal (Bayes) classifier.
• DISADVANTAGES:
  • The algorithm delays all processing until a new query is received => significant computation may be required to process each query; efficient memory indexing is needed.
  • Processing of queries is slow.
  • Sensitivity to the curse of dimensionality.
• BIAS: The inductive bias corresponds to the assumption that the classification of an instance will be most similar to the classification of other instances that are nearby in Euclidean distance.
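The curse of dimensionality can be made tangible with a small experiment, under the assumption of uniformly distributed random points in $[0, 1]^n$: as n grows, the ratio between the distances to the nearest and to the farthest stored point approaches 1, so "nearest" neighbours become less and less distinctive. `nearest_to_farthest_ratio` is a hypothetical helper, and the exact numbers depend on the random seed.

```python
import math
import random


def nearest_to_farthest_ratio(n, num_points=200, seed=0):
    """d_min / d_max from one random query to num_points uniform random points in [0, 1]^n."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n)] for _ in range(num_points)]
    xq = [rng.random() for _ in range(n)]
    distances = [math.dist(p, xq) for p in points]
    return min(distances) / max(distances)


for n in (2, 10, 100, 1000):
    print(n, round(nearest_to_farthest_ratio(n), 3))
```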

Locally Weighted Regression
• IDEA: A generalisation of the nearest neighbour algorithm. It constructs an explicit approximation to $f$ over a local region surrounding the query point $x_q$, using nearby or distance-weighted training examples to form the local approximation.
• Local: The function is approximated based solely on the training data near the query point.
• Weighted: The contribution of each training example is weighted by its distance from the query point.
• Regression: Approximating a real-valued target function.

Locally Weighted Regression
• PROCEDURE:
  • Given a new query instance $x_q$, construct an approximation $\hat{f}$ that fits the training examples in the neighbourhood surrounding $x_q$.
  • This approximation is used to calculate $\hat{f}(x_q)$, which is output as the estimated target value for the query instance.
  • The description of $\hat{f}$ may change from query to query, because a different local approximation is calculated for each query instance.

Locally Weighted Regression 2
• PROCEDURE:
  • Given a new query $x_q$, construct an approximation $\hat{f}$ that fits the training examples in the surrounding neighbourhood.
  • How can $\hat{f}$ be calculated?
    • Linear function
    • Quadratic function
    • Multilayer neural network
    • ...
  • This approximation is used to calculate $\hat{f}(x_q)$, which is output as the estimated target value for the query instance $x_q$.
  • The description of $\hat{f}$ may then be deleted, because a different local approximation will be calculated for every distinct query instance.

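Locally Weighted Linear Regression
• Special case of LWR with simple computation
• LINEAR HYPOTHESIS SPACE: $\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x)$, where $a_r(x)$ is the r-th attribute of $x$ and the weights $w_r$ are the parameters of the hypothesis space
• Define the error criterion E so that it emphasises fitting the local training examples:
  • Minimise the squared error over just the k nearest neighbours kNN($x_q$) of $x_q$: $E_1(x_q) = \frac{1}{2} \sum_{x \in \text{kNN}(x_q)} (f(x) - \hat{f}(x))^2$
  • Minimise the squared error over the entire set D, weighting each error by a kernel function K that decreases with distance: $E_2(x_q) = \frac{1}{2} \sum_{x \in D} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$
  • Combine $E_1$ and $E_2$: $E_3(x_q) = \frac{1}{2} \sum_{x \in \text{kNN}(x_q)} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$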
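To make the three error criteria concrete, the sketch below evaluates $E_1$, $E_2$ and $E_3$ for one fixed linear hypothesis. The Gaussian kernel $K(d) = e^{-d^2/(2\tau^2)}$ is an assumption (the slides leave K open), and the data, weights and helper names are hypothetical.

```python
import math


def gaussian_kernel(d, tau=1.0):
    """Assumed kernel K(d) = exp(-d^2 / (2 tau^2)); the slides leave K open."""
    return math.exp(-d ** 2 / (2 * tau ** 2))


def error_criteria(training, xq, w, k=3, kernel=gaussian_kernel):
    """Return (E1, E2, E3) at xq for f_hat(x) = w_0 + sum_r w_r a_r(x)."""
    def f_hat(x):
        return w[0] + sum(wr * ar for wr, ar in zip(w[1:], x))

    def sq_err(example):
        x, fx = example
        return (fx - f_hat(x)) ** 2

    def weight(example):
        return kernel(math.dist(example[0], xq))

    knn = sorted(training, key=lambda ex: math.dist(ex[0], xq))[:k]
    e1 = 0.5 * sum(sq_err(ex) for ex in knn)                    # k nearest, unweighted
    e2 = 0.5 * sum(sq_err(ex) * weight(ex) for ex in training)  # all of D, kernel-weighted
    e3 = 0.5 * sum(sq_err(ex) * weight(ex) for ex in knn)       # k nearest, kernel-weighted
    return e1, e2, e3


D = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0), ((3.0,), 9.0)]  # f(x) = x^2
print(error_criteria(D, (1.0,), w=[0.0, 1.0], k=3))
```

Here $\hat{f}(x) = x$ fits the two nearest points exactly, so $E_1$ and $E_3$ are driven only by the third neighbour $x = 2$, while $E_2$ additionally includes the distant point $x = 3$ with a small kernel weight.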

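Locally Weighted Linear Regression 2
• The third error criterion ($E_3$) is a good approximation to the second one ($E_2$), and it has the advantage that its computational cost is independent of the total number of training examples.
• If $E_3$ is chosen and the gradient descent rule is rederived (see neural networks), the following training rule is obtained: $\Delta w_r = \eta \sum_{x \in \text{kNN}(x_q)} K(d(x_q, x)) \, (f(x) - \hat{f}(x)) \, a_r(x)$, where kNN($x_q$) is the set of the k nearest neighbours of $x_q$ and $\eta$ is the learning rate.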
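A minimal sketch of locally weighted linear regression that minimises $E_3$ by batch gradient descent with the training rule above. Assumptions not fixed by the slides: a Gaussian kernel, a constant attribute $a_0(x) = 1$ for $w_0$, fixed $\eta$ and number of epochs, and attributes centred on $x_q$ (that is, $a_r(x) - a_r(x_q)$), which keeps the hypothesis linear but makes plain gradient descent better conditioned. With centring, $\hat{f}(x_q)$ is simply $w_0$. The function name and data are hypothetical.

```python
import math


def lwlr_predict(training, xq, k=5, tau=1.0, eta=0.1, epochs=500):
    """Estimate f_hat(xq) by locally weighted linear regression (gradient descent on E3).

    Update: w_r += eta * sum_x K(d(xq, x)) * (f(x) - f_hat(x)) * a_r(x),
    with a_0(x) = 1 and the other attributes centred on xq (an assumption).
    """
    neighbours = sorted(training, key=lambda ex: math.dist(ex[0], xq))[:k]
    # Centred attribute vectors with a leading bias attribute a_0 = 1.
    rows = [(1.0, *(a - q for a, q in zip(x, xq))) for x, _ in neighbours]
    targets = [fx for _, fx in neighbours]
    # Assumed Gaussian kernel K(d) = exp(-d^2 / (2 tau^2)).
    kernel = [math.exp(-math.dist(x, xq) ** 2 / (2 * tau ** 2)) for x, _ in neighbours]

    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        delta = [0.0] * len(w)
        for a, fx, K in zip(rows, targets, kernel):
            err = fx - sum(wr * ar for wr, ar in zip(w, a))  # f(x) - f_hat(x)
            for r, ar in enumerate(a):
                delta[r] += eta * K * err * ar
        w = [wr + dr for wr, dr in zip(w, delta)]
    return w[0]  # all centred attributes of xq are zero, so f_hat(xq) = w_0


D = [((float(x),), 2.0 * x + 1.0) for x in range(10)]  # f(x) = 2x + 1
print(round(lwlr_predict(D, (4.5,), k=4), 6))  # 10.0
```

Because the toy target is exactly linear, the local fit recovers it and the estimate at $x_q = 4.5$ is 10.0; on a non-linear target the result would be a genuinely local linear approximation.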

Evaluation of Locally Weighted Regression
• ADVANTAGES:
  • Pointwise approximation of a complex target function
  • Local approximations computed for earlier queries have no influence on new ones
• DISADVANTAGES:
  • The quality of the result depends on:
    • the choice of the function $\hat{f}$
    • the choice of the kernel function K
    • the choice of the hypothesis space H
  • Sensitivity to the presence of relevant and irrelevant attributes

Case-Based Reasoning (CBR)
• Instance-based methods and locally weighted regression:
  • are lazy learning methods;
  • classify new query instances by analysing similar instances while ignoring very different ones;
  • represent instances as real-valued points in an n-dimensional Euclidean space.
• CBR shares the first two principles, but instances are represented by richer symbolic descriptions, and the methods used to retrieve similar instances are correspondingly more elaborate.

Case-Based Reasoning 2
• Given: a new case (instance)
• Search for relevant cases in the case library
• Select the best one among them
• Derive a solution
• Evaluate the solution found
• Add the solved case to the case library
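The CBR cycle above can be sketched as follows, assuming cases are flat attribute/value dictionaries. The similarity measure (share of matching attribute values), the relevance threshold, the caller-supplied `evaluate` function and the toy cases are all hypothetical. Deriving a solution is reduced here to reusing the best case's stored solution, whereas real CBR systems usually adapt it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Case:
    problem: dict  # attribute -> symbolic value, e.g. {"operating-system": "Windows"}
    solution: str


def similarity(a: dict, b: dict) -> float:
    """Hypothetical measure: share of attributes (over both cases) with equal values."""
    keys = a.keys() | b.keys()
    return sum(a.get(key) == b.get(key) for key in keys) / len(keys) if keys else 0.0


@dataclass
class CaseLibrary:
    cases: list = field(default_factory=list)

    def solve(self, problem: dict, evaluate: Callable[[dict, str], bool],
              min_similarity: float = 0.5) -> Optional[str]:
        # Search for relevant cases in the case library.
        relevant = [c for c in self.cases if similarity(problem, c.problem) >= min_similarity]
        if not relevant:
            return None
        # Select the best one among them.
        best = max(relevant, key=lambda c: similarity(problem, c.problem))
        # Derive a solution: here simply reuse the stored solution (no adaptation).
        solution = best.solution
        # Evaluate the solution found (the check is supplied by the caller).
        if not evaluate(problem, solution):
            return None
        # Add the solved case to the case library.
        self.cases.append(Case(dict(problem), solution))
        return solution


library = CaseLibrary([
    Case({"user-complaint": "error53", "operating-system": "Windows"}, "reinstall driver"),
    Case({"user-complaint": "no-boot", "operating-system": "Linux"}, "repair boot loader"),
])
query = {"user-complaint": "error53", "operating-system": "Windows", "memory": "48meg"}
print(library.solve(query, evaluate=lambda p, s: True))  # reinstall driver
print(len(library.cases))                                # 3
```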

Case-Based Reasoning 3
• HOW ARE THE INSTANCES REPRESENTED? By complex logical, relational descriptions, for example:
  ((user-complaint error53 on shutdown)
   (CPU-model Power PC)
   (operating-system Windows)
   (network-connection PCIA)
   (memory 48meg)
   (installed-application Excel Netscape)
   (disk 1gig)
   (likely-causes ???))
• HOW CAN THE SIMILARITY BE MEASURED? See the CADET example.

CADET
• A prototypical example of a case-based reasoning system
• Assists in the conceptual design of simple mechanical devices such as water faucets
• Uses a library of approximately 75 previous designs and design fragments to suggest a conceptual design that meets the specifications of the new design problem
• Each instance is stored as a pair <qualitative function, mechanical structure>
• New design problem: the desired function is specified
• Desired: the corresponding structure

CADET Example

CADET Example 2
• CADET searches for subgraph isomorphisms between the two function graphs (that of the design specification and that of a stored case), so that parts of a case can be found that match parts of the design specification.
• The system elaborates the original function specification graph in order to create functionally equivalent graphs that may match still more cases.
• It uses general knowledge about physical influences to create these elaborated function graphs, for example the rewrite rule $A \xrightarrow{+} B \;\Rightarrow\; A \xrightarrow{+} x \xrightarrow{+} B$, where $x$ is a universally quantified variable.
• The matched fragments are combined into a new solution by knowledge-based reasoning.
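The effect of the rewrite rule $A \xrightarrow{+} B \Rightarrow A \xrightarrow{+} x \xrightarrow{+} B$ can be sketched with function graphs stored as sets of `(source, target, sign)` influences. This is a deliberately simplified, hypothetical stand-in for what CADET does: vertices are matched by name instead of by subgraph isomorphism, and only this single rule is applied once, binding x to quantities that occur in the stored case. All names and graphs are hypothetical.

```python
from itertools import product


def elaborate(spec: set, variables: set) -> list:
    """Apply A -+-> B  =>  A -+-> x -+-> B once, for every "+" influence and binding of x.

    Returns a list of elaborated, functionally equivalent specifications.
    """
    alternatives = []
    for (a, b, sign), x in product(sorted(spec), sorted(variables)):
        if sign == "+" and x not in (a, b):
            alternatives.append((spec - {(a, b, "+")}) | {(a, x, "+"), (x, b, "+")})
    return alternatives


def matches(spec: set, case: set) -> bool:
    """Simplified matching: vertices are matched by name, so the specification
    matches if all of its influences occur in the case graph."""
    return spec <= case


spec = {("A", "B", "+")}                   # desired: B increases with A
case = {("A", "V", "+"), ("V", "B", "+")}  # stored case: A -> V -> B
quantities = {v for a, b, _ in case for v in (a, b)}
print(matches(spec, case))                                             # False
print(any(matches(alt, case) for alt in elaborate(spec, quantities)))  # True
```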

Evaluation of CBR
• ADVANTAGES:
  • Formation of autonomous thinking systems
  • ???
• DISADVANTAGES:
  • Hierarchical system
  • Memory indexing
  • Only syntactic similarity measurement
  • Two neighbouring cases may be incompatible -> impossible combination
  • Evaluation of the solution found

Evaluation of Lazy Algorithms
• DIFFERENCES FROM EAGER LEARNING:
  • Computational time: less during the training phase, more during classification
  • Classification: the training samples are always retained, and an approximation specific to each query instance is computed
  • Generalisation accuracy: local approximations are computed
  • Bias: the query instance is taken into account when deciding how to generalise beyond the training data
• PROBLEMS:
  • Labelling new instances efficiently
  • Determining an appropriate distance measure
  • Influence of irrelevant attributes

Summary
• Lazy learning: Processing of the training examples is delayed until a new query instance must be labelled. The result is a set of local approximations.
• k-Nearest neighbour: An instance is a point in the n-dimensional Euclidean space. The target function value for a new query is estimated from the known values of the k nearest training examples.
• Locally weighted regression: An explicit local approximation to the target function is constructed for each query instance (e.g. constant, linear, ...).
• Case-based reasoning: Instances are represented by complex logical descriptions. A rich variety of methods has been proposed for mapping from the training examples to target function values for new instances.
