Instance Based Approach KNN Classifier
Simple classification technique
• Handed an instance you wish to classify
• Look around the nearby region to see what other classes are around
• Whichever class is most common, make that the prediction

Instance Based Classification
K-nearest neighbor
• Assign the most common class among the K nearest neighbors (like a vote)
How do you train? You don't.
Let's get specific
• Train: load the training data
• Classify:
• Read in the instance
• Find the K nearest neighbors in the training data
• Assign the most common class among the K nearest neighbors (like a vote)

Euclidean distance, where a indexes the attributes (dimensions):
d(x_i, x_j) = \sqrt{\sum_a (x_{i,a} - x_{j,a})^2}
How to find the nearest neighbors
• Naïve approach: exhaustive search
• For the instance to be classified:
• Visit every training sample and calculate its distance
• Sort by distance
• Take the first K in the list

Voting formula:
\hat{y} = \arg\max_c \sum_{i=1}^{K} \delta(c, c_i), where c_i is x_i's class and \delta(c, c_i) = 1 if c = c_i; 0 otherwise
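The exhaustive search and majority vote above can be sketched as a short program (an illustrative sketch, not from the slides; the toy training set is invented for the example):

```python
import math
from collections import Counter

def euclidean(x, y):
    # d(x, y) = sqrt of the sum over attributes a of (x_a - y_a)^2
    return math.sqrt(sum((xa - ya) ** 2 for xa, ya in zip(x, y)))

def knn_classify(train, query, k=3):
    """train: list of (features, label) pairs; returns majority class of k nearest."""
    # Exhaustive search: visit every training sample, compute distance, sort.
    by_dist = sorted(train, key=lambda s: euclidean(s[0], query))
    # Take the first K and vote.
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]
print(knn_classify(train, (1.1, 1.0), k=3))  # -> a
```

Note that "training" really is just loading the list; all the floating-point work happens inside `knn_classify`.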
Classifying is a lot of work
• The work that must be performed:
• Visit every training sample and calculate its distance
• Sort
• Lots of floating-point calculations
• The classifier puts off its work until it is time to classify
Lazy
• This is known as a "lazy" learning method
• If most of the work is done during the training stage, the method is known as "eager"
• Our next classifier, Naïve Bayes, will be eager
• Training takes a while, but it can classify fast
• Which do you think is better?

Lazy vs. eager: where does the work happen, training or classifying?
The book mentions the KD-tree. From Wikipedia: a kd-tree is a space-partitioning data structure for organizing points in a k-dimensional space. kd-trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches). kd-trees are a special case of BSP trees.
If we use such a data structure
• Speeds up classification
• Probably slows "training"
How to choose K?
• Choosing K can be a bit of an art
• What if you could include all data points (K = n)?
• How might you do such a thing?

What if we weighted the vote of each training sample by its distance from the point being classified?

Weighted voting formula:
\hat{y} = \arg\max_c \sum_{i=1}^{n} w_i \, \delta(c, c_i), where w_i = 1 / d(x_q, x_i)^2, and \delta(c, c_i) is "1" if x_i is a member of class c; 0 otherwise
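The distance-weighted vote with K = n might be sketched like this (the 1/d² weighting matches the weight curve discussed next; the eps guard against a zero distance and the toy data are assumptions of the example):

```python
import math
from collections import defaultdict

def euclidean(x, y):
    return math.sqrt(sum((xa - ya) ** 2 for xa, ya in zip(x, y)))

def weighted_knn_classify(train, query, k=None, eps=1e-9):
    """Distance-weighted vote; with k=None, every training point votes (K = n)."""
    voters = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    scores = defaultdict(float)
    for features, label in voters:
        # Weight each vote by 1 / d^2; eps guards against division by zero
        # when the query coincides with a training point.
        scores[label] += 1.0 / (euclidean(features, query) ** 2 + eps)
    return max(scores, key=scores.get)

train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]
print(weighted_knn_classify(train, (1.1, 1.0)))  # nearby "a" points dominate -> a
```

Because far-away points contribute almost nothing under 1/d², including all n points no longer drowns out the local neighborhood.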
Weight Curve
• 1 over distance squared
• Could get less fancy and go linear
• But then training data very far away would still have a strong influence
Could go more fancy
• Other radial basis functions
• Sometimes known as a kernel function
• One of the more common is the Gaussian: K(d) = e^{-d^2 / 2\sigma^2}
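A quick sketch of Gaussian kernel weighting (the slide's formula image is lost; the Gaussian radial basis function is assumed here as "one of the more common" choices, with sigma as a bandwidth parameter):

```python
import math

def gaussian_weight(d, sigma=1.0):
    # Gaussian radial basis function: weight falls off smoothly with distance;
    # sigma controls how quickly influence decays.
    return math.exp(-(d ** 2) / (2 * sigma ** 2))

# Unlike 1/d^2, the Gaussian stays finite at d = 0 and decays faster in the tail.
print(gaussian_weight(0.0))  # -> 1.0
print(gaussian_weight(2.0))  # e^-2, about 0.135
```
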
Issues
• Work is back-loaded
• Worse the bigger the training data
• Can alleviate with data structures
• What else?

Other issues: what if only some dimensions contribute to the ability to classify? Differences in the other dimensions would still put distance between a point and the target.
Curse of dimensionality
• The book calls this the curse of dimensionality
• More is not always better
• Points might be identical in the important dimensions but distant in the others

From Wikipedia: In applied mathematics, the curse of dimensionality (a term coined by Richard E. Bellman),[1][2] also known as the Hughes effect[3] or Hughes phenomenon[4] (named after Gordon F. Hughes),[5][6] refers to the problem caused by the exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, 100 evenly spaced sample points suffice to sample a unit interval with no more than 0.01 distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice with a spacing of 0.01 between adjacent points would require 10^20 sample points: thus, in some sense, the 10-dimensional hypercube can be said to be a factor of 10^18 "larger" than the unit interval. (Adapted from an example by R. E. Bellman; see below.)
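Bellman's lattice example can be checked directly; the point count grows exponentially in the number of dimensions:

```python
def lattice_points(d, spacing=0.01):
    """Points in a lattice with the given spacing covering the unit cube in d dimensions."""
    per_axis = int(round(1 / spacing))  # 100 points per axis at spacing 0.01
    return per_axis ** d

print(lattice_points(1))   # -> 100
print(lattice_points(10))  # -> 100**10, i.e. 10**20
```
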
Gene expression data
• Thousands of genes
• Relatively few patients
• Is there a curse?
Can it classify discrete data?
• Bayesian methods could
• Think of discrete data as being pre-binned
• Remember the RNA classification: the data in each dimension was A, C, U, or G
• Representation becomes all-important
• If the data could be arranged appropriately, we could use techniques like Hamming distance
• But how do we measure distance? A might be closer to G than to C or U (A and G are both purines, while C and U are pyrimidines)
• Dimensional distance becomes domain specific
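One way such a domain-specific per-dimension distance might look. The 0/1/2 costs below are purely illustrative, not a standard scoring scheme; they just encode the idea that a purine-purine mismatch (A vs G) is "closer" than a purine-pyrimidine mismatch:

```python
PURINES = {"A", "G"}  # C and U are the pyrimidines

def base_distance(b1, b2):
    # Hypothetical costs: identical bases 0, same chemical family 1, different family 2.
    if b1 == b2:
        return 0
    if (b1 in PURINES) == (b2 in PURINES):
        return 1  # A<->G or C<->U
    return 2      # purine vs pyrimidine

def sequence_distance(s1, s2):
    # A weighted Hamming distance: sum the per-position base distances.
    return sum(base_distance(a, b) for a, b in zip(s1, s2))

print(sequence_distance("ACGU", "GCGU"))  # -> 1 (A vs G, both purines)
print(sequence_distance("ACGU", "UCGU"))  # -> 2 (A vs U, purine vs pyrimidine)
```

Plain Hamming distance is the special case where every mismatch costs 1.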
First few records in the training data. See any issues? (Hint: think of how Euclidean distance is calculated: a dimension with a large numeric range will dominate the sum.) Another issue: we should really normalize the data, rescaling each entry in a dimension so that all dimensions contribute on a comparable scale.
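The slide's normalization formula image is lost and it does not say which scheme it intends, so as one common option, here is min-max rescaling of each dimension to [0, 1] (z-score standardization would be another reasonable choice):

```python
def min_max_normalize(data):
    """Rescale each dimension of `data` (a list of feature tuples) to [0, 1]."""
    dims = list(zip(*data))                 # transpose: one tuple per dimension
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0   # guard constant dimensions
              for v, lo, hi in zip(row, lows, highs))
        for row in data
    ]

# After rescaling, a large-range dimension (salary) no longer swamps a small one (age).
print(min_max_normalize([(25, 30000), (35, 60000), (45, 90000)]))
# -> [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
```
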
Other uses of instance-based approaches
• Function approximation
• Real-valued prediction: take the average of the nearest k neighbors
• Why average? If we don't know the function, and/or it is too complex to "learn", we can just plug in a new value: the KNN approach "learns" the predicted value on the fly by averaging the nearest neighbors
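KNN regression by averaging might be sketched as follows (scalar inputs and the toy data are assumptions of the example):

```python
def knn_regress(train, query, k=3):
    """train: list of (x, y) pairs with scalar x; predict y as the mean of the k nearest."""
    # Find the k training points whose x is closest to the query.
    by_dist = sorted(train, key=lambda s: abs(s[0] - query))
    neighbors = by_dist[:k]
    # "Learn" the prediction on the fly: average the neighbors' y values.
    return sum(y for _, y in neighbors) / len(neighbors)

train = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (10.0, 20.0)]
print(knn_regress(train, 2.5, k=3))  # about 4.0, the mean of the three nearest y's
```

No model of the function is ever built; the far-away outlier at x = 10 is simply never consulted for queries near 2.5.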
Regression
• Choose an m and b that minimize the squared error: \sum_i (y_i - (m x_i + b))^2
• But again, how, computationally?
Other things that can be learned
• If we want to learn an instantaneous slope
• We can do local regression: get the slope of a line that fits just the local data
The How: Big Picture
• For each training datum we know what Y should be
• Given a randomly generated m and b, these, along with X, tell us a predicted Y (the line represents the output, i.e. the predicted Y)
• We know whether the m and b yield too large or too small a prediction (e.g. the target Y is too low, below the line)
• So we can nudge m and b in an appropriate direction (+ or -)
• Sum these proposed nudges across all the training data
Gradient Descent
• Which way should m go to reduce the error? Opposite the sign of the gradient: \partial E / \partial m = -2 \sum_i x_i (y_i - (m x_i + b))
• Could average the nudges across the training data
• Then do the same for b
• Then do it all again, repeatedly
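The nudge-average-repeat loop above can be sketched directly (the learning rate, epoch count, and toy data are assumptions of the example):

```python
def gradient_descent(data, lr=0.01, epochs=5000):
    """Fit y = m*x + b by repeatedly nudging m and b against the averaged error gradient."""
    m, b = 0.0, 0.0  # arbitrary starting guess
    n = len(data)
    for _ in range(epochs):
        # Partial derivatives of the mean squared error (1/n) * sum (y - (m*x + b))^2
        grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in data) / n
        grad_b = sum(-2 * (y - (m * x + b)) for x, y in data) / n
        m -= lr * grad_m  # step m opposite its gradient ...
        b -= lr * grad_b  # ... then do the same for b, then do it all again
    return m, b

m, b = gradient_descent([(1, 3), (2, 5), (3, 7), (4, 9)])  # data lies on y = 2x + 1
print(round(m, 2), round(b, 2))  # converges to about 2.0 1.0
```

Each epoch sums the proposed nudges over all training data (the average), which is exactly the "could average, then do the same for b, then do again" recipe on the slide.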
Back to why we went down this road
• Locally weighted linear regression
• We would still perform gradient descent, but weight each training point by its distance from the query
• Pieced together, these local fits become an approximation to the global function
Summary
• KNN is highly effective for many practical problems, given sufficient training data
• Robust to noisy training data
• Work is back-loaded
• Susceptible to the curse of dimensionality