Machine Learning Kenton McHenry, Ph.D. Research Scientist
Raster Images image(234, 452) = 0.58 [Hoiem, 2012]
Neighborhoods of Pixels • For nearby surface points most factors do not change much • Local differences in brightness [Hoiem, 2012]
Feature Descriptors • Shapes • Curves • Color • Mean • Distribution • Texture • Filter banks • Size • Statistics • Neighbors
Descriptors, Why? • Matching: Match an object in two images based on similar features • 3D Reconstruction: Stereopsis • Tracking: Follow an object in a video by following its features • Object Recognition: Find objects based on known features they will possess • Segmentation: Break an image up into more meaningful regions based on the seen features.
Object Recognition • Use a collection of features specific to an object to identify it in new images
Object Recognition • Examples! • We have example features from the object we wish to find • We also have example features of stuff that isn’t the object
Supervised Learning • Labeled data sets • FYI, this is a lot of work! • [GIMP Demo]
Supervised Learning • Labeled data sets • This is a lot of work! • The more images the better (100-1000) • Extract features • Features from labeled portions are positive examples • Features outside labeled portions are negative examples
Machine Learning • Matlab Statistics Toolbox
Decision Trees • Decide whether to wait for a table at a restaurant? • Input is a situation described by a set of properties • Feature Descriptor • Output is a decision • e.g. Yes or No [Lesser, 2010]
Decision Trees • Construct a root node containing all examples • Split nodes that have examples from more than one class • Choose the attribute that best splits the data • Entropy(S) = -p+ log(p+) - p- log(p-), where p+ and p- are the fractions of positive and negative examples in S • Information gain • If a node has examples from only one class, have it output that class's label • Greedy algorithm • Not necessarily optimal • But faster [Lesser, 2010]
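The entropy and information gain bullets above can be sketched as follows. This is a minimal illustration in Python (not the MATLAB used elsewhere in these slides), assuming binary labels and a base-2 logarithm; the function names and the toy data are hypothetical.

```python
import math

def entropy(labels):
    """Entropy of a list of binary labels (True = positive example)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_pos = sum(1 for y in labels if y) / n
    p_neg = 1.0 - p_pos
    # Treat 0 * log(0) as 0 so that a pure node has zero entropy.
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

# Hypothetical toy data: 4 positive and 4 negative examples.
parent = [True] * 4 + [False] * 4
perfect_split = [[True] * 4, [False] * 4]
useless_split = [[True, True, False, False], [True, True, False, False]]

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves each child with the same class mix as the parent gains nothing, which is why the greedy algorithm prefers the former.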
Decision Trees [Lesser, 2010]
Decision Trees Height Eyes Hair [Lesser, 2010]
Decision Trees Hair Blond Dark Red Height Eyes [Lesser, 2010]
Decision Trees Hair Blond Red Dark Eyes Brown Blue [Lesser, 2010]
Decision Trees • We’ll use binary decision trees on continuous-valued features • Instead of entropy, select thresholds on each feature and count how many examples are correctly classified • Pick the best feature threshold
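One way to picture the threshold search described above is a single-split "stump". The sketch below is in Python rather than the slides' MATLAB, and it is not how classregtree works internally; the helper names, the midpoint candidate thresholds, and the toy RGB-like data are all assumptions made for illustration.

```python
def best_threshold(values, labels):
    """Return (threshold, polarity, correct) for one continuous feature.

    polarity = +1 predicts positive when value > threshold,
    polarity = -1 predicts positive when value <= threshold.
    """
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not thresholds:
        # Constant feature: no split is possible, so predict "positive" everywhere.
        return distinct[0], -1, sum(1 for y in labels if y)
    best = None
    for t in thresholds:
        for polarity in (+1, -1):
            correct = sum(
                ((v > t) if polarity == 1 else (v <= t)) == y
                for v, y in zip(values, labels)
            )
            if best is None or correct > best[2]:
                best = (t, polarity, correct)
    return best

def best_feature_threshold(X, y):
    """Try every feature column and keep the most accurate threshold."""
    results = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        t, polarity, correct = best_threshold(column, y)
        results.append((correct, j, t, polarity))
    return max(results)

# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[200, 30], [190, 80], [210, 60], [90, 40], [100, 70], [80, 50]]
y = [True, True, True, False, False, False]
correct, feature, threshold, polarity = best_feature_threshold(X, y)
print(feature, threshold, correct)
```

A full tree would apply the same search recursively to the examples on each side of the chosen threshold.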
Supervised Learning • I = imread('scar1.jpg'); • mask = imread('scar1_mask.png'); • mask = sum(mask, 3); • mask = mask > 0; • Ir = I(:,:,1); • Ig = I(:,:,2); • Ib = I(:,:,3); • Xp = [Ir(mask) Ig(mask) Ib(mask)]; • Xn = [Ir(~mask) Ig(~mask) Ib(~mask)]; • plot3(Xp(:,1),Xp(:,2),Xp(:,3),'r.'); • axis vis3d; • hold on; • plot3(Xn(:,1),Xn(:,2),Xn(:,3),'b.');
Supervised Learning • X = double([Xp; Xn]); • Y = [repmat({'1'}, size(Xp,1), 1); • repmat({'-1'}, size(Xn,1), 1)]; • tree = classregtree(X,Y); • view(tree); • Y1 = tree.eval(X); • sum(strcmp(Y,Y1)) / length(Y)
Supervised Learning • indices = rand(size(Xp,1),1)>0.5; • Xp1 = Xp(indices,:); • Xp2 = Xp(~indices,:); • indices = rand(size(Xn,1),1)>0.5; • Xn1 = Xn(indices,:); • Xn2 = Xn(~indices,:); • Xtrain = double([Xp1; Xn1]); • Ytrain = [repmat({'1'}, size(Xp1,1), 1); • repmat({'-1'}, size(Xn1,1), 1)]; • Xtest = double([Xp2; Xn2]); • Ytest = [repmat({'1'}, size(Xp2,1), 1); • repmat({'-1'}, size(Xn2,1), 1)];
Overfitting • The classifier is constructed too specifically to the training data • Not generalizable • Split your labeled data into two data sets • Training set • Test set • Use the test set to verify how well your constructed classifier generalizes to new, unseen data
Supervised Learning • tree = classregtree(Xtrain,Ytrain); • view(tree); • Y = tree.eval(Xtrain); • sum(strcmp(Y,Ytrain)) / length(Ytrain) • Y = tree.eval(Xtest); • sum(strcmp(Y,Ytest)) / length(Ytest)
Supervised Learning • tree = classregtree(Xtrain,Ytrain, 'splitmin', 100); • view(tree); • Y = tree.eval(Xtrain); • sum(strcmp(Y,Ytrain)) / length(Ytrain) • Y = tree.eval(Xtest); • sum(strcmp(Y,Ytest)) / length(Ytest)
Supervised Learning • tp = sum(and(strcmp(Ytest,'1'),strcmp(Y,'1'))) • fp = sum(and(strcmp(Ytest,'-1'),strcmp(Y,'1'))) • tn = sum(and(strcmp(Ytest,'-1'),strcmp(Y,'-1'))) • fn = sum(and(strcmp(Ytest,'1'),strcmp(Y,'-1'))) • precision = tp / (tp + fp) • recall = tp / (tp + fn) • accuracy = (tp + tn) / (tp + tn + fp + fn)
Precision Recall Curves • Plot precision on the y-axis and recall on the x-axis as you alter parameters of the classifier during training • Determine ideal parameters • Upper right corner • Compare classifiers • Area under curve
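As a rough illustration of how the points on such a curve are produced, the Python sketch below varies one parameter and records a (precision, recall) pair for each setting. The slides vary training parameters of the tree; this sketch instead assumes a classifier that outputs a score and sweeps a decision threshold over it. The scores, labels, and thresholds are made-up values.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall from boolean ground truth and predictions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def pr_curve(scores, y_true, thresholds):
    """One (threshold, precision, recall) point per threshold setting."""
    points = []
    for t in thresholds:
        y_pred = [s > t for s in scores]
        precision, recall = precision_recall(y_true, y_pred)
        points.append((t, precision, recall))
    return points

# Hypothetical classifier scores and ground truth for six examples.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_true = [True, True, False, True, False, False]
curve = pr_curve(scores, y_true, thresholds=[0.85, 0.5, 0.2])
for t, precision, recall in curve:
    print(t, round(precision, 3), round(recall, 3))
```

Lowering the threshold raises recall at the cost of precision; the setting whose point lies closest to the upper right corner balances the two.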
Well, this did pretty well, right? • What must we keep in mind?
Supervised Learning • A wide variety of methods: • Naïve Bayes • Neural Nets • Nearest Neighbors • Support Vector Machines • …
Supervised Learning • A wide variety of methods: • Naïve Bayes • Probability of a given class given a feature descriptor • Treat features as independent of one another to make tractable • Neural Nets • Nearest Neighbors • Support Vector Machines • … http://en.wikipedia.org/wiki/Naive_bayes
Supervised Learning • A wide variety of methods: • Naïve Bayes • Neural Nets • Modeled after the human brain • Network of neurons • Perceptron • Binary classifier • Learning • Iteratively adapt weights based on error • Backpropagation for multilayer networks • Iteratively adapt weights based on error • Nearest Neighbors • Support Vector Machines • … http://en.wikipedia.org/wiki/Perceptron
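The "iteratively adapt weights based on error" step for a single perceptron can be sketched as below. This is a minimal Python illustration of the classic perceptron update rule, not code from the slides; the learning rate, epoch limit, and the toy, linearly separable data are assumptions.

```python
def train_perceptron(X, y, epochs=20, lr=1.0):
    """Train a binary perceptron; labels in y are +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, y):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            predicted = 1 if activation > 0 else -1
            if predicted != target:
                # Nudge the weights toward the misclassified example's label.
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                b += lr * target
                errors += 1
        if errors == 0:
            break  # every training example is classified correctly
    return w, b

def predict(w, b, x):
    """Classify a single feature vector as +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical, linearly separable 2-D data.
X = [[2, 1], [3, 2], [2, 3], [-1, -1], [-2, -1], [-1, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

Weights change only when an example is misclassified, so on separable data the loop stops once a full pass produces no errors.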
Supervised Learning • A wide variety of methods: • Naïve Bayes • Neural Nets • Nearest Neighbors • Look at nearby examples in feature space • Support Vector Machines • …
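"Look at nearby examples in feature space" can be made concrete with a k-nearest-neighbors vote. The Python sketch below assumes squared Euclidean distance, k = 3, and the '1' / '-1' label strings used on the earlier slides; the RGB-like training values are invented for illustration.

```python
from collections import Counter

def squared_distance(a, b):
    """Squared Euclidean distance between two feature descriptors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_predict(X_train, y_train, x, k=3):
    """Label x by majority vote among its k nearest training examples."""
    nearest = sorted(
        zip(X_train, y_train),
        key=lambda pair: squared_distance(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical RGB descriptors: reddish pixels are '1', grayish pixels are '-1'.
X_train = [[200, 30, 30], [210, 40, 35], [190, 25, 40],
           [90, 80, 70], [100, 90, 80], [85, 75, 65]]
y_train = ['1', '1', '1', '-1', '-1', '-1']

print(knn_predict(X_train, y_train, [205, 35, 30]))
print(knn_predict(X_train, y_train, [95, 85, 75]))
```

There is no training step here: all the work happens at query time, which is why the method becomes slow as the number of stored examples grows.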
Supervised Learning • A wide variety of methods: • Naïve Bayes • Neural Nets • Nearest Neighbors • Support Vector Machines • Find a plane that divides the data so as to maximize the gaps between different things • … http://en.wikipedia.org/wiki/Support_Vector_Machines
Unsupervised Learning • No labeled data
Unsupervised Learning • No labeled data • Instead find hidden structure in the feature space • No error to evaluate by
Kmeans • Set the number of groups to look for, k • Assume groups are circular • Assume things can only belong to one group • Find center positions for each group so as to minimize the sum, over all groups, of the squared distances from each point in a group to that group's center http://en.wikipedia.org/wiki/Kmeans
Kmeans • Solve iteratively • Randomly set the positions (i.e. means) for the k groups • Assign each feature descriptor (i.e. point) to the nearest group • Calculate the mean of the points assigned to a group and use it as the new center position • Repeat
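The iteration above can be sketched in a few lines of Python. This is an illustrative version, not MATLAB's kmeans function; the argument names, the stopping rule (stop when assignments no longer change), and the toy 2-D points are assumptions. Initial centers can be passed in explicitly so the run is repeatable; otherwise they are drawn at random from the data, as on the slide.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, initial_centers=None, iterations=100, seed=0):
    """Return (assignment, centers) after iterating assign / update steps."""
    if initial_centers is None:
        # Randomly pick k of the data points as starting centers.
        initial_centers = random.Random(seed).sample(points, k)
    centers = [list(c) for c in initial_centers]
    assignment = None
    for _ in range(iterations):
        # Assign each point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the centers will not change either
        assignment = new_assignment
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a group ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

# Hypothetical 2-D points forming two well-separated groups.
points = [[1, 1], [1.5, 2], [1, 0.5], [8, 8], [9, 9], [8.5, 9.5]]
assignment, centers = kmeans(points, 2, initial_centers=[[1, 1], [8, 8]])
print(assignment, centers)
```

Because the result depends on the starting centers, it is common to run the procedure several times from different random starts and keep the solution with the smallest total squared distance.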
Now what? • We can use this to group together similar stuff • Things that may belong together according to the feature space • We can assign new data to the nearest group • Don’t know what that group is but do know it is similar to this other data • Can label groups manually
Unsupervised Learning • I = imread('scar1.png'); • Ir = I(:,:,1); • Ig = I(:,:,2); • Ib = I(:,:,3); • X = double([Ir(:) Ig(:) Ib(:)]); • [indices,centers] = kmeans(X,2); • image(uint8(reshape(centers(1,:),1,1,3))); • image(uint8(reshape(centers(2,:),1,1,3))); • indices • imagesc(reshape(indices==1,size(I,1),size(I,2))); • imagesc(reshape(indices==2,size(I,1),size(I,2)));
Unsupervised Learning • A wide variety of methods: • Gaussian Mixture Models • Hierarchical Agglomerative Clustering • Principal Component Analysis • …
Unsupervised Learning • A wide variety of methods: • Gaussian Mixture Models • Kmeans is a restricted version of this • Gaussian distributions • No circular group assumption • Data can belong to more than one group • Weighted • Similar algorithm • Assign weights to each point for each group • Calculate group position as mean and standard deviation • Tends to suffer from numerical issues in practice • Hierarchical Agglomerative Clustering • Principal Component Analysis • …
Unsupervised Learning • A wide variety of methods: • Gaussian Mixture Models • Hierarchical Agglomerative Clustering • Create groups as nodes within a tree over the data • Construction • Find the nearest two points and merge • Create a new point as the average of those points • Place original points as children of this new node • Remove original points from future consideration • Repeat • Costly to build • Efficiently search large amounts of data • Principal Component Analysis • … http://en.wikipedia.org/wiki/Hierarchical_clustering
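The construction steps listed for hierarchical agglomerative clustering can be sketched directly. The Python below follows the bullets literally (merge the two nearest points, replace them with their average, keep the originals as children); the dictionary node layout, the brute-force pair search, and the toy points are assumptions for illustration.

```python
from itertools import combinations

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def build_tree(points):
    """Merge the two nearest nodes repeatedly until one root remains."""
    nodes = [{"point": list(p), "children": []} for p in points]
    while len(nodes) > 1:
        # Find the closest pair among the nodes still under consideration.
        i, j = min(
            combinations(range(len(nodes)), 2),
            key=lambda ij: squared_distance(
                nodes[ij[0]]["point"], nodes[ij[1]]["point"]
            ),
        )
        a, b = nodes[i], nodes[j]
        merged = {
            "point": [(x + y) / 2 for x, y in zip(a["point"], b["point"])],
            "children": [a, b],
        }
        # Remove the two originals and add the merged node in their place.
        nodes = [n for idx, n in enumerate(nodes) if idx not in (i, j)]
        nodes.append(merged)
    return nodes[0]

def count_leaves(node):
    """Number of original data points stored beneath a node."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])

# Hypothetical 2-D points forming two tight pairs.
root = build_tree([[0, 0], [0, 1], [10, 10], [10, 11]])
print(root["point"], count_leaves(root))
```

The pair search examines every remaining pair at every step, which illustrates the "costly to build" bullet; once built, the tree can be descended from the root to narrow a search to one branch at a time.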
Unsupervised Learning • A wide variety of methods: • Gaussian Mixture Models • Hierarchical Agglomerative Clustering • Principal Component Analysis • Identify most significant dimensions of the data • Most likely not lined up with coordinate axes • Use as a vocabulary to define data • May not require fewer dimensions • Compression • … http://en.wikipedia.org/wiki/Principal_component_analysis