Introduction to Machine Learning Dmitriy Dligach
Representations • Objects • Real-life phenomena viewed as objects and their properties (features) • Feature Vectors • Examples • Text classification • Face recognition • Word sense disambiguation (WSD)
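To make the feature-vector idea concrete, here is a minimal sketch of turning a short text into a vector, one dimension per vocabulary word. The vocabulary and document are made-up illustrations, not from the slides.

```python
# Sketch: a binary bag-of-words feature vector over a tiny,
# made-up vocabulary (one dimension per word).
VOCAB = ["ball", "bat", "cave", "game", "wings"]

def to_feature_vector(text):
    """Map a text to a 0/1 vector: does each vocabulary word occur?"""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in VOCAB]

doc = "the bat hit the ball out of the game"
print(to_feature_vector(doc))  # -> [1, 1, 0, 1, 0]
```

Real systems use much larger vocabularies and often counts or weights instead of binary indicators, but the object-to-vector mapping is the same.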
Supervised Learning • (vector, label) pairs • (x0, y0), (x1, y1), …, (xn, yn) • Task: learn a function y = f(x) • Algorithms • k-NN • Decision Trees • Neural Networks • SVM
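As a sketch of one of the listed algorithms, here is a minimal k-nearest-neighbors classifier over (x, y) pairs; the training points and labels are illustrative assumptions.

```python
import math
from collections import Counter

# Minimal k-NN sketch: predict y = f(x) by majority vote among the
# k training points closest to x (data below is made up).
def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, x, k=3):
    nearest = sorted(train, key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(y for _, y in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_predict(train, (0.5, 0.5)))  # -> A
print(knn_predict(train, (5.5, 5.5)))  # -> B
```

Note that k-NN never builds an explicit model of f; it memorizes the training pairs and defers all computation to prediction time.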
Issues in Supervised Learning • Training data • Why are we learning? • Test data • Unseen data • Overfitting • Fitting noise in the training data reduces performance on unseen data
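The overfitting point can be demonstrated with a tiny sketch: a 1-nearest-neighbor "memorizer" is perfect on its own training data, noise included, but the noisy point misleads it on unseen data. The points and the mislabeled example are made up for illustration.

```python
import math

# Overfitting sketch: true rule is "A" near the origin, "B" far from
# it, but one training point is mislabeled (noise).
def nearest_label(train, x):
    return min(train, key=lambda p: math.dist(p[0], x))[1]

train = [((1, 1), "A"), ((2, 2), "A"), ((5, 5), "B"),
         ((6, 6), "B"), ((2.5, 2.5), "B")]  # <- noisy label

# Each training point is its own nearest neighbor, so the memorizer
# reproduces every training label, including the wrong one.
train_acc = sum(nearest_label(train, x) == y for x, y in train) / len(train)
print(train_acc)  # 1.0

# On unseen data, points near the noisy example get the wrong label.
test = [((2.4, 2.4), "A"), ((1.5, 1.5), "A"), ((5.5, 5.5), "B")]
test_acc = sum(nearest_label(train, x) == y for x, y in test) / len(test)
print(test_acc)  # below 1.0
```

Perfect training accuracy with lower test accuracy is the signature of overfitting, which is why held-out test data is needed to measure real performance.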
Unsupervised Learning • Only feature vectors are given • x0, x1, …, xn • Task: group the feature vectors into clusters • Algorithms • Clustering • k-means • mixture of Gaussians • Principal Component Analysis • Sequence labeling • HMMs
Word Sense Disambiguation (WSD) • bat (noun) • http://wordnet.princeton.edu/perl/webwn • http://verbs.colorado.edu/html_groupings/
Another DT Example • Word Sense Disambiguation • Given an occurrence of a word, decide which sense, or meaning, was intended. • Example: run • run1: move swiftly (I ran to the store.) • run2: operate (I run a store.) • run3: flow (Water runs from the spring.) • run4: a line of torn stitches (Her stockings had a run.)
WSD • Word Sense Disambiguation • Categories • Use word sense labels (run1, run2, etc.) • Features – describe context of word • near(w) : is the given word near word w? • pos: word’s part of speech • left(w): is word immediately preceded by w? • etc.
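The context features above can be sketched as a small extractor. The sentence, window size, and tiny POS lexicon here are made-up assumptions; a real system would use a tagger and a full feature template.

```python
# Sketch of WSD context features: pos, left(w), and near(w) for an
# occurrence of "run" (lexicon and sentence are illustrative).
POS_LEXICON = {"i": "PRON", "saw": "VERB", "john": "NOUN", "run": "VERB",
               "a": "DET", "race": "NOUN", "by": "PREP", "river": "NOUN"}

def wsd_features(tokens, i, window=3):
    """Features describing the context of the target word tokens[i]."""
    feats = {"pos": POS_LEXICON.get(tokens[i], "UNK")}
    feats["left(%s)" % (tokens[i - 1] if i > 0 else "<s>")] = True
    for w in tokens[max(0, i - window):i + window + 1]:
        if w != tokens[i]:
            feats["near(%s)" % w] = True
    return feats

tokens = "i saw john run a race by a river".split()
print(wsd_features(tokens, tokens.index("run")))
```

The resulting feature dictionary, paired with a sense label like run1, is exactly the kind of (vector, label) training pair supervised learners consume.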
Using a Decision Tree • Given an event (= list of feature values): • Start at the root. • At each interior node, follow the outgoing arc for the feature value that matches our event. • When we reach a leaf node, return its category. • [Slide figure: a tree rooted at pos; the noun branch tests near(stocking) (yes → run4); the verb branch tests near(race) (yes → run1; no → test near(river), yes → run3).] • Example event: "I saw John run a race by a river." has pos = verb and near(race) = yes, so the tree returns run1.
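The traversal steps above can be sketched as a walk over a hand-built tree. The tree below is an abridged, assumed reconstruction of the slide's figure (the no-branches not visible in the figure are simplified), not the author's exact tree.

```python
# Sketch: a decision tree as nested dicts, classified by following
# the branch matching each feature value until a leaf (a string).
tree = {"feature": "pos",
        "branches": {
            "noun": "run4",  # abridged: the slide also tests near(stocking)
            "verb": {"feature": "near(race)",
                     "branches": {
                         "yes": "run1",
                         "no": {"feature": "near(river)",
                                "branches": {"yes": "run3",
                                             "no": "run2"}}}}}}

def classify(node, event):
    """Follow outgoing arcs matching the event until a leaf."""
    while isinstance(node, dict):
        node = node["branches"][event[node["feature"]]]
    return node

# "I saw John run a race by a river."
event = {"pos": "verb", "near(race)": "yes", "near(river)": "yes"}
print(classify(tree, event))  # -> run1 ("move swiftly")
```

Because near(race) is tested before near(river), the race context wins here even though "river" also appears in the sentence.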
Unsupervised Example: K-Means • Distance between two objects • Cosine distance • Euclidean distance • Algorithm • Pick cluster centers at random • Assign the data points to the nearest clusters • Re-compute the cluster centers • Re-assign the data points • Continue until the clusters settle • Hard clustering vs. soft clustering
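The loop above (pick random centers, assign points, re-compute, repeat) can be sketched in a few lines. This is hard clustering with Euclidean distance; the 2-D points are made-up illustrations.

```python
import math
import random

# Sketch of the k-means loop: random centers, assign each point to
# its nearest center, re-compute centers as cluster means, repeat.
def kmeans(points, k, iters=20, seed=0):
    random.seed(seed)
    centers = random.sample(points, k)  # pick cluster centers at random
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to the nearest center
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):  # re-compute the centers
            if cl:
                centers[j] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, clusters = kmeans(points, k=2)
print(sorted(centers))  # one center settles near each tight group
```

A soft-clustering variant (e.g. a mixture of Gaussians fit by EM) would instead assign each point a probability of belonging to each cluster rather than a single hard assignment.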
Interactive Demos • K-Means • http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html • SVMs • http://www.csie.ntu.edu.tw/~cjlin/libsvm/#GUI
ML Reference • Tom Mitchell, "Machine Learning" • http://www.aaai.org/AITopics/html/machine.html