Short Introduction to Machine Learning Instructor: Rada Mihalcea
Learning? • What can we learn from here? • If Sky = Sunny and Air Temperature = Warm → Enjoy Sport = Yes • If Sky = Sunny → Enjoy Sport = Yes • If Air Temperature = Warm → Enjoy Sport = Yes • If Sky = Sunny and Air Temperature = Warm and Wind = Strong → Enjoy Sport = Yes ??
What is machine learning? • (H. Simon) • “Any process by which a system improves performance” • (T. Mitchell) • “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” • In short, machine learning is about designing computer programs that improve their performance through experience
Related areas • Artificial intelligence • Probability and statistics • Computational complexity theory • Information theory • Human language technology
Applications of ML • Learning to recognize spoken words • SPHINX (Lee 1989) • Learning to drive an autonomous vehicle • ALVINN (Pomerleau 1989) • Learning to classify celestial objects • (Fayyad et al 1995) • Learning to play world-class backgammon • TD-GAMMON (Tesauro 1992) • Learning to translate between languages • Learning to classify texts into categories • Web directories
Main directions in ML • Data mining • Finding patterns in data • Use “historical” data to make a decision • Predict weather based on current conditions • Self-customization • Automatic feedback integration • Adapt to user “behaviour” • Recommender systems • Writing applications that cannot be programmed by hand • In particular because they involve huge amounts of data • Speech recognition • Handwriting recognition • Text understanding
Terminology • Learning is performed from EXAMPLES (or INSTANCES) • An example contains ATTRIBUTES or FEATURES • E.g. Sky, Air Temperature, Water • In concept learning, we want to learn the value of the TARGET ATTRIBUTE • Classification problems; in the binary case, examples are labeled +/– (positive/negative) • Attributes have VALUES: • A single value (e.g. Warm) • ? – indicates that any value is possible for this attribute • ∅ – indicates that no value is acceptable • All the features in an example are together referred to as the FEATURE VECTOR
Terminology • Feature vector for our learning problem: • (Sky, Air Temp, Humidity, Wind, Water, Forecast), and the target attribute is EnjoySport. • How to represent “Aldo enjoys sports only on cold days with high humidity”? • (?, Cold, High, ?, ?, ?) • How about “Emma enjoys sports regardless of the weather”? • Hypothesis = the entire set of vectors that cover given examples • Most general hypothesis • (?, ?, ?, ?, ?, ?) • Most specific hypothesis • (∅, ∅, ∅, ∅, ∅, ∅) • How many hypotheses can be generated for our feature vector?
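The “covers” relation between a hypothesis and an example can be sketched in a few lines of Python. This is an illustration only; the helper names (`covers`, `ANY`, `NONE`) are our own, while the attribute order follows the EnjoySport feature vector above:

```python
ANY = "?"      # the "?" constraint: matches any value
NONE = None    # the "∅" constraint: matches no value at all

def covers(hypothesis, example):
    """Return True if every constraint in the hypothesis accepts
    the corresponding attribute value of the example."""
    for constraint, value in zip(hypothesis, example):
        if constraint is NONE:                        # no value acceptable
            return False
        if constraint != ANY and constraint != value: # a fixed value must match
            return False
    return True

# "Aldo enjoys sports only on cold days with high humidity"
h = (ANY, "Cold", "High", ANY, ANY, ANY)
x = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")
print(covers(h, x))  # True: the example satisfies both constraints
```

The most general hypothesis `(?, ?, ?, ?, ?, ?)` covers every example, while the most specific one `(∅, ∅, ∅, ∅, ∅, ∅)` covers none.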
Task in machine learning • Given: • A set of examples X • A set of hypotheses H • A target concept c • Determine: • A hypothesis h in H such that h(x) = c(x) for all x in X • Practically, we want to determine those hypotheses that would best fit our examples. • (Sunny, ?, ?, ?, ?, ?) → Yes • (?, Warm, ?, ?, ?, ?) → Yes • (Sunny, Warm, ?, ?, ?, ?) → Yes
Machine learning applications • Until now: a toy example – decide whether X enjoys sport, given the current conditions and the forecast • Practical problems: • Part-of-speech tagging. How? • Word sense disambiguation • Text categorization • Chunking • … • Any problem that can be modeled through examples can support learning
Machine learning algorithms • Concept learning via searching on general-specific hypotheses • Decision tree learning • Instance based learning • Rule based learning • Neural networks • Bayesian learning • Genetic algorithms
Basic elements of information theory • How to determine which attribute is the best classifier? • Measure the information gain of each attribute • Entropy characterizes the (im)purity of an arbitrary collection of examples. • Given a collection S with a proportion p of positive and q of negative examples: • Entropy(S) = – p log₂ p – q log₂ q • Entropy is at its maximum when p = q = ½ • Entropy is at its minimum when p = 1 and q = 0 • Example: • S contains 14 examples: 9 positive and 5 negative • Entropy(S) = – (9/14) log₂ (9/14) – (5/14) log₂ (5/14) = 0.94 • By convention, 0 log 0 = 0
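The entropy computation above is easy to reproduce. A minimal sketch (the function name is our own):

```python
import math

def entropy(p, q):
    """Entropy of a two-class collection with class proportions p and q,
    using the convention 0 * log(0) = 0."""
    terms = [x * math.log2(x) for x in (p, q) if x > 0]
    return -sum(terms)

# 14 examples: 9 positive, 5 negative
print(round(entropy(9/14, 5/14), 2))  # 0.94
```

As expected, `entropy(0.5, 0.5)` gives the maximum value 1.0 and `entropy(1.0, 0.0)` gives 0.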
Basic elements of information theory • Information gain • Measures the expected reduction in entropy from partitioning the examples on a given attribute • Many learning algorithms make decisions based on information gain
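Concretely, Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v), where S_v is the subset of S with value v for attribute A. A sketch (function names are our own), applied to the outlook attribute of the golf data shown later in these slides:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, log base 2."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute_index):
    """Expected reduction in entropy from partitioning on one attribute."""
    n = len(labels)
    partitions = {}
    for x, y in zip(examples, labels):
        partitions.setdefault(x[attribute_index], []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

# Outlook values and Play labels taken from the golf data in these slides
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
examples = [(o,) for o in outlook]
print(round(information_gain(examples, play, 0), 2))  # 0.25
```

The overcast partition is pure (entropy 0), which is why outlook yields a relatively large gain.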
Decision trees • Have the capability of generating rules: • IF outlook = sunny AND temperature = hot • THEN play tennis = no • Powerful: extracting such rules by hand from raw data would be very hard. • ID3, C4.5 (Quinlan) • Integral part of MLC++ • Integral part of Weka (written in Java)
Instance based algorithms • Based on a distance between examples • Remember the WSD algorithm? • K-nearest neighbour • Given a set of examples X, each represented as • (a1(x), a2(x), …, an(x)) • Classify a new instance based on its distance to all the examples in the training set
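The k-nearest-neighbour idea can be sketched in a few lines (the function name and the Euclidean distance over numeric features are our choices for illustration):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among the k training examples
    nearest to it. `train` is a list of (feature_vector, label) pairs."""
    # Sort the training set by Euclidean distance to the query
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"),
         ((5.0, 5.0), "-"), ((5.2, 4.8), "-")]
print(knn_classify(train, (1.1, 0.9)))  # +
```

Note that there is no training phase at all: every example is kept, which is exactly why such "lazy" learners do not forget exceptions.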
Instance based algorithms • Take into account every single example: • Advantage? Disadvantage? • “Do not forget exceptions” • Very good for NLP tasks: • WSD • POS tagging
Measure learning performance • Error on test data • Sample error: wrong cases / total cases, measured on the available data • True error (generalization error): the probability of misclassifying a new instance drawn from the underlying distribution; an error range can be estimated starting from the sample error • Cross-validation schemes – for more accurate evaluations • 10-fold cross-validation scheme • Divide the training data into 10 sets • Use one set for testing, and the other 9 sets for training • Repeat 10 times, measure average accuracy
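The 10-fold scheme above can be sketched as follows. This is an illustration with our own function names; a trivial majority-class baseline stands in for a real learner:

```python
from collections import Counter

def cross_validation_accuracy(examples, labels, train_and_test, k=10):
    """k-fold cross-validation: hold out one fold for testing, train on
    the remaining k-1 folds, and average accuracy over the k runs.
    `train_and_test(train_x, train_y, test_x)` returns predicted labels."""
    n = len(examples)
    folds = [list(range(i, n, k)) for i in range(k)]  # simple interleaved split
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        predictions = train_and_test([examples[i] for i in train_idx],
                                     [labels[i] for i in train_idx],
                                     [examples[i] for i in test_idx])
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

def majority(train_x, train_y, test_x):
    """A trivial baseline learner: always predict the majority class."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_x)

print(cross_validation_accuracy(list(range(20)), ["yes"] * 20, majority))  # 1.0
```

Each example is used for testing exactly once, so the averaged accuracy uses all the data without ever testing on an example the learner was trained on.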
Practical issues – Using Weka • Weka – free software • Java implementation of many learning algorithms • + boosting • + capability of handling very large data sets • + automatic cross-validation • To run an experiment: • file.arff [test file optional – if not present, Weka will evaluate through cross-validation]
Specify the feature types • Specify the feature types: • Discrete: value drawn from a set of nominal values • Continuous: numeric value • Example: Golf data • Play, Don't Play. | the target attribute • outlook: sunny, overcast, rain. | features • temperature: real. • humidity: real. • windy: true, false.
Weather Data • sunny, 85, 85, false, Don't Play • sunny, 80, 90, true, Don't Play • overcast, 83, 78, false, Play • rain, 70, 96, false, Play • rain, 68, 80, false, Play • rain, 65, 70, true, Don't Play • overcast, 64, 65, true, Play • sunny, 72, 95, false, Don't Play • sunny, 69, 70, false, Play • rain, 75, 80, false, Play • sunny, 75, 70, true, Play • overcast, 72, 90, true, Play • overcast, 81, 75, false, Play • rain, 71, 80, true, Don't Play
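For Weka itself, this data would be stored in an ARFF file. A minimal sketch, assuming the standard `@relation` / `@attribute` / `@data` declarations; the class values are renamed to yes/no here purely to sidestep ARFF's quoting rules for values containing spaces and apostrophes:

```
@relation golf

@attribute outlook {sunny, overcast, rain}
@attribute temperature real
@attribute humidity real
@attribute windy {true, false}
@attribute play {yes, no}

@data
sunny,85,85,false,no
sunny,80,90,true,no
overcast,83,78,false,yes
rain,70,96,false,yes
rain,68,80,false,yes
rain,65,70,true,no
overcast,64,65,true,yes
sunny,72,95,false,no
sunny,69,70,false,yes
rain,75,80,false,yes
sunny,75,70,true,yes
overcast,72,90,true,yes
overcast,81,75,false,yes
rain,71,80,true,no
```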
Running Weka • Check “Short Intro to Weka”