Machine Learning Usman Roshan Dept. of Computer Science NJIT
What is Machine Learning? • “Machine learning is programming computers to optimize a performance criterion using example data or past experience.” Intro to Machine Learning, Alpaydin, 2010 • Examples: • Facial recognition • Digit recognition • Molecular classification
A little history • 1946: ENIAC, one of the first general-purpose electronic computers, performs numerical computations • 1950: Alan Turing proposes the Turing test. Can machines think? • 1952: First game-playing program for checkers by Arthur Samuel at IBM • 1957: Perceptron developed by Frank Rosenblatt. Perceptrons can be combined to form a neural network. • 1960s to 1970s: Knowledge-based systems such as ELIZA and MYCIN • Early 1990s: Statistical learning theory. Emphasis on learning from data instead of rule-based inference. • Current status: Used widely in industry; a combination of various approaches, but data-driven methods are prevalent.
Example up-close • Problem: Recognize images representing digits 0 through 9 • Input: High dimensional vectors representing images • Output: 0 through 9 indicating the digit the image represents • Learning: Build a model from “training data” • Predict “test data” with model
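The learn-then-predict workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3x3 "images", their labels, and the choice of a 1-nearest-neighbor model are made up here and are not taken from the slides; real digit images have far more dimensions.

```python
# Toy "images": 3x3 binary grids flattened to 9-dimensional vectors.
# The patterns and labels are invented for illustration only.
TRAIN = [
    ([1, 1, 1, 1, 0, 1, 1, 1, 1], 0),  # ring shape, labeled as digit 0
    ([1, 1, 1, 1, 0, 1, 1, 1, 0], 0),  # ring with one missing pixel
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], 1),  # vertical stroke, labeled as digit 1
    ([0, 1, 0, 1, 1, 0, 0, 1, 0], 1),  # stroke with a small flag
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(training_data):
    """'Learning' for 1-nearest-neighbor is simply storing the training data."""
    return list(training_data)


def predict(model, image):
    """Return the label of the stored training vector closest to `image`."""
    _, label = min(model, key=lambda pair: squared_distance(pair[0], image))
    return label


model = fit(TRAIN)

# Test data: vectors the model has not seen during training.
TEST = [
    [1, 1, 1, 1, 0, 1, 0, 1, 1],  # noisy ring
    [0, 1, 0, 0, 1, 0, 0, 1, 1],  # noisy stroke
]
predictions = [predict(model, image) for image in TEST]
print(predictions)
```

The input is a vector, the output is a digit label, and the model is built only from the training data.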
Data model • We assume that the data is represented by a set of vectors, each of the same fixed dimensionality. • Vector: an ordered list of numbers • We may refer to each vector as a datapoint and each dimension as a feature • Example: • A bank wishes to classify loan applicants as risky or safe • Each applicant is a datapoint and is represented by a vector • Features may be age, income, mortgage/rent, education, family, current loans, and so on
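As a rough sketch of this data model, the bank example can be written as a list of fixed-length vectors. The field names, units, values, and labels below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, astuple


@dataclass
class Applicant:
    # Hypothetical features, loosely following the bank example.
    age: float
    income: float           # yearly income
    housing_cost: float     # monthly mortgage or rent
    years_education: float
    current_loans: float


def to_vector(applicant):
    """Turn one applicant (datapoint) into an ordered list of numbers."""
    return [float(value) for value in astuple(applicant)]


applicants = [
    Applicant(34, 52000, 1200, 16, 1),
    Applicant(51, 87000, 0, 12, 3),
]
X = [to_vector(a) for a in applicants]   # one row per datapoint
labels = ["safe", "risky"]               # made-up class labels

# Every datapoint must have the same dimensionality.
assert all(len(row) == len(X[0]) for row in X)
n_datapoints, n_features = len(X), len(X[0])
print(n_datapoints, n_features)
```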
Machine learning resources • Data • NIPS 2003 feature selection contest • mldata.org • UCI machine learning repository • Contests • Kaggle • Software • Python scikit-learn • R • Your own code
Textbook • Not required but highly recommended for beginners • Introduction to Machine Learning by Ethem Alpaydin (2nd edition, 2010, MIT Press). Written by a computer scientist; the material is accessible with a basic probability and linear algebra background. • Applied Predictive Modeling by Kuhn and Johnson (2013, Springer). A more recent book that focuses on practical modeling.
Some practical techniques • Combination of various methods • Parameter tuning • Trade-off between error and model complexity • Data pre-processing • Normalization • Standardization • Feature selection • Discarding noisy features
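The two pre-processing steps can be sketched as follows. "Normalization" has more than one meaning; this sketch assumes min-max rescaling of each feature to [0, 1], and takes standardization to mean subtracting the feature mean and dividing by the feature standard deviation. The function names and the sample data are illustrative only.

```python
from statistics import mean, pstdev


def columns(X):
    """Return the features (columns) of a list of row vectors."""
    return [list(col) for col in zip(*X)]


def min_max_normalize(X):
    """Rescale each feature to [0, 1]; a constant feature maps to 0."""
    cols = columns(X)
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in X
    ]


def standardize(X):
    """Give each feature zero mean and unit (population) standard deviation."""
    cols = columns(X)
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    return [
        [(v - mu) / sigma if sigma else 0.0 for v, mu, sigma in zip(row, mus, sigmas)]
        for row in X
    ]


# Made-up data: two features (age, income) on very different scales.
X = [[20.0, 30000.0], [40.0, 50000.0], [60.0, 70000.0]]
X_norm = min_max_normalize(X)
X_std = standardize(X)
print(X_norm)
print(X_std)
```

In practice the minimum, maximum, mean, and standard deviation are usually computed on the training data only and then reused to transform the test data.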
Background • Basic linear algebra and probability • Vectors • Dot products • Eigenvectors and eigenvalues • See Appendix of textbook for probability background • Mean • Variance • Gaussian/Normal distribution
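For reference, the standard definitions behind these background items are collected below. The notation is an assumption of this sketch and may differ from the textbook's: x and y are d-dimensional vectors, A is a square matrix, and x_1, ..., x_n are n scalar samples.

```latex
% Dot product of two d-dimensional vectors
\[
\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{d} x_i y_i
\]
% v is an eigenvector of A with eigenvalue lambda
\[
A\mathbf{v} = \lambda \mathbf{v}, \qquad \mathbf{v} \neq \mathbf{0}
\]
% Sample mean and (population) variance of n samples
\[
\mu = \frac{1}{n} \sum_{j=1}^{n} x_j, \qquad
\sigma^2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - \mu)^2
\]
% Density of the Gaussian/Normal distribution with mean mu and variance sigma^2
\[
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
\]
```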
Assignments • Implementation of basic classification algorithms in Perl and Python • Nearest Means • Naïve Bayes • k-nearest neighbor • Cross validation scripts • Experiment with various algorithms on assigned datasets
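To give a feel for what such an implementation might look like, here is a minimal sketch of a nearest means classifier evaluated with k-fold cross validation. It is not the assignment solution: the function names, the fold assignment, and the toy dataset are assumptions made for illustration.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_means(X, y):
    """Compute one mean vector per class label."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    return {label: mean_vector(rows) for label, rows in by_class.items()}


def predict_nearest_means(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda label: squared_distance(means[label], x))


def cross_validate(X, y, k):
    """Average accuracy over k folds; fold i holds out indices i, i+k, i+2k, ...

    Assumes len(X) >= k so that no fold is empty.
    """
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [label for i, label in enumerate(y) if i not in test_idx]
        means = train_nearest_means(train_X, train_y)
        correct = sum(predict_nearest_means(means, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Made-up, well-separated two-class dataset.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y = ["a", "a", "a", "b", "b", "b"]
accuracy = cross_validate(X, y, k=3)
print(accuracy)
```

Real datasets are usually shuffled before being split into folds; the fixed interleaved split here just keeps the sketch deterministic.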
Project • Some ideas: • Experiment with Kaggle and NIPS 2003 feature selection datasets • Experimental performance study of various machine learning techniques on a given dataset. For example, a comparison of feature selection methods with a fixed classifier.
Exams • One mid-semester exam • Final exam • What to expect on the exams: • Basic conceptual understanding of machine learning techniques • Be able to apply techniques to simple datasets • Basic runtime and memory requirements of the techniques • Simple modifications of the techniques
Grade breakdown • Assignments and project worth 50% • Exams worth 50%