Know Thy Neighbor: An Introduction to Scikit-learn and k-NN. Portia Burton, Portland Data Science Group, March 25, 2014
What We Will Cover Today 1. Define what machine learning is 2. Go over scikit-learn 3. Explain k-nearest neighbors 4. Demo of scikit-learn and k-nearest neighbors
What is Machine Learning? • The art of creating predictive models • Uses input data to make predictions • Enables computers to find patterns in data
What is scikit-learn? • Python machine learning package • Built on NumPy, SciPy, and matplotlib
k-NN • The k-nearest neighbors algorithm • One of the simplest machine learning algorithms • k is a constant chosen in advance: the number of neighbors consulted for each prediction
Basic Information about k-NN • It is a lazy algorithm: it does not generalize from the training data until it is given a new data point
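The "lazy" behavior can be sketched in a few lines of plain Python. This is a minimal illustration, not scikit-learn's implementation: the class name `LazyKNN`, its methods, and the toy points are all made up for this example, and Euclidean distance with a simple majority vote is an assumption.

```python
from collections import Counter
import math


class LazyKNN:
    """Toy k-NN classifier; the name and API are illustrative only."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" only stores the data; no model is built here.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All the work happens at query time: rank every stored point by
        # its distance to the query, keep the k closest, take a majority vote.
        by_distance = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        votes = Counter(label for _, label in by_distance[: self.k])
        return votes.most_common(1)[0][0]


# Made-up 2-D points forming two well-separated groups.
model = LazyKNN(k=3).fit(
    [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)],
    ["a", "a", "a", "b", "b", "b"],
)
print(model.predict((1.1, 1.0)))
print(model.predict((5.1, 5.0)))
```

Note that `fit` does nothing beyond keeping a copy of the data, which is what makes the algorithm lazy.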
Supervised Learning • Applies when your training samples are labeled
Example: Spam Filters
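A spam filter in the supervised setting might look roughly like the sketch below, which uses scikit-learn's `KNeighborsClassifier`. The two features (number of links, number of all-caps words) and every data value are hypothetical, chosen only to show labeled samples going in and predicted labels coming out.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features per message: [number of links, number of ALL-CAPS words]
X_train = [[0, 0], [1, 0], [0, 1], [7, 9], [8, 6], [9, 8]]
# The labels are what make this supervised learning.
y_train = ["ham", "ham", "ham", "spam", "spam", "spam"]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# Classify two new, unlabeled messages.
predictions = clf.predict([[1, 1], [8, 8]])
print(list(predictions))
```

A real filter would need far richer features; the point here is only the `fit` then `predict` pattern on labeled data.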
Unsupervised Learning • The given instances are not labeled, and the algorithm determines the categories on its own
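Neighbor search itself needs no labels. As a small sketch, scikit-learn's `NearestNeighbors` is fitted on the points alone and then answers "which stored points are closest to this one?". The coordinates below are made up for illustration.

```python
from sklearn.neighbors import NearestNeighbors

# Unlabeled points: two loose groups, but nothing tells the code that.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0]]

nn = NearestNeighbors(n_neighbors=2)
nn.fit(X)  # no labels are passed

# For a new point, get the distances to and row indices of its 2 nearest neighbors.
distances, indices = nn.kneighbors([[9.0, 9.0]])
print(indices[0].tolist())
```

The returned indices refer to rows of `X`; any grouping has to be inferred from the geometry rather than from given categories.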
What can k-NN be used for? • Classification • Regression
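For regression, one common approach is to average the target values of the k nearest training points instead of taking a vote. The sketch below assumes that plain (unweighted) average and Euclidean distance; the function name `knn_regress` and the data are hypothetical.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict a number as the mean target of the k nearest training points."""
    # Rank training indices by distance from the query point.
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / k


# Made-up one-dimensional inputs with numeric targets.
xs = [(50.0,), (60.0,), (70.0,), (120.0,), (130.0,)]
ys = [100.0, 120.0, 140.0, 240.0, 260.0]

estimate = knn_regress(xs, ys, (65.0,), k=3)
print(estimate)
```

With k=3 the query at 65 is averaged over the training points at 50, 60, and 70, so the two distant points have no influence on the estimate.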
Downsides of k-NN • Because there is almost no training step, the cost is paid at prediction time: each new data point must be compared against the stored training data • Predictions can be skewed, because individual data points can be given too much weight
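The prediction-time cost can be made visible by counting distance computations in a brute-force search. This is an illustrative sketch only: `CountingKNN` is a made-up class, and real libraries reduce this cost with smarter data structures.

```python
import math


class CountingKNN:
    """Brute-force neighbor search that counts distance evaluations."""

    def __init__(self):
        self.distance_calls = 0

    def fit(self, points):
        # Storing the data requires no distance computations at all.
        self.points = list(points)
        return self

    def nearest(self, query, k=1):
        def dist(point):
            self.distance_calls += 1
            return math.dist(point, query)

        # Every stored point is measured against the query once.
        return sorted(self.points, key=dist)[:k]


points = [(float(i), float(i)) for i in range(1000)]
index = CountingKNN().fit(points)
after_fit = index.distance_calls

closest = index.nearest((3.2, 3.2))
after_one_query = index.distance_calls
print(after_fit, after_one_query, closest)
```

In this sketch a single query touches all 1000 stored points, so the work per prediction grows with the size of the training set.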
Alternatives to brute-force neighbor search • KDTree • BallTree
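In scikit-learn, `KDTree` and `BallTree` are index structures that speed up the neighbor lookup rather than replacements for the k-NN idea. A minimal sketch of querying both follows; the random data and the `leaf_size` value are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

# Arbitrary random points in 3 dimensions.
rng = np.random.RandomState(0)
X = rng.random_sample((200, 3))

# Build each index once, up front.
kd = KDTree(X, leaf_size=20)
ball = BallTree(X, leaf_size=20)

# Ask both structures for the 3 nearest neighbors of the first point.
kd_dist, kd_ind = kd.query(X[:1], k=3)
ball_dist, ball_ind = ball.query(X[:1], k=3)
print(kd_ind, ball_ind)
```

Because the query point is itself in `X`, its own row comes back first with distance zero. The same structures can be selected in `KNeighborsClassifier` through its `algorithm` parameter (`"kd_tree"` or `"ball_tree"`).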