170 likes | 215 Views
Machine Learning Models. http://vitalflux.com/cheat-sheet-10-machine-learning-algorithms-r-commands/. What To Know For Machine Learning. Overview Algorithms Prediction. Machine Learning.
E N D
Machine Learning Models http://vitalflux.com/cheat-sheet-10-machine-learning-algorithms-r-commands/
What To Know For Machine Learning • Overview • Algorithms • Prediction
Machine Learning • Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory inartificial intelligence. • Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available to a learning system. These are • Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. • Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning). • Reinforcement learning: A computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal. Another example is learning to play a game by playing against an opponent.
Categorization • Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system: • In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one (or multi-label classification) or more of these classes. This is typically tackled in a supervised way. Spam filtering is an example of classification, where the inputs are email (or other) messages and the classes are "spam" and "not spam". • In regression, also a supervised problem, the outputs are continuous rather than discrete. • In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task. • Density estimation finds the distribution of inputs in some space. • Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional space. Topic modeling is a related problem, where a program is given a list of human language documents and is tasked to find out which documents cover similar topics.
Algorithms • Linear regression • Logistic Regression • K-Means Clustering • K-Nearest Neighbors (KNN) Classification • Naive Bayes Classification • Decison Trees • Support Vector Machine (SVM) • Artifical Neural Network (ANN) • Apriori • AdaBoost
Linear Regression • Linear regression: “lm” method from base package could be used for linear regression models. • Following is the sample command: • lm_model <- lm(y ~ x1 + x2, data=as.data.frame(cbind(y,x1,x2)))
Logistic Regression • Logistic Regression: Logistic regression is a classification based model. “glm” method from base R package could be used for logistic regression. • Following is the sample command: • glm_model <- glm(y ~ x1+x2, family=binomial(link="logit"), data=as.data.frame(cbind(y,x1,x2))) • https://en.wikipedia.org/wiki/Logistic_regression • http://www.ats.ucla.edu/stat/r/dae/logit.htm
K-Means Clustering • K-Means Clustering: “kmeans” method from base R package could be used to run k-means clustering. • Following is a sample command given X is a data matrix and m is the number of clusters: • kmeans_model <- kmeans(x=X, centers=m) • http://www.r-statistics.com/2013/08/k-means-clustering-from-r-in-action/
K-Nearest Neighbours • K-Nearest Neighbors (KNN) Classification: “knn” method from “class” package could be used for K-NN modeling. One need to install and load “class” package. • Following is the sample command given X_train represents a training dataset, X_test represents test data set, k represents number of nearest neighbors to be included for the modeling • knn_model <- knn(train=X_train, test=X_test, cl=as.factor(labels), k=K) • https://www.datacamp.com/community/tutorials/machine-learning-in-r
Naive Bayes Classification • Naive Bayes Classification: “naiveBayes” method from “e1071” package could be used for Naive Bayes classification. One need to install and load “e1071” package prior to analysis. • Following is the sample command: • naiveBayes_model <- naiveBayes(y ~ x1 + x2, data=as.data.frame(cbind(y,x1,x2))) • https://eight2late.wordpress.com/2015/11/06/a-gentle-introduction-to-naive-bayes-classification-using-r/
Decision Trees • Decision Trees: “rpart” method from “rpart” can be used for Decision Trees. One need to install and load “rpart” package. • Following is the sample command: • cart_model <- rpart(y ~ x1 + x2, data=as.data.frame(cbind(y,x1,x2)), method="class") • http://www.statmethods.net/advstats/cart.html
Support Vector Machine • Support Vector Machine (SVM): “svm” method from “e1071” package could be used for SVM. Note that the same package also provide method, naiveBayes, for Naive Bayes classification. One need to install and load “e1071” package. • Following is the sample command given X is the matrix of features, labels be the vector of 0-1 class labels, and C being regularization parameter. • svm_model <- svm(x=X, y=as.factor(labels), kernel ="radial", cost=C) • https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/SVM
Artificial Neural Network • Artifical Neural Network (ANN): “neuralnet” method from “neuralnet” package could be used for ANN modeling. • Following is sample command: • ann_model <- neuralnet( y ~ x1 + x2 + x3, data=as.data.frame(cbind(y,x1,x2, x3)), hidden = 1) • Prediction could be made using following formula: • p <- compute( ann_model, as.data.frame(cbind(x1,x2)) ) • https://beckmw.wordpress.com/tag/neural-network/
Apriori • Apriori: “apriori” method from “arules” package could be used for Apriori analysis. One need to install and load “arules” package. • Following is the sample command: • apriori_model <- apriori(as.matrix(sampleDataset), parameter = list(supp = 0.8, conf = 0.9)) • http://www.r-bloggers.com/association-rule-learning-and-the-apriori-algorithm/
AdaBoost • AdaBoost: “ada” method from “rpart” package could be used as boosting function. • Following is sample command: • boost_model <- ada(x=X, y=labels) • https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/adaboost
Prediction • For most of the above formulas including linear regression model, one could use following function to predict: • predicted_values <- predict(some_model, newdata=as.data.frame(cbind(x1_test, x2_test))) • http://www.dummies.com/how-to/content/how-to-predict-new-data-values-with-r.html • http://astrostatistics.psu.edu/datasets/R/html/stats/html/predict.lm.html