90 likes | 121 Views
Machines are expected to automate about 25% of jobs across the globe in the next ten years. The number signifies the growing importance of algorithms that enable machines to learn and perform a variety of tasks – from simple to complex – for different purposes.
E N D
Machines are expected to automate about 25% of jobs across the globe in the next ten years. The number signifies the growing importance of algorithms that enable machines to learn and perform a variety of tasks – from simple to complex – for different purposes. Here is our pick of the top ten machine learning algorithms that a data scientist should know. 1. Naïve Bayes Classifier The is a simple classifying algorithm that separates one kind of data from another. For instance, spam filters use this algorithm to separate genuine mails from potentially spammy ones. The algorithm identifies features that denote the likelihood or probability that data is of a type – in this case, spam.
2. K Means Clustering This algorithm groups similar-seeming data into distinct clusters. It is useful for programs like search engines that can throw up numerous results for any search term. For example, a search for “uber” could potentially display results for the taxi service company, food that the same company delivers, or quite simply dictionaries that define the meaning of the word. Using this algorithm, search engines can display all pages on Uber cabs once it figures out you’re looking for information about the taxi service.
3. Support Vector Machine (SVM) SVMs are useful for identifying correlations between two sets of information. For example, if a person’s proficiency in mathematics is related to their proficiency in statistics, then the SVM can predict who will do well in statistics by observing math scores 4. Apriori This algorithm tries to predict the future using information from the past. E-commerce websites use it to recommend products based on a customer’s purchasing history.
5. Logistic Regression This type of algorithm is like the linear regression type. Both are predictive and correlate variables. The difference, however, is that logistic regression lists a range of possible outcomes, while linear regression predicts only one. 6. Linear Regression As explained in the section on statistics, linear regression is used to identify the relationship between dependent and independent variables. It is used to explain changes in x – the dependent variable - by tracing it back to changes in y – the independent variable. For instance, if an increase in investment in advertising results in a proportionate increase in revenue, the algorithm will suggest higher investment in advertising to increase revenue.
7. Artificial Neural Networks (ANNs) Based on biological neural networks, these algorithms are used to cluster and classify information, and to recognize patterns. Image recognition programs use this algorithm to typify features of images and recognize them in new data. 8. Decision Trees This type of algorithm is used to classify information and predict all possible outcomes according to classifications. For example, the answer to the question “Are you a data scientist?” could either be yes or no. If the answer is yes, we can use this algorithm to list all possible tasks the data scientist engages in to find out what tasks are most popular. If the answer is no, the algorithm could present a list of other occupations to determine what the individual does for a living.
9. Random Forests Many decision trees combine to form random forests. Random forests are detailed algorithms that accumulate decision trees to classify and correlate more information and predict more outcomes with greater accuracy 10. Nearest Neighbors This type of algorithm is often described as non-parametric and lazy, because it doesn’t make any assumptions about data or learn from it actively. Rather, it simply classifies new data by likening it to its nearest neighbor. For instance, if the data set is made of alphabets, a new element C would be closer to B than to A, assuming A and B are already introduced to the algorithm. Nearest neighbors algorithms are great for exploring random data sets with a large number of distinct values