This Edureka Random Forest tutorial will help you understand the basics of the Random Forest machine learning algorithm. It is aimed at both beginners and professionals who want to learn or brush up on their Data Science concepts and learn random forest analysis through examples. Below are the topics covered in this tutorial:
1) Introduction to Classification
2) Why Random Forest?
3) What is Random Forest?
4) Random Forest Use Cases
5) How Random Forest Works?
6) Demo in R: Diabetes Prevention Use Case
You can also take a complete structured training; check out the details here: https://goo.gl/AfxwBc
Random Forest www.edureka.co/data-science Edureka’s Data Science Certification Training
What Will You Learn Today?
1) Introduction
2) Why Random Forest?
3) What is Random Forest?
4) Random Forest - Example
5) How Random Forest Works?
6) Demo In R: Diabetes Prevention Use Case
Introduction
Introduction To Classification
Is this A or B?
Classification is the problem of identifying which of a set of categories a new observation belongs to. It is a supervised learning task: the classifier is given a set of already classified examples and learns from them how to assign new, unseen examples to a category.
Example: Assigning a given email to the "spam" or "non-spam" category.
Types Of Classifiers
Decision Tree
• Decision tree builds classification models in the form of a tree structure.
• It breaks down a dataset into smaller and smaller subsets.
Naïve Bayes
• It is a classification technique based on Bayes' Theorem with an assumption of independence among attributes.
Random Forest
• Random Forest is an ensemble classifier made using many decision tree models.
• Ensemble models combine the results from different models.
Why Random Forest?
Use Case - Credit Risk Detection
To minimize loss, the bank needs a decision rule to predict which applicants should be approved for a loan. An applicant's demographic (income, debts, credit history) and socio-economic profiles are considered. Data science can help banks recognize behavior patterns and provide a complete view of individual customers.
Use Case - Credit Risk Detection
[Figure: decision trees over age, credit history, student status and bank balance, with Risk / No Risk leaves leading to a final outcome]
What is Random Forest?
What Is Random Forest?
• Random Forest is a versatile algorithm capable of performing both (i) Regression and (ii) Classification
• It is a type of ensemble learning method
• It is a commonly used predictive modelling and machine learning technique
Random Forest - Example
Let's say you want to decide whether to watch “Edge of Tomorrow” or not. You can decide in one of the following two ways:
(i) You can ask your best friend
(ii) You can ask a bunch of friends
Random Forest - Example
Ask best friend
To figure out whether you will like “Edge of Tomorrow” or not, your friend will analyze a few things, such as:
(i) If you like Emily Blunt (Cast - Emily Blunt)
(ii) If you like Adventure and Action (Genre - Adventure)
Thus, a decision tree is created by your best friend.
[Figure: decision tree asking "Is Emily Blunt main lead?", with Yes / No branches leading to Like / Don't Like]
Random Forest - Example
To get more accurate recommendations, you will have to ask a bunch of friends, say #Friend1, #Friend2 and #Friend3, and consider their votes. Each of them may consider movies of a different genre before deciding. The majority of the votes decides the final outcome. Thus you build a random forest out of a group of friends.
Random Forest - Example
[Figure: three decision trees, one each for Friend 1, Friend 2 and Friend 3, with nodes such as Action movies, Tom Cruise, Far and Away, Top Gun, Oblivion and Godzilla, and Like / Don't Like leaves]
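The majority vote among friends can be sketched in a few lines of base R. This is only an illustration: the individual votes and the helper name majority_vote are assumptions made for the example, not values taken from the slides.

```r
# Hypothetical votes from three friends (each friend plays the role of one tree)
votes <- c(Friend1 = "Like", Friend2 = "Don't Like", Friend3 = "Like")

# Return the label that receives the most votes
majority_vote <- function(votes) {
  counts <- table(votes)
  names(counts)[which.max(counts)]
}

decision <- majority_vote(votes)
print(decision)
```

A random forest classifier combines the predictions of its trees in the same way, with each tree casting one vote.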
Random Forest Use Cases
• Banking: Identification of loan risk applicants by their probability of defaulting payments.
• Medicine: Identification of at-risk patients and disease trends.
• Remote sensing / Land Use: Identification of areas of similar land use.
• Marketing: Identifying customer churn.
How Random Forest Works?
Random Forest Algorithm
i. Randomly select m features from the T available features, where m ≪ T
ii. For node d, calculate the best split point among the m features
iii. Split the node into two daughter nodes using the best split
iv. Repeat the first three steps until the required number of nodes has been reached
Build your forest by repeating steps i–iv once for each tree to be constructed
T: total number of features
m: number of features randomly selected at each node
Output: the class with the highest vote
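Steps i to iii at a single node can be sketched in base R as follows. This is a minimal illustration rather than the tutorial's own code: the use of Gini impurity as the split criterion, the helper names gini and best_random_split, and the toy data are all assumptions made for the example.

```r
# Gini impurity of a vector of class labels
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Pick m random features, then find the best threshold split among them
best_random_split <- function(x, y, m) {
  candidates <- sample(names(x), m)              # step i: m of the T features
  best <- list(feature = NA, threshold = NA, impurity = Inf)
  for (f in candidates) {                        # step ii: search split points
    values <- sort(unique(x[[f]]))
    if (length(values) < 2) next
    thresholds <- (head(values, -1) + tail(values, -1)) / 2  # midpoints
    for (thr in thresholds) {
      left  <- y[x[[f]] <= thr]
      right <- y[x[[f]] > thr]
      # Size-weighted impurity of the two daughter nodes (step iii)
      w <- (length(left) * gini(left) + length(right) * gini(right)) / length(y)
      if (w < best$impurity) {
        best <- list(feature = f, threshold = thr, impurity = w)
      }
    }
  }
  best
}

# Toy data (hypothetical): three numeric features, two classes
set.seed(1)
toy_x <- data.frame(f1 = c(1, 2, 3, 4, 5, 6),
                    f2 = c(6, 1, 5, 2, 4, 3),
                    f3 = c(1, 1, 2, 2, 3, 3))
toy_y <- factor(c("no", "no", "no", "yes", "yes", "yes"))
node_split <- best_random_split(toy_x, toy_y, m = 2)
str(node_split)
```

Growing a full tree means applying the same function recursively to each daughter node, and growing a forest means repeating that for every tree.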
How Random Forest Works?
Let's take an example. We have a dataset consisting of:
• Weather information for the last 14 days
• Whether a match was played or not on each of those days
Now, using the random forest, we need to predict whether the game will happen if the weather condition is:
Outlook = Rain, Humidity = High, Wind = Weak, Play = ?
How Random Forest Works?
The first step in Random Forest is to divide the data into smaller subsets. The subsets need not be distinct; some subsets may overlap.
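One common way to obtain overlapping subsets is bootstrap sampling, which is what standard random forest implementations use: each tree gets a sample of the rows drawn with replacement. The sketch below assumes the 14 days are labelled D1 to D14 and uses three trees; both choices are illustrative.

```r
set.seed(42)
days <- paste0("D", 1:14)   # row labels for the 14 days
n_trees <- 3                # hypothetical forest size

# One bootstrap sample per tree: same size as the data, drawn with replacement
subsets <- lapply(seq_len(n_trees), function(i) {
  sample(days, size = length(days), replace = TRUE)
})

# Days never drawn for a given tree are "out-of-bag" for that tree
out_of_bag <- lapply(subsets, function(s) setdiff(days, s))

print(lapply(subsets, function(s) sort(unique(s))))
```

Because sampling is done with replacement, the same day can appear in several subsets, or more than once in a single subset.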
How Random Forest Works?
[Figure: three decision trees built on the subsets D1,D2,D3; D3,D4,D5,D6; and D7,D8,D9, with splits on Outlook (Overcast), Humidity and Wind, and Play / No play leaves]
Features of Random Forest
• Among the most accurate learning algorithms
• Works well for both classification and regression problems
• Runs efficiently on large databases
• Requires almost no input preparation
• Performs implicit feature selection
• Can be easily grown in parallel
• Provides methods for balancing error in unbalanced data sets
Demo
What if we could predict the occurrence of diabetes and take appropriate measures beforehand to prevent it?
Sure! Let me take you through the steps to identify vulnerable patients.
Demo
The doctor gets the following data from the medical history of the patient.
Steps: Data Acquisition, Divide dataset, Implement model, Visualize, Model Validation
Demo
We will divide our entire dataset into two subsets:
• Training dataset -> to train the model
• Testing dataset -> to validate and make predictions
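A train/test split along these lines might look as follows in base R. The slides do not show the dataset or the split ratio, so the data frame diabet below is synthetic stand-in data and the 70/30 ratio is an assumption; only the names glucose_conc, is_diabetic and diabet_test appear elsewhere in the demo.

```r
set.seed(123)

# Synthetic stand-in for the patient data (not the tutorial's dataset)
diabet <- data.frame(
  glucose_conc = round(rnorm(100, mean = 120, sd = 30)),
  is_diabetic  = factor(sample(c("NO", "YES"), 100, replace = TRUE))
)

# Randomly pick 70% of the rows for training (the ratio is an assumption)
train_idx    <- sample(nrow(diabet), size = floor(0.7 * nrow(diabet)))
diabet_train <- diabet[train_idx, ]
diabet_test  <- diabet[-train_idx, ]

cat("Training rows:", nrow(diabet_train), " Testing rows:", nrow(diabet_test), "\n")
```

Setting the seed makes the split reproducible, so the same rows end up in each subset on every run.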
Demo
Before we create the random forest, let's find the best mtry value (the number of features tried at each split) using the following commands.
Demo
Here, we implement the random forest in R using the following commands.
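The original commands are not preserved in this transcript. One plausible way to tune mtry and then fit the forest, assuming the randomForest package is installed, is sketched below. The training data is synthetic, the predictors bmi and age are hypothetical, and the tuning settings (ntreeTry, stepFactor, improve, ntree) are illustrative choices rather than the tutorial's values.

```r
# Runs only if the randomForest package is available
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(123)
  n <- 200
  glucose_conc <- rnorm(n, mean = 120, sd = 30)
  bmi <- rnorm(n, mean = 30, sd = 6)            # hypothetical predictor
  age <- sample(21:70, n, replace = TRUE)       # hypothetical predictor
  # Synthetic label loosely tied to glucose so there is something to learn
  is_diabetic <- factor(ifelse(glucose_conc + rnorm(n, sd = 25) > 130, "YES", "NO"))
  diabet_train <- data.frame(glucose_conc, bmi, age, is_diabetic)

  # Search over mtry using the out-of-bag (OOB) error
  predictors <- diabet_train[, c("glucose_conc", "bmi", "age")]
  tuned <- randomForest::tuneRF(predictors, diabet_train$is_diabetic,
                                ntreeTry = 100, stepFactor = 2, improve = 0.01,
                                trace = FALSE, plot = FALSE)
  # First column holds the mtry values tried, second the OOB error
  best_mtry <- tuned[which.min(tuned[, 2]), 1]

  # Fit the forest with the selected mtry
  diabet_forest <- randomForest::randomForest(is_diabetic ~ ., data = diabet_train,
                                              mtry = best_mtry, ntree = 500)
  print(diabet_forest)
}
```

Printing the fitted object shows the OOB error estimate and a confusion matrix computed on the training data.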
Demo
We get the output as follows
Demo
Let's see which variables are most important for our model. To plot the variable importance, we can use the following commands.
As per the MeanDecreaseGini value, glucose_conc is the most important variable in the model.
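The plotting commands themselves are not preserved here. A hedged sketch of how variable importance can be inspected with the randomForest package follows; the data is synthetic, with glucose_conc deliberately made the only informative predictor, and noise_a and noise_b are hypothetical columns added for contrast.

```r
# Runs only if the randomForest package is available
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(7)
  n <- 200
  glucose_conc <- rnorm(n, mean = 120, sd = 30)
  noise_a <- rnorm(n)   # hypothetical uninformative predictor
  noise_b <- rnorm(n)   # hypothetical uninformative predictor
  is_diabetic <- factor(ifelse(glucose_conc + rnorm(n, sd = 15) > 130, "YES", "NO"))
  toy <- data.frame(glucose_conc, noise_a, noise_b, is_diabetic)

  toy_forest <- randomForest::randomForest(is_diabetic ~ ., data = toy, ntree = 200)

  # type = 2 requests the mean decrease in node impurity (MeanDecreaseGini)
  gini_importance <- randomForest::importance(toy_forest, type = 2)
  ranked <- order(-gini_importance[, "MeanDecreaseGini"])
  print(gini_importance[ranked, , drop = FALSE])
}
```

Calling randomForest::varImpPlot(toy_forest) draws the same ranking as a dot chart.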
Demo
Now, we can use our model to predict the output of our testing dataset. We can use the following code for predicting the output.
    pred1_diabet <- predict(diabet_forest, newdata = diabet_test, type = "class")
    pred1_diabet
Demo
We get the following output for our testing dataset, where:
• “YES” means the patient is predicted to be vulnerable to diabetes
• “NO” means the patient is predicted not to be vulnerable to diabetes
Demo
We can create a confusion matrix for the model using the caret library to see how good our model is.
    library(caret)
    confusionMatrix(table(pred1_diabet, diabet_test$is_diabetic))
Demo
The accuracy (or the overall success rate) is the proportion of records that the model has classified correctly. A good model should have a high accuracy score.
Accuracy = 79.66%
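Accuracy can be computed directly from a confusion matrix as the sum of the diagonal divided by the total count. The base R sketch below uses made-up labels for eight patients purely to show the arithmetic; it does not reproduce the 79.66% figure from the demo.

```r
# Hypothetical actual and predicted labels for eight patients
actual    <- factor(c("YES", "NO", "NO",  "YES", "NO", "YES", "NO", "NO"),
                    levels = c("NO", "YES"))
predicted <- factor(c("YES", "NO", "YES", "YES", "NO", "NO",  "NO", "NO"),
                    levels = c("NO", "YES"))

# Rows are predictions, columns are the true classes
cm <- table(Predicted = predicted, Actual = actual)

# Correct predictions sit on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

print(cm)
cat("Accuracy:", accuracy, "\n")
```

Here six of the eight predictions match the actual labels, so the accuracy is 0.75.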
Course Details
Get Edureka Certified in Data Science Today! Go to www.edureka.co/data-science
What our learners have to say about us!
Gnana Sekhar says - “Edureka Data science course provided me a very good mixture of theoretical and practical training. LMS pre recorded sessions and assignments were very good as there is a lot of information in them that will help me in my job. Edureka is my teaching GURU now...Thanks EDUREKA.”
Shravan Reddy says - “I would like to recommend any one who wants to be a Data Scientist just one place: Edureka. Explanations are clean, clear, easy to understand. Their support team works very well.. I took the Data Science course and I'm going to take Machine Learning with Mahout and then Big Data and Hadoop”.
Balu Samaga says - “It was a great experience to undergo and get certified in the Data Science course from Edureka. Quality of the training materials, assignments, project, support infrastructures are a top notch.”