Machine Learning Training with Python: https://www.edureka.co/python
This Machine Learning Interview Questions and Answers PPT will help you prepare for Data Science / Machine Learning interviews. It is suitable for both beginners and professionals who want to learn or brush up on Machine Learning core concepts, Machine Learning using Python, and Machine Learning scenarios. Topics covered in this tutorial:
1. Machine Learning Core Interview Questions
2. Machine Learning using Python Interview Questions
3. Machine Learning Scenario-based Interview Questions
Check out our playlist for more videos: http://bit.ly/2taym8X
Agenda for Today's Session
▪ Machine Learning Core Interview Questions
▪ Machine Learning using Python Interview Questions
▪ Machine Learning Scenario-based Interview Questions
Copyright © 2018, edureka and/or its affiliates. All rights reserved. Machine Learning Interview Questions
Machine Learning Training Using Python
Core Machine Learning Questions
Machine Learning Interview Question 1
Q: How will you explain Machine Learning to a school-going kid?
▪ Suppose your friend invites you to his party, where you meet total strangers
▪ Since you know nothing about them, you classify them by gender, age group, dressing, or whatever way you like. This is unsupervised learning (no prior knowledge)
▪ How is this different from supervised learning? You did not use any past/prior knowledge about the people and classified them "on the go"
Machine Learning Interview Question 2
Q: What are the various types of Machine Learning?
Supervised Learning
▪ Is like learning with a teacher
▪ The training dataset acts as the teacher and is used to train the machine
▪ The model is "trained" on a pre-defined dataset before it starts making decisions on new data
Machine Learning Interview Question 2
Q: What are the various types of Machine Learning?
Unsupervised Learning
▪ Is like learning without a teacher
▪ The model learns through observation and finds structures in data
▪ The model is given a dataset and is left to find patterns and relationships in it automatically by creating clusters
Machine Learning Interview Question 2
Q: What are the various types of Machine Learning?
Reinforcement Learning
▪ The model learns by the hit-and-trial (trial-and-error) method
▪ It learns on the basis of the reward or penalty given for every action it performs
Machine Learning Interview Question 3
Q: What's your favourite algorithm, and can you explain it to me in less than a minute?
This type of question tests how well you communicate complex, technical nuances with ease, and how quickly and efficiently you can summarize. Make sure you have a choice, and make sure you can explain different algorithms so simply and effectively that even a five-year-old could grasp the basics!
Machine Learning Interview Question 4
Q: How does deep learning differ from machine learning?
▪ Machine Learning is all about algorithms that parse data, learn from that data, and then apply what they've learned to make informed decisions
▪ Deep Learning is a form of machine learning that is inspired by the structure of the human brain and is particularly effective in feature detection
Machine Learning Interview Question 5
Q: Explain Classification and Regression
Regression and classification are both related to prediction: regression predicts a value from a continuous set, while classification predicts which class an item belongs to.
Classification (Class Labels)
▪ Very Cheap
▪ Cheap
▪ Affordable
▪ Costly
▪ Very Costly
Regression (Continuous Values)
▪ Size
▪ Area
▪ Location
Machine Learning Interview Question 6
Q: What do you understand by selection bias?
▪ It is a statistical error that causes a bias in the sampling portion of an experiment
▪ The error causes one sampling group to be selected more often than the other groups included in the experiment
▪ Selection bias may produce an inaccurate conclusion if it is not identified
Machine Learning Interview Question 7
Q: What is a Confusion Matrix?
A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
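To make the definition concrete, here is a minimal sketch that builds a confusion matrix by counting (true label, predicted label) pairs. The `confusion_matrix` helper and the spam/ham data are hypothetical and only serve as an illustration; rows are true labels and columns are predicted labels.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Return a nested list: rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

# Hypothetical test data: the true values are known, predictions come from some classifier.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]

labels = ["spam", "ham"]
matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

The diagonal holds the correct predictions; everything off the diagonal is a misclassification.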
Machine Learning Interview Question 8
Q: What do you understand by Precision and Recall?
Let me explain this with an analogy:
▪ Imagine that your girlfriend has given you a birthday surprise every year for the last 10 years
▪ One day, your girlfriend asks you: 'Sweetie, do you remember all the birthday surprises from me?'
Machine Learning Interview Question 8
Q: What do you understand by Precision and Recall?
▪ To extend your life, you need to recall all 10 surprise events from memory
▪ So, recall is the ratio of the number of events you can correctly recall to the number of all correct events
▪ If you can recall all 10 events correctly, your recall is 1.0 (100%)
▪ If you can recall 7 events correctly, your recall is 0.7 (70%)
Machine Learning Interview Question 8
Q: What do you understand by Precision and Recall?
However, you might be wrong in some answers.
▪ For example, you give 15 answers: 10 events are correct and 5 are wrong. This means you can recall all the events, but not very precisely
▪ So, precision is the ratio of the number of events you can correctly recall to the number of all events you recall (a mix of correct and wrong recalls). In other words, it is how precise your recall is
▪ In the last example (10 real events, 15 answers: 10 correct, 5 wrong), you get 100% recall but your precision is only 66.67% (10 / 15)
Machine Learning Interview Question 8
Q: What do you understand by Precision and Recall?
▪ The number of events you can correctly recall = True positive (they're correct and you recall them)
▪ The number of all correct events = True positive (they're correct and you recall them) + False negative (they're correct but you don't recall them)
▪ The number of all events you recall = True positive (they're correct and you recall them) + False positive (they're not correct but you recall them)
▪ recall = True positive / (True positive + False negative)
▪ precision = True positive / (True positive + False positive)
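The two formulas can be checked against the birthday example with a short sketch. The function name `precision_recall` is an assumption made for illustration; the counts come from the slides (15 answers, 10 correct, 5 wrong, nothing forgotten).

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# 15 answers: 10 correct (TP), 5 wrong (FP), and no real event forgotten (FN = 0).
precision, recall = precision_recall(tp=10, fp=5, fn=0)
print(f"precision={precision:.4f} recall={recall:.1f}")

# Recalling only 7 of the 10 events, with no wrong answers (FN = 3, FP = 0).
precision_7, recall_7 = precision_recall(tp=7, fp=0, fn=3)
print(f"precision={precision_7:.1f} recall={recall_7:.1f}")
```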
Machine Learning Interview Question 9
Q: Explain false negative, false positive, true negative and true positive with a simple example
▪ True Positive: the alarm goes off in case of a fire. Fire is positive and the prediction made by the system is true
▪ False Positive: the alarm goes off and there is no fire. The system predicted fire (positive), which is a wrong prediction, hence the prediction is false
▪ False Negative: the alarm does not go off but there was a fire. The system predicted no fire (negative), which was false since there was a fire
▪ True Negative: the alarm does not go off and there was no fire. Fire is negative and this prediction was true
Machine Learning Interview Question 10
Q: What is the difference between inductive and deductive learning?
▪ Inductive learning = observation → conclusion
▪ Deductive learning = conclusion → observation
Machine Learning Interview Question 11
Q: How is KNN different from k-means clustering?
K-Nearest Neighbors
▪ Supervised technique
▪ Used for classification or regression
▪ Used for classification and regression of known data, where the target attribute/variable is usually known beforehand
▪ KNN needs labelled points
K-Means Clustering
▪ Unsupervised technique
▪ Used for clustering
▪ Used for scenarios like understanding population demographics, social media trends, anomaly detection, etc.
▪ K-Means doesn't require labelled points
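The contrast between the two techniques can be sketched on one-dimensional data: KNN needs the labels, k-means only needs the points. Both functions, the data, and the starting centroids are hypothetical simplifications for illustration, not a production implementation.

```python
from collections import Counter

def knn_predict(points, labels, query, k=3):
    """Supervised: vote among the labels of the k training points nearest to query."""
    nearest = sorted(range(len(points)), key=lambda i: abs(points[i] - query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def kmeans_1d(points, centroids, iterations=10):
    """Unsupervised: alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            clusters[idx].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["cheap", "cheap", "cheap", "costly", "costly", "costly"]  # used by KNN only

predicted = knn_predict(points, labels, query=7.5)
centers = kmeans_1d(points, centroids=[0.0, 5.0])
print(predicted)
print(centers)
```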
Machine Learning Interview Question 12
Q: Can you explain what an ROC curve is and what it represents?
The Receiver Operating Characteristic (ROC) curve is a fundamental tool for diagnostic test evaluation. It is a plot of the true positive rate (sensitivity) against the false positive rate (1 - specificity) for the different possible cut-off points of a diagnostic test.
Machine Learning Interview Question 12
Q: Can you explain what an ROC curve is and what it represents?
▪ It shows the tradeoff between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in specificity)
▪ The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test
▪ The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test
▪ The slope of the tangent line at a cut-point gives the likelihood ratio (LR) for that value of the test
▪ The area under the curve is a measure of test accuracy
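As a rough illustration of how the curve is traced, the sketch below sweeps the cut-off over a set of scores and records one (false positive rate, true positive rate) point per cut-off, then estimates the area under the curve with the trapezoidal rule. The labels and scores are made up, and the helper names `roc_points` and `auc` are assumptions.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per cut-off, sweeping the threshold from high to low."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

y_true = [1, 1, 0, 1, 0, 0]                 # 1 = condition present
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # hypothetical test scores
curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

A curve that hugs the top-left corner gives an area close to 1, while a curve along the diagonal gives an area close to 0.5.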
Machine Learning Interview Question 13
Q: What's the difference between Type I and Type II error?
▪ Type I error is a false positive, while Type II error is a false negative
▪ Type I error is claiming something has happened when it hasn't, while Type II error is claiming nothing has happened when in fact something has
Machine Learning Interview Question 14
Q: Is it better to have too many false positives, or too many false negatives? Explain.
It depends on the question as well as on the domain for which we are trying to solve the question.
• In medical testing, false negatives may give patients and physicians a falsely reassuring message that disease is absent when it is actually present. This sometimes leads to inappropriate or inadequate treatment of both the patient and their disease. So here it is preferable to err on the side of too many false positives
• For spam filtering, a false positive occurs when spam filtering or spam blocking techniques wrongly classify a legitimate email message as spam and, as a result, interfere with its delivery. While most anti-spam tactics can block or filter a high percentage of unwanted emails, doing so without creating significant false-positive results is a much more demanding task. So here we prefer too many false negatives over too many false positives
Machine Learning Interview Question 15
Q: Which is more important to you: model accuracy, or model performance?
Well, you must know that model accuracy is only a subset of model performance. For example, if you wanted to detect fraud in a massive dataset with millions of samples, the most accurate model would most likely predict no fraud at all if only a tiny minority of cases were fraud. However, this would be useless as a predictive model: a model designed to find fraud that asserts there is no fraud at all! Questions like this help you demonstrate that you understand model accuracy isn't the be-all and end-all of model performance.
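The fraud example can be reproduced in a few lines. This is a sketch on made-up data (100,000 transactions, 100 of them fraud) with a "model" that always predicts "no fraud"; the numbers are assumptions chosen only to show the effect.

```python
# Hypothetical data: 100,000 transactions, of which 100 are fraud (label 1).
y_true = [1] * 100 + [0] * 99_900
y_pred = [0] * len(y_true)      # a "model" that always predicts "no fraud"

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / sum(y_true)   # share of real fraud cases that were detected

print(f"accuracy={accuracy:.3f} recall={recall:.1f}")
```

The accuracy looks excellent even though not a single fraud case is found, which is why recall (or precision) has to be reported alongside it.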
Machine Learning Interview Question 16
Q: What is the difference between Gini Impurity and Entropy in a Decision Tree?
▪ These two are the metrics for deciding how to split a tree
▪ Gini impurity is the probability of a random sample being classified incorrectly if we randomly pick a label according to the distribution in a branch
▪ Entropy is a measurement of information (or rather the lack thereof). You calculate the information gain of a split as the difference in entropies before and after it. This measures how much the split reduces the uncertainty about the label
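Both metrics can be computed directly from the label counts in a branch. The sketch below uses the standard definitions (Gini impurity as 1 minus the sum of squared class proportions, entropy in bits); the function names and the sample branches are assumptions for illustration.

```python
from collections import Counter
from math import log2

def gini_impurity(labels):
    """1 - sum(p_i^2): chance of mislabelling a random sample drawn from the branch."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """sum(p_i * log2(1 / p_i)): uncertainty about the label, in bits."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

mixed = ["yes"] * 5 + ["no"] * 5    # evenly mixed branch: maximum uncertainty
pure = ["yes"] * 10                 # pure branch: no uncertainty

print(gini_impurity(mixed), entropy(mixed))
print(gini_impurity(pure), entropy(pure))
```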
Machine Learning Interview Question 17
Q: What is the difference between Entropy and Information Gain?
▪ Entropy is an indicator of how messy your data is. It generally decreases as you move closer to the leaf nodes
▪ Information gain is the decrease in entropy after a dataset is split on an attribute. The total information gained accumulates as you move closer to the leaf nodes
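A small worked example may help: information gain is the parent's entropy minus the size-weighted entropy of the child nodes. The sketch below assumes a binary split on a hypothetical ten-row dataset; the helper names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of labels, in bits."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1     # hypothetical split: mostly "yes"
right = ["yes"] * 1 + ["no"] * 4    # mostly "no"

gain = information_gain(parent, [left, right])
perfect = information_gain(parent, [["yes"] * 5, ["no"] * 5])
print(gain, perfect)
```

A split that separates the classes perfectly removes all of the parent's entropy, so its gain equals the parent entropy.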
Machine Learning Interview Question 18
Q: What is Overfitting? And how do you ensure you're not overfitting with a model?
Let's say we want to predict if a student will land a job interview based on her resume. Now, assume we train a model from a dataset of 10,000 resumes and their outcomes. Next, we try the model out on the original dataset, and it predicts outcomes with 99% accuracy… wow!
But now comes the bad news. When we run the model on a new ("unseen") dataset of resumes, we only get 50% accuracy! Our model doesn't generalize well from our training data to unseen data. This is known as overfitting, and it's a common problem in machine learning and data science.
Machine Learning Interview Question 18
Q: What is Overfitting? And how do you ensure you're not overfitting with a model?
Three main methods to avoid overfitting:
▪ Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data
▪ Use cross-validation techniques such as k-fold cross-validation
▪ Use regularization techniques such as LASSO that penalize certain model parameters if they're likely to cause overfitting
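To show what k-fold cross-validation does with the data, here is a minimal sketch that only produces the train/test index splits; fitting and scoring a model on each split is left out. The helper `k_fold_indices` is hypothetical, and it uses contiguous folds without shuffling for simplicity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample is used for testing exactly once, so the averaged score is an estimate of performance on unseen data rather than on the training set.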
Machine Learning Interview Question 19
Q: Explain the ensemble learning technique in Machine Learning
Machine Learning Interview Question 20
Q: What is Bagging and Boosting in machine learning?
Similarities
▪ Both are ensemble methods to get N learners from 1 learner
▪ Both generate several training data sets by random sampling
▪ Both make the final decision by taking the average of N learners
▪ Both are good at reducing variance and providing higher stability
Differences
▪ While the learners are built independently for Bagging, Boosting tries to add new models that do well where previous models fail
▪ Only Boosting determines weights for the data to tip the scales in favour of the most difficult cases
▪ The final decision is an equally weighted average for Bagging and a weighted average for Boosting, with more weight given to the learners that perform better on training data
▪ Only Boosting tries to reduce bias. On the other hand, Bagging may solve the problem of over-fitting, while Boosting can increase it
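Two of the points above, random sampling of training sets and equal versus weighted averaging, can be sketched without training any actual learners. Everything below (function names, predictions, weights) is hypothetical; it illustrates only the sampling and combination steps, not a full Bagging or Boosting algorithm.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement to form one training set."""
    return [rng.choice(data) for _ in data]

def equal_average(predictions):
    """Bagging-style combination: every learner counts the same."""
    return sum(predictions) / len(predictions)

def weighted_average(predictions, weights):
    """Boosting-style combination: better learners get more weight."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)

rng = random.Random(0)                  # fixed seed so the run is repeatable
data = list(range(10))
samples = [bootstrap_sample(data, rng) for _ in range(3)]   # N = 3 training sets

predictions = [0.2, 0.6, 0.7]           # hypothetical outputs of 3 learners
weights = [1.0, 1.0, 2.0]               # hypothetical performance-based weights
print(equal_average(predictions))
print(weighted_average(predictions, weights))
```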
Machine Learning Interview Question 21
Q: How would you screen for outliers and what should you do if you find one?
▪ One way of detecting outliers is to check whether the mean/average of a data set is significantly different from its median
▪ For example, in the data set 10, 20, 34, 45, 50, 60, 93 the median is 45 while the mean/average = 312/7 = 44.57 ≈ 45
▪ If we now add -200 to this set, the median is (34+45)/2 = 39.5 ≈ 40, whereas the mean = 112/8 = 14
▪ You can try the opposite by adding, say, 400 to our original data set. In that case the median is (45+50)/2 = 47.5, while the mean = 712/8 = 89
▪ In datasets that are normally distributed, the mean and median will be very close, as in our first example
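The arithmetic in this example is easy to verify with Python's `statistics` module. The sketch below simply recomputes the mean and median for the original data set and for the two variants with an added outlier.

```python
from statistics import mean, median

data = [10, 20, 34, 45, 50, 60, 93]
variants = [
    ("original", data),
    ("with -200", data + [-200]),
    ("with 400", data + [400]),
]

for label, values in variants:
    # A large gap between mean and median hints at an outlier.
    print(label, "mean:", round(mean(values), 2), "median:", median(values))
```

In both outlier cases the median moves only slightly while the mean shifts by tens of units, which is the signal the screening relies on.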
Machine Learning Interview Question 22
Q: What is collinearity and multicollinearity?
▪ Collinearity occurs when two predictor variables (e.g., x1 and x2) in a multiple regression have a non-zero correlation
▪ Multicollinearity occurs when more than two predictor variables (e.g., x1, x2 and x3) are inter-correlated
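One simple way to check for collinearity is to compute the Pearson correlation between pairs of predictors. The sketch below implements the textbook formula by hand; the predictor values are invented, and the `pearson` helper is an assumption made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]   # exactly 2 * x1, so perfectly collinear with x1
x3 = [5, 1, 4, 2, 3]    # hypothetical, only weakly related predictor

print(pearson(x1, x2))
print(pearson(x1, x3))
```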
Machine Learning Interview Question 23
Q: What do you understand by Eigenvectors and Eigenvalues?
▪ Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the eigenvectors of a correlation or covariance matrix
▪ Eigenvectors are the directions along which a particular linear transformation acts by flipping, compressing or stretching
▪ An eigenvalue can be thought of as the strength of the transformation in the direction of its eigenvector, that is, the factor by which the stretching or compression occurs
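As an illustration, power iteration is one simple way to estimate the dominant eigenvalue and eigenvector of a matrix. This technique is not mentioned in the slides and is used here only as a sketch; the 2x2 matrix stands in for a hypothetical covariance matrix.

```python
from math import sqrt

def power_iteration(matrix, iterations=100):
    """Estimate the dominant eigenvalue and a unit-length eigenvector."""
    n = len(matrix)
    vector = [1.0] + [0.0] * (n - 1)    # arbitrary starting direction
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in product))
        vector = [v / norm for v in product]
    # For a unit vector v, the Rayleigh quotient v . (A v) gives the eigenvalue.
    av = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * v for a, v in zip(av, vector))
    return eigenvalue, vector

covariance = [[2.0, 1.0], [1.0, 2.0]]   # hypothetical 2x2 covariance matrix
value, direction = power_iteration(covariance)
print(value, direction)
```

For this matrix the transformation stretches by a factor of about 3 along the diagonal direction, where both components of the eigenvector are equal.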
Machine Learning Interview Question 24
Q: What is A/B Testing?
▪ It is statistical hypothesis testing for a randomized experiment with two variants, A and B
▪ The goal of A/B Testing is to identify changes to a web page that maximize or increase an outcome of interest
▪ An example of this could be identifying the click-through rate for a banner ad
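For the banner-ad example, one common way to test the difference in click-through rates is a two-proportion z-test. The slides do not name a specific test, so this choice, the function name, and the click counts below are all assumptions made for illustration.

```python
from math import erf, sqrt

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return the z statistic and two-sided p-value for the difference in rates."""
    rate_a, rate_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))   # standard normal CDF at |z|
    p_value = 2 * (1 - normal_cdf)
    return z, p_value

# Hypothetical experiment: banner A gets 200 clicks, banner B gets 260, each from 10,000 views.
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(z, p_value)
```

A small p-value suggests the difference between A and B is unlikely to be due to chance alone under the test's assumptions.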
Machine Learning Interview Question 25
Q: What is Cluster Sampling?
• It is a process of randomly selecting intact groups, sharing similar characteristics, within the defined population
• A cluster sample is a probability sample where each sampling unit is a collection or cluster of elements
For example, if managers (the sample) are the elements, then companies are the clusters
Machine Learning Interview Question 26
Q: What is Collaborative Filtering?
It is the filtering process used by most recommender systems to find patterns or information by combining viewpoints, various data sources and multiple agents.
Machine Learning Interview Question 27
Q: Why is naive Bayes so 'naive'?
In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.
Basically, it's "naive" because it makes assumptions that may or may not turn out to be correct.
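The independence assumption shows up in code as a plain product of per-feature probabilities. The sketch below applies it to the apple example; all priors and likelihoods are made up for illustration, and the function name is an assumption.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Multiply the prior by each per-feature likelihood, then normalize across classes."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature in observed:
            score *= likelihoods[cls][feature]   # independence assumption: plain product
        scores[cls] = score
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}

# All probabilities below are invented for the sake of the example.
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    "apple": {"red": 0.8, "round": 0.9, "about_4_inches": 0.7},
    "other": {"red": 0.2, "round": 0.5, "about_4_inches": 0.3},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["red", "round", "about_4_inches"])
print(posterior)
```

Each feature contributes its own factor regardless of how the features relate to one another, which is exactly the "naive" part.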
Machine Learning with Python Questions
Machine Learning Interview Question 1
Q: Name a few libraries in Python used for Data Analysis and Scientific computations
▪ NumPy
▪ SciPy
▪ Pandas
▪ SciKit
▪ Matplotlib
▪ Seaborn
▪ Bokeh
Machine Learning Interview Question 2
Q: Which library would you prefer for plotting in Python: Seaborn, Matplotlib or Bokeh?
▪ Matplotlib: used for basic plotting like bars, pies, lines, scatter plots, etc.
▪ Seaborn: built on top of Matplotlib and Pandas to ease data plotting. It is used for statistical visualizations like creating heatmaps or showing the distribution of your data
▪ Bokeh: used for interactive visualization. In case your data is too complex and you haven't found any "message" in the data, use Bokeh to create interactive visualizations that will allow your viewers to explore the data themselves
Machine Learning Interview Question 3
Q: How are NumPy and SciPy related?
▪ NumPy is part of the SciPy ecosystem; the SciPy library itself is built on top of NumPy
▪ NumPy defines arrays along with some basic numerical functions like indexing, sorting, reshaping, etc.
▪ SciPy implements higher-level routines such as numerical integration and optimization using NumPy's functionality; machine learning libraries such as scikit-learn build on both
Machine Learning Interview Question 4
Q: What is the main difference between a Pandas Series and a single-column DataFrame in Python?
Machine Learning Interview Question 5
Q: How can you handle duplicate values in a dataset for a variable in Python?
    import pandas as pd

    bill_data = pd.read_csv("datasets\\Telecom Data Analysis\\Bill.csv")
    bill_data.shape

    # Identify duplicate records in the data
    dupes = bill_data.duplicated()
    print(dupes.sum())

    # Remove duplicates
    bill_data_uniq = bill_data.drop_duplicates()
Machine Learning Interview Question 6
Q: Write a basic machine learning program that imports any dataset and checks the accuracy of any classifier on it
    # Importing the dataset
    from sklearn import datasets
    iris = datasets.load_iris()
    X = iris.data
    Y = iris.target

    # Splitting the dataset (returns X_train, X_test, Y_train, Y_test in this order)
    from sklearn.model_selection import train_test_split
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5)

    # Selecting the classifier
    from sklearn import tree
    my_classifier = tree.DecisionTreeClassifier()
    my_classifier.fit(X_train, Y_train)
    predictions = my_classifier.predict(X_test)

    # Checking accuracy
    from sklearn.metrics import accuracy_score
    print(accuracy_score(Y_test, predictions))
Scenario Based Questions
Machine Learning Interview Question 1
Q: You are given a data set consisting of variables with more than 30% missing values. Let's say, out of 50 variables, 8 variables have more than 30% missing values. How will you deal with them?
▪ Assign a unique category to the missing values; who knows, the missing values might reveal some trend
▪ We can remove them outright
▪ Or, we can sensibly check their distribution against the target variable and, if we find any pattern, keep those missing values and assign them a new category while removing the others
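A first step in any of these approaches is to find which variables exceed the 30% threshold. The sketch below does that on a tiny made-up table and then applies the first option (a separate category for missing values); the column names, rows, and helper are hypothetical.

```python
def missing_fraction(rows, column):
    """Share of rows where the given column has no value."""
    return sum(1 for row in rows if row.get(column) is None) / len(rows)

# Hypothetical dataset with None marking a missing value.
rows = [
    {"age": 34, "income": None},
    {"age": 29, "income": 52000},
    {"age": None, "income": None},
    {"age": 41, "income": 61000},
    {"age": 52, "income": None},
]

flagged = [col for col in ("age", "income") if missing_fraction(rows, col) > 0.30]
print(flagged)

# Option 1 from the list: treat "missing" as its own category for flagged columns.
for row in rows:
    for col in flagged:
        if row[col] is None:
            row[col] = "missing"
print([row["income"] for row in rows])
```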
Machine Learning Interview Question 2
Q: Write an SQL query that makes recommendations using the pages that your friends liked. Assume you have two tables: a two-column table of users and their friends, and a two-column table of users and the pages they liked. It should not recommend pages you already like.
    SELECT f.user_id, l.page_id
    FROM friend f
    JOIN like l ON f.friend_id = l.user_id
    WHERE l.page_id NOT IN (SELECT page_id FROM like WHERE user_id = f.user_id)
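The query can be tried out with Python's built-in `sqlite3` module. This is a sketch under a few assumptions: the tables are renamed to `friends` and `likes` (since `LIKE` is an SQL keyword), `DISTINCT` and `ORDER BY` are added so the output is free of duplicates and deterministic, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
    CREATE TABLE likes (user_id INTEGER, page_id INTEGER);
""")
# Hypothetical data: user 1 is friends with users 2 and 3; user 2 is friends with user 1.
conn.executemany("INSERT INTO friends VALUES (?, ?)", [(1, 2), (1, 3), (2, 1)])
conn.executemany("INSERT INTO likes VALUES (?, ?)", [(1, 10), (2, 10), (2, 20), (3, 30)])

query = """
    SELECT DISTINCT f.user_id, l.page_id
    FROM friends f
    JOIN likes l ON f.friend_id = l.user_id
    WHERE l.page_id NOT IN (SELECT page_id FROM likes WHERE user_id = f.user_id)
    ORDER BY f.user_id, l.page_id
"""
recommendations = conn.execute(query).fetchall()
print(recommendations)
conn.close()
```

User 1 is recommended the pages liked by friends that user 1 has not liked yet, while user 2 gets nothing because the only page their friend likes is one they already like.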
Machine Learning Interview Question 3
Q: There's a game where you are asked to roll two fair six-sided dice. If the sum of the values on the dice equals seven, then you win $21. However, you must pay $5 to play each time you roll both dice. Do you play this game? And in follow-up: what is the probability of making money from this game?
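One way to reason about this question is to enumerate all 36 outcomes and compute the expected profit of a single play. The sketch below assumes that the $21 is the gross payout and that the $5 fee is paid on every roll, win or lose; the slides do not spell this out.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))

p_win = Fraction(sum(1 for a, b in rolls if a + b == 7), len(rolls))
expected_profit = p_win * 21 - 5    # expected payout minus the fee per play

print(p_win, expected_profit)
```

Under these assumptions the expected profit per play is negative, so on average the game loses money, and the chance of coming out ahead on any single play equals the probability of rolling a seven.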