Data Science Certification using R: https://www.edureka.co/data-science
This Data Science tutorial gives you an in-depth understanding of Data Science and shows how it is used in the real world to solve data-driven problems. It covers the following topics:
Need for Data Science
Walmart Use Case
What is Data Science?
Who is a Data Scientist?
Data Science – Skill Set
Data Science Job Roles
Data Life Cycle
Introduction to Machine Learning
K-Means Use Case
K-Means Algorithm
Hands-On
Data Science Certification
Blog Series: http://bit.ly/data-science-blogs
Data Science Training Playlist: http://bit.ly/data-science-playlist
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Agenda
1. Need for Data Science
2. Walmart Use Case
3. What is Data Science?
4. Who is a Data Scientist?
5. Data Science – Skill Set
6. Data Science Job Roles
7. Data Life Cycle
8. Introduction to Machine Learning
9. K-Means Use Case
10. K-Means Algorithm
11. Hands-On
12. Data Science Certification
DATA SCIENCE CERTIFICATION TRAINING www.edureka.co/data-science
Need For Data Science
Data Sources
Evolution of technology: Telephone → Mobile, Desktop → Cloud, Car → Smart Car
Sources of data: IoT, social media, other factors
Figures shown for social media and other sources: 1,736,111 pictures; 347,222 tweets; 204,000,000 emails; 300 hours of video uploaded; 4,166,667 likes and 200,000 photos
Walmart Use Case
Data Analysis At Walmart: Halloween and cookie sales
Data scientists at Walmart found a connection between Halloween and the sales of cookies.
Data Analysis At Walmart: Hurricanes and strawberry Pop-Tarts
Data scientists at Walmart found that sales of strawberry Pop-Tarts increased sevenfold before a hurricane.
Data Analysis At Walmart: Social media and cake pops
Walmart uses social media data to find out which products are trending so that they can be introduced in Walmart stores across the world.
What Is Data Science?
“Torture the data, and it will confess to anything.” ~ Ronald Coase, Nobel laureate in Economics
Data Science is the process of extracting knowledge and insights from data by using scientific methods.
Scientific methods: Programming + Statistics + Business
Who Is A Data Scientist?
A data scientist works at the intersection of Mathematics, Business, and Technology.
Data Science – Skill Set
• Statistics
• Programming languages
• Data extraction & processing
• Data wrangling & exploration
• Machine Learning
• Big Data processing frameworks
• Data visualisation
Data Science Job Roles
• Data Scientist
• Data Analyst
• Data Architect
• Data Engineer
• Database Administrator
• Data & Analytics Manager
• Statistician
• Business Analyst
Data Science Life Cycle
Data Life Cycle: Business requirements → Data acquisition → Data processing → Data exploration → Modelling → Deployment
Data Life Cycle: Business requirements
• Understand the problem
• Identify central objectives
• Identify variables that need to be predicted
Data Life Cycle: Data acquisition
• What data do I need for my project?
• What are the data sources?
• How can I obtain the data?
• What is the most efficient way to store and access all of it?
Data Life Cycle: Data processing
• Transform data into the desired format
• Data cleaning: handle missing values, handle corrupted data, remove unnecessary data
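The cleaning steps on this slide can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed method: the records, the field names (`age`, `income`, `notes`), and the rule that a negative age counts as corrupted are all hypothetical.

```python
# Hypothetical raw records; None marks a missing value.
raw_records = [
    {"age": 34, "income": 52000.0, "notes": "ok"},
    {"age": None, "income": 48000.0, "notes": "n/a"},  # missing value
    {"age": -5, "income": 61000.0, "notes": "typo"},   # corrupted value (assumed rule)
    {"age": 45, "income": 75000.0, "notes": "ok"},
]

# Fields assumed to be needed later; "notes" is treated as unnecessary data.
KEEP_FIELDS = ("age", "income")


def clean(records):
    """Drop rows with missing or corrupted values and keep only the needed fields."""
    cleaned = []
    for row in records:
        if any(row[field] is None for field in KEEP_FIELDS):
            continue  # missing value
        if row["age"] < 0:
            continue  # corrupted data under the assumed validity rule
        cleaned.append({field: row[field] for field in KEEP_FIELDS})
    return cleaned


clean_records = clean(raw_records)
print(clean_records)
```

Dropping rows is only one option; in practice missing values are often imputed instead, depending on how much data can be spared.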
Data Life Cycle: Data exploration
• Understand the patterns in the data
• Retrieve useful insights
• Form hypotheses
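A rough sketch of what exploration can look like in practice: summarise a variable, compare groups, and note any gap worth turning into a hypothesis. The sales figures and the `promotion` flag below are invented for illustration only.

```python
from statistics import mean, median, stdev

# Hypothetical daily sales, grouped by an assumed "promotion" flag.
sales = [
    {"promotion": False, "units": 20},
    {"promotion": False, "units": 22},
    {"promotion": False, "units": 19},
    {"promotion": True, "units": 35},
    {"promotion": True, "units": 38},
    {"promotion": True, "units": 33},
]

# Overall summary statistics for the variable of interest.
units = [row["units"] for row in sales]
summary = {"mean": mean(units), "median": median(units), "stdev": stdev(units)}

# Group means: a large gap suggests a hypothesis to test, not a conclusion.
group_means = {}
for flag in (False, True):
    group = [row["units"] for row in sales if row["promotion"] == flag]
    group_means[flag] = mean(group)

print(summary)
print(group_means)
```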
Data Life Cycle: Modelling
• Determine the optimal data features for the machine-learning model
• Create a model that predicts the target most accurately
• Evaluate and test the efficiency of the model
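The slide does not name a particular model, so the sketch below uses a 1-nearest-neighbour classifier purely as a stand-in to show the create-then-evaluate pattern: fit on one part of the data and measure accuracy on a held-out part. The data points and labels are hypothetical.

```python
import math

# Hypothetical labelled data: (feature vector, label).
data = [
    ((1.0, 1.2), "a"), ((0.8, 1.0), "a"), ((1.1, 0.9), "a"), ((0.9, 1.1), "a"),
    ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.9, 4.0), "b"), ((4.2, 4.1), "b"),
]

# Hold out every fourth example for testing (a fixed split keeps the sketch deterministic).
train = [example for i, example in enumerate(data) if i % 4 != 3]
test = [example for i, example in enumerate(data) if i % 4 == 3]


def predict(features, training_set):
    """1-nearest-neighbour: return the label of the closest training example."""
    nearest = min(training_set, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


# Evaluate on data the model has not seen during training.
correct = sum(predict(features, train) == label for features, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Accuracy on two held-out points says little by itself; the point is only that evaluation uses examples kept apart from training.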
Data Life Cycle: Deployment
• Check the deployment environment for dependency issues
• Deploy the model in a pre-production/test environment
• Monitor the performance
Introduction To Machine Learning
What Is Machine Learning?
Machine learning is a subset of artificial intelligence (AI) that gives machines the ability to learn automatically and improve from experience without being explicitly programmed.
[Figure: data is fed to an algorithm that learns to tell apart a cherry, an apple, and an orange, which at first "look the same".]
Types Of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
K-Means Use Case
Brain Tumour Detection Using K-Means
K-means clustering is an unsupervised learning algorithm used to partition a dataset into k clusters, in which each data point belongs to the cluster with the nearest mean.
In brain tumour segmentation, the k-means algorithm is applied to brain MR images to detect the extent and shape of a tumour.
K-Means Algorithm
K-Means Algorithm: Initialization
➢ Randomly initialize k points, called the cluster centroids. Here, k = 2.
➢ The value of k (the number of clusters) can be determined using the elbow curve.
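One common way to do the random initialization is to pick k of the data points themselves as starting centroids. The sketch below assumes that variant; the points and the seed are hypothetical.

```python
import random

# Hypothetical 2-D data points.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]
k = 2  # number of clusters, as on the slide

rng = random.Random(0)             # fixed seed so the sketch is reproducible
centroids = rng.sample(points, k)  # k distinct data points become the initial centroids
print(centroids)
```

Because the result depends on the starting centroids, k-means is often run several times with different seeds.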
K-Means Algorithm: Cluster assignment
➢ Compute the Euclidean distance between each data point and the initialized cluster centroids.
➢ Assign each data point to its nearest centroid, which divides the data points into two groups.
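The assignment step can be sketched as follows, assuming two already chosen centroids and a small hypothetical set of 2-D points. `math.dist` computes the Euclidean distance between two points.

```python
import math

# Hypothetical data points and assumed initial centroids.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]
centroids = [(1.0, 1.0), (4.5, 5.0)]


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean distance)."""
    labels = []
    for point in points:
        distances = [math.dist(point, centroid) for centroid in centroids]
        labels.append(distances.index(min(distances)))
    return labels


labels = assign(points, centroids)
print(labels)  # [0, 0, 1, 1, 1, 1]
```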
K-Means Algorithm: Move centroid
➢ Compute the mean of the red dots and reposition the red cluster centroid to this mean.
➢ Compute the mean of the green dots and reposition the green cluster centroid to this mean.
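A minimal sketch of the centroid update, assuming the cluster labels from a previous assignment step are already known. Cluster 0 and cluster 1 stand in for the red and green groups; the points are hypothetical.

```python
# Hypothetical data points and the labels assumed to come from the assignment step.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]
labels = [0, 0, 1, 1, 1, 1]


def update_centroids(points, labels, k):
    """Move each centroid to the coordinate-wise mean of the points assigned to it."""
    centroids = []
    for cluster in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster]
        # Assumes every cluster has at least one member; empty clusters need separate handling.
        centroids.append(tuple(sum(coords) / len(members) for coords in zip(*members)))
    return centroids


new_centroids = update_centroids(points, labels, k=2)
print(new_centroids)  # [(1.25, 1.5), (4.0, 5.25)]
```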
K-Means Algorithm: Optimization
➢ Repeat the previous two steps iteratively until the cluster centroids stop changing their positions.
K-Means Algorithm: Convergence
➢ Finally, the k-means clustering algorithm converges.
➢ The data points are divided into two clusters, shown in red and green.
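Putting the steps together, the whole loop might look like the sketch below: initialize, assign, move centroids, and stop once no centroid changes position. It also computes the within-cluster sum of squares (WCSS) for a few values of k, which is the quantity usually plotted on an elbow curve. The data, the seed, the `max_iter` cap, and the empty-cluster rule are assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means: returns (centroids, labels) once the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialization from the data points
    labels = []
    for _ in range(max_iter):
        # Cluster assignment: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Move centroid: mean of assigned points (an empty cluster keeps its old centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # convergence: no centroid changed position
            break
        centroids = new_centroids
    return centroids, labels


def wcss(points, centroids, labels):
    """Within-cluster sum of squared distances for a given clustering."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


# Hypothetical data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)

# Elbow curve values: WCSS for several candidate k.
elbow = {}
for candidate_k in (1, 2, 3):
    cs, ls = kmeans(points, candidate_k)
    elbow[candidate_k] = wcss(points, cs, ls)
print(elbow)
```

On data like this, WCSS drops sharply from k = 1 to k = 2 and only slightly afterwards, which is the "elbow" that suggests k = 2.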
K-Means Algorithm
➢ Data matrix
➢ Distance/dissimilarity matrix
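To make the two terms concrete: a data matrix holds one row per object and one column per attribute, and the dissimilarity matrix holds the pairwise distances between those rows. The sketch below assumes Euclidean distance and uses three hypothetical objects.

```python
import math

# Hypothetical data matrix: one row per object, one column per attribute.
data_matrix = [
    [1.0, 2.0],
    [4.0, 6.0],
    [1.0, 6.0],
]

n = len(data_matrix)
# Dissimilarity matrix: entry [i][j] is the Euclidean distance between objects i and j.
distance_matrix = [
    [math.dist(data_matrix[i], data_matrix[j]) for j in range(n)]
    for i in range(n)
]

for row in distance_matrix:
    print([round(d, 2) for d in row])
```

The matrix is symmetric with zeros on the diagonal, so only the lower triangle is usually stored.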
Hands-On
Data Science Certification
Edureka’s Data Science Certification
• Data extraction, wrangling & exploration
• Introduction to Data Science
• Unsupervised Learning
• Classification techniques
• Introduction to Machine Learning
• Statistical Inference
• Recommender engine
• Deep Learning
• Time series
• Text Mining
➢ A data warehouse is like a relational database designed for analytical needs.
➢ It functions on the basis of OLAP (Online Analytical Processing).
➢ It is a central location where consolidated data from multiple locations (databases) is stored.