
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certification | Edureka

This Edureka "Apache Spark Training" tutorial shows how Apache Spark works in practice. Its running example is a Movie Recommendation project built with Apache Spark. The tutorial covers the following topics:

1) Use Cases Of Real Time Analytics
2) Movie Recommendation System Using Spark
3) What Is Spark?
4) Getting Movie Dataset
5) Spark Streaming
6) Collaborative Filtering
7) Spark MLlib
8) Fetching Results
9) Storing Results

EdurekaIN

Presentation Transcript


  1. EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  2. What to expect?
     - Use Cases Of Real Time Analytics
     - Movie Recommendation System Using Spark
     - What Is Spark?
     - Getting Movie Dataset
     - Spark Streaming
     - Collaborative Filtering
     - Spark MLlib
     - Fetching Results
     - Storing Results

  3. Use Cases of Real Time Analytics

  4. Use Cases of Real Time Analytics  Government agencies perform Real Time Analysis mostly in the field of national security.  Countries need to continuously keep a track of all the military and police agencies for updates regarding threats to security. Government EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  5. Use Cases of Real Time Analytics  Healthcare domain uses Real Time analysis to continuously check the medical status of critical patients.  Hospitals on the look out for blood and organ transplants need to stay in a real-time contact with each other during emergencies.  Getting medical attention on time is a matter of life and death for patients. Healthcare EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  6. Use Cases of Real Time Analytics  Companies revolving around services in the form of calls, video chats and streaming use real time analysis to reduce customer churn and stay ahead of competition.  They also extract measurements of jitter and delay in mobile networks to improve customer experiences. Telecommunications EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  7. Use Cases of Real Time Analytics  Banking transacts with almost all of the world’s money.  It becomes very important to ensure fault tolerant transactions across the whole system.  Fraud detection is made possible through real time analytics in banking. Banking EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  8. Use Cases of Real Time Analytics  Stock brokers use real time analytics to predict movement of stock portfolios.  Companies re-think their business model after using real time analytics to analyze the market demand for their brand. Stock Market EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  9. Movie Recommendation System

  10. Movie Recommendation System

  11. Movie Recommendation System
     Problem Statement: Build a Movie Recommendation System that uses Apache Spark to recommend movies based on a user's preferences.
     Our requirements:
     - Process huge amounts of data
     - Accept input from multiple sources
     - Easy to use
     - Fast processing
     Apache Spark meets all of these requirements, which makes it a strong fit for our Movie Recommendation System.

  12. Use Case – Flow Diagram
     Figure: Use case flow diagram
     1) Huge amount of movie rating data
     2) Getting input using Spark Streaming (data from streaming sources / HDFS)
     3) Machine learning using MLlib: train the data with ALS, then evaluate
     4) Generate recommendations
     5) Fetching results using Spark SQL
     6) Storing results in an RDBMS for websites

  13. What is Spark?

  14. What is Spark?  Apache Spark is an open-source cluster-computing framework for real time processing developed by the Apache Software Foundation.  Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance. Figure: Real Time Processing In Spark  It was built on top of Hadoop MapReduce and it extends the MapReduce model to efficiently use more types of computations. Serial Parallel Reduction in time Figure: Support for multiple source formats Figure: Lazy Evaluation Figure: Data Parallelism In Spark EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  15. Spark Features
     - Speed: up to 100x faster than Hadoop MapReduce for large-scale data processing
     - Powerful Caching: a simple programming layer provides powerful caching and disk persistence capabilities
     - Deployment: can be deployed through Mesos, Hadoop via YARN, or Spark's own cluster manager
     - Polyglot: can be programmed in Scala, Java, Python and R

  16. Movie Dataset

  17. Movie Dataset
     Figure: User Ratings from BookMyShow

  18. Movie Dataset
     Figure: Movie Ratings In Our Dataset

  19. Getting Dataset

  20. Getting Dataset  For our Movie Recommendation System, we can get user ratings from many popular websites like IMDB, Rotten Tomatoes and Times Movie Ratings.  This dataset is available in many formats such as CSV files, text files and databases.  We can either stream the data live from the websites or download and store them in our local file system or HDFS. Figure: Various File Formats EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  21. Spark Streaming

  22. Spark Streaming

  23. Spark Streaming  Spark Streaming is used for processing real-time streaming data  Spark Streaming enables high-throughput and fault-tolerant stream processing of live data streams EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  24. Collaborative Filtering

  25. Collaborative Filtering  We will use Collaborative Filtering (CF) to predict the ratings for users for particular movies based on their ratings for other movies.  We then collaborate this with other users’ rating for that particular movie. Movie Alice 4 5 5 4 4 Bob 3 4 3 3 4 Carol 5 4 4 ? Dave 1 2 ? Shutter Island Fight Club Dark Knight 21 Home Alone 5 5 5 Figure: Predicting the rating of Dave for Dark Knight and Carol for 21 using Collaborative Filtering EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  26. Spark MLlib

  27. Spark MLlib  Spark MLlib is used to perform machine learning in Apache Spark.  Machine learning in Spark is implemented using Spark’s MLlib.  MLlib stands for Machine Learning Library. ML Algorithms Train Data using Alternating Least Squares (ALS) Machine Learning Using Spark MLlib Featurization Machine Learning Tools Pipelines Persistence Generate Recommendations using Collaborative Filtering Utilities Figure: Machine Learning Flow Diagram Figure: Machine Learning Tools EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  28. Fetching Results

  29. Spark SQL for Fetching Results
     - To get the results of our machine learning step, we use Spark SQL's DataFrame, Dataset and SQL interfaces.
     - The machine learning results need to be stored in an RDBMS so that our web application can display the recommendations to a particular user.
     Figure: Machine Learning Output → Spark SQL → Results
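In Spark, the model's output DataFrame could be registered with `createOrReplaceTempView` and queried with `spark.sql(...)`. The query itself is ordinary SQL, so the sketch below uses Python's built-in `sqlite3` as a stand-in. It fetches the top-N predicted movies and the total number of recommendations for one user. The `predictions` table, its columns and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictions (user_id INTEGER, movie TEXT, prediction REAL)")
# Hypothetical model output: one predicted rating per (user, movie) pair
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?)",
    [(1, "Fight Club", 4.6), (1, "Home Alone", 3.1), (1, "21", 4.2), (2, "Fight Club", 2.0)],
)


def top_n(conn, user_id, n):
    """Highest predicted ratings first; ties broken alphabetically by title."""
    return conn.execute(
        "SELECT movie, prediction FROM predictions "
        "WHERE user_id = ? ORDER BY prediction DESC, movie LIMIT ?",
        (user_id, n),
    ).fetchall()


def recommendation_count(conn, user_id):
    """Total number of predicted movies stored for one user."""
    return conn.execute(
        "SELECT COUNT(*) FROM predictions WHERE user_id = ?", (user_id,)
    ).fetchone()[0]


print(recommendation_count(conn, 1))  # 3
print(top_n(conn, 1, 2))              # [('Fight Club', 4.6), ('21', 4.2)]
```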

  30. Ratings for Movies
     Figure: User 77's ratings for different movies (Ratings of Movies for User 77)

  31. Recommended Movies
     Figure: Movies recommended for User 77 (Total Number of Recommendations for User 77; Top Movie Recommendations for User 77)

  32. Storing Results

  33. Storing Results  The results for our Movie Recommendation System can be stored either locally or into external storage systems.  We can store the Recommended Movies along with the Ratings in a text file or a CSV file.  We should prefer storing the results into an RDBMS system so that we can access it directly from our web application and display recommendations and top movies. EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  34. Spark Job Trends

  35. Spark Job Trends  The following is the Job Trend of Apache Spark across the world.  Spark has almost thrice the average number of jobs in comparison to its competitors and is the market leader from 2014. Source: www.indeed.com EDUREKA SPARK CERTIFICATION TRAINING www.edureka.co/apache-spark-scala-training

  36. Summary

  37. Summary
     - What is Spark?
     - Real Time Analytics
     - Movie Recommendation System
     - Spark Job Trends
     - Spark Streaming
     - Spark MLlib

  38. Conclusion

  39. Conclusion
     Congrats! We have now demonstrated the power of Apache Spark in real-time analysis. The hands-on examples should give you the confidence to work on your own future Apache Spark projects.

  40. Thank You … Questions/Queries/Feedback
