
PySpark RDD Tutorial | PySpark Tutorial for Beginners | PySpark Online Training | Edureka

PySpark Certification Training: https://www.edureka.co/pyspark-certification-training

This Edureka tutorial on "PySpark RDD" gives you a detailed and comprehensive understanding of RDDs, which are considered the backbone of Apache Spark. You will learn about the various transformations and actions that can be performed on RDDs. This tutorial covers the following topics:

1. Need for RDDs
2. What are RDDs
3. PySpark RDD features
4. PySpark RDD operations
5. Finding PageRank - PySpark demo

EdurekaIN




Presentation Transcript


  1. PySpark RDD PYSPARK CERTIFICATION TRAINING www.edureka.co/pyspark-certification-training

  2. Today’s Training Topics
     ❖ Need for RDDs
     ❖ What are RDDs?
     ❖ PySpark RDD Operations
     ❖ Features of RDDs
     ❖ PySpark RDD Operations - Demo
     ❖ Finding PageRank - PySpark RDD Demo

  3. Why RDD? [Diagram: iterative processing, data reuse, and data sharing between Job #1 and Job #2 all go through stable storage (HDFS), which makes them slow.]

  4. What are RDDs? RDD stands for Resilient Distributed Dataset. RDDs are the backbone of Apache Spark and one of its first fundamental data structures, and they support two kinds of operations: transformations and actions.

  5. Transformations and Actions
     Transformations: map, flatMap, filter, distinct, reduceByKey, mapPartitions
     Actions: collect, collectAsMap, reduce, countByKey, take, countByValue
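The two columns differ in what they return: a transformation describes a new RDD, while an action hands an ordinary result back to the driver program. The sketch below does not use Spark; plain Python lists stand in for RDDs, the sample lines are made up, and the comment on each line names the PySpark call it mimics. Note that in Spark the output order of distinct and reduceByKey is not guaranteed.

```python
from collections import Counter
from functools import reduce

# Plain-Python stand-ins for an RDD's contents (assumption: no Spark, one machine).
lines = ["spark rdd", "pyspark rdd", "spark"]
partitions = [lines[:2], lines[2:]]          # pretend the data sits in 2 partitions

# Transformations (the PySpark method each line mimics is in the comment)
words = [w for line in lines for w in line.split()]   # flatMap(lambda line: line.split())
lengths = [len(w) for w in words]                     # map(len)
long_words = [w for w in words if len(w) > 3]         # filter(lambda w: len(w) > 3)
unique_words = set(words)                             # distinct()
pairs = [(w, 1) for w in words]                       # map(lambda w: (w, 1))
# mapPartitions(lambda it: [sum(1 for _ in it)]) runs once per partition:
lines_per_partition = [sum(1 for _ in part) for part in partitions]


def reduce_by_key(func, kv_pairs):
    """Merge the values of each key with func, like reduceByKey(func)."""
    merged = {}
    for key, value in kv_pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


word_counts = reduce_by_key(lambda a, b: a + b, pairs)  # reduceByKey(add)

# Actions (each returns an ordinary value to the driver)
collected = word_counts                                 # collect()
as_map = dict(word_counts)                              # collectAsMap()
total_chars = reduce(lambda a, b: a + b, lengths)       # reduce(add)
per_key = Counter(key for key, _ in pairs)              # countByKey()
first_two = words[:2]                                   # take(2)
per_value = Counter(words)                              # countByValue()

print(as_map)       # {'spark': 2, 'rdd': 2, 'pyspark': 1}
print(total_chars)  # 23
```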

  6. Features of RDDs
     Lazy Evaluation, Persistence, Immutability, Coarse-Grained Operations, In-Memory Computation, Fault Tolerance, Partitioning
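Several of these features fit together: transformations are only recorded (lazy evaluation), each one yields a new dataset instead of changing the old one (immutability), the recorded chain (lineage) lets lost data be rebuilt from the source (fault tolerance), and persisting keeps a result after its first computation. The toy class below is a single-machine model of that idea in plain Python, not Spark's implementation; the class `ToyRDD`, its `computations` counter, and `lose_cache` are hypothetical, and only the names `map`, `filter`, `persist`, `collect`, and `count` echo real RDD methods.

```python
class ToyRDD:
    """Hypothetical single-machine model of some RDD features; not Spark code."""

    def __init__(self, source, lineage=()):
        self._source = source      # zero-argument function that yields the base data
        self._lineage = lineage    # recorded transformations, replayed on demand
        self._persisted = False
        self._cache = None
        self.computations = 0      # how many times the lineage has been replayed

    # Transformations: return a NEW ToyRDD and run nothing (immutable, lazy).
    def map(self, f):
        return ToyRDD(self._source, self._lineage + (("map", f),))

    def filter(self, pred):
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def persist(self):
        # Mark this dataset to be kept after its first computation.
        self._persisted = True
        return self

    def lose_cache(self):
        # Simulate losing cached data, e.g. after a worker failure.
        self._cache = None

    def _compute(self):
        if self._cache is not None:
            return self._cache
        self.computations += 1
        data = list(self._source())
        for kind, f in self._lineage:   # rebuild from the source via the lineage
            if kind == "map":
                data = [f(x) for x in data]
            else:
                data = [x for x in data if f(x)]
        if self._persisted:
            self._cache = data
        return data

    # Actions: force evaluation and return plain Python values.
    def collect(self):
        return list(self._compute())

    def count(self):
        return len(self._compute())


squares = ToyRDD(lambda: range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
# squares.computations is still 0: building the pipeline ran nothing.
print(squares.collect())  # [0, 4, 16, 36, 64]
print(squares.count())    # 5 (not persisted, so the lineage was replayed again)
```

Calling `persist()` before the first action means later actions reuse the stored list, and after `lose_cache()` the next action silently recomputes it from the lineage.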

  7. RDD Creation & Operations

  8. Transformations: map, flatMap, filter, distinct, reduceByKey, mapPartitions, groupBy, groupByKey
     Actions: collect, reduce, take, countByValue
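groupByKey and reduceByKey can give the same per-key totals, but the PySpark documentation notes that reduceByKey also merges values locally on each partition before the shuffle, so fewer records move between partitions. The plain-Python sketch below models that difference; the two-partition input and the helper names `group_by_key` and `reduce_by_key` are hypothetical, and the "shuffled" count is only the number of pairs leaving each partition in this model, not a measurement of Spark's network traffic.

```python
from collections import defaultdict

# Hypothetical key-value data already split across 2 partitions.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]


def group_by_key(parts):
    """Model of groupByKey: every record is shuffled, then values are grouped."""
    shuffled = [kv for part in parts for kv in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    return dict(groups), len(shuffled)


def reduce_by_key(parts, func):
    """Model of reduceByKey: combine inside each partition first, then shuffle."""
    local = []
    for part in parts:
        acc = {}
        for key, value in part:
            acc[key] = func(acc[key], value) if key in acc else value
        local.extend(acc.items())        # at most one record per key per partition
    merged = {}
    for key, value in local:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(local)


grouped, grouped_shuffled = group_by_key(partitions)
sums_from_groups = {key: sum(values) for key, values in grouped.items()}
reduced, reduced_shuffled = reduce_by_key(partitions, lambda x, y: x + y)

print(sums_from_groups, grouped_shuffled)  # {'a': 3, 'b': 3} 6
print(reduced, reduced_shuffled)           # {'a': 3, 'b': 3} 4
```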

  9. Finding PageRank

  10. How it works?

  11. How it works? [Diagram label: PageRank of site]

  12. How it works? [Diagram labels: PageRank of inbound link; PageRank of site]

  13. How it works? The PageRank of a site is the sum, over its inbound links, of the PageRank of each linking page divided by the number of links on that page.
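Written as a formula, the rule on slides 11 to 13 reads as follows, where In(p) is the set of pages linking to page p and L(q) is the number of outbound links on page q:

```latex
PR(p) = \sum_{q \in \mathrm{In}(p)} \frac{PR(q)}{L(q)}
```

This is the simplified form the slides use. The original PageRank formulation also includes a damping factor d, giving PR(p) = (1 - d)/N + d \sum_{q \in In(p)} PR(q)/L(q) for N pages, which is what most implementations compute.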

  14. How it works?
      Page        Iter 0   Iter 1   Iter 2   Rank
      Netflix     1/4      1/12     1.5/12   4
      Amazon      1/4      2.5/12   2/12     3
      Wikipedia   1/4      4.5/12   4.5/12   1
      Google      1/4      4/12     4/12     2
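The iterations in this table follow the update from slides 11 to 13: every page starts at 1/4 and then splits its current rank evenly across its outbound links. The slides do not show the link graph, so the `links` dict below is an assumption, inferred because it reproduces every Iter 1 and Iter 2 value above. The sketch is plain Python with exact `Fraction`s so results compare directly with the twelfths; its contribute-then-sum steps are similar in shape to the flatMap plus reduceByKey pattern in Spark's bundled PageRank example, which additionally applies a damping factor.

```python
from fractions import Fraction

# ASSUMPTION: link graph inferred from the table; the slides do not list it.
links = {
    "Wikipedia": ["Netflix", "Amazon", "Google"],
    "Netflix": ["Amazon", "Wikipedia"],
    "Amazon": ["Google"],
    "Google": ["Wikipedia"],
}


def pagerank_step(links, ranks):
    """One undamped iteration: each page splits its rank over its outbound links."""
    # Like a flatMap: emit (target, contribution) pairs for every link.
    contributions = [
        (target, ranks[page] / len(targets))
        for page, targets in links.items()
        for target in targets
    ]
    # Like reduceByKey(add): sum the contributions per target page.
    new_ranks = {page: Fraction(0) for page in links}
    for target, contrib in contributions:
        new_ranks[target] += contrib
    return new_ranks


ranks = {page: Fraction(1, len(links)) for page in links}  # Iter 0: 1/4 each
history = [ranks]
for _ in range(2):                                          # Iter 1 and Iter 2
    ranks = pagerank_step(links, ranks)
    history.append(ranks)

final_order = sorted(ranks, key=ranks.get, reverse=True)
for page in final_order:
    # Ranks expressed in twelfths, e.g. Wikipedia [3.0, 4.5, 4.5]
    print(page, [float(h[page] * 12) for h in history])
```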

  15. Problem Statement
