PySpark Certification Training: https://www.edureka.co/pyspark-certification-training

This Edureka tutorial on "PySpark RDD" gives a detailed overview of RDDs, which are considered the backbone of Apache Spark. You will learn about the transformations and actions that can be performed on RDDs. The tutorial covers the following topics:

1. Need for RDDs
2. What are RDDs
3. PySpark RDD features
4. PySpark RDD Operations
5. Finding Page Rank - PySpark Demo
PySpark RDD PYSPARK CERTIFICATION TRAINING www.edureka.co/pyspark-certification-training
Today’s Training Topics
❖ Need for RDD
❖ What are RDD?
❖ PySpark RDD Operations
❖ Features of RDD
❖ PySpark RDD Operations - Demo
❖ Finding Page Rank – PySpark RDD Demo
Why RDD?
Iterative processes are slow when data is reused or shared between jobs through stable storage (HDFS): Job #1 writes its result to HDFS, and Job #2 has to read it back from HDFS before it can continue.
What are RDD?
❖ Backbone of Apache Spark
❖ One of the first fundamental data structures
❖ RDD stands for Resilient Distributed Dataset
❖ Two kinds of operations: Transformations and Actions
Transformations and Actions
Transformations: map, flatMap, filter, distinct, reduceByKey, mapPartitions
Actions: collect, collectAsMap, reduce, countByKey, take, countByValue
Features of RDDs
❖ Lazy Evaluations
❖ Persistence
❖ Immutability
❖ Coarse Grained Operations
❖ In-Memory Computation
❖ Fault Tolerance
❖ Partitioning
RDD Creation & Operations
Transformations: map, flatMap, filter, distinct, reduceByKey, mapPartitions, groupBy, groupByKey
Actions: collect, reduce, take, countByValue
Finding Page Rank
How it works?
Page Rank of Site = sum, over every inbound link, of (Page Rank of Inbound Link / Number of Links on that Page)
How it works?

Site        Iter-0   Iter-1    Iter-2    Rank
Netflix     1/4      1/12      1.5/12    4
Amazon      1/4      2.5/12    2/12      3
Wikipedia   1/4      4.5/12    4.5/12    1
Google      1/4      4/12      4/12      2
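The iteration can be traced by hand with a few lines of plain Python. This is only a sketch of the simplified update rule (each site splits its current rank equally among the pages it links to, with no damping factor). The link graph below is hypothetical, since the slides do not list which site links to which, and `pagerank_step` is a helper name introduced for this illustration.

```python
from fractions import Fraction

# Hypothetical link graph (not given in the slides): site -> sites it links to.
links = {
    "Netflix": ["Amazon", "Wikipedia"],
    "Amazon": ["Google"],
    "Wikipedia": ["Netflix", "Amazon", "Google"],
    "Google": ["Wikipedia"],
}


def pagerank_step(ranks, links):
    """One iteration: every site passes rank / (number of its links) to each target."""
    new_ranks = {site: Fraction(0) for site in links}
    for site, outlinks in links.items():
        share = ranks[site] / len(outlinks)
        for target in outlinks:
            new_ranks[target] += share
    return new_ranks


# Iter-0: every site starts with the same rank, 1 / (number of sites).
ranks = {site: Fraction(1, len(links)) for site in links}
history = [ranks]

# Iter-1 and Iter-2.
for _ in range(2):
    ranks = pagerank_step(ranks, links)
    history.append(ranks)

# Sites ordered from highest to lowest rank after the last iteration.
ordering = sorted(ranks, key=ranks.get, reverse=True)

for i, snapshot in enumerate(history):
    print(f"Iter-{i}:", {site: str(rank) for site, rank in snapshot.items()})
print("Order:", ordering)
```

Exact fractions are used so the values can be compared with twelfths directly. On an RDD, the same step is commonly written as a `join` of the links with the current ranks, a `flatMap` that emits each share, and a `reduceByKey` that sums the shares per site.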
Problem Statement