( Apache Spark Training: https://www.edureka.co/apache-spark-scala-training )
( Hadoop Training: https://www.edureka.co/hadoop )
This Edureka Hadoop vs Spark video will help you understand the differences between Hadoop and Spark by comparing them on several parameters. It covers:

1. Introduction to Hadoop
2. Introduction to Apache Spark
3. Spark vs Hadoop: Performance, Ease of Use, Cost, Data Processing, Fault Tolerance, Security
4. Hadoop Use-cases
5. Spark Use-cases
Hadoop vs Spark
Copyright © 2017, edureka and/or its affiliates. All rights reserved.
Hadoop
Spark
Hadoop vs Spark
Performance
Hadoop: moves data through disk and network at every step.
Spark: caches data in memory between steps.
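The contrast on this slide, moving data through disk at every step versus caching it in memory, can be illustrated with a small toy model. This is only a sketch in plain Python, not Hadoop or Spark code: the function names `run_via_disk` and `run_in_memory` and the four example steps are hypothetical, chosen to mirror the four "Step" boxes in the diagram.

```python
import json
import os
import tempfile


def run_via_disk(data, steps):
    """Apply each step, writing its output to a file and reading it back
    before the next step (toy stand-in for disk-based hand-off)."""
    disk_round_trips = 0
    with tempfile.TemporaryDirectory() as workdir:
        for i, step in enumerate(steps):
            data = [step(x) for x in data]
            path = os.path.join(workdir, f"step_{i}.json")
            with open(path, "w") as f:
                json.dump(data, f)      # intermediate result goes to disk
            with open(path) as f:
                data = json.load(f)     # next step reads it back
            disk_round_trips += 1
    return data, disk_round_trips


def run_in_memory(data, steps):
    """Apply each step, keeping intermediate results in memory."""
    for step in steps:
        data = [step(x) for x in data]
    return data, 0


# Hypothetical four-step pipeline.
steps = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
numbers = list(range(5))

disk_result, disk_trips = run_via_disk(numbers, steps)
memory_result, memory_trips = run_in_memory(numbers, steps)
```

Both runs produce the same values; the only difference is how many times intermediate data touches disk, which is the cost the slide attributes to the step-by-step disk and network hand-off.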
Ease of Use
Hadoop can be integrated with multiple tools such as Sqoop, Flume, Pig, and Hive.
Spark comes with user-friendly APIs for Scala, Java, Python, and Spark SQL.
Costs
Hadoop requires a lot of disk space as well as faster disks.
Spark requires large amounts of RAM for executing everything in memory.
Data Processing
[Figure: Batch Processing (labels: Time, ETL, Stored, Client)]
[Figure: Stream Processing (labels: Time, ms, Client)]
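The difference between the two processing styles named on this slide can be sketched in a few lines of plain Python. This is a toy illustration rather than Hadoop or Spark code; `batch_total`, `stream_totals`, and the sample event values are hypothetical.

```python
def batch_total(stored_events):
    """Batch style: wait until all events are stored, then process them at once."""
    return sum(stored_events)


def stream_totals(event_source):
    """Stream style: update the result as each event arrives."""
    running = 0
    for event in event_source:
        running += event
        yield running  # a fresh result is available after every event


events = [5, 3, 7, 1]  # hypothetical event values

batch = batch_total(events)
stream = list(stream_totals(iter(events)))
```

The batch call returns one answer after all the data has been collected, while the streaming generator yields an updated answer per event and ends on the same final value.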
Fault Tolerance
Hadoop:
1. Replication
2. Re-execution of job
Spark: an RDD is automatically recomputed by using the original transformations.
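Recomputing lost data from the original transformations can be modelled with a toy class. The sketch below is plain Python and is not Spark's RDD implementation: `ToyRDD`, `lose_partition`, and the `computations` counter are hypothetical names used only to show the idea of replaying a recorded chain of transformations against the source data.

```python
class ToyRDD:
    """Toy dataset that remembers how it was derived (its lineage)."""

    def __init__(self, source_partitions, transformations=()):
        self._source = source_partitions            # assumed durable input
        self._transformations = tuple(transformations)
        self._cache = {}                            # computed partitions
        self.computations = 0                       # how often a partition was built

    def map(self, fn):
        # Record the transformation instead of applying it immediately.
        return ToyRDD(self._source, self._transformations + (fn,))

    def compute_partition(self, index):
        if index not in self._cache:
            rows = self._source[index]
            for fn in self._transformations:        # replay the lineage
                rows = [fn(r) for r in rows]
            self._cache[index] = rows
            self.computations += 1
        return self._cache[index]

    def lose_partition(self, index):
        # Simulate a failure that wipes one computed partition.
        self._cache.pop(index, None)

    def collect(self):
        out = []
        for i in range(len(self._source)):
            out.extend(self.compute_partition(i))
        return out


rdd = ToyRDD([[1, 2], [3, 4]]).map(lambda x: x * 10).map(lambda x: x + 1)
before = rdd.collect()
rdd.lose_partition(1)          # partition 1 is lost
after = rdd.collect()          # only partition 1 is rebuilt from lineage
```

Only the lost partition is rebuilt, and no copy of the computed data had to be kept elsewhere, which is the contrast with the replication approach listed on the same slide.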
Security
[Figure: login flow between a server and LDAP: 1. Login Attempt, 2. Password, 3. Check OK, 4. Logged In]
Hadoop Use-cases
1. Archival Data, Reporting Tools [Figure labels: Reporting Tool 1, Reporting Tool 2, Reporting Tool 3, Server]
2. Applications requiring Batch Processing [Figure labels: Time, ETL, Stored, Client]
Spark Use-cases
Applications requiring Stream-processing [Figure labels: Time, ms, Client]
Iterative Processing
Graph Processing
Which one is the best?