A Hadoop developer needs a specific set of skills to work with Big Data and Hadoop, which are among the more advanced technologies in use today.
Top 7 Things a Hadoop Developer Should Know
Technogeeks Pune: Come to Learn, Go to Lead
Applications of Tools
A Hadoop developer must know the core tools in the Hadoop ecosystem:
• Storage: HDFS (Hadoop Distributed File System)
• Resource management and processing: YARN and MapReduce
• ETL tool: Pig
• Data warehouse: Hive
• Data ingestion: Sqoop, Flume, Kafka, NiFi, Kylo
• Job scheduler: Oozie
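To make the MapReduce processing model concrete, here is a minimal in-memory sketch of its three phases (map, shuffle, reduce) applied to a word count. It is plain Python rather than Hadoop code; the function names are illustrative and not part of any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases the way a MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

On a real cluster the map and reduce functions run on many nodes in parallel and the shuffle moves data over the network; the sketch only shows the data flow.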
Hadoop and Spark Integration
• Why is Hadoop slow?
• Why do we need Spark with Hadoop?
• Spark and Hadoop integration
• Spark with Scala, Python, or Java
• Major changes in Spark 2.x
• Spark Core, Spark SQL, Spark with Hive, Spark Streaming
• Spark library and Scala library integration
Script Automation
• Automating Hadoop scripts
• Why automation is needed
• Flexibility
• Automation tools
• Basics of shell scripting
• Types of shell
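As one possible illustration of script automation, the sketch below runs a list of commands in order, retries a failing step, and stops at the first step that still fails. It uses Python instead of a shell script, which is a choice made for this example only. The names `run_step` and `run_pipeline` are hypothetical, and the demo commands are stand-ins so that the sketch runs on any machine.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def run_step(command: Sequence[str], retries: int = 2, wait_seconds: float = 0.0) -> str:
    """Run one command, retrying on a non-zero exit code; return its stdout."""
    last_error = ""
    for attempt in range(1, retries + 2):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        last_error = result.stderr.strip()
        if attempt <= retries:
            time.sleep(wait_seconds)
    raise RuntimeError(
        f"command failed after {retries + 1} attempts: {list(command)}: {last_error}"
    )


def run_pipeline(steps: List[Sequence[str]]) -> List[str]:
    """Run the steps in order; an exception from any step stops the pipeline."""
    return [run_step(step) for step in steps]


if __name__ == "__main__":
    # Stand-in commands. On a cluster node a step might instead be something
    # like ["hdfs", "dfs", "-ls", "/data"] (the path here is hypothetical).
    demo_steps = [
        [sys.executable, "-c", "print('ingest done')"],
        [sys.executable, "-c", "print('transform done')"],
    ]
    for output in run_pipeline(demo_steps):
        print(output.strip())
```

Passing each command as a list of arguments, rather than as one shell string, avoids quoting problems when paths or parameters contain spaces.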
Customization of Tools
• UDF (user-defined function) implementation in Pig, Hive, Spark, etc.
• Why customization is needed
• Languages that help with customization
• Using GitHub and existing code on Git for customization
• Existing UDFs and runnable JARs in the Maven repository
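One lightweight way to customize Hive without writing Java is a streaming script that Hive calls through its TRANSFORM ... USING clause, passing rows as tab-separated lines. The sketch below shows the row-level logic of such a script. The masking rule, the column position, and the function names are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return email  # not an email address: pass it through unchanged
    return f"{local[0]}***@{domain}"


def transform_rows(rows: Iterable[str]) -> Iterator[str]:
    """Apply mask_email to the second tab-separated column of each row."""
    for row in rows:
        columns = row.rstrip("\n").split("\t")
        if len(columns) >= 2:
            columns[1] = mask_email(columns[1])
        yield "\t".join(columns)


if __name__ == "__main__":
    # Demo on in-memory rows. Used as a streaming script, the input would be
    # sys.stdin and each result line would be written to sys.stdout.
    sample = ["1\talice@example.com", "2\tnot-an-email"]
    for line in transform_rows(sample):
        print(line)
```

Keeping the logic in small pure functions makes the UDF testable on a laptop before it is shipped to the cluster.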
Exception Handling and Logging
• Importance of exception handling
• Integrating exception-handling code with Hadoop scripts
• How to log errors effectively in log files
• Standards to follow when implementing an exception handler
• Script validation at execution time
• Path validation at execution time
• Data validation at execution time
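The points above can be sketched in a few lines of Python: validate the input path before doing any work, validate each row of data, log rejected rows, and turn any failure into a logged error plus a non-zero return code. The logger name `etl_job`, the comma-separated input format, and the `ValidationError` class are assumptions made for this example.

```python
import logging
import os
from typing import Iterable, List, Tuple

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("etl_job")  # logger name is a placeholder


class ValidationError(Exception):
    """Raised when a precondition of the job is not met."""


def validate_path(path: str) -> str:
    """Path validation: fail early if the input location is missing."""
    if not os.path.exists(path):
        raise ValidationError(f"input path does not exist: {path}")
    return path


def validate_rows(rows: Iterable[str], expected_columns: int) -> Tuple[List[List[str]], int]:
    """Data validation: keep well-formed rows, log and count the rest."""
    good: List[List[str]] = []
    rejected = 0
    for number, row in enumerate(rows, start=1):
        columns = row.rstrip("\n").split(",")
        if len(columns) != expected_columns:
            rejected += 1
            log.warning("row %d rejected: expected %d columns, got %d",
                        number, expected_columns, len(columns))
            continue
        good.append(columns)
    return good, rejected


def run_job(path: str, expected_columns: int) -> int:
    """Return 0 on success and 1 on failure, like a script exit code."""
    try:
        validate_path(path)
        with open(path, encoding="utf-8") as handle:
            good, rejected = validate_rows(handle, expected_columns)
        log.info("loaded %d rows, rejected %d", len(good), rejected)
        return 0
    except (ValidationError, OSError):
        # log.exception records the message together with the stack trace
        log.exception("job failed for input %s", path)
        return 1
```

Returning an exit code instead of letting the exception escape lets a scheduler or wrapper script decide whether to retry, alert, or stop the workflow.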
Performance Optimization
• Tool performance matters a lot in Hadoop because of the data volumes involved
• Points to take care of when implementing a script, related to:
  • Tool
  • Resources
  • Bandwidth consumption
  • Dependencies
  • Scalability
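Bandwidth consumption is one place where a small change in a script pays off. A common technique is to aggregate records locally before they are sent across the network, which is the role a combiner plays in MapReduce. The sketch below is plain Python and only models the idea; the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_without_combiner(keys: Iterable[str]) -> List[Pair]:
    """Emit one (key, 1) record per input key: every record crosses the network."""
    return [(key, 1) for key in keys]


def map_with_combiner(keys: Iterable[str]) -> List[Pair]:
    """Pre-aggregate locally, so only one record per distinct key is sent."""
    return sorted(Counter(keys).items())


def reduce_counts(partitions: Iterable[List[Pair]]) -> Dict[str, int]:
    """Sum the counts for each key across all mapper outputs."""
    totals: Counter = Counter()
    for records in partitions:
        for key, count in records:
            totals[key] += count
    return dict(totals)
```

For the input `["200", "404", "200", "200", "404"]`, the first mapper emits five records and the second emits two, while `reduce_counts` returns the same totals for both. The saving grows with the number of repeated keys per mapper.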
Scalability
• Code must be scalable
• Scripts should continue to perform well as data volume increases
• Scripts must follow naming and coding standards
• Scripts should also work well under heavier data traffic
• Memory management
• Session variable management
• Cache and state management
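For memory management, a simple habit that keeps a script scalable is to process input in fixed-size batches instead of loading everything at once. The following sketch assumes the records arrive as an iterable of strings, for example the lines of an open file; `batches` and `total_length` are hypothetical helper names.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size records without loading the whole input."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def total_length(records: Iterable[str], batch_size: int = 1000) -> int:
    """Example aggregate that only ever holds one batch in memory."""
    total = 0
    for batch in batches(records, batch_size):
        total += sum(len(record) for record in batch)
    return total
```

Because memory use depends on `batch_size` rather than on the size of the input, the same script behaves the same way whether it reads a thousand records or a billion.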
Thanks for watching.
For more details on:
• Hadoop, Big Data
• Data Science, Python, R Language, Statistics
• Machine Learning, Deep Learning, Data Visualization
• Amazon Web Services (AWS | Cloud Computing)
• Automation Software Testing: Selenium
• ETL Testing
Contact us at +91-860-099-8107 | contact@technogeekscs.co.in
Or visit us at: www.technogeekscs.com