https://www.learntek.org/big-data-and-hadoop-training/

Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing and other IT and management courses.
Big Data Hadoop Training

What is Hadoop?

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes of storage capacity. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.

Why Hadoop?

Large Volumes of Data: The ability to store and quickly process huge amounts of data of many kinds (structured, unstructured and semi-structured). With data volumes and varieties constantly increasing, especially from social media and the Internet of Things (IoT), that’s a key consideration.
Fault Tolerance: Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes to make sure the distributed computing does not fail. Multiple copies of all data are stored automatically.

Flexibility: Unlike with a traditional relational database, you don’t have to preprocess data before storing it. You can store as much data as you want and decide how to use it later. That includes unstructured data like text, images and videos.

Low Cost: The open-source framework is free and uses commodity hardware to store large quantities of data.

Scalability: You can easily grow your system to handle more data simply by adding nodes. Little administration is required.

The following topics will be covered in our Big Data and Hadoop Online Training.

Copyright @ 2018 Learntek. All Rights Reserved.
Big Data Hadoop Training Topics

Hadoop Introduction:
Introduction to Data and System
Types of Data
Traditional way of dealing with large data and its problems
Types of Systems & Scaling
What is Big Data
Challenges in Big Data
Challenges in Traditional Application
New Requirements
What is Hadoop? Why Hadoop?
Brief history of Hadoop
Features of Hadoop
Hadoop and RDBMS
Overview of the Hadoop ecosystem
Hadoop Installation:
Installation in detail
Creating Ubuntu image in VMware
Downloading Hadoop
Installing SSH
Configuring Hadoop, HDFS & MapReduce
Download, Installation & Configuration of Hive
Download, Installation & Configuration of Pig
Download, Installation & Configuration of Sqoop
Configuring Hadoop in Different Modes
Hadoop Distributed File System (HDFS):
File System – Concepts
Blocks
Replication Factor
Version File
Safe mode
Namespace IDs
Purpose of Name Node
Purpose of Data Node
Purpose of Secondary Name Node
Purpose of Job Tracker
Purpose of Task Tracker
HDFS Shell Commands – copy, delete, create directories etc.
Reading and Writing in HDFS
Differences between Unix commands and HDFS commands
Read / Write in HDFS – Internal Process between Client, Name Node & Data Nodes
Accessing HDFS using Java API
Various Ways of Accessing HDFS
Understanding HDFS Java classes and methods

Admin:
Commissioning / Decommissioning Data Node
Balancer
Replication Policy
Network Distance / Topology Script
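The Blocks and Replication Factor topics, together with the earlier point that multiple copies of all data are stored automatically, can be illustrated with a small simulation in plain Python. This is only a sketch under stated assumptions: the 128 MB block size and replication factor of 3 are common defaults rather than values given in this course outline, the node names and block IDs are hypothetical, and the round-robin placement is a simplification (real HDFS placement is rack-aware).

```python
import math

BLOCK_SIZE_MB = 128  # assumption: a common dfs.blocksize default
REPLICATION = 3      # assumption: a common dfs.replication default


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return hypothetical block IDs for a file of the given size."""
    count = math.ceil(file_size_mb / block_size_mb)
    return [f"blk_{i}" for i in range(count)]


def place_replicas(blocks, nodes, replication=REPLICATION):
    """Simplified round-robin placement: each block lands on
    `replication` distinct nodes (real HDFS is rack-aware)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {
            nodes[(i + r) % len(nodes)] for r in range(replication)
        }
    return placement


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


nodes = ["dn1", "dn2", "dn3", "dn4"]        # hypothetical data nodes
blocks = split_into_blocks(1000)            # a 1000 MB file
placement = place_replicas(blocks, nodes)
survivors = readable_blocks(placement, failed_nodes=["dn1", "dn2"])
print(len(blocks), len(survivors))          # prints "8 8"
```

With three replicas per block, losing any two of the four nodes leaves every block readable, which is the behaviour the fault-tolerance claim relies on. Losing three nodes in this toy layout would make some blocks unreadable.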
MapReduce Programming:
About MapReduce
Understanding block and input splits
MapReduce Data types
Understanding Writable
Data Flow in MapReduce Application
Understanding MapReduce problem on datasets
MapReduce and Functional Programming
Writing MapReduce Application
Understanding Mapper function
Understanding Reducer Function
Understanding Driver
Usage of Combiner
Understanding Partitioner
Usage of Distributed Cache
Passing the parameters to mapper and reducer
Analyzing the Results
Log files
Input Formats and Output Formats
Counters, Skipping Bad and unwanted Records
Writing Joins in MapReduce with 2 Input files. Join Types.
Execute MapReduce Job – Insights.
Exercises on MapReduce.
Job Scheduling: Type of Schedulers.
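The mapper, partitioner and reducer topics listed above can be previewed with a word count written in plain Python. This is a minimal sketch of the data flow only, not Hadoop's Java API: the function names, the byte-sum partitioner and the in-memory shuffle are all assumptions made for illustration.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def partition(key, num_reducers):
    """Pick a reducer for a key. A byte sum is used so the result is
    stable across runs (Python's built-in str hash is randomized)."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce step: add up all counts seen for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    """Driver: map every line, shuffle pairs into partitions, then reduce."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in mapper(line):
            partitions[partition(key, num_reducers)][key].append(value)

    results = {}
    for part in partitions:
        for key in sorted(part):  # each reducer sees its keys in sorted order
            word, total = reducer(key, part[key])
            results[word] = total
    return results


counts = run_job(["big data big cluster", "data node"])
print(counts["big"], counts["data"], counts["cluster"], counts["node"])
# prints "2 2 1 1"
```

Because summing is associative and commutative, the same `reducer` function could also serve as a combiner that pre-aggregates counts on the map side before the shuffle.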
Hive:
Hive concepts
Schema on Read vs Schema on Write
Hive architecture
Install and configure Hive on cluster
Meta Store – Purpose & Type of Configurations
Different types of tables in Hive
Buckets
Partitions
Joins in Hive
Hive Query Language
Hive Data Types
Data Loading into Hive Tables
Hive Query Execution
Hive library functions
Hive UDF
Hive Limitations

Pig:
Pig basics
Install and configure Pig on a cluster
Pig Library functions
Pig vs Hive
Write sample Pig Latin scripts
Modes of running Pig
Running in Grunt shell
Running as Java program
Pig UDFs
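The Hive topic "Schema on Read vs Schema on Write" ties back to the flexibility point made earlier: data is stored as-is and a structure is applied only when it is queried. The sketch below imitates that idea in plain Python without using Hive. The sample data, column names and the `read_with_schema` helper are hypothetical; turning unparseable fields into `None` mirrors how Hive returns NULL for values that do not fit the declared column type.

```python
import csv
import io

# Raw data is stored untouched; the second row has a malformed age.
RAW = "1,alice,34\n2,bob,not_a_number\n"


def read_with_schema(raw_text, schema):
    """Apply a list of (column_name, type) pairs at read time.
    Fields that cannot be converted become None instead of
    causing the load to fail."""
    rows = []
    for record in csv.reader(io.StringIO(raw_text)):
        row = {}
        for (name, cast), field in zip(schema, record):
            try:
                row[name] = cast(field)
            except ValueError:
                row[name] = None
        rows.append(row)
    return rows


schema = [("id", int), ("name", str), ("age", int)]
rows = read_with_schema(RAW, schema)
print(rows[1]["age"])  # prints "None"
```

A schema-on-write system such as a traditional relational database would typically reject the second row when it is loaded. Here the raw text is kept, and a different schema could be applied to the same data later.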
Sqoop:
Install and configure Sqoop on cluster
Connecting to RDBMS
Installing MySQL
Import data from MySQL to Hive
Export data to MySQL
Internal mechanism of import/export

HBase:
HBase concepts
HBase architecture
Region server architecture
File storage architecture
HBase basics
Column access
Scans
HBase use cases
Install and configure HBase on a multi node cluster
Create database, develop and run sample applications
Access data stored in HBase using Java API
Oozie:
Introduction to Oozie
Oozie architecture
XML file specifications
Specifying Work flow
Control nodes
Oozie job coordinator

Flume:
Introduction to Flume
Configuration and Setup
Flume Sink with example
Channel
Flume Source with example
Complex Flume architecture
ZooKeeper:
Introduction to ZooKeeper
Challenges in distributed Applications
Coordination
ZooKeeper: Design Goals
Data Model and Hierarchical namespace
Client APIs

YARN:
Hadoop 1.0 Limitations
MapReduce Limitations
History of Hadoop 2.0
HDFS 2: Architecture
HDFS 2: Quorum based storage
HDFS 2: High availability
HDFS 2: Federation
YARN Architecture
Classic vs YARN
YARN Apps
YARN multitenancy
YARN Capacity Scheduler

Prerequisites: Knowledge of any programming language, database knowledge and familiarity with the Linux operating system. Core Java or Python knowledge is helpful.