( Talend Training: https://www.edureka.co/talend-for-big-data )
This Edureka video on Talend Big Data Tutorial will help you understand the basic concepts of Talend and get familiar with Talend Open Studio for Big Data, an open source tool provided by Talend for working with Big Data technologies such as HDFS, Hive and Pig.

This video covers the following topics:
1. Big Data
2. Talend With Big Data
3. TOS For Big Data
4. TOS Installation
5. Big Data Components In Talend
6. First Job In Talend
Copyright © 2017, edureka and/or its affiliates. All rights reserved.
Agenda
1. Big Data
2. Talend With Big Data
3. TOS For Big Data
4. TOS Installation
5. Big Data Components In Talend
6. First Job In Talend
Big Data
Big Data
Big Data refers to the voluminous and complex datasets generated on a daily basis.
These datasets can be structured, semi-structured or unstructured in nature.
This data is generally useless and holds no meaning unless it is mined or analyzed properly.
Storing and processing this data is difficult, especially for traditional data processing software.
5 V’s Of Big Data
Big Data can be characterized by 5 V’s, commonly listed as Volume, Velocity, Variety, Veracity and Value.
Big Data Technologies – Apache Hadoop
Hadoop is a framework that allows you to store and process large data sets in a parallel and distributed fashion.
MapReduce: the data processing framework of Hadoop. It mainly has two functions (mappers and reducers), which run as tasks on various nodes in a cluster.
HDFS: the file management system of the Hadoop platform, used to store data across multiple servers in a cluster.
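The mapper and reducer roles mentioned above can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop code: the function names (mapper, shuffle, reducer, word_count) are hypothetical, and the shuffle step stands in for the grouping that the framework would perform across nodes in a cluster.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce step: collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data tools"]))
```

On a real cluster each mapper would see only its own split of the input and each reducer only its own set of keys; the sketch keeps everything in one process to show the data flow.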
HDFS Architecture
YARN
Resource Manager
✓ Receives the processing requests
✓ Passes the parts of requests to the corresponding Node Managers
Node Manager
✓ Installed on every DataNode
✓ Responsible for executing tasks on every single DataNode
Big Data Technologies – Apache Hive
Apache Hive is a data warehouse system built on top of Hadoop.
It is used for analysing structured and semi-structured data.
Hive lets you perform queries similar to SQL through Hive Query Language (HQL).
Apache Hive supports Data Definition Language (DDL), Data Manipulation Language (DML) and User Defined Functions (UDF).
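To make the three terms above concrete, here is a small sketch of a DDL statement, a DML statement and a user-defined function. It deliberately uses Python's built-in sqlite3 module as a stand-in, because a real Hive example needs a running cluster; SQLite is not Hive, and HQL syntax differs in places. The table, column and function names (users, email, domain_of) are hypothetical.

```python
import sqlite3


def domain_of(email):
    """User-defined function: return the lower-cased domain of an address."""
    return email.split("@")[-1].lower()


def run_demo():
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not a Hive connection
    # Register the UDF so it can be called from SQL.
    conn.create_function("domain_of", 1, domain_of)
    # DDL: define the table structure.
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    # DML: load rows into the table.
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        [
            ("Asha", "asha@example.com"),
            ("Ben", "ben@Example.com"),
            ("Chen", "chen@example.org"),
        ],
    )
    # Query: use the UDF inside an SQL-like aggregation.
    rows = conn.execute(
        "SELECT domain_of(email), COUNT(*) FROM users "
        "GROUP BY domain_of(email) ORDER BY domain_of(email)"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for domain, total in run_demo():
        print(domain, total)
```

In Hive the same ideas apply, but UDFs are typically written in Java and registered with the warehouse rather than passed in as a Python function.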
Big Data Technologies – Apache Pig
Apache Pig is an abstraction layer over MapReduce and was introduced by Yahoo.
It is a platform for analysing large datasets stored in a Hadoop distributed processing cluster.
It helps in performing mass-scale parallel computation on big data with fewer lines of code.
Its two main components are the Pig Latin language and the Pig compiler.
Talend With Big Data
Talend With Big Data
Automated, Easy, Fast, Affordable
Talend Open Studio For Big Data
TOS For Big Data
It is open source software that provides an easy-to-use graphical development environment.
Talend Open Studio (TOS) for Big Data is built on top of Talend’s data integration solutions.
At the back end, TOS for Big Data automatically generates the underlying code in Java.
It is a powerful tool that leverages the Apache Hadoop Big Data platform and helps users access, transform, move and synchronize big data.
Advantages Of TOS For Big Data
Better Collaboration
Efficient Management
Faster Designing
Early Cleansing
Easy Scalability
TOS Installation
1. Go to https://www.talend.com/download
2. Click on Download Free Tool
3. If the download doesn't start, click on 'Restart download'
4. Once you have downloaded the zip file, extract it
5. Go into the extracted folder and double-click the TOS_BD-linux-gtk-x86_64 file to start the installation
6. Let the installation finish
7. Select “Create a new project” and specify a name for it
8. Click on Finish to build the project
9. Once the project is built, the following window will open. Close the welcome tab to open up the workspace
10. Now you should be able to see the TOS main page
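Steps 4 and 5 above (extract the zip, then find the launcher) can also be scripted. The following is a rough sketch using only the Python standard library. The launcher name comes from step 5; the function name, the archive path and the destination directory are hypothetical, and the layout inside the real archive is an assumption.

```python
import zipfile
from pathlib import Path

LAUNCHER_NAME = "TOS_BD-linux-gtk-x86_64"  # launcher file named in step 5


def extract_and_find_launcher(zip_path, dest_dir, launcher_name=LAUNCHER_NAME):
    """Extract the downloaded archive and return the path of the launcher."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Search the extracted tree, since the archive may contain a top-level folder.
    matches = sorted(p for p in dest.rglob(launcher_name) if p.is_file())
    if not matches:
        raise FileNotFoundError(f"{launcher_name} not found under {dest}")
    launcher = matches[0]
    # zipfile does not restore Unix permission bits, so mark the file executable.
    launcher.chmod(launcher.stat().st_mode | 0o111)
    return launcher


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the archive was downloaded.
    print(extract_and_find_launcher("TOS_BD.zip", "tos_bd"))
```

The returned path is the file that step 5 asks you to double-click; starting it still opens the graphical installer described in steps 6 to 10.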
Big Data Components In Talend
Big Data Family
Talend Open Studio leverages the power of Big Data technologies.
Talend provides a wide range of built-in Big Data components.
Using these components, you can connect to the modules of the Hadoop distribution.
They create connections to various third-party tools used for transferring, storing or analysing Big Data.
Big Data Family - HDFS
tHDFSConnection: connects to a given HDFS so that the other Hadoop components can reuse that connection to communicate with the HDFS.
tHDFSPut: copies files from a user-defined directory, pastes them into the HDFS, and can also rename them.
tHDFSInput: extracts the data in an HDFS file so that other components can process it.
tHDFSOutput: transfers data flows into a given HDFS file system.
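For readers who know the Hadoop command line, the copy-and-rename and read operations described above correspond roughly to hdfs dfs -put and hdfs dfs -cat. The sketch below only builds those command lines; it does not use Talend and does not contact a cluster. The helper names and all file paths are hypothetical.

```python
import shlex

HDFS_CLI = ["hdfs", "dfs"]  # standard Hadoop file system shell


def put_command(local_path, hdfs_dir, new_name=None):
    """Build the command that copies a local file into HDFS, optionally renaming it."""
    target = f"{hdfs_dir.rstrip('/')}/{new_name}" if new_name else hdfs_dir
    return HDFS_CLI + ["-put", local_path, target]


def cat_command(hdfs_path):
    """Build the command that streams the contents of an HDFS file to stdout."""
    return HDFS_CLI + ["-cat", hdfs_path]


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed.
    print(shlex.join(put_command("/tmp/sales.csv", "/user/demo/input", "sales_2017.csv")))
    print(shlex.join(cat_command("/user/demo/input/sales_2017.csv")))
```

To actually run a command, the returned list could be passed to subprocess.run on a machine where the Hadoop client is installed and configured.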
Big Data Family - Hive
tHiveConnection: establishes a Hive connection so that it can be reused by other Hive components.
tHiveInput: executes the select queries that extract the corresponding data and sends the data to the component that follows.
tHiveLoad: writes data of different formats into a given Hive table, or exports data from a Hive table to a particular directory.
tHiveCreateTable: connects to the Hive database being used and creates a Hive table dedicated to data of the specified format.
Big Data Family - Pig
tPigLoad: loads the original input data to an output stream in a single transaction after the data is validated.
tPigMap: transforms the data from single or multiple sources and then routes it to single or multiple destinations.
tPigAggregate: adds one or more additional columns to the output of the grouped data to generate the data that can be used by Pig.
tPigJoin: executes inner joins and outer joins of two files based on join keys in order to create the data to be used by Pig.
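The difference between the inner and outer joins mentioned above is easiest to see on a tiny dataset. This is a plain Python sketch of joining two record lists on a key; it is not Pig Latin and not what Talend generates, and the function name, the how parameter and the sample records are hypothetical.

```python
from collections import defaultdict


def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; how is 'inner' or 'left' (left outer)."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Columns that exist only on the right side, used to pad unmatched left rows.
    right_cols = sorted({col for row in right for col in row} - {key})
    result = []
    for left_row in left:
        matches = index.get(left_row[key], [])
        if matches:
            for right_row in matches:
                result.append({**left_row, **right_row})
        elif how == "left":
            result.append({**left_row, **{col: None for col in right_cols}})
    return result


if __name__ == "__main__":
    users = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ben"}]
    orders = [{"id": 1, "total": 10}]
    print(join(users, orders, "id"))          # only rows with a match
    print(join(users, orders, "id", "left"))  # unmatched left rows kept, padded with None
```

An inner join drops rows without a partner, while the left outer join keeps every row from the first input and fills the missing columns with an empty value.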
First Job In Talend