
Sqoop 2 Introduction

Mengwei Ding, Software Engineer Intern at Cloudera

Presentation Transcript


  1. Sqoop 2 Introduction Mengwei Ding, Software Engineer Intern at Cloudera

  2. What is Sqoop • Apache Top-Level Project • SQl and hadOOP (the source of the name) • Transfers large volumes of data in bulk • From relational data warehouses: Teradata, MySQL, PostgreSQL, Oracle, Netezza • To the Hadoop ecosystem: HDFS, Hive, HBase, Avro • And vice versa • Sqoop 1 (1.4.3) and Sqoop 2 (1.99.2)

  3. Sqoop 1

  4. Sqoop 1 Challenges • Command-line tool, configured with command-line arguments (60+!) • Connector-driven: • Connectors are responsible for metadata lookups and data transfer • JDBC vocabulary is enforced (--connect) • Implicit connector selection • Non-uniform, duplicated functionality • Client accesses Hadoop configurations and databases directly • Security concerns: • Client needs to know the database credentials • Type mapping is not clearly defined

  5. Sqoop 2 - Design Goals • Same goal: transfer data around • Ease of Use • Sqoop as a Service • Domain-specific interactions without too many arguments • Ease of Extension • No low-level Hadoop knowledge needed • Uniform functionality of connectors, no functional overlap between connectors • Security and Separation of Concerns • Role-based access and use

  6. Sqoop 2 - Design Goals

  7. Sqoop 2 - Connection vs Job Metadata • There are two distinct sets of options • Connection (distinct per database) • Job (distinct per table)

  8. Sqoop 2 - Connection vs Job Metadata • Two further distinct sets of arguments • Connector-specific • Shared across all connectors
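The split described on slides 7 and 8 can be pictured with a small data model. The following is only a sketch under assumptions: the class names, field names, option keys, host name, and paths are all hypothetical illustrations, not the Sqoop 2 API. It shows one connection (per database) being reused by several jobs (per table), with each object carrying both connector-specific and shared options.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Options that are distinct per database (hypothetical model)."""
    name: str
    connector: str
    connector_options: dict = field(default_factory=dict)  # connector-specific
    shared_options: dict = field(default_factory=dict)     # shared across connectors


@dataclass
class Job:
    """Options that are distinct per table; reuses an existing connection."""
    name: str
    connection: Connection
    connector_options: dict = field(default_factory=dict)  # connector-specific
    shared_options: dict = field(default_factory=dict)     # shared across connectors


# One connection per database (all values are made-up examples).
warehouse = Connection(
    name="warehouse",
    connector="generic-jdbc",
    connector_options={"jdbc_url": "jdbc:mysql://db.example.com/sales"},
    shared_options={"max_connections": 4},
)

# One job per table, each pointing at the same connection object.
jobs = [
    Job("import-orders", warehouse, {"table": "orders"}, {"output_dir": "/data/orders"}),
    Job("import-customers", warehouse, {"table": "customers"}, {"output_dir": "/data/customers"}),
]
```

In this sketch the database details are entered once on the connection, and each job only adds what differs per table.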

  9. Sqoop 2 - Security • Support for secure access to external systems via role-based access to connection objects • Administrators create/edit/delete connections • Operators use connections • Connections encompass credentials • A connection is created once, then reused later • Created by an administrator and used by an operator, to safeguard credential access from the end user
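The role split on this slide can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory `ConnectionStore` and `Role` enum; none of these names come from Sqoop 2 itself. Administrators create and delete connections, operators can only use them by name, and the stored password is never returned to the caller.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    OPERATOR = "operator"


class ConnectionStore:
    """Hypothetical server-side store that keeps credentials away from users."""

    def __init__(self):
        self._credentials = {}  # connection name -> (user, password)

    def create(self, role, name, user, password):
        if role is not Role.ADMIN:
            raise PermissionError("only administrators may create connections")
        self._credentials[name] = (user, password)

    def delete(self, role, name):
        if role is not Role.ADMIN:
            raise PermissionError("only administrators may delete connections")
        del self._credentials[name]

    def run_job(self, role, name, table):
        """Use a connection by name; the caller never sees the credentials."""
        if role not in (Role.ADMIN, Role.OPERATOR):
            raise PermissionError("role may not use connections")
        if name not in self._credentials:
            raise KeyError(name)
        # A real server would open the database connection here.
        return f"transfer of {table} via connection '{name}' submitted"


store = ConnectionStore()
store.create(Role.ADMIN, "warehouse", "etl", "s3cret")
message = store.run_job(Role.OPERATOR, "warehouse", "orders")
```

The operator supplies only the connection name and the table, which is the point of creating the connection once and reusing it later.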

  10. Sqoop 2 - Resource Management • Connections allow specification of resource policy • Administrator can limit the total number of physical connections open at one time • Connections can be disabled
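One way to picture such a resource policy is a per-connection counter that caps concurrent use. The sketch below is an assumption-laden illustration, not how Sqoop 2 implements it: `ConnectionPolicy`, `max_open`, and `enabled` are hypothetical names, and a standard-library semaphore stands in for whatever accounting the server actually does.

```python
import threading
from contextlib import contextmanager


class ConnectionPolicy:
    """Hypothetical policy: cap open physical connections, allow disabling."""

    def __init__(self, max_open, enabled=True):
        self.enabled = enabled
        self._slots = threading.BoundedSemaphore(max_open)

    @contextmanager
    def open(self):
        if not self.enabled:
            raise RuntimeError("connection is disabled")
        # Fail fast instead of waiting when the limit is already reached.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("limit of open physical connections reached")
        try:
            yield
        finally:
            self._slots.release()


policy = ConnectionPolicy(max_open=1)
with policy.open():
    pass  # a transfer would run here while holding one slot
```

A second `policy.open()` attempted inside the `with` block would raise, because the single slot is already taken; setting `policy.enabled = False` rejects every attempt.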

  11. Sqoop 2 - Current Status • Primary focus of the Sqoop community • Second cut: 1.99.2 • Bits and docs: http://sqoop.apache.org

  12. Demo Time

  13. Thank You!
