Overview of Azure Data Lake Store
Fundamentals
• Reliable
  • Automatically replicates your data
  • Three copies within a single region
  • Highly available
• Unlimited Storage
  • Unlimited account sizes
  • Individual file sizes from gigabytes to petabytes
  • No limits to scale
• Optimized for Analytics
  • Built for running large analytics systems that require massive throughput
  • Optimized for parallel computation over petabytes of data
  • Automatically optimizes for any throughput
Secure your Data
• Access control
  • POSIX-compliant Access Control Lists (ACLs) on Files and Folders *
  • Integrated with Azure Active Directory
• Auditing
  • Audit logs for all operations
  • Audit logs that can be analyzed with ADL U-SQL Scripts
• Encryption
  • Transparent server-side encryption *
  • Azure-managed (Azure Key Vault) and customer-managed keys *
* Features arriving by GA
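To make "POSIX-compliant ACLs" concrete, here is a minimal sketch of how a POSIX-style ACL check is commonly evaluated: owner first, then named users, then groups, then everyone else, with a mask capping the named-user and group entries. The `Acl` class, `is_allowed` function, and the sample users and groups are hypothetical names chosen for illustration; the slides do not describe ADL Store's exact evaluation order, so treat this as the generic algorithm rather than the service's implementation.

```python
from dataclasses import dataclass, field

# Permission bits, as in the familiar rwx octal notation.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class Acl:
    """Hypothetical in-memory model of a POSIX-style ACL on one file or folder."""
    owner: str
    owning_group: str
    owner_perms: int
    group_perms: int
    other_perms: int
    named_users: dict = field(default_factory=dict)   # user name -> permission bits
    named_groups: dict = field(default_factory=dict)  # group name -> permission bits
    mask: int = 7  # upper bound applied to named-user and group entries


def is_allowed(acl, user, groups, wanted):
    """Return True if `user` (a member of `groups`) holds every bit in `wanted`."""
    # 1. The owning user is checked against the owner entry only (no mask).
    if user == acl.owner:
        return (acl.owner_perms & wanted) == wanted
    # 2. A named-user entry is capped by the mask.
    if user in acl.named_users:
        return (acl.named_users[user] & acl.mask & wanted) == wanted
    # 3. Any matching group entry that grants the request is sufficient.
    matched_group = False
    if acl.owning_group in groups:
        matched_group = True
        if (acl.group_perms & acl.mask & wanted) == wanted:
            return True
    for group in groups:
        if group in acl.named_groups:
            matched_group = True
            if (acl.named_groups[group] & acl.mask & wanted) == wanted:
                return True
    if matched_group:
        # A group matched but none granted the request: do not fall through to "other".
        return False
    # 4. Everyone else gets the "other" entry.
    return (acl.other_perms & wanted) == wanted


# Example: a folder owned by alice, with a named-user entry for bob.
folder_acl = Acl(
    owner="alice", owning_group="analysts",
    owner_perms=7, group_perms=5, other_perms=0,
    named_users={"bob": 6}, mask=5,
)
print(is_allowed(folder_acl, "bob", set(), READ))   # bob's rw- is capped to r-- by the mask
print(is_allowed(folder_acl, "bob", set(), WRITE))
```

In this sketch the mask is what lets an administrator narrow every named entry at once without editing each one.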
HDFS for the Cloud
Built from the ground up as a Hadoop file system
• HDI Cluster Types
  • Hadoop: Works Today
  • Storm: Works Today
  • HBase: Works Today
  • Spark: By GA
• Other Tools running in HDI
  • Microsoft R Services (Revolution R): Works Today
  • Sqoop: Works Today
  • Distcp: Works Today
• Hadoop Distros
  • Hortonworks *: By GA
  • Cloudera *: By GA
  • Apache Hadoop: Version 2.8 and above
* Features arriving by GA
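ADL Store compared with Azure Blob Storage
• Scenarios
  • ADL Store: Optimized for Analytics
  • Azure Blob Storage: General purpose bulk storage
• Billing
  • Pay for amount stored and for I/O operations
• WebHDFS
  • ADL Store: Implements WebHDFS
  • Azure Blob Storage: No WebHDFS
• Authentication
  • ADL Store: Azure Active Directory
  • Azure Blob Storage: Access Keys
• Authorization
  • ADL Store: POSIX-style ACLs
  • Azure Blob Storage: Access Keys
• Data Encryption
  • ADL Store: Transparent Server-side Encryption *
  • Azure Blob Storage: Client-Side Encryption
* Features arriving by GA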
Ingress and Egress
• ADL SDKs
  • .NET SDK
  • Node.js SDK
  • Java SDK *
  • Python SDK *
• Services
  • Azure Data Factory
  • ADL Copy Service
  • Azure Import/Export Service
  • Azure Stream Analytics *
• Tools
  • Apache Sqoop™
  • DistCp
  • Azure Portal
  • Azure PowerShell
  • Azure X-Platform CLI
• ADL REST endpoints
  • Curl
  • Any HTTP REST Client
* Features arriving by GA
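Because the REST endpoints follow the WebHDFS convention, any HTTP client can address a file or folder with a plain URL. The sketch below only builds such URLs and sends nothing. The helper name `webhdfs_url`, the account name `myaccount`, and the sample paths are hypothetical, and the `<account>.azuredatalakestore.net` host pattern is an assumption that should be checked against the account's actual endpoint. A real request would also need an Azure Active Directory bearer token in the `Authorization` header, which is out of scope here.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account, path, op, **params):
    """Build a WebHDFS-style URL for an ADL Store path (hypothetical helper).

    account: store account name (assumed to map to <account>.azuredatalakestore.net)
    path:    file or folder path inside the store
    op:      a WebHDFS operation such as LISTSTATUS, GETFILESTATUS, OPEN, CREATE
    params:  extra query parameters for the operation, e.g. overwrite="true"
    """
    host = f"{account}.azuredatalakestore.net"  # assumed host pattern
    query = urlencode({"op": op, **params})
    # quote() keeps "/" as a path separator and percent-encodes spaces and the like.
    return f"https://{host}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# List a folder, then address a file for upload.
print(webhdfs_url("myaccount", "/clickstream/2016/01", "LISTSTATUS"))
print(webhdfs_url("myaccount", "/raw/new file.csv", "CREATE", overwrite="true"))
```

The resulting URL can be handed to Curl or any other HTTP REST client, with the HTTP method chosen by the operation (for example GET for `LISTSTATUS`, PUT for `CREATE`).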
Integration with Azure Data Factory
• Sources
  • Azure Blob
  • Azure Table
  • Azure SQL Database
  • Azure SQL Data Warehouse
  • Azure DocumentDB
  • Azure Data Lake Store
  • SQL Server
  • File system
  • Oracle database
  • MySQL database
  • DB2 database
  • Teradata database
  • Sybase database
  • PostgreSQL database
• Sinks
  • Azure Blob
  • Azure Table
  • Azure SQL Database
  • Azure SQL Data Warehouse
  • Azure DocumentDB
  • Azure Data Lake Store
  • SQL Server