240 likes | 484 Views
Welcome. Vorstellung Parallel Data Warehouse November 2012 Meinrad Weiss. Data Warehouse – Products Positioning. PDW with Distributed Data Architecture. Minimal HW tune-up/optimization; supports mixed workloads. 1. Scale Complexity HA by default SW-HW integration. 4.
E N D
Welcome Vorstellung Parallel Data Warehouse November 2012Meinrad Weiss Vorstellung Parallel Data Warehouse
Data Warehouse – Products Positioning PDW with Distributed Data Architecture Minimal HW tune-up/optimization; supports mixed workloads 1 Scale Complexity HA by default SW-HW integration 4 Balanced solution for mostly scan-centric workloads. 2 PDW 3 SQL Server 2008 R2 Data Center Max HW tune-up for most DW scenarios. 1 3 SQL Server 2008 R2 Fast Track Most flexible architecture for handling all DW scenarios. 4 2 SQL Server 2008 R2 Enterprise Vorstellung Parallel Data Warehouse
Tier 1 Offerings Microsoft Data Warehousing Solutions Vorstellung Parallel Data Warehouse
Data Warehouse – Products Positioning PDW with Distributed Data Architecture 100% SQL Server 2008 R2 Compatibility Scale Complexity HA by default SW-HW integration SQL Server 2008 R2 Data Center PDW SQL Server 2008 R2 with Fast Track Reference Architecture SQL Server 2008 R2 Enterprise Vorstellung Parallel Data Warehouse
MPP vs. SMP • MPP - Massively Parallel Processing • Uses many separate CPUs running in parallel to execute a single program • Each CPU has its ownmemory and disks • High-speed communications between nodes • Applications must be segmented • SMP - Symmetric Multiprocessing • Multiple CPUs used to complete individual processes simultaneously • All CPUs share the same memory, disks, and network controllers • All SQL Server implementations up until now have been SMP SMP MPP Vorstellung Parallel Data Warehouse
Two hardware vendors: HP and Dell Microsoft+DellParallel Data Warehouse Appliance Microsoft+HP Enterprise Data Warehouse Appliance Vorstellung Parallel Data Warehouse
Control Rack Data Rack(s) SQL Control Node SQL SQL SQL SQL Management Node SQL SQL SQL Landing Zone SQL SQL Backup Node SQL Vorstellung Parallel Data Warehouse
Control Rack Data Rack(s) • Client connections always go through the control node • Windows Failover Cluster for Availability • Contains no persistent user data • Processes SQL requests • Prepares execution plan • Orchestrates distributed execution • Local SQL Server processes final query plan and aggregates results Control Node SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Management Node Landing Zone Backup Node Vorstellung Parallel Data Warehouse
Control Rack Data Rack(s) Control Node SQL • Provides Support and Patching for the Appliance • Holds image for re-deployment of compute node • Holds Active Directory Management Node SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Landing Zone Backup Node Vorstellung Parallel Data Warehouse
Control Rack Data Rack(s) Control Node SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL DWLoader or SQL Server Integration Services Management Node • Provides high-capacity storage for data files from ETL processes • Is available as a sandbox for other applications and scripts that run on the internal network • Provides SQL Server Integration Services Landing Zone Compute Nodes Source Landing Zone Files Data Loader Backup Node Vorstellung Parallel Data Warehouse
Control Rack Data Rack(s) Control Node SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Management Node Landing Zone • Provides Integrated Backup Solution • Integrates with 3rd party backup products • Orderable in different sizes Backup Node Vorstellung Parallel Data Warehouse
Control Rack Data Rack(s) • Data Rack Servers 5/10 active + 1 passive per Rack • InfiniBand, FC and Ethernet switching • Expansion Grow from 1/2–4 data racks, storage options, test/dev system • Consists of COMPUTE NODES and STORAGE NODES • Shared Nothing • Spare Node provides failover in case of node failure SQL Control Node SQL SQL SQL SQL Management Node SQL SQL SQL Landing Zone SQL SQL Backup Node SQL Vorstellung Parallel Data Warehouse
Connectivity and Tools • Nexus Query Chameleon • DWSQL Vorstellung Parallel Data Warehouse
Creating a Database CREATE DATABASE PDWWITH (AUTOGROW = ON, REPLICATED_SIZE = 1024 GB, -- (per Node) DISTRIBUTED_SIZE = 16384 GB, -- (whole System) LOG_SIZE = 1024 GB); Vorstellung Parallel Data Warehouse
Time Dim Product Dim Date Dim ID Calendar Year Calendar Qtr Calendar Mo Calendar Day Prod Dim ID Prod Category Prod Sub Cat Prod Desc Store Dim Store Dim ID Store Name Store Mgr Store Size Distribution and Replication ofData: Replicate Smaller (<5GB ) Dimension Tables are Replicated on Every Compute Node SQL SQL SQL SQL TD PD SF-1 SF-1 SF-1 SF-1 MD SD SF-1 SF-1 TD PD SF-1 SF-2 MD SD Sales Facts Date Dim ID Store Dim ID Prod Dim ID Mktg Camp Id Qty Sold Dollars Sold SF-1 SF-1 TD PD SF-1 SF-3 MD SD SF-1 TD PD SF-1 SF-1 SF-4 MD SD Mktg Campaign Dim Result: Fact -Dimension Joins can be performed locally Mktg Camp ID Camp Name Camp Mgr Camp Start Camp End Vorstellung Parallel Data Warehouse
Create Replicated Table CREATETABLEDimProduct( ProductIdBIGINTNOTNULL, DescriptionVARCHAR(50), CategoryIdINTNOTNULL, ListPrice DECIMAL(12,2)) WITH (DISTRIBUTION =REPLICATE); • Creates tables on each of the individual compute nodes and assigns them to the REPLICATED file group. • Data Compression is automatically turned on Vorstellung Parallel Data Warehouse
Time Dim Product Dim Date Dim ID Calendar Year Calendar Qtr Calendar Mo Calendar Day Prod Dim ID Prod Category Prod Sub Cat Prod Desc Store Dim Store Dim ID Store Name Store Mgr Store Size Distribution and Replication ofData: Distribute Larger (> 10 GB) Fact Table is Hash Distributed Across All Compute Nodes SQL SQL SQL SQL SF-1 SF-1 SF-1 SF-1 SF-1 SF-1 SF-1 SF-2 Sales Facts Date Dim ID Store Dim ID Prod Dim ID Mktg Camp Id Qty Sold Dollars Sold SF-1 SF-1 SF-1 SF-3 SF-1 SF-1 SF-1 SF-4 Mktg Campaign Dim Mktg Camp ID Camp Name Camp Mgr Camp Start Camp End Vorstellung Parallel Data Warehouse
Distribution on a PDW CREATE TABLE myTable (column Defs)WITH(DISTRIBUTION = HASH (id)); PDW Node 1 Create Table <myTable GUID>_a Create Table <myTable GUID>_b … Create Table <myTable GUID>_h 8 Tables per Node PDW Node 2 Create Table <myTable GUID>_a Create Table <myTable GUID>_b … Create Table <myTable GUID>_h PDW Node … PDW Node 10 Create Table <myTable GUID>_a Create Table <myTable GUID>_b … Create Table <myTable GUID>_h Final Result: 80 individual tables across a 10 node (1 data rack) appliance Vorstellung Parallel Data Warehouse
Reference Case: Today’s process flow / Building blocks Baseline : Once data extracted from SAP: Time taken to create end-end Reports and Cubes insights 13+ hours (In production typical 20+ hours with multiple companies) SAP Suspicious words Reports 3hr21min MasterFinance table population 6 hours 21min DB_MasterTables DB_ ReportTables DB_GSAPOP DW_FinanceTransactions FinanceCube 1 hour 6 hours
Reference Case: Audit Process with PDW Once data is extracted from SAP: Creating 5 CM Reports & FSCP Finance Cube; Time taken: 30 Minutes SAP MasterFinance table population 8m50sec All 5 Reports within 6min DB_MasterTables DB_ ReportTables DB_GSAPOP (80) (80) (80) (80) FinanceCube DW_FinanceTransactions load from FlatFile 23min 11 min 10m10sec
Appliance Update AU3 • Performance – up to 10x improvement • Data Movement Services • New cost based Query Optimizer • New Data Movement Service • 1/2 rack appliances from HP and Dell • System Center 2012 Integration (SCOM pack) • And YES … Support for Stored Procedures (subset) • Collations: Full support for international data • Native SQL Server drivers Vorstellung Parallel Data Warehouse
Hub and Spoke Departmental Reporting High-Performance Reporting RegionalReporting Central EDW Hub Landing Zone RegionalReporting with Business Decision Appliance Third-PartyData Integration Mobile Applications Third-PartyRDBMS ETL Tools Vorstellung Parallel Data Warehouse
Seamless Scalability 4 Rack PDW( 30 Nodes) System Throughput 3 Rack PDW( 30 Nodes) 2 Rack PDW( 20 Nodes) Regular SQL Server( 1 Node) Full Rack PDW( 10 Nodes) Half Rack PDW( 5 Nodes) 4 8 12 16 20 24 28 32 36 40 Vorstellung Parallel Data Warehouse SQL Server Compute Nodes
Wettbewerb Let‘sgo. Vorstellung Parallel Data Warehouse