1 / 24

Welcome

Welcome. Vorstellung Parallel Data Warehouse November 2012 Meinrad Weiss. Data Warehouse – Products Positioning. PDW with Distributed Data Architecture. Minimal HW tune-up/optimization; supports mixed workloads. 1. Scale Complexity HA by default SW-HW integration. 4.

yvonne
Download Presentation

Welcome

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Welcome Vorstellung Parallel Data Warehouse November 2012Meinrad Weiss Vorstellung Parallel Data Warehouse

  2. Data Warehouse – Products Positioning PDW with Distributed Data Architecture Minimal HW tune-up/optimization; supports mixed workloads 1 Scale Complexity HA by default SW-HW integration 4 Balanced solution for mostly scan-centric workloads. 2 PDW 3 SQL Server 2008 R2 Data Center Max HW tune-up for most DW scenarios. 1 3 SQL Server 2008 R2 Fast Track Most flexible architecture for handling all DW scenarios. 4 2 SQL Server 2008 R2 Enterprise Vorstellung Parallel Data Warehouse

  3. Tier 1 Offerings Microsoft Data Warehousing Solutions Vorstellung Parallel Data Warehouse

  4. Data Warehouse – Products Positioning PDW with Distributed Data Architecture 100% SQL Server 2008 R2 Compatibility Scale Complexity HA by default SW-HW integration SQL Server 2008 R2 Data Center PDW SQL Server 2008 R2 with Fast Track Reference Architecture SQL Server 2008 R2 Enterprise Vorstellung Parallel Data Warehouse

  5. MPP vs. SMP • MPP - Massively Parallel Processing • Uses many separate CPUs running in parallel to execute a single program • Each CPU has its ownmemory and disks • High-speed communications between nodes • Applications must be segmented • SMP - Symmetric Multiprocessing • Multiple CPUs used to complete individual processes simultaneously • All CPUs share the same memory, disks, and network controllers • All SQL Server implementations up until now have been SMP SMP MPP Vorstellung Parallel Data Warehouse

  6. Two hardware vendors: HP and Dell Microsoft+DellParallel Data Warehouse Appliance Microsoft+HP Enterprise Data Warehouse Appliance Vorstellung Parallel Data Warehouse

  7. Control Rack Data Rack(s) SQL Control Node SQL SQL SQL SQL Management Node SQL SQL SQL Landing Zone SQL SQL Backup Node SQL Vorstellung Parallel Data Warehouse

  8. Control Rack Data Rack(s) • Client connections always go through the control node • Windows Failover Cluster for Availability • Contains no persistent user data • Processes SQL requests • Prepares execution plan • Orchestrates distributed execution • Local SQL Server processes final query plan and aggregates results Control Node SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Management Node Landing Zone Backup Node Vorstellung Parallel Data Warehouse

  9. Control Rack Data Rack(s) Control Node SQL • Provides Support and Patching for the Appliance • Holds image for re-deployment of compute node • Holds Active Directory Management Node SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Landing Zone Backup Node Vorstellung Parallel Data Warehouse

  10. Control Rack Data Rack(s) Control Node SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL DWLoader or SQL Server Integration Services Management Node • Provides high-capacity storage for data files from ETL processes • Is available as a sandbox for other applications and scripts that run on the internal network • Provides SQL Server Integration Services Landing Zone Compute Nodes Source Landing Zone Files Data Loader Backup Node Vorstellung Parallel Data Warehouse

  11. Control Rack Data Rack(s) Control Node SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL SQL Management Node Landing Zone • Provides Integrated Backup Solution • Integrates with 3rd party backup products • Orderable in different sizes Backup Node Vorstellung Parallel Data Warehouse

  12. Control Rack Data Rack(s) • Data Rack Servers 5/10 active + 1 passive per Rack • InfiniBand, FC and Ethernet switching • Expansion Grow from 1/2–4 data racks, storage options, test/dev system • Consists of COMPUTE NODES and STORAGE NODES • Shared Nothing • Spare Node provides failover in case of node failure SQL Control Node SQL SQL SQL SQL Management Node SQL SQL SQL Landing Zone SQL SQL Backup Node SQL Vorstellung Parallel Data Warehouse

  13. Connectivity and Tools • Nexus Query Chameleon • DWSQL Vorstellung Parallel Data Warehouse

  14. Creating a Database CREATE DATABASE PDWWITH (AUTOGROW = ON, REPLICATED_SIZE = 1024 GB, -- (per Node) DISTRIBUTED_SIZE = 16384 GB, -- (whole System) LOG_SIZE = 1024 GB); Vorstellung Parallel Data Warehouse

  15. Time Dim Product Dim Date Dim ID Calendar Year Calendar Qtr Calendar Mo Calendar Day Prod Dim ID Prod Category Prod Sub Cat Prod Desc Store Dim Store Dim ID Store Name Store Mgr Store Size Distribution and Replication ofData: Replicate Smaller (<5GB ) Dimension Tables are Replicated on Every Compute Node SQL SQL SQL SQL TD PD SF-1 SF-1 SF-1 SF-1 MD SD SF-1 SF-1 TD PD SF-1 SF-2 MD SD Sales Facts Date Dim ID Store Dim ID Prod Dim ID Mktg Camp Id Qty Sold Dollars Sold SF-1 SF-1 TD PD SF-1 SF-3 MD SD SF-1 TD PD SF-1 SF-1 SF-4 MD SD Mktg Campaign Dim Result: Fact -Dimension Joins can be performed locally Mktg Camp ID Camp Name Camp Mgr Camp Start Camp End Vorstellung Parallel Data Warehouse

  16. Create Replicated Table CREATETABLEDimProduct( ProductIdBIGINTNOTNULL, DescriptionVARCHAR(50), CategoryIdINTNOTNULL, ListPrice DECIMAL(12,2)) WITH (DISTRIBUTION =REPLICATE); • Creates tables on each of the individual compute nodes and assigns them to the REPLICATED file group. • Data Compression is automatically turned on Vorstellung Parallel Data Warehouse

  17. Time Dim Product Dim Date Dim ID Calendar Year Calendar Qtr Calendar Mo Calendar Day Prod Dim ID Prod Category Prod Sub Cat Prod Desc Store Dim Store Dim ID Store Name Store Mgr Store Size Distribution and Replication ofData: Distribute Larger (> 10 GB) Fact Table is Hash Distributed Across All Compute Nodes SQL SQL SQL SQL SF-1 SF-1 SF-1 SF-1 SF-1 SF-1 SF-1 SF-2 Sales Facts Date Dim ID Store Dim ID Prod Dim ID Mktg Camp Id Qty Sold Dollars Sold SF-1 SF-1 SF-1 SF-3 SF-1 SF-1 SF-1 SF-4 Mktg Campaign Dim Mktg Camp ID Camp Name Camp Mgr Camp Start Camp End Vorstellung Parallel Data Warehouse

  18. Distribution on a PDW CREATE TABLE myTable (column Defs)WITH(DISTRIBUTION = HASH (id)); PDW Node 1 Create Table <myTable GUID>_a Create Table <myTable GUID>_b … Create Table <myTable GUID>_h 8 Tables per Node PDW Node 2 Create Table <myTable GUID>_a Create Table <myTable GUID>_b … Create Table <myTable GUID>_h PDW Node … PDW Node 10 Create Table <myTable GUID>_a Create Table <myTable GUID>_b … Create Table <myTable GUID>_h Final Result: 80 individual tables across a 10 node (1 data rack) appliance Vorstellung Parallel Data Warehouse

  19. Reference Case: Today’s process flow / Building blocks Baseline : Once data extracted from SAP: Time taken to create end-end Reports and Cubes insights 13+ hours (In production typical 20+ hours with multiple companies) SAP Suspicious words Reports 3hr21min MasterFinance table population 6 hours 21min DB_MasterTables DB_ ReportTables DB_GSAPOP DW_FinanceTransactions FinanceCube 1 hour 6 hours

  20. Reference Case: Audit Process with PDW Once data is extracted from SAP: Creating 5 CM Reports & FSCP Finance Cube; Time taken: 30 Minutes SAP MasterFinance table population 8m50sec All 5 Reports within 6min DB_MasterTables DB_ ReportTables DB_GSAPOP (80) (80) (80) (80) FinanceCube DW_FinanceTransactions load from FlatFile 23min 11 min 10m10sec

  21. Appliance Update AU3 • Performance – up to 10x improvement • Data Movement Services • New cost based Query Optimizer • New Data Movement Service • 1/2 rack appliances from HP and Dell • System Center 2012 Integration (SCOM pack) • And YES … Support for Stored Procedures  (subset) • Collations: Full support for international data • Native SQL Server drivers Vorstellung Parallel Data Warehouse

  22. Hub and Spoke Departmental Reporting High-Performance Reporting RegionalReporting Central EDW Hub Landing Zone RegionalReporting with Business Decision Appliance Third-PartyData Integration Mobile Applications Third-PartyRDBMS ETL Tools Vorstellung Parallel Data Warehouse

  23. Seamless Scalability 4 Rack PDW( 30 Nodes) System Throughput 3 Rack PDW( 30 Nodes) 2 Rack PDW( 20 Nodes) Regular SQL Server( 1 Node) Full Rack PDW( 10 Nodes) Half Rack PDW( 5 Nodes) 4 8 12 16 20 24 28 32 36 40 Vorstellung Parallel Data Warehouse SQL Server Compute Nodes

  24. Wettbewerb Let‘sgo. Vorstellung Parallel Data Warehouse

More Related