330 likes | 456 Views
SQL Server 2012 Parallel Data Warehouse. Meinrad Weiss Principal Consultant Trivadis AG. Trivadis solution portfolio and competences. CUSTOMER. IT SOLUTIONS, SERVICES & PRODUCTS. Business departments. Business Integration Services. Business Intelligence. Infrastructure Engineering.
E N D
SQL Server 2012 Parallel Data Warehouse Meinrad Weiss Principal Consultant Trivadis AG
Trivadissolutionportfolioandcompetences CUSTOMER IT SOLUTIONS, SERVICES & PRODUCTS Business departments Business Integration Services Business Intelligence InfrastructureEngineering Application Development Managed Services Training IT departments Integration, Application Performance Management, Security TECHNOLOGIESOracle, Microsoft, IBM, Open Source SQL Server 2012 Parallel Data Warehouse
Trivadisfacts & figures • 11 Trivadis locationswithmorethan 600 employees • Financiallyindependentandsustainably profitable • Key figures 2011 • Revenue CHF 104 / EUR 84 Mio. • Services formorethan 800 clients in over 1,900 projects • 200 Service Level Agreements • More than 4,000 trainingparticipants • Research anddevelopmentbudget: CHF 5.0 / EUR 4 Mio. Hamburg ~200 employees Dusseldorf Frankfurt Stuttgart Vienna Munich Freiburg Basel ~30 employees Zurich Bern ~380 employees Lausanne 3 Trivadis– thecompany SQL Server 2012 Parallel Data Warehouse date
Visit us, win a price and be prepared for your next adventure SQL Server 2012 Parallel Data Warehouse
Agenda • Positioning Parallel Data Warehouse (PDW) • Architecture • Working with PDW • PDW and Big Data SQL Server 2012 Parallel Data Warehouse
Data Warehouse – Products Positioning PDW with Distributed Data Architecture Scale Complexity HA by default SW-HW integration Balanced solution for mostly scan-centric workloads. 1 3 Max HW tune-up for most DW scenarios. 2 2 SQL Server 2012PDW Most flexible architecture for handling all DW scenarios. 3 1 SQL Server 2012 Fast Track SQL Server 2012 SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 Parallel Data Warehouse Built For Big Data Next-generation Performance At Scale Scale-Out Architecture on Industry Standard Hardware SQL Server 2012 Parallel Data Warehouse
Agenda • Positioning Parallel Data Warehouse (PDW) • Architecture • Working with PDW • PDW and Big Data SQL Server 2012 Parallel Data Warehouse
Parallel Data Warehouse at a glance • Shared-nothing parallel database system • Massively parallel processing (MPP) • A “Control” server that accepts user queries, generates a plan, and distributes operations in parallel to compute nodes • Multiple “Compute” servers running SQL Server • Delivered as an appliance • Balanced and pre-configured hardware and software • Scales from 2 to 56 Nodes • Fastest Time to Market SQL Server 2012 Parallel Data Warehouse
Base Unit (1/4 Rack) • All hosts and VM’s run Windows Server 2012 Standard • All Fabric and workload activity happens in Hyper-V virtual machines, with Fabric VM’s sharing 1 server • Failover is handled by Hyper-V • PDW Agent runs on all hosts and all VMs, collects appliance health data on fabric and workload • Windows Storage Spaces handles mirroring and spares FAB AD Control VMM Host 0 Host 1 Compute 1 Host 2 Storage Spaces Compute 2 Host 3 IB & Ethernet Direct attached SAS SQL Server 2012 Parallel Data Warehouse
Current Limitations: Performance and Scale Up today • Scale Up (and pay) ? ? $$$ $$$ $$$ Old serverwill be obsolete if bigger system is required SQL Server 2012 Parallel Data Warehouse
SCALING FROM 2 TO 56 NODES Host 0 Appliance can grow with increasing workload • Provisioning consists of 3 phases: • Bare metal provisioning of new nodes • Provisioning of workload VMs and ‘hooking up’ to other workload VMs • Redistribution of data FAB AD Control VMM Host 1 Host 2 Storage Spaces Compute 1 Compute 1 Host 3 Compute 2 Compute 2 Host 2 Storage Spaces Host 3 IB & Ethernet Direct attached SAS SQL Server 2012 Parallel Data Warehouse
Seamless Scalability 30 Nodes System Throughput 30 Nodes 20 Nodes Regular SQL Server( 1 Node) 10 Nodes 5 Nodes 4 8 12 16 20 24 28 32 36 40 SQL Server 2012 Parallel Data Warehouse SQL Server Compute Nodes
High Availability • Storage Spaces manages the physical disks in the disk enclosures • Failover: • One cluster across the whole appliance • VMs are automatically migrated on host failure • Affinity and anti-affinity maps enforce rules FAB AD Control VMM Host 0 Host 1 Compute 1 Host 2 Storage Spaces Compute 2 Host 3 IB & Ethernet Direct attached SAS SQL Server 2012 Parallel Data Warehouse
Agenda • Positioning Parallel Data Warehouse (PDW) • Architecture • Working with PDW • Introduction to Big Data • Microsoft and Big Data SQL Server 2012 Parallel Data Warehouse
Time Dim Product Dim Date Dim ID Calendar Year Calendar Qtr Calendar Mo Calendar Day Prod Dim ID Prod Category Prod Sub Cat Prod Desc Store Dim Store Dim ID Store Name Store Mgr Store Size Distribution and Replication of Data: Replicate Smaller (<5GB ) Dimension Tables are Replicated on Every Compute Node SQL SQL SQL SQL TD PD MD SD TD PD MD SD Sales Facts Date Dim ID Store Dim ID Prod Dim ID Mktg Camp Id Qty Sold Dollars Sold TD PD MD SD TD PD MD SD Mktg Campaign Dim Result: Fact -Dimension Joins can be performed locally Mktg Camp ID Camp Name Camp Mgr Camp Start Camp End SQL Server 2012 Parallel Data Warehouse
Create Replicated Table CREATE TABLE myTable ( id int NOT NULL, lastNamevarchar(20), zipCodevarchar(6) ) WITH (CLUSTERED COLUMNSTORE INDEX); • Creates tables on each of the individual compute nodes and assigns them to the REPLICATED file group. • Data Compression is automatically turned on SQL Server 2012 Parallel Data Warehouse
Time Dim Product Dim Date Dim ID Calendar Year Calendar Qtr Calendar Mo Calendar Day Prod Dim ID Prod Category Prod Sub Cat Prod Desc Store Dim Store Dim ID Store Name Store Mgr Store Size Distribution and Replication of Data: Distribute Larger (> 10 GB) Fact Table is Hash Distributed Across All Compute Nodes SQL SQL SQL SQL TD PD SF-1 SF-1 SF-1 SF-1 MD SD SF-1 SF-1 TD PD SF-1 SF-2 Sales Facts MD SD Date Dim ID Store Dim ID Prod Dim ID Mktg Camp Id Qty Sold Dollars Sold SF-1 SF-1 TD PD SF-1 SF-3 MD SD SF-1 TD PD SF-1 SF-1 SF-4 MD SD Mktg Campaign Dim Mktg Camp ID Camp Name Camp Mgr Camp Start Camp End SQL Server 2012 Parallel Data Warehouse
Distribution on a PDW CREATE TABLE myTable (column Defs)WITH(DISTRIBUTION = HASH (id)); PDW Node 1 Create Table <myTable GUID>_a Create Table <myTable GUID>_b … Create Table <myTable GUID>_h 8 Tables per Node PDW Node 2 Create Table <myTable GUID>_a Create Table <myTable GUID>_b … Create Table <myTable GUID>_h PDW Node … PDW Node 8 Create Table <myTable GUID>_a Create Table <myTable GUID>_b … Create Table <myTable GUID>_h Final Result: 64individual tables across a 8node (1 data rack HP) appliance SQL Server 2012 Parallel Data Warehouse
Lightning Fast Data Query Processing xVelocity gives next-gen performance Products • Columnstore Provides Dramatic Performance • Updateable and clustered xVelocitycolumnstore • Stores data in columnar format • Memory-optimized for next-generation performance • Updateable to support bulk and/or trickle loading Country Sales Supplier Customer Save Timeand Costs Up to50X Faster Up to 15x compression Real-TimeDW SQL Server 2012 Parallel Data Warehouse
Web-Based Management Dashboard SQL Server 2012 Parallel Data Warehouse
Web-Based Management Dashboard (2) SQL Server 2012 Parallel Data Warehouse
Web-Based Management Dashboard (3) SQL Server 2012 Parallel Data Warehouse
PDWQuerying 1Petabyte of data in 1 second 294‘000‘000‘000 rows http://www.sqlpass.org/summit/2012/DayOneKeynote.aspx SQL Server 2012 Parallel Data Warehouse
Reference Case: Today’s process flow / Building blocks Baseline : Once data extracted from SAP: Time taken to create end-end Reports and Cubes insights 13+ hours (In production typical 20+ hours with multiple companies) SAP Suspicious words Reports 3hr21min MasterFinance table population 6 hours 21min DB_MasterTables DB_ ReportTables DB_GSAPOP DW_FinanceTransactions FinanceCube 1 hour 6 hours SQL Server 2012 Parallel Data Warehouse
Reference Case: Audit Process with PDW Once data is extracted from SAP: Creating 5 CM Reports & FSCP Finance Cube; Time taken: 30 Minutes SAP MasterFinance table population 8m50sec All 5 Reports within 6min DB_MasterTables DB_ ReportTables DB_GSAPOP (80) (80) (80) (80) FinanceCube DW_FinanceTransactions load from FlatFile 23min 11 min 10m10sec SQL Server 2012 Parallel Data Warehouse
Agenda • Positioning Parallel Data Warehouse (PDW) • Architecture • Working with PDW • PDW and Big Data SQL Server 2012 Parallel Data Warehouse
Seamless integration of two worlds Introducing PolyBase • Single Query; Structured and Unstructured • Query and join Hadoop tables with Relational Tables • Use Standard SQL language • Select, From Where SQL SQL Server 2012 PDW Powered by PolyBase Existing SQLSkillset Save Timeand Costs Analyze AllData Types HDFS (Hadoop) Database SQL Server 2012 Parallel Data Warehouse
Additional Resources SQL Server Parallel Data Warehouse (PDW) Landing Page: www.microsoft.com/PDW Introduction to Polybase: http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx Price/TB comparison: http://www.valueprism.com/resources/resources/Resources/PDW%20Compete%20Pricing%20FINAL.pdf HP QuickSpecs http://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.html http://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.pdf SQL Server 2012 Parallel Data Warehouse
Conclusion • SQL Server 2012 Parallel Data Warehouse is an MPP based appliance optimized for data warehouse workload • Very Similar to a regular SQL Server, but T-SQL not 100 % identical • Support for development via SQL Server Data Tools • Scalable, balanced platform • Not just storage (data + CPU power) • Central part of Microsoft BI stack • Well integrated in Microsoft management tools SQL Server 2012 Parallel Data Warehouse
Visit us, win a price and be prepared for your next adventure SQL Server 2012 Parallel Data Warehouse
Wettbewerb Let‘sgo.