PDW Architecture Gets Real: Customer Implementations
Brian Walker | Microsoft Corporation, PDW Center of Excellence
Murshed Zaman | Microsoft Corporation, SQL Customer Advisory Team
Agenda
• Introduction to PDW and How It Works
• Detail PDW xVelocity - Reporting Structured/Unstructured Data Demos
• Highlight Current Customer Use Cases
• Future
Introducing Parallel Data Warehouse: Appliance Experience
• Pre-built hardware + software appliance
• Co-engineered with HP and Dell
• Pre-built hardware
• Pre-installed software
• Appliance installed in 1-2 days
• Support - Microsoft provides first-call support
• Hardware partner provides onsite break/fix support
Callouts: Plug and Play | Built-in Best Practices | Save Time
The Power of PDW
Massively Parallel Processing (MPP)
• Uses many separate CPUs running in parallel to execute a single query
• Each CPU has its own memory
• Dedicated InfiniBand network communications between servers
Symmetric Multi-Processing (SMP)
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory and disks
• Network controllers share bandwidth
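The MPP idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how PDW is implemented: the `node_slices` data, the `partial_sum` and `mpp_sum` names, and the use of a thread pool to stand in for separate servers are all hypothetical. Each "node" works only on the rows it owns, and a final step combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(rows):
    """Work done on one node: aggregate only the rows that node owns."""
    return sum(rows)


def mpp_sum(node_slices):
    """Run the per-node step in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(partial_sum, node_slices))
    return sum(partials)


# Hypothetical data: four nodes, each holding its own slice of one column.
node_slices = [[10, 20], [5, 15, 25], [30], [1, 2, 3]]
total = mpp_sum(node_slices)
print(total)
```

In an SMP system the same query would run against one shared pool of memory and disk; here no node ever reads another node's slice.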
The Basic Full Rack: SQL Server PDW 2012
• Reduce hardware footprint by virtualizing the entire control server rack down to a few nodes
• 1.5x lower price/TB, providing one of the lowest price/TB in the industry
• Save up to 70% of storage with up to ~15x compression via the xVelocity columnstore
• Resilient, scalable, high-performance storage features in Windows Server 2012 replace SAN with high-density, low-cost SAS JBODs
• 70% more disk I/O bandwidth than SQL Server PDW 2008 R2
One rack (InfiniBand & Ethernet)
• 128 cores on 8 compute nodes
• 2 TB of RAM on compute
• Up to 168 TB of temp DB
• Up to 1 PB of user data
Data Layout: Dimensional Model on PDW Compute Nodes
• Date Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
• Item Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
• Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
• Promo Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
• SlsFact: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
[Diagram: five PDW compute nodes, each labeled D, I, S, P plus one of F1 to F5]
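Reading the diagram as "dimension tables copied to every node, fact table split across nodes", the layout can be sketched as below. This is a minimal illustration with assumptions called out: the sample rows are made up, the choice of Store Dim ID as the distribution key is hypothetical (the slide does not name one), and `key % N_NODES` is a stand-in for whatever hash function the appliance actually uses.

```python
from collections import defaultdict

N_NODES = 5  # the diagram shows five fact slices, F1 to F5

# Hypothetical dimension rows: store_dim_id -> store name.
store_dim = {1: "Downtown", 2: "Airport", 3: "Suburb"}

# Hypothetical fact rows: (store_dim_id, qty_sold, dollars_sold).
sales_fact = [
    (1, 2, 20.0), (2, 1, 5.0), (3, 4, 40.0), (1, 1, 10.0), (2, 3, 15.0),
]


def node_for(key):
    """Stand-in for a hash function: map a distribution key to a node."""
    return key % N_NODES


# Every node gets a full copy of the dimension and only its share of the fact.
nodes = [{"store_dim": dict(store_dim), "fact": []} for _ in range(N_NODES)]
for row in sales_fact:
    nodes[node_for(row[0])]["fact"].append(row)


def local_sales_by_store(node):
    """Join fact to dimension using only data already on this node."""
    totals = defaultdict(float)
    for store_id, _qty, dollars in node["fact"]:
        totals[node["store_dim"][store_id]] += dollars
    return totals


# Combine the per-node partial results into the final answer.
result = defaultdict(float)
for node in nodes:
    for name, dollars in local_sales_by_store(node).items():
        result[name] += dollars
```

Because each node holds the whole dimension, the join never has to move rows between nodes; only the small per-node totals are combined at the end.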
Seamlessly Add Capacity: Start Small, Linearly Scale Out
• Smallest (53 TB) to largest (6 PB)
• Start small with a warehouse of a few terabytes
• Add capacity up to 6 petabytes
[Diagram: Start Small and Grow, from 53 TB to 6 PB]
Any Size: Next-Gen Performance
xVelocity - Fast Data Query Processing
• Columnstore provides dramatic performance
• Updateable and clustered xVelocity columnstore
• Stores data in columnar format
• Memory-optimized for next-generation performance
• Updateable to support bulk and/or trickle loading
Callouts: Save Time and Costs | Up to 50X Faster | Up to 15x Compression | Batch Processing
[Diagram labels: Products, Country, Sales, Supplier, Customer]
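Why a columnar format compresses well can be shown with a toy example. The sketch below is not xVelocity's actual encoding; it uses simple run-length encoding as a stand-in, and the sample rows and the `to_columns` and `rle` helpers are hypothetical. Storing each column's values together puts repeated values next to each other, where they collapse into short runs.

```python
from itertools import groupby

# Hypothetical rows: (country, supplier, sales).
rows = [
    ("US", "Acme", 100),
    ("US", "Acme", 120),
    ("US", "Bolt", 90),
    ("DE", "Bolt", 70),
    ("DE", "Bolt", 70),
]


def to_columns(rows, names):
    """Pivot row tuples into one list of values per column."""
    return {name: [r[i] for r in rows] for i, name in enumerate(names)}


def rle(values):
    """Run-length encode a list as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ["country", "supplier", "sales"])
encoded = {name: rle(col) for name, col in columns.items()}
print(encoded["country"])  # [('US', 3), ('DE', 2)]
```

A query that only needs `country` and `sales` can also skip the `supplier` column entirely, which is the other half of the columnstore benefit.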
Demo: xVelocity - The Power of Updatable Columnstore Indexing on PDW 2012
Any Data: Hadoop Integration
Polybase Details
• External tables and full SQL query access to data stored in HDFS
• HDFS bridge for direct, fully parallelized access to data in HDFS
• Joining PDW data ‘on the fly’ with data from HDFS
• Parallel import of data from HDFS into PDW tables for persistent storage
• Parallel export of PDW data into HDFS, including ‘round-tripping’ of data
[Diagram labels: Regular T-SQL, Results, Enhanced PDW Query Engine, PDW 2012, External Table, HDFS Bridge, HDFS Data Nodes, Unstructured data, Structured data]
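The external-table idea in the bullets above can be illustrated with a small Python sketch. It does not use or imitate the Polybase API; the in-memory file, the `external_table` helper, the column names, and the `customers` lookup are all hypothetical. The point is only that a schema is applied to a delimited file at read time and the resulting rows are joined with relational data on the fly, without first loading the file into a table.

```python
import csv
import io

# Hypothetical delimited file, as it might sit in HDFS (customer_id|url).
hdfs_file = io.StringIO("101|/home\n102|/cart\n101|/checkout\n")


def external_table(fileobj, columns, delimiter="|"):
    """Yield each record as a dict, using a schema declared at read time."""
    for record in csv.reader(fileobj, delimiter=delimiter):
        yield dict(zip(columns, record))


# Stand-in for a relational table already stored in the warehouse.
customers = {"101": "Alice", "102": "Bob"}

# Join the file's rows with the relational data while streaming the file.
joined = [
    (customers[row["customer_id"]], row["url"])
    for row in external_table(hdfs_file, ["customer_id", "url"])
]
```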
Existing Excel Skillset With Big Data
Familiar Tools To Analyze Structured/Unstructured Data
• Familiar tools analyze Big Data
• Native Microsoft BI integration to PDW
• Structured and unstructured data in the same spreadsheet
• Widely adopted and familiar user tools
Callouts: High Adoption of Excel | No IT Intervention | Analyze All Data Types
[Diagram labels: Hadoop Data, Structured Data]
Demo: PDW 2012 Polybase - Simultaneous Reporting from Structured and Unstructured Data
Upgrading to PDW Gains 100x Improvement
“…basic queries that previously took 20 minutes only took seconds using the SQL Server 2008 R2 Parallel Data Warehouse.” - Tom Settle, Assistant VP, Data Warehousing, Hy-Vee
• A large US grocery store chain needed an MPP data warehouse to improve performance and scale, and to provide timely data to its executives and analysts
Benefits
• PDW offered a 100x query performance gain over conventional SQL Server, faster data loads, and more scale, with 7 instead of 2 years of purchasing data
• PDW will scale to meet future growth and support more functional areas at Hy-Vee
Business Objectives
• Provide Broader Range of Critical Customer Purchasing Data - Current system only supported 2 years of data; business required 7 years
• Improve Performance of Complex Transformations - Faster delivery of data within specified SLAs
• Enable User Ad Hoc Reporting - Leveraging Excel/SharePoint
• Enable Self-Service Reporting - SSAS/SSRS/SharePoint/Excel
• Provide Solution that Scales to Meet Future Data Needs - Expansion of history, point of sale detail, and expansion into social media
• Reduced IT Costs - Creating self-sufficient end users frees IT to focus on delivering new data
Icon labels: Critical, Load Speed, Save Time, Query, Save Costs, Scale
Shift from ETL to ELT Using the Power of MPP
• Move complex transformations and calculations from the ETL server to SQL Server Parallel Data Warehouse
• PDW has allowed Hy-Vee to create an enterprise data warehouse centralizing data from many sources
• Archiving point of sale source files for later data extraction
[Diagram label: Complex Transformations]
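The ETL-to-ELT shift described above can be sketched as "load the raw rows first, then transform with SQL inside the database". The example below uses Python's built-in `sqlite3` purely as a stand-in for PDW, and the `stage_pos` and `sales_by_store` tables and their sample rows are hypothetical, not Hy-Vee's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw point-of-sale rows in a staging table as-is.
conn.execute(
    "CREATE TABLE stage_pos (store_id INTEGER, item TEXT, qty INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO stage_pos VALUES (?, ?, ?, ?)",
    [(1, "milk", 2, 3.0), (1, "bread", 1, 2.5), (2, "milk", 4, 3.0)],
)

# Transform: run the calculation inside the database engine, not on an ETL server.
conn.execute(
    """
    CREATE TABLE sales_by_store AS
    SELECT store_id,
           SUM(qty)         AS qty_sold,
           SUM(qty * price) AS dollars_sold
    FROM stage_pos
    GROUP BY store_id
    """
)

summary = conn.execute(
    "SELECT store_id, qty_sold, dollars_sold FROM sales_by_store ORDER BY store_id"
).fetchall()
print(summary)  # [(1, 3, 8.5), (2, 4, 12.0)]
```

On an MPP appliance the transform step is where the parallelism pays off, since the aggregation runs across all compute nodes rather than on a single ETL machine.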
Upgrade to PDW 2012: Future Option
• Improves their opportunity to further analyze social media data
• Query data without having to move it into a relational database
• Provides an alternative archive solution for point of sale data
Data Archive Challenge – Financial Customer: Current Solution
• Business only actively analyzes a rolling 12 months of data
• Regulations require that data remain online and accessible for an extended period
• Data older than 12 months is pushed to a farm of SQL servers to meet regulatory requirements
[Diagram labels: Reporting Services, Archive Servers, Centralized EDW]
Data Archive Challenge – Financial Customer: Future Solution
• Replace archive farm with Hadoop cluster
• PDW provides a single point of access
• Allows analysts to leverage existing SQL skills
• Much lower maintenance and administration
• Meets regulatory requirements
[Diagram labels: Reporting Services, Archive Servers, Centralized EDW, HDFS bridge, HDFS Data Nodes, Unstructured data]
AMD Boosts Performance with PDW
“We used to worry about backlogs, but no more,” - Rajarao Chitturi, Database and Applications Manager at AMD
Benefits
• AMD is also processing more reporting queries than it previously could, between 10,000 and 13,000 a day, with an average runtime of a few seconds and virtually no performance issues.
• AMD runs an average of 1,500 loads per day, and data loads to a given table range from four-minute to four-hour intervals. AMD averages about 500,000 file loads a day.
• Because of user complaints about the previous system, the data warehouse team had one employee devoted full time to addressing performance-related support tickets. With Parallel Data Warehouse, AMD has reduced support work to just a few hours a week.
AMD Business Challenges
Obstacles With SMP Oracle
• Only supported 6-month data retention
• Issues loading concurrently with high query volume
• Data loading always lagged behind by days
• Analysts couldn’t access recent data
Load Demand
• Continuous data loads throughout the day while users were querying the system
Linux-Based Reporting
• Custom reporting tools hosted on Linux use JDBC and ODBC drivers
Project Overview
• Critical Wafer Quality Assurance Data - 42 TB on PDW
• Space Saving PDW Index Lite Approach - Oracle required excessive non-clustered indexes to get any performance
• Improved Loading Speed - 660 GB/hr throughput
• 10,000 – 13,000 Analytic Queries per Day - Most are scan intensive
• Faster Backups - Complete in 1~2 hours per database, compared to a week on Oracle
• Reduced Support Costs by 90% - No more chopping up queries to fit the data warehouse
Icon labels: Critical, Save Space, Load Speed, Save Time, Query, Save Costs
Other PDW Sessions
• Online Advertising: Hybrid Approach to Large-Scale Data Analysis (DAV-303-M)
• Data Analytics and Visualization
• Breakout Session (60 minutes)
• Fri April 12, 2013, 2:45 PM - 3:45 PM in Sheraton 3
• Anna Skobodzinski
• Christian Bonilla
• Dmitri Tchikatilov
• Trevor Attridge
Win a Microsoft Surface Pro! Complete an online SESSION EVALUATION to be entered into the draw. Draw closes April 12, 11:59pm CT. Winners will be announced on the PASS BA Conference website and on Twitter. Go to passbaconference.com/evals or follow the QR code link displayed on session signage throughout the conference venue. Your feedback is important and valuable. All feedback will be used to improve and select sessions for future events.
Thank you!