450 likes | 581 Views
Agenda. 10 Key SQL 2012 BI Innovations BI Semantic Model Project ‘Apollo’ Vertipaq xVelocity in SQL 2012. 10 Key SQL 2012 BI Innovations. BI Semantic Model Analysis Services Tabular Mode With BIDS support PowerPivot “2” Power View (SSRS) Self-Service Data Alerts (SSRS).
E N D
Agenda 10 Key SQL 2012 BI Innovations BI Semantic Model Project ‘Apollo’ Vertipaq xVelocity in SQL 2012
10 Key SQL 2012 BI Innovations • BISemantic Model • Analysis Services Tabular Mode • With BIDS support • PowerPivot “2” • Power View (SSRS) • Self-Service Data Alerts (SSRS) • Hadoop Big Data Integration • xVelocity–Columnstore Indexes • Geospatial Indexes • Unstructured Data Queries • Data Quality Services • Updated Master Data Services
Analysis Services: Tomorrow Build on the strengths and success of Analysis Services and expand its reach to a much broader user base Embrace the relational data model – well understood by developers and IT Pros Analysis Services Vision BI Semantic Model Bring together the relational and multidimensional models under a single unified BI platform—best of both worlds! Provide flexibility in the platform to suit the diverse needs of BI applications
BI Semantic Model • One Model for All User Experiences • Visualise analysis using your favourite tools • Model data the way you like • Store analytical data however it is best done • BISM is a concept, not a product • Can be hosted in PowerPivot or SSAS
BI Semantic ModelOne Model for All User Experiences Your Apps Reporting Services & Power View SharePoint Insights Excel PowerPivot BI Semantic Model Multi- dimensional Data Model Tabular Business Logic/Queries MDX DAX Direct Query Data Access ROLAP xVelocity MOLAP Databases LOB Applications Files OData Feeds Cloud Services
BI Semantic ModelWhat about existing Analysis Services applications? Existing applications Based on Unified Dimensional Model Existing applications Based on Unified Dimensional Model Existing applications Every UDM becomes a BI Semantic Model New applications New technology options
BI Semantic Model: Architecture Third-party applications Reporting Services SharePoint Insights Excel PowerPivot BI Semantic Model Multi- dimensional Data model Tabular Business logic and queries MDX DAX Direct Query Data access ROLAP VertiPaq MOLAP Databases LOB Applications Files OData Feeds Cloud Services
Tabular ModeScaling PowerPivot to Enterprise Needs • Model in PowerPivot • PowerPivot as source of SSAS Tabular Models • Excel for browsing and testing in SSDT • All new PowerPivot features: • Diagrams, Measure Grid, KPIs, Hierarchies, Perspectives, 30+ New DAX Functions • …and, unique to SSAS Tabular Mode: • Row-level Security, Partitions, Large Tables (>2 billion rows), Images, Memory Paging
Example: Power View Over a Sales Model End User Model Developer BI Semantic Model Data model Tabular Business logic and queries DAX Data access VertiPaq SQL Server Dynamics CRM
Example: Power View Over a Sales Model End User Model Developer BI Semantic Model Multi- dimensional Data model Tabular Business logic and queries MDX DAX Data access VertiPaq SQL Server Dynamics CRM
Example: Excel Over a Finance Model End User Model Developer BI Semantic Model Data model Multi-dimensional Business logic and queries MDX Data access MOLAP Oracle SAP
Example: Excel Over a Finance Model End User Model Developer BI Semantic Model Multi- dimensional Data model Tabular Business logic and queries DAX MDX Data access MOLAP Oracle SAP
Demo…. Tour of SQL Server 2012BISM
BI Semantic Model Flexibility Richness Scalability • VertiPaq for high performance, MOLAP for mission critical scale • DirectQuery and ROLAP for real-time • State-of-the-art compression algorithms • Multi-dimensional or tabular modeling • MDX and DAX for logic and queries • Cached and passthrough storage • Rich data modeling • Sophisticated business logic using MDX and DAX • Fine-grained security – row/cell level
Data Model Tabular • Familiar model, easier to build, faster time to solution • Advanced concepts (parent-child, many-to-many) not available natively in the model… need calculations to simulate these • Easy to wrap a model over a raw database or warehouse for reporting & analytics Multidimensional • Sophisticated model, higher learning curve • Advanced concepts baked into the model and optimized (parent-child, many-to-many, attribute relationships, key vs. name, etc.) • Ideally suited for OLAP type apps (e.g. planning, budgeting, forecasting) that need the power of the multidimensional model
Business Logic DAX • Based on Excel formulas and relational concepts – easy to get started • Complex solutions require steeper learning curve – row/filter context, Calculate, etc. • Calculated columns enable new scenarios, however no named sets or calc members MDX • Based on understanding of multidimensional concepts – higher initial learning curve • Complex solutions require steeper learning curve – CurrentMember, overwrite semantics, etc. • Ideally suited for apps that need the power of multidimensional calculations – scopes, assignments, calc members
Data Access and Storage VertiPaq • In-memory column store… typical 10x compression • Brute force memory scans… high performance by default… no tuning required • Basic paging support… data volume mostly limited to physical memory MOLAP • Disk based store… typical 3x compression • Disk scans with in-memory subcube caching… aggregation tuning required • Extensive paging support… data volumes can scale to multiple terabytes DirectQuery • Passes through DAX queries & calculations… fully exploits backend database capabilities • No support for MDX queries… no support for data sources other than SQL Server (in Denali) ROLAP • Passes through fact table requests… not recommended for large dimension tables • Supports most relational data sources… no support for aggregations except SQL Server indexed views
Analysis Services Architecture SharePoint Browser BI Development Studio Excel Services Reporting Services xlsx PowerPivot for Excel Analysis Services BI Semantic Model PowerPivot for SharePoint (Analysis Services) Excel xlsx Third Party Apps Personal BI Team BI Organizational BI
Apollo: A new column-oriented query accelerator • What is Apollo? • What does column-oriented mean? • How does it accelerate queries? • When to create a columnstore index • How to use Apollo • Creating an index • Running queries • Loading data • How does Apollo relate to VertiPaq and PowerPivot?
What is Apollo? • Apollo is the code name for new functionality that is available in SQL Server 2012 • It will substantially accelerate common data warehouse queries • Adds a column store option in SQL Server database engine • New index type in the database engine • Advanced query processing algorithms • New batch mode processing
When to use Apollo • Data warehousing • Read-mostly workloads • Star joins • Process large amounts of data • Generous amount of memory • Best performance when data fits in memory • Graceful degradation as fact table paged from disk • Under severe memory constraints, falls back to row-at-a-time processing
How does Apollo speed up queries? (1) Stores data column-wise Better compression • Uses VertiPaq compression technology Less IO … C1 C4 C5 C6 C3 C2
How does Apollo speed up queries? (2) SELECT region, sum (sales) … C2 C3 C6 C4 C5 C1 • Fetches only needed columns from disk • Less IO • Better buffer hit rates
Improved Data Warehouse Query performance • Columnstore indexes provide an easy way to significantly improve data warehouse and decision support query performance against very large data sets • Performance improvements for “typical” data warehouse queries from 10x to 100x • Ideal candidates include queries against star schemas that use filtering, aggregations and grouping against very large fact tables
What Happens When… • You need to execute high performance DW queries against very large data sets? • In SQL Server 2008 and SQL Server 2008 R2 • OLAP (SSAS) MDX solution • ROLAP and T-SQL + intermediate summary tables, indexed views and aggregate tables • Inherently inflexible • In SQL Server 2012 • You can create a columnstore index on a very large fact table referencing all columns with supporting data types • Utilizing T-SQL and core Database Engine functionality • Minimal query refactoring or intervention • Upon creating the columnstore index, your table becomes “read only” – but you can still use partitioning to switch in and out data OR drop/rebuild indexes periodically
How Are These Performance Gains Achieved? • Two complimentary technologies: • Storage • Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row). • Columnar storage allows for less data to be accessed when only a sub-set of columns are referenced • Data density/selectivity determines how compression friendly a column is – example “State” / “City” / “Gender” • Translates to improved buffer pool memory usage • New “batch mode” execution • Data can then be processed in batches (1,000 row blocks) versus row-by-row • Depending on filtering and other factors, a query may also benefit by “segment elimination” - bypassing million row chunks (segments) of data, further reducing I/O
Column Store (values compressed) Row Store (Heap / B-Tree) Column vs. Row Store data page 1000 data page 2000 data page 2002 data page 2001 data page 1001
Batch Mode • Allows processing of 1,000 row blocks as an alternative to single row-by-row operations • Enables additional algorithms that can reduce CPU overhead significantly • Batch mode “segment” is a partition broken into million row chunks with associated statistics used for Storage Engine filtering • Batch mode can work to further improve query performance of a columnstore index, but this mode isn’t always chosen: • Some operations aren’t enabled for batch mode: • E.g. outer joins to columnstore index table / joining strings / NOT IN / IN / EXISTS / scalar aggregates • Row mode might be used if there is SQL Server memory pressure or parallelism is unavailable • Confirm batch vs. row mode by looking at the graphical execution plan
Columnstore format + batch mode Variations • Performance gains can come from a combination of: • Columnstore indexing alone + traditional row mode in QP • Columnstore indexing + batch mode in QP • Columnstore indexing + hybrid of batch and traditional row mode in QP
SSMS T-SQL Creating a columnstore index
Defining the Columnstore Index • Index type • Columnstore indexes are always non-clustered and non-unique • They cannot be created on views, indexed views, sparse columns • They cannot act as primary or foreign key constraints • Column selection • Unlike other index types, there are no “key columns” • Instead you choose the columns that you anticipate will be used in your queries • Up to 1,024 columns – and the ordering in your CREATE INDEX doesn’t matter • No concept of “INCLUDE” • No 900 byte index key size limit • Column ordering • Use of ASC or DESC sorting not allowed – as ordering is defined via columnstore compression algorithms
Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes Demo..
Supported Data Types • Supported data types • Char / nchar / varchar / nvarchar • (max) types, legacy LOB types and FILESTREAM are not supported • Decimal/numeric • Precision greater than 18 digits NOT supported • Tinyint, smallint, int, bigint • Float/real • Bit • Money, smallmoney • Date and time data types • Datetimeoffset with scale > 2 NOT supported
Limitations • Columnstore indexes cannot be used in conjunction with • Change Data Capture and Change Tracking • Filestream columns (supported columns from same table are supported) • Page, row and vardecimal storage compression • Replication • Sparse columns • Data type limitations • Binary / varbinary / ntext / text / image / varchar (max) / nvarchar (max) / uniqueidentifier / rowversion / sql_variant / decimal or numeric with precesion > 18 digits / CLR types / hierarchyid / xml / datetimeoffset with scale > 2 • You can prevent a query from using the columnstore index using the IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint
Adding data to a table with a columnstore index • Method 1: Disable the columnstore index • Disable (or drop) the index ALTER INDEX my_index ON T DISABLE • Update the table • Rebuild the columnstore index ALTER INDEX my_indexON T REBUILD
Adding data to a table with a columnstore index • Method 2: Use Partitioning • Load new data into a staging table • Build a columnstore index CREATE NONCLUSTERED COLUMNSTORE INDEX my_index ON StagingT(OrderDate, ProductID, SaleAmount) • Switch the partition into the table ALTER TABLE StagingT SWITCH TO T PARTITION 5
Apollo and VertiPaq • VertiPaq: • PowerPivot for Excel • PowerPivot for Sharepoint • Analysis Services • Database Engine – Apollo • Use Apollo for relational data warehousing • Large fact tables • Ad hoc or reporting queries • When you don’t need MDX
Performance example SELECT w_city, w_state, d_year, SUM(cs_sales_price) AS cs_sales_price FROM warehouse, catalog_sales, date_dim WHERE w_warehouse_sk = cs_warehouse_sk and cs_sold_date_sk = d_date_sk and w_statein ('SD','OH') and d_year in (2001,2002,2003) GROUP BY w_city, w_state, d_year ORDER BY d_year, w_state, w_city; 1 TB version of the TPC-DS database 1.44 billion rows in catalog_sales fact table 32 logical processor machine with 256 GB RAM Warm start Query
Summary: Apollo in a nutshell Columnstore technology + Advanced query processing Astonishingspeedup for DW queries Great compression
Summary: SQL 2012 ColumStore • SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios • 10x to 100x performance improvement depending on the schema and query • I/O reduction and memory savings through columnstore compressed storage • CPU reduction with batch versus row processing, further I/O reduction if segmentation elimination occurs • Easy to deploy and requires less management than some legacy ROLAP or OLAP methods • No need to create intermediate tables, aggregates, pre-processing and cubes • Interoperability with partitioning • For the best interactive end-user BI experience, consider Analysis Services, PowerPivot and Crescent
xVelocity in SQL 2012 No more Vertipaq, it’s now called xVelocity in-memory technologies in SQL 2012