360 likes | 487 Views
BI Sematic Model. Albert van Dok SQL Zaterdag 12 november 2011. Agenda. Background Life Before BISM What is BISM BISM Positioning Questions. B ackground. From data towards information By nature the demand for (new) information and insights will always evolve
E N D
BI Sematic Model Albert van Dok SQL Zaterdag 12 november 2011
Agenda • Background • Life Before BISM • What is BISM • BISM Positioning • Questions
Background • From data towards information • By nature the demand for (new) information and insights will always evolve • To connect and integrate (new) datasources is an essential part • Preparing data for use • Data cleansing • Define relationships • Data enrichment • Add calculations • Versioning
OLAP Browser OLAP Browser Datamart Datamart Life before BISM Data Source Data Model Tool MOLAP MOLAP Reporting Tool Reporting Tool DW Reporting Tool OLTP
OLAP Browser OLAP Browser Life before BISM Data Source Data Model Tool MOLAP MOLAP UDM Datamart Reporting Tool Datamart Reporting Tool DW Reporting Tool OLTP
OLAP Browser OLAP Browser Life before BISM Data Source Data Model Tool Analysis Services MOLAP • Security • End-user model • Transalations • Actions • KPI… • Calculations • Basic dim. model • Cube & Dimensions • Storage & Caching policies • Linked Objects • Datasource view MOLAP UDM UDM Datamart XML/A Reporting Tool Datamart Reporting Tool Cache DW Reporting Tool OLTP
The UDM in SSAS 2008 R2 Besides the advantages the UDM: • Is often too complex for simple reporting purposes • Has a steep learning curve • Uses MDX which is different than SQL… • Must be implemented by a BI professional • Needs small investment just to start MDX MDX MDX MDX
The holy grail: Self Service BI • New paradigm • “Business intelligence for the masses” • “Managed self-service business intelligence” • Put simple, powerful BI tools in the hands of “knowledge workers” • Familiar tools: Excel • People who own the data • Excel spreadsheet, Access database or SharePoint list data • Reality: Office power users
New kid on the block: Powerpivot Powerpivotfor Excel • Free Addin for Excel • Running 32/64bit and lots of RAM… • Contains Vertipaq engine (SSAS running in process with Excel) Powerpivot for Sharepoint • Comes with SQL Server 2008 R2 x64 • Sharepoint 2010 extention • Vertipaq running on server side • For sharing and managing PowerPivot applications
Powerpivot PowerPivot has its own semantic model which can be seen as BISM v1 • enables connecting data from various data sources • add relations between tables • add calculations, two places: • in tables – calculated columns (DAX) • over the whole model – calculated measures (DAX) • works in cached (VertiPaq) mode • Covers personal and team BI segments
What is Vertipaq • In-memory column-based database • Very high data compression • Doesn’t require the process of designing and building aggregations and other tunning • Support partitioning and paging on large data sizes
Relational Database Select id, name, BalDue from Customers where BalDue > $500 CPU • Query summary: • 3 pages read from disk • Up to 9 L1 and L2 cache misses (one per tuple) L1 Cache .. $50 .. $9000 .. $1300 .. $1500 .. $1010 .. $3000 .. $1700 .. $500 .. $0 64 bytes Page 1 L2 Cache .. $1300 .. $3000 .. $0 .. $500 .. $1700 .. $1010 .. $1500 .. $50 .. $9000 Page 2 64 bytes Memory (DBMS Buffer Pool) • Don’t forget that: • An L2 cache miss can stall the CPU for up to 200 cycles Page 3 8K bytes 7 Sue … $1010 8 Bob … $50 9 Jim … $1,300 4 Jim … $1,500 5 Liz … $0 6 Dave … $9,000 1 Bob … $3000 2 Sue … $500 3 Ann … $1,700 4 … $1500 5 … $0 6 … $9000 1 … $3000 2 … $500 3 … $1700 7 … $1010 8 … $50 9 … $1300 15
Columnstore Database Select id, name, BalDue from Customers where BalDue > $500 CPU • Takeaways: • Each cache miss brings only useful data into the cache • Processor stalls reduced by up to a factor of: • 8 (if BalDue values are 8 bytes)16 (if BalDue values are 4 bytes) 1300 L1 Cache 64 bytes L2 Cache 1300 64 bytes … … … … ….. … … … … ….. … … … … ….. … … … … ….. 3000 500 1700 3000 500 1700 9000 1010 50 9000 1010 50 1300 9000 1010 50 1300 3000 500 1700 1500 0 3000 500 1700 1500 0 9000 1010 50 Memory 8K bytes • Caveats: • Not to scale! An 8K byte page of BalDue values will hold 1000 values (not 5) • Not showing disk I/Os required to read id and Name columns Id 1500 0 1500 0 Bob Sue Ann Jim Liz Name Dave Sue Bob Jim BalDue 1 2 3 4 5 6 7 8 9 Street 16
An example Assume: • Customer table has 10M rows, 200 bytes/row (2GB total size) • Id and BalDue values are each 4 bytes long, Name is 20 bytes Query: Select id, Name, BalDue from Customer where BalDue > $1000 Row store execution: Scan 10M rows (2GB) @ 80MB/sec = 25 sec. Column store execution: Scan 3 columns, each with 10M entries 280MB@80MB/sec = 3.5 sec. (id 40MB, Name 200MB, BalDue 40MB) About a 7X performance improvement for this query!! But we can do even better using compression
Demo • Powerpivot
We are not there yet • Although Powerpivot for Excel is great, it has certain limitations • Limit to 2Gb, no support for partitions, queries Vertipaqcache, daily scheduled data refresh in Sharepoint, accesto workbook • PowerPivotand Analysis Services are two different productshence two models • Powerpivot targets business users, model managed in Excel • Analysis Services targets BI professionals and IT, model managed on the server • “Can we have one model which integrate both worlds and seamlessly transition BI applications from Personal BI to Team BI to Organizational/Professional BI?”
What is coming in Denali • BISM v2 • One model for all • reporting, analysis, dashboards, scorecards • personal, team, corporate BI • Has a relational and multidimensional API • Support both cached (Molap & VertiPaq) and the pass-through (realtime) mode • only SQL Server data sources for now • Pass-through • no additional database • data stays as is in the original structures • ideal for the realtime analysis
Why does this work • In “Denali” every cube automatically becomes a BI Semantic Model • To create a BI semantic model you create a: • multidimensional model, tabular model, PowerPivot workbook • Every model looks like cubes/dimensions/measure groups/data sources/data source views under the covers • they share a common Analysis Services file format. • this shared underlying structure that makes the BI semantic model work
BISM Data model • Hybrid model supporting multidimensional and tabular data models • Developed using an multidimensional or a tabular project • Choice depends on application needs and skillset • Tabular • Familiar model, easier to build, faster time to solution • Not all advanced concepts (e.g. many-to-many) not available natively in the model… need calculations to simulate these • Easy to wrap a model over a raw database or warehouse for reporting & analytics • Multidimensional • Sophisticated model, higher learning curve • Advanced concepts baked into the model and optimized (parent-child, many-to-many, attribute relationships, key vs. name, etc.) • Ideally suited for OLAP type apps (e.g. planning, budgeting, forecasting) that need the power of the multidimensional model
BISM Business Logic & Queries • Represents the intelligence or semantics in the model • Defines entities and relations between them • User-oriented • DAX • Based on Excel formulas and relational concepts – easy to get started • Complex solutions require steeper learning curve – row/filter context, Calculate, etc • Calculated columns enable new scenarios, however no named sets or calcmembers • MDX • Based on understanding of multidimensional concepts – higher initial learning curve • Complex solutions require steeper learning curve – CurrentMember, overwrite semantics, etc. • Ideally suited for apps that need the power of multidimensional calculations – scopes, assignments, calc members
BISM Data Access • This layer integrates data from multiple sources – relational databases, business applications, flat files, OData feeds, etc. • Two modes: cached and pass-through • Cached:: pulls in data from all the sources and stores it in a compressed data structure • MOLAP and VertiPaq • Passthrough: pushes query processing and business logic down to the data source • ROLAPand DirectQuery
Analysis Services ‘Denali’ - UDM MDX MDX MDX ? MDX MDX
Analysis Services ‘Denali’ - BISM MDX MDX MDX ? DAX DAX DAX MDX
Delali’s new features in BISM BISM in ‘Denali’ includes: • hierarchies, KPIs, parent-child, drillthrough, perspectives • additional DAX functions (RankX, DistinctCount, GroupBy, Lookup) • security (role-based with Active Directory, column/row based) BISM does not include: • some of the UDM features • scripts, actions, translations, role-playing dimensions • object model • write-back • other • realtime for non-SQL Server data sources • MDX query support for realtime
Demo • BISM and the tabular model
Advantages of BISM • Relatively simple model • Fast response • Flexible • DAX calculations are similar to Excel formulas • More understandable and user-friendly to majority of people • Same model across all scenarios • Easily scale from personal BI to corporate BI • Faster development than in UDM • Prototyping by end-users • Easier changes of model • Reduction of cost in developing the full BI solution
Positioning of BISM • MOLAP is much more complex than PowerPivot, but it offers greater scalability • ROLAP is even more limited, but it scales above 50TB space • PowerPivot models can grow up to 2GB which is the limit set by SharePoint if they want to be shared among others. Otherwise, only the memory is the limit • BISM comes in the middle and fills the space between MOLAP and PowerPivot • For the space way above the 50TB there are new ColumnStore indexes (in the relational engine) 50 TB ROLAP ColumnStore 5 TB Scalability MOLAP 100Gb BISM 2Gb PowerPivot Usability source: Thomas Kejser, SQLCAT
Current Limitations in “Denali” • Two projects for building a BI Semantic Model • Future plan is to integrate these into 1 model • Use Vertipaq as an SSAS storage • Use MDX scripts in tabular projects • DAX queries are not supported in multidimensional projects • and thereby Power Viewer, which uses DAX to retrieve data from the model
Beyond Denali • BI Semantic Model features • Role playing dimensions • Translations • Actions • MDX Scripts • Realtime over Oracle, Teradata, DB2… • Programmability • BISM object model • MDX query support for Realtime • Write back
Wrapup • BISM is not a replacement for UDM • DAX is not a replacement for MDX • Column store databases offering blazing fast performance • Every model has its advantages • BI architects must decide when to apply which model • BISM v2 in not complete, expect changes!
Questions • Mail to albert@qbids.nl