Data Mashups Defined and the Differences from Traditional Data Integration Approaches. Byron Igoe, Product Manager, InetSoft Technology, for the Minnesota Chapter of The Data Management Association.
Presentation Outline • Traditional Data Integration • ETL & EII • Spreadmarts • Meaning and Origins of Data Mashup • In-Memory Data Federation • Combining Formal and Informal Data Sources • Differences from Traditional Techniques • Data Management and Data Mashup • Data Warehousing • Meta Data • Data Governance • Enterprise Content Management • Data Modeling
Traditional Data Integration: ETL • Extract, Transform and Load • a well-understood convention for preparing data for analysis • reasons for being: • reorganization • conversion • cleansing • mapping • pre-calculations of business metrics • transformations • aggregations • save processing resources during analyses • ensure data quality
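The extract, transform and load steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any particular ETL product: the `RAW_ORDERS` rows, the `warehouse` dictionary and all function names are hypothetical stand-ins for a source system and a target store.

```python
from collections import defaultdict

# Hypothetical raw rows, standing in for an extract from a source system.
RAW_ORDERS = [
    {"region": " east ", "amount": "100.50"},
    {"region": "West", "amount": "20"},
    {"region": "EAST", "amount": "n/a"},   # dirty value, dropped during cleansing
    {"region": "west", "amount": "30.25"},
]


def extract():
    """Extract: return the raw rows (a real job would read a file or database)."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: cleanse, convert types, map names and pre-aggregate by region."""
    totals = defaultdict(float)
    for row in rows:
        try:
            amount = float(row["amount"])        # conversion
        except ValueError:
            continue                             # cleansing: skip unparseable amounts
        region = row["region"].strip().title()   # mapping to a canonical region name
        totals[region] += amount                 # aggregation / pre-calculated metric
    return dict(totals)


def load(totals, target):
    """Load: write the pre-calculated metric into the target store."""
    target["sales_by_region"] = totals


warehouse = {}
load(transform(extract()), warehouse)
```

Because the aggregation runs once at load time, later analyses read `warehouse["sales_by_region"]` directly instead of rescanning the raw rows, which is the resource saving the slide refers to.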
ETL (continued) • Data warehousing trends • growth in number of data sources • range of 3 to 30 “official” data sources currently • users want to use data sources discovered via the Web • using reports or feeds from vendors & partners • growth in data¹ • Annual global data production: 5 exabytes • 5,000,000,000,000,000,000 bytes (18 zeroes) • Equivalent of 37K US Libraries of Congress • Almost 1 GB per person on earth • Growing at 30% per year • 1 zettabyte by 2010 (21 zeroes) • what are the data sizes and growth rates at your enterprise? ¹Source: UC Berkeley study, 2003
ETL (continued) • Limitations and challenges of traditional ETL & data warehousing • cumbersome to add data sources • bottleneck for ever-increasing user demands • overkill for some data sources, especially transient ones • rigidity of business metric definitions • inflexibility when business processes change • lag in data availability
Traditional Data Integration: EII • Enterprise Information Integration • same principle as ETL: creating a single data source from many • arose from the data warehouse’s limited data timeliness • difference from data warehousing: a virtual data warehouse • benefits: • data is "real-time" • more adaptable to changes in definitions/processes • limitations: • bottlenecks and slow turnaround time to incorporate changes to definitions and processes • still relies on IT efforts to respond to demands
Spreadmarts • The “bane” of the business intelligence specialist! • the use of spreadsheets to store copies of enterprise data • arose from users’ frustrations with • lack of any business intelligence front-end application, or • too-hard-to-use versions of early (and some current) applications • graphical charting limitations of a BI app • tedious change request form processes • slow turnaround times to change requests • not having a way to bring in external data
Spreadmarts (continued) • The current position in business intelligence • BI vendors and enterprises are now learning to accept the spreadsheet as a very user-friendly tool • but still aim to rein in the use of spreadmarts per se because spreadmarts: • are error prone • institutionalize labor inefficiency • can become corrupted • have data size limitations • are not ideal for sharing • keep knowledge “locked up” • don’t have governance controls • violate Sarbanes-Oxley requirements • in search of the “right” solution
Meaning and Origins of Data Mashup • A mashup is “the creation of a new work from two sources that were not initially designed to be combined” • first used in music in the early 2000s, especially rap music • next used in the Web 2.0 environment, especially Web portals, like My Yahoo • next entered the enterprise application space, limited to “screen scraping” • now we define “data mashup” as “data transformation and integration that can be done by users with minimal skills” • examples: • joining two datasets that weren’t previously combined • creating a new business metric on the fly • importing external or user-created data
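The three examples on this slide (joining datasets, creating a metric on the fly, importing user-created data) can be illustrated with a minimal Python sketch. The datasets, field names and the `mash_up` function are hypothetical and only meant to show the idea, not how any specific mashup tool works.

```python
# Hypothetical enterprise data (e.g. exported from a CRM system).
crm_sales = [
    {"customer": "Acme", "revenue": 1200.0},
    {"customer": "Globex", "revenue": 800.0},
    {"customer": "Initech", "revenue": 500.0},
]

# Hypothetical user-created data (e.g. typed into a spreadsheet by an analyst).
user_headcount = {"Acme": 40, "Globex": 10}


def mash_up(sales, headcount):
    """Join the two datasets on customer and derive a new metric on the fly."""
    result = []
    for row in sales:
        employees = headcount.get(row["customer"])
        if employees is None:
            continue  # inner join: skip customers missing from the user's data
        result.append({
            **row,
            "employees": employees,
            # New business metric that exists in neither source.
            "revenue_per_employee": row["revenue"] / employees,
        })
    return result


combined = mash_up(crm_sales, user_headcount)
```

Here `combined` holds one row each for Acme and Globex with the derived `revenue_per_employee` field; Initech is dropped because the user-supplied data has no matching entry.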
The Differences from Traditional Techniques • it’s the middle ground between “IT controlled” and “User defined” • “collaboration” is born • in the traditional models, IT defines how multiple sources are connected • a painstaking process, especially for mergers, process changes, etc. • with data mashup, the connections are created on the fly
The Business Case Benefits of Mashups • Higher ROI on BI investment • higher success rate of deployment due to higher: • end-user satisfaction • usage rates • adoption rates • greater number of actionable learnings leading to: • more sales and/or • greater efficiency • increased speed of: • decisions • competitive responses • reactions to customer feedback
The Business Case Benefits of Mashups (continued) • Lower TCO • reduced personnel needed to support a BI solution • end-user self-service • save on change request processes • save on manpower to code requests • reduce report request backlog • reduced number of highly-skilled analysts or DBAs needed to satisfy business demands • end-users meet their own needs more often
The Advent of In-Memory Data Federation • Moore’s law: increasing power and lower costs of CPU & memory allow in-memory transformation, pre-aggregation and caching • this enables data mashup as well
Combining Formal and Informal Data Sources • how a data mashup works • similar to what a user is doing in Excel • creating new formulas • bringing in external data • doing what-if scenarios • live connections to the enterprise sources are maintained • data mashup "refreshes" automatically on each use • can save it to a shared folder for re-use and collaboration
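One way to picture the "refreshes automatically on each use" behavior is a mashup that stores its definition (where to fetch data and which formula to apply) rather than a copy of the data. The sketch below assumes that design; the `Mashup` class, the `inventory` list and the formula are all hypothetical.

```python
class Mashup:
    """A saved mashup stores how to fetch and combine data, not a copy of it."""

    def __init__(self, fetch_source, formula):
        self.fetch_source = fetch_source  # callable returning the current source rows
        self.formula = formula            # user-defined calculation applied per row

    def run(self):
        # Re-read the source on every use so results reflect current data.
        return [self.formula(row) for row in self.fetch_source()]


# Stand-in for a live enterprise source.
inventory = [{"sku": "A1", "units": 5, "price": 2.0}]

stock_value = Mashup(
    fetch_source=lambda: inventory,
    formula=lambda r: {"sku": r["sku"], "value": r["units"] * r["price"]},
)

first = stock_value.run()
inventory[0]["units"] = 8      # the source system changes
second = stock_value.run()     # the same saved mashup reflects the change
```

This contrasts with a spreadmart, where the numbers are pasted in once and go stale. Saving the `Mashup` object's definition to a shared location would let colleagues re-run it against the same live source.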
Data Management and Data Mashup: Relative to Data Warehousing • data mashups can be seen as an expedient alternative to data warehousing in some cases • data mashup can be a precursor to data warehousing • allows quick and inexpensive experimentation • when satisfied, codify the mashup into a data warehouse for performance benefits
Data Management and Data Mashup: Impact on Pre-Aggregation • pre-aggregation improves downstream processing • with many traditional techniques: • pre-aggregations are designed before reports and dashboards • usage of pre-aggregated data is explicit • in the data mashup model, pre-aggregation can be built into the engine
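As a rough illustration of pre-aggregation "built into the engine", the sketch below computes an aggregate the first time it is requested and reuses it afterwards, so the caller never refers to a pre-aggregated table explicitly. The `MashupEngine` class and its `scans` counter are hypothetical, and cache invalidation when the source data changes is deliberately left out.

```python
class MashupEngine:
    """Toy engine that transparently caches group totals after first use."""

    def __init__(self, rows):
        self._rows = rows
        self._cache = {}   # (group_field, value_field) -> {group: total}
        self.scans = 0     # number of full passes over the detail rows

    def total(self, group_field, value_field):
        key = (group_field, value_field)
        if key not in self._cache:
            # First request: scan the detail rows once and keep the aggregate.
            self.scans += 1
            totals = {}
            for row in self._rows:
                group = row[group_field]
                totals[group] = totals.get(group, 0) + row[value_field]
            self._cache[key] = totals
        return self._cache[key]


engine = MashupEngine([
    {"region": "East", "sales": 10},
    {"region": "West", "sales": 5},
    {"region": "East", "sales": 7},
])

dashboard_a = engine.total("region", "sales")  # triggers one scan
dashboard_b = engine.total("region", "sales")  # served from the cached aggregate
```

Both dashboards ask the same question in the same way; only the engine knows that the second answer came from a stored aggregate.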
Data Management and Data Mashup: Importance of Meta Data • creation of mashups depends on meta data, such as data type compatibility • transformation options, like grouping and aggregation, differ based on the field type
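A small sketch can show how meta data might drive a mashup tool: field types decide whether two fields can be joined and which aggregations are offered. The type names, the `AGGREGATIONS_BY_TYPE` table and both functions are assumptions made for illustration, not taken from any product.

```python
# Hypothetical mapping from field type to the aggregations a tool might offer.
AGGREGATIONS_BY_TYPE = {
    "number": ["sum", "average", "min", "max", "count"],
    "date": ["min", "max", "count"],
    "string": ["count", "distinct count"],
}

# Hypothetical meta data for two sources: field name -> field type.
orders_meta = {"customer_id": "number", "order_date": "date", "amount": "number"}
contacts_meta = {"id": "number", "name": "string", "signup": "date"}


def can_join(left_meta, left_field, right_meta, right_field):
    """Two fields are join-compatible here only if their types match."""
    return left_meta[left_field] == right_meta[right_field]


def aggregation_options(meta, field):
    """Return the aggregations offered for a field, based on its type."""
    return AGGREGATIONS_BY_TYPE[meta[field]]
```

With this meta data, `can_join(orders_meta, "customer_id", contacts_meta, "id")` is true, while joining `customer_id` to `name` is rejected, and `name` is offered counting options but not `sum`.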
Data Management and Data Mashup: Relative to Data Governance • data mashups are a major improvement over spreadmarts • data quality is enhanced • live data is used • no copying & pasting • changes to master data mappings take effect immediately • data security is enhanced • security defined at source system level • all derived mashups automatically secured • overcome limitations of Excel’s security • concern: is it giving too much power to users? • no different from what users will inevitably do in Excel
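The claim that derived mashups are "automatically secured" can be pictured as follows: if row-level security is enforced where the data is read, anything built on top only ever sees permitted rows. This is a minimal sketch under that assumption; `SecuredSource`, the user names and the region rule are hypothetical.

```python
class SecuredSource:
    """Toy source that applies row-level security before releasing any rows."""

    def __init__(self, rows, allowed_regions_by_user):
        self._rows = rows
        self._allowed = allowed_regions_by_user

    def rows_for(self, user):
        # Unknown users get an empty permission set, hence no rows.
        allowed = self._allowed.get(user, set())
        return [row for row in self._rows if row["region"] in allowed]


def regional_totals(source, user):
    """A derived mashup: it only sees rows the source releases to this user."""
    totals = {}
    for row in source.rows_for(user):
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals


sales = SecuredSource(
    rows=[
        {"region": "East", "sales": 10},
        {"region": "West", "sales": 5},
        {"region": "East", "sales": 7},
    ],
    allowed_regions_by_user={"alice": {"East", "West"}, "bob": {"West"}},
)

alice_view = regional_totals(sales, "alice")
bob_view = regional_totals(sales, "bob")
```

The mashup function contains no security logic of its own, yet `bob_view` never includes East data, because the restriction lives in the source.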
Data Management and Data Mashup: Relative to Enterprise Content Management • data mashups are re-usable & shareable • data integrity is always maintained • more easily embedded in other applications and portals
Data Management and Data Mashup: Relative to Data Modeling • data mashups are situated on top of various data sources • data mashups can use: • physical tables • pre-defined SQL, or • logical models
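The three kinds of input named on this slide can be illustrated with Python's built-in `sqlite3` module. The `orders` table, the `sales_model` view and the query text are hypothetical; in particular, representing a logical model as a SQL view with business-friendly column names is just one simple way to sketch the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 100.0), (2, "Acme", 50.0), (3, "Globex", 75.0)],
)

# 1. Physical table: read the stored rows directly.
physical = conn.execute(
    "SELECT id, customer, amount FROM orders ORDER BY id"
).fetchall()

# 2. Pre-defined SQL: a query written in advance and reused by mashups.
PREDEFINED_SQL = (
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
)
predefined = conn.execute(PREDEFINED_SQL).fetchall()

# 3. Logical model: business names mapped onto physical columns (here, a view).
conn.execute(
    "CREATE VIEW sales_model AS "
    "SELECT customer AS account_name, amount AS order_value FROM orders"
)
logical = conn.execute(
    "SELECT account_name, order_value FROM sales_model "
    "WHERE order_value > 60 ORDER BY account_name"
).fetchall()
```

A mashup layer sitting on top could treat all three results the same way, as rows with named fields, regardless of how each was defined underneath.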