
Data Mashups Defined and the Differences from Traditional Data Integration Approaches

Byron Igoe, Product Manager, InetSoft Technology. Presented for the Minnesota Chapter of The Data Management Association.


Presentation Transcript


  1. Data Mashups Defined and the Differences from Traditional Data Integration Approaches • Byron Igoe, Product Manager, InetSoft Technology • for the Minnesota Chapter of The Data Management Association

  2. Presentation Outline • Traditional Data Integration • ETL & EII • Spreadmarts • Meaning and Origins of Data Mashup • In-Memory Data Federation • Combining Formal and Informal Data Sources • Differences from Traditional Techniques • Data Management and Data Mashup • Data Warehousing • Meta Data • Data Governance • Enterprise Content Management • Data Modeling

  3. Traditional Data Integration: ETL • Extract, Transform and Load • a well-understood convention for preparing data for analysis • reasons for being: • reorganization • conversion • cleansing • mapping • pre-calculations of business metrics • transformations • aggregations • save processing resources during analyses • ensure data quality

  4. ETL (continued) • Data warehousing trends • growth in number of data sources • range of 3 to 30 “official” data sources currently • users desire to use data sources discovered via the Web • using reports or feeds from vendors & partners • growth in data¹ • Annual global data production: 5 exabytes • 5,000,000,000,000,000,000 – 18 zeroes • Equivalent of 37K US Libraries of Congress • Almost 1 GB per person on earth • Growing at 30% per year • 1 zettabyte by 2010 – 21 zeroes • what are the data sizes and growth rates at your enterprise? • ¹Source: UC Berkeley study, 2003

  5. ETL (continued) • Limitations and challenges of traditional ETL & data warehousing • cumbersome to add data sources • bottleneck for ever increasing user demands • overkill for some data sources, especially transient ones • rigidity of business metric definitions • inflexibility to process changes • lag in data availability

  6. Traditional Data Integration: EII • Enterprise Information Integration • same principle as ETL, creating a single data source from many • arose from data warehouse’s limitation of data timeliness • difference from data warehousing: a virtual data warehouse • benefits: • data is "real-time" • more adaptable to changes in definitions/processes • limitations: • bottlenecks and slow turnaround time to incorporate changes to definitions and processes • still relies on IT efforts to respond to demands

  7. Spreadmarts • The “bane” of the business intelligence specialist! • the use of spreadsheets to store copies of enterprise data • arose from users’ frustrations with • lack of any business intelligence front-end application, or • too-hard-to-use versions of early (and some current) applications • graphical charting limitations of a BI app • tedious change request form processes • slow turnaround times to change requests • not having a way to bring in external data

  8. Spreadmarts (continued) • The current position in business intelligence • now BI vendors and enterprises are learning to accept the spreadsheet as a very user-friendly tool • but still aim to rein in the use of spreadmarts per se because they: • are error prone • institutionalize labor inefficiency • can become corrupted • have data size limitations • are not ideal for sharing • keep knowledge “locked up” • don’t have governance controls • violate Sarbanes-Oxley requirements • in search of the “right” solution

  9. Meaning and Origins of Data Mashup • A mashup is “the creation of a new work from two sources that were not initially designed to be combined” • first used in music in the early 2000s, especially rap music • next used in the Web 2.0 environment, especially Web portals, like My Yahoo • next entered the enterprise application space, limited to “screen scraping” • now we define “data mashup” as “data transformation and integration that can be done by users with minimal skills” • examples: • joining two datasets that weren’t previously combined • creating a new business metric on the fly • importing external or user-created data
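The three examples on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `orders` and `regions` datasets, the field names, and the `mash_up` helper are all hypothetical and do not come from the presentation or from any InetSoft product.

```python
# Hypothetical data: "orders" stands in for an enterprise source,
# "regions" for a small user-created list. All names are illustrative.
orders = [
    {"customer": "Acme", "revenue": 1200.0, "cost": 800.0},
    {"customer": "Birch", "revenue": 500.0, "cost": 450.0},
]
regions = [
    {"customer": "Acme", "region": "Midwest"},
    {"customer": "Birch", "region": "West"},
]


def mash_up(orders, regions):
    """Join the two datasets on customer and add a margin metric."""
    region_by_customer = {r["customer"]: r["region"] for r in regions}
    result = []
    for order in orders:
        row = dict(order)
        # Join: attach the user-supplied region (None when there is no match).
        row["region"] = region_by_customer.get(order["customer"])
        # New business metric, defined on the fly by the user.
        margin = (order["revenue"] - order["cost"]) / order["revenue"]
        row["margin_pct"] = round(100 * margin, 1)
        result.append(row)
    return result


mashup = mash_up(orders, regions)
```

Each row of `mashup` carries the original enterprise fields plus the imported `region` and the derived `margin_pct`, without either source having been designed for the other.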

  10. The Differences from Traditional Techniques • it’s the middle ground between “IT controlled” and “User defined” • “collaboration” is born • in the traditional models, IT defines how multiple sources are connected • painstaking process, especially for mergers, process changes, etc. • with data mashup, the connections are created on the fly

  11. The Business Case Benefits of Mashups • Higher ROI on BI investment • higher success rate of deployment due to higher: • end-user satisfaction • usage rates • adoption rates • greater number of actionable learnings leading to: • more sales and/or • greater efficiency • increased speed of: • decisions • competitive responses • reactions to customer feedback

  12. The Business Case Benefits of Mashups • Lower TCO • reduced personnel needed to support a BI solution • end-user self-service • save on change request processes • save on manpower to code requests • reduce report request backlog • reduced number of highly-skilled analysts or DBAs needed to satisfy business demands • end-users meet their own needs more often

  13. The Advent of In-Memory Data Federation • Moore’s law, increasing power, lower costs of CPU & memory allow in-memory transformation, pre-aggregation and caching • Enables data mashup as well

  14. The Trade-offs of these Techniques

  15. Combining Formal and Informal Data Sources • how a data mashup works • similar to what a user is doing in Excel • creating new formulas • bringing in external data • doing what-if scenarios • live connections to the enterprise sources are maintained • data mashup "refreshes" automatically on each use • can save it to a shared folder for re-use and collaboration
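One way to picture "live connections" and "refreshes on each use" is a mashup that stores how to read its sources rather than a copy of their data. The sketch below assumes a hypothetical `Mashup` class, an in-memory dictionary standing in for the enterprise source, and a user-entered growth rate as the informal what-if input; none of these names come from the presentation.

```python
class Mashup:
    """Hypothetical mashup: keeps readers for its sources, not copies of data."""

    def __init__(self, sources, combine):
        self.sources = sources  # name -> zero-argument callable returning current data
        self.combine = combine  # formula applied to the freshly read data

    def evaluate(self):
        # Every use re-reads each source, so the result is never a stale copy.
        current = {name: read() for name, read in self.sources.items()}
        return self.combine(**current)


# Stand-ins for a formal enterprise source and an informal user input.
enterprise_sales = {"Q1": 100.0, "Q2": 120.0}
user_assumption = {"growth_rate": 0.10}


def forecast_formula(sales, assumption):
    # What-if scenario: scale each quarter by the user's growth assumption.
    factor = 1 + assumption["growth_rate"]
    return {quarter: round(value * factor, 2) for quarter, value in sales.items()}


forecast = Mashup(
    sources={
        "sales": lambda: dict(enterprise_sales),
        "assumption": lambda: dict(user_assumption),
    },
    combine=forecast_formula,
)

first = forecast.evaluate()
enterprise_sales["Q2"] = 150.0  # the enterprise source changes
second = forecast.evaluate()    # the same mashup now reflects the change
```

The contrast with a spreadmart is that nothing was copied and pasted: the second evaluation picks up the new Q2 figure because the definition, not the data, is what gets saved and shared.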

  16. Data Management and Data Mashup • Relative to Data Warehousing • data mashups can be seen as an expedient alternative to data warehousing in some cases • data mashup can be a precursor to data warehousing • allows quick and inexpensive experimentation • when satisfied, codify the mashup into a data warehouse for performance benefits

  17. Data Management and Data Mashup • Relative to Impact on Pre-Aggregation • pre-aggregation improves downstream processing • with many traditional techniques: • pre-aggregations are designed before reports and dashboards • usage of pre-aggregated data is explicit • in the data mashup model, pre-aggregation can be built into the engine
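The difference between explicit and built-in pre-aggregation can be sketched as follows. This is an assumption about how such an engine might behave, not a description of any vendor's implementation: the `Engine` class, its `build_preagg` and `query` methods, and the sample rows are hypothetical. The caller always asks for a grouping; the engine decides whether a stored summary can answer it.

```python
from collections import defaultdict

detail_rows = [
    {"region": "East", "product": "A", "sales": 10.0},
    {"region": "East", "product": "B", "sales": 5.0},
    {"region": "West", "product": "A", "sales": 7.0},
]


def aggregate(rows, group_fields, value_field="sales"):
    """Sum value_field over rows, grouped by group_fields."""
    sums = defaultdict(float)
    for row in rows:
        sums[tuple(row[f] for f in group_fields)] += row[value_field]
    return [dict(zip(group_fields, key), **{value_field: total})
            for key, total in sums.items()]


class Engine:
    """Hypothetical engine that uses pre-aggregates without being told to."""

    def __init__(self, rows):
        self.rows = rows
        self.preaggs = {}         # tuple of group fields -> summarized rows
        self.used_preagg = False  # records how the last query was answered

    def build_preagg(self, group_fields):
        self.preaggs[tuple(group_fields)] = aggregate(self.rows, group_fields)

    def query(self, group_fields):
        # Sums are additive, so a finer-grained summary can be rolled up.
        for fields, summary in self.preaggs.items():
            if set(group_fields) <= set(fields):
                self.used_preagg = True
                return aggregate(summary, group_fields)
        self.used_preagg = False
        return aggregate(self.rows, group_fields)


engine = Engine(detail_rows)
before = sorted(engine.query(["region"]), key=lambda r: r["region"])
scanned_detail = not engine.used_preagg

engine.build_preagg(["region", "product"])
after = sorted(engine.query(["region"]), key=lambda r: r["region"])
served_from_preagg = engine.used_preagg
```

The report-side call `engine.query(["region"])` is identical before and after the summary exists, which is the sense in which usage of pre-aggregated data stops being explicit. The roll-up shortcut is only valid for additive measures such as sums; averages or distinct counts would need different handling.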

  18. Data Management and Data Mashup • Importance of Meta Data • creation of mashups depends on meta data: data type compatibility • transformation options, like grouping and aggregation, differ based on the field type
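A minimal sketch of how meta data could drive these two decisions is shown below. The type names, the `AGGREGATIONS_BY_TYPE` table, and both helper functions are hypothetical choices made for illustration; a real tool would define its own type system and rules.

```python
# Hypothetical rule table: which aggregations make sense for each field type.
AGGREGATIONS_BY_TYPE = {
    "number": ["sum", "average", "min", "max", "count"],
    "date": ["min", "max", "count"],
    "string": ["count", "distinct count"],
}

# Hypothetical meta data for two sources: field name -> field type.
orders_meta = {"customer_id": "number", "order_date": "date", "amount": "number"}
customers_meta = {"id": "number", "name": "string"}


def can_join(left_meta, left_field, right_meta, right_field):
    """Allow a join only when both fields have the same declared type."""
    return left_meta[left_field] == right_meta[right_field]


def aggregation_options(meta, field):
    """Return the aggregations to offer the user for this field."""
    return AGGREGATIONS_BY_TYPE[meta[field]]


join_ok = can_join(orders_meta, "customer_id", customers_meta, "id")
join_bad = can_join(orders_meta, "order_date", customers_meta, "id")
amount_options = aggregation_options(orders_meta, "amount")
date_options = aggregation_options(orders_meta, "order_date")
```

With rules like these, a user interface can hide a "sum" option for a date field and refuse to connect a date to a numeric key, which is what lets users with minimal skills build valid mashups.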

  19. Data Management and Data Mashup • Relative to Data Governance • data mashups are a major improvement over spreadmarts • data quality is enhanced • live data is used • no copying & pasting • changes to master data mappings take effect immediately • data security is enhanced • security defined at source system level • all derived mashups automatically secured • overcome limitations of Excel’s security • concern: is it giving too much power to users? • no different than what users will do inevitably in Excel
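The claim that derived mashups are secured automatically follows if every mashup reads through a source that already applies the security rule. The sketch below assumes a simple row-level rule by region; `secured_source`, `regional_totals`, the user dictionaries, and the sample rows are hypothetical.

```python
def secured_source(rows, allowed):
    """Wrap raw rows so that every read applies the row-level rule."""
    def read(user):
        return [row for row in rows if allowed(user, row)]
    return read


sales_rows = [
    {"region": "East", "amount": 100.0},
    {"region": "West", "amount": 250.0},
    {"region": "East", "amount": 50.0},
]

# Security is defined once, at the source level.
read_sales = secured_source(
    sales_rows,
    allowed=lambda user, row: row["region"] in user["regions"],
)


def regional_totals(user):
    """A derived mashup: it contains no security logic of its own."""
    totals = {}
    for row in read_sales(user):
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


east_user = {"name": "pat", "regions": {"East"}}
all_regions_user = {"name": "lee", "regions": {"East", "West"}}

east_view = regional_totals(east_user)
full_view = regional_totals(all_regions_user)
```

Because `regional_totals` can only reach the data through `read_sales`, a user restricted to one region never sees the other region's totals, and the mashup's author did not have to remember to enforce that.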

  20. Data Management and Data Mashup • Relative to Enterprise Content Management • data mashups are re-usable & shareable • data integrity is always maintained • more easily embedded in other applications, portals

  21. Data Management and Data Mashup • Relative to Data Modeling • data mashups situated on top of various data sources • data mashups can use: • physical tables • pre-defined SQL, or • logical models
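The three kinds of input can be illustrated against a small in-memory SQLite database using Python's standard `sqlite3` module. The `orders` table, the stored query, and the `LOGICAL_MODEL` mapping of business names to columns are hypothetical, and the logical model here is deliberately reduced to a simple renaming layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (cust_id INTEGER, amt REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (1, 15.0), (2, 7.5)],
)

# 1. Physical table: read the columns exactly as they are stored.
physical = conn.execute(
    "SELECT cust_id, amt FROM orders ORDER BY rowid"
).fetchall()

# 2. Pre-defined SQL: a query written and approved ahead of time.
PREDEFINED_SQL = (
    "SELECT cust_id, SUM(amt) FROM orders GROUP BY cust_id ORDER BY cust_id"
)
predefined = conn.execute(PREDEFINED_SQL).fetchall()

# 3. Logical model: business-friendly names mapped onto physical columns.
LOGICAL_MODEL = {"Customer": "cust_id", "Order Amount": "amt"}


def query_logical(fields):
    """Select fields by business name; the mapping supplies the real columns."""
    columns = ", ".join(f'{LOGICAL_MODEL[f]} AS "{f}"' for f in fields)
    cursor = conn.execute(f"SELECT {columns} FROM orders ORDER BY rowid")
    names = [description[0] for description in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


logical = query_logical(["Customer", "Order Amount"])
```

In all three cases the mashup sits on top of the source and leaves it unchanged; the logical model simply spares the user from knowing that "Order Amount" is stored as `amt`.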

  22. Questions and Discussion
