620 likes | 959 Views
An Agile Approach to Building & Managing Data Warehouses A Briefing by WhereScape. Mary Edie Meredith , Sr. Technical Analyst - maryedie@wherescape.com. Why do Data Warehouse Projects struggle ?. Inaccurate business requirements - #1 problem IDC Poor development productivity
E N D
An Agile Approach to Building & Managing Data Warehouses A Briefing by WhereScape Mary Edie Meredith, Sr. Technical Analyst - maryedie@wherescape.com
Why do Data Warehouse Projects struggle ? • Inaccurate business requirements - #1 problem IDC • Poor development productivity • Slow development cycles • High cost of resources • High TCO • Poor documentation – usually the last thing that is considered & never up to date. • Poor data quality • HIGH RISK Gartner notes that over 50% of data warehouse projects fail or go wildly over budget
Where did they go wrong? – one real problem is the “Big Bang” project approach “Incremental Data Warehouse Development – The Only Way to Fly” Bill Inmon, Jan 8, 2009, (BeyeNetwork) • “There are many reasons the ‘Big Bang’ approach doesn’t work … “but at the heart is inability of the development analyst to gather requirements in the manner prescribed by the SDLC” • “End users of analytical systems need to know what the possibilities are before they can articulate the requirements.” The goal is NOT to build a Data Warehouse, but rather… • Deliver real value • Create a solution that is adaptable because responding quickly to change brings competitive advantage • Create a process to develop and maintain the solution that is trustworthy and sustainable
How would agile proponents approach the problem? From the agile manifesto: //agile • Early, frequent, and continuous test and delivery of valuableworking software (every 2 wks-2mos). • Welcome changing requirements, even late in development. • Business people, developers work together daily throughout the project. • Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done. • The most efficient, effective method of conveying information to and within a development team is face-to-face conversation. • Continuous attention to technical excellence and good design enhances agility. • Simplicity--the art of maximizing the amount of work not done--is essential. • At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.
What is uncomfortable about this approach? • The further out in time, the less a project team can say about what will be accomplished. • An agile approach can break the rules. • Agile implementers sometimes wrongly assume you can break ANY rule. • Shortcuts do not equal Quality Pragmatism • Classic trade-offs for project managers - Schedule/ Scope/ Resources/ Quality – agile leaves little wiggle room. • Does not lend itself to outsourcing, distributed teams. • Having a close working relationship with business users does not solve the difficulty determining requirements. And ….
If I could deliver something meaningful in weeks DON’T YOU THINK I WOULD HAVE, ALREADY.
What really works using agile “The WhereScape Way” • A Governance structure • Strategy, Architecture, Roadmap, Standards • Goals, sponsors, infrastructure, data governance …. • New Development Paradigm for delivering data - RED • ETL tools are great for moving data, but RED can do DW part better. • Integrated Development using one metadata driven tool. • Do the data delivery in the database. • Incorporate Business Rules into data delivery process • Iterative workshops with business users • Use REAL DATA for flushing out requirements (RED enables this) • Track all issues discovered, especially data quality
Agile in Operation Business User Sessions Live Data Workshop • Integrate analysis, design, creation, data delivery, deployment, iteration • Useful even if you just need to provide the presentation layer • Feedback from business users on live data part of the development process
Speeding up the development by leveraging metadata, embedding best practice methods dim_customer_key dss_update_time
Star schema creation scenario – start with load table Source Warehouse R R R R Native RDBMS, ODBC accessible, Files Oracle, SQL/Server, Teradata, DB2
RED Browser Mode Actions Drag and Drop Target Area Choose connection and filtering Metadata Browsing Connections Results
Star schema creation scenario – start with load table Source Warehouse R R R R Native RDBMS, ODBC accessible, Files Oracle, SQL/Server, Teradata, DB2
Drag and Drop Example: load table “create and load” metadata
Drag and Drop Example: load table results create generated load script execution
Drag and Drop Example: load table results Display Data create generated load script execution
Stage table creation scenario – the stage table Source Warehouse Foreign dimension Keys, lookups R R R Source table join R
Add columns from load_order_line (Drag and Drop) Load_order_header Column metadata
Add columns from load_order_line (Drag and Drop) prevents duplicate column names Load_order_header Column metadata
Add FK cols to Stage Table – Drag and Drop dim_* Drag and drop Dimension table keys
Column Transformations – Business Rules, Computed Fields, String Manipulation, Type Conversion, Null handling,…
… then specify the Join statement Numerous joins supported add appropriate clauses
…indicate the business key to identify SK in DimensionPrompts if column names match
Fact table creation scenario – Sales Fact table Source Warehouse R R R R
Metadata leveraged to create the code Dimension tables are created with “zero” row for unknowns Transformation for quantity column Join metadata
Auto generated stored procedure code … • Keeps all the data movement in the database • Provides consistent variable naming, coding best practices • Utilizes custom parameters you can embed in metadata • Includes error checking and rollbacks • Preserves the metadata for easy modification • Can augment with custom procedures • Includes features best practices for various object types • Can handle slowly changing dimensions (all three types) • Procedure provided to populate and update time dimension • Handles code for surrogate keys, update and life-span dates • Creates Unknown Row for each dimension table • Accounts for missing dimension key matches in source data Let’s advance developers can skip the mundane Allows less experienced developers to be productive
Next Step – Business User review • Easy vehicles to show this to Business users: • Output table data to Excel • Stress test with SSAS cube
Create a SSAS Cube for Business User Eval Drag and Drop Fact to OLAP Cube target Creates OLAP dimensions Creates OLAP measure group
Create a SSAS Cube for Business User Eval Slice and Dice in Analysis Services
Scheduler to manage objects and data flow Run in parallel
Scheduler to manage objects and data flow Run in parallel