630 likes | 918 Views
Mission Critical BI in an EDW 2.0 world. By Jason Perkins & William O’Shea. About the Presenters. Will O’Shea Data Warehouse Consultant at AMS Systems, currently assigned to the NHS Over 20 years of experience in consulting, focusing on Data Warehousing and Oracle RDBMS.
E N D
Mission Critical BI in an EDW 2.0 world By Jason Perkins & William O’Shea
About the Presenters • Will O’Shea • Data Warehouse Consultant at AMS Systems, currently assigned to the NHS • Over 20 years of experience in consulting, focusing on Data Warehousing and Oracle RDBMS. • Highlights from Career • Worked with Gene Amdahl • Development Lead at Oracle • Oracle Consultant at Blue Cross • DWH Consultant at Pfizer • Data Warehouse consultant at Johnson & Johnson; awarded Innovation award for “Data warehouse in a box”. • Technical Architect at the NHS, awarded Champagne award by Atos Origin for implementing RDM process • Education • MBA from University of Manchester (MBS) • BSc from University of Waterloo, Canada. • Oracle Certified Professional (10g DBA) • Subject matter expertise Financial, Healthcare & Pharmaceutical. Jason B Perkins • Chief Architect on the Secondary Uses Service (SUS) Programme for BT Health • Over 10 years working on some of the world’s largest and most complex Business intelligence and Data warehouses programmes. • Highlights from Career • Lead BI Architect for BT Retail • ODM Consumer Reference Set (CRS) • BT Mobile Data Strategy • National Name & Address Database (NAD) • Solution Architect for (Swift) BT Marketing Data warehouse. • Qualifications • TDWI Certified Intelligence Professional (CBIP) • DAMA Certified Data Management Professional (CDMP) • Subject matter expertise across Health, Retail and Telecoms
Agenda • MCBI - The Business View • Mission Critical Architecture • Mission Critical Method * BREAK * • Mission Critical Principles & Operating Model • Mission Critical Building Blocks • Summary
Business Intelligence? • “In God we trust. Everyone else bring data?” • W. Edwards Deming
Types of BI? • TDWI “Three threes of Performance Dashboards”
Mission Critical BI • Mission Critical BI :- • “Systems that merit mission-critical status are those that affect a range of business processes, and warrant service-level agreements that align the business needs with system performance. Gartner ” • Do not confuse the many other faces and names in BI:- • Real Time / Right Time BI • Real time integration / Data freshness • On Demand BI • High availability BI
Mission Critical BI – Why? Pervasive Business Intelligence • Business 2.0 • Always on • Self service • Joined up - 360 view of the customer. • Available everywhere • BI/DW • no longer a back office function / system. • Cost of entry in most industries. What you do with it remains a competitive differentiator. Operational Decision support Zero Latency Enterprise Globalisation • “Enterprises compete by using up-to-date information to progressively remove delays to the management and execution of its critical business processes. Gartner”
Mission Critical BI – Real World Examples • E-everything – 24x7 • E-Government • Health care monitoring – • Commissioning / Payment for quality / results • Referral to treatment times • Payment for Quality • Telecommunication • Bandwidth management / Mobile Coverage • Order to fulfilment MIS • Retail – Just-in-time inventory
Mission Critical – Challenges Mission Critical BI is not new!So why is it so hard? • “Pace of change” keeps increasing … • Continued Pressure on IT Spend – estimated ~20-30% reduction in 2009/10. • BI / DW keeps evolving – • Many of the original mission statements of BI/DW remain elusive. • Increased demand for integrated information – e.g. unstructured, social media, etc. • Data Explosion – “Data volumes will grow exponentially while CPU capacity will increase only geometrically. Gartner”. • Security of all the information is paramount • BI/DW remains a predominately “build” activity.
Mission Critical – EDW Scale • Number of different views need to be considered when quantifying the challenge ahead. • Varies by industry, type of business and geography.
EDW Architectures Independent Data Mart Virtual Data Warehouse Hub & Spoke Central Data Warehouse
Mission Critical DW Architecture BI Applications Operational Tier Integration Tier Performance Tier Staging Tier Business Process OLTP & ODS Systems Excel XML Business Applications
Mission Critical DW Architecture BI Applications Operational Ad hoc Query Data Extracts Alerts Analytics Dashboards Reporting Web Services Auditing Security Performance Tier Metadata Services Resource Management Consolidation Marts Aggregates OLAP Sandpits Recovery / Restart Integration Tier Problem Resolution Data Quality SCD Manager Fact Loader Adoption Services Survivorship Conforming Error Management Workflow Monitor Staging Tier Job Scheduling Community Management Loader Services Change Data Capture Validation Services Customer Tracking Business Process External MDM Unstructured Excel XML Business Applications
Serviceability Architecture • Maintenance- load/event tracking & reporting • Resilience – • Ability to stop individual parts of the system, restart • Robustness - error tracking, handling & reporting • Automation – lights out / zero touch • Flexibility - meta data/reference data driven • Robustness - error tracking, handling & reporting Operationally ready
Nursery MethodRaison d'être • BI/DW requires an Iterative approach. Mission critical is no different. • New deliveries and changes must:- • Protect core services. • Facilitate “pace of change” • Support re-use • Allow experimentation • Adapt to changing requirements • Involve users • Developed “Nursery” Method in response • Supports front room and back room deliveries • Reduce cycle time. • “Nurseries” (AKA Sandboxes) – user initiated ETL process • Production of Transformation and Load templates
Nursery MethodGrowing a system Requirements Analysis & Design Planning Implementation Nursery Planting the seed Initial Planning Transplant Implementation Evaluation Testing Delivery Everyone, business & developers, learns from both development and use of the system Introduces the ability to act on what has been learned Leaves Nursery when mature, and is transplanted into production – not re-grown.
Nursery MethodThe Growing Stages • Analysis & Design • Integrated Small teams • Design specification • Implementation • Did I mention Integrated Small teams • Elaboration & Implementation specification • Testing • By both business and developers • Delivery • Delivery to users • Evaluation • User feed back • Quality reports • Transplant • Final delivery should match 1.1 somewhat • Initial Planning • High level overall plan • How long are iterations • What deliverables are required • High level requirements • Planning • Integrated Small teams • Detail Iteration plan • Higher level plan for 2 & 3 iteration • Requirements • Requirements for iteration • Should fit within iteration • or get broken into small bits • Start with lowest level
Nursery MethodCreating a Nurturing Environment • First Steps • Initial Plan • Overall objective? • By when? • Define Roles • Assign Roles • Business roles? • User roles? • Supplier roles? • Commitment from those in the roles!! • Define communication • Meetings? • Frequency • Types • Periodic weeding - Scrum • Watering sessions – Stand-ups • others • Roles involved in each • Tight Integration of roles • Documentation from each role – small • Frequency of documentation • Type of documentation • Define outputs from each iteration/phase • Plan for cycle • Roles involved at what stage • Requirement documentation – small • Initial Schedule • Length of iterations • Potential number of iterations Building the Nursery
Nursery MethodCreating a Nurturing Environment • Next Steps • Define system requirements • Number of data suppliers ? • Amount of data? • Number of users? • Size of infrastructure required • Define First few iterations • Cycle 1 • Get data ? • Load data ? • Extract data ? • Distribute data ? • Cycle 2 • Build some validation? • Extract validation outcome? • Cycle 3 • Build in some robustness? Size of Plot Growth cycles
Nursery MethodPrincipals • Focuses on: • Users – Not Processes and tools • Working systems – Not exhaustive documentation • Working together – Not adhering to the contract • Delivering what is wanted – Not following a plan • Adapting to Change – Not Issuing Change Requests • Both the Left side and the Right side must exist, but the emphasis is • On the Left – Not the Right. • Benefits • Cycle time from months to weeks, even days! • Improve quality – leverage “Lessons Learned”, as they happen • Reduce: • Cost • Delivery time • Happy Users !!! • Our Real world examples • Large International pharmaceutical company (delivered in Months not years) • Healthcare Provider (implemented new functionality in days)
Nursery MethodGreenhouses - Sandboxes • What Constitutes a Sandbox • What are the characteristics • How do they need to act & interact • Users’ play areas • Using the “Build Once – Use Many” principal users can • Load new data sets • Create new tables • Create new reports • Play with existing data • Needs Work Flow Management – Key in a Mission Critical system • Isolates the effects of users’ play areas from production • Does Not isolate the data. • User can access production data • Other users can access their data • Mechanism should exist to release into Production – if required • Sandboxes are not Production; but rather a pathway to production • Sandboxes are used as design, not as code
Nursery MethodExploitation – Managing “live” changes • Differentiate between types of changes – one size does not fit all. • Determines how many Cycles it should stay in the Nursery. • Minor Changes to Reports and Semantic Layer • Category 1 – • Changes to pre-canned reports / extracts • Do not require changes to Semantic layer • Category 2 – Deployment to live of new reports created by information analysts. • Category 3 – Simple changes to the Semantic layer. • New Reports • Category 4 – Creation of new reports / extracts. • Changes Impacting semantic layer • Category 5 – • Other changes to the semantic layer. • creation of new derived fields (not to be performed in the universe). • Category 6 – Changes to pre-canned reports / extracts that require changes to semantic layer. • Category 7 – Creation of new semantic later.
Mission Critical Adaptability • “Pace of change” – keeps increasing … • Its all about speed • Speed of change • Speed of information access • “Design for change” – as opposed to “built to last” • Design to: Build Once – Use Many • Enter “Business Rule Management” (BRM) • Process–Business Process Management (BPM). • Rules – Decision logic • Data – Decision variables Process Rules Data
Mission Critical Adaptability • Design for change • Process– • Business Process Management for operational decision support • Process flow or workflow for tactical / strategic decision support • Rules – • Rules Drive the Process • Declarative approach • Business user managed • Descriptive • Data – • Meta/Reference data Enforces the Rules • Thus data Drives the Process • Contextual • Volatile • Flexible Process Rules Data
Mission Critical Adaptability • Examples of rules management …
Operational PrinciplesFlexibility • Users require “flexibility” without the need to re-develop. • Need to be able to Add and/or Modify • Load Process • Application processing • Error processing • Validations • Recipients of Load statistics (DQ, Errors, etc) • Encryption Process • Load and use new data (joined to existing data) • As and when they want to • Without new code !!!
Operational PrinciplesMaintenance • Operational team require the ability to configure and monitor processes. • View ETL progress (real time) • Loads • Load steps • Load Statistics • Reporting and tracking by: • Load • Business Unit • Time • Status • Performance and statistic reporting. • Error tracking & maintenance against Load • Control Loads if needed • Start (automatically & manually) • Hold/Pause all or part of a load(s) • Stop Loads • Restartable (from where needed)
Operational PrinciplesAdministration • Business require Knowledge • System should output meaningful & understood Error messages. • Specific Messages throughout application, so business know the area. • Visibility of Operations Error maintenance. • Ability to feed into process • Statistical Real-time reporting & tracking of loads. • Know what data has been loaded • Know how much data has been loaded • Know what stage each load is at. • Know what business units have loaded data.
Operational PrinciplesResilience • Business & Operations requireA robust & resilient system • Loads may be automatically restarted from where they were stopped/failed (as required) • Each load job, step and statistic has start/end times and status • ETL checks status of job to determine if it needs to/can be run. • Fatal errors need manual intervention before they may be rerun. • Performance and statistic reporting • Self initiating Loads
Operational PrinciplesSummary • Data Warehouses require “Metadata Driven Processing” (MDP) • What can be MDP and what can’t? • Loading Data – Types of loads, Source to target • Load Control – Starting, stopping, branching, etc • Errors & Messages – effects of & reporting on, • Validation (DQ) – how, what, when & reports • Encryption – how, what & when • Reverence Data Processing • How? • Where can MDP help your DWH? • What Metadata does MDP need? • Feed MDP into Development stream? • Educate developers to use it • Educate user to request it. • Educate the business to use it.
Metadata Driven ProcessingEnterprise Warehouse Operational Components (EWOC)The Concept Business Unit Instance of Job Project Severity Validation Outcome Load Step Work-Flow Message Validation Rules Data Integration & Quality Team? Load Statistics • Application • Users • Admin
Metadata Driven Processing The Metadata Driven ETL • Infrastructure • Storage Allocation • CPU Allocation • Memory Allocation • Sand Pit Business Unit • Severity • Fatal • Error • Warning • Information • Job • Collection of Steps • Has a start and an End • … Schemas Project • Validation • Lookups • Static values • Data Quality • Patterns • Linkage • Man/Ops • Etc • Message • Validation • Load • Processing • Job Step • Get data • Load staging • Load Atomic • Human Interaction • Etc. BU Job Cause & Solution • Source • SUS • Cancer Registry • Internal • Target • Internal • BO / OBI BU Job Step • BU • Validation • Additional • Less the non-mandatory Schema Source Schema Target • Type • CSV File • XML • Table • Report/Extract
Metadata Driven Processing The Jobs - ETL • Metadata Driven Processing (MDP) • Definition of Jobs • Loads are specific instances of a Job • Build re-usable modules • Metadata driven code, promote MDP • Quicker time to delivery, develop and test once • Add/Change source and target by changing MDP data • Add/Change ETL by changing MDP data Job • Validation • Lookups • Static values • Data Quality • Patterns • Linkage • Man/Ops • Etc Job Step Source Target • Pick Lists • Defined by Reference data • Examples: • Date range validation • Foreign Key Lookups • Mandatory / Optional • dd-mm-yyyy vs. yyyy/mm/dd • Y/N vs. 1/0 • Type • CSV File • XML • Table • Report/Extract
Metadata Driven Processing The Messages – Driving force • Fatal – Fails the load • Invalid file format • Error – Load keeps going • Max number of errors? • % of load rather than # • Warning – not following rules • Date format etc. • Information – no affect on load • Dates out of range • Visit after treatment • Supports MDP • Feeds Metadata Driven ETL • Should be used throughout ETL • Failure Checks/Traps • Exceptions • Reporting (DQ & Validation) • Each error/trap/exception has a unique Message ID • Headings/Titles/Text • Severity can be changed • Changes processing when changed • Severity • Fatal • Error • Warning • Information Message Grouping • Message • Validation • Load • Processing • Usage • Error reporting • Textual objects • Information Messages • Load Reporting • Load Control Cause & Solution • Helps with future occurrences • Updated & Maintained
Metadata Driven Processing Data Quality & Linkage • Supports MDP • Key in any system, but more so in a MC one. • Use Metadata to Drive process • Important right people get right data • Quickly • Rules Based Validation • Data Quality Validation • Linkage Validation • New rules can be added/removed • When needed(no code required) • Businesses users decide to add rules • From pick list • Defined using building blocks • Severity of failure of rule can be changed • When needed(no code required) • Businesses users decide severity • Business • Unit • Canadian Office • Finish Office • UK Office Audit Data Reports Metadata Driven ETL Validation Outcome Validation Rules Lookups Static values Range Conversions Patterns Linkage Man/Ops Etc Reports Data Integration & Quality Team?
Metadata Driven Processing Encryption • Supports MDP • Encryption is simply a specific Instance of a Job • Built to perform Encryption • New Encryption Types can be added but do require code • New columns to be encrypted can be added by simply adding metadata, no code. • Keys can be stored or added at run-time • AES128 • Triple DES • Look-up • Home-Grown? • Name • DoB • ID # Encryption Type Audit Data Source Data Column Type Parameters (keys) Metadata Driven ETL Target Data Source & Target Definition
Metadata Driven Processing Reference Data Management • Supports MDP • New reference data can be added without new code • Different BUs can have different data but though same RDMT • Different Import types are catered for • Different Table Types are catered for e.g. K-Type 1, 2 & 3, Home grown, etc. e.g. CSV, XML, Excel Table Types Business Unit Import Types Reference Table Definitions BU Sources Metadata Driven ETL Column Definitions Source Attribute Definitions Source Definitions Target Data Audit Data Source Data
Metadata Driven Processing Extensibility • Extending the Mission Critical Data Warehouse. • Most BI/DW requirements are not green field. • Extending existing is a key design objective. • Build Once – Use Many • Adding new data sources • Change existing data sources • Data linage - Metadata • Where data has come from • Where it has gone • What has happened to it along the way • Impact Analysis • New exploitation (analysis and reporting) of existing DW • Adding new exploitation capabilities to DW Audit Data
Technology Drivers • Examples of technology features supporting Mission Critical BI. • Analytics outside Data warehouse • BI Web Services • High Availability Data Warehousing • Real-Time Data Warehousing • Master Data Management (MDM) • From “TDWI Best Practice Report, Next Generation Data Warehouse Platforms, By Philip Russom”.
Mission Critical Performance • Leaving the Nursery (or Sandbox) • Productionise the code • Performance!! • Balance • Brute force – • MPP (medium to high volumes / complexity / users) • SMP (low volume / complexity / users) • Performance Layer • BI tool and RDBMS calibration • Speed of ETL vs. Need of Retrieval - when to do something and when to not. • 80 – 20 rule • Selective Denormalisation • Selective Pre-Joins • Aggregates and Summaries – are they always needed DWA no?, SMP yes? • OLAP • Performance metadata • Row counts • Elapsed time
Mission Critical Administration • Resources Management • Not all BI is mission critical – phew! • Prioritise resources for Mission Critical • BI Applications • Back office workload
Information Lifecycle Management • Not all information is mission critical – phew! • Many benefits to segmenting information by its usefulness to the business. • Performance / Throughput • Cost effective • Prioritisation of resources • ILM - Number of levels • Separate active and non active data. • Compression non volatile data • Read only for historic • ILM - Intelligent storage based on usage of information. • Automation is a key (emerging) requirement for supporting MCBI.