510 likes | 523 Views
This presentation discusses the key industry challenges in data warehousing and business intelligence (BI) and explores the specific challenges faced by the DoD and federal government. It includes a case study of integrated data architecture solutions and provides modeling tips. The presentation also outlines SI International's top 10 key ingredients for success in data warehousing and BI for the federal government.
E N D
Overcoming Challenges to Implementing DW and BI Applications in DoD & the Federal Government Tom McCullough DAMA-NCR Presentastion 13 May 2003
Tom McCullough, CCPSr. Enterprise Data Architect, SI International, Inc. • Quick resume: • B.S. Mathematics, university of Pittsburgh, Summa cum Laude, Phi Beta Kappa • MBA (concentrations: finance & CIS), University of Rochester, Wm. E. Simon Graduate School of Business Administration • Eastman Kodak co., Management Services Division, Kodak Park, Rochester, NY: systems designer, systems analyst, project manager, DBA, chief corporate manufacturing software package analyst • Part time adjunct instructor State University of New York at Brockport, Dept of Comp Sci • The Riggs National Bank of Washington, D.C. Lead data base analyst • Emhart/ATI, Baltimore, Md.: Principal consultant, Data Administration • CSX Technology: Technical Manager and consultant to: CSX Transportation, Inc.; Westinghouse ESG (now Northrop Grumman); T. Rowe Price Investment Technologies • Zurich financial services inc: VP of business intelligence & data warehousing • SI Consulting, Inc./A subsidiary of SI international, Inc. Reston, Va.: Sr. Enterprise Data Architect. Client list includes: • US Dept of State, Bureau of Consular Affairs, Consular Systems Division • US Dept of Interior, National Park Service • US Dept of Energy • US Army National Guard, National Guard Bureau
What We’ll Cover Today • What are some key industry challenges today in data warehousing and BI? • What challenges are specific to the federal government space? • SI International’s experience delivering integrated data architecture solutions: A case study. • Modeling tips from SI International, Inc. • SI International’s top 10 key ingredients to achieve success in data warehousing and BI for Uncle Sam.
What Are Some of Today’s Key Industry Challenges Today in Data Warehousing and BI? ---
How Risky Are Data Warehouse Projects? • Only15% of companies in a 2002 cutter survey of 142 companies call their data warehousing efforts to date a major success. • Data warehouse systems are complex and expensive. • 60% to 90% of all data warehouse projects either fail to meet expectationsor areabandoned.
How Risky Are Data Warehouse Projects? • Technology remains part of the problem: only 27% of survey respondents were confident about the current state of data warehouse technology, while 58% were ‘cautious.’ • Stovepipe data marts and web silos!! • Dealing with users with clout who must have their “cubes” by tomorrow or else!! • Source: InformationWeek, January 9, 2003, “data warehouses are still high risk.”
Why Have There Been So Many Failures?Take a Look at Just a Few… • Failing to manage user expectations. • “The greatest cause of disappointment is expectations.” - Benjamin Franklin. • Poor data quality in the source systems or an entire lack thereof. • Over focus on the technology; Under focus on the business problem. • Lack of aligning objectives with the business’s strategy. • A lack of understanding of fundamental temporal relational modeling. • Normalization requires knowledge of the inherent structure of the data. • Attempting to model the whole enterprise first before delivering.
Why Have There Been So Many Failures?Take a Look at Just a Few…(cont’d) • Lack of sufficient sponsorship at the enterprise level. • Lack of adequate training and education in the specialized tools. • Inadequate data stewardship. • Not tied to management by objectives. • No incentives in place to foster it. • Attempting to manage decision support projects like operational transaction processing system projects. • Inadequate attention to user security concerns. • Failing to get the source system subject data experts recognized and fully on board.
Why Were DW’s Created in the First Place? • Long before data warehouses were popularized by Bill Inmon, there was a concept known as the corporate information center. • Technical “chauffeurs” existed to get at the data from upper management. • Attempts by new management to make an “end run” around the unwieldy legacy stovepipe systems. • Trying to address the needs of true ad hoc users.
What Are the Practical Reasons for Why You Need a Data Warehouse? • It functions as • A repository for historical snapshots to support strategic analysis and decision support • A flexible source of data for use by mining, reporting and OLAP tools • The ultimate detailed historical data source for drill downs • It eliminates • Massively redundant data extractions • Conflicting and confusing technical meta data • The need for redundant, uncoordinated data cleansing processes • Source: Claudia Imhoff, “building a BI environment on a shoestring” presentation to Dama international 2003 symposium
Where Do the ODS & DW Fit into The “Information Supply Chain”?
What Is the Principal Technical Challenge?The Extract, Transformation, Load, & Metadata Problem Modeling Tool • Warehouse • Model Metadata Repository & Transfer Mechanism (2) Source Definitions Physical Model Logical Model Source Defs. Source- Target Maps Load Stats (4) Source to Target Maps (3) Table Definitions (5) Map & Transform Info Data Staging Tool Source Systems (8) Load Stats (6) Extracted Data (5a) Physical Info (table spaces, etc.) (7) Transformed Data Data Warehouse
Is there a difference between Inmon’s normalized data warehouse and Kimball’s “staging area”? • Challenge: how to get the data into one place where it’s: • Consolidated from various sources, • Arriving at different times. • Kept in abeyance before updating production data and publishing it to users. • Reality: if the data warehouse = the staging area, there will be a need for a pre-staging area. 9-May-03 Proprietary and Confidential 7
What Are the Benefits of Dimensional Modeling? • Star schema’s are easy for users to understand. • Measures. • Dimensions. • What’s not in the model is obvious. • The process facilitates requirements validation. • The process is a good way to quickly formulate a preliminary target data model.
What Are Some Problems with Dimensional Modeling? • The focus of a pure star schema (ie., Dimensional data model) is primarily on user requirements and ease of data access, not data update nor the inherent structure of the data. • Fabian will have more to say. Stay tuned.
This Has Given Rise to the Great Debate • Bill Inmon (father of the data warehouse). • Advocates normalized enterprise data warehouse. • Time variant data structures. • Don’t be concerned about requirements too much up front. • “Build it and they will come.” • Ralph Kimball (dimensional modeling): • Start with clearly defined user requirements. • Build a subject area at a time based on a “star schema.” • The data warehouse is the union of all the data marts but only if the dimensions conform across all the fact tables. • Before loading the facts and dimensions, first concentrate on the “staging area.”
What Other More Fundamental Issues Are at Stake? • Tension between operational databases and warehouses. • Dr. Joseph firestone coined the more fundamental problem of “dynamic integration”: • Data sources and data targets must be identified, designed, and maintained. • Both the source and target designs are moving targets thus complicating the process of configuration management. • Get the source system support people and their users on your side when implementing and maintaining any DW solution. • Data warehousing is an attempt at data integration.
What Are the Sociological Challenges to Achieving Integration? • Data integration requires business integration. • Political and “Ricebowl” obstacles to overcome: • “Attempts to integrate will expose inconsistencies/incompatibilities that create human insecurities and power issues.” • Source: Information Management Directions: The Integration Challenge, Section 5.7.7 on “sociological implications”, NIST Special Publication 500-167. • Every database represents a budget. • Every action will create an equal and opposite reaction. • Incumbent contractors / obfuscation hurdles.
How Are Enterprise Application Portals Affecting the Industry? • Portals make data staleness painfully obvious and more visible to all. • Portals are driving organizations to respond more quickly to information, but… • Processing and analyzing data in near real time are still recognized as the holy grail of enterprise data management, yet are still difficult to achieve. • This has also given rise to the notion of web services.
Are “Web Services” a True Panacea or Simply the Latest Silver Bullet? • Beware: numerous vendors and authors are promising web services as the ultimate answer to real-time data analysis. • The “fly in the ointment” remains the challenges of the working out the semantics of the underlying data elements. • Attend Fabian’s presentation tomorrow.
What About the Debates about ETL vs. EAI vs. EII to Achieving Integration? • High volumes argue in favor of ETL tools. • EAI tools work better with low volumes. • Some ETL vendors have more data cleansing and transformation technology in their products and pull from wide variety of sources (ex., Data junction). • Enterprise Application Integration is more about integrating processes (ex., Biztalk). • Enterprise Information Integration tools only move data when the application or user requests it.
What Challenges Are Specific to the Federal Government Space? ---
What Are Uncle Sam’s DW and BI Challenges? • Achieving stated but elusive goals: • Government Performance and Results Act of 1993 • Clinger/Cohen act of 1996 • HIPAA • Internal regulations, policy & procedure memos • Available funding • Contract vehicles: contractors must overcome legal hurdles to get into the game to begin with
AR-25-1 (information management) AR 380-19 (information systems security) AR 380-5 (army information security program) AR 380-53 (information systems security monitoring) AR 71-9 (materiel requirements) AR CADM (army core architecture data model) Army enterprise architecture guidance document (AEAGD) Army enterprise vision (20 July 1993) CMM (capabilities maturity model) Computer security act of 1987 DA Pam 26-6-1 (army acquisition planning for information systems) DA Pam 73-7 (software test and evaluation guidelines) DII COE (release 3.3) security features users guide (SFUG) for windows NT 4.0 DII COE (release 3.3) security functional designers guide (SFDG) for HP-UX 10.20 DII COE (release 3.3) security functional designers guide (SFDG) for Solaris 2.5.1 manual) DII COE (release 3.3) security functional designers guide (SFDG) for windows NT 4.0 DII COE configuration management plan DII COE user interface specifications v 4.0 DoD 5200.28 (security requirements for automated information SYSTEMS (AISS) 21 mar 88) DoD 8320.1-DOD data administration DoD 8320.1-M data administration procedures DoD 8320.1-m-1 (data standardization procedures) DoD 8510.1-M (department of defense information technology security certification And accreditation process (DITSCAP) application DoD 8910.1-M (DoD procedures for management of information requirements) FIPS 184 (integration definition for information modeling (Idef1x)) IEEE 12207 (standard for information technology) IEEE std 829-1983 (standard for testing documentation) Implementing DOD standard data elements JTA (joint technical architecture) JTA-A (joint technical architecture-army V6.0) Mil-HNBK-61 (configuration management guidance) Army Regulations, Directives and Requirements
DoD’s Information Management Challenges • Literally thousands of stove-piped systems. • Existing heterogeneous infrastructure with which to integrate. • Data Mgmt Driver: The the net-centric environment movement toward “content” that users demand. • From a recent RFP: “For effective “content,” there must be an approach to make data visible, accessible, trusted, understandable, and interoperable.” • Source: Solicitation DCA100-03-T-4029.
The ARNG ODS and DW A Case Study Tom McCullough
Current ARNG Information Technology Environment 16 Databases 10 Platforms 54 States & Territories 140+ Applications 6,000 DA Civilians 24,700 Military Technicians 352,000 Total 23,000 AGR 12 Languages
Application Access AKO Entry Point Application Specific Web pages GuardNet Web Entry Point Access to Application Specific Web pages and Applications Administered by Application Administrator Access to GuardNet Web Administered by ARNG Guardnet Staff Access to AKO Administered by AKO Staff
AIS/CIO Review IT RCB IT RCB • Into CM • Assign PM • Assign Work Order Functional has an Idea All Final Docs Functionally Technically Security BPR CAPR APB Wrksht Signed SRS DM, DD Models IT ROI Source Code Final DM Final DD ITPD ITROIRE PMP Test Plan Test Scripts Test Cases ASEIG Review Analysis Alpha Test Design & Code Accredit Release and turn over to Maintenance NGB-AIS-AP Development Process Beta Test Legend: BPR – Business Process Re-engineering PMP – Project Management Plan CM – Configuration Management PM – Project Manager SRS – Software Requirements Specification DM – Data Model DD – Data Dictionary ITPD – Information Technology Project Description ITROIRE – Information Technology Return on Investment Rationale Estimate
What Is SI International's Approach to Delivering Integrated Enterprise Information Asset Managementfor the US ARNG • Use and adaptation of SI’s data warehousing methodology • What Effect Does the Web Have on the DW? • High Level Components of the ARNG DW Architecture • DW and ODS Tips from SI International’s Consulting Practice • General Modeling Tips from SI International’s Consulting Practice • Lessons Learned: Top 10 Ingredients for Success in DW and BI
National Guard Bureau Web Enabled Data Warehouse Vision Operational Equipment Additional DMOSQ Funding Budgetary Readiness Readiness Areas Supply Extract Extract Extract Extract Extract Extract STAGING AREA Preparation and Transformation Prepare PREPARED DATA Load Logical Data Model Operational MetaData (Technical) Integrate ENTERPRISE DATA WAREHOUSE Informational MetaData (Business) Drill Through Load Operational MetaData Publish PRE-AGGREGATED DATA MARTS Informational MetaData Gateway Create Data Web API Server Server MDDB Server R/OLAP Server Web Browser OLAP Tool R/OLAP Tool Data Mining Query Consume NGB Strategic Push Button NGB NGB User NGB IS Admin Users NGB IS Admin Users Power User Knowledge Workers What Effect Does the Web Have on the DW?
Data Warehousing Tips from SI International • The data warehouse serves as a staging area for the data marts. • Load the incoming data into a “pre-staging” area. • All fact tables should be appropriately “time-variant relational data structures.” • Re-generate the OLAP cubes from the data warehouse. • Use regulations and policy directives as an aid to gleaning the inherent structure of the data and the business requirements. • Use the star schema approach when the true normalized structure is not known and insufficient time exists to discover it.
Data Warehousing Tips from SI International • Adhere to normalization for the sake of integrity and ease of updates to the data warehouse when possible. • Add a physical creation date & time column to all tables. • Default it to the system date with a column constraint of NOT NULL. • Look for features in the BI tool suite that address the problem of “slowly changing dimensions.” • Incorporate them into modeling efforts.
Tips and Challenges on Modeling Tom McCullough
Tips on Modeling from Experience • Data modeling is about classification and the challenges involved. • Level of generalization in the data model drives the functional design, but not all notational formats support generalization on the process side. • Critical decisions are involved in determining what level of abstraction to choose. • “Use of subtypes is single most constructive creativity technique in data modeling because they make different levels of generalization possible” (source: Graeme Simsion, university of Melbourne, April 2003). • Decide whether the modeling process or the modeling product is the objective. • How do you obtain consensus for enterprise level definitions? • Dealing with alternative designs.
More on Tips and Challenges for Modeling • Challenge of getting time to validate the model w/users. • Challenge of making the model understandable for programmers and discussing the implications. • Look for unwarranted duplicate relationship paths. • Managers: beware of letting the developers do the data modeling.
More on Modeling Challenges • Avoiding being perceived as the bottleneck. • Model management and collision analysis. • Synchronizing the conceptual model with changes made to the logical model and physical model after the forward engineering has been done.
More Modeling Issues and Challenges • Need clear distinction of the responsibility between the data modeler and the DBA. • Tradeoff between suites and best of breed tools. • Integrating the meta data between multiple tools in a collaborative development. • Adhere to a notational format and the rules inherent to it.
Which Modeling Notational Format to Use? • IDEF1x – relatively simple; Still in use although on the way out • UML • In vogue; • In use by major software vendors; • Single toughest challenge with UML: for a given problem, which UML diagram(s) applies and which diagram symbols should be used to model the solution? • Oracle Designer: unique to the tool; Can easily be confusing • Information Engineering – relatively simple; No longer “in vogue”; Perhaps easiest for users to understand
Stick to Fundamentals • “He who ignores the fundamentals is destined to design an overcomplicated solution!” • ANSI 3 Schema Model for managing information: internal; Conceptual; And external schema. • Adopt and adhere to consistent naming conventions: • Classification of data elements: • prime words; Class words; And qualifying words. • Object class term; property term; qualifier term; representation term.
Remember the Fundamentals • The questions asked in the user/client fact finding interview are paramount. • “If you listen carefully, the patient will reveal the diagnosis.” • Don’t re-invent the wheel. Look at other sources that have tried to solve the same problem. • Institute for Defense Analysis models. • Creative data modeling requires creative process and activity modeling.
Lessons Learned: What Are SI International’s Top 10 Ingredients for Success in DW and BI as of April 2003? Tom McCullough
Top 10 Ingredients for Success in the BI Space • #10: The data warehouse is only one component of the organization’s enterprise architecture a DW is not the solution to every IT problem. • #9: IT is not the solution to every problem. Remember the lessons from industrial engineering, “streamline first, then automate!” • #8: It takes perseverance to succeed in data warehousing. • “To persevere, one must first experience frustration…” • Source unknown.
Top 10 Ingredients for Success in the BI Space • #7: Know and apply the fundamentals of data management. • #6: Identify The Acceptance Criteria. • “It All Comes Down to a 17 Inch Screen and a Mouse.” • Source: Dr. Claudia Imhoff, 2003 Dama International Symposium, Orlando, Florida. • #5: Software Engineering is a team sport and data warehousing is a specialized version of software engineering. • One person cannot do it all. Look for team players.
Top 10 Ingredients for Success in the BI Space • #4: In addition to ability, you must have your mind and heart into the job. When deciding upon recruiting for your team, remember the words of Bela Carosi (coach of Mary Lou Retton), “You musta’ looka’ for the fire in the eyes.” • #3: Learn from others; our patterns are what we bring to the table. • #2: Study the tenets & methods of other professions: • Building architecture & construction many analogies here. • The doctor / patient relationship. • Journalists: They’re experts at asking follow-up questions on the fly.
Top 10 Ingredients for Success in the BI Space • #1: “What you have just heard was actually performed by professionals. Do not attempt this at home.” • Lesson: if you want world class results, hire/contract world class professionals. You always tend to get what you pay for. • Remember: when buying medical, dental, information asset management products and services, parachutes, or life preservers beware of vendors offering price discounts or other incentives and gimmicks to get you to buy their products and services.
Suggestions for Further Readingon Critical Thinking for Data Architects • Temporal Data and the Relational Model, by Date, Darwen, and Lorentzos • Introduction to Database Systems, 7th edition, ISBN: 0201385902 • The Data Webhouse Toolkit: Building the Web-enabled Data Warehouse, ISBN 0471-37680-9 • The Data Warehouse Toolkit, 2nd Edition, ISBN 0471200247 • The Common Warehouse Metamodel, ISBN: 0-471-20052-2 • The Federal Enterprise Architecture Framework • “The DoD framework” (previously “the C4ISR framework”) • Oracle 8i and Microsoft SQL server 2000 integration, ISBN: 0-7645-4699-6
Suggestions for Further Readingon Critical Thinking for Data Architects • Data Warehousing: Using the Wal-Mart Model, ISBN 155860684X • Information Management Directions: The Integration Challenge, NIST special publication 500-167 • Guidelines to Implementing Data Resource Management, Version 4, ISBN: 0-967-6674-1-0 • Guide on Data Entity Naming Conventions, NBS special publication 500-149 • Information Engineering for the Practitioner, ISBN: 0-13-464579 • Larry Greenfield’s website: http://www.dwinfocenter.org • Dr. Joe Firestone’s website: http://www.kmci.org/ • Fabian Pascal’s website: http://www.firstsql.com/dbdebunk/