430 likes | 656 Views
Practical Meta Data Solutions For the Large Data Warehouse DAMA - MN October 16, 2002. Agenda. Introduction Enterprise meta data strategy Data warehousing meta data strategy Project approach for a practical solution Meta data architecture Defining ROI Tools/options for moving forward
E N D
Practical Meta Data Solutions For the Large Data Warehouse DAMA - MN October 16, 2002
Agenda Introduction Enterprise meta data strategy Data warehousing meta data strategy Project approach for a practical solution Meta data architecture Defining ROI Tools/options for moving forward Meta data summary The data quality cycle Questions
Everyone knows Meta data is: • Valuable • The right thing to do • Important for long-term success Meta Data So why isn’t everyone doing it? Introduction
Why isn’t meta data being addressed? Don’t know how to demonstrate the ROI Too complex or we don’t know where to begin Can’t agree on what should be done Market is not mature enough – we’ll wait until it settles down We’ve tried and failed Where do we start? How do we justify it? Introduction
What is the cost of dirty data? “The cost of poor data may be 10-25 percent of total revenues” - Larry English Data quality issues torpedoed a $38 million CRM project - Fleet Bank 1996 The Data Warehouse Institute (TDWI) estimates that poor quality customer data costs U.S. businesses a staggering $611 billion a year in postage, printing, and staff overhead Real Life Insurance Example - $10 million annually • 2 million annual claims with 377 data items each • Error rate of .001 generates more than 754,000 errors per month and over 9.04 million annually • If 10% are critical to fix, there are still over 1 million errors to correct • Even at a conservative estimate of $10 per error, the companies risk exposure to poor claim information is $10 million a year Source: TDWI Data Quality and the Bottom Line Introduction
Limited exposure to the data Greatly increased exposure to the data The data warehouse and Operational Data Stores have revealed the impact of data quality problems on the business Meta data is the component that ties the data warehouse and ODS to the legacy systems and exposes the data quality problems Understanding Data Quality Issues Legacy applications Introduction
Setting boundaries You can’t do it all at once! Enterprise meta data strategy
Data Warehouse and Business Intelligence Enterprise Architecture • EAI • ERM CDISC HL7 Clinical Trials Clinical Patient Care Component Management Document Management Content Management and Portals Business Rules Establishing a meta data strategy Select a practical starting point and build on your success! Enterprise meta data strategy
Starting with the data warehouse Data Quality Systems A practical strategy with real business benefits! Data Warehousing meta data strategy
Why is the DW a good starting point? DW typically focuses on the data that most needs to be shared DW presents the greatest need to understand the data because it is cross-functional Real business benefit can be obtained for a practical investment Existing DW are being re-architected Meta data standards and tools are beginning to have an impact in this area Challenges are created in a best-of-breed development environment Built for today - architected for tomorrow Data Warehousing meta data strategy
MD integration challenges in the DW architecture Data Warehousing meta data strategy
Build and Administer Usage Key Attributes Mappings Performance Integrity Scalability Operator Modeler Designer ETL Jobs Statistics Data Model ETL Tool Navigate Alias Canned Reports Refresh Data Location Expedience Accuracy New/Casual User Executive Frequent User Data Model Business Intelligence Job Schedules / logs Defining meta data for the DW The formal approach to managing the processes and information needed by both business and technical associates to define, build, administer and navigate the DW Building Blocks Example Benefit User Source Define Business Meaning Calculations Lineage Recognition Understanding Trust Casual User Power User New User Heads and documents Spreadsheets ETL mappings Robust, integrated meta data solutions will aid in using, developing and operating the data warehouse source: META Group Meta data solutions are also referred to as: Information Catalogs or digital DNA Data Warehousing meta data strategy
The need for a complete project lifecycle • Document today’s meta data environment • Identify meta data users and its sources • Identify the business drivers • Problems, opportunities and associated costs • Identify requirements • How does meta data address the business drivers • What are the savings • Define objectives and benefits • Develop a meta data architecture • Processes and disciplines • Integration requirements • Delivery and usage requirements • Technology component / tool • Change management process priorities • Develop project plan and cost/benefits • Build with an Iterative release approach Building an Architecture Process Layer Processes and disciplines required to generate and sustain complete and accurate meta data Integration Layer Exchange of meta data between multiple tools across the data warehousing framework Technology Layer Automate processes and integration and provide a single Web based view consolidated across tools Project Approach for a practical solution
1 2 3 4 What are successful projects addressing? • Lineage • What data is available, where it came from and how it’s transformed • Including definition, currency and accuracy • Appropriate information by user type • Easy access to meaningful meta data • “How is it different from what I’m used to seeing?” • Impact analysis across tools and platforms • Impossible to do without a formalized meta data technology solution • Versioning • How has it changed over time? • Moving from development, to test, to production • Live meta data • Meta data is a natural part of the process • A function fails if the meta data is not complete and accurate Project Approach for a practical solution
6 Object Model File Systems Modeling ETL RDBMS Mappings 5 7 Bus Req Meta data architecture for the DW Collection Points: Processes and disciplines Live meta data concepts System of Record Auditing Balancing and controls Data to support lineage Data Cleansing Householding Data to support lineage Change management Source Control Configuration management Source systems Data warehouse Staging area / ODS Data Validation Data Profiling DW & ETL Design Job Execution Migration Mgmt Data Quality BI 1.Collection 2.Integration Business and technical user Interface Meta Data Repository Engine WEB 3.Usage Consolidated view across tools Centralized Meta Data Repository • Manage redundancy • Provide one view across tools Physical Repository RDBMS Oracle, DB2, others Meta Data Architecture
Components of a well-architected DW solution • Short- and long-term requirements are well-defined displaying clear business benefits • Meta tags required to support lineage, balancing and controls, etc., are built into the DW architecture • Live meta data concepts are rigorously followed • A plug-and-play architecture is used • Support for multiple tools in a category, i.e., Informatica and Ab Initio for ETL • Simplifies future transitions to new technology and tools Built for today - architected for tomorrow Meta Data Architecture
8 How is a meta data investment in the DW justified • Reduced total cost of ownership • Impact analysis -- 50 percent of development efforts are spent assessing what is impacted by the change • Configuration and migration management • Eliminate redundant work • Improved user acceptance • What’s available and how to access it • Business user understands the data and where it came from -- how is it different from the operation informational systems? • Risk avoidance • What’s the impact of not delivering the business benefits used to justify the DW • Industry or government regulations • Best practices Defining ROI
The CWM standard as an enabler • The Common Warehouse Metamodel • Model driven architecture • Based on object oriented modeling and development • Building blocks: UML, MOF, XMI and OCL • The model generates: • Repository data structure changes • APIs for models to interoperate • APIs to load and retrieve meta data • CORBA components • Meta data exchange format • Based on the CWM and Standard DTDs • XML data streams following the XMI standard CWM is supported by: • Oracle • IBM • SAS • Adaptive • Hyperion Tools / Options for moving forward
Meta data management repositories Selection Criteria • Manage associations across tools • Search and retrieval across tools – ability to display a single consolidated view • Lineage • Impact analysis • Plain English definitions • Subject areas • Ease of Extensibility • Customizable user interface using an industry standards web solution • Reduce integration cost in a best-of-breed development environment • Enable Plug-and-play tool strategy • Available automated bi-directional meta data exchange bridges / adapters • Template driven retrieval of meta data • Group / role based security • Interoperability between metamodels • Support of industry standards – CWM, HL7, etc. • Support of Federated repositories • Ability to expand beyond the DW • Versioning – extracting a time slice Tools / Options for moving forward
Types of Meta data management repositories • Enterprise – Supports a broad range of functionality including enterprise architecture, data warehousing, business intelligence, component management and others • CA – Advantage(formerly Platinum) • ASG - Rochade • Adaptive Foundation(formerly Unisys) • DW Suite solutions – meta data solutions that are integrated into a suite of tools primarily from a single vendor designed to build and maintain the complete data warehouse framework • Microsoft Repository • DWSoft – Navigator web browser • SAS –Warehouse Administrator • Oracle Warehouse Builder • OWB Repository • Enterprise Data Warehousing and ETL – supports data warehousing, ETL and business intelligence activities typically in a best-of-breed toolset environment • Ab Initio • Informatica • Data Advantage Group – MetaCenter • Ascential – Meta Stage • Modeling – supports development and versioning activates for ER and object modeling • ERwin Suite • Rational Rose • Oracle Designer • Popkin System Architect Tools / Options for moving forward
Options for moving forward with a DW solution • Build from scratch using an RDBMS and custom web delivery application • Implement a repository tool and extend it as needed • Enterprise repository tool • Warehouse suite solutions • ETL tool repository No complete solution exists today. Establish a foundation and gradually develop a complete solution through a series of iterative releases! Tools / Options for moving forward
Development lifecycle Strategy Development Architecture Development Design and Construction 4 - 6 weeks 3 - 4 weeks 6 - 10 months Define business objectives, requirements and benefits Understand standards Research repository tools Define technical objectives/requirements Document capabilities Relate standards, tools, requirements, architecture Develop scope and priorities Define meta data sources Define repository Define hardware platform and software requirements Define meta data integration ETL process Define meta data delivery/display mechanisms Iterative release approach Design and construction of meta data integration ETL Design and construct logical and physical meta data models Hardware platform implementation Test Rollout Tools / Options for moving forward
Pitfalls to avoid Selecting a repository tool without defining requirements first Expecting the repository tool to solve process problems Selecting an architecture that is not extensible or can’t scale Underestimating the effort Relying too much on manual entry Selecting an initial project that does not deliver adequate business benefit Selecting an initial project that is too complex to be practical Many meta data projects fail because they are either too big to be practical or too small to deliver real benefits Meta data summary
Web Based Access DWSoft Web Browser XML and ASP Microsoft Meta Data Services SQL Server 2000 One example of a hybrid solution Cubes, Hierarchy and Levels Field-to-Field Derivation Source-to-Target Mappings, Code Mappings, Business Rules, Data Quality, Contacts and Document Definitions DATA Modeling Business Intelligence OLAP Cubes Business Definitions for DW Attributes Logical & Physical DB Models Access Basic Oracle 8i Code & Data Mapping SQL Server 2000 ER/Win 3.5 Pro Clarity Custom API Interfaced 3rd Party/ API API Staging Area Meta Model API Data Warehouse Integrated API Physical RDBMS Info through OLE DB Codes Frequency Min/Max/Averages Custom API Oracle 8i ETL DML Information Fast, inexpensive, low-risk approach implemented at a major insurance company Ab Initio Meta data summary
Taxonomy Meta data Taxonomy Meta data The DW meta data solution within the enterprise meta data architecture Enterprise Architecture EAI ERM Oracle Data Warehouse Platform • Operational data store • Customer information • Historical information • Security Document/Email Platforms • Documents • Catalogs/digital content • Email KM Vendors • Autonomy • Intraspect • Documentum • BRS Rule Track Oracle Oracle Metadata Repository Oracle Data Unstructured Data Unstructured text and digital assets Data Knowledge Management Engine • Business rules • Content management • Automatic taxonomy generation • Neural net search engine/adaptive learning • Text mining • Personalization • XML, Java, HTML LDAP:support Data Oracle Operational Data Click stream capture Customer profile information Metadata Repository Catalogs, content, documents, web links… ETL Data Management Platform • Centralized meta model • Complex data transformations • Meta data repository • Real-time extractions • XML, Java, HTML, LDAP Metadata Click stream capture EIP Vendors • Viador • Hummingbird • TopTier Flat Files Oracle Analytical data/application launch Oracle MicroStrategy BI Platform • Analytical processing • Graphical visualization • Reports Personalized Portal Access EIP Engine • Browser-based access • Personalization • Common Authentication Proxy • Automatic taxonomy generation • Structured/unstructured info integration • XML, Java, HTML LDAP support Meta data summary
Increasing the odds of success A data warehouse project without a formalized meta data facility has only a one in four chance of being highly successful; still, in the heat of the DW battle, rarely is meta data seen on the front line! Is it worth the risk not to do it? Source: META Group’s industry study: Data Warehouse Scorecard: Cost of Ownership and Successes in Application of Data Warehouse Technology Meta data summary
The data quality cycle • Identify business drivers • Set scope • Define metrics Detect communicate communicate Correctness Defects Integrity Defects Presentation Defects Application Defects * Correct / Repair Measure / Make visible communicate communicate Prevent * How the user applies the data Data quality cycle
9 Data quality rules progression Content and structure Correctness (value based) Integrity (structured based) Focus on data content Focus on data structure Data Quality Rules Governing the Processes Data Cleansing and Transformation Rules Examine and understand data Change the data Inductive Rules Deductive Rules Data Profiling Column profiling Dependency profiling Redundancy profiling Data quality cycle
The data quality cycle Additional activities to deliver data quality • Consolidate source systems to reduce collection points and minimize system interfaces • Consolidate multiple non-integrated legacy application • Source independent data marts from the data warehouse • Consolidate shared data • Use an ODS to shared data across systems • Use reference tables and keys to logically integrate data that must remain distributed • Implement a hub / ODS for data integration • Provide a single source of clean data • Reduce system interfaces Data quality cycle
How meta data supports data quality • The meta data solution manages many components needed to support a sustained data quality effort • Navigation meta data • Lineage • Plain English Definitions • Calculations and transformations • Currency • Identify owners and stewards • Business rules • Help identify redundency • The repository captures and exposes data quality statistics to a wide audience • The repository web interface can provide a mechanism for soliciting feedback Data quality cycle
What is the Value? “Companies that manage their data as a strategic resource and invest in its quality are already pulling ahead in terms of reputation and profitability from those that fail to do so.” Source: Global Data Management Survey 2001 PricewaterhouseCoopers Data quality cycle
Easily drill to: • Target data • Transformations • Additional source data Meta data usage – development samples Data lineage – from source or target List the fields in a source file Example 1a
Meta data usage – development samples Data lineage – data quality review Example 1b
Atlas Source System 1,016,575 Monthly Billing Codes 57% - Atlas Billing Codes 24% - Total Data Warehouse Cyberlife IL 437,296 Monthly Billing Codes 57% - Cyberlife IL Billing Codes 10% - Total Data Warehouse Meta data usage – development samples Data lineage – understanding the data Example 2
Meta data usage – development samples Impact analysis Display all the transformations For a Column in the warehouse or a field from a source system Example 3
DDL RDBMS Tables Columns ETL ETL Design Codes and data mapping application DDL RDBMS Tables Columns Code mappings ETL Invalid or missing code mappings Suspense Meta data usage – development samples Live meta data Example 4
Meta data usage – development samples Integrated view of meta data from multiple sources Extracted from the repository Includes data from Oracle Includes data from ERwin Example 5a
Meta data usage – development samples Integrated view of meta data from multiple sources Example 5b
When is meta data not meta data? Column Column level meta data should be incorporated into the centralized meta data solution for easy display to a wide audience Column definition % of records from each source system Counts and % of unique values for a code Audit and Meta Tags Row Claim Source System Date Source System Transaction Identifier Updated Code Value Row level information should be captured, managed and displayed by the application, i.e., the Data warehouse, data mart or other collection points Example 6
Limited exposure to the data An awareness of data quality issues Impact of problems not easy to see Few sustained activities to correct the problems Greatly increased exposure to the data Impact of problems clearly felt Few sustained activities to correct data problems Awareness of meta data problems Few sustained activities to correct meta data Greatly increase the exposure to meta data Navigation meta data increases data access Lineage Plain English Definitions Calculations and transformations Identify owners and stewards Business rules The importance of a meta data repository Legacy applications Example 7
ROI for meta data Issue Benefit Business Analysts told us they spend up to 70% of their time locating the right information source and resolving multiple versions of the truth At least 20% improvement in the time required to locate and validate information Technical analyst’s and developer’s told us they spend up to 80% of their time finding the data needed to satisfy a request At least 20% improvement of application development and maintenance activities Business and technical analysts told us it requires six to nine months for a new associate to become proficient in using data Reduced learning curve for new associates by 3 months and lower the mentoring required from key resources Lack of complete automated Impact analysis within and across tools creates a serious risk when implementing changes At least 20% reduction in the time required to perform a complete impact analysis and reduce risk of errors when migrating changes to production Quote from the META Group’s series of white papers addressing application delivery strategies “meta data interchange will improve Application Development and maintenance efficiency by up to 30%, and real-time meta data interoperability will enable up to 50% improvement in Application Development and maintenance efficiency.” Example 8
What does correct mean Correctness Rules • Accuracy • Consistency • Completeness • Balancing • Continuity • Precedence • Currency • Duration • Retention • Precision • Granularity Activities Validation Does it match the rules Verification Does it make sense as applied to other reliable sources Accurate Data can be valid but still not be accurate Inspection Can be as simple as spot checking or as thorough data-driven discovery inspection using techniques like: pattern recognition, classification and probability Example 9