IBM Information Server Understanding Your Source Systems
Corporate View of Information Architecture Is Changing • Information is the key to Business Innovation • Organizations highly effective at driving information integration are 5 times more likely to drive value creation • Information architecture can’t exist in a vacuum; it needs to be tied to enterprise architecture • 87% of CEOs believe fundamental change is required in the next two years to drive innovation • Over 60% of CEOs believe their organizations need to do a better job leveraging information Source: 2006 IBM Global CEO Survey
Customer Business Issues • Too much information and not knowing what’s important • Not using demand signals to drive supply chain • Not using customer analysis to tailor marketing and sales • Not leveraging valuable unstructured information • Multiple versions of the truth • Problems managing customer, product and partner interactions • Regulatory compliance inhibited by poor transparency • Lack of trusted information • Incomplete, out-of-date, inaccurate, misinterpreted data • Difficult to understand or control how information is used • Lack of agility • Inability to take advantage of opportunities for innovation • Escalating costs due to inflexible systems and changing needs
Today’s Information Challenge Organic growth, mergers & acquisitions, and composite applications have led to a vast and rapidly growing sea of information sources in our enterprises • Where is my information? • What does it mean? • Where is it used? How Can I Identify and Mitigate Data Risks When I Don’t Even Know Which Data I Have?
Today’s Information Challenge: Many User Roles, Access to All Forms of Information • User roles: Software Architect, Developer, Business Analyst, End Users, Data Architect, IT Admin, Data Admin, DBA • Data sources: relational databases, data warehouses, XML, messages, apps, spreadsheets • Content sources: content repositories, office, e-mail, reports, fax • Extended search sources • 30% of people’s time is searching for relevant information • 30% of development time is copy management • 40% of IT budgets may be spent on integration
Key Enterprise Questions in Understanding Data In information- and process-driven business environments, the ability to understand, standardize, and validate core enterprise information is of strategic importance. In most organizations, core information remains in data prisons, preventing a single, integrated enterprise-wide view of data across applications. • What are the systems of record for Customer, Product, and Item masters? • How can I access a single view of my customer records and use it to enrich my customer interactions? • How do I ensure that quality measures and metrics are consistent for the master data across data sources? • How do we measure product and service effectiveness when our product definitions are not consistent? • How do I ensure that replicated sources are consistent with the system of record? • How do I understand whether the data is valid, accurate, and relevant? • I have multiple implementations of the same application plus custom applications with their own data models; how do I exchange data between different applications? • Why do we re-invent the wheel for product and customer data when we develop a new application?
The “Understanding” of the System • Model levels: Conceptual, Logical, Physical • Roles: Developer, Data Architect • [Figure: example class diagram. Trip Plan (Name : String, Start : Date, Stop : Date) linked 0..* to 0..* with Attraction (Name : String, Addr : Address); Amusement Park (Daily Admission Adult : Currency, Daily Admission Child : Currency); Restaurant (Ethnicity : String)]
The “Reality” of the Underlying Data
Metadata Description | Actual Record Values
Name-Line1 | Robert A. Jones TTE Robert Jones Jr.
Name-Line2 | First Natl Provident
Name-Line3 | FBO Elaine & Michael Lincoln UTA
Address-Line1 | DTD 3-30-89 59 Via Hermosa
Address-Line2 | c/o Colleen Mailer Esq
Address-Line3 | Seattle, WA 98101-2345
Diagram labels: Investor, Account, Position, Custodian, Trustee, Address, Financial Instrument
Common Data Problems
• Lack of information standards: different formats & structures across different systems
Kate A. Roberts, 416 Columbus Ave #2, Boston, Mass 02116
Catherine Roberts, Four sixteen Columbus APT2, Boston, MA 02116
Mrs. K. Roberts, 416 Columbus Suite #2, Suffolk County 02116
• Data surprises in individual fields: data misplaced in the database
Name | Tax ID | Telephone
J Smith DBA Lime Cons. | 228-02-1975 | 6173380300
Williams & Co. C/O Bill | 025-37-1888 | 415-392-2000
1st Natl Provident | 34-2671434 | 3380321
HP 15 State St. | 508-466-1200 | Orlando
• Information buried in free-form fields
WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT 1/4 INCH
WING ASSEMBY, USE 5J868-A HEX BOLT .25” - DRILL FOUR HOLES
USE 4 5J868A BOLTS (HEX .25) - DRILL HOLES FOR EA ON WING ASSEM
RUDER, TAP 6 WHOLES, SECURE W/KL2301 RIVETS (10 CM)
• Data myopia: lack of consistent identifiers inhibits a single view
19-84-103 RS232 Cable 6' M-F CandS
CS-89641 6 ft. Cable Male-F, RS232 #87951
C&SUCH6 Male/Female 25 PIN 6 Foot Cable
• The redundancy nightmare: duplicate records with a lack of standards
90328574 IBM 187 N.Pk. Str. Salem NH 01456
90328575 I.B.M. Inc. 187 N.Pk. St. Salem NH 01456
90238495 Int. Bus. Machines 187 No. Park St Salem NH 04156
90233479 International Bus. M. 187 Park Ave Salem NH 04156
90233489 Inter-Nation Consults 15 Main Street Andover MA 02341
90345672 I.B. Manufacturing Park Blvd. Bostno MA 04106
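To make the "data surprises in individual fields" problem concrete, here is a minimal Python sketch that standardizes the values that appear to sit in the Telephone column of the sample table on this slide. It is an illustration only and not part of IBM Information Server; the function name `standardize_phone`, the status labels, and the assumption that a complete number has 10 digits are all hypothetical choices.

```python
import re


def standardize_phone(raw):
    """Strip formatting from a raw telephone value and classify it.

    Assumption for this sketch: a complete number has 10 digits,
    and a 7-digit value is a number missing its area code.
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 10:
        return digits, "ok"
    if len(digits) == 7:
        return digits, "missing area code"
    return digits, "invalid"


# Values as they appear in the Telephone column of the sample table
phones = ["6173380300", "415-392-2000", "3380321", "Orlando"]
results = [standardize_phone(p) for p in phones]

for raw, (digits, status) in zip(phones, results):
    print(f"{raw!r:16} -> {digits!r:14} {status}")
```

The first two values normalize to the same 10-digit form, the third is flagged as missing an area code, and "Orlando" is flagged as invalid because a city name was stored in the telephone field.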
“Data Silos” compound the problem • Multiple touchpoints across the enterprise: Online Registration (sign up for Seminar, Newsletter, Promotion), Online Purchases, Service and Support, Payments • Customer and user data resides in multiple databases (including the Corp DW) • Interaction between systems requires a custom interface
Domain Expertise constantly changes • Roles: Software Architect, Developer, Business Analyst, End Users, Data Architect, IT Admin, Data Admin • Domain expertise is critical for understanding the data, the problem and interpreting the results • “The counter resets to 0 if the number of calls exceeds N”. • “The missing values are represented by 0, but the default billed amount is 0 too.” • Insufficient domain expertise and understanding is a primary cause of poor data quality; data becomes unusable • Usually in people’s heads; seldom documented • Fragmented across organizations • Lost during personnel and project transitions • If undocumented, knowledge and understanding deteriorate and become fuzzy over time
Today’s Information Challenge: Inconsistent Understanding Across Enterprise Sources • Diagram labels: Legacy, Finance, CRM, and ERP systems holding entities such as Account, Part, Bill To, Ship To, Account (Product, Location), Product, Contact, Household, Vendor, Material, Location Different… • Data values that uniquely describe a business entity, used to tell one from another (customer name, address, date of birth…) • Identifiers assigned to each unique instance of a business entity • Relationships between business entities (two customers “householded” together at the same location) • Hierarchies among business entities (parent company owns other companies, different chart of accounts across operations)
Use Case: Customer Credit Card Information & Risks • Platforms: DB2, Mainframe, SQL Server, Oracle • Customer example: • 4,500 databases. Multiple database vendors. • Tens of thousands of tables, many more views… • Where is customer credit card information stored and used? • Which tables? Views? Stored Procedures? Servers? • Applications? Messages? Partner interfaces? • What does it mean? • Different names/structure for same information, no conventions • How can I modify my data architecture to provide a more sustainable solution?
Business Drivers for Investment Depend on Understanding • Empowering risk & compliance initiatives with the information they require • Optimizing Revenue Opportunities by ensuring effective and efficient interactions with customers, partners, and suppliers • Enabling collaborative business processes with consistent and trustworthy information • Reducing the total cost of ownership for maintaining consistent information across the enterprise
IBM Information Server: A Platform for Understanding • Unified Deployment • Understand: Discover, model, and govern information structure and content • Cleanse: Standardize, merge, and correct information • Transform: Combine and restructure information for new uses • Deliver: Synchronize, virtualize and move information for in-line delivery • Unified Metadata Management • Parallel Processing • Rich Connectivity to Applications, Data, and Content
Information Integration: Understand (Serrano-Hawk Version) • Users: Business Users, Subject Matter Experts, Architects, Data Analysts • Data Architect: Structure- and data-driven data modeling and management • Information Analyzer: Analyze data, and report and monitor based on integration and quality rules • Business Glossary: Perform structure-driven reporting and annotation • Unified Metadata Management
Physical Metadata: IBM Information Analyzer (Physical View) • Users: Subject Matter Experts, Data Analysts • Understand: Analyze source data structures, and monitor adherence to integration and quality rules • Data-centric analysis of application, database and file-based sources • Secure, detailed profiling of fields, across fields, and across sources • Creation of metadata from profiling results • Results instantly promotable across IBM Information Server
Business Metadata: IBM Business Glossary (Business View) • Users: Business Users, Subject Matter Experts • Understand: Create and manage business vocabulary and relationships, while linking to physical sources • Web-based authoring, managing & sharing of business metadata • Aligns the efforts of IT with the goals of the business • Provides business context to information technology assets • Establishes responsibility and accountability • Example, technical: Database = DB2, Schema = NAACCT, Table = DLYTRANS, Column = ACCT_NO, data type = char(11) • Example, business: GL Account Number. The ten digit account number. Sometimes referred to as the account ID. This value is of the form L-FIIIIVVVV.
Logical Metadata: Rational Data Architect (Data Modeling & Mapping) • Users: Subject Matter Experts, Architects • Create and manage business vocabulary and relationships, while linking to physical sources • Data modeling for data structures and federations • Federated data discovery • Metadata relationship discovery & mapping • Impact analysis, and synchronization across models • SQL & XML generation capabilities
• Profile data to analyze and understand data sources: Database = DB2, Schema = CUSTMSTR, Table = CUSTOMER, Column = TAX_ID, data type = char(10). In 3% of the data, this number is only 9 characters long. There are 3 distinct data formats, but there should only be 1. 256 duplicate values exist in this field. • Create a business glossary, a common vocabulary between business & technical users: Database = DB2, Schema = CUSTMSTR, Table = CUSTOMER, Column = CC_NO, data type = char(16). Credit Card Account Number: The sixteen digit account number. Sometimes referred to as the CC ID. Private. Access is restricted, need-to-know basis. • Extend and map metadata to identify critical data elements, business rules, certification requirements: “What data contains ‘Credit Card Account’?” “Who is responsible for the stewardship of this information?” “Which databases contain credit card information?” • Monitor data quality to continually assess and certify critical business information: The number of TAX_ID exceptions has increased over 3 consecutive weeks and has now exceeded the baseline. The data validation report indicates a new data source is producing an unacceptable level of bad values. “Where do we need to implement more stringent controls?”
Step 1A: Automated discovery of databases, tables, servers, views Discover and remember which data sources contain credit information • Sources: DB#1 Oracle, DB#2 SQL Server, DB#3 DB2 • Targets: Physical Data Models, Metadata Repository • Discover and reverse engineer data sources • Automatically generate data models from schemas • Visualize and annotate data models • Export models to metadata repository
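The discovery step described here can be illustrated with a small catalog scan. The sketch below is an assumption-laden stand-in, not how IBM Information Server or Rational Data Architect performs discovery: it uses Python's built-in `sqlite3` module in place of DB2, Oracle, or SQL Server, and the table definitions, the `discover_columns` helper, and the keyword list are hypothetical.

```python
import sqlite3


def discover_columns(conn, keywords):
    """Scan the catalog and return (table, column, type) for matching column names."""
    hits = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            col_name, col_type = row[1], row[2]
            if any(k in col_name.lower() for k in keywords):
                hits.append((table, col_name, col_type))
    return hits


# Hypothetical schema loosely modeled on the table and column names in the slides
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (CUST_ID INTEGER, TAX_ID CHAR(10), CC_NO CHAR(16))")
conn.execute("CREATE TABLE DLYTRANS (ACCT_NO CHAR(11), CC_NO CHAR(16), AMOUNT REAL)")
conn.execute("CREATE TABLE PRODUCT (SKU TEXT, NAME TEXT)")

hits = discover_columns(conn, ["cc_", "card"])
for table, column, col_type in hits:
    print(table, column, col_type)
```

Name matching alone will miss columns that hold card numbers under unrelated names, which is why the later steps profile the actual data content.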
Step 1A details: Visualize & Search Data Structure to Aid Understanding • Overview diagram • Topology diagram • Search
Step 1B: Guided discovery to find and relate elements Build a standardized model for credit card information and semantically map existing schemas • Sources: DB#1 DB2, DB#2 SQL Server, DB#3 Oracle • Targets: Standard Glossary, Standard Data Model, Metadata Repository • Create Standard Schema with Glossary • Map Database Schemas to Standard Schema using Glossary • Visualize and annotate data models • Export models to metadata repository • Optionally deploy to federated server to facilitate review
Step 2A: Automatically discover underlying data content & quality Use WebSphere Information Analyzer to profile data sources • Many Systems and Sources • Rapidly expand and deliver knowledge base on critical data • Results stored in the Metadata Repository
Step 2A details: Scalable Processing with Shared Metadata • High Concurrency • Scalable Architecture • Common Services Backbone • Common Metadata Repository • Common Connectivity • Parallel Engine
Step 2B: Analyze data to identify anomalies and expand knowledge Use WebSphere Information Analyzer to review data sources • Uncover unexpected or undocumented data anomalies • Example: Database = DB2, Schema = CUSTMSTR, Table = CUSTOMER, Column = TAX_ID, data type = char(10). In 3% of the data, this number is only 9 characters long. There are 3 distinct data formats, but there should only be 1. 256 duplicate values exist in this field. • Results stored in the Metadata Repository
Step 2B details: Profiling and Analysis • Flow: Source(s), Full Volume Profiling (all information), Analysis (Targeted Columns, Targeted Data Entities), Review, Decisions • Roles: Data Analysts, Subject Matter Experts • Inputs: Requirements, Specifications, Reference Tables • Metadata Integrity • Domain Integrity: Lexical Analysis, Pattern Consistency, Completeness, Consistency, Business Rule Identification & Validation • Entity Integrity: Duplicate Analysis, Targeted Data Accuracy • Structural Integrity: Table Analysis, Primary Key Analysis • Relational Integrity: Foreign Key Analysis, Cross-Domain Analysis, Redundancy Analysis
Step 2B details: Column Level Understanding • Metadata Integrity • Value Frequencies • Data Classification • Data Properties • Common Formats
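Column-level findings such as "3 distinct data formats" or "256 duplicate values" come from value-frequency and format analysis. The following is a minimal Python sketch of that idea under stated assumptions: the `9`/`A` masking convention, the `profile_column` helper, and the repeated sample value are illustrative and are not the actual Information Analyzer algorithm.

```python
from collections import Counter


def value_format(value):
    """Mask a value: digits become 9, letters become A, other characters are kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Return format frequencies, length frequencies, and duplicated values."""
    freq = Counter(values)
    return {
        "formats": Counter(value_format(v) for v in values),
        "lengths": Counter(len(v) for v in values),
        "duplicated": sorted(v for v, n in freq.items() if n > 1),
    }


# Tax ID samples from the "Common Data Problems" slide; the repeated
# first value is added here (an assumption) to show duplicate detection.
tax_ids = ["228-02-1975", "025-37-1888", "34-2671434", "228-02-1975"]
profile = profile_column(tax_ids)

for fmt, count in profile["formats"].most_common():
    print(fmt, count)
print("duplicated:", profile["duplicated"])
```

A column that should hold one format but shows two or more masks is a candidate for the kind of anomaly review described in Step 2B.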
Step 2B details: Domain Level Understanding • Domain Integrity • Completeness • Validity • Format Conformity • Mapping Values • Reference Tables
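Completeness, validity, and value mapping against a reference table can be sketched in a few lines of Python. This is an illustration under assumptions, not the product's implementation: the `domain_profile` helper, the reference set of state codes, and the mapping table are hypothetical, while the sample values echo the state fields on the "Common Data Problems" slide.

```python
def domain_profile(values, reference, mapping):
    """Classify each value as missing, valid, mappable, or invalid."""
    result = {"missing": 0, "valid": 0, "mappable": {}, "invalid": []}
    for v in values:
        cleaned = (v or "").strip()
        if not cleaned:
            result["missing"] += 1            # completeness gap
        elif cleaned in reference:
            result["valid"] += 1              # found in the reference table
        elif cleaned in mapping:
            result["mappable"][cleaned] = mapping[cleaned]  # known variant
        else:
            result["invalid"].append(cleaned)  # needs review
    return result


# Hypothetical reference and mapping tables
reference_states = {"MA", "NH", "WA"}
state_mapping = {"Mass": "MA"}

states = ["MA", "Mass", "NH", "", None, "Suffolk County"]
report = domain_profile(states, reference_states, state_mapping)
print(report)
```

Here four of the six values are present, two are already valid, one can be mapped to a reference value, and one ("Suffolk County") is a county stored in a state field.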
Step 2B details: Table Level Understanding • Structural Integrity • Single or Multi-Column Primary Keys • Uniqueness • Duplication
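Primary key analysis amounts to testing which single columns or column combinations are unique across all rows. A minimal Python sketch follows; the `candidate_keys` helper, the sample order-line rows, and the width limit of two columns are assumptions made for illustration, and nulls are not treated specially.

```python
from itertools import combinations


def candidate_keys(rows, columns, max_width=2):
    """Return minimal column combinations whose values are unique across rows."""
    keys = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            # Skip combinations that merely extend a key already found
            if any(set(k) <= set(combo) for k in keys):
                continue
            seen = {tuple(row[c] for c in combo) for row in rows}
            if len(seen) == len(rows):
                keys.append(combo)
    return keys


# Hypothetical order-line table
rows = [
    {"order_id": 1, "line_no": 1, "sku": "A"},
    {"order_id": 1, "line_no": 2, "sku": "B"},
    {"order_id": 2, "line_no": 1, "sku": "A"},
    {"order_id": 2, "line_no": 2, "sku": "C"},
    {"order_id": 2, "line_no": 3, "sku": "A"},
]
keys = candidate_keys(rows, ["order_id", "line_no", "sku"])
print(keys)
```

No single column is unique in this sample, so only the multi-column combination of `order_id` and `line_no` qualifies as a key.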
Step 2B details: Cross-Table/Cross-Source Level Understanding • Relational Integrity • Foreign Keys • Single Column or Multi-Column Assessment • Referential Integrity • Cross-Domain Commonality or Redundancy
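Foreign key and cross-domain analysis compare the distinct values of one column with those of another. The Python sketch below shows the basic set arithmetic under assumptions: the `foreign_key_analysis` helper and the choice of which identifiers sit in the parent and child tables are hypothetical, although the identifiers themselves are taken from the duplicate-records example earlier in the deck.

```python
def foreign_key_analysis(child_values, parent_values):
    """Compare distinct child values with parent key values."""
    parent = set(parent_values)
    child = set(child_values)
    return {
        "distinct_child": len(child),
        "matched": len(child & parent),    # child values with a parent row
        "orphans": sorted(child - parent),  # referential integrity violations
    }


# Hypothetical split of the sample customer numbers into parent and child tables
customer_ids = ["90328574", "90328575", "90238495"]
order_customer_ids = ["90328574", "90328574", "90238495", "90233479"]

fk_report = foreign_key_analysis(order_customer_ids, customer_ids)
print(fk_report)
```

A high match ratio between two columns in different tables or sources suggests either a foreign key relationship or redundant storage of the same domain; any orphans point to referential integrity problems.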
Step 2C: Define glossary/taxonomy in business terms Create a common vocabulary between business & technical users • Tools: Rational Data Architect, WebSphere Business Glossary, WebSphere Information Analyzer, Metadata Repository • Business term: Credit Card Account Number. The sixteen digit account number. Sometimes referred to as the CC ID. Private. Access is restricted, need-to-know basis. • Technical metadata: Database = DB2, Schema = NAACCT, Table = DLYTRANS, Column = CC_NO, Data type = char(16), Duplicates = .012%
Step 2D: Search, explore metadata using business terms (WebSphere Business Glossary, Metadata Repository) • “What is our business definition of ‘Credit Card Account’?” • “What data contains ‘Credit Card Account’?” • “Who is responsible for the stewardship of this information?” • “What are the security restrictions on credit card accounts?” • “Which reports include credit card information?” • “Which databases contain credit card information?”
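The questions on this slide can be answered once each business term is linked to its physical columns. The following Python sketch models that link with a small data structure; it is not the WebSphere Business Glossary data model or API. The `GlossaryTerm` class, the `find_terms` helper, and the steward value are hypothetical, while the term definition and column coordinates come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    name: str
    definition: str
    steward: str          # hypothetical value; the slides do not name a steward
    restricted: bool
    columns: list = field(default_factory=list)  # (database, schema, table, column)


def find_terms(glossary, text):
    """Return glossary terms whose name contains the search text (case-insensitive)."""
    needle = text.lower()
    return [t for t in glossary if needle in t.name.lower()]


glossary = [
    GlossaryTerm(
        name="Credit Card Account Number",
        definition="The sixteen digit account number. Sometimes referred to as the CC ID.",
        steward="(hypothetical) Card Data Steward",
        restricted=True,
        columns=[
            ("DB2", "CUSTMSTR", "CUSTOMER", "CC_NO"),
            ("DB2", "NAACCT", "DLYTRANS", "CC_NO"),
        ],
    ),
    GlossaryTerm(
        name="GL Account Number",
        definition="The ten digit account number. Sometimes referred to as the account ID.",
        steward="(hypothetical) Finance Data Steward",
        restricted=False,
        columns=[("DB2", "NAACCT", "DLYTRANS", "ACCT_NO")],
    ),
]

matches = find_terms(glossary, "credit card account")
for term in matches:
    print(term.name, "| steward:", term.steward, "| restricted:", term.restricted)
    for column in term.columns:
        print("  stored in:", ".".join(column))
```

With this structure, "What data contains Credit Card Account?" becomes a lookup of `columns`, and the stewardship and security questions become lookups of `steward` and `restricted`.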
Step 3: Monitor data over time for ongoing understanding Use WebSphere Information Analyzer to establish baselines of information to audit over time • “Where do we need to implement more stringent controls?” • “How do we ensure that critical data meets our standards?” • Identify and mitigate risk. • View differences between current state and the baseline
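The monitoring condition quoted earlier in the deck (exceptions increasing over 3 consecutive weeks and exceeding the baseline) can be expressed as a simple rule. The Python sketch below is one possible reading of that rule, not the Information Analyzer implementation; the `needs_attention` helper, the weekly counts, and the baseline value are hypothetical.

```python
def needs_attention(weekly_counts, baseline, weeks=3):
    """True if counts rose for `weeks` consecutive weeks and the latest exceeds baseline."""
    # Three consecutive increases need four data points
    recent = weekly_counts[-(weeks + 1):]
    rising = len(recent) == weeks + 1 and all(
        later > earlier for earlier, later in zip(recent, recent[1:])
    )
    exceeded = bool(weekly_counts) and weekly_counts[-1] > baseline
    return rising and exceeded


# Hypothetical weekly TAX_ID exception counts and baseline
baseline = 130
alert = needs_attention([100, 110, 125, 140], baseline)
quiet = needs_attention([100, 110, 125, 128], baseline)
print(alert, quiet)
```

The first series triggers the rule because it rises three weeks in a row and ends above the baseline; the second rises but stays below the baseline, so it does not.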
Lessons Learned & Best Practice: Control Scope Ruthlessly / Focus on Benefits • Business must own scope • Business should be owners, not renters • IT maintains its independence by not taking sides • Controlling scope encourages project discipline • Continually Scope & Iterate • Don’t boil the ocean • Projects which try to do it all in one pass generally fail • Measure, Report, and Deliver benefits regularly • Initial projects must provide some benefit, even if small, within 6-9 months • Subsequent phases should provide benefits every 3-6 months
Summary • Information understanding is becoming an increasingly important organizational issue • Most critical business initiatives depend on quality information • Improving understanding requires a focused programmatic approach including business, data, and system levels • The IBM Information Server provides all of this in a unified platform • At the core of any source system analysis is a platform capable of providing ongoing understanding
How Can IBM Help? • Comprehensive platform for data understanding • Experience and repeatable process for helping organizations set up source analysis and data quality programs • Domain and industry-specific expertise in establishing repeatable source analysis, semantic, and data quality services • Data quality assessment offering to report on existing data content and quality and establish the business value of ongoing source analysis through a data quality program • Contact your IBM representative for more information
IBM Information On Demand 2006 • October 15-20, 2006, Anaheim, California • Register Now: www.ibm.com/events/informationondemand • The premier information management event for business and IT executives, managers, professionals, DBAs and developers. • Select from over 800 sessions: a 2 1/2 day business leadership track with 180 sessions and a 5 day technical track with 650 sessions. • Latest strategy and product announcements • Large Expo Center, Hands on labs • One on ones with executives and specialists • Birds of a Feather roundtables Why attend: • Participate in the PREMIER discussion on the future of Information Management • Learn how the transformation to Information as a Service will help you unlock business value and drive competitive advantage • Hear how your peers are realizing ROI • Understand the roadmap to long term strategic advantage • Learn best practices in your industry • Receive the best in technical education and free certification • Extensive opportunities for networking with both your peers and industry experts