How well do you know your DATA? John Ramoutsakis May 10, 2012
Is Data a Liability? • $$$ for Data Storage • $$$ for Data Backups • $$$ for Data Archiving • $$$ for Data Replication • $$$ for Data Synchronization • $$$ for Disaster Recovery Planning
Is Data an Asset? • Helps in making decisions • Provides a 360-degree view across the enterprise • Helps to understand the customer • Helps in building effective marketing campaigns • Predictive Analysis • Statistical Analysis • Sentiment Analysis
Data Governance Program • People: Organizations need executive sponsorship • Process: Documented, repeatable processes and procedures • Technology: Data Integration, Data Quality, Data Synchronization, and Data Management
iWay Data Integration Enablement (300+ Adapters) • ERP/Financials: Ariba, I2, JD Edwards, Lawson, Manugistics, Microsoft, Oracle, SAP • Industry: HIPAA, CIDX, HL7, RNIF, SWIFT, 1Sync • Legacy Systems: CICS, IMS, VSAM, .NET, Java, TUXEDO, etc. • SFA/CRM: Amdocs/Clarify, BMC/Remedy, MSDynamics, Oracle/Siebel, Salesforce.com, SAP • Data Warehouse: DB2, ETL, Oracle/Essbase, MS SSAS/OLAP, Netezza, SAP BW, Teradata • B2B: Internet EDI, Legacy EDI, MFT, Online B2B, XML
Data Profiling • Statistical Analysis: An overview of summary values, such as extremes, distribution and frequency analysis. • Domain Analysis: A configurable analysis of data types. • Mask and Group Analysis: An overview of value formats, groups and dimensions. • Business Rules: An analysis of the results of user-defined business rules. • Foreign Key and Dependency Analyses: An inside look into complex connections in the data. • Drill Through: The option to display individual records that correspond to aggregated results. • Data Mart: Reporting and analysis across multiple data set analyses. • Web and/or hardcopy report viewing and distribution.
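The statistical and mask analyses listed on this slide can be illustrated with a small sketch. This is not the iWay profiler; the function names `mask` and `profile`, the mask symbols (`A` for letters, `9` for digits) and the sample values are assumptions made for illustration only.

```python
from collections import Counter


def mask(value: str) -> str:
    """Replace letters with 'A' and digits with '9'; keep other characters unchanged."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile(values):
    """Return simple summary values and a mask frequency table for a column of strings."""
    non_empty = [v for v in values if v]
    return {
        "count": len(values),
        "empty": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "min_length": min((len(v) for v in non_empty), default=0),
        "max_length": max((len(v) for v in non_empty), default=0),
        "mask_frequency": Counter(mask(v) for v in non_empty),
    }


# Hypothetical column of postal codes with mixed formats.
postal_codes = ["V3R 2A9", "M4X 1V5", "m4x1v5", "", "90210"]
report = profile(postal_codes)
for pattern, freq in report["mask_frequency"].most_common():
    print(pattern, freq)
```

Rare masks in the frequency table are the usual starting point for a drill-through to the individual records that produced them.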
Data Quality Management Cycle (diagram) • Profiling • Deviance identification • Metadata understanding • Ongoing monitoring • Issues/causes identification • Monitoring and reporting • Data understanding • KPI definition • Parsing • Association (householding) • Format correction • Content evaluation • Deduplication / identification • Automatic correction • Unification • Data enhancement • Data cleansing • Enrichment • Context-based cleansing • Standardization
iWay Data Quality Center • Parsing: Decomposition of fields into component parts. • Cleansing: Modification of data values to meet domain restrictions, integrity constraints or other business rules that define sufficient data quality for the organization. • Standardization: Formatting of values into consistent layouts based on industry standards, local standards, user-defined business rules and knowledge bases of values and patterns. • Validation: Verification that values conform to industry standards, local standards, user-defined business rules and knowledge bases of values and patterns. • Enrichment: Enhancing the value of internally held data by appending related attributes from external sources. • Matching: Identification, linking or merging related entries within or across sets of data.
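A minimal sketch of parsing, standardization and validation applied to one address string. The field order (postal code; province; city; street) is assumed from the sample addresses on the Merge slide, the function names are hypothetical, and the postal code pattern is a simplified letter-digit check rather than the full Canada Post rule set. It does not represent how the iWay Data Quality Center implements these steps.

```python
import re

# Simplified Canadian postal code layout: letter-digit-letter, space, digit-letter-digit.
POSTAL_RE = re.compile(r"^[A-Z]\d[A-Z] \d[A-Z]\d$")


def parse_address(raw: str) -> dict:
    """Parsing: decompose a semicolon-delimited field into its component parts."""
    parts = [p.strip() for p in raw.split(";")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 components, got {len(parts)}")
    postal_code, province, city, street = parts
    return {
        "postal_code": postal_code,
        "province": province,
        "city": city,
        "street": street,
    }


def standardize(address: dict) -> dict:
    """Standardization: format values into one consistent layout."""
    compact = re.sub(r"\s+", "", address["postal_code"]).upper()
    if len(compact) == 6:
        compact = compact[:3] + " " + compact[3:]
    return {**address, "postal_code": compact, "province": address["province"].upper()}


def is_valid_postal(code: str) -> bool:
    """Validation: check a value against the expected pattern."""
    return bool(POSTAL_RE.match(code))


clean = standardize(parse_address("v3r2a9; bc ;Surrey;14618 110 Avenue"))
print(clean, is_valid_postal(clean["postal_code"]))
```

Keeping the three steps as separate functions mirrors the slide: each one can be reused or replaced by a different business rule without touching the others.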
Mastering Master Data • What is Master Data? Data describing your main business entities; data duplicated in multiple systems; data reused by multiple business processes • Examples: Customer/Citizen/Patient; Company/Partner/Agency; Products/Items/Equipment; Vendors/Suppliers; Cost Centers/Employees; etc.
Master Data - Match & Merge • Unification: identification of the set of records connected to one person, address, vehicle, contact, etc. • Deduplication: golden record creation (the best representation of the identified subject) • Identification: for new data entries, identifying the subject (person, address, etc.) to which the new record is connected (matched) • Complex business rules using sophisticated algorithms and functions, including: Levenshtein distance, Hamming distance, edit distance, data quality score values, date stamps of last modification, source system originating the data, etc.
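The two named distance measures can be sketched in a few lines. These are the textbook definitions, not the product's matching engine; the `similarity` helper and its normalization by the longer string's length are assumptions added for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical match score in [0, 1]: 1 minus the normalized Levenshtein distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("Smith", "Smyth"))        # one substitution
print(hamming("095242434", "095242484"))    # one differing digit
print(similarity("Smith", "Smyth"))
```

Hamming distance suits fixed-length identifiers, while Levenshtein distance tolerates inserted or dropped characters in names. A match rule would typically combine such scores with the other inputs listed on the slide.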
Data Quality Portal - Complex Exception Handling (diagram) • Portal • KPI / DQI calculation • DQ plan • Reports • Invalid data extraction • Resolution queue • Resolution Queue Workflow • Exception DB • Exception management
Human Mind vs. Computer Systems • Hahaha raed tihs! I cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonemnel pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it whotuit a pboerlm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Azanmig huh?
Merge • Record: John Smith, M, 095242434, 1978-12-16 • Candidate addresses: V3R 2A9;BC;Surrey;14618 110 Avenue and M4X 1V5;ON;Toronto;25 Linden Street • Selection rules: the newest permanent address; the most frequent address
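The two selection rules on this slide can be sketched as follows. The source records, the `permanent` flag and the `updated` dates are invented for illustration; only the two address strings come from the slide, and the slide does not say which address each rule selects.

```python
from collections import Counter
from datetime import date

# Hypothetical source records for one matched person.
records = [
    {"address": "V3R 2A9;BC;Surrey;14618 110 Avenue", "permanent": True, "updated": date(2011, 6, 1)},
    {"address": "M4X 1V5;ON;Toronto;25 Linden Street", "permanent": True, "updated": date(2009, 3, 15)},
    {"address": "M4X 1V5;ON;Toronto;25 Linden Street", "permanent": False, "updated": date(2012, 1, 20)},
]


def newest_permanent_address(recs):
    """Rule 1: among permanent addresses, take the most recently updated one."""
    permanent = [r for r in recs if r["permanent"]]
    if not permanent:
        return None
    return max(permanent, key=lambda r: r["updated"])["address"]


def most_frequent_address(recs):
    """Rule 2: take the address that occurs in the most source records."""
    counts = Counter(r["address"] for r in recs)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(newest_permanent_address(records))
print(most_frequent_address(records))
```

With this sample data the two rules disagree, which is why a golden record needs an explicit rule per attribute. On a frequency tie, `most_common` returns the address seen first, so a real rule set would need a stated tie-breaker.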
Merged records, after update • One updated source record may cause modifications to several records in the MDC
Real World Use Case • The Goal: A major hospital group is building a Master Patient Index; it needs to bring in systems from acquisitions; cleanse, standardize, deduplicate • The Challenge: Previously processed manually by hiring temporary staff; the current phase was projected to need a temporary staff of 20 over 18 months • The Strategy: Automate the cleansing, matching and merging business rules; Data Stewardship provides human oversight of the automated process • The Benefits: Identifies duplicate records according to very complex business rules; reusable rules for future phases; significantly reduced project time, from 18 months down to 4; over 400% ROI projected
Real World Use Case • The Goal: Performance Management; Business Intelligence; Change Management Process • The Challenge: 100 locations; 14 systems with out-of-sync master data • The Strategy: Cleanse, Standardize, Match; Master Data Management (Directorate, Borough, Site, Service Type, Service Point, Team, Staff, Patient); Master Data Governance Workflow • The Benefits: Dynamic organizational change to support strategic initiatives; complete visibility into performance of the organization vs. goals
Real World Use Case • The Goal: A services organization supporting the airline industry sells decision support information to industry members • The Challenge: Data quality was adversely affecting customer satisfaction; data quality was limiting new revenue opportunities • The Strategy: Profile analysis according to specific business validation rules; monitor a rolling 13-month window comparison of monthly data profiles; accumulate and report analysis to data providers • The Benefits: Improves customer satisfaction and confidence in the information; increases reliability of the information as new data sources are added; documents and audits quality-control processes for customer review; reduces the dependency on human resources to detect and correct data quality issues
Summary of considerations • Access to a variety of data sources • Ability to influence data improvement anywhere in the process • Usable in batch and/or (real) real-time processing mode • Extensible by customized business rules • Access to third-party data and services • Historical and distributable analysis • Reusability across multiple phases and projects • Integrated data stewardship • Platform flexibility for deployment and licensing • Vendor partnership and support (diagram labels: Information Access, Data Quality, Master Data Management, Data Governance)
iWay Software Benefits • Integrate All Information: Any Data, Any System, Any Protocol, Any Platform • Real-time, Online, and Batch • Data Integration • Application Integration • Business Integration • Service Oriented Architecture • Any Process Latency: Scheduled, Process Driven, Event Driven, User Driven • Single Solution Platform: Single Engine, Fast and Scalable, Secure and Reliable, Fully Extensible