Objectives for 2/21 • Evaluate and understand data quality and data integration issues. • Managerial perspective. • Technical perspective. • Define IT and data governance. • Understand basic governing and managing structures for IT and data.
Last Week • Explored issues with data quality and integration directly related to two case applications. • Discussed problems with the implementation of BI. • Highlighted implementation problems that are due to a lack of good governance methods for BI.
Data categories • Structured Data • Can define fields with data types and sizes • Transaction data: current and historical • Reference/master data (managed through master data management, MDM): current and historical (historical versions are referred to as “slowly changing dimensions”) • Example: Customer (master data) places order (transaction data) for item (master data) • Unstructured Data
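The customer/order/item example and the idea of keeping master-data history can be sketched in a few lines of Python. This is a minimal illustration, assuming a "Type 2" slowly-changing-dimension approach (end-date the old version, add a new one), which is one common way warehouses keep master-data history; the class and function names (`CustomerVersion`, `Order`, `apply_change`, `version_at`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    """Master data: one version of a customer record (hypothetical fields)."""
    customer_id: str           # business key from the source system
    name: str
    city: str
    valid_from: date
    valid_to: Optional[date]   # None marks the current version


@dataclass
class Order:
    """Transaction data: refers to master data (customer, item) by key."""
    order_id: int
    customer_id: str
    item_id: str
    order_date: date


def apply_change(history: List[CustomerVersion], customer_id: str,
                 name: str, city: str, change_date: date) -> None:
    """Assumed Type 2 handling: close the current version and append a new one."""
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if (row.name, row.city) == (name, city):
                return                    # nothing changed; keep current version
            row.valid_to = change_date    # end-date the old version
    history.append(CustomerVersion(customer_id, name, city, change_date, None))


def version_at(history, customer_id, as_of):
    """Return the customer version that was in effect on a given date."""
    for row in history:
        if (row.customer_id == customer_id and row.valid_from <= as_of
                and (row.valid_to is None or as_of < row.valid_to)):
            return row
    return None


history = [CustomerVersion("C1", "Ann Lee", "Boston", date(2020, 1, 1), None)]
apply_change(history, "C1", "Ann Lee", "Denver", date(2021, 6, 1))
order = Order(1, "C1", "ITEM-9", date(2021, 3, 15))
print(version_at(history, order.customer_id, order.order_date).city)  # Boston
```

Because the old version is end-dated rather than overwritten, an order placed before the customer moved still reports against the city that was current at the time.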
Data Integration • What is data integration? • Data that is available for use by multiple applications. • Data that is able to be combined relatively easily to support decision making. • Data that is stored uniquely and non-redundantly. • Why isn’t data integrated at most organizations?
Benefits and Drawbacks of Integration • The benefits of data integration are all “potential” • Reduced costs • Increased revenue • Reduced product development time • Better customer service • Better control • Better compliance with governmental regulations • How about the drawbacks?
What methods are currently being used to achieve data integration?
Data Consolidation • Commonly referred to as the extract, transform, and load (ETL) process. • At the simplest level, this is copying data from one database to another. • Is usually customized for each database/data warehouse. • Can be performed with custom programs; most organizations use ETL tools and customize them as necessary.
Basic ETL Tasks • Extract • Take data from source systems. • May require middleware to gather all necessary data. • Transformation • Put data into a consistent format and content. • Validate data – check for accuracy and consistency using predefined, agreed-upon business rules. • Aggregate data as necessary. • Update keys as necessary. • Convert data as necessary. • Load • Use a batch (bulk) update operation that keeps track of what is loaded, where, when, and how. • Keep a detailed load log to audit updates to the data warehouse.
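The three ETL tasks can be illustrated end to end with a small Python sketch. It assumes two hypothetical source extracts (embedded as CSV text) that format the same sales data differently, an assumed business rule that order amounts must be positive, and an in-memory SQLite database standing in for the data warehouse. Real ETL tools add much more (key management, incremental loads, error handling); this only shows the shape of each step.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# Hypothetical extracts from two source systems that format the same data differently.
SALES_EAST = "order_id,cust,amount,date\n1,c001,100.50,2024-02-01\n2,C002,-5,2024-02-01\n"
SALES_WEST = "id;customer;amt;order_date\n7;c001;20.00;02/01/2024\n"


def extract():
    """Extract: pull raw rows from each source system."""
    east = list(csv.DictReader(io.StringIO(SALES_EAST)))
    west = list(csv.DictReader(io.StringIO(SALES_WEST), delimiter=";"))
    return east, west


def transform(east, west):
    """Transform: convert to one format, validate, then aggregate."""
    unified = [{"cust": r["cust"].upper(), "amount": float(r["amount"]),
                "date": r["date"]} for r in east]
    for r in west:
        # Convert the West system's MM/DD/YYYY dates to ISO format.
        iso = datetime.strptime(r["order_date"], "%m/%d/%Y").date().isoformat()
        unified.append({"cust": r["customer"].upper(), "amount": float(r["amt"]),
                        "date": iso})
    # Assumed business rule: order amounts must be positive.
    valid = [row for row in unified if row["amount"] > 0]
    rejects = [row for row in unified if row["amount"] <= 0]
    totals = {}
    for row in valid:  # aggregate to one total per customer per day
        key = (row["cust"], row["date"])
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals, rejects


def load(conn, totals, rejects):
    """Load: batch insert plus a log entry recording what, where, and when."""
    conn.execute("CREATE TABLE IF NOT EXISTS daily_sales "
                 "(cust TEXT, sale_date TEXT, total REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS load_log (loaded_at TEXT, "
                 "target TEXT, rows_loaded INTEGER, rows_rejected INTEGER)")
    with conn:  # the batch and its log entry commit (or roll back) together
        conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)",
                         [(c, d, t) for (c, d), t in totals.items()])
        conn.execute("INSERT INTO load_log VALUES (?, ?, ?, ?)",
                     (datetime.now(timezone.utc).isoformat(), "daily_sales",
                      len(totals), len(rejects)))


conn = sqlite3.connect(":memory:")
totals, rejects = transform(*extract())
load(conn, totals, rejects)
print(conn.execute("SELECT * FROM daily_sales").fetchall())
# [('C001', '2024-02-01', 120.5)]
```

The rejected row is counted in the load log rather than silently dropped, which is what makes the load auditable later.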
Data Quality • What is good quality data? • Correct • Accurate • Consistent • Complete • Available • Accessible • Timely • Examples of good and bad quality data
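Some of these dimensions can be turned into simple scores. The sketch below is one possible way to do that, assuming hypothetical customer records, an assumed list of agreed state codes, and an assumed 30-day freshness window; accuracy is not shown because it usually requires comparison against a trusted reference source.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": "C1", "email": "ann@example.com", "state": "MA", "updated": date(2024, 2, 1)},
    {"id": "C2", "email": None, "state": "Mass.", "updated": date(2023, 1, 5)},
    {"id": "C3", "email": "bo@example.com", "state": "NY", "updated": date(2024, 1, 20)},
]
VALID_STATES = {"MA", "NY", "CT"}  # assumed agreed-upon code list


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)


def consistency(rows, field, allowed):
    """Share of rows whose value conforms to the agreed code list."""
    return sum(r[field] in allowed for r in rows) / len(rows)


def timeliness(rows, as_of, max_age_days):
    """Share of rows updated within the allowed window before as_of."""
    return sum((as_of - r["updated"]).days <= max_age_days for r in rows) / len(rows)


print(f"email completeness: {completeness(records, 'email'):.2f}")             # 0.67
print(f"state consistency:  {consistency(records, 'state', VALID_STATES):.2f}")  # 0.67
print(f"timeliness (30 d):  {timeliness(records, date(2024, 2, 21), 30):.2f}")   # 0.33
```

What counts as an acceptable score is a business decision, not a technical one.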
Data Quality Managerial Issues • How do we measure “good” quality data? • What is the impact (benefits and costs) of good quality data?
What Causes Bad Quality Data? • Start out at the macro level: Poor understanding of information/knowledge needs. • Go to the technical level: Multiple data sources and heterogeneous systems. • Drill down to the micro level of causation: Problems with too lenient or too strict input rules.
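The micro-level cause, input rules that are too lenient or too strict, can be seen with a hypothetical validation rule for a person's surname. The three functions below are illustrative assumptions, not rules from any particular system.

```python
import re


def too_lenient(name: str) -> bool:
    # Accepts anything, including empty strings and digits.
    return True


def too_strict(name: str) -> bool:
    # Letters only: rejects legitimate names such as "O'Brien" or "Anne-Marie".
    return re.fullmatch(r"[A-Za-z]+", name) is not None


def balanced(name: str) -> bool:
    # Letters, optionally joined by single spaces, hyphens, or apostrophes.
    # Still ASCII-only, so it would wrongly reject names such as "José".
    return re.fullmatch(r"[A-Za-z]+(?:[ '-][A-Za-z]+)*", name) is not None


for candidate in ["Smith", "O'Brien", "Anne-Marie", "", "12345"]:
    print(f"{candidate!r:14} lenient={too_lenient(candidate)} "
          f"strict={too_strict(candidate)} balanced={balanced(candidate)}")
```

A too-lenient rule lets garbage in; a too-strict rule pushes users to enter whatever gets past the check, which is also bad data. Even the "balanced" rule shows how hard it is to get this right.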
Dealing with Bad Quality Data • How is bad data fixed? • Methods of identification. • Methods of repair • How is bad data prevented? • Macro • Technical • Micro • Who is responsible for bad data?
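One common identification and repair pair is duplicate detection followed by merging. The sketch below assumes hypothetical customer rows from two systems, a simple match key (normalized name plus phone digits), and an assumed survivorship rule (keep the lowest id and fill its gaps from the duplicates). Production matching is usually fuzzier than an exact key.

```python
import re
from collections import defaultdict

# Hypothetical customer rows from two source systems.
customers = [
    {"id": 1, "name": "Ann  Lee", "phone": "(617) 555-0101", "email": None},
    {"id": 2, "name": "ann lee", "phone": "617.555.0101", "email": "ann@example.com"},
    {"id": 3, "name": "Bo Chan", "phone": "212-555-0199", "email": "bo@example.com"},
]


def match_key(row):
    """Identification: normalize name and phone so variants compare equal."""
    name = " ".join(row["name"].lower().split())
    digits = re.sub(r"\D", "", row["phone"])
    return (name, digits)


def find_duplicates(rows):
    """Group rows by match key; keep only groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[match_key(row)].append(row)
    return [g for g in groups.values() if len(g) > 1]


def merge(group):
    """Repair: keep the lowest id and fill missing fields from the others."""
    survivor = dict(min(group, key=lambda r: r["id"]))
    for other in group:
        for field, value in other.items():
            if survivor.get(field) is None and value is not None:
                survivor[field] = value
    return survivor


for group in find_duplicates(customers):
    print(merge(group)["email"])  # ann@example.com
```

Who decides the survivorship rule is itself a governance question, which ties back to "Who is responsible for bad data?"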
Data Quality Improvement Management • Assess • Prioritize processes and data • Determine value of good quality data to the organization • Determine level/range of necessary data quality • Identify appropriate methods of data quality improvement • Identify appropriate tools to support data quality improvement • Determine method of governance • Establish procedures • Establish method of ongoing evaluation and adaptation • Educate/train
Ongoing Issues • Assume data is integrated and of good quality. • Is that situation static? • What factors will affect the ongoing integration and quality of data? • What do you think about the implied suggestions in the cases that business and technology management should simply communicate and all will be well?
IT Governance • The decision rights and accountability framework created to encourage desirable behavior in the use of IT. • Define expectations. • Grant power. • Verify performance. • Consists of processes, customs, and policies. • Management hierarchy is one of the key aspects.
Organizational Design Choices • Division of labor (level of specialization) • High (can focus on a single task) vs. Low (most people do lots of different tasks) • Level of authority • High (a few people make decisions) vs. Low (many people make decisions) • Departmentalization • Homogeneous (clear differences) vs. Heterogeneous (significant overlap) • Span of control (number of people managed by a given manager) • Few vs. many
Governance Is More than Org. Structure • What kinds of governance structures are used to ensure alignment of the IT organization with the core business organization? • Let’s say that the core business organization wants to offer products for the lowest possible prices. How will IT work with the rest of the organization to achieve that objective? • Why isn’t the CIO just responsible for his/her area?
Data Governance • Structures of people, processes and technology to enable an organization to leverage data as an enterprise asset. • High level organizational groups and processes. • Data quality initiatives. • Data integration initiatives. • Business intelligence initiatives. • Goals: transparency, availability, increased asset value • What are these structures and where do they “fit”?
Sample Roles/Titles in Data Governance • Data Administrator • Metadata Administrator • Data Steward • Security Officer • Data Assurance Officer • Data Architect
Organization Structures for Data Governance • Steering Committees • Project Committees • Project Management Office
Data Governance in Practice • Why don’t most large organizations have formal enterprise data governance policies? • Why don’t most large organizations have data stewardship responsibilities defined and delegated?