250 likes | 407 Views
Lecture 5. Themes in this session Building and managing the data warehouse Data extraction and transformation Technical issues. Building the data warehouse. Basic guidelines for the creation of a DW. Create corporate sponsors and plan thoroughly
E N D
Lecture 5 Themes in this session • Building and managing the data warehouse • Data extraction and transformation • Technical issues
Basic guidelines for the creation of a DW • Create corporate sponsors and plan thoroughly • Determine a scalable architectural framework for the DW • Identify and document all assumptions, conflicts and issues at the start of the project • Choose methodology and tools which are compatible with the organisation • Take the continuous nature of the DW life cycle into account • Ensure a thorough data analysis and resolve all data conflicts • Learn from your experience and above all, learn from your mistakes
The data warehouse life cycle Investigation On-going data administration Analysis of current environment Implementation Identify requirements Development Identify architecture Data warehouse design
Creating a strategy for a data warehousing • Identification of current information strategy (usually an implicit strategy) • Internal appraisal of the strategy • External appraisal of the strategy • Perform a strategic gap analysis • Enumerate strategic alternatives which can be achieved through the application of a DW • Choose among alternatives on the basis of the organisations mission, objectives and resources • Formulate a strategic statement which specifies the organisation’s utilisation of its information resources and the role the DW is to play in this utilisation
Building a business case for a data warehouse • Describe the AS-IS situation • Identify business goals that the DW will help to fulfil • Identify business problems that the DW can be used to solve • Describe the TO-BE situation • Explain how the data warehouse will be used to evolve the organisation from the AS-IS to the TO-BE situation • Calculate the expected ROI for the data warehouse • identify all costs associated with the data warehouse • quantify all the benefits between the AS-IS and TO-BE situation
Creating a project plan • What is the project scope? • determine the scope of the data • determine the role of technology • determine any temporal considerations • What is the business reason? • identify the essential purpose of the project • identify business drivers • What are the critical success factors for the project? • identify critical objectives • identify critical tasks and activities • What are the resource constraints on the project? • Determine the need for resources • determine the resource availability
Output from the planning phase • A set of activities to be performed and a set of requirements on these activities (performance metrics) • Documented business drivers • Definition of scope of data • Defined temporal scope • Business reasons • The overall approach • Participants and their roles • Assumptions and constraints • Project management strategy
Data warehouse development activities Architecture definition Decision maker needs Source system analysis Transform design Physical database design Warehouse development Data modelling End-user access development End-user access design Subject area analysis Planning and project initiation Warehouse populate and implement End-user access definition
Project delivery tactics • First things first • Market the project • Adopt a customer-focused orientation • Deliver everything well Note: the project must always be able to show progress and have the ability to deliver business value
Focus areas for the management of a data warehouse • Monitor and manage data warehouse activity • Monitor and manage data warehouse data • Monitor and manage security in the data warehouse • Monitor and manage the data warehouse data model • Monitoring and managing data warehouse metadata • Monitoring and managing the integration and transformation interface • Monitoring and managing the demands of the data warehouse’s business environment
Staffing requirements for initial data warehouse development and subsequent DW management • Data warehouse management and maintenance • Data warehouse administrator • Data warehouse organisational change manager • Database administrator • Data warehouse maintenance developers • Metadata Manager • Analysis and design • Business requirements analysts • User groups • Data warehouse architect • Data procurement • Data quality analyst • Data acquisition developer • Data access developer • IS executive sponsor
Data warehouse end-user roles and responsibilities • Support roles • Iteration sponsors • Subject matter experts • User support technician • User types • Unlimited ad hoc user access users • Limited ad hoc access users • Predefined application users • Data warehouse initial and ongoing end-user staffing • Subject matter experts • User support technician
Data Quality (1) • Data should be accurate • Data should be stored according to data type • Data should have high integrity • Data should be consistent • Databases should be well designed • Data should not be redundant • Data should follow business rules • Data should correspond to established domains
Data quality (2) • Data should be timely • Data should be well understood • Data should be integrated • Data should satisfy the needs of the business • Users should be satisfied with the data and the information derived from the data • Data should be complete • There should be no duplicate records • There should be no data anomalies
A four-phase process to achieve high data quality • Data investigation • parsing • lexical analysis • pattern investigation • data typing • Data conditioning and standardisation • Data integration • Data Survivorship and formatting
Fundamental types of data transformation • Simple transformation • data type conversion • date/time format conversions • field decoding • Cleansing and scrubbing • valid values • complex reformatting • Integration • simple field level mappings • complex integration • Aggregation and summarisation
Other transformations • Operating system conversions • Hardware architecture conversions • affects the structure of data • affects the structure of programs running against the data • affects the computer operations needed for each environment • the available software which makes the different environments run • Application conversions
Types of source system extracts • Point-in-time snapshots • scheduled at specific points in time • efficient method for users to pinpoint specific points in time or ranges of time • unfortunately requires nearly a complete read of all operational sources of data • Significant business events • non-predetermined events drive the capture of data • captured as a snapshot of relevant data entities • triggered when a completion event is performed • Delta data • see next slide...
Types of source system extracts - Delta data Delta data is both new and changed data, it represents changes from one point in time to the next • Delta data can be captured in a number of ways: • Operational events • Changed data capture • date last modified • Point-in-time comparisons
Types of data warehouse updates • Insert • Full replace • Partial replace • Update • Update plus insert • Insert with update • Replace and insert
Copy management • What to extract? • When to extract? • How to extract? • Transformation requirements? • What to transform? • How to transform? • What to update? • When to update? • How to update? • How to generate the necessary metadata?
Major questions affecting the choice of the technical solution • Connectivity and interoperability? • Need for parallel processing ability? • Scalability? • Standards? • Single vendor/multi-vendor • Vendor stability and service? • In house competency? • Compatibility with existing systems architecture? • Compatability with IT strategy? • Functionality vs Cost?