270 likes | 564 Views
Data Warehouse & Business Intelligence Concepts & Architecture Sanjeev. Data Warehouse & Business Intelligence. Topics To Be Discussed: Why Do We Need A Data Warehouse ? What Exactly Is A Data Warehouse ? Features Of Data Warehouse Sources Of Data Warehouse Data Warehouse Designs
E N D
Data Warehouse & Business Intelligence Concepts & Architecture Sanjeev
Data Warehouse & Business Intelligence • Topics To Be Discussed: • Why Do We Need A Data Warehouse ? • What Exactly Is A Data Warehouse ? • Features Of Data Warehouse • Sources Of Data Warehouse • Data Warehouse Designs • Data Warehouse – Data Usage • Why There Is A Need of Business Intelligence ? • DWH & BI - Architecture • Case Study
Data Warehouse • Why Do We Need A Data Warehouse ? • Data Access Problem – Data In “Jail” • The single key to survival in the 1990s (and beyond) is - being able to analyze, plan and react to changing business conditions in a much more rapid fashion. • To do this, top managers, analysts and knowledge workers in our enterprises need more and better information. • Information technology itself has made possible the revolutions in the way that Organizations today operate throughout the world. • More and more powerful computers on everyone‘s desks, and • Communication networks that span the globe • BUT STILL • Executives and decision makers can't get their hands on critical information that already exists in the organization. • Continued…
Data Warehouse • Why Do We Need A Data Warehouse ? • Data Access Problem – Data In “Jail” • Organizations- large and small, create • billions of bytes of data about all aspects of their business, • millions of individual facts about their customers, products, operations and people • But for the most part, this data is locked up in a myriad of computer systems and is exceedingly difficult to get at. • This phenomenon has been described as "data in jail". • Only a small fraction of the data that is captured, processed and stored in the enterprise is actually available to executives and decision makers. • Technologies for the manipulation and presentation of data have literally exploded. • Large segments of the enterprise are still "data poor.“ • Whatever is BETTER, FASTER and CHEAPER, is not FUNCTIONALLY COMPLETE. • Continued…
Data Warehouse • Why Do We Need A Data Warehouse ? • Data Access Problem – Data In “Jail” • Solution – A Data Warehouse • A set of significant new concepts and tools have evolved providing all the key people in the enterprise with access to whatever level of information needed for the enterprise to survive and prosper in an increasingly competitive world. • The term that has come to characterize this new technology is “Data Warehousing.” • to provide an Organization flexible, effective and efficient means of getting at the sets of data that have come to represent one of the Organization‘s most critical and valuable assets. • To make sure that the enterprise-wise information should be available for decision making purpose at all levels, at any point of time.
Data Warehouse What Exactly Is A Data Warehouse ? • A Data Warehouse is a special kind of database, which stores • SUBJECT ORIENTED, • INTEGRATED, • TIME VARIANT, • NON-VOLATILE • collection of data in support of management’s decision making process. • It is a structured repository of historic data of the Organization which support managerial decision making. • It is developed in an evolutionary process by integrating data from non-integrated legacy systems. • Many design elements that optimize transaction processing are inefficient (in several ways) in a data warehouse. • Managerial access to data for decision making requires access mechanisms that would violate many principles of regular DB design, like Normalization, Security, Integrity, etc. • Continued…
Data Warehouse - Features Key Features of Data Warehouse : Create Read ODS Insert Replace Cust Prod Update Delete Order 1999 1998 2000 Load Read DWH DWH Load Read
Data Warehouse - Sources Sources of Data Warehouse Data DWH Archives Historic Data Current system of records Recent History Operational Transactions Future Data Source
Data Warehouse Designs Various Data Warehouse Designs
Data Warehouse - Data Nature of Data Warehouse Data • Data in a DWH is always historic and is static in nature. • It is used to look at the information over periods of time. • It is usually built from the operational data available in the Organization. • The data may not necessarily be from with-in the Organization. Appropriate Use of Data Warehouse Data • Produce Reports For Long Term Trend analysis • Produce Reports Aggregating Enterprise Data • Produce Reports of Multiple Dimensions (Earned revenue by month by product by branch) Inappropriate Use of Data Warehouse Data • Replace Operational Systems • Replace Operational Systems’ Reports • Analyze Current Operational Results
Data Warehouse & Business Intelligence SQL Is Inadequate – Need of B.I. • SQL is inadequate for analytical applications due to the following reasons: • The conditions in WHERE clause often contains too many AND , OR conditions. • OR conditions are poorly handled by most RDBMS. • Statistical functions such as standard deviations are not supported by SQL. • Aggregation over time is not supported. • Users often need to pose related queries to get the desired results. There is no convenient way to express commonly occurring groups of queries. • Most of the times, the SQL queries are not optimized and hence take lot of time to produce results. • Many business operations are hard or impossible to express in SQL • Comparisons (with aggregation) • Multiple Aggregations • Reporting features • To overcome the above limitations, some business intelligence is required over the available data using a separate set of tools, which can help in doing all kind of required analysis and generating reports. • Continued…
Data Warehouse & Business Intelligence DWH & Business Intelligence • To overcome the limitations of SQL, Business Intelligence is required for the following tasks: • Data Integration – extracting the data from different heterogeneous sources and store it in a consistent format at one location (DWH). • Data Transformation– transform the data in the required format before loading, so that data can be maintained in a consistent format. • Data Marts, Multi-dimensional databases and cubes – creating data-marts, multi-dimensional databases and cubes, which will act as a source for various reports and trend-analysis. • Data Access & Analysis – make the data available to the end-users in the form of reports and dash-boards and help the data analyst to do different kind of analysis to support the decision making at the senior management level.
Data Warehouse & BI Architecture End-To-End Data Warehouse & BI Architecture Data Sources Extraction Staging Load Central DWH Data Marts Data Access & Analysis
Data Warehouse - Case Study TheProblem - Data in ERP “Jail” – Virginia Tech University • Data structures difficult to understand and inefficient to access for analysis and reports • Data values change so point-in-time data lost • Growing backlog of report requests TheSolution – A DWH for ERP Data – Virginia Tech University • Initial charge – Build a data warehouse • Initial vision – Create business view of administrative data for Virginia Tech USER DATA WAREHOUSE TRANSACTIONAL ERP SYSTEM Data Access Architecture
Data Warehouse - Case Study TheSolution – A DWH for ERP Data Laying The Foundation • Planning • Surveyed other institutions • Did site visits and interviews • Established scope • Identified first subject area • Drafted project plan • Delivered management briefings • Staff Education and Training • Data Warehouse Institute • Ralph Kimball Approach to design DWH • Staffing • DBA • Data Administrator • Data Warehouse Architects • Training Coordinators • Web Application Developers • Other Resources • Hardware • Software
Data Warehouse - Case Study TheSolution – A DWH for ERP Data Building The Data Warehouse • Strategy • Build by subject area • Develop iteratively • Design for enterprise • Design • Star Schema • Time Dimension • Transaction Detail • Surrogate Keys • Conformed Dimensions • Slowly Changing Dimensions The Design – Multi Dimensional
Data Warehouse - Case Study TheSolution – A DWH for ERP Data The Design – STAR Schema FACT TABLE DIMENSIONS DIMENSIONS
Data Warehouse - Case Study TheSolution – A DWH for ERP Data Confirmed Dimensions
Data Warehouse - Case Study TheSolution – A DWH for ERP Data Design - Slowly Changing Data (Dimensions) • There are 3 ways to manage the change in the slowing changing dimension data: • Overwrite changed attribute in the same record • add new record for the new value • use additional fields for old and new values in the same record • Proper standards should be followed while designing the DWH • Object names should be meaningful and standardized • Indicators should be used to simplify the queries • Descriptions should be provided along with each piece of code • Data should be available with business descriptions to make it clear to the end-users • Special Features • External data may be included • Derivations, calculations, aggregations and summary data should be included
Data Warehouse - Case Study TheSolution – A DWH for ERP Data Building The Data Warehouse • Development Process • Data Model Design (Erwin) • Source-To-Target mapping • Business Definitions • ETL Development / Testing (Data Stage) • Data Verification • Process Control Checks • Pilot User Training • Data Access Strategy • Stewardship same as ERP • ERP security definitions leveraged • Warehouse security built as part of ETL • Training precedes access
Data Warehouse - Case Study TheSolution – A DWH for ERP Data Building The Data Warehouse
Data Warehouse - Case Study TheSolution – A DWH for ERP Data Query Example Create a report which will show employee id, name, current hire date, gender, ethnicity, rank and tenure of all full time minority faculties. Result - ERP Query select spriden_id, concat(spriden_last_name,concat(', ',concat(spriden_first_name,concat(' ', spriden_mi)))), to_char(pebempl_current_hire_date,'DD-MON-YYYY'), decode(spbpers_sex,'M','Male','F','Female'), stvethn_desc, ptrrank_desc, ptrtenr_desc from spriden, spbpers, pebempl, stvethn, perrank a, ptrrank, perappt c, ptrtenr where pebempl_empl_status = 'A’ and pebempl_ecls_code in ('2A','2B','2C','2F','2G','2H','2K','2L', '3A','3B','3C','3D','3H','3I','3J','3M') and pebempl_pidm = spbpers_pidm and (spbpers_sex = 'F' or spbpers_ethn_code != '1') and pebempl_pidm = spriden_pidm and spriden_change_ind is null and spbpers_ethn_code = stvethn_code and pebempl_pidm = a.perrank_pidm and a.perrank_action_date = (select MAX(perrank_action_date) from perrank b where b.perrank_pidm = a.perrank_pidm) and a.perrank_rank_code = ptrrank_code and pebempl_pidm = c.perappt_pidm and c.perappt_action_date = (select max(perappt_action_date) from perappt d where c.perappt_pidm = d.perappt_pidm) and perappt_tenure_code = ptrtenr_code
Data Warehouse - Case Study TheSolution – A DWH for ERP Data Query Example Create a report which will show employee id, name, current hire date, gender, ethnicity, rank and tenure of all full time minority faculties. Result - DWH Query select ssn_fin_num, current_full_name, salary_hire_date, gender_desc, ethnicity_desc, rank_desc, tenure_desc from employee where current_record_ind = 'Y’ and active_employee_ind = 'Y‘ and faculty_ind = 'Y’ and full_time_ind = 'Y’ and (gender_code = 'F' or ethnicity_code != '1')
Data Warehouse - Case Study TheSolution – A DWH for ERP Data • DWH – Design Features • DWH Metadata System has Business definitions maintained by Data Experts, which are stored with the data • Data Architecture is structured for query • Data can be accessed by various clients • DWH is designed to include historic data • DWH provides a stable business view of data