370 likes | 384 Views
Explore how Virginia Tech solved the problem of inefficient data structures and access for analysis and reporting through the implementation of a data warehouse.
E N D
Enterprise Data in Jail A Problem with a Solution Betsy Blythe and Lore Balkan Virginia Tech
0010111001 The Problem • Data in “ERP Jail” • Data structures difficult to understand and inefficient to access for analysis and reports • Data values change so point-in-time data lost • Growing backlog of report requests
The Solution • Initial charge – Build a data warehouse • Initial vision – Create business view of administrative data for Virginia Tech
The Solution A Data Access Architecture User Data Warehouse Transactional ERP System
Laying the Foundation • Staffing • DBA • Data Administrator • Data Warehouse Architects • Training Coordinators • Web Application Developers • Other Resources • Hardware • Software
Laying the Foundation • Planning • Surveyed other institutions • Did site visits and interviews • Established scope • Identified first subject area • Drafted project plan • Delivered management briefings
Laying the Foundation • Staff Education and Training • Data Warehouse Institute • Ralph Kimball
Getting Started • Focused on Finance • Delivered Finance Reports for ERP • Learned Finance data • Built relationships and trust • Evolved a shared vision for warehouse
Building the Data Warehouse • Strategy • Build by subject area • Develop iteratively • Design for enterprise
Building the Data Warehouse • Design • Star Schema • Time Dimension • Transaction Detail • Surrogate Keys • Conformed Dimensions • Slowly Changing Dimensions
The Design: Multidimensional Facts Dimensions
The Design: Dimensions
The Design: Facts
The Design: Star Schema
Conformed Dimensions Employee Dimension Time Dimension Account Dimension Payroll Facts Finance Facts Earnings Type Dimension Org Dimension Fund Dimension
The Design Managing Change … Slowly Changing Dimensions • 3 Techniques • overwrite changed attribute • add new dimension record • use field for ‘old’ value
The Design • Standards • Names meaningful and standardized • Indicators simplify queries • Code descriptions stored with codes • Business descriptions available with data
The Design • Special Features • External data may be included • Derivations and calculations included • Summary and aggregations may be included • History is built by design
Building the Data Warehouse • Project Agreement • Signed “blueprint” for the data mart • Explains sponsorship and roles • Details data requirements • Identifies development team • Identifies pilot users • Lists key tasks and dependencies
Building the Data Warehouse • Data Mart Development Team * • 2 IWA developers • Functional area technical expert • Functional area business/data expert • Functional area key user* Meets Weekly– Meeting Minutes document the process
Building the Data Warehouse • Development Process • Data model design (ERwin) • Source-to-target mapping • Business definitions • ETL development / testing (DataStage)
Building the Data Warehouse • Development Process • Data verification • Process control checks • Pilot user training
Building the Data Warehouse • Data Access Strategy • Stewardship same as ERP • ERP security definitions leveraged • Warehouse security built as part of ETL • Training precedes access
ERP Data Extract Transform Load Data Warehouse Other Data Sources Process Checks The Result Runs Every Night Ready For Access & Query
1 Warehouse Table EMPLOYEE_STATUS_DIMENSION The Result 41 ERP Tables
The Result: Query Example Provost’s Request: Report showing employee id, name, current hire date, gender, ethnicity, rank and tenure for all full-time minority faculty
The ERP Query select spriden_id, concat(spriden_last_name,concat(', ',concat(spriden_first_name,concat(' ', spriden_mi)))), to_char(pebempl_current_hire_date,'DD-MON-YYYY'), decode(spbpers_sex,'M','Male','F','Female'), stvethn_desc, ptrrank_desc, ptrtenr_desc from spriden, spbpers, pebempl, stvethn, perrank a, ptrrank, perappt c, ptrtenr where pebempl_empl_status = 'A' and pebempl_ecls_code in ('2A','2B','2C','2F','2G','2H','2K','2L', '3A','3B','3C','3D','3H','3I','3J','3M') and pebempl_pidm = spbpers_pidm and (spbpers_sex = 'F' or spbpers_ethn_code != '1') and pebempl_pidm = spriden_pidm and spriden_change_ind is null and spbpers_ethn_code = stvethn_code and pebempl_pidm = a.perrank_pidm and a.perrank_action_date = (select MAX(perrank_action_date) from perrank b where b.perrank_pidm = a.perrank_pidm) and a.perrank_rank_code = ptrrank_code and pebempl_pidm = c.perappt_pidm and c.perappt_action_date = (select max(perappt_action_date) from perappt d where c.perappt_pidm = d.perappt_pidm) and perappt_tenure_code = ptrtenr_code
The Warehouse Query select ssn_fin_num, current_full_name, salary_hire_date, gender_desc, ethnicity_desc, rank_desc, tenure_desc from employee where current_record_ind = 'Y' and active_employee_ind = 'Y' and faculty_ind = 'Y' and full_time_ind = 'Y' and (gender_code = 'F' or ethnicity_code != '1')
The Result General Person Human Resource Finance Student Alumni Development • Finance • Operating Ledger • General Ledger • Foundation • Accts Receivable • Human Resource • Employee • Job • Job Funding • Position • Position Allocation • Payroll • Alumni • Alumni Giving Futuredata mart
The Result • Metadata System • Business definitions maintained by Data Experts • Business definitions stored with the data • Data models and business definitions on the Web
The Result • The VT Data Warehouse Users • 900 Finance • 400 VT Foundation • 67 HR • 10 Alumni • * See fact sheet handout
The Result • A Data Architecture • Structured for query • Access by any ODBC or Oracle client • Designed to include history • Focus on the user • Provides a stable business view of the data
The Result • Query and Reporting Tools • Web Enabled / Client Server • Metadata stored with the data • Appropriate to the user skill set • Appropriate to the user need
Lessons Learned • Functional area sponsorship is critical • Analysis paralysis can be a problem • The devil is in the details • Let the ERP settle first • Data verification is time consuming • Canned reports sell the warehouse • 24/7 availability is expected • Success breeds demand
Lessons Learned • Don’t lose sight of the reasons for creating the data warehouse • Empower users to become self-sufficient • Prevent users from impacting the production system • Reduce interrupt-driven information requests to IS • Summarize data for trend analysis and data retention
Resources • Hardware – SUN E4500 w/ 8 CPUs, 8 GB RAM, 480 GB Disk Space • Software – Solaris 2.7 Oracle 8.1.7 Erwin – data models Ascential DataStage – ETL Brio – web reports, ad hoc query SQR – reporting Perl – metadata interface