370 likes | 503 Views
Enterprise Data in Jail. A Problem with a Solution. Betsy Blythe and Lore Balkan Virginia Tech. 0010111001. The Problem. Data in “ERP Jail”. Data structures difficult to understand and inefficient to access for analysis and reports Data values change so point-in-time data lost
E N D
Enterprise Data in Jail A Problem with a Solution Betsy Blythe and Lore Balkan Virginia Tech
0010111001 The Problem • Data in “ERP Jail” • Data structures difficult to understand and inefficient to access for analysis and reports • Data values change so point-in-time data lost • Growing backlog of report requests
The Solution • Initial charge – Build a data warehouse • Initial vision – Create business view of administrative data for Virginia Tech
The Solution A Data Access Architecture User Data Warehouse Transactional ERP System
Laying the Foundation • Staffing • DBA • Data Administrator • Data Warehouse Architects • Training Coordinators • Web Application Developers • Other Resources • Hardware • Software
Laying the Foundation • Planning • Surveyed other institutions • Did site visits and interviews • Established scope • Identified first subject area • Drafted project plan • Delivered management briefings
Laying the Foundation • Staff Education and Training • Data Warehouse Institute • Ralph Kimball
Getting Started • Focused on Finance • Delivered Finance Reports for ERP • Learned Finance data • Built relationships and trust • Evolved a shared vision for warehouse
Building the Data Warehouse • Strategy • Build by subject area • Develop iteratively • Design for enterprise
Building the Data Warehouse • Design • Star Schema • Time Dimension • Transaction Detail • Surrogate Keys • Conformed Dimensions • Slowly Changing Dimensions
The Design: Multidimensional Facts Dimensions
The Design: Dimensions
The Design: Facts
The Design: Star Schema
Conformed Dimensions Employee Dimension Time Dimension Account Dimension Payroll Facts Finance Facts Earnings Type Dimension Org Dimension Fund Dimension
The Design Managing Change … Slowly Changing Dimensions • 3 Techniques • overwrite changed attribute • add new dimension record • use field for ‘old’ value
The Design • Standards • Names meaningful and standardized • Indicators simplify queries • Code descriptions stored with codes • Business descriptions available with data
The Design • Special Features • External data may be included • Derivations and calculations included • Summary and aggregations may be included • History is built by design
Building the Data Warehouse • Project Agreement • Signed “blueprint” for the data mart • Explains sponsorship and roles • Details data requirements • Identifies development team • Identifies pilot users • Lists key tasks and dependencies
Building the Data Warehouse • Data Mart Development Team * • 2 IWA developers • Functional area technical expert • Functional area business/data expert • Functional area key user* Meets Weekly– Meeting Minutes document the process
Building the Data Warehouse • Development Process • Data model design (ERwin) • Source-to-target mapping • Business definitions • ETL development / testing (DataStage)
Building the Data Warehouse • Development Process • Data verification • Process control checks • Pilot user training
Building the Data Warehouse • Data Access Strategy • Stewardship same as ERP • ERP security definitions leveraged • Warehouse security built as part of ETL • Training precedes access
ERP Data Extract Transform Load Data Warehouse Other Data Sources Process Checks The Result Runs Every Night Ready For Access & Query
1 Warehouse Table EMPLOYEE_STATUS_DIMENSION The Result 41 ERP Tables
The Result: Query Example Provost’s Request: Report showing employee id, name, current hire date, gender, ethnicity, rank and tenure for all full-time minority faculty
The ERP Query select spriden_id, concat(spriden_last_name,concat(', ',concat(spriden_first_name,concat(' ', spriden_mi)))), to_char(pebempl_current_hire_date,'DD-MON-YYYY'), decode(spbpers_sex,'M','Male','F','Female'), stvethn_desc, ptrrank_desc, ptrtenr_desc from spriden, spbpers, pebempl, stvethn, perrank a, ptrrank, perappt c, ptrtenr where pebempl_empl_status = 'A' and pebempl_ecls_code in ('2A','2B','2C','2F','2G','2H','2K','2L', '3A','3B','3C','3D','3H','3I','3J','3M') and pebempl_pidm = spbpers_pidm and (spbpers_sex = 'F' or spbpers_ethn_code != '1') and pebempl_pidm = spriden_pidm and spriden_change_ind is null and spbpers_ethn_code = stvethn_code and pebempl_pidm = a.perrank_pidm and a.perrank_action_date = (select MAX(perrank_action_date) from perrank b where b.perrank_pidm = a.perrank_pidm) and a.perrank_rank_code = ptrrank_code and pebempl_pidm = c.perappt_pidm and c.perappt_action_date = (select max(perappt_action_date) from perappt d where c.perappt_pidm = d.perappt_pidm) and perappt_tenure_code = ptrtenr_code
The Warehouse Query select ssn_fin_num, current_full_name, salary_hire_date, gender_desc, ethnicity_desc, rank_desc, tenure_desc from employee where current_record_ind = 'Y' and active_employee_ind = 'Y' and faculty_ind = 'Y' and full_time_ind = 'Y' and (gender_code = 'F' or ethnicity_code != '1')
The Result General Person Human Resource Finance Student Alumni Development • Finance • Operating Ledger • General Ledger • Foundation • Accts Receivable • Human Resource • Employee • Job • Job Funding • Position • Position Allocation • Payroll • Alumni • Alumni Giving Futuredata mart
The Result • Metadata System • Business definitions maintained by Data Experts • Business definitions stored with the data • Data models and business definitions on the Web
The Result • The VT Data Warehouse Users • 900 Finance • 400 VT Foundation • 67 HR • 10 Alumni • * See fact sheet handout
The Result • A Data Architecture • Structured for query • Access by any ODBC or Oracle client • Designed to include history • Focus on the user • Provides a stable business view of the data
The Result • Query and Reporting Tools • Web Enabled / Client Server • Metadata stored with the data • Appropriate to the user skill set • Appropriate to the user need
Lessons Learned • Functional area sponsorship is critical • Analysis paralysis can be a problem • The devil is in the details • Let the ERP settle first • Data verification is time consuming • Canned reports sell the warehouse • 24/7 availability is expected • Success breeds demand
Lessons Learned • Don’t lose sight of the reasons for creating the data warehouse • Empower users to become self-sufficient • Prevent users from impacting the production system • Reduce interrupt-driven information requests to IS • Summarize data for trend analysis and data retention
Resources • Hardware – SUN E4500 w/ 8 CPUs, 8 GB RAM, 480 GB Disk Space • Software – Solaris 2.7 Oracle 8.1.7 Erwin – data models Ascential DataStage – ETL Brio – web reports, ad hoc query SQR – reporting Perl – metadata interface