250 likes | 262 Views
This presentation by Jeff Christen, a Data Warehousing DBA at Cornell University, focuses on the challenges and solutions of managing a data warehousing infrastructure. Topics include data availability, load management, security, and integration.
E N D
Cornell University’s Data Warehousing Infrastructure Presented by: Jeff Christen Data Warehousing DBA – Team Lead jrc42@cornell.edu
DBA Lead - Data Warehousing Ten Years experience as a DBA (Oracle & Informix) Last Four Years Focused on Data Warehouse DBA Support Team Interface to other IT groups Team Interface to University Jeff Christen
Production Support – 24x7 (Database & Load) Performance Monitoring & Tuning Database Backup & Recovery Security Implementation Object & Code Migrations (Dev / Test / Prod) Infrastructure Development & Maintenance Enforcement of University Policies & Practices Integration of Models & ETL into Warehouse Environment Assist in Data Modeling & ETL Development Responsibilities of a DW DBA
Twelve production data marts Twenty-five unique loads Variety of sources (mainframe, PeopleSoft) Varied load frequencies (daily, weekly, monthly) Varied load requirements (full, partial, append) Rapidly shrinking load windows Multiple server & O.S. environment Cornell’s Warehousing Challenges
DMTools is a Data Warehousing infrastructure management tool developed and in use by Cornell University. Allows high data availability- 24x7 access Repository driven Manages loads Toolbox written in Oracle PL/SQL (O.S. independent) GUI console to manage load related metadata DMTools as a Solution
Two copies of each data mart table are maintained A current table and a backup or work table Data mart users only “see” the current table New data is loaded into “work” table Rename tables rename person to person_b rename person_w to person High Data AvailabilityTable Renaming Process
High Data AvailabilityTable Renaming Benefits • Instant access to new data • Rename does not interrupt users • “Backup” table for previous load’s data • Ability to instantly roll back load
Similar to table rename May exchange table partitions without disrupting users Maintain full partitioned copy of table & small non-partitioned copy Data loaded into non-partitioned table, then exchanged with appropriate partition High Data AvailabilityPartition Exchange
Move security between table copies Remove / add policy to table during exchange Constraint management Manage index names between table copies (A or B suffix to guarantee unique names) Step verification & error handling High Data AvailabilityAdditional Considerations
Current Warehousing EnvironmentInfrastructure Production Windows Server Load (pull) from Replication Database Production AIX Server H/R & Payroll Production AIX Server Transformation & Staging Logging & Metadata Logging & Metadata DMTools Repository Production Windows Server Logging & Metadata Load (push) from Staging Database Contributor Relations Replication Database (PeopleSoft) Production AIX Server Load (pull) from Replication Database Consolidated DataMarts Logging & Metadata Production Windows Server Load (pull) from Replication Database Student Administration Logging & Metadata
DMTools Repository Data Model DM_TAB_PART_COUNTS DM_IND_COLUMNS DM_RUN_DETAILS DM_INDEXES DM_ROW_COUNTS DM_RUNS DM_TABLES DM_IND_PARTITIONS DM_NAMES DM_POLICIES DM_CONSTRAINTS DM_EMAIL_CONTENT DM_GRANTS DM_CONS_COLUMNS DM_ADMIN_USERS DM_EMAIL_SUBSCRIPTIONS
Resides in a centralized database Contains metadata needed for load process Contains logging data related to load process & load metrics Holds notification information Scalable (may use for multiple data marts) DMTools Repository
High level logging Start & end times for pre load, load, post load Load status ( completed, running, failed ) Snapshot date Estimated completion time Record counts (by table & partition) Low level logging Start /end time & description for every action Captures Oracle errors & SQL executing Load Logging
Predefined views Status views (high level load info for given DM) Detail views (used by DBA for trouble shooting) Load metrics views (various object counts related to loads) Flexibility to build additional views on logging tables as needed Load Status Monitoring
Set of Oracle stored procedures used to perform common load tasks Resides on each database in the DMTools environment Procedures bundled in packages related to their function dmrunlog dmidxpack …(partial list only)… DMTools Toolbox
Unique to each data mart Not owned by DMTools (data mart schema) Templates for easy starting point Modify template to accommodate Preload, load, post load Main Procedure
GUI interface to the repository Metadata maintenance Adding/removing tables, indexes, grants, etc. Migration management Email/pager subscription & notification maintenance Warnings & reports Detect potential load problems prior to load DMTools Console
High data availability (24x7 access) Instant rollback to pre-load state Centralized logging and notification Metadata driven O.S. independent (Oracle RDBMS only) Works with various ETL tools Simple setup and maintenance Very scaleable DMTools Advantages
Cornell University’s Data Warehousing Infrastructure INTEREST ? // QUESTIONS ? Presented by: Jeff Christen Data Warehousing DBA – Team Lead jrc42@cornell.edu