1 / 25

Data Warehousing Infrastructure Management

This presentation by Jeff Christen, a Data Warehousing DBA at Cornell University, focuses on the challenges and solutions of managing a data warehousing infrastructure. Topics include data availability, load management, security, and integration.

dannym
Download Presentation

Data Warehousing Infrastructure Management

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Cornell University’s Data Warehousing Infrastructure Presented by: Jeff Christen Data Warehousing DBA – Team Lead jrc42@cornell.edu

  2. DBA Lead - Data Warehousing Ten Years experience as a DBA (Oracle & Informix) Last Four Years Focused on Data Warehouse DBA Support Team Interface to other IT groups Team Interface to University Jeff Christen

  3. Production Support – 24x7 (Database & Load) Performance Monitoring & Tuning Database Backup & Recovery Security Implementation Object & Code Migrations (Dev / Test / Prod) Infrastructure Development & Maintenance Enforcement of University Policies & Practices Integration of Models & ETL into Warehouse Environment Assist in Data Modeling & ETL Development Responsibilities of a DW DBA

  4. Twelve production data marts Twenty-five unique loads Variety of sources (mainframe, PeopleSoft) Varied load frequencies (daily, weekly, monthly) Varied load requirements (full, partial, append) Rapidly shrinking load windows Multiple server & O.S. environment Cornell’s Warehousing Challenges

  5. DMTools is a Data Warehousing infrastructure management tool developed and in use by Cornell University. Allows high data availability- 24x7 access Repository driven Manages loads Toolbox written in Oracle PL/SQL (O.S. independent) GUI console to manage load related metadata DMTools as a Solution

  6. Two copies of each data mart table are maintained A current table and a backup or work table Data mart users only “see” the current table New data is loaded into “work” table Rename tables rename person to person_b rename person_w to person High Data AvailabilityTable Renaming Process

  7. High Data AvailabilityTable Renaming Benefits • Instant access to new data • Rename does not interrupt users • “Backup” table for previous load’s data • Ability to instantly roll back load

  8. High Data AvailabilityTable Renaming Example

  9. Similar to table rename May exchange table partitions without disrupting users Maintain full partitioned copy of table & small non-partitioned copy Data loaded into non-partitioned table, then exchanged with appropriate partition High Data AvailabilityPartition Exchange

  10. High Data AvailabilityPartition Exchange Example

  11. Move security between table copies Remove / add policy to table during exchange Constraint management Manage index names between table copies (A or B suffix to guarantee unique names) Step verification & error handling High Data AvailabilityAdditional Considerations

  12. Current Warehousing EnvironmentInfrastructure Production Windows Server Load (pull) from Replication Database Production AIX Server H/R & Payroll Production AIX Server Transformation & Staging Logging & Metadata Logging & Metadata DMTools Repository Production Windows Server Logging & Metadata Load (push) from Staging Database Contributor Relations Replication Database (PeopleSoft) Production AIX Server Load (pull) from Replication Database Consolidated DataMarts Logging & Metadata Production Windows Server Load (pull) from Replication Database Student Administration Logging & Metadata

  13. DMTools Repository Data Model DM_TAB_PART_COUNTS DM_IND_COLUMNS DM_RUN_DETAILS DM_INDEXES DM_ROW_COUNTS DM_RUNS DM_TABLES DM_IND_PARTITIONS DM_NAMES DM_POLICIES DM_CONSTRAINTS DM_EMAIL_CONTENT DM_GRANTS DM_CONS_COLUMNS DM_ADMIN_USERS DM_EMAIL_SUBSCRIPTIONS

  14. DMTools Repository Data Model(System Catalog)

  15. DMTools Repository Data Model(Logging Metrics)

  16. Resides in a centralized database Contains metadata needed for load process Contains logging data related to load process & load metrics Holds notification information Scalable (may use for multiple data marts) DMTools Repository

  17. High level logging Start & end times for pre load, load, post load Load status ( completed, running, failed ) Snapshot date Estimated completion time Record counts (by table & partition) Low level logging Start /end time & description for every action Captures Oracle errors & SQL executing Load Logging

  18. Predefined views Status views (high level load info for given DM) Detail views (used by DBA for trouble shooting) Load metrics views (various object counts related to loads) Flexibility to build additional views on logging tables as needed Load Status Monitoring

  19. Set of Oracle stored procedures used to perform common load tasks Resides on each database in the DMTools environment Procedures bundled in packages related to their function dmrunlog dmidxpack …(partial list only)… DMTools Toolbox

  20. Unique to each data mart Not owned by DMTools (data mart schema) Templates for easy starting point Modify template to accommodate Preload, load, post load Main Procedure

  21. GUI interface to the repository Metadata maintenance Adding/removing tables, indexes, grants, etc. Migration management Email/pager subscription & notification maintenance Warnings & reports Detect potential load problems prior to load DMTools Console

  22. DMTools Console - GUI Interface (main)

  23. DMTools Console - GUI Interface (add Table)

  24. High data availability (24x7 access) Instant rollback to pre-load state Centralized logging and notification Metadata driven O.S. independent (Oracle RDBMS only) Works with various ETL tools Simple setup and maintenance Very scaleable DMTools Advantages

  25. Cornell University’s Data Warehousing Infrastructure INTEREST ? // QUESTIONS ? Presented by: Jeff Christen Data Warehousing DBA – Team Lead jrc42@cornell.edu

More Related