1 / 26

DW (VII): Technical Construction of the schema at the Walmart DW

Explore the design and loading process of the Walmart data warehouse schema, including the star/snowflake design, POS fact table, dimensions, and loading strategies. Learn about data compression, maintenance processes, parallel technology, and lessons from Walmart's DW implementation.

michaelgood
Download Presentation

DW (VII): Technical Construction of the schema at the Walmart DW

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DW (VII): Technical Construction of the schema at the Walmart DW Developed by: Dr Eddie Ip Modified by: Dr Arif Ansari

  2. Outline • The Walmart DW star/snowflake design • Parallel technology in loading the schema

  3. Walmart DW • POS (point-of-sale) fact table • Dimensions • Store • Selling activities • Article (product) • Time (week) • Day of week quantity stored as facts

  4. Product Time POS facts Store Selling Activities

  5. Walmart DW • Grain=? • E.g. Number of articles sold on a specific day by a specific store under a specific selling mode (e.g. discount or not)

  6. Walmart DW • “Compression on zero” • SQL count across all combinations • Compress the “0” = not store info • E.g. At Walmart Ponte Hill, no Spectrum Shampoo was sold on 2/6/01

  7. Fact & dimension tables • Fact table contains: • Selling units quantity • Selling amount • Selling cost • Monday unit quantity ….

  8. Walmart DW • Some tables are too wide • Solution: normalize • Snow-flake schema

  9. Department Time Product POS facts Supplier Store Selling Activities

  10. Snow-flake schema • Dimension tables • Article > department > supplier • Department >> class >>… • (detailed table from Westerman) • Store > district

  11. Loading the DW schema • How often to update/load? • Weekly, daily, … • Walmart has stores in different time zones • Now process by small groups of store during the day, at a variable time

  12. Loading the DW schema • Estimate of Walmart’s POS load • Everyday each store sells ~15,000 unique articles • ~4,000 stores • Total = 4,000x15,000=60 million records • Other loading • E.g., Purchase order, inventory

  13. Loading the DW schema • Four types of DW maintenance /loading processes • Initial loading = load into empty table • Append process = insert to existing table • Update process = update records • Delete process = remove records from DW

  14. Loading the DW schema • Initial loading • First time loading • Flush table and reload • E.g., store description file for a retailer • Small file, each retailer has <500 stores • More effective than update

  15. Loading the DW Schema • Append load process • E.g., POS • Keep 65 weeks (515 days) of data • One logical file but 515 partitions • Each night load into one partition • Fast loading via parallel technology • Trick: load into a temp empty table first • Local update

  16. Loading the DW schema • Update processes • Most difficult to build • If table is small, should use initial load • Required for large table that must be updated • E.g., Purchase order (PO) • Long life cycle but need to be up-to-date for query • One option = delete + insert record

  17. Loading the DW schema (delete process) Logical POS Fact table This week sales Week -64 Week -66 Week -65 Last week sales Week -2

  18. Parallel technology • Parallel technology essential in the successful implementation of DW • Large volume of data • Time requirement (e.g. loading when operation is 24/7) • Analytical requirements of DW, unpredictable queries etc. • Parallel H/W, parallel O/S, parallel software

  19. Parallel Computing In DW (DB layer) Parallel hardware Parallel software SMP MPP Divide tasks Divide data Architecture Shared everything Shared something Shared nothing

  20. Hardware • Parallel processing • Symmetric multiprocessing (SMP) • Massively parallel processing (MPP) • Clustering • Hybrid approach (NUMA, nonuniform memory access)

  21. SMP One node Many processor in one node Scale up by adding CPUs or Clustering Most commercial computers are SMP with a dozen or so processors Not highly scalable CPU1 CPU2 CPU3 Disk Memory

  22. MPP Many nodes One/more processors per node Scale up by adding nodes Almost linear scale up Memory Disk CPU Memory Disk CPU

  23. Parallel technology • MPP begins to use commodity CPU • First commerically available in NCR’s WorldMark servers, now MPP gaining popularity • High throughput, low response time • Rarely commercially available in other apps besides RDBMS

  24. Walmart DW • Some lessons: • Star/snow flake design 1) Analyze business processes 2) One process, one star • E.g. ordering, shipping • Inventory, POS, purchase order 3) Different grains (in the same process), different fact tables

  25. Walmart DW • Some lessons: • If you logically design the DB first, the physical implementation will be much easier to maintain in the long term • Use advanced technology to address business requirements (e.g., MPP) • Plan ahead for DW evolution • The appetite for information gets higher  expect people to ask for more and plan ahead

  26. glossary • POS : Point Of Sale • SQL : Structured Query language • PO : purchase order • SMP: Symmetric Multiporcessing • MPP: Massively parallel Processing • NUMA: Non Uniform Memory Access

More Related