260 likes | 273 Views
Explore the design and loading process of the Walmart data warehouse schema, including the star/snowflake design, POS fact table, dimensions, and loading strategies. Learn about data compression, maintenance processes, parallel technology, and lessons from Walmart's DW implementation.
E N D
DW (VII): Technical Construction of the schema at the Walmart DW Developed by: Dr Eddie Ip Modified by: Dr Arif Ansari
Outline • The Walmart DW star/snowflake design • Parallel technology in loading the schema
Walmart DW • POS (point-of-sale) fact table • Dimensions • Store • Selling activities • Article (product) • Time (week) • Day of week quantity stored as facts
Product Time POS facts Store Selling Activities
Walmart DW • Grain=? • E.g. Number of articles sold on a specific day by a specific store under a specific selling mode (e.g. discount or not)
Walmart DW • “Compression on zero” • SQL count across all combinations • Compress the “0” = not store info • E.g. At Walmart Ponte Hill, no Spectrum Shampoo was sold on 2/6/01
Fact & dimension tables • Fact table contains: • Selling units quantity • Selling amount • Selling cost • Monday unit quantity ….
Walmart DW • Some tables are too wide • Solution: normalize • Snow-flake schema
Department Time Product POS facts Supplier Store Selling Activities
Snow-flake schema • Dimension tables • Article > department > supplier • Department >> class >>… • (detailed table from Westerman) • Store > district
Loading the DW schema • How often to update/load? • Weekly, daily, … • Walmart has stores in different time zones • Now process by small groups of store during the day, at a variable time
Loading the DW schema • Estimate of Walmart’s POS load • Everyday each store sells ~15,000 unique articles • ~4,000 stores • Total = 4,000x15,000=60 million records • Other loading • E.g., Purchase order, inventory
Loading the DW schema • Four types of DW maintenance /loading processes • Initial loading = load into empty table • Append process = insert to existing table • Update process = update records • Delete process = remove records from DW
Loading the DW schema • Initial loading • First time loading • Flush table and reload • E.g., store description file for a retailer • Small file, each retailer has <500 stores • More effective than update
Loading the DW Schema • Append load process • E.g., POS • Keep 65 weeks (515 days) of data • One logical file but 515 partitions • Each night load into one partition • Fast loading via parallel technology • Trick: load into a temp empty table first • Local update
Loading the DW schema • Update processes • Most difficult to build • If table is small, should use initial load • Required for large table that must be updated • E.g., Purchase order (PO) • Long life cycle but need to be up-to-date for query • One option = delete + insert record
Loading the DW schema (delete process) Logical POS Fact table This week sales Week -64 Week -66 Week -65 Last week sales Week -2
Parallel technology • Parallel technology essential in the successful implementation of DW • Large volume of data • Time requirement (e.g. loading when operation is 24/7) • Analytical requirements of DW, unpredictable queries etc. • Parallel H/W, parallel O/S, parallel software
Parallel Computing In DW (DB layer) Parallel hardware Parallel software SMP MPP Divide tasks Divide data Architecture Shared everything Shared something Shared nothing
Hardware • Parallel processing • Symmetric multiprocessing (SMP) • Massively parallel processing (MPP) • Clustering • Hybrid approach (NUMA, nonuniform memory access)
SMP One node Many processor in one node Scale up by adding CPUs or Clustering Most commercial computers are SMP with a dozen or so processors Not highly scalable CPU1 CPU2 CPU3 Disk Memory
MPP Many nodes One/more processors per node Scale up by adding nodes Almost linear scale up Memory Disk CPU Memory Disk CPU
Parallel technology • MPP begins to use commodity CPU • First commerically available in NCR’s WorldMark servers, now MPP gaining popularity • High throughput, low response time • Rarely commercially available in other apps besides RDBMS
Walmart DW • Some lessons: • Star/snow flake design 1) Analyze business processes 2) One process, one star • E.g. ordering, shipping • Inventory, POS, purchase order 3) Different grains (in the same process), different fact tables
Walmart DW • Some lessons: • If you logically design the DB first, the physical implementation will be much easier to maintain in the long term • Use advanced technology to address business requirements (e.g., MPP) • Plan ahead for DW evolution • The appetite for information gets higher expect people to ask for more and plan ahead
glossary • POS : Point Of Sale • SQL : Structured Query language • PO : purchase order • SMP: Symmetric Multiporcessing • MPP: Massively parallel Processing • NUMA: Non Uniform Memory Access