Walmart Recruiting – Store Sales Forecasting
Chaoyi Liu, Yuqing Lu, Haoran Wu Rajendran, Goutham
The dataset we chose
<Features>
• Store - the store number
• Date - the week
• Temperature - average temperature in the region
• Fuel Price - cost of fuel in the region
• MarkDown1-5 - promotional markdown data, only available after Nov 2011
• CPI - the consumer price index
• Unemployment - the unemployment rate
• IsHoliday - whether the week is a special holiday week
The <Features> table provides parameters that may affect weekly sales, but not the weekly sales themselves.
<Train>
• Store - the store number
• Dept - the department number
• Date - the week
• Weekly Sales - sales for the given department in the given store
• IsHoliday - whether the week is a special holiday week
The <Train> table provides sales data for 45 stores with up to 99 departments each, in more than 421,000 records. It does not sum the departments into a weekly total per store.
Then we integrated the two datasets
We first integrated these two large tables into a single table of 6,435 records that has everything we need:
• Store
• Date
• Temperature
• Fuel_Price
• MarkDown1-5
• CPI
• Unemployment
• IsHoliday
• Weekly_Sales
We then divided the 6,435 records into 5 equal groups of 1,287 records each, ranked from smallest to largest (quintiles).
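The integration and five-way split described above can be sketched in plain Python. This is only an illustration, not the tool the authors used. It assumes each table is a list of dicts keyed by the column names listed on the slide, and that the groups are formed by ranking Weekly_Sales. The function names and the Sales_Group column are hypothetical.

```python
from collections import defaultdict


def total_sales_per_store_week(train_rows):
    """Sum department-level Weekly_Sales into one total per (Store, Date)."""
    totals = defaultdict(float)
    for row in train_rows:
        totals[(row["Store"], row["Date"])] += row["Weekly_Sales"]
    return dict(totals)


def integrate(feature_rows, train_rows):
    """Attach the summed Weekly_Sales to each matching <Features> row."""
    totals = total_sales_per_store_week(train_rows)
    merged = []
    for row in feature_rows:
        key = (row["Store"], row["Date"])
        if key in totals:  # skip feature rows that have no matching sales
            merged.append({**row, "Weekly_Sales": totals[key]})
    return merged


def assign_quintiles(rows, column="Weekly_Sales", groups=5):
    """Sort by `column` and label rows 1..groups in equal-sized chunks.

    Assumption: the groups are formed by ranking Weekly_Sales.
    """
    if len(rows) < groups:
        raise ValueError("need at least one row per group")
    ordered = sorted(rows, key=lambda r: r[column])
    size = len(ordered) // groups
    labeled = []
    for index, row in enumerate(ordered):
        # Any remainder rows fall into the last (largest) group.
        group = min(index // size, groups - 1) + 1
        labeled.append({**row, "Sales_Group": group})
    return labeled
```

With 6,435 merged rows, `assign_quintiles` gives each of the five groups 1,287 rows, matching the counts on the slide.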
Neural Network Model
• It is suited to complicated prediction problems
• Visualization or understanding of the rules is not needed
• Accuracy is very important
Result
Learning Rate / Training Cycles = 0.03 / 2000
Accuracy = 70.61%
Accuracy reaches 70.61% when the learning rate is 0.03, and it keeps increasing as the number of training cycles increases.
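To show how the two settings above interact, here is a minimal sketch of gradient descent with a learning rate and a number of training cycles. It trains a single sigmoid unit on a two-class problem. That is far simpler than the network behind the reported result, and it is not the authors' model. All names are hypothetical; only the values 0.03 and 2000 come from the slide.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, bias, samples):
    """Mean cross-entropy of one sigmoid unit on (features, label) pairs."""
    total = 0.0
    for features, label in samples:
        p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(samples)


def train(samples, learning_rate=0.03, training_cycles=2000):
    """Full-batch gradient descent starting from zero weights."""
    n_features = len(samples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(training_cycles):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in samples:
            p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = p - label  # derivative of the loss w.r.t. the pre-activation
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # One training cycle = one step of size learning_rate along the gradient.
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias
```

Calling `train` with more training cycles at the same learning rate lowers the value returned by `log_loss` on the training samples. This is the same kind of trend the slide reports for accuracy.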
Neural Network Weights Hidden Layer:
Naïve Bayes
Accuracy = 18.63%
Why does Naïve Bayes perform so poorly on this sample? Its accuracy is below the 20% that random guessing among five equal groups would give. The model treats the variables Store, Date, …, IsHoliday as independent of each other, so:
P(Store, Date, Temperature, …, IsHoliday) = P(Store) × P(Date) × … × P(IsHoliday)
Many of the values in the columns Store, Date, …, IsHoliday do not repeat, so the probability of each variable is very small. The product P(Store) × P(Date) × … × P(IsHoliday) will therefore be far lower than 1/6435. This makes predicting sales with such a model infeasible.
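The arithmetic behind this argument can be sketched as follows. The code estimates each column's probability by relative frequency and multiplies them, as the independence assumption allows. It illustrates only the product on the slide, not a full Naïve Bayes classifier, which would also condition each probability on the sales group. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def marginal_probabilities(rows, columns):
    """Estimate P(column = value) for every value by relative frequency."""
    n = len(rows)
    return {
        col: {value: Fraction(count, n)
              for value, count in Counter(r[col] for r in rows).items()}
        for col in columns
    }


def independent_joint_probability(row, marginals):
    """Multiply the per-column probabilities of one row's values."""
    product = Fraction(1)
    for col, table in marginals.items():
        product *= table[row[col]]
    return product
```

If a column's values never repeat across n records, each value has estimated probability 1/n. Two such columns already give a product of 1/n², which is lower than 1/n.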
K-NN
When K = 1, Accuracy = 26%
When K = 10, Accuracy = 29.03%
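For reference, a minimal sketch of K-NN classification shows where K enters the prediction. It assumes numeric feature vectors, Euclidean distance, and a simple majority vote. The slides do not say which distance or tool was used, so these choices and the function names are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points nearest to `query`."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_points, train_labels, test_points, test_labels, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_points, train_labels, query, k) == label
        for query, label in zip(test_points, test_labels)
    )
    return hits / len(test_points)
```

Euclidean distance is sensitive to the scale of each column, so features with very different ranges (for example CPI and Fuel_Price) would usually be rescaled before applying a sketch like this.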
Conclusion
• MarkDown1-5 have the highest weight, 16, which means they have an enormous impact on sales. Promotions increase weekly sales remarkably.
• Fuel price and temperature also have a positive impact: higher fuel prices go with higher sales.
• CPI and the unemployment rate have a heavy negative impact on sales. The higher the CPI and the unemployment rate, the lower the weekly sales.
• Holidays affect weekly sales only slightly. We think customers do not care whether it is a holiday or not; the only reason they buy items is promotion.