1 / 23

Clustering Seasonality Patterns in the Presence of Errors

Clustering Seasonality Patterns in the Presence of Errors. Advisor : Dr. Hsu Graduate : You-Cheng Chen Author : Mahesh Kumar Nitin R. Patel Jonathan Woo. Outline. Motivation Objective Introduction Seasonality Estimation Distance Function Experimental results

lemay
Download Presentation

Clustering Seasonality Patterns in the Presence of Errors

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Clustering Seasonality Patterns in the Presence of Errors Advisor:Dr. Hsu Graduate:You-Cheng Chen Author:Mahesh Kumar Nitin R. Patel Jonathan Woo

  2. Outline • Motivation • Objective • Introduction • Seasonality Estimation • Distance Function • Experimental results • Conclusions • Personal opinion

  3. Motivation Most traditional clustering algorithms assume that the data is provided without measurement error

  4. Objective • To present a clustering method that incorporates information contained in these error estimates and a new distance function that is based on the distribution of errors in data

  5. Introduction • Definition of a good distance or dissimilarity function is a critical step in any distance based clustering method. • Problem:Most traditional clustering methods assume that data is without any error,but errors are natural in any data measurement. • Example:Sample average

  6. Introduction • This study and results are focused on time-series clustering in the retail industry • This study assume that each point comes from a multidimensional Gaussian distribution

  7. Seasonality Estimation (1/4) • Seasonality is defined as the normalized underlying • demand of a group of similar merchandize as a function of • time of the year after taking into account other factors that • impact sales such as discounts,inventory,promotions and • random effects. • Saleit=fI(Iit)*fP(Pit)*fQ(Qit)*fR(Rit)*PLCi(t-ti0)*Seasit (1) • After (1) remove the effects of all these nonseasonal factors • Saleit= PLCi(t-ti0)*Seasit

  8. Seasonality Estimation (2/4) • S is a set of items following similar seasonality • ,therefore, S consists of items having a variety of • PLCs differing in their shape and time duration • Theorem 1:

  9. Seasonality Estimation (3/4) • If we take the average of weekly sales of all items • in S then it would nullify the effect of PLCs as suggested • by the following equations.

  10. Seasonality Estimation (4/4) • Seasonality values,Seast, can be estimated by appropriate • Scaling of weekly sales average, Salet The above procedure provides us with a large number of seasonal patterns, one for each set S, along with estimates of associated errors.

  11. Distance Function(1/4) Consider two seasonalities : Ai={(xi1,σi1),(xi2, σi2),…,(xiT, σiT)} Aj={(xj2, σj2),(xj2, σj2),…,(xjT, σjT)} We define similarity between two seasonalities as follows: If the null hypothesis H0:Ai~Aj is true then similarity between Ai and Aj is the probability of accepting the hypothesis. The distance dij between Ai and Aj is defined as ( 1-similarity) which is the probability of rejecting the H0

  12. Distance Function(2/4) Consider tth samples of both seasonalities Ait=(xit, σit) and Ajt=(xjt, σjt). (xit-xjt) ~ N( uit-ujt, (σ2it+ σ2jt)1/2 ) (1) If Ai~Aj then uit=ujt and consequently the statistic follows a t-distribution.

  13. Distance Function(3/4) • Finally distance • Comparison with Euclidean Distance • dij is monotonically increasing with respect • to

  14. Distance Function(4/4) • Comparison with Euclidean Distance • If all σ’s were the same and equal to σ then • it would become the rank order of (1) which • is the same as the rank order of the • Euclidean distance,(2)

  15. Clustering • Clustering Algorithm

  16. Experimental Results (1/6) Simulated Data Figure 5: Individual(prior to clustering) seasonality estimates with associated errors

  17. Experimental Results (2/6) Figure 6:Seasonalities obtained by hError

  18. Experimental Results (3/6) Figure 7: Seasonalities obtained by kmeans and Ward’s method using Euclidean distances

  19. Experimental Results (4/6) • Table 1:Average # misclassifications and Average • Estimation Error for different clustering methods

  20. Experimental Results (5/6) Table 2: Average Forecast Error (Retailer Data)

  21. Experimental Results (6/6)

  22. Conclusions The distance function dij is invariant under different scales for data and the clustering method obtain better cluster than others.

  23. Personal Opinion The concept of incorporating information about errors in the distance function is very good and can be used in many other clustering applications.

More Related