
Analyzing Ski Ticket Sales Data for Autocorrelation

Explore how autocorrelation impacts ski ticket sales data analysis, conduct the Durbin-Watson test, and apply remedies for autocorrelation in regression models. Learn the correct approach to incorporate categorical variables in regression analysis.


Presentation Transcript


  1. Stat 112 Notes 17 • Time Series and Assessing the Assumption that the Disturbances Are Independent (Chapter 6.8) • Using and Interpreting Indicator Variables (Chapter 7.1)

  2. Time Series Data and Autocorrelation When Y is a variable collected for the same entity (person, state, country) over time, we call the data time series data. For time series data, we need to check the independence assumption of the simple and multiple regression models. Independence Assumption: The residuals are independent of one another. This means that if the residual is positive this year, next year's residual is still equally likely to be positive or negative, i.e., there is no autocorrelation. Positive autocorrelation: positive residuals are more likely to be followed by positive residuals than by negative residuals. Negative autocorrelation: positive residuals are more likely to be followed by negative residuals than by positive residuals.
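The notes use JMP throughout; purely as an illustration of the idea (not part of the original notes), the Python sketch below simulates residuals with positive autocorrelation and checks the lag-1 correlation, which is positive when positive residuals tend to follow positive residuals.

```python
import numpy as np

# Simulate 20 years of residuals with positive autocorrelation (AR(1), rho = 0.6).
rng = np.random.default_rng(0)
rho, n = 0.6, 20
e = np.empty(n)
e[0] = rng.normal()
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal()

# Lag-1 correlation of consecutive residuals: positive values indicate
# positive autocorrelation, negative values indicate negative autocorrelation.
lag1_corr = np.corrcoef(e[:-1], e[1:])[0, 1]
print(f"lag-1 correlation of residuals: {lag1_corr:.2f}")
```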

  3. Ski Ticket Sales Christmas week is a critical period for most ski resorts. A ski resort in Vermont wanted to determine the effect that weather had on its sales of lift tickets during Christmas week, using data from the past 20 years. Yi = lift ticket sales during Christmas week in year i, Xi1 = snowfall during Christmas week in year i, Xi2 = average temperature during Christmas week in year i. Data in skitickets.JMP.
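If you want to reproduce this outside JMP, a minimal Python sketch follows; it assumes skitickets.JMP has been exported to a CSV file with columns named Tickets, Snowfall and Temperature (the file and column names are assumptions, not from the notes).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed CSV export of skitickets.JMP with columns Tickets, Snowfall, Temperature.
ski = pd.read_csv("skitickets.csv")

# Multiple regression of lift ticket sales on snowfall and average temperature.
fit = smf.ols("Tickets ~ Snowfall + Temperature", data=ski).fit()
print(fit.summary())
```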

  4. Residuals suggest positive autocorrelation

  5. Durbin-Watson Test of Independence The Durbin-Watson test is a test of whether the residuals are independent. The null hypothesis is that the residuals are independent, and the alternative hypothesis is that the residuals are autocorrelated (either positively or negatively). The test works from the correlation of consecutive residuals. To compute the Durbin-Watson test in JMP, after Fit Model, click the red triangle next to Response, click Row Diagnostics and click Durbin-Watson Test. Then click the red triangle next to Durbin-Watson to get the p-value. For the ski ticket data, the p-value = 0.0002: strong evidence of autocorrelation.
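Outside JMP, a hedged sketch of the same check: statsmodels reports the Durbin-Watson statistic (it does not report JMP's p-value); values near 2 indicate no autocorrelation, and values well below 2 suggest positive autocorrelation. The file and column names are assumptions, as before.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

ski = pd.read_csv("skitickets.csv")  # assumed CSV export of skitickets.JMP
fit = smf.ols("Tickets ~ Snowfall + Temperature", data=ski).fit()

# Durbin-Watson statistic on the time-ordered residuals; roughly 2(1 - r),
# where r is the correlation of consecutive residuals.
print(f"Durbin-Watson statistic: {durbin_watson(fit.resid):.2f}")
```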

  6. Remedies for Autocorrelation Add a time variable to the regression. Add a lagged dependent (Y) variable to the regression. In JMP, create a new column, right click it, click Formula, click Row, click Lag, and then select the Y variable. After adding these variables, refit the model and recheck the Durbin-Watson statistic to see if the autocorrelation has been removed.
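A minimal sketch of both remedies in Python (assuming the same hypothetical CSV export and column names): add a time index, add the lagged Y, refit, and recheck Durbin-Watson.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

ski = pd.read_csv("skitickets.csv")              # assumed CSV export of skitickets.JMP
ski["Year"] = range(1, len(ski) + 1)             # remedy 1: time variable
ski["LagTickets"] = ski["Tickets"].shift(1)      # remedy 2: lagged dependent variable

# Refit with both remedies (the first year is lost to the lag) and recheck Durbin-Watson.
fit2 = smf.ols("Tickets ~ Snowfall + Temperature + Year + LagTickets",
               data=ski.dropna()).fit()
print(f"Durbin-Watson after remedies: {durbin_watson(fit2.resid):.2f}")
```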

  7. Example 6.10 in the book

  8. Categorical variables • Categorical (nominal) variables: variables that define group membership, e.g., sex (male/female), color (blue/green/red), county (Bucks County, Chester County, Delaware County, Philadelphia County). • How can we use categorical variables as explanatory variables in regression analysis?

  9. Comparing Toy Factory Managers • An analysis has shown that the time required to complete a production run in a toy factory increases with the number of toys produced. Data were collected for the time required to process 20 randomly selected production runs as supervised by three managers (Alice, Bob and Carol). Data in toyfactorymanager.JMP. • How do the managers compare? Picture from Toy Story (1995)

  10. Marginal Comparison • Marginal comparison could be misleading. We know that large production runs with more toys take longer than small runs with few toys. How can we be sure that Carol has not simply been supervising very small production runs? • Solution: Run a multiple regression in which we include size of the production run as an explanatory variable along with manager, in order to control for size of the production run.

  11. Including Categorical Variable in Multiple Regression: Wrong Approach • We could assign numeric codes to the managers, e.g., Alice = 0, Bob = 1, Carol = 2. • This model says that for the same run size, Bob is 31 minutes faster than Alice and Carol is 31 minutes faster than Bob. • This model restricts the difference between Alice and Bob to be the same as the difference between Bob and Carol – we have no reason to impose this. • If we use a different coding for Manager, we get different results, e.g., with Bob = 0, Alice = 1, Carol = 2, the model says Alice is 5 minutes faster than Bob.
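To see why a single numeric code is a poor choice, here is an illustrative Python sketch (not from the notes; the file name and the columns Time, RunSize and Manager are assumptions): the fitted "manager effect" changes when the arbitrary coding changes, because one coefficient is forced to describe both the Alice-Bob and Bob-Carol gaps.

```python
import pandas as pd
import statsmodels.formula.api as smf

toys = pd.read_csv("toyfactorymanager.csv")  # assumed CSV export of toyfactorymanager.JMP

codings = {"Alice=0, Bob=1, Carol=2": {"Alice": 0, "Bob": 1, "Carol": 2},
           "Bob=0, Alice=1, Carol=2": {"Bob": 0, "Alice": 1, "Carol": 2}}
for label, coding in codings.items():
    toys["ManagerCode"] = toys["Manager"].map(coding)
    fit = smf.ols("Time ~ RunSize + ManagerCode", data=toys).fit()
    # One coefficient is forced to be the common gap between consecutive codes.
    print(label, "-> ManagerCode coefficient:", round(fit.params["ManagerCode"], 2))
```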

  12. Including Categorical Variable in Multiple Regression: Right Approach • Create an indicator (dummy) variable for each category. • Manager[Alice] = 1 if Manager is Alice, 0 if Manager is not Alice. • Manager[Bob] = 1 if Manager is Bob, 0 if Manager is not Bob. • Manager[Carol] = 1 if Manager is Carol, 0 if Manager is not Carol.
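Outside JMP, a sketch of the indicator-variable approach (same assumed CSV and column names): declaring Manager categorical creates one dummy per manager. Note that statsmodels' default is baseline (treatment) coding, while JMP's Expanded Estimates use sum-to-zero effect coding; C(Manager, Sum) mimics the latter.

```python
import pandas as pd
import statsmodels.formula.api as smf

toys = pd.read_csv("toyfactorymanager.csv")  # assumed CSV export; column names are assumptions

# C(Manager) creates indicator variables for the managers; C(Manager, Sum)
# would instead use sum-to-zero (effect) coding, closer to JMP's output.
fit = smf.ols("Time ~ RunSize + C(Manager)", data=toys).fit()
print(fit.params)
```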

  13. Categorical Variables in Multiple Regression in JMP • Make sure that the categorical variable is coded as nominal. To change the coding, right click on the variable's column, click Column Info and change Modeling Type to nominal. • Use Fit Model and include the categorical variable in the multiple regression. • After Fit Model, click the red triangle next to Response and click Estimates, then Expanded Estimates (the initial output in JMP uses a different, more confusing coding of the dummy variables).

  14. For a run size of 100, compare the estimated run times for Alice, Bob and Carol. • For the same run size, Alice is estimated to be on average 38.41 - (-14.65) = 53.06 minutes slower than Bob and 38.41 - (-23.76) = 62.17 minutes slower than Carol.
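As a quick check of that arithmetic, the differences come straight from the effect-coded manager estimates quoted on the slide (they sum to zero, and the intercept and Run Size terms cancel when two managers are compared at the same run size):

```python
# Effect-coded manager estimates quoted on the slide (they sum to zero).
manager_effect = {"Alice": 38.41, "Bob": -14.65, "Carol": -23.76}

# Pairwise differences at a fixed run size: the other terms in the model cancel.
alice_vs_bob = manager_effect["Alice"] - manager_effect["Bob"]      # 53.06
alice_vs_carol = manager_effect["Alice"] - manager_effect["Carol"]  # 62.17
print(f"Alice is {alice_vs_bob:.2f} min slower than Bob and "
      f"{alice_vs_carol:.2f} min slower than Carol at the same run size.")
```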

  15. Election Regression

  16. Prediction and Prediction Interval for 2008

  17. Effect Tests • Effect test for Manager: H0: Manager[Alice] = Manager[Bob] = Manager[Carol] vs. Ha: at least two of Manager[Alice], Manager[Bob] and Manager[Carol] are not equal. The null hypothesis is that all managers are the same (in terms of mean run time) when run size is held fixed; the alternative hypothesis is that not all managers are the same (in terms of mean run time) when run size is held fixed. This is a partial F test. • The p-value for the effect test is < .0001: strong evidence that not all managers are the same when run size is held fixed. • Note that H0: Manager[Alice] = Manager[Bob] = Manager[Carol] is equivalent to H0: Manager[Alice] = Manager[Bob] = Manager[Carol] = 0 because JMP imposes the constraint Manager[Alice] + Manager[Bob] + Manager[Carol] = 0. • The effect test for Run Size tests the null hypothesis that the Run Size coefficient is 0 versus the alternative hypothesis that it isn't zero; it has the same p-value as the t-test.
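For reference, a hedged sketch of the same partial F tests outside JMP (same assumed CSV and column names): a Type II ANOVA table from statsmodels gives an F test for each term, including the Manager effect test and the Run Size test.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

toys = pd.read_csv("toyfactorymanager.csv")  # assumed CSV export; column names are assumptions
fit = smf.ols("Time ~ RunSize + C(Manager)", data=toys).fit()

# Type II ANOVA: partial F test for C(Manager) and for RunSize.
print(anova_lm(fit, typ=2))
```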

  18. The effect test shows that the managers are not all equal. • For the same run size, Carol is best (lowest mean run time), followed by Bob and then Alice. • The above model assumes no interaction between Manager and run size – the difference between the mean run times of the managers is the same for all run sizes.

  19. Testing for Differences Between Specific Managers

  20. Inference for Differences of Coefficients in JMP
