PhD Methods Duration Models December 1 & 15, 2008

  1. PhD Methods Duration Models December 1 & 15, 2008

  2. Sessions & Inter-session • Session 1: Objectives & Tools, Questions → Models, Models → Analyses • Offline assignments • Session 2: Analyses → Presentations, Presentations → Questions

  3. Objectives and Tools

  4. Objectives • Understand duration models • What questions they help answer? • What ‘flavors’ exist? • How they work? • What you need to check? • Get more practice with STATA • Watch me do tricks… • …and do your own with assignments!

  5. Applied researcher’s toolkit • Reference texts • Wooldridge’s “Econometric analysis of cross section and panel data” • Greene’s “Econometric analysis” • StataCorp’s “Survival analysis and epidemiological tables” • Rabe-Hesketh & Everitt’s “A handbook of statistical analyses using Stata” • Stata v10

  6. Questions to Models

  7. Typical questions? • Asked in life, engineering, economic and administrative sciences • Interested in the length of a spell of time: • What are predictors of this duration? • Does duration depend on elapsed time?

  8. Key terms? • Spell (origin, failure, duration) • Hazard • At-risk • Risk set • Censoring (left, right)

  9. A little theory • Spell length: $T \sim f(t)$, $f$ ‘nice’ • Cumulative probability: $F(t)$ • Survival function: $S(t) = 1 - F(t) = \Pr\{T \ge t\}$ • Hazard rate: $\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr\{T \in [t, t+\Delta t] \mid T \ge t\}}{\Delta t} = \lim_{\Delta t \to 0} \frac{F(t+\Delta t) - F(t)}{\Delta t \, S(t)} = \frac{f(t)}{S(t)} = -\frac{d \ln S(t)}{dt}$
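
The identities on this slide can be checked numerically. The sketch below is in Python rather than Stata, purely to expose the arithmetic. It assumes a Weibull survival function with illustrative parameter values (not taken from the slides) and compares $f(t)/S(t)$ with a finite-difference estimate of $-d \ln S(t)/dt$.

```python
import math

# Illustrative Weibull parameters (assumed, not from the slides).
LAM, P = 0.5, 1.5

def survival(t):
    """S(t) = exp(-(LAM*t)^P)."""
    return math.exp(-((LAM * t) ** P))

def density(t):
    """f(t) = -dS/dt = LAM*P*(LAM*t)^(P-1) * S(t)."""
    return LAM * P * (LAM * t) ** (P - 1) * survival(t)

def hazard_from_density(t):
    """Hazard computed as f(t) / S(t)."""
    return density(t) / survival(t)

def hazard_from_log_survival(t, h=1e-6):
    """Hazard computed as -d ln S(t)/dt, via a central finite difference."""
    return -(math.log(survival(t + h)) - math.log(survival(t - h))) / (2 * h)

# The two columns should agree up to finite-difference error.
for t in (0.5, 1.0, 2.0):
    print(t, hazard_from_density(t), hazard_from_log_survival(t))
```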

  10. Hazard models • Non-parametric (e.g. Kaplan-Meier) • Semi-parametric (e.g. Cox) • Fully parametric (e.g. Weibull)

  11. Kaplan-Meier non-parametric • Let’s look at a purely empirical estimate of the survivor function • Suppose $n_j$ is the number of units at risk just before $d_j$ failures occur at $t_j$ • Then the estimate is $\hat{S}(t) = \prod_{j \mid t_j \le t} \frac{n_j - d_j}{n_j}$
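
To make the product formula concrete, here is a minimal Python sketch (not the Stata implementation) that computes the Kaplan-Meier estimate by hand. The toy data and the function name `kaplan_meier` are hypothetical. A spell censored at $t_j$ is treated as still at risk at $t_j$, which is the usual convention.

```python
from collections import Counter

def kaplan_meier(durations, failed):
    """Return [(t_j, S_hat(t_j))] at each distinct failure time t_j.

    durations: spell lengths; failed: 1 if the spell ended in failure,
    0 if it was right-censored.
    """
    deaths = Counter(t for t, d in zip(durations, failed) if d)
    curve, s_hat = [], 1.0
    for t_j in sorted(deaths):
        n_j = sum(1 for t in durations if t >= t_j)  # at risk just before t_j
        d_j = deaths[t_j]                            # failures at t_j
        s_hat *= (n_j - d_j) / n_j
        curve.append((t_j, s_hat))
    return curve

# Toy data (hypothetical): six spells, two of them right-censored.
durations = [2, 3, 3, 5, 7, 8]
failed    = [1, 1, 0, 1, 0, 1]
for t_j, s in kaplan_meier(durations, failed):
    print(t_j, round(s, 4))
```

In this toy example the censored spells never lower the curve themselves. They only shrink the risk set $n_j$ used at later failure times.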

  12. Kaplan-Meier implementation • KM curves are easily plotted in Stata:
        use http://www.stata-press.com/data/r10/drugtr, clear
        list
        stset
        sts graph
        sts graph, by(drug) ci level(95)
        sts graph, by(drug)
        sts test drug
        sts test drug, wilcoxon
        gen ageless57 = (age < 57)
        sts test ageless57

  13. Cox’s semi-parametric model • Specification: $\lambda(t_i) = \exp(x_i \beta) \, \lambda_0(t_i)$ • Problem: how to estimate $\beta$ in the presence of the unknown individual heterogeneity $\lambda_0(t_i)$? • Solution: condition on exactly 1 individual leaving the risk set at the time of interest

  14. Cox’s model • Let $T_k$ be the $k$th exit time, and let $R_k$ be the at-risk set. • Then $\Pr\{t_i = T_k \mid R_k\} = \exp(x_i \beta) / \sum_{j \in R_k} \exp(x_j \beta)$, sweeping out the $\lambda_0(t_i)$ terms • Maximize this partial likelihood function: $\ln L = \sum_k \big[ x_k \beta - \ln \big( \sum_{j \in R_k} \exp(x_j \beta) \big) \big]$
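
The partial likelihood above can be evaluated directly. The following Python sketch is an illustration under stated assumptions, not Stata's algorithm: it handles a single covariate, assumes no tied exit times, uses hypothetical toy data, and maximizes with a simple golden-section search, relying on the concavity of the partial log-likelihood.

```python
import math

def cox_partial_loglik(beta, times, failed, x):
    """ln L = sum over exits k of [x_k*beta - ln(sum_{j in R_k} exp(x_j*beta))].

    Assumes one covariate and no tied exit times.
    """
    loglik = 0.0
    for t_k, d_k, x_k in zip(times, failed, x):
        if not d_k:
            continue  # censored spells enter only through the risk sets
        risk_sum = sum(math.exp(x_j * beta)
                       for t_j, x_j in zip(times, x) if t_j >= t_k)
        loglik += x_k * beta - math.log(risk_sum)
    return loglik

def maximize(f, lo=-5.0, hi=5.0, tol=1e-8):
    """Golden-section search for the maximizer of a concave function f."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            a = c  # maximum lies to the right of c
        else:
            b = d  # maximum lies to the left of d
    return (a + b) / 2

# Toy data (hypothetical): five spells, the third is right-censored.
times  = [1, 2, 3, 4, 5]
failed = [1, 1, 0, 1, 1]
x      = [1, 0, 1, 0, 1]

beta_hat = maximize(lambda b: cox_partial_loglik(b, times, failed, x))
print(beta_hat, cox_partial_loglik(beta_hat, times, failed, x))
```

Note that the baseline hazard never appears in the code: only the ordering of exit times and the composition of each risk set matter.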

  15. Cox’s model if tied exit times • The partial likelihood function must now account for non-unique exit times. • Suppose $D_j$ is the set of units that fail at time $t_j$, $d_j$ is the cardinality of that set, and $R_j$ is the at-risk set of units at $t_j$. Then $\ln L = \sum_j \big[ \sum_{k \in D_j} x_k \beta - d_j \ln \big( \sum_{i \in R_j} \exp(x_i \beta) \big) \big]$
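
This tie-adjusted form is commonly known as the Breslow approximation. A minimal Python sketch of it follows, again for one covariate and with hypothetical toy data, where two units fail at the same time.

```python
import math
from collections import defaultdict

def breslow_loglik(beta, times, failed, x):
    """ln L = sum_j [ sum_{k in D_j} x_k*beta
                      - d_j * ln(sum_{i in R_j} exp(x_i*beta)) ]."""
    failures_at = defaultdict(list)  # t_j -> covariates of the units in D_j
    for t, d, xi in zip(times, failed, x):
        if d:
            failures_at[t].append(xi)
    loglik = 0.0
    for t_j, x_failed in failures_at.items():
        d_j = len(x_failed)
        risk_sum = sum(math.exp(xi * beta)
                       for t, xi in zip(times, x) if t >= t_j)
        loglik += beta * sum(x_failed) - d_j * math.log(risk_sum)
    return loglik

# Toy data (hypothetical): two failures tied at t = 2, one censored spell.
times  = [1, 2, 2, 3, 4]
failed = [1, 1, 1, 0, 1]
x      = [1, 0, 1, 1, 0]
print(breslow_loglik(0.0, times, failed, x))
print(breslow_loglik(1.0, times, failed, x))
```

When every $d_j = 1$ the expression collapses to the untied partial log-likelihood.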

  16. Let’s try Cox’s model • Stata implements Cox’s model very clearly:
        use http://www.stata-press.com/data/r10/kva, clear
        stset
        stcox load
        estimates store load
        stcox load bearings
        lrtest . load
        drop bearings
        stcox load    // look at the estimated coefficient on load

  17. What’s Stata doing with Cox? • Look at the Excel spreadsheet in http://faculty.fuqua.duke.edu/~willm/Classes/PhD/PhD_2008_2009_LongStrat/Strategy591_2008_2009_ResearchMethods.htm • I’ve tried to show in easy sequences how the ado file in Stata parallels the partial likelihood function we just learned. • Note the log-likelihood and the estimate of $\beta$ this spreadsheet yields. Compare with Stata.

  18. Key Cox assumptions • Recall the Cox specification for the hazard rate for individual $i$ at time $t_k$: $\lambda_i(t_k) = \exp(x_i \beta) \, \lambda_0(t_k)$ • Consider the hazard ratio for two individuals $i$ and $m$, again at time $t_k$: $\lambda_i(t_k) / \lambda_m(t_k) = \exp(x_i \beta) \lambda_0(t_k) / \big( \exp(x_m \beta) \lambda_0(t_k) \big) = \exp((x_i - x_m) \beta)$, a proportionality constant that does not depend on $t_k$
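
As a worked instance of the proportionality property, take a single binary covariate with purely illustrative values (assumptions, not estimates from any of the course datasets):

```latex
% Illustrative values (assumed): x_i = 1, x_m = 0, \beta = 0.5
\frac{\lambda_i(t_k)}{\lambda_m(t_k)}
  = \exp\big((x_i - x_m)\beta\big)
  = \exp(0.5) \approx 1.65 \quad \text{for every } t_k
```

Under these values individual $i$ faces a hazard about 65% higher than individual $m$ at every exit time, whatever shape the baseline hazard takes.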

  19. Testing Cox’s assumptions • Is global proportionality reasonable?
        use http://www.stata-press.com/data/r10/drugtr, clear
        gen ageless57 = (age < 57)
        sts graph, by(ageless57)      // curves roughly parallel?
        stcox drug
        stcox drug, strata(ageless57)
        stphplot, by(ageless57)       // curves roughly parallel?
        stcoxkm, by(ageless57)        // predicted vs observed?
      • Mitigation with stratification

  20. Testing Cox model residuals • Are there significant outliers?
        use http://www.stata-press.com/data/r10/kva, clear
        stcox load bearings, mgale(mart)
        predict devr, deviance
        predict xb, xb
        twoway scatter devr xb            // residuals look reasonable?
        stcox load bearings, esr(score*)
        twoway scatter score1 failtime    // large deviations?
        twoway scatter score2 failtime    // large deviations?

  21. Fully parametric • So far the underlying baseline hazard rate has been left unspecified. We can modify this assumption using parametric models. • Easiest choice is the exponential survival function, in which the hazard rate is constant: $-d \ln S(t)/dt = \lambda(t) = \lambda \Rightarrow S(t) = \exp(-\lambda t)$
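
With a constant hazard, the maximum-likelihood estimate has a closed form even with right censoring: a failed spell contributes $f(t) = \lambda \exp(-\lambda t)$ and a censored spell contributes $S(t) = \exp(-\lambda t)$, so $\hat{\lambda}$ is the number of failures divided by total time at risk. The Python sketch below illustrates this on hypothetical toy data with no covariates; the function names are illustrative.

```python
import math

def exponential_loglik(lam, durations, failed):
    """ln L = sum_i [d_i*ln(lam) - lam*t_i].

    Failed spells contribute ln f(t_i); censored spells contribute ln S(t_i).
    """
    return sum(d * math.log(lam) - lam * t for t, d in zip(durations, failed))

def exponential_mle(durations, failed):
    """Closed-form maximizer: number of failures / total time at risk."""
    return sum(failed) / sum(durations)

# Toy data (hypothetical): six spells, two of them right-censored.
durations = [2, 3, 3, 5, 7, 8]
failed    = [1, 1, 0, 1, 0, 1]

lam_hat = exponential_mle(durations, failed)
print(lam_hat, exponential_loglik(lam_hat, durations, failed))
```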

  22. Other fully parametric models • Weibull specification of a monotonic hazard rate with $p > 0$: $\lambda(t) = \lambda p (\lambda t)^{p-1}$
        use http://www.stata-press.com/data/r10/kva, clear
        streg load bearings, d(weibull)
        stcurve, haz
        streg load bearings, d(exponential)
        stcurve, haz
        sts graph, hazard
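
The role of the shape parameter $p$ is easy to see by tabulating the hazard. The Python sketch below uses illustrative values of $\lambda$ and $t$ (assumptions, not fitted estimates) and shows the hazard falling for $p < 1$, constant for $p = 1$ (the exponential case), and rising for $p > 1$.

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard lam * p * (lam * t)**(p - 1), for t > 0 and p > 0."""
    return lam * p * (lam * t) ** (p - 1)

# Illustrative values only (lam and the grid of t are assumptions).
lam = 0.5
grid = [0.5, 1.0, 2.0, 4.0]
for p in (0.5, 1.0, 2.0):
    values = [weibull_hazard(t, lam, p) for t in grid]
    print(p, [round(v, 3) for v in values])
```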

  23. Models to Analyses

  24. Practice (1): Equine risks • stset the data • Basic non-parametric exploration • More parametric models • Model assumptions tested • Constructing additional variables as needed

  25. Practice (2): Military risks • stset the data • Basic non-parametric exploration • More parametric models • Model assumptions tested • Constructing additional variables as needed

  26. Practice (3): Hospital stay risks • stset the data (warning: this is a huge set) • Basic non-parametric exploration • More parametric models • Model assumptions tested • Constructing additional variables as needed

  27. Offline assignments

  28. Assignments • Data assignment • Reading assignment

  29. Assignments: data • Datasets from military, veterinary and medical science • Data may be fictional, is certainly de-identified, and should not be re-used • Think of a simple, plausible research question, model it, analyze one set of data, write up and present results (1-2 p)

  30. Assignments: reading • Read and briefly critique each of: • Jensen, M. 2006. Should we stay or should we go? Accountability, status anxiety, and client defections. ASQ 51: 97-128 • Rao H, Greve HR, Davis GF. 2001. Fool's gold: social proof in the initiation and abandonment of coverage by Wall Street analysts. ASQ 46(3): 502-526 • My 2008 working paper on cardiologists

  31. Assignments: reading… • Typical questions we’ll discuss • Are the research question, data and the model choice congruent? • How else could they have answered the question? • Different data? • Different model? • Different analysis? • Is the presentation of the analyses clear and compelling? • Do you buy it? Why or why not? • What is left to do?

  32. Assignments… • Reading and datasets posted at http://faculty.fuqua.duke.edu/~willm/Classes/PhD/PhD_2008_2009_LongStrat/Strategy591_2008_2009_ResearchMethods.htm • Email your write-up by next Friday, Dec 12, by close-of-business • Be prepared to discuss reading and answer questions on Monday, Dec 15

  33. Analyses to Presentation

  34. Equine data • What predicts fatal injury hazard here? • Does that make sense? • What model did you use and why? • How did you check it? • What summary results do you have? • What’s missing in the data? • What’s wrong with our model?

  35. Discharge data • What predicts the discharge hazard? • Does that make sense? • What model did you use and why? • How did you check it? • What summary results do you have? • What’s missing in the data? • What’s wrong with our model?

  36. Military data • What predicts the fatal wound hazard? • Does that make sense? • What model did you use and why? • How did you check it? • What summary results do you have? • What’s missing in the data? • What’s wrong with our model?

  37. Presentation to Questions

  38. Some recent ‘presentations’ • Jensen, M. 2006. Should we stay or should we go? Accountability, status anxiety, and client defections. Administrative Science Quarterly 51: 97-128 • Rao H, Greve HR, Davis GF. 2001. Fool's gold: social proof in the initiation and abandonment of coverage by Wall Street analysts. Administrative Science Quarterly 46(3): 502-526 • My working paper on cardiologists

  39. Loose Ends & The End

  40. We (probably) didn’t cover… • When covariates vary over time? • What to do about a lot of left censoring? • Frailty models for omitted variables • Shared frailty models to explain similarity in duration in groups of units → Stata manual and experimentation are almost always the best next steps

  41. Summary • Neat modeling tools exist when you have data on timings and care about differences in timings and their reason • Really neat when you care about firm longevity, leadership durations, spells of some management activity

  42. Thank you!
