Event History Analysis PS 791 Advanced Topics in Data Analysis
Event History Analysis … and its cousins • Event History Analysis is a general term comprising a set of time duration models • Survival Analysis • Duration analysis • Hazard Modeling
Event Duration • When we look at processes that occur over time, we are often interested in two aspects of the process: • the duration of the events, • How long a regime or alliance lasts • the transition event or state • The occurrence of a coup
Survival in broader terms • Survival analysis is often used to examine the length of time that an entity survives after exposure to a disease or toxin. • In toxicity studies the related measure is the LC50 • The concentration of the toxin that will kill 50% of the test population during the time of exposure – say 24 hours • Used for determining acute toxicity of a chemical compound
Survival in a non-fatal sense • Other senses of survival • Length of time a regime lasts or stays in power • Length of a military intervention • Duration of wars or alliances
The Mathematics of Survival • Some definitions: • T is a positive random variable for survival time – the length of time before a change of state • T is continuous • Until we assume it isn’t – for later • The actual measure of the survival time, or instance of it, is t. • The possible values of T have a probability density function, f(t), and a cumulative distribution function, F(t).
The distribution function of T • The distribution function of T is expressed as: F(t) = Pr(T ≤ t) • This expresses the probability that a survival time T is less than or equal to t
The Unconditional Failure Rate • If we differentiate F(t), we get the density function: f(t) = dF(t)/dt • We can characterize the distribution of failures by either the distribution function or the density function
The Survivor Function • The survivor function denotes the probability that a survival time T is equal to or greater than some time t: S(t) = 1 − F(t) = Pr(T ≥ t) • This is also the proportion of units surviving beyond t. • S(t) is a non-increasing function, since as time passes fewer and fewer units survive
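The "proportion of units surviving" reading of the survivor function can be sketched directly. This is a minimal illustration that assumes every duration is fully observed (no censoring); the function name `empirical_survivor` and the duration values are hypothetical, not data from the course.

```python
def empirical_survivor(durations, t):
    """Proportion of units whose duration is at least t (no censoring assumed)."""
    if not durations:
        raise ValueError("durations must be non-empty")
    return sum(1 for d in durations if d >= t) / len(durations)


# Hypothetical regime durations in years, for illustration only.
durations = [2, 3, 3, 5, 8, 13, 21, 34]

# Evaluate the survivor function at a few time points.
survivor_curve = [(t, empirical_survivor(durations, t)) for t in (0, 3, 5, 10, 40)]
for t, s in survivor_curve:
    print(f"S({t}) = {s:.3f}")
```

The values start at 1 and never rise as t grows, which is the non-increasing property noted on the slide.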
The Hazard Rate • Given the survivor function and the density of failures, we have a way to account for both “survival” and “death” in EHA (Event History Analysis) • We obtain another important component of EHA when we look at the relationship between the two: the hazard rate.
A Conditional Failure Rate • The hazard rate is the rate at which units fail – or durations end – at t, given that the unit has survived until t: h(t) = lim(Δt→0) Pr(t ≤ T < t + Δt | T ≥ t) / Δt • Thus the hazard rate is a conditional failure rate.
The Interrelationships • The hazard rate, survivor function, and distribution and density functions are all interrelated. • Thus the hazard rate can be represented by h(t) = f(t) / S(t)
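The interrelationships can be checked numerically. The sketch below assumes, purely for illustration, a Weibull distribution function with rate `LAM` and shape `P` (both hypothetical values); it derives the density, survivor function, and hazard rate from F(t) alone and compares the hazard with the equivalent form, minus the derivative of log S(t).

```python
import math

# Assumption for illustration: Weibull distribution with rate LAM and shape P.
LAM, P = 0.2, 1.5


def F(t):
    """Cumulative distribution function, Pr(T <= t)."""
    return 1.0 - math.exp(-((LAM * t) ** P))


def f(t, eps=1e-6):
    """Density, approximated by a central-difference derivative of F."""
    return (F(t + eps) - F(t - eps)) / (2 * eps)


def S(t):
    """Survivor function, 1 - F(t)."""
    return 1.0 - F(t)


def hazard(t):
    """Hazard rate as density divided by survivor function."""
    return f(t) / S(t)


def hazard_from_log_survivor(t, eps=1e-6):
    """Equivalent form: minus the derivative of log S(t)."""
    return -(math.log(S(t + eps)) - math.log(S(t - eps))) / (2 * eps)


for t in (1.0, 5.0, 10.0):
    print(t, hazard(t), hazard_from_log_survivor(t))
```

Both columns agree up to numerical error, so knowing any one of F(t), f(t), S(t), or h(t) is enough to recover the others.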
Using OLS on Durations • If we model the duration of an event using OLS • Like the number of years a regime lasts • We regress the duration length on a set of characteristics or exogenous variables • Often we will log the duration time because of some extremely durable cases that make the distribution asymmetric. • This approach runs into problems, notably with censored cases and time-varying covariates
Censoring • In some cases, a case may not have failed by the end of the observation period. • We refer to this as right-censoring. • Example: modeling the adoption of a state lottery • If a state has not adopted it by the end of the sample time frame, it is right-censored
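One common way to record right-censoring is to store, for each unit, a duration plus an indicator of whether the event was actually observed. The sketch below is an assumption about data layout, not the course's own coding scheme; the `Spell` class, the `make_spell` helper, the state labels, and the years are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spell:
    unit: str
    duration: int  # years at risk inside the observation window
    event: bool    # True if adoption was observed, False if right-censored


def make_spell(unit: str, window_start: int, window_end: int,
               adoption_year: Optional[int]) -> Spell:
    """Build one duration record for a unit observed from window_start to window_end."""
    if adoption_year is not None and adoption_year < window_start:
        # Adopted before the window opened: not at risk in this sample.
        raise ValueError(f"{unit} adopted before the observation window")
    if adoption_year is not None and adoption_year <= window_end:
        return Spell(unit, adoption_year - window_start, True)
    # No adoption seen by the end of the window: right-censored.
    return Spell(unit, window_end - window_start, False)


# Hypothetical states and years, for illustration only.
spells = [
    make_spell("State A", 1980, 1990, 1985),
    make_spell("State B", 1980, 1990, None),
    make_spell("State C", 1980, 1990, 1995),
]
for s in spells:
    print(s)
```

State C adopts after the window closes, so within this sample it is indistinguishable from State B: both contribute ten years of "survival" and no observed event.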
Left-censoring • Left censoring occurs when the history of the event begins prior to the start of the observed period • A regime that began before the time frame • A dispute already underway
Censoring (cont) • Note that both right- and left-censoring are common in many time-series data sets and are not dealt with in regression designs at all. • EHA can incorporate censoring in the models. • Based on calculating likelihoods
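A small simulation can illustrate what "based on calculating likelihoods" buys. The sketch below assumes a constant hazard (exponential durations) and a fixed observation cutoff; failures contribute the log density to the likelihood and censored cases contribute the log survivor function. The names `TRUE_RATE`, `CUTOFF`, and both helper functions are hypothetical, and the data are simulated.

```python
import math
import random


def exponential_loglik(rate, durations, events):
    """Log-likelihood: failures add log f(t), censored cases add log S(t)."""
    return sum(e * math.log(rate) - rate * t for t, e in zip(durations, events))


def exponential_mle(durations, events):
    """Maximizer of the log-likelihood: observed failures / total time at risk."""
    return sum(events) / sum(durations)


rng = random.Random(0)
TRUE_RATE = 0.1   # assumed constant hazard used to simulate data
CUTOFF = 10.0     # end of the observation period

latent = [rng.expovariate(TRUE_RATE) for _ in range(20000)]
durations = [min(t, CUTOFF) for t in latent]        # what we actually observe
events = [1 if t <= CUTOFF else 0 for t in latent]  # 0 marks right-censoring

rate_censoring_aware = exponential_mle(durations, events)
# Naive alternative: treat every observed duration as a completed failure time.
rate_ignoring_censoring = len(durations) / sum(durations)

print("censoring-aware estimate:", rate_censoring_aware)
print("estimate ignoring censoring:", rate_ignoring_censoring)
```

Treating censored durations as failures overstates the hazard, because the truncated durations look shorter than the true ones; the likelihood-based estimate stays close to the rate used to generate the data.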
Selection Bias • Duration models give us a tool for examining selection bias • When we study something like the determinants of regime failure, and we have a data set comprised of regimes, their failure dates, and the exogenous variables we think led to the failure, we have omitted cases that didn’t fail • Because the surviving cases were exposed to the same factors as those that failed, leaving them out biases our sample. • Duration models can account for this bias. • Somehow!
Time Varying Covariates • Regression assumes constant relationships (covariates) • What if the slope changes over the course of the study? • Regression can handle this through Stochastic or Time-Varying Parameter models, but they are usually ignored
Distribution of failure times • If we can correctly specify the type and shape of the distribution of the failure rate, we can estimate the impact of the covariates on the failure rate. • The shape of that failure rate is a function of its parameterization • The model’s covariates are used to assess that parameterization
The exponential model • The exponential model implies a baseline hazard rate that is flat • The likelihood of a failure is the same at any given time • This implies a constant hazard rate
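The flat baseline hazard can be written out using the definitions from the earlier slides. The rate symbol λ is an assumption introduced here for illustration; the slide itself does not name the parameter.

```latex
\begin{align*}
  h(t) &= \lambda, \qquad \lambda > 0 \\
  S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right) = e^{-\lambda t} \\
  F(t) &= 1 - S(t) = 1 - e^{-\lambda t} \\
  f(t) &= \frac{dF(t)}{dt} = \lambda e^{-\lambda t} \\
  \frac{f(t)}{S(t)} &= \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda
\end{align*}
```

The last line recovers the hazard from the density and survivor function, and t drops out: the risk of failure does not depend on how long the unit has already survived.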
Other distributions • Weibull • Used if the hazard rate is increasing or decreasing • Log-logistic or Log-normal • Gompertz • How to choose? • Theory? • Generalized Gamma
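The Weibull's ability to capture a rising or falling hazard can be seen from its hazard function. The sketch below assumes one common parameterization, h(t) = λp(λt)^(p−1), with rate `lam` and shape `p`; other texts and software use different but equivalent parameterizations, and the numeric values here are hypothetical.

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard under the assumed form lam * p * (lam * t) ** (p - 1)."""
    if t <= 0 or lam <= 0 or p <= 0:
        raise ValueError("t, lam, and p must all be positive")
    return lam * p * (lam * t) ** (p - 1)


times = [1.0, 2.0, 4.0, 8.0]
falling = [weibull_hazard(t, 0.1, 0.5) for t in times]  # shape p < 1
flat = [weibull_hazard(t, 0.1, 1.0) for t in times]     # shape p = 1
rising = [weibull_hazard(t, 0.1, 2.0) for t in times]   # shape p > 1

for label, values in (("p=0.5", falling), ("p=1.0", flat), ("p=2.0", rising)):
    print(label, [round(v, 4) for v in values])
```

With p = 1 the hazard is constant, so the exponential model is the special case of the Weibull with a flat baseline. The Weibull hazard is always monotone, which is one reason to consider the log-logistic or log-normal when theory suggests a hazard that first rises and then falls.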
Proportional Hazard Models • Cox Proportional Hazard • Similar to Weibull
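The resemblance between the Cox model and the Weibull can be made concrete with the standard proportional hazards form. The notation below (baseline hazard h_0, covariate vector x, coefficient vector β, Weibull rate λ and shape p) is an assumption introduced for illustration and does not appear on the slide.

```latex
\begin{align*}
  \text{Proportional hazards:}\quad
    & h(t \mid x) = h_0(t)\,\exp(x'\beta) \\
  \text{Hazard ratio for units } i, j\text{:}\quad
    & \frac{h(t \mid x_i)}{h(t \mid x_j)} = \exp\!\big((x_i - x_j)'\beta\big) \\
  \text{Weibull baseline:}\quad
    & h_0(t) = \lambda p\,(\lambda t)^{p-1} \\
  \text{Cox baseline:}\quad
    & h_0(t) \ \text{left unspecified}
\end{align*}
```

In both cases the covariates shift the hazard by a multiplicative factor that does not depend on t. The difference is that the Weibull commits to a parametric shape for the baseline, while the Cox model estimates β without specifying it.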
An example • Events • Action-reaction Models