Estimation & Inference for Point Processes
• MLE
• K-function & variants
• Residual methods
• Separable estimation
• Separability tests
Maximum Likelihood Estimation: For a space-time point process N, the log-likelihood function is given by

log L = ∫₀^T ∫_S log λ(t,x) dN − ∫₀^T ∫_S λ(t,x) dt dx.

Why? Consider the case where N is a Poisson process observed in time only, with points t₁, t₂, …, tₙ in [0,T]:

-------------x-x-----------x-----------x----------x---x---------------------
0           t₁ t₂          t₃          t₄         t₅  t₆                    T

L = P(points at t₁, t₂, …, tₙ, and no others in [0,T])
= P(pt at t₁) × P(pt at t₂) × … × P(pt at tₙ) × P{no others in [0,T]}
= λ(t₁) × λ(t₂) × … × λ(tₙ) × P{no others in [0,t₁)} × … × P{no others in (tₙ,T]}
= λ(t₁) × … × λ(tₙ) × exp{−∫₀^{t₁} λ(u) du} × … × exp{−∫_{tₙ}^T λ(u) du}
= ∏ᵢ λ(tᵢ) × exp{−∫₀^T λ(u) du}.

So log L = ∑ᵢ log λ(tᵢ) − ∫₀^T λ(u) du.
log L = ∫₀^T ∫_S log λ(t,x) dN − ∫₀^T ∫_S λ(t,x) dt dx.
• Here λ(t,x) is the conditional intensity. The case where the Papangelou intensity λₚ(t,x) is used instead is called the pseudo-likelihood.
• When λ depends on parameters θ, so does L:
log L(θ) = ∫₀^T ∫_S log λ(t,x; θ) dN − ∫₀^T ∫_S λ(t,x; θ) dt dx.
• Maximum Likelihood Estimation (MLE): find the value of θ that maximizes L(θ). (In practice, by finding the value that minimizes −log L(θ).)
• Example: stationary Poisson process with rate λ(t,x) = μ:
log L(μ) = ∫₀^T ∫_S log λ(t,x) dN − ∫₀^T ∫_S λ(t,x) dt dx
= n log(μ) − μST.
d log L(μ)/dμ = n/μ − ST,
which = 0 when μ̂ = n/(ST).
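As a concrete illustration of this recipe, here is a minimal numerical sketch in Python. The exponential rate λ(t) = exp{a + bt}, the simulated event times, and the starting values are assumptions for demonstration only, not part of the original example.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal sketch: MLE for a temporal Poisson process on [0, T] with
# hypothetical rate lambda(t) = exp(a + b*t).  The integrated rate has
# the closed form (exp(a + b*T) - exp(a)) / b, so -log L is cheap.
T = 10.0
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, T, size=200))   # stand-in event times

def neg_log_lik(theta):
    a, b = theta
    sum_log_lam = np.sum(a + b * t)          # sum_i log lambda(t_i)
    if abs(b) < 1e-8:
        integral = np.exp(a) * T             # limit as b -> 0
    else:
        integral = (np.exp(a + b * T) - np.exp(a)) / b
    return -(sum_log_lam - integral)

# As noted below, optimization needs an initial estimate and can land in
# a local optimum; Nelder-Mead from a neutral start is used here.
fit = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
print(fit.x)   # for homogeneous data, b_hat ~ 0 and a_hat ~ log(n/T)
```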
Under somewhat general conditions, the MLE θ̂ is consistent, asymptotically normal, and asymptotically efficient (see e.g. Ogata 1978, Rathbun 1994). Similarly for pseudo-likelihoods (Baddeley 2001).
• Important counter-examples:
λ(t) = a + bt; λ(t) = exp{a + bt}, for b < 0.
• Other problems with MLE:
• Bias can be substantial, e.g. Matérn I, θ̂ = min{‖(xᵢ,yᵢ) − (xⱼ,yⱼ)‖}.
• Optimization is tricky: it requires an initial parameter estimate and a tolerance threshold; it can fail to converge, or converge to a local maximum, etc.
• Nevertheless, MLE and pseudo-MLE are the only commonly-used methods for fitting point process models.
K-function & Variations:
• Usual K-function, for spatial processes only (Ripley 1978):
• Assume the null hypothesis that N is stationary Poisson, with constant rate λ.
• K(h) = (1/λ) [Expected # of pts within distance h of a given pt],
estimated via K̂(h) = (1/λ̂) [∑∑_{i≠j} I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n], where λ̂ = n/S (see the sketch below).
• Under the null hypothesis, K(h) = (1/λ)(λπh²) = πh².
• Higher K indicates more clustering; lower K indicates inhibition.
• Centered version: L(h) = √[K(h)/π] − h.
L > 0 indicates clustering; L < 0 indicates inhibition.
• Version based on nearest neighbors only (J-function):
J(h) ~ (1/λ) Pr{nearest neighbor of a given point is within distance h}.
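A minimal Python sketch of the K- and L-function estimates above; the rectangular window, the absence of edge correction, and the simulated data are simplifying assumptions.

```python
import numpy as np

# K(h) estimate, no edge correction:
# (1/lam_hat) * [sum_{i != j} I(d_ij <= h)] / n with lam_hat = n/S,
# which equals S * (# ordered pairs within h) / n^2.
def k_hat(x, y, h, S):
    d = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    np.fill_diagonal(d, np.inf)               # exclude i == j pairs
    return S * np.sum(d <= h) / len(x) ** 2

def l_hat(x, y, h, S):
    """Centered version: L(h) = sqrt(K(h)/pi) - h; > 0 suggests clustering."""
    return np.sqrt(k_hat(x, y, h, S) / np.pi) - h

# For homogeneous Poisson data on the unit square, L(h) should be near 0.
rng = np.random.default_rng(1)
x, y = rng.uniform(0, 1, 300), rng.uniform(0, 1, 300)
print(l_hat(x, y, h=0.05, S=1.0))
```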
K-function & Variations:
• Weighted K-function (Baddeley, Møller and Waagepetersen 2002; Veen 2006):
• Null hypothesis is a general conditional intensity λ(x,y).
• Weight each point (xᵢ,yᵢ) by a factor wᵢ = λ(xᵢ,yᵢ)⁻¹.
• Whereas the ordinary estimate is K̂(h) = S ∑∑_{i≠j} I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n², the weighted version is
K̂w(h) = S ∑∑_{i≠j} wᵢ wⱼ I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n², where wᵢ = λ(xᵢ,yᵢ)⁻¹ (see the sketch below).
• Asymptotically normal, under certain regularity conditions (Veen 2006).
• Centered version: L̂w(h) = √[K̂w(h)/π] − h, for R².
• L̂w(h) > 0 indicates more weight in pairs within h than expected according to the model for λ(x,y) ==> λ(x,y) is too low in clusters. That is, the model does not adequately capture the clustering in the data.
• L̂w(h) < 0 indicates less weight in pairs within h than expected according to the model for λ(x,y) ==> λ(x,y) is too high, for points within distance h. The model over-estimates the clustering in the data (or under-estimates inhibition).
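A sketch of the same idea in Python, using the closely related inhomogeneous-K normalization of Baddeley, Møller and Waagepetersen, under which the value is approximately πh² when the model intensity is correct; the constant null model and simulated data are assumptions.

```python
import numpy as np

# Weighted K: weight each ordered pair by w_i * w_j with
# w_i = 1 / lambda_model(x_i, y_i).
def k_weighted(x, y, h, S, lambda_model):
    w = 1.0 / lambda_model(x, y)
    d = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    np.fill_diagonal(d, np.inf)                    # exclude i == j
    return np.sum(np.outer(w, w) * (d <= h)) / S

def l_weighted(x, y, h, S, lambda_model):
    """L_w(h) > 0: model intensity too low in clusters; < 0: too high."""
    return np.sqrt(k_weighted(x, y, h, S, lambda_model) / np.pi) - h

rng = np.random.default_rng(2)
x, y = rng.uniform(0, 1, 300), rng.uniform(0, 1, 300)
lam0 = lambda x, y: np.full_like(x, 300.0)         # homogeneous null model
print(l_weighted(x, y, h=0.05, S=1.0, lambda_model=lam0))  # near 0 here
```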
These statistics can be used for estimation as well as testing: given a class of models with parameter θ to be estimated, choose the value of θ that minimizes some distance between the observed estimate K̂(h) and the theoretical function K(h; θ) [Guan 2007]. Similarly for other statistics such as K̂w(h) [Veen 2006]. A minimum-contrast sketch follows below.
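A minimal sketch of this minimum-contrast idea; the one-parameter model form K(h; θ) = θπh² is a hypothetical toy choice for illustration, not a model from the source, and the square-root transform is a common variance stabilizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def k_hat(x, y, h, S):
    # Same estimator as above: S * (# ordered pairs within h) / n^2.
    d = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    np.fill_diagonal(d, np.inf)
    return S * np.sum(d <= h) / len(x) ** 2

def contrast(theta, x, y, hs, S):
    # Distance between empirical and theoretical K over a grid of lags h.
    k_obs = np.array([k_hat(x, y, h, S) for h in hs])
    k_mod = theta * np.pi * hs ** 2
    return np.sum((np.sqrt(k_obs) - np.sqrt(k_mod)) ** 2)

rng = np.random.default_rng(3)
x, y = rng.uniform(0, 1, 200), rng.uniform(0, 1, 200)
hs = np.linspace(0.01, 0.1, 10)
fit = minimize_scalar(lambda th: contrast(th, x, y, hs, S=1.0),
                      bounds=(0.1, 10.0), method="bounded")
print(fit.x)   # near 1 for homogeneous Poisson data
```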
3) How else can we tell how well a given point process model fits?
a) Likelihood statistics (LR, AIC, BIC). For instance, AIC = −2 log L(θ̂) + 2p. Overly simplistic; not graphical.
b) Other tests: TTT, Khmaladze (Andersen et al. 1993); Cramér-von Mises and K-S tests (Heinrich 1991); higher-moment and spectral tests (Davies 1977).
c) Integrated residual plots (Baddeley et al. 2005): plot N(Aᵢ) − C(Aᵢ) over various areas Aᵢ. Useful for the mean, but questionable power; fine-scale interactions are not inspected.
d) Rescaling, thinning (Meyer 1971; Schoenberg 1999, 2003).
Rescaling, for multi-dimensional point processes:
• Stretch/compress one dimension according to λ̂, keeping the others fixed.
• The transformed process is Poisson with rate 1 iff λ̂ = λ almost everywhere.
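A minimal sketch of the rescaling idea in one dimension (time): map each point tᵢ to ∫₀^{tᵢ} λ̂(u) du, after which the points should form a rate-1 Poisson process if λ̂ = λ. The fitted intensity and the event times below are stand-ins.

```python
import numpy as np
from scipy.integrate import quad

lambda_hat = lambda u: np.exp(0.5 - 0.1 * u)   # hypothetical fitted intensity

def rescale(times):
    # t_i -> Lambda(t_i) = integral from 0 to t_i of lambda_hat(u) du
    return np.array([quad(lambda_hat, 0, t)[0] for t in times])

t = np.array([0.4, 1.1, 1.5, 2.9, 4.2, 6.0])   # stand-in event times
tau = rescale(t)
gaps = np.diff(np.concatenate(([0.0], tau)))
print(gaps)   # if lambda_hat = lambda a.e., these look like i.i.d. Exp(1)
```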
Problems with multi-dimensional residual analysis:
• Irregular boundary, plotting.
• Points in the transformed space can be hard to interpret.
• For highly clustered processes: boundary effects, loss of power.
Possible solutions: truncation, horizontal rescaling.

Thinning: Suppose inf λ̂(xᵢ,yᵢ) = b. Keep each point (xᵢ,yᵢ) in the original dataset with probability b / λ̂(xᵢ,yᵢ). This yields a different residual process, on the same scale as the data. Can repeat many times → many Poisson processes (but not quite independent!). A sketch follows below.
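A minimal sketch of thinned residuals; the fitted intensity is a hypothetical stand-in, and b is its infimum over the unit square.

```python
import numpy as np

lambda_hat = lambda x, y: 50.0 * np.exp(-2.0 * x)  # hypothetical fitted intensity
b = 50.0 * np.exp(-2.0)                            # inf over [0,1]^2 (at x = 1)

def thin(x, y, rng):
    # Keep each point with probability b / lambda_hat(x_i, y_i); if
    # lambda_hat is correct, the result is homogeneous Poisson with rate b.
    keep = rng.uniform(size=len(x)) < b / lambda_hat(x, y)
    return x[keep], y[keep]

rng = np.random.default_rng(4)
x, y = rng.uniform(0, 1, 400), rng.uniform(0, 1, 400)

# Repeating gives many approximately-Poisson residual patterns,
# though they are not quite independent of one another.
for _ in range(3):
    xt, yt = thin(x, y, rng)
    print(len(xt))
```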
Separability for Point Processes:
Conditional intensity λ(t, x₁, …, xₖ; θ) [e.g. x₁ = location, x₂ = size].
• Say λ is multiplicative in mark xⱼ if
λ(t, x₁, …, xₖ; θ) = θ₀ λⱼ(t, xⱼ; θⱼ) λ₋ⱼ(t, x₋ⱼ; θ₋ⱼ),
where x₋ⱼ = (x₁, …, x_{j−1}, x_{j+1}, …, xₖ), and similarly for θ₋ⱼ and λ₋ⱼ.
• If λ is multiplicative in xⱼ and one of the following holds, then θ̂ⱼ, the partial MLE, = θ̃ⱼ, the MLE:
• ∫ λ₋ⱼ(t, x₋ⱼ; θ₋ⱼ) dμ₋ⱼ = γ, for all θ₋ⱼ;
• ∫ λⱼ(t, xⱼ; θⱼ) dμⱼ = γ, for all θⱼ;
• ∫ λ(t, x; θ) dμ = ∫ λⱼ(t, xⱼ; θⱼ) dμⱼ = γ, for all θ.
Individual Covariates:
• Suppose λ is multiplicative, and λⱼ(t, xⱼ; θⱼ) = f₁[X(t,xⱼ); β₁] f₂[Y(t,xⱼ); β₂].
• If Ĥ(x,y) = Ĥ₁(x) Ĥ₂(y), where Ĥ, Ĥ₁, Ĥ₂ are the empirical d.f.s, and if the log-likelihood is differentiable w.r.t. β₁, then the partial MLE of β₁ = the MLE of β₁. (Note: not true for additive models!)
• Suppose λ is multiplicative and the jth component is additive:
λⱼ(t, xⱼ; θⱼ) = f₁[X(t,xⱼ); β₁] + f₂[Y(t,xⱼ); β₂].
• If f₁ and f₂ are continuous and f₂ is small, i.e. ∫ f₂(Y; β₂)² / f₁(X; β̃₁) dμ / T →p 0, then the partial MLE β̂₁ is consistent.
Impact:
• Model building.
• Model evaluation / dimension reduction.
• Excluded variables.
Model Construction
For example, for Los Angeles County wildfires:
• Covariates: relative humidity R(t), windspeed W(t), precipitation P(t), aggregated rainfall over the previous 60 days A(t;60), temperature T(t), date D(t).
• Tapered Pareto size distribution g, smooth spatial background μ.
λ(t,x,a) = β₁ exp{β₂R(t) + β₃W(t) + β₄P(t) + β₅A(t;60) + β₆T(t) + β₇[β₈ − D(t)]²} μ(x) g(a).
Estimating each of these components separately might be somewhat reasonable, at least as a first attempt, if the interactions are not too extreme. (A sketch follows below.)
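A sketch of this multiplicative intensity in code; every numeric value, the stand-in covariate functions, the flat background μ, and the Pareto-type g are illustrative assumptions, not fitted quantities from the source.

```python
import numpy as np

beta = np.array([0.01, 0.05, 0.1, -0.3, -0.02, 0.04, -0.001, 180.0])  # hypothetical

def lam(t, x, a, covs, mu, g):
    # lambda(t,x,a) = b1*exp{b2*R + b3*W + b4*P + b5*A + b6*T + b7*(b8 - D)^2}
    #                 * mu(x) * g(a)
    R, W, P, A, T, D = (c(t) for c in covs)
    temporal = beta[0] * np.exp(beta[1]*R + beta[2]*W + beta[3]*P
                                + beta[4]*A + beta[5]*T
                                + beta[6]*(beta[7] - D)**2)
    return temporal * mu(x) * g(a)

# Stand-in covariate series (constants here; interpolated data in practice).
covs = [lambda t: 40.0,   # R: relative humidity
        lambda t: 5.0,    # W: windspeed
        lambda t: 0.0,    # P: precipitation
        lambda t: 2.0,    # A: 60-day aggregated rainfall
        lambda t: 25.0,   # T: temperature
        lambda t: 200.0]  # D: date (day of year)
mu = lambda x: 1.0                                  # flat spatial background
g = lambda a: 1.5 * a**-2.5 if a >= 1.0 else 0.0    # Pareto-type stand-in for
                                                    # the tapered Pareto density
print(lam(100.0, (0.5, 0.5), 2.0, covs, mu, g))
```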
[Figure: r = 0.16 (sq m).]
Testing separability in marked point processes:
Construct non-separable and separable kernel estimates of λ, by smoothing over all coordinates simultaneously or over each coordinate separately, and then compare the two estimates (Schoenberg 2004). May also consider:
S5 = mean absolute difference at the observed points.
S6 = maximum absolute difference at the observed points.
(A sketch follows below.)
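A minimal sketch of this comparison for a process with one mark; the Gaussian kernels, the scaling of densities to intensities by n, and the simulated (time, mark) data are assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
t = rng.uniform(0, 1, 300)                  # times
m = rng.uniform(0, 1, 300)                  # marks
n = len(t)

joint = gaussian_kde(np.vstack([t, m]))     # non-separable: smooth jointly
kt, km = gaussian_kde(t), gaussian_kde(m)   # separable: smooth separately

lam_joint = n * joint(np.vstack([t, m]))    # densities scaled to intensities
lam_sep = n * kt(t) * km(m)

S5 = np.mean(np.abs(lam_joint - lam_sep))   # mean abs diff at observed points
S6 = np.max(np.abs(lam_joint - lam_sep))    # max abs diff at observed points
print(S5, S6)
```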
S3 seems to be most powerful for large-scale non-separability.
However, S3 may not be ideal for Hawkes processes, and all of these statistics perform terribly for inhibition processes.
For Hawkes and inhibition processes, rescaling according to the separable estimate and then examining the L-function seems much more powerful.
Statistics like S3 can indicate separability even when the L-function after rescaling shows some clustering.
Summary:
1) MLE: maximize log L(θ) = ∫₀^T ∫_S log λ(t,x; θ) dN − ∫₀^T ∫_S λ(t,x; θ) dt dx.
2) Estimated K-function: K̂(h) = S ∑∑_{i≠j} I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n², with centered version L̂(h) = √[K̂(h)/π] − h;
weighted version K̂w(h) = S ∑∑_{i≠j} wᵢ wⱼ I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n², where wᵢ = λ(xᵢ,yᵢ)⁻¹.
3) Residuals:
• Integrated residuals: N(Aᵢ) − C(Aᵢ).
• Rescaled residuals: stretch one coordinate according to ∫ λ(x,y) dμ.
• Thinned residuals: keep each point with probability b / λ̂(xᵢ,yᵢ).
4) Separability: when one coordinate can be estimated individually. Convenient, and sometimes results in estimates similar to global MLEs.
5) Separability tests: statistics such as S3; an alternative is L(h) after rescaling according to the separable kernel intensity estimate.
Next time: applications to models for earthquakes and wildfires.