
Estimation & Inference for Point Processes



  1. Estimation & Inference for Point Processes. Outline: • MLE • K-function & variants • Residual methods • Separable estimation • Separability tests

  2. Maximum Likelihood Estimation: For a space-time point process N, the log-likelihood function is given by

  log L = ∫₀^T ∫_S log λ(t,x) dN − ∫₀^T ∫_S λ(t,x) dx dt.

  Why? Consider the case where N is a Poisson process observed in time only:

  -------x-x-----------x-----------x----------x---x-------
  0      t1 t2         t3          t4         t5  t6      T

  L = P(points at t1, t2, …, tn, and no others in [0,T])
    = P(pt at t1) × P(pt at t2) × … × P(pt at tn) × P(no others in [0,T])
    = λ(t1) × λ(t2) × … × λ(tn) × P(no points in [0,t1)) × … × P(no points in (tn,T])
    = λ(t1) × … × λ(tn) × exp{−∫₀^{t1} λ(u) du} × … × exp{−∫_{tn}^T λ(u) du}
    = ∏ λ(ti) × exp{−∫₀^T λ(u) du}.

  So log L = ∑ log λ(ti) − ∫₀^T λ(u) du.
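
As a quick numerical illustration of the temporal formula, here is a minimal Python sketch (numpy and scipy assumed; the rate functions and simulated data are hypothetical) that evaluates log L = ∑ log λ(ti) − ∫₀^T λ(u) du for a given rate function:

```python
import numpy as np
from scipy.integrate import quad

def poisson_loglik(times, lam, T):
    """log L = sum_i log lam(t_i) - int_0^T lam(u) du  (slide 2)."""
    integral, _ = quad(lam, 0.0, T)
    return np.sum(np.log([lam(t) for t in times])) - integral

# Example: points from a rate-2 homogeneous Poisson process on [0, 10]
rng = np.random.default_rng(0)
T = 10.0
n = rng.poisson(2.0 * T)
times = np.sort(rng.uniform(0.0, T, n))

print(poisson_loglik(times, lambda t: 2.0, T))            # constant rate
print(poisson_loglik(times, lambda t: 1.0 + 0.1 * t, T))  # linear rate
```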

  3. log L = ∫₀^T ∫_S log λ(t,x) dN − ∫₀^T ∫_S λ(t,x) dx dt.
  • Here λ(t,x) is the conditional intensity. The case where the Papangelou intensity λp(t,x) is used instead is called the pseudo-likelihood.
  • When λ depends on parameters θ, so does L:
    log L(θ) = ∫₀^T ∫_S log λ(t,x; θ) dN − ∫₀^T ∫_S λ(t,x; θ) dx dt.
  • Maximum Likelihood Estimation (MLE): find the value of θ that maximizes L(θ) (in practice, by finding the value that minimizes −log L(θ)).
  • Example: stationary Poisson process with rate λ(t,x) = μ on a region S observed over [0,T]:
    log L(μ) = ∫₀^T ∫_S log λ(t,x) dN − ∫₀^T ∫_S λ(t,x) dx dt = n log(μ) − μ|S|T,
    d log L(μ)/dμ = n/μ − |S|T,
    which = 0 when μ = n/(|S|T).
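
The closed form μ̂ = n/(|S|T) can be double-checked by direct numerical minimization of −log L(μ), which is how MLE is done in practice when no closed form exists. A minimal sketch (the count n = 50, area |S| = 4, and window length T = 25 are made-up numbers):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical numbers: n = 50 points on a region of area S = 4 over time T = 25.
n, S, T = 50, 4.0, 25.0

# -log L(mu) = -(n log mu - mu * S * T), dropping terms free of mu (slide 3)
neg_loglik = lambda mu: -(n * np.log(mu) - mu * S * T)

res = minimize_scalar(neg_loglik, bounds=(1e-9, 10.0), method="bounded")
print(res.x, n / (S * T))  # numerical MLE vs. closed form n/(ST): both 0.5
```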

  4. Under somewhat general conditions, the MLE θ̂ is consistent, asymptotically normal, and asymptotically efficient (see e.g. Ogata 1978, Rathbun 1994). Similarly for pseudo-likelihoods (Baddeley 2001).
  • Important counter-examples: λ(t; α, β) = α + βt, and λ(t; α, β) = exp{α + βt} for β < 0.
  • Other problems with MLE:
  • Bias can be substantial, e.g. Matérn I, θ̂ = min_{i≠j} ||(xi, yi) − (xj, yj)||.
  • Optimization is tricky: it requires an initial parameter estimate and a tolerance threshold; it can fail to converge; it can converge to a local maximum, etc.
  • Nevertheless, MLE and pseudo-MLE are the only commonly-used methods for fitting point process models.

  5. K-function & variations:
  • Usual K-function, for spatial processes only (Ripley 1978):
  • Assume the null hypothesis that N is stationary Poisson, with constant rate λ.
  • K(h) = (1/λ) E[# of pts within distance h of a given pt].
  • Estimated via K̂(h) = (1/λ̂) [∑∑_{i≠j} I(||(xi,yi) − (xj,yj)|| ≤ h) / n], where λ̂ = n/|S|.
  • Under the null hypothesis, K(h) = (1/λ)(λπh²) = πh².
  • Higher K indicates more clustering; lower K indicates inhibition.
  • Centered version: L(h) = √[K(h)/π] − h.
  • L > 0 indicates clustering, L < 0 indicates inhibition.
  • Version based on nearest neighbors only (J-function): J(h) ~ (1/λ) P{nearest neighbor of a given point is within distance h}.
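
A minimal sketch of the unweighted estimator and its centered version (numpy/scipy assumed; no edge correction, which a serious implementation would add):

```python
import numpy as np
from scipy.spatial.distance import pdist

def k_and_l(points, area, h):
    """Ripley's K at distances h (1/lam-hat * average # of other points within h),
    plus the centered L(h) = sqrt(K/pi) - h.  Ignores edge correction."""
    n = len(points)
    lam_hat = n / area
    d = pdist(points)                       # all n(n-1)/2 pairwise distances
    # each unordered pair counts twice in the double sum over i != j
    K = np.array([2.0 * np.sum(d <= r) for r in h]) / (lam_hat * n)
    L = np.sqrt(K / np.pi) - h
    return K, L

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, size=(200, 2))      # stationary Poisson on unit square
h = np.linspace(0.01, 0.2, 20)
K, L = k_and_l(pts, 1.0, h)
print(np.round(L, 3))                       # near 0 under the Poisson null
```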

  6. K-function & variations:
  • Weighted K-function (Baddeley, Møller and Waagepetersen 2002; Veen 2006):
  • Null hypothesis is a general conditional intensity λ(x,y).
  • Weight each point (xi,yi) by a factor of wi = λ(xi,yi)⁻¹.
  • Unweighted estimate: K̂(h) = |S| ∑∑_{i≠j} I(||(xi,yi) − (xj,yj)|| ≤ h) / n²;
    weighted version: K̂w(h) = |S| ∑∑_{i≠j} wi wj I(||(xi,yi) − (xj,yj)|| ≤ h) / n², where wi = λ(xi,yi)⁻¹.
  • Asymptotically normal, under certain regularity conditions (Veen 2006).
  • Centered version: L̂w(h) = √[K̂w(h)/π] − h, for R².
  • L̂w(h) > 0 indicates more weight in pairs within distance h than expected according to the model for λ(x,y), i.e. λ(x,y) is too low in clusters: the model does not adequately capture the clustering in the data.
  • L̂w(h) < 0 indicates less weight in pairs within distance h than expected according to the model for λ(x,y), i.e. λ(x,y) is too high for points within distance h: the model over-estimates the clustering in the data (or under-estimates inhibition).
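
A sketch of the same wi = λ⁻¹ pair-weighting idea, here written in the normalization of Baddeley, Møller and Waagepetersen's inhomogeneous K, K̂(h) = (1/|A|) ∑∑_{i≠j} I(dij ≤ h)/[λ(xi)λ(xj)], which equals πh² under a correctly specified λ (numpy assumed; no edge correction; the constant fitted intensity in the demo is a placeholder):

```python
import numpy as np

def weighted_k(points, lam, area, h):
    """Weighted/inhomogeneous K: (1/|A|) sum_{i != j} I(d_ij <= h) / (lam_i lam_j),
    i.e. each pair weighted by w_i w_j with w_i = 1/lam(x_i, y_i)."""
    lam_i = np.array([lam(x, y) for x, y in points])
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))               # n x n distance matrix
    ww = np.outer(1.0 / lam_i, 1.0 / lam_i)
    np.fill_diagonal(ww, 0.0)                      # drop i == j terms
    K = np.array([np.sum(ww * (d <= r)) for r in h]) / area
    return K, np.sqrt(K / np.pi) - h               # (K_w, centered L_w)

rng = np.random.default_rng(2)
pts = rng.uniform(0, 1, size=(150, 2))             # Poisson, true intensity 150
Kw, Lw = weighted_k(pts, lambda x, y: 150.0, 1.0, np.linspace(0.02, 0.2, 10))
print(np.round(Lw, 3))                             # near 0: model fits
```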

  7. These statistics can be used for estimation as well as testing: given a class of models with parameter θ to be estimated, choose the θ that minimizes some distance between the observed estimate K̂(h) and the theoretical function K(h; θ) (Guan 2007). Similarly for other statistics such as K̂w(h) (Veen 2006).
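
A sketch of this minimum-contrast idea, using as a stand-in the known K-function of a Thomas cluster process, K(h; κ, σ) = πh² + [1 − exp(−h²/4σ²)]/κ; the "empirical" K̂ below is synthetic, just to keep the example self-contained:

```python
import numpy as np
from scipy.optimize import minimize

def k_theory(h, kappa, sigma):
    """Theoretical K of a Thomas cluster process (parent rate kappa, spread sigma)."""
    return np.pi * h ** 2 + (1.0 - np.exp(-h ** 2 / (4.0 * sigma ** 2))) / kappa

def contrast(theta, h, K_hat):
    """L2 discrepancy between empirical K-hat and K(h; theta)."""
    kappa, sigma = np.exp(theta)                   # log-parametrize: both > 0
    return np.sum((K_hat - k_theory(h, kappa, sigma)) ** 2)

h = np.linspace(0.01, 0.2, 20)
K_hat = k_theory(h, 50.0, 0.03) + 0.0005           # stand-in for an empirical K-hat
res = minimize(contrast, x0=np.log([30.0, 0.05]), args=(h, K_hat))
print(np.exp(res.x))                               # fitted (kappa, sigma)
```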

  8. Model: λ(x,y; α) = α m(x,y) + (1 − α)ν. [Figure: estimated function plotted against h (km).]

  9. How else can we tell how well a given point process model fits?
  a) Likelihood statistics (LR, AIC, BIC). For instance, AIC = −2 log L(θ̂) + 2p. Overly simplistic; not graphical.
  b) Other tests: TTT, Khmaladze (Andersen et al. 1993); Cramér–von Mises, K-S test (Heinrich 1991); higher moment and spectral tests (Davies 1977).
  c) Integrated residual plots (Baddeley et al. 2005): plot N(Ai) − Ĉ(Ai) over various areas Ai (sketch below). Useful for the mean, but questionable power; fine-scale interactions are not inspected.
  d) Rescaling, thinning (Meyer 1971; Schoenberg 1999, 2003).
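
A minimal sketch of item c), computing raw integrated residuals N(Ai) − Ĉ(Ai) over a grid of rectangular cells (numpy assumed; the fitted intensity and data are placeholders, and the integral Ĉ(Ai) = ∫_{Ai} λ̂ is approximated by simple quadrature):

```python
import numpy as np

def integrated_residuals(points, lam_hat, cells):
    """Raw residuals N(A_i) - C_hat(A_i) over rectangular cells A_i, where
    C_hat(A_i) = integral of lam_hat over A_i (grid approximation)."""
    out = []
    for (x0, x1, y0, y1) in cells:
        N = np.sum((points[:, 0] >= x0) & (points[:, 0] < x1) &
                   (points[:, 1] >= y0) & (points[:, 1] < y1))
        X, Y = np.meshgrid(np.linspace(x0, x1, 20), np.linspace(y0, y1, 20))
        C = lam_hat(X, Y).mean() * (x1 - x0) * (y1 - y0)   # crude quadrature
        out.append(N - C)
    return np.array(out)

rng = np.random.default_rng(3)
pts = rng.uniform(0, 1, size=(100, 2))                     # rate-100 Poisson data
cells = [(i/2, (i+1)/2, j/2, (j+1)/2) for i in range(2) for j in range(2)]
print(integrated_residuals(pts, lambda x, y: 100.0 + 0 * x, cells))  # near 0
```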

  10. For multi-dimensional point processes, rescaled residuals: stretch/compress one dimension according to λ̂, keeping the others fixed. The transformed process is Poisson with rate 1 iff λ̂ = λ almost everywhere.
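
In one dimension this is the familiar time-rescaling: move each point ti to τi = ∫₀^{ti} λ̂(u) du. A minimal sketch (numpy assumed; the rate λ(t) = 2t and the inversion used to simulate the data are illustrative choices):

```python
import numpy as np

def rescale_times(times, lam_hat, T, grid=10_000):
    """Rescaled residuals in time: tau_i = int_0^{t_i} lam_hat(u) du.
    If lam_hat is the true intensity, {tau_i} is Poisson with rate 1."""
    u = np.linspace(0.0, T, grid)
    cum = np.concatenate([[0.0], np.cumsum(lam_hat(u[:-1]) * np.diff(u))])
    return np.interp(times, u, cum)          # compensator evaluated at each t_i

# Example: inhomogeneous Poisson with lam(t) = 2t on [0, 10], simulated by inversion.
rng = np.random.default_rng(4)
T = 10.0
taus = np.cumsum(rng.exponential(size=300))  # unit-rate Poisson arrivals
taus = taus[taus < T ** 2]                   # compensator of 2t is t^2
times = np.sqrt(taus)                        # invert: t = sqrt(tau)
res = rescale_times(times, lambda t: 2.0 * t, T)
print(np.round(np.diff(res).mean(), 2))      # interevent mean approx. 1
```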

  11. Problems with multi-dimensional residual analysis:
  • Irregular boundary, plotting.
  • Points in the transformed space can be hard to interpret.
  • For highly clustered processes: boundary effects, loss of power.
  Possible solutions: truncation, horizontal rescaling.
  Thinning: suppose inf λ̂(xi,yi) = b. Keep each point (xi,yi) in the original dataset with probability b/λ̂(xi,yi). This yields a different residual process, on the same scale as the data. Can repeat many times → many Poisson processes (but not quite independent!).
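
A minimal thinning sketch (numpy assumed; the fitted intensity in the demo is a made-up placeholder): if λ̂ is correct, each retained pattern should look like a homogeneous Poisson process with rate b.

```python
import numpy as np

def thin(points, lam_hat, seed=None):
    """Thinned residuals: keep point i with probability b / lam_hat(x_i, y_i),
    where b = inf of lam_hat over the observed points (slide 11)."""
    rng = np.random.default_rng(seed)
    lam = np.array([lam_hat(x, y) for x, y in points])
    b = lam.min()
    keep = rng.uniform(size=len(points)) < b / lam
    return points[keep]

rng = np.random.default_rng(5)
pts = rng.uniform(0, 1, size=(500, 2))
thinned = thin(pts, lambda x, y: 500.0 * (1.0 + x))   # hypothetical fitted intensity
print(len(thinned))            # repeat with different seeds for many residual sets
```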

  12. Separability for point processes: conditional intensity λ(t, x1, …, xk; θ) [e.g. x1 = location, x2 = size].
  • Say λ is multiplicative in mark xj if λ(t, x1, …, xk; θ) = θ0 λj(t, xj; θj) λ−j(t, x−j; θ−j), where x−j = (x1,…,xj−1, xj+1,…,xk), and similarly for θ−j and λ−j.
  • If λ̃ is multiplicative in xj and one of the following holds, then θ̂j, the partial MLE, = θ̃j, the MLE:
  • ∫ λ̃−j(t, x−j; θ−j) dμ−j = γ, for all θ−j.
  • ∫ λ̃j(t, xj; θj) dμj = γ, for all θj.
  • ∫ λ̃(t, x; θ) dμ = ∫ λ̃j(t, xj; θj) dμj = γ, for all θ.

  13. Individual covariates:
  • Suppose λ is multiplicative, and λj(t, xj; θj) = f1[X(t,xj); β1] f2[Y(t,xj); β2].
  • If Ĥ(x,y) = Ĥ1(x) Ĥ2(y), where Ĥ, Ĥ1, Ĥ2 are the empirical d.f.s, and if the log-likelihood is differentiable w.r.t. β1, then the partial MLE of β1 = MLE of β1. (Note: not true for additive models!)
  • Suppose λ is multiplicative and the jth component is additive: λj(t, xj; θj) = f1[X(t,xj); β1] + f2[Y(t,xj); β2].
  • If f1 and f2 are continuous and f2 is small [i.e. ∫ f2(Y; β2)² / f1(X; β̃1) dμ / T →p 0], then the partial MLE β̂1 is consistent.

  14. Impact • Model building. • Model evaluation / dimension reduction. • Excluded variables.

  15. Model construction. For example, for Los Angeles County wildfires:
  • Covariates: relative humidity R(t), windspeed W(t), precipitation P(t), aggregated rainfall over the previous 60 days A(t;60), temperature T(t), date D(t).
  • Tapered Pareto size distribution g(a), smooth spatial background m(x):
    λ(t,x,a) = β1 exp{β2 R(t) + β3 W(t) + β4 P(t) + β5 A(t;60) + β6 T(t) + β7 [β8 − D(t)]²} m(x) g(a).
  Estimating each of these components separately might be somewhat reasonable, at least as a first attempt, if the interactions are not too extreme.
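
A sketch of how such a multiplicative intensity could be coded (Python/numpy; every coefficient value and covariate function below is a hypothetical placeholder, not a fitted value; the tapered Pareto density is taken in the standard form f(a) = (α/a + 1/θ)(a0/a)^α exp{(a0 − a)/θ} for a ≥ a0):

```python
import numpy as np

def lam(t, x, a, beta, R, W, P, A60, Temp, D, mu, g):
    """Multiplicative wildfire-style intensity from slide 15:
    beta_1 exp{...covariates...} * mu(x) * g(a)."""
    b1, b2, b3, b4, b5, b6, b7, b8 = beta
    return (b1 * np.exp(b2 * R(t) + b3 * W(t) + b4 * P(t) + b5 * A60(t)
                        + b6 * Temp(t) + b7 * (b8 - D(t)) ** 2)
            * mu(x) * g(a))

def tapered_pareto(a, alpha=0.8, theta=500.0, a0=1.0):
    """Tapered Pareto density for fire size a >= a0 (placeholder parameters)."""
    return (alpha / a + 1.0 / theta) * (a0 / a) ** alpha * np.exp((a0 - a) / theta)

beta = (0.1, -0.05, 0.02, -0.3, -0.01, 0.04, -1e-4, 180.0)  # hypothetical values
print(lam(200.0, (0.3, 0.7), 10.0, beta,
          R=lambda t: 40.0, W=lambda t: 10.0, P=lambda t: 0.0,
          A60=lambda t: 5.0, Temp=lambda t: 25.0, D=lambda t: t % 365,
          mu=lambda x: 1.0, g=tapered_pareto))
```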

  16. [Figure: r = 0.16 (sq m).]

  17. Testing separability in marked point processes: construct non-separable and separable kernel estimates of λ, by smoothing over all coordinates simultaneously or separately, and then compare the two estimates (Schoenberg 2004). May also consider: S5 = mean absolute difference at the observed points; S6 = maximum absolute difference at the observed points.
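
A minimal sketch of this comparison using Gaussian kernel density estimates (scipy assumed; the simulated data are separable by construction, and treating n × density as an intensity estimate is a simplification):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
n = 300
t = rng.uniform(0, 1, n)
x = rng.uniform(0, 1, n)                       # separable by construction

full = gaussian_kde(np.vstack([t, x]))         # joint (non-separable) estimate
f_t, f_x = gaussian_kde(t), gaussian_kde(x)    # marginal estimates

lam_full = n * full(np.vstack([t, x]))         # evaluated at the observed points
lam_sep = n * f_t(t) * f_x(x)                  # product of the marginals

S5 = np.mean(np.abs(lam_full - lam_sep))       # mean abs. difference (slide 17)
S6 = np.max(np.abs(lam_full - lam_sep))        # max abs. difference (slide 17)
print(S5, S6)                                  # small when lam is separable
```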

  18. S3 seems to be most powerful for large-scale non-separability:

  19. However, S3 may not be ideal for Hawkes processes, and all these statistics are terrible for inhibition processes:

  20. For Hawkes & inhibition processes, rescaling according to the separable estimate and then looking at the L-function seems much more powerful:

  21. Los Angeles County Wildfire Example:

  22. Statistics like S3 indicate separability, but the L-function after rescaling shows some clustering:

  23. Summary:
  1) MLE: maximize log L(θ) = ∫₀^T ∫_S log λ(t,x; θ) dN − ∫₀^T ∫_S λ(t,x; θ) dx dt.
  2) Estimated K-function: K̂(h) = |S| ∑∑_{i≠j} I(||(xi,yi) − (xj,yj)|| ≤ h) / n²; centered version L̂(h) = √[K̂(h)/π] − h. Weighted version: K̂w(h) = |S| ∑∑_{i≠j} wi wj I(||(xi,yi) − (xj,yj)|| ≤ h) / n², where wi = λ(xi,yi)⁻¹.
  3) Residuals: integrated residuals N(Ai) − Ĉ(Ai); rescaled residuals (stretch one coordinate according to ∫ λ̂ dμ); thinned residuals (keep each point with probability b/λ̂(xi,yi)).
  4) Separability: when one coordinate can be estimated individually. Convenient, and sometimes results in estimates similar to global MLEs.
  5) Separability tests: statistics such as S3; an alternative is the L-function after rescaling according to the separable kernel intensity estimate.
  Next time: applications to models for earthquakes and wildfires.
