Sufficient Statistics in Statistical Modeling

More Chapter 4 Sufficient statistics. The Poisson and the exponential can be summarized by (n, ). So too can the normal with known variance Consider a statistic S(Y) Suppose that the conditional distribution of Y given S does not depend on , then S is a sufficient statistic for  based on Y Occurs iff the density of Y factors into a function of s(y) and  and a function of y that doesn't depend on 

Example. Exponential IExp() ~ Y E(Y) =  Var(Y) = 2 Data y1,...,yn L() =  -1 exp(-yj /) l() = -nlog() - yj / yj /n is sufficient

maximum

Approximate 100(1-2 )% CI for 0 Example. spring data

Weibull.

Note. Expected information

Gamma.

Example. Bernoulli Pr{Y = 1} = 1 - Pr{Y = 0} =  0   1 L() =  ^yi (1 - )^(1-yi) = r(1 - )n-r l() = rlog() + (n-r)log(1-) r =  yj R =  Yj is sufficient for , as is R/n L() factors into a function of r and a constant

Score vector  [ yj / - (n-yj )/(1-)] Observed information  [yj /2 + (n-yj )/(1-)2 ] M.l.e.

Cauchy. ICau() f(y;) = 1/(1+(y-)2 ) E|Y| =  Var(Y) =  L() =  1/((1+(yj -)2 ) Many local maxima l() = -log(1+(yj -)2 ) J() = 2((1-(yj -)2 )/(1+(yj -)2 )2 I() = n/2

Uniform. f(u;) = 1/ 0 < u <  = 0 otherwise L() = 1/n 0 < y1 ,..., yn <  = 0 otherwise

l() becomes increasingly spikey E u() = -1 i() = -

Logistic regression. Challenger data Ibinomials Rj , mj , j

Likelihood ratio. Model includes  dim() = p true (unknown) value 0 Likelihood ratio statistic

Justification. Multinormal result If Y ~ N (,) then (Y- )T -1(Y- ) ~ p2

Uses. Pr[W(0)  cp(1-2 )]  1-2 Approx 100(1-2 )% confidence region

Example. exponential Spring data: 96 <  <335 vs. asymp normal approx 64 <  <273 kcycles

Prob-value/P-value. See (7.28) Choose T whose large values cast doubt on H0 Pr0(T  tobs) Example. Spring data Exponential E(Y) =  H0:  = 100?

Nesting : p by 1 parameter of interest : q by 1 nuisance parameter Model with params (0, ) nested within(, ) Second model reduces to first when  = 0

Example. Weibull params (,) exponential when  = 1       How to examine H0 :  = 1?

Spring failure times. Weibull

Challenger data. Logistic regression temperature x1 pressure x2 (0 , 1 , 2 ) = exp{}/(1+exp{})  = 0 + 1 x1 + 2 x2 linear predictor loglike l(0 , 1 , 2 ) = 0  rj + 1  rj x1j + 2  rj x2j - m  log(1+exp{j }) Does pressure matter?

Model fit. Are labor times Weibull? Nest its model in a more general one Generalized gamma. Gamma for =1 Weibull for =1 Exponential for ==1

Likelihood results. max log likelihood: generalized gamma -250.65 gamma -251.12 Weibull -251.17 gamma vs. generalized gamma - 2 log like diff: 2(-250.65+251.12) = .94 P-value Pr0 (12 > .94) = Pr(|Z|>.969) = 2(.166) = .332

Chi-squared statistics. Pearson's chi-squared categories 1,...,k count of cases in category i: Yi Pr(case in i) = i 0 < i < 1 1ki =1 E(Yi ) = ni var(Yi ) = i(1 - i )n cov(Yi ,Yj ) = -ij n i j E.g. k=2 case cov(Y,n-Y) = -var(Y) = -n12  = { (1 ,...,k ): 1ki = 1, 0<1 ,...,k <1} dimension k-1

Reduced dimension possible? model i () dim() = p log like general model: 1k-1 yi log i + yk log[1-1 -...-k-1], 1k yi = n log like restricted model: l() = 1k-1 yi log i() + yk log[1-1()-...-k-1()]

likelihood ratio statistic: if restricted model true The statistic is sometimes written W = 2  Oi log(Oi /Ei )  (Oi - Ei )2/Ei

Pearson's chi-squared.

Example. Birth data. Poisson? Split into k=13 categories [0,7.5), [7.5,8.5),...[18.5,24] hours O(bserved) 6 3 3 8 ... E(xpected) 5.23 4.37 6.26 8.08 ... P = 4.39 P-value Pr(112 > 4.39) = .96

Two way contingency table. r rows and c columns n individuals Blood groups A, B, AB, O A, B antigens - substance causing body to produce antibodies group count model I model II O = 1 - A - B

Question. Rows and columns independent? W = 2  yij log nyij / yi.y.j with yi. = j yij ~ k-1-p2 = (r-1)c-1)2 with k=rc p=(r-1)+(c-1) P =  (yij - yi. y.j /n)2 / (yi. y.j /n) ~ (r-1)(c-1)2

Model 1 W = 17.66 Pr(12> 17.66) = Pr(|Z| > 4.202) = 2.646E-05 P = 15.73 Pr(12> 15.73) = Pr(|Z| > 3.966) = 7.309E-05 k-1-p = 4-1-2 = 1 Model 2 W = 3.17 Pr(|Z| > 1.780) = .075 P = 2.82 Pr(|Z|>1.679) = .093

Incorrect model. True model g(y), fit f(y;)

Example 1. Quadratic, fit linear

Example 2. True lognormal, but fit exponential

Large sample distribution.

Model selection. Various models: non-nested Ockham's razor. Prefer the simplest model

Formal criteria. Look for minimum

Example. Spring failure Model p AIC BIC M1 12 744.8* 769.9* M2 7 771.8 786.5 M3 2 827.8 831.2 M4 2 925.1 929.3 6 stress levels M1: Weibull - unconnected ,  at each stress level

Sufficient Statistics in Statistical Modeling