STATS 730: Lecture 5 Sufficiency! Today’s lecture: 730 Lectures 5&6
Theme for the next few lectures
• The estimation problem:
  • We sample data from a population.
  • The data have a joint density depending on a parameter θ that has a real-world interpretation.
The Estimation Problem:
• Given a sample X1,…,Xn with joint density f(x1,…,xn;θ), how should we combine the data to estimate θ? I.e. what statistic S(X) should we use as an estimate?
• What considerations are important?
• Aside: Important special case: if the X’s are iid, then the joint density is the product of the marginal densities, f(x1,…,xn;θ) = f(x1;θ)⋯f(xn;θ).
Important considerations
• Small bias (mean of sampling distribution close to θ)
• Small standard error (std dev of sampling distribution small)
• Estimate should use “all the information in the data”: Sufficiency
Today’s lecture: Sufficiency
• The Concept
• The Definition
• The factorisation theorem
• Examples
Sufficiency: the concept
• Suppose X1,…,Xn have joint density f(x;θ), where the value of θ is unknown.
• We have a statistic S (i.e. a function of the sample).
• How much information about θ is contained in the statistic S?
Sufficiency: the concept
• Suppose the sample is normal with unknown mean μ and known variance 1.
• How much information about μ is contained in the sample mean?
• How about the sample variance? Intuitively, the sample mean has more information.
Sufficiency: the concept
• Suppose the population is Poisson, with unknown mean θ. Then the population variance is also θ.
• Consider two statisticians, A and B, who want to estimate θ. A gets to look at the whole sample, while B only gets to see the sample mean. Is A better off than B?
Sufficiency: the concept
• If A gets to look at the whole sample, and B only gets to see the sample variance, is A better off than B?
Sufficiency: the concept
• Sticking with the Poisson example, suppose that instead of the sample A gets to see 100 random numbers. Clearly A hasn’t got any information about θ. Why?
• Because the distribution of the 100 random numbers is uniform[0,1], which does not depend on θ.
Sufficiency: the concept
• Still sticking with the Poisson example, suppose A gets to see the mean of the Poisson sample. Then, later, A gets to see the whole sample. What does A get to see the second time?
• Answer: an observation from the conditional distribution of the sample, given the mean.
Sufficiency: the concept
• If this conditional distribution has θ as a parameter, then A has gained some information.
• If the conditional distribution does not involve θ (i.e. θ is not a parameter), then A gets no further information by observing the whole sample.
Sufficiency: the definition
A statistic S is sufficient for a parameter θ if the conditional distribution of X1,…,Xn given S does not involve θ.
Example 1: The Poisson distribution, again
The mean is sufficient, because the conditional distribution of X1,…,Xn given the sample mean does not involve θ. To show this…
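The calculation behind this claim can be sketched as follows. This is a reconstruction under the assumption that X1,…,Xn are iid Poisson with mean θ (written \theta below); the symbol s for the observed total is introduced here for illustration. Conditioning on the sample mean is the same as conditioning on the sum, which is Poisson with mean nθ.

```latex
% Assumption: X_1,\dots,X_n iid Poisson(\theta); s = \sum_i x_i is the observed total.
\begin{align*}
P\Big(X_1=x_1,\dots,X_n=x_n \,\Big|\, \sum_i X_i = s\Big)
  &= \frac{\prod_{i=1}^n e^{-\theta}\,\theta^{x_i}/x_i!}{e^{-n\theta}(n\theta)^{s}/s!} \\
  &= \frac{e^{-n\theta}\,\theta^{s}}{\prod_{i=1}^n x_i!}\cdot\frac{s!}{e^{-n\theta}\,n^{s}\,\theta^{s}} \\
  &= \frac{s!}{n^{s}\,\prod_{i=1}^n x_i!}, \qquad \text{whenever } \sum_i x_i = s .
\end{align*}
```

This is a multinomial probability with s trials and equal cell probabilities 1/n; every factor containing θ has cancelled.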
Example 1 (cont)
The resulting conditional probability does not involve θ!!!!
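A small numerical check of Example 1 may help here. The sketch below assumes iid Poisson observations and computes, by exact enumeration, the conditional probability of every sample given its sum for two different values of the parameter θ. The function names and the values n = 3, s = 4, θ = 0.5 and θ = 7 are illustrative choices, not part of the lecture.

```python
from itertools import product
from math import exp, factorial


def poisson_pmf(k, theta):
    """P(X = k) for a Poisson random variable with mean theta."""
    return exp(-theta) * theta**k / factorial(k)


def conditional_given_sum(theta, n, s):
    """P(X = x | sum(X) = s) for every sample x of size n that sums to s."""
    joint = {}
    for x in product(range(s + 1), repeat=n):
        if sum(x) == s:
            p = 1.0
            for xi in x:
                p *= poisson_pmf(xi, theta)
            joint[x] = p
    total = sum(joint.values())  # this is P(sum(X) = s)
    return {x: p / total for x, p in joint.items()}


# Two very different parameter values (hypothetical choices).
a = conditional_given_sum(0.5, n=3, s=4)
b = conditional_given_sum(7.0, n=3, s=4)

# Largest difference between the two conditional distributions;
# it should be zero up to floating-point rounding.
max_gap = max(abs(a[x] - b[x]) for x in a)
print(max_gap)
```

The two conditional distributions agree because, once the sum is known, the arrangement of the counts within the sample carries no further information about θ.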
Example 2
Consider X1,…,Xn having independent binary distributions with P(Xi=1)=θ, P(Xi=0)=1-θ. Then ΣiXi is sufficient for θ:
Example 2 (cont)
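The conditional calculation for Example 2 can be sketched in the same way as for the Poisson case. This is a reconstruction, assuming the Xi are independent with P(Xi=1)=θ (written \theta below), so that the sum is binomial; the symbol s for the observed number of ones is introduced here for illustration.

```latex
% Assumption: X_1,\dots,X_n independent Bernoulli(\theta); s = \sum_i x_i, each x_i \in \{0,1\}.
\begin{align*}
P\Big(X_1=x_1,\dots,X_n=x_n \,\Big|\, \sum_i X_i = s\Big)
  &= \frac{\theta^{s}(1-\theta)^{n-s}}{\binom{n}{s}\,\theta^{s}(1-\theta)^{n-s}} \\
  &= \frac{1}{\binom{n}{s}}, \qquad \text{whenever } \sum_i x_i = s .
\end{align*}
```

Given the total, every arrangement of the s ones among the n positions is equally likely, and θ does not appear.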
Factorisation theorem
• Conditional distributions can be tricky. Is there an easier way?
• Yes: use the factorisation theorem.
Factorisation theorem
S is sufficient for θ if and only if the joint density f(x;θ) of the observations can be written as A(S(x);θ)B(x), where B does not depend on θ.
This is the “factorisation”!!
Factorisation theorem (example)
For example, the joint density of a Poisson sample is
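A sketch of the Poisson factorisation, assuming an iid sample of size n with mean θ (written \theta below) and taking S(x) to be the sum of the observations; the labels A and B follow the statement of the theorem.

```latex
% Assumption: X_1,\dots,X_n iid Poisson(\theta), S(x) = \sum_i x_i.
\begin{align*}
f(x;\theta) &= \prod_{i=1}^n \frac{e^{-\theta}\,\theta^{x_i}}{x_i!}
  = \underbrace{e^{-n\theta}\,\theta^{\sum_i x_i}}_{A(S(x);\,\theta)}
    \;\times\;
    \underbrace{\frac{1}{\prod_{i=1}^n x_i!}}_{B(x)} .
\end{align*}
```

The first factor depends on the data only through the sum (equivalently the mean), and the second does not involve θ, so the sum is sufficient.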
Factorisation theorem: proof of continuous version
• The hard bit!
• The conditional distribution of X1,…,Xn given a statistic S(X) is hard to compute.
• So…
Factorisation theorem: proof of continuous version
• Suppose we can find a 1-to-1 function g that maps X1,…,Xn onto Y1,…,Yn, such that Y1=S(X).
• Y and X contain the same information about θ, since if we know X we know Y and vice versa.
• Thus, S is sufficient for θ iff the conditional distribution of Y2,…,Yn given Y1 does not involve θ.
Factorisation theorem: continuous version
Notation:
• y=g(x)
• x=h(y) (i.e. h is the inverse of g)
• S(h(y))=y1
Factorisation theorem: proof of continuous version
• Now we prove the theorem.
• Suppose the conditional distribution of Y2,…,Yn given Y1 does not involve θ (no θ!!). We will show the factorisation holds.
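This direction of the argument can be sketched as follows. It is a reconstruction under stated assumptions: g is differentiable with Jacobian determinant written J_g(x) (a notation introduced here), g_2,…,g_n denote the remaining components of g, and θ is written \theta.

```latex
% Assumptions: y = g(x) one-to-one with y_1 = S(x); J_g(x) = \det(\partial g/\partial x).
\begin{align*}
f_Y(y;\theta) &= f_{Y_1}(y_1;\theta)\; f_{Y_2,\dots,Y_n \mid Y_1}(y_2,\dots,y_n \mid y_1)
  && \text{(second factor has no } \theta\text{)} \\
f_X(x;\theta) &= f_Y(g(x);\theta)\,\lvert J_g(x)\rvert \\
  &= \underbrace{f_{Y_1}(S(x);\theta)}_{A(S(x);\,\theta)}
     \;\times\;
     \underbrace{f_{Y_2,\dots,Y_n \mid Y_1}\big(g_2(x),\dots,g_n(x) \mid S(x)\big)\,\lvert J_g(x)\rvert}_{B(x)} .
\end{align*}
```

The second factor is free of θ by assumption, so the joint density has the required form.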
Factorisation theorem: proof of continuous version
Now conversely suppose the factorisation is true. We show the conditional distribution of Y2,…,Yn given Y1 does not involve θ.
Step 1: by the change of variable formula, the joint density of Y is
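Step 1 can be written out as below. This is a sketch, assuming h is differentiable with Jacobian determinant written J_h(y) (a notation introduced here), and using the factorisation f_X(x;θ) = A(S(x);θ)B(x) together with S(h(y)) = y1; θ is written \theta.

```latex
% Assumptions: x = h(y), J_h(y) = \det(\partial h/\partial y), S(h(y)) = y_1.
\begin{align*}
f_Y(y;\theta) &= f_X(h(y);\theta)\,\lvert J_h(y)\rvert \\
  &= A\big(S(h(y));\theta\big)\,B(h(y))\,\lvert J_h(y)\rvert \\
  &= A(y_1;\theta)\,B(h(y))\,\lvert J_h(y)\rvert .
\end{align*}
```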
Factorisation theorem: proof of continuous version
Step 2: the marginal density of Y1 is
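Step 2 can be sketched by integrating out the other coordinates. The sketch takes the joint density of Y to be A(y1;θ)B(h(y))|J_h(y)|, which is what the change of variable formula gives under the factorisation; J_h(y) denotes the Jacobian determinant of h, a notation introduced here, and θ is written \theta.

```latex
% Assumption: f_Y(y;\theta) = A(y_1;\theta)\,B(h(y))\,|J_h(y)|.
\begin{align*}
f_{Y_1}(y_1;\theta)
  &= \int\!\cdots\!\int f_Y(y_1,y_2,\dots,y_n;\theta)\,dy_2\cdots dy_n \\
  &= A(y_1;\theta)\int\!\cdots\!\int B\big(h(y_1,y_2,\dots,y_n)\big)\,
     \lvert J_h(y_1,y_2,\dots,y_n)\rvert\,dy_2\cdots dy_n .
\end{align*}
```

The factor A(y1;θ) comes outside the integral because it does not depend on y2,…,yn.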
Factorisation theorem: proof of continuous version
Step 3: the conditional distribution of Y2,…,Yn given Y1 is the ratio of the joint density of Y to the marginal density of Y1, and this ratio contains no θ!!!
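Step 3 can be sketched as the ratio of the joint density of Y to the marginal density of Y1. The sketch takes the joint density to be A(y1;θ)B(h(y))|J_h(y)|, as given by the change of variable formula under the factorisation; J_h is the Jacobian determinant of h and u2,…,un are dummy integration variables, both notations introduced here, and θ is written \theta.

```latex
% Assumption: f_Y(y;\theta) = A(y_1;\theta)\,B(h(y))\,|J_h(y)|.
\begin{align*}
f_{Y_2,\dots,Y_n \mid Y_1}(y_2,\dots,y_n \mid y_1)
  &= \frac{f_Y(y;\theta)}{f_{Y_1}(y_1;\theta)} \\
  &= \frac{A(y_1;\theta)\,B(h(y))\,\lvert J_h(y)\rvert}
          {A(y_1;\theta)\displaystyle\int\!\cdots\!\int
           B\big(h(y_1,u_2,\dots,u_n)\big)\,\lvert J_h(y_1,u_2,\dots,u_n)\rvert\,du_2\cdots du_n} \\
  &= \frac{B(h(y))\,\lvert J_h(y)\rvert}
          {\displaystyle\int\!\cdots\!\int
           B\big(h(y_1,u_2,\dots,u_n)\big)\,\lvert J_h(y_1,u_2,\dots,u_n)\rvert\,du_2\cdots du_n} .
\end{align*}
```

The factor A(y1;θ) cancels, which is where the dependence on θ disappears.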
Factorisation theorem: example
Normal N(θ,1). The mean is sufficient. The joint density factorises into a part that depends on the data only through the mean and a part with no θ!! Joint density is
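A sketch of the normal factorisation, assuming an iid N(θ,1) sample of size n (θ written \theta) and writing \bar{x} for the sample mean; the labels A and B follow the statement of the theorem.

```latex
% Assumption: X_1,\dots,X_n iid N(\theta,1); \bar{x} = n^{-1}\sum_i x_i.
\begin{align*}
f(x;\theta) &= (2\pi)^{-n/2}\exp\Big\{-\tfrac12\sum_{i=1}^n (x_i-\theta)^2\Big\} \\
  &= \underbrace{\exp\Big\{n\theta\bar{x} - \tfrac12 n\theta^2\Big\}}_{A(\bar{x};\,\theta)}
     \;\times\;
     \underbrace{(2\pi)^{-n/2}\exp\Big\{-\tfrac12\sum_{i=1}^n x_i^2\Big\}}_{B(x)} .
\end{align*}
```

The second line follows by expanding the square: the sum of (x_i − θ)² equals the sum of x_i² minus 2nθ times the mean, plus nθ².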
Factorisation theorem: example
Any iid sample. The order statistics are sufficient. The joint density is a function of the x(i) alone!
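The order-statistics example can be sketched as below, assuming an iid sample with common marginal density f(·;θ) (θ written \theta) and writing x_{(1)} ≤ … ≤ x_{(n)} for the ordered observations. Reordering the factors of a product does not change its value.

```latex
% Assumption: X_1,\dots,X_n iid with marginal density f(\cdot;\theta).
\begin{align*}
f(x_1,\dots,x_n;\theta) &= \prod_{i=1}^n f(x_i;\theta)
  = \underbrace{\prod_{i=1}^n f(x_{(i)};\theta)}_{A(x_{(1)},\dots,x_{(n)};\,\theta)}
    \;\times\; \underbrace{1}_{B(x)} .
\end{align*}
```

Here the statistic S(x) is the whole vector of order statistics and B is identically 1.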
Factorisation theorem: example
An iid exponential sample. The mean is sufficient. Joint density is
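A sketch of the exponential factorisation. The parametrisation is not recoverable from the slide text, so this assumes the rate form f(x;θ) = θe^{−θx} for x > 0 (θ written \theta); with the mean parametrisation the same argument goes through with θ replaced by its reciprocal. \bar{x} denotes the sample mean.

```latex
% Assumption: X_1,\dots,X_n iid exponential with rate \theta, all x_i > 0.
\begin{align*}
f(x;\theta) &= \prod_{i=1}^n \theta\,e^{-\theta x_i}
  = \theta^{n}\exp\Big\{-\theta\sum_{i=1}^n x_i\Big\}
  = \underbrace{\theta^{n}\,e^{-n\theta\bar{x}}}_{A(\bar{x};\,\theta)}
    \;\times\; \underbrace{1}_{B(x)} .
\end{align*}
```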