Theory of Statistics Course Details - Statistical Inference Concepts, Exams, and Assignments

732A36 Theory of Statistics
Course within the Master’s program in Statistics and Data Mining
Fall semester 2011

Course details
• Course web: www.ida.liu.se/~732A36
• Course responsible, tutor and examiner: Anders Nordgaard
• Course period: Nov 2011-Jan 2012
• Examination: Written exam in January 2012, Compulsory assignments
• Course literature: “Garthwaite PH, Jolliffe IT and Jones B (2002). Statistical Inference. 2nd ed. Oxford University Press, Oxford. ISBN 0-19-857226-3”
Department of Computer and Information Science (IDA) Linköpings universitet, Sweden

Course contents
• Statistical inference in general
• Point estimation (unbiasedness, consistency, efficiency, sufficiency, completeness)
• Information and likelihood concepts
• Maximum-likelihood and Method-of-moments estimation
• Classical hypothesis testing (Power functions, the Neyman-Pearson lemma, Maximum Likelihood Ratio Tests, Wald’s test)
• Confidence intervals
• …

Course contents, cont.
• Statistical decision theory (Loss functions, Risk concepts, Prior distributions, Sequential tests)
• Bayesian inference (Estimation, Hypothesis testing, Credible intervals, Predictive distributions)
• Non-parametric inference
• Computer intensive methods for estimation

Details about teaching and examination
• Teaching is (as usual) sparse: a mixture of lectures and problem seminars
• Lectures: Overview and some details of each chapter covered. No full coverage of the contents!
• Problem seminars: Discussions about solutions to recommended exercises. Students should be prepared to present solutions on the board!
• Towards the end of the course a couple of larger compulsory assignments (whose solutions need to be worked out with the help of a computer) will be distributed.
• The course concludes with a written exam

Prerequisites
• Good understanding of calculus and algebra
• Good understanding of the concept of expectation (including variance calculations)
• Familiarity with families of probability distributions (Normal, Exponential, Binomial, Poisson, Gamma (Chi-square), Beta, …)
• Skills in computer programming (e.g. with R, SAS, Matlab)

Statistical inference in general
[Diagram: Population, Model, Sample]
• Conclusions about the population are drawn from the sample with the assistance of a specified model

The two paradigms: Neyman-Pearson (frequentist) and Bayesian
[Diagram: Population, Model, Sample]
• Neyman-Pearson: The model specifies the probability distribution for data obtained in a sample, including a number of unknown population parameters
• Bayesian: The model specifies the probability distribution for data obtained in a sample and a probability distribution (prior) for each of the unknown population parameters of that distribution

How is inference made?
• Point estimation: Find the “best” approximation of an unknown population parameter
• Interval estimation: Find a range of values that covers the unknown population parameter with high certainty. This can be extended to regions if the parameter is multidimensional
• Hypothesis testing: Give statements about the population (values of parameters, probability distributions, issues of independence, …) along with a quantitative measure of “certainty”

Tools for making inference
• Criteria for a point estimate to be “good”
• “Algorithmic” methods to find point estimates (Maximum Likelihood, Least Squares, Method-of-Moments)
• Classical methods of constructing hypothesis tests (Neyman-Pearson lemma, Maximum Likelihood Ratio Test, …)
• Classical methods of constructing confidence intervals (regions)
• Decision theory (making use of loss and risk functions, utility and cost) to find point estimates and hypothesis tests
• Using prior distributions to construct tests, credible intervals and predictive distributions (Bayesian inference)

Tools for making inference…
• Using theory of randomization to form non-parametric tests (tests not depending on any probability distribution behind data)
• Computer intensive methods (bootstrap and cross-validation techniques)
• Advanced models from data that make use of auxiliary information (explanatory variables): Generalized linear models, Generalized additive models, Spatio-temporal models, …

The univariate population-sample model
• The population to be investigated is such that the values that come out in a sample $x_1, x_2, \ldots$ are governed by a probability distribution
• The probability distribution is represented by a probability density (or mass) function $f(x)$
• Alternatively, the sample values can be seen as the outcomes of independent random variables $X_1, X_2, \ldots$ all with probability density (or mass) function $f(x)$

Point estimation (frequentist paradigm)
• We have a sample $x = (x_1, \ldots, x_n)$ from a population
• The population contains an unknown parameter $\theta$
• The functional forms of the distribution functions may be known or unknown, but they depend on the unknown $\theta$.
• Denote generally by $f(x; \theta)$ the probability density or mass function of the distribution
• A point estimate $\hat{\theta}$ of $\theta$ is a function of the sample values, $\hat{\theta} = \hat{\theta}(x_1, \ldots, x_n)$, such that its values should be close to the unknown $\theta$.

“Standard” point estimates
• The sample mean $\bar{x}$ is a point estimate of the population mean $\mu$
• The sample variance $s^2$ is a point estimate of the population variance $\sigma^2$
• The sample proportion $p$ of a specific event (a specific value or range of values) is a point estimate of the corresponding population proportion $\pi$
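The three standard estimates can be computed directly from a sample. The following is a minimal sketch using only the Python standard library; the data values, the event "value greater than 5" and the helper name `sample_proportion` are illustrative assumptions, not part of the course material.

```python
import statistics

# Hypothetical sample of measurements (illustrative values only).
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]

def sample_proportion(values, event):
    """Fraction of sample values for which the event (a predicate) holds."""
    return sum(1 for v in values if event(v)) / len(values)

x_bar = statistics.mean(sample)        # point estimate of the population mean
s2 = statistics.variance(sample)       # divisor n - 1; point estimate of the population variance
p_hat = sample_proportion(sample, lambda v: v > 5.0)  # point estimate of P(X > 5)

print(x_bar, s2, p_hat)
```

Note that `statistics.variance` uses the divisor n - 1, while `statistics.pvariance` uses n.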

Assessing a point estimate
• A point estimate has a sampling distribution
• Replace the sample observations $x_1, \ldots, x_n$ with their corresponding random variables $X_1, \ldots, X_n$ in the functional expression: $\hat{\theta} = \hat{\theta}(X_1, \ldots, X_n)$
• $\Rightarrow$ The point estimate is a random variable that is observed in the sample (point estimator)
• As a random variable, the point estimator must have a probability distribution that can be deduced from $f(x; \theta)$
• The point estimator/estimate is assessed by investigating its sampling distribution, in particular the mean and the variance.

Unbiasedness
• A point estimator $\hat{\theta}$ is unbiased for $\theta$ if the mean of its sampling distribution is equal to $\theta$, i.e. $E(\hat{\theta}) = \theta$
• The bias of a point estimate for $\theta$ is $\mathrm{bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$
• Thus, a point estimate with bias = 0 is unbiased, otherwise it is biased

Examples (within the univariate population-sample model)
• The sample mean $\bar{x}$ is always unbiased for estimating the population mean $\mu$
• Is the sample mean an unbiased estimate of the population median?
• Why do we divide by $n-1$ in the sample variance (and not by $n$)?
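The last question can be explored by simulation. The sketch below averages both versions of the variance estimate over many simulated samples; the normal population, the sample size n = 5, the value of sigma and the function names are assumptions chosen for illustration. In theory the divisor n gives expected value (n-1)/n times the population variance, while the divisor n - 1 gives the population variance itself.

```python
import random

def variance(values, divisor_offset):
    """Sum of squared deviations from the sample mean, divided by (n - divisor_offset)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - divisor_offset)

def average_variance_estimates(n=5, replications=20000, sigma=2.0, seed=1):
    """Average both variance estimators over many simulated N(0, sigma^2) samples."""
    rng = random.Random(seed)
    total_n, total_n_minus_1 = 0.0, 0.0
    for _ in range(replications):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_n += variance(sample, 0)          # divisor n
        total_n_minus_1 += variance(sample, 1)  # divisor n - 1
    return total_n / replications, total_n_minus_1 / replications

n, sigma = 5, 2.0
biased, unbiased = average_variance_estimates(n=n, sigma=sigma)
print("divisor n:    ", round(biased, 3), "theory:", (n - 1) / n * sigma ** 2)
print("divisor n - 1:", round(unbiased, 3), "theory:", sigma ** 2)
```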

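Consistency
• A point estimator $\hat{\theta}$ is (weakly) consistent if $\lim_{n \to \infty} P(|\hat{\theta} - \theta| > \varepsilon) = 0$ for any $\varepsilon > 0$
• Thus, the point estimator should converge in probability to $\theta$
• Theorem: A point estimator is consistent if $E(\hat{\theta}) \to \theta$ and $\mathrm{Var}(\hat{\theta}) \to 0$ as $n \to \infty$
• Proof: Use Chebyshev’s inequality in terms of the mean squared error $E[(\hat{\theta} - \theta)^2] = \mathrm{Var}(\hat{\theta}) + (E(\hat{\theta}) - \theta)^2$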
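The Chebyshev argument can be written out in one line. This is the standard textbook step, given here as a sketch and not necessarily in the form used on the original slide; $\hat{\theta}$ denotes the point estimator based on $n$ observations and $\varepsilon > 0$ is arbitrary.

```latex
\begin{align*}
P\left(|\hat{\theta} - \theta| > \varepsilon\right)
  &\le \frac{E\left[(\hat{\theta} - \theta)^2\right]}{\varepsilon^2} \\
  &= \frac{\mathrm{Var}(\hat{\theta}) + \left(E(\hat{\theta}) - \theta\right)^2}{\varepsilon^2}
  \;\longrightarrow\; 0 \quad \text{as } n \to \infty
\end{align*}
```

The limit holds whenever both the variance and the bias of the estimator tend to zero, which gives convergence in probability.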

Examples
• The sample mean is a consistent estimator of the population mean. What probability law can be applied?
• What do we require for the sample variance to be a consistent estimator of the population variance?
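The first example can be illustrated numerically. The sketch below estimates the probability that the sample mean misses the population mean by more than a tolerance, for increasing sample sizes; the exponential population with mean 1, the tolerance 0.1 and the function name are assumptions made for illustration.

```python
import random

def prob_far_from_mean(n, epsilon=0.1, mu=1.0, replications=1000, seed=7):
    """Monte Carlo estimate of P(|sample mean - mu| > epsilon) for exponential data with mean mu."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(replications):
        # random.expovariate takes the rate, i.e. 1 / mean.
        x_bar = sum(rng.expovariate(1.0 / mu) for _ in range(n)) / n
        if abs(x_bar - mu) > epsilon:
            misses += 1
    return misses / replications

for n in (10, 100, 1000):
    print(n, prob_far_from_mean(n))
```

The estimated probabilities should shrink towards zero as n grows, which is what weak consistency requires.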

Efficiency
• Assume we have two unbiased estimators of $\theta$, i.e. $\hat{\theta}_1$ and $\hat{\theta}_2$ with $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$
• The efficiency of an unbiased estimator is defined as [formula not preserved in the transcript]
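A common way to compare two unbiased estimators is through the ratio of their variances, often called relative efficiency. Whether this matches the definition on the original slide is an assumption. The sketch below compares the sample mean and the sample median as estimators of the centre of a standard normal population, where both are unbiased by symmetry; the sample size and function name are illustrative.

```python
import random
import statistics

def estimator_variances(n=25, replications=4000, seed=3):
    """Simulated variances of the sample mean and the sample median for N(0, 1) data."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)

var_mean, var_median = estimator_variances()
# Variance ratio: values below 1 mean the sample mean has the smaller variance.
print(var_mean / var_median)
```

For normal data the ratio comes out below 1, so the sample mean is the more efficient of the two in this setting.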

Example
• Let … [remainder of the slide not preserved in the transcript]

Likelihood function
• For a sample $x = (x_1, \ldots, x_n)$
• the likelihood function for $\theta$ is defined as $L(\theta; x) = \prod_{i=1}^{n} f(x_i; \theta)$
• the log-likelihood function is $l(\theta; x) = \ln L(\theta; x)$
• Both measure how likely (or expected) the sample is for a given value of $\theta$
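As a small illustration, the log-likelihood can be evaluated over a grid of parameter values to see which value makes the observed sample most likely. The sketch assumes an exponential population parametrized by its mean, $f(x; \theta) = \theta^{-1} e^{-x/\theta}$; the data and the grid are hypothetical.

```python
import math

def log_likelihood(theta, sample):
    """Log-likelihood of i.i.d. data under f(x; theta) = exp(-x / theta) / theta."""
    n = len(sample)
    return -n * math.log(theta) - sum(sample) / theta

# Hypothetical observations (illustrative only).
sample = [0.8, 2.1, 0.3, 1.7, 1.1]

# Evaluate the log-likelihood on a grid of candidate values 0.1, 0.2, ..., 10.0.
grid = [0.1 * k for k in range(1, 101)]
best_theta = max(grid, key=lambda t: log_likelihood(t, sample))

# Under this parametrization the maximizing value coincides with the sample mean.
print(best_theta, sum(sample) / len(sample))
```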

Fisher information
• The (Fisher) information about $\theta$ contained in a sample $x$ is defined as $I(\theta) = E\left[\left(\frac{\partial l(\theta; X)}{\partial \theta}\right)^2\right]$, where $l$ is the log-likelihood function
• Theorem: Under some regularity conditions (interchangeability of integration and differentiation), $I(\theta) = -E\left[\frac{\partial^2 l(\theta; X)}{\partial \theta^2}\right]$
• In particular, the range of $X$ cannot depend on $\theta$ (such as in a population where $X \sim U(0, \theta)$)

Why is it a measure of information about $\theta$?

Example
• $X \sim \mathrm{Exp}(\theta)$
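The worked content of this example is not preserved, so the following is a sketch under a stated assumption: the exponential distribution is parametrized by its mean, $f(x; \theta) = \theta^{-1} e^{-x/\theta}$, and the sample consists of $n$ independent observations. The Fisher information is obtained from the standard identity $I(\theta) = -E[\partial^2 l / \partial \theta^2]$, using $E(X_i) = \theta$.

```latex
\begin{align*}
l(\theta; x) &= \sum_{i=1}^{n} \ln f(x_i; \theta) = -n \ln\theta - \frac{1}{\theta} \sum_{i=1}^{n} x_i \\
\frac{\partial l}{\partial \theta} &= -\frac{n}{\theta} + \frac{1}{\theta^{2}} \sum_{i=1}^{n} x_i \\
\frac{\partial^{2} l}{\partial \theta^{2}} &= \frac{n}{\theta^{2}} - \frac{2}{\theta^{3}} \sum_{i=1}^{n} x_i \\
I(\theta) &= -E\left[\frac{\partial^{2} l}{\partial \theta^{2}}\right]
  = -\frac{n}{\theta^{2}} + \frac{2 n \theta}{\theta^{3}} = \frac{n}{\theta^{2}}
\end{align*}
```

With the rate parametrization $f(x; \theta) = \theta e^{-\theta x}$ the same steps also give $n / \theta^{2}$.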

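Cramér-Rao inequality
• Under the same regularity conditions as for the previous theorem, the following holds for any unbiased estimator $\hat{\theta}$ of $\theta$: $\mathrm{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)}$, where $I(\theta)$ is the Fisher information
• The lower bound is attained if and only if $\frac{\partial l(\theta; x)}{\partial \theta} = I(\theta)\,(\hat{\theta} - \theta)$, where $l$ is the log-likelihood function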

Proof:
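The derivation from the slide is not preserved. A standard argument, which may differ from the one presented in the course, goes through the score $U = \partial l / \partial \theta$ and the Cauchy-Schwarz inequality. It assumes that differentiation and integration can be interchanged, that $L(\theta; x)$ is the joint density of the sample, and that $\hat{\theta}$ is unbiased.

```latex
\begin{align*}
U &= \frac{\partial l(\theta; X)}{\partial \theta}, \qquad E(U) = 0, \qquad \mathrm{Var}(U) = E(U^{2}) = I(\theta) \\
\theta &= E(\hat{\theta}) = \int \hat{\theta}(x)\, L(\theta; x)\, dx
  \;\Rightarrow\;
  1 = \int \hat{\theta}(x)\, \frac{\partial L(\theta; x)}{\partial \theta}\, dx
    = \int \hat{\theta}(x)\, \frac{\partial l(\theta; x)}{\partial \theta}\, L(\theta; x)\, dx
    = E(\hat{\theta}\, U) \\
\mathrm{Cov}(\hat{\theta}, U) &= E(\hat{\theta}\, U) - E(\hat{\theta})\, E(U) = 1 \\
1 &= \mathrm{Cov}(\hat{\theta}, U)^{2} \le \mathrm{Var}(\hat{\theta})\, \mathrm{Var}(U)
  = \mathrm{Var}(\hat{\theta})\, I(\theta)
  \;\Rightarrow\; \mathrm{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)}
\end{align*}
```

Equality in the Cauchy-Schwarz step requires $U$ to be a linear function of $\hat{\theta}$, which is where the condition for attaining the bound comes from.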


Example
• $X \sim \mathrm{Exp}(\theta)$
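The details of this example are not preserved. As a sketch under the assumption that $\theta$ is the mean of the exponential distribution, the Fisher information of $n$ observations is $n / \theta^2$, so the Cramér-Rao bound is $\theta^2 / n$. The simulation below checks that the variance of the sample mean, which is unbiased for $\theta$ in this parametrization, is close to that bound; the parameter values and function name are illustrative.

```python
import random
import statistics

def simulated_variance_of_mean(theta=2.0, n=20, replications=5000, seed=11):
    """Simulated variance of the sample mean for exponential data with mean theta."""
    rng = random.Random(seed)
    means = [
        sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n  # rate = 1 / mean
        for _ in range(replications)
    ]
    return statistics.variance(means)

theta, n = 2.0, 20
cramer_rao_bound = theta ** 2 / n   # 1 / I(theta) with I(theta) = n / theta^2
print(simulated_variance_of_mean(theta, n), cramer_rao_bound)
```

The two printed numbers should be close, consistent with the sample mean attaining the bound in this model.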

Sufficiency
• A function $T$ of the sample values of a sample $x$, i.e. $T = T(x) = T(x_1, \ldots, x_n)$, is a statistic that is sufficient for the parameter $\theta$ if the conditional distribution of the sample random variables given $T$ does not depend on $\theta$, i.e. $f_{X_1, \ldots, X_n \mid T}(x_1, \ldots, x_n \mid t; \theta)$ is the same for all values of $\theta$
• What does it mean in practice?
• If $T$ is sufficient for $\theta$ then no more information about $\theta$ than what is contained in $T$ can be obtained from the sample.
• It is enough to work with $T$ when deriving point estimates of $\theta$
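The definition can be checked by direct enumeration in a small discrete case. The sketch below uses independent Bernoulli variables with success probability p and the statistic T = sum of the observations; this model is an assumption chosen for illustration and is not taken from the slides. The conditional distribution of the sample given T comes out the same for different values of p.

```python
from itertools import product

def conditional_distribution(p, n, t):
    """P(X = x | sum(X) = t) for n independent Bernoulli(p) variables, by enumeration."""
    joint = {}
    for x in product((0, 1), repeat=n):
        if sum(x) == t:
            joint[x] = p ** t * (1 - p) ** (n - t)   # joint probability of this outcome
    total = sum(joint.values())                      # P(T = t)
    return {x: prob / total for x, prob in joint.items()}

for p in (0.2, 0.7):
    print(p, conditional_distribution(p, n=3, t=2))
```

Each arrangement with two successes gets conditional probability one third regardless of p, so the sum carries all the information about p.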

Example

The factorization theorem: $T$ is sufficient for $\theta$ if and only if the likelihood function can be written $L(\theta; x) = K_1(T(x); \theta) \cdot K_2(x)$, i.e. can be factorized using two non-negative functions such that the first depends on $x$ only through the statistic $T$ (and also on $\theta$) and the second does not depend on $\theta$

Example, cont
• $X \sim \mathrm{Exp}(\theta)$
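The worked content of this example is not preserved. As a sketch, assume the mean parametrization $f(x; \theta) = \theta^{-1} e^{-x/\theta}$ and write the likelihood of $n$ independent observations as a product of a factor $K_1$ that depends on the data only through $t = \sum x_i$ and a factor $K_2$ that does not involve $\theta$ (the names $K_1$ and $K_2$ are notation chosen here).

```latex
\begin{align*}
L(\theta; x) &= \prod_{i=1}^{n} \frac{1}{\theta} e^{-x_i/\theta}
  = \underbrace{\theta^{-n} \exp\left(-\frac{1}{\theta} \sum_{i=1}^{n} x_i\right)}_{K_1(t;\, \theta),\ t = \sum x_i}
  \cdot \underbrace{1}_{K_2(x)}
\end{align*}
```

By the factorization theorem, $T = \sum_{i=1}^{n} X_i$ is then sufficient for $\theta$ under this assumption.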
