
Statistical Decision Making




  1. Statistical Decision Making

  2. Almost all problems in statistics can be formulated as a problem of making a decision. • That is, given data observed from some phenomenon, a decision has to be made about that phenomenon.

  3. Decisions are generally broken into two types: • Estimation decisions and • Hypothesis Testing decisions.

  4. Probability Theory plays a very important role in these decisions and the assessment of error made by these decisions

  5. Definition: A random variable X is a numerical quantity that is determined by the outcome of a random experiment

  6. Example: An individual is selected at random from a population and X = the weight of the individual

  7. The probability distribution of a (continuous) random variable is described by its probability density curve f(x).

  8. i.e. a curve with the following properties: • 1. f(x) is always positive. • 2. The total area under the curve f(x) is one. • 3. The area under the curve f(x) between a and b is the probability that X lies between the two values.
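The three properties above can be checked numerically. The sketch below (an illustration, not part of the slides) approximates areas under the standard normal density with the trapezoidal rule, using only the Python standard library:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density; the defaults mu = 0, sigma = 1 give the standard normal."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_under(f, a, b, steps=100_000):
    """Trapezoidal approximation to the area under f between a and b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Property 1: f(x) is always positive.
assert normal_pdf(-3) > 0 and normal_pdf(3) > 0
# Property 2: the total area under the curve is one (the tails beyond +/-10 are negligible).
total_area = area_under(normal_pdf, -10, 10)
# Property 3: the area between a and b is P(a < X < b); e.g. P(-1 < X < 1).
p_one_sd = area_under(normal_pdf, -1, 1)
```

P(-1 < X < 1) comes out to about 0.683, the familiar "68% within one standard deviation" rule.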

  9. Examples of some important Univariate distributions

  10. 1. The Normal distribution A common probability density curve is the “Normal” density curve, symmetric and bell shaped:

f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²))

[Figures: Normal distributions with μ = 50, σ = 15 and with μ = 70, σ = 20.] Comment: If μ = 0 and σ = 1 the distribution is called the standard normal distribution.

  11. 2. The Chi-squared distribution with ν degrees of freedom:

f(x) = K x^(ν/2 − 1) e^(−x/2) if x ≥ 0, where K = 1 / (2^(ν/2) Γ(ν/2))

  12. Comment: If z₁, z₂, ..., z_ν are independent random variables each having a standard normal distribution then U = z₁² + z₂² + ⋯ + z_ν² has a chi-squared distribution with ν degrees of freedom.
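This comment can be verified by simulation. The sketch below (illustrative, standard-library Python only) draws U = z₁² + ⋯ + z_ν² repeatedly and checks the known chi-squared moments, mean ν and variance 2ν:

```python
import random
import statistics

random.seed(0)
nu, reps = 5, 20_000

# U = z1^2 + ... + z_nu^2 for independent standard normal draws z_i
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(nu)) for _ in range(reps)]

# A chi-squared variable with nu degrees of freedom has mean nu and variance 2*nu.
mean_U = statistics.fmean(draws)
var_U = statistics.variance(draws)
```

With ν = 5 the simulated mean and variance land close to 5 and 10 respectively.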

  13. 3. The F distribution with ν₁ degrees of freedom in the numerator and ν₂ degrees of freedom in the denominator:

f(x) = K x^(ν₁/2 − 1) (1 + (ν₁/ν₂) x)^(−(ν₁ + ν₂)/2) if x ≥ 0, where K = (ν₁/ν₂)^(ν₁/2) Γ((ν₁ + ν₂)/2) / [Γ(ν₁/2) Γ(ν₂/2)]

  14. Comment: If U₁ and U₂ are independent random variables having chi-squared distributions with ν₁ and ν₂ degrees of freedom respectively, then F = (U₁/ν₁) / (U₂/ν₂) has an F distribution with ν₁ degrees of freedom in the numerator and ν₂ degrees of freedom in the denominator.
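The ratio construction on slide 14 can also be checked by simulation (illustrative sketch, not from the slides): build each chi-squared variable as a sum of squared standard normals, form F = (U₁/ν₁)/(U₂/ν₂), and compare the simulated mean with the known F mean ν₂/(ν₂ − 2):

```python
import random
import statistics

random.seed(0)
nu1, nu2, reps = 5, 10, 20_000

def chi2(nu):
    """One chi-squared draw with nu degrees of freedom: a sum of nu squared normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(nu))

# F = (U1/nu1) / (U2/nu2) for independent chi-squared U1, U2
draws = [(chi2(nu1) / nu1) / (chi2(nu2) / nu2) for _ in range(reps)]

# An F(nu1, nu2) variable has mean nu2/(nu2 - 2) = 1.25 for these choices.
mean_F = statistics.fmean(draws)
```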

  15. 4. The t distribution with ν degrees of freedom:

f(x) = K (1 + x²/ν)^(−(ν + 1)/2), where K = Γ((ν + 1)/2) / [√(νπ) Γ(ν/2)]

  16. Comment: If z and U are independent random variables, and z has a standard normal distribution while U has a chi-squared distribution with ν degrees of freedom, then t = z / √(U/ν) has a t distribution with ν degrees of freedom.
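The same kind of check works for the t construction on slide 16. This sketch (illustrative only) simulates t = z/√(U/ν) and compares against the known t moments, mean 0 and variance ν/(ν − 2):

```python
import math
import random
import statistics

random.seed(0)
nu, reps = 5, 50_000

draws = []
for _ in range(reps):
    z = random.gauss(0, 1)                                # standard normal
    U = sum(random.gauss(0, 1) ** 2 for _ in range(nu))   # chi-squared, nu df
    draws.append(z / math.sqrt(U / nu))                   # t with nu df

# A t variable with nu degrees of freedom has mean 0 and variance nu/(nu - 2).
mean_t = statistics.fmean(draws)
var_t = statistics.variance(draws)
```

The variance exceeds 1, reflecting the heavier tails of the t distribution relative to the standard normal (compare the figure on slide 33).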

  17. The Sampling distribution of a statistic

  18. A random sample from a probability distribution with density function f(x) is a collection of n independent random variables, x₁, x₂, ..., xₙ, each with the probability distribution described by f(x).

  19. If for example we collect a random sample of individuals from a population and • measure some variable X for each of those individuals, • the n measurements x1, x2, ...,xn will form a set of n independent random variables with a probability distribution equivalent to the distribution of X across the population.

  20. A statistic T is any quantity computed from the random observations x1, x2, ...,xn.

  21. Any statistic will necessarily also be a random variable and therefore will have a probability distribution described by some probability density function fT(t). • This distribution is called the sampling distribution of the statistic T.

  22. This distribution is very important if one is using this statistic in a statistical analysis. • It is used to assess the accuracy of a statistic if it is used as an estimator. • It is used to determine thresholds for acceptance and rejection if it is used for Hypothesis testing.
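Slides 20–22 can be made concrete with a small simulation (an illustration, not from the slides): any statistic computed from a random sample, the sample median just as much as the sample mean, has its own sampling distribution, and comparing their spreads is exactly the kind of accuracy assessment slide 22 describes:

```python
import random
import statistics

random.seed(0)
n, reps = 25, 10_000

means, medians = [], []
for _ in range(reps):
    xs = [random.gauss(50, 15) for _ in range(n)]  # sample from N(50, 15)
    means.append(statistics.fmean(xs))             # statistic T1 = sample mean
    medians.append(statistics.median(xs))          # statistic T2 = sample median

# Each statistic is itself a random variable with a sampling distribution;
# for a normal population the median is the more variable of the two.
sd_mean = statistics.stdev(means)
sd_median = statistics.stdev(medians)
```

Here sd_mean comes out near σ/√n = 3, while sd_median is noticeably larger, which is why the mean is preferred as an estimator for normal populations.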

  23. Some examples of Sampling distributions of statistics

  24. Distribution of the sample mean for a sample from a Normal population Let x₁, x₂, ..., xₙ be a sample from a normal population with mean μ and standard deviation σ. Let

x̄ = (x₁ + x₂ + ⋯ + xₙ)/n

  25. Then x̄ has a normal sampling distribution with mean μ and standard deviation σ/√n.
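This result can be checked directly by simulation. The sketch below (illustrative, standard-library Python) draws many samples from the N(50, 15) population used in the slide 10 figure and verifies that the sample means have mean μ and standard deviation σ/√n:

```python
import math
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 50, 15, 25, 20_000

# Draw reps independent samples of size n and record each sample mean.
xbar_draws = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
              for _ in range(reps)]

# The sampling distribution of xbar: mean mu, standard deviation sigma/sqrt(n).
mean_xbar = statistics.fmean(xbar_draws)
sd_xbar = statistics.stdev(xbar_draws)
theory_sd = sigma / math.sqrt(n)  # = 3.0 here
```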

  26. Distribution of the z statistic Let x₁, x₂, ..., xₙ be a sample from a normal population with mean μ and standard deviation σ. Let

z = (x̄ − μ) / (σ/√n)

Then z has a standard normal distribution.

  27. Comment: Many statistics T have a normal distribution with mean μT and standard deviation σT. Then z = (T − μT)/σT will have a standard normal distribution.

  28. Distribution of the χ² statistic for the sample variance Let x₁, x₂, ..., xₙ be a sample from a normal population with mean μ and standard deviation σ. Let

s² = Σ(xᵢ − x̄)² / (n − 1) = sample variance and s = √s² = sample standard deviation

  29. Let χ² = (n − 1)s²/σ². Then χ² has a chi-squared distribution with ν = n − 1 degrees of freedom.
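Slide 29's claim can be checked the same way (illustrative sketch only): compute χ² = (n − 1)s²/σ² over many samples and compare with the chi-squared moments for ν = n − 1, mean ν and variance 2ν:

```python
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 50, 15, 25, 20_000

chi2_draws = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(xs)                 # sample variance, n - 1 divisor
    chi2_draws.append((n - 1) * s2 / sigma ** 2)

# Chi-squared with nu = n - 1 = 24 degrees of freedom: mean 24, variance 48.
mean_chi2 = statistics.fmean(chi2_draws)
var_chi2 = statistics.variance(chi2_draws)
```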

  30. The chi-squared distribution

  31. Distribution of the t statistic Let x₁, x₂, ..., xₙ be a sample from a normal population with mean μ and standard deviation σ. Let

t = (x̄ − μ) / (s/√n)

Then t has Student’s t distribution with ν = n − 1 degrees of freedom.

  32. Comment: If an estimator T has a normal distribution with mean μT and standard deviation σT, and if sT is an estimator of σT based on ν degrees of freedom, then t = (T − μT)/sT will have Student’s t distribution with ν degrees of freedom.

  33. [Figure: the t distribution compared with the standard normal distribution.]

  34. Point estimation • A statistic T is called an estimator of the parameter θ if its value is used as an estimate of the parameter θ. • The performance of an estimator T will be determined by how “close” the sampling distribution of T is to the parameter, θ, being estimated.

  35. An estimator T is called an unbiased estimator of θ if μT, the mean of the sampling distribution of T, satisfies μT = θ. • This implies that in the long run the average value of T is θ.
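The "long run average" idea can be illustrated by simulation (an illustration, not from the slides). The sketch compares the usual sample variance s², which is unbiased for σ², with the version that divides by n, whose long-run average falls short of σ²:

```python
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 0, 2, 10, 50_000

unbiased, biased = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    ss = sum((x - xbar) ** 2 for x in xs)
    unbiased.append(ss / (n - 1))  # T = s^2, unbiased for sigma^2 = 4
    biased.append(ss / n)          # divides by n; long-run average is sigma^2 (n-1)/n

# In the long run the average of s^2 is sigma^2 = 4, while the /n version
# averages 4 * 9/10 = 3.6 and is therefore a biased estimator.
mean_unbiased = statistics.fmean(unbiased)
mean_biased = statistics.fmean(biased)
```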

  36. An estimator T is called the Minimum Variance Unbiased estimator of θ if T is an unbiased estimator and it has the smallest standard error σT among all unbiased estimators of θ. • If the sampling distribution of T is normal, the standard error of T is extremely important. It completely describes the variability of the estimator T.

  37. Interval Estimation • Point estimators give only single values as an estimate. There is no indication of the accuracy of the estimate. • The accuracy can sometimes be measured and shown by displaying the standard error of the estimate.

  38. There is however a better way. • Using the idea of confidence interval estimates • The unknown parameter is estimated with a range of values that have a given probability of capturing the parameter being estimated.

  39. The interval TL to TU is called a (1 − α) × 100 % confidence interval for the parameter θ if the probability that θ lies in the range TL to TU is equal to 1 − α. • Here TL and TU are statistics: random numerical quantities calculated from the data.

  40. Examples Confidence interval for the mean of a Normal population (based on the z statistic):

x̄ ± zα/2 (σ/√n)

is a (1 − α) × 100 % confidence interval for μ, the mean of a normal population. Here zα/2 is the upper α/2 × 100 % percentage point of the standard normal distribution.
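The defining property on slide 39, that the random interval captures μ with probability 1 − α, can be demonstrated by simulation. This sketch (illustrative, σ treated as known) builds the z interval for many samples and estimates its coverage; zα/2 ≈ 1.96 for α = 0.05:

```python
import math
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 50, 15, 25, 4_000
z = 1.959964  # upper 2.5% percentage point of the standard normal

covered = 0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    half = z * sigma / math.sqrt(n)   # half-width z_{a/2} * sigma / sqrt(n)
    if xbar - half <= mu <= xbar + half:
        covered += 1

coverage = covered / reps  # should be close to 1 - alpha = 0.95
```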

  41. More generally, if T is an unbiased estimator of the parameter θ and has a normal sampling distribution with known standard error σT, then T ± zα/2 σT is a (1 − α) × 100 % confidence interval for θ.

  42. Confidence interval for the mean of a Normal population (based on the t statistic):

x̄ ± tα/2 (s/√n)

is a (1 − α) × 100 % confidence interval for μ, the mean of a normal population. Here tα/2 is the upper α/2 × 100 % percentage point of the Student’s t distribution with ν = n − 1 degrees of freedom.
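The t interval has the same coverage property even though σ is replaced by the estimate s. A simulation sketch (illustrative only; the critical value 2.0639 is the tabulated upper 2.5% point of t with 24 degrees of freedom, since the standard library has no t quantile function):

```python
import math
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 50, 15, 25, 4_000
t_crit = 2.0639  # upper 2.5% point of Student's t, nu = n - 1 = 24 df (table value)

covered = 0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)          # sample standard deviation replaces sigma
    half = t_crit * s / math.sqrt(n)  # half-width t_{a/2} * s / sqrt(n)
    if xbar - half <= mu <= xbar + half:
        covered += 1

coverage = covered / reps  # again close to 0.95
```

Note that tα/2 > zα/2, so the t interval is wider on average, the price of estimating σ from the data.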

  43. More generally, if T is an unbiased estimator of the parameter θ and has a normal sampling distribution with estimated standard error sT, based on ν degrees of freedom, then T ± tα/2 sT is a (1 − α) × 100 % confidence interval for θ.

  44. Multiple Confidence intervals In many situations one is interested in estimating not only a single parameter, θ, but a collection of parameters, θ₁, θ₂, θ₃, ... . A collection of intervals, TL1 to TU1, TL2 to TU2, TL3 to TU3, ..., is called a set of (1 − α) × 100 % multiple confidence intervals if the probability that all the intervals capture their respective parameters is 1 − α.
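Why individual intervals are not enough: for two independent parameters, two separate 95% intervals capture both simultaneously only about 0.95² ≈ 90% of the time. The sketch below (illustrative, two independent normal means with known σ) demonstrates this joint-coverage shortfall:

```python
import math
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 50, 15, 25, 4_000
z = 1.959964                      # upper 2.5% point of the standard normal
half = z * sigma / math.sqrt(n)   # half-width of each individual 95% interval

both = 0
for _ in range(reps):
    hits = 0
    for _ in range(2):  # two independent 95% intervals, one per parameter
        xbar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
        if xbar - half <= mu <= xbar + half:
            hits += 1
    if hits == 2:
        both += 1

# Joint coverage is about 0.95^2 = 0.9025, below the 95% level of each interval,
# which is why multiple confidence intervals must be calibrated jointly.
joint = both / reps
```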
