Chapter 3: Some basic concepts of statistics
Population versus Sample • Numbers that describe the sample are called __________________ • Sample mean is represented by ________ • Sample variance is represented by ________ • Numbers that describe the population are called _________________ • Population mean is represented by ________ • Population variance is represented by ________
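Sample mean and variance • Calculate sample mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ • Calculate sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ • Sample standard deviation: $s = \sqrt{s^2}$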
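As a small illustration of these three calculations, here is a sketch in R. The data vector `y` is hypothetical, and the variance uses the usual n - 1 divisor, which is an assumption about the definition intended on the slide.

```r
# Hypothetical sample of n = 6 observations (illustration only)
y <- c(2, 4, 4, 5, 7, 8)
n <- length(y)

y_bar <- sum(y) / n                    # sample mean
s2    <- sum((y - y_bar)^2) / (n - 1)  # sample variance (divisor n - 1)
s     <- sqrt(s2)                      # sample standard deviation

# Base R's mean(), var(), and sd() use the same definitions
c(y_bar = y_bar, s2 = s2, s = s)
```

Comparing the hand-computed values with `mean(y)`, `var(y)`, and `sd(y)` is a quick way to confirm the formulas.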
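Population Mean and Standard Deviation • Population mean: $\mu = E(Y) = \sum_y y\,p(y)$ • Population variance: $\sigma^2 = E[(Y-\mu)^2] = \sum_y (y-\mu)^2\,p(y)$ • Population standard deviation: $\sigma = \sqrt{\sigma^2}$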
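To make the population formulas concrete, the sketch below applies them to one roll of a fair six-sided die. The die is chosen here only as a convenient example of a known probability distribution p(y).

```r
# Probability distribution of one roll of a fair die (illustrative choice)
y <- 1:6
p <- rep(1 / 6, 6)

mu     <- sum(y * p)           # population mean: sum of y * p(y)
sigma2 <- sum((y - mu)^2 * p)  # population variance: sum of (y - mu)^2 * p(y)
sigma  <- sqrt(sigma2)         # population standard deviation

c(mu = mu, sigma2 = sigma2, sigma = sigma)
```

Here the mean is 3.5 and the variance is 35/12. Unlike the sample versions, nothing is estimated: the values follow directly from p(y).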
Sampling distribution • The distribution of all possible values of $\bar{y}$ with n=50. • $E(\bar{y}) = \mu$ • $\mathrm{Var}(\bar{y}) = \sigma^2/n$
Section 3.3 Summarizing Information in Populations and Samples: The Finite Population Case • If the population is infinitely large, sampling with or without replacement makes no practical difference: the selections can be treated as independent, and the selection probabilities do not change • However, if the population is finite and we sample without replacement, the probabilities of selecting elements change as more elements are selected (Example: rolling a die versus drawing cards from a standard 52-card deck)
Estimating the population total • (Infinitely large population) Let $\tau$ denote the population total (parameter) and let $\hat{\tau}$ denote the estimated total (statistic); let $y_1,\dots,y_n$ be a random sample of size n from the population and let $\delta_1,\dots,\delta_n$ be the selection probabilities of the respective sample observations. Then the estimated population total is $\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{\delta_i}$ (this estimator is unbiased for the true parameter $\tau$). • We can think of this as a weighted estimator with weights $w_i = 1/\delta_i$, so that $\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} w_i y_i$. • The estimated variance of $\hat{\tau}$ is $\hat{V}(\hat{\tau}) = \frac{1}{n}\cdot\frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{y_i}{\delta_i} - \hat{\tau}\right)^2$. Calculate the estimated variance for each scenario.
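The estimator of the population total and its estimated variance can be sketched in R as follows. The function name `estimate_total` and the numbers in `y` and `delta` are hypothetical and are not one of the scenarios from the course; the deviations in the variance are taken about the estimated total.

```r
# Estimate a population total from observations y drawn with
# selection probabilities delta (function name is hypothetical)
estimate_total <- function(y, delta) {
  n <- length(y)
  z <- y / delta                                # weighted observations y_i / delta_i
  tau_hat <- sum(z) / n                         # estimated population total
  var_hat <- sum((z - tau_hat)^2) / (n * (n - 1))  # estimated variance of tau_hat
  list(tau_hat = tau_hat, var_hat = var_hat)
}

# Hypothetical sample of n = 3 with unequal selection probabilities
y     <- c(10, 20, 15)
delta <- c(0.10, 0.25, 0.20)
est   <- estimate_total(y, delta)
est
```

With equal selection probabilities, delta_i = 1/N for every unit, the same function reduces to N times the sample mean.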
Sampling without replacement • Same idea can be used with sampling without replacement, but probabilities become more difficult to find (STT 315 helps to understand how to calculate these).
3.4 Sampling distribution • In your introductory statistics class, you learned that the sampling distribution of $\bar{y}$ is approximately normal (if n is large enough) with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. This is the Central Limit Theorem (CLT).
Tchebysheff’s theorem • If n is NOT large enough to invoke the CLT and the population distribution is NOT normal, we can still use Tchebysheff’s theorem to get a lower bound: for any $k \ge 1$, at least $1 - 1/k^2$ of the observations fall within k standard deviations of the mean (this is a LOWER BOUND!!). Therefore, within 1 standard deviation, at least 0% (not very useful); within 2 standard deviations, at least 75%; within 3, at least 88.9%.
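The bounds quoted above can be reproduced with a few lines of R. As a check, the sketch also compares them with the observed proportions in a small, deliberately skewed data set; the vector `y` is hypothetical, and the standard deviation is computed with divisor n so that the theorem applies exactly to these six values.

```r
# Tchebysheff lower bound on the proportion within k standard deviations
tcheb_bound <- function(k) 1 - 1 / k^2

k     <- c(1, 2, 3)
bound <- tcheb_bound(k)

# Hypothetical skewed data set, treated as the whole distribution
y     <- c(1, 1, 2, 2, 3, 50)
mu    <- mean(y)
sigma <- sqrt(mean((y - mu)^2))  # divisor n, not n - 1

# Observed proportion of values strictly within k standard deviations
observed <- sapply(k, function(kk) mean(abs(y - mu) < kk * sigma))

data.frame(k = k, bound = bound, observed = observed)
```

The observed proportions are never below the bound, but they can be far above it, which is why the bound is conservative.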
Finite population size • All the theory in the introductory statistics class (and so far in this class) assumes INDEPENDENT observations (an infinite population, or one so large that we can treat it as infinite) • What happens when this is not true?
R-code:
# Finite "population" of N = 80 values from a right-skewed gamma distribution
x <- rgamma(80, shape = 0.5, scale = 9)
hist(x)
# Draw 100 samples of size n without replacement; return the 100 sample means
x.bar.dist <- function(x, n) {
  xbar <- numeric(100)
  for (i in 1:100) {
    temp <- sample(x, n, replace = FALSE)
    xbar[i] <- mean(temp)
  }
  return(xbar)
}
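One way to see what changes is to enumerate every possible sample from a very small population instead of simulating. The sketch below uses a hypothetical population of N = 6 values and samples of size n = 3. It compares the exact variance of the sample mean with the infinite-population value and with the standard finite population correction factor (N - n)/(N - 1), which is not stated on the slide and is brought in here as a known result.

```r
# Hypothetical finite population (illustration only)
pop <- c(2, 3, 5, 8, 13, 21)
N   <- length(pop)
n   <- 3

mu     <- mean(pop)
sigma2 <- mean((pop - mu)^2)  # population variance (divisor N)

# All choose(6, 3) = 20 equally likely samples drawn without replacement
samples <- combn(pop, n)      # one sample per column
ybars   <- colMeans(samples)

var_exact    <- mean((ybars - mean(ybars))^2)    # exact variance of the sample mean
var_infinite <- sigma2 / n                       # value if draws were independent
var_fpc      <- (sigma2 / n) * (N - n) / (N - 1) # with finite population correction

c(exact = var_exact, infinite = var_infinite, fpc = var_fpc)
```

The exact variance matches the corrected formula and is smaller than the infinite-population value: sampling without replacement from a finite population makes the sample mean less variable than the independent-draws theory predicts.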
3.5 Covariance and Correlation • Relationship between two random variables: covariance • The covariance indicates how two variables “covary” • Positive covariance indicates a positive association • Negative covariance indicates a negative association • Zero covariance indicates no linear association (NOT necessarily independence!!!)
More on Covariance • We calculate covariance as $\mathrm{cov}(y_1,y_2) = E[(y_1-\mu_1)(y_2-\mu_2)]$. • Look at graphs to discuss covariance (a measure of LINEAR dependency) • However, covariance depends on the scale of the two variables • Correlation “standardizes” the covariance • Correlation: $\rho = \mathrm{cov}(y_1,y_2)/(\sigma_1\sigma_2)$ • Note that $-1 \le \rho \le 1$
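The scale dependence of covariance, and the way correlation removes it, can be seen in a short R sketch. The two data vectors are hypothetical, and R's `cov()` and `cor()` return sample estimates (divisor n - 1) rather than the population quantities.

```r
# Hypothetical paired measurements (illustration only)
y1 <- c(1, 2, 3, 4, 5)
y2 <- c(2, 1, 4, 3, 6)

cov_orig <- cov(y1, y2)        # sample covariance on the original scale
cov_resc <- cov(100 * y1, y2)  # same data with y1 in units 100 times smaller

cor_orig <- cor(y1, y2)        # sample correlation
cor_resc <- cor(100 * y1, y2)  # unchanged by rescaling

c(cov_orig = cov_orig, cov_resc = cov_resc,
  cor_orig = cor_orig, cor_resc = cor_resc)
```

Rescaling y1 multiplies the covariance by 100 but leaves the correlation untouched, so only the correlation can be compared across variables measured in different units.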
3.6 Estimation • Since we do not know the parameters, we estimate them with statistics!! If $\theta$ is the parameter of interest, then $\hat{\theta}$ is the estimator of $\theta$. We want the following properties to hold: • $E(\hat{\theta}) = \theta$ (the estimator is unbiased) • $V(\hat{\theta}) = \sigma^2_{\hat{\theta}}$ is small
Error of Estimation and Bounds • The error of estimation is defined as $|\hat{\theta} - \theta|$ • Set a bound B on this error of estimation such that $P(|\hat{\theta} - \theta| < B) = 1-\alpha$ • The bound B can be thought of as the margin of error. In fact, this is how confidence intervals are constructed (when the sampling distribution of the statistic is normally distributed).
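For a sample mean, the bound can be sketched in R as follows, assuming the sampling distribution is normal and the standard error is known or well estimated. The function name `margin_of_error` and the values of `s`, `n`, and `y_bar` are hypothetical.

```r
# Bound on the error of estimation under a normal sampling distribution
# (function name and inputs are hypothetical)
margin_of_error <- function(se, alpha = 0.05) {
  qnorm(1 - alpha / 2) * se  # normal quantile times the standard error
}

s     <- 10   # sample standard deviation
n     <- 100  # sample size
y_bar <- 52   # sample mean

se <- s / sqrt(n)            # estimated standard error of the sample mean
B  <- margin_of_error(se)    # bound for alpha = 0.05
ci <- c(lower = y_bar - B, upper = y_bar + B)

B
ci
```

With alpha = 0.05 the multiplier is about 1.96, often rounded to 2, so the interval is the estimate plus or minus roughly two standard errors.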