Population and Processes • Population, the entire set of entities about which we make statements in an analysis • Incomes of all families living in Milwaukee • Votes of all voters in an election • Process, an ongoing mechanism from which data are generated • An attribute of products of a production line • Daily return of a stock
Parameter • A quantity that describes a characteristic of the distribution f(y) of a variable Y, either a process variable or a population variable • distribution mean μ • distribution standard deviation σ • distribution median, etc. • The value of a parameter is usually unknown • A parameter is the subject of statistical inference (estimation, testing)
Sample • Data y1, ..., yn • A subset of a population • A snapshot of a process • Simple Random Sample • Every set of n elements has an equal chance of being selected • Each yi is a statistical replica of a draw from f(y) • The yi's are independent and identically distributed (IID)
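The "equal chance for every set of n elements" property is exactly what Python's `random.sample` provides when drawing without replacement. A minimal sketch; the population of incomes below is hypothetical, made up for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement; every n-element subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population of 10 family incomes (in $1000s), illustration only
incomes = [42, 55, 38, 71, 120, 64, 29, 88, 47, 53]
sample = simple_random_sample(incomes, 4, seed=1)
print(sample)
```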
Statistic • A quantity computed from data • Sample mean ȳ • Sample variance s² • Sample standard deviation s
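The three statistics listed above can be computed directly from the data. A short sketch, assuming the common n - 1 divisor for the sample variance (the slide does not say which divisor it uses); the data values are illustrative.

```python
import math

def sample_mean(ys):
    """ybar = (y1 + ... + yn) / n"""
    return sum(ys) / len(ys)

def sample_variance(ys):
    """s^2 with divisor n - 1 (assumed convention)."""
    ybar = sample_mean(ys)
    return sum((y - ybar) ** 2 for y in ys) / (len(ys) - 1)

def sample_sd(ys):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(ys))

ys = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
print(sample_mean(ys), sample_variance(ys), sample_sd(ys))
```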
An Example • Given: f(y) is normal with σy = 15 • Parameter of interest: μ, unknown • Data: y1 = 97.65, y2 = 101.76, y3 = 54.27, y4 = 99.37 • Use the statistic ȳ as an estimate of the unknown parameter μ • Compute the statistic
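Computing the statistic for the four observations gives the point estimate of μ. The sketch also computes σy/√n, the standard deviation of ȳ, which the later slides rely on; it uses only the values given on the slide.

```python
import math

sigma_y = 15.0                      # known population standard deviation (given)
ys = [97.65, 101.76, 54.27, 99.37]  # observed data

ybar = sum(ys) / len(ys)            # point estimate of mu
sd_ybar = sigma_y / math.sqrt(len(ys))  # standard deviation of ybar
print(f"ybar = {ybar:.4f}, sd(ybar) = {sd_ybar:.2f}")
```

Here ȳ ≈ 88.26 and σy/√4 = 7.5.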
Statistical Inference [Diagram: population f(y) with an unknown parameter; a statistic computed from the sample serves as its estimate]
Relative Frequency Inference [Diagram: f(y) with an unknown parameter; the statistic from the actual sample is viewed alongside the statistics from all other possible samples]
Variation of Sample Mean • The sample mean ȳ varies from one sample to another • The distribution of ȳ describes the variation of all possible sample means and is called the sampling distribution of ȳ • Parameters of the sampling distribution of ȳ are related to the parameters of the data distribution f(y)
Averages of n rolls of a die
Simulation Results
Variable        N      Mean     StDev
y (n = 1)      1000   3.5020   1.7008
ȳ (n = 4)      1000   3.4800   0.8475
ȳ (n = 9)      1000   3.5101   0.5525
ȳ (n = 16)     1000   3.4922   0.4235
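A simulation along the lines of the table can be sketched as follows: 1000 averages of n fair-die rolls for each n. The seeds are arbitrary choices, so the printed numbers will differ from the table, but the same pattern appears: means near 3.5 and standard deviations shrinking as n grows.

```python
import random
import statistics

def simulate_die_means(n, reps=1000, seed=0):
    """Return `reps` averages, each the mean of n rolls of a fair six-sided die."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(n)) / n for _ in range(reps)]

for n in (1, 4, 9, 16):
    means = simulate_die_means(n, seed=n)
    print(f"n={n:2d}  mean={statistics.mean(means):.4f}  sd={statistics.stdev(means):.4f}")
```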
Mean and Standard Deviation of ȳ • The mean of the sampling distribution of the sample mean equals the population distribution mean μy • The standard deviation of the sampling distribution of the sample mean, σy/√n, is smaller than the population distribution standard deviation σy
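Both facts follow from linearity of expectation and, for the variance, from the independence of the IID observations:

```latex
\mu_{\bar y} = E\!\left[\frac{1}{n}\sum_{i=1}^{n} y_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[y_i] = \mu_y,
\qquad
\sigma_{\bar y}^2 = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(y_i) = \frac{\sigma_y^2}{n}
```

For a fair die, μy = 3.5 and σy = √(35/12) ≈ 1.708, so σy/√n ≈ 0.854, 0.569 and 0.427 for n = 4, 9 and 16, in line with the simulated standard deviations in the table.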
Sampling Distribution of ȳ • If f(y) is normal, then ȳ is normal [Figure: f(y) with σy = 15, where P(85 < y < 115) = 68.3%]
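The 68.3% figure is the normal probability of falling within one standard deviation of the mean. The sketch below assumes μy = 100 (the centre of the interval 85 to 115, which the slide implies but does not state) and, as a hypothetical comparison, the same interval for ȳ with n = 4, whose standard deviation is 15/√4 = 7.5.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

mu_y, sigma_y = 100.0, 15.0  # mu_y = 100 is an assumption (centre of 85..115)
n = 4                        # hypothetical sample size

p_y = prob_between(85, 115, mu_y, sigma_y)                    # single observation
p_ybar = prob_between(85, 115, mu_y, sigma_y / math.sqrt(n))  # sample mean
print(f"P(85 < y < 115)    = {p_y:.3f}")
print(f"P(85 < ybar < 115) = {p_ybar:.3f}")
```

The same interval covers about 68.3% of single observations but about 95.4% of sample means of size 4, because it spans ±2 standard deviations of ȳ.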
Central Limit Theorem for ȳ • For any f(y) with finite variance, the sampling distribution of ȳ becomes approximately normal as n grows
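One way to watch the theorem at work is to average draws from a strongly skewed distribution and track the skewness of the averages. This sketch uses an exponential distribution with mean 2 as a stand-in skewed f(y) (an illustrative choice, not from the slides); its skewness is 2, and the skewness of ȳ shrinks like 2/√n toward the normal value of 0.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: mean cubed deviation divided by sd^3 (population form)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def exp_sample_means(n, reps=20000, mean=2.0, seed=0):
    """Averages of n draws from an exponential distribution with the given mean."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1 / mean) for _ in range(n)])
            for _ in range(reps)]

for n in (1, 4, 16):
    means = exp_sample_means(n)
    print(f"n={n:2d}  mean={statistics.fmean(means):.3f}  skewness={skewness(means):.3f}")
```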
1000 Incomes • μy = 2, σy = 2 • P(0 < y < 4) = ?
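The answer depends on the shape of f(y), not just on μy and σy. As a hypothetical comparison, the sketch contrasts a normal model with an exponential model; the exponential with mean 2 is chosen only because it also has standard deviation 2, and the slide does not say incomes follow it.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def exponential_cdf(x, mean):
    """CDF of an exponential distribution with the given mean (hypothetical model)."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

mu_y, sigma_y = 2.0, 2.0

p_normal = normal_cdf(4, mu_y, sigma_y) - normal_cdf(0, mu_y, sigma_y)
p_negative = normal_cdf(0, mu_y, sigma_y)  # mass the normal model puts on incomes below 0
p_exponential = exponential_cdf(4, mu_y) - exponential_cdf(0, mu_y)
print(f"normal: {p_normal:.4f} (P(y < 0) = {p_negative:.4f}), exponential: {p_exponential:.4f}")
```

The normal model gives about 0.683 but also assigns about 15.9% probability to negative incomes, a sign that it fits skewed income data poorly; the exponential model gives 1 - e^(-2) ≈ 0.865.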
Bayesian Inference [Diagram: prior distribution g(μ) on the unknown parameter, updated with the statistic from the actual observed sample, gives the posterior distribution g(μ | data)]
Bayesian Statistics • Degrees of belief about the values of the unknown parameter μ are described by a probability distribution • Prior distribution g(μ) • For a given sample, a statistic is a known, non-random value • The statistic is used to update the prior knowledge about μ • Posterior distribution g(μ | data)
Uniform Prior & Normal Posterior • No prior knowledge: g(μ) = uniform • With normal f(y) and known σy, the posterior distribution g(μ | data) is normal, centered at ȳ with standard deviation σy/√n
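A grid approximation makes the update concrete without assuming the answer: multiply a flat prior by the normal likelihood at each candidate μ, normalize, and read off the posterior mean and standard deviation. This sketch reuses the data and σy = 15 from the earlier example; the prior bounds [0, 200] and the grid size are hypothetical choices standing in for "no prior knowledge".

```python
import math

def grid_posterior(ys, sigma, lo, hi, steps=20001):
    """Uniform prior on [lo, hi] times normal likelihood, normalized on a grid."""
    n = len(ys)
    ybar = sum(ys) / n
    h = (hi - lo) / (steps - 1)
    grid = [lo + i * h for i in range(steps)]
    # Log-likelihood in mu, up to an additive constant: -n (ybar - mu)^2 / (2 sigma^2)
    logpost = [-n * (ybar - mu) ** 2 / (2 * sigma ** 2) for mu in grid]
    top = max(logpost)
    w = [math.exp(lp - top) for lp in logpost]  # subtract max to avoid underflow
    total = sum(w)
    w = [wi / total for wi in w]
    mean = sum(mu * wi for mu, wi in zip(grid, w))
    sd = math.sqrt(sum((mu - mean) ** 2 * wi for mu, wi in zip(grid, w)))
    return mean, sd

ys = [97.65, 101.76, 54.27, 99.37]
post_mean, post_sd = grid_posterior(ys, sigma=15.0, lo=0.0, hi=200.0)
print(f"posterior mean = {post_mean:.4f}, posterior sd = {post_sd:.4f}")
```

The grid result matches the normal posterior with mean ȳ ≈ 88.26 and standard deviation 15/√4 = 7.5, so a 95% credible interval is roughly 88.26 ± 1.96 × 7.5, about 73.6 to 103.0.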