Statistics Sampling
STATISTICS in PRACTICE • Cinergy, formerly Cincinnati Gas & Electric Company (CG&E), is a public utility that provides gas and electric power to customers in the Greater Cincinnati area. • To improve service to its customers, Cinergy continually strives to stay up-to-date with its customers’ needs.
STATISTICS in PRACTICE • Cinergy is using the survey results to improve the forecasts of energy demand and to improve service to its commercial customers.
Contents • Terminology Used in Sample Surveys • Types of Surveys and Sampling Methods • Survey Errors • Simple Random Sampling • Stratified Simple Random Sampling • Cluster Sampling • Systematic Sampling
Terminology Used in Sample Surveys • An element is the entity on which data are collected. • A population is the collection of all elements of interest. • A sample is a subset of the population. • The target population is the population we want to make inferences about.
Terminology Used in Sample Surveys • The sampled population is the population from which the sample is actually selected. • These two populations are not always the same. • If inferences from a sample are to be valid, the sampled population must be representative of the target population.
Terminology Used in Sample Surveys • Example: Dunning Microsystems, Inc. (DMI), a manufacturer of personal computers and peripherals, would like to collect data about the characteristics of individuals who purchased a DMI personal computer. • A sample survey of DMI personal computer owners could be conducted.
Terminology Used in Sample Surveys • The elements in this sample survey would be individuals who purchased a DMI personal computer. • The population would be the collection of all people who purchased a DMI personal computer. • The sample would be the subset of DMI personal computer owners who are surveyed.
Terminology Used in Sample Surveys • The target population consists of all people who purchased a DMI personal computer. • The sampled population, however, might be all owners who sent warranty registration cards back to DMI. • Not every person who buys a DMI personal computer sends in the warranty card, so the sampled population would differ from the target population.
Terminology Used in Sample Surveys • The population is divided into sampling units, which are either groups of elements or the elements themselves. • A list of the sampling units for a particular study is called a frame.
Terminology Used in Sample Surveys • The choice of a particular frame is often determined by the availability and reliability of a list. • The development of a frame can be one of the most difficult and important steps in conducting a sample survey.
Terminology Used in Sample Surveys • Example: suppose we want to survey certified professional engineers who are involved in the design of heating and air conditioning systems for commercial buildings. • If a list of all professional engineers were available, the sampling units would be the professional engineers we want to survey.
Terminology Used in Sample Surveys • If such a list is NOT available, a business telephone directory might provide a list of all engineering firms. • We could select a sample of the engineering firms to survey; then, for each firm surveyed, we might interview all the professional engineers.
Types of Surveys • Surveys Involving Questionnaires • Three common types are mail surveys, telephone surveys, and personal interview surveys. • Survey costs are lower for mail and telephone surveys. • With well-trained interviewers, higher response rates and longer questionnaires are possible with personal interviews. • The design of the questionnaire is critical.
Types of Surveys • Surveys Not Involving Questionnaires • Often, someone simply counts or measures the sampled items and records the results. • An example is sampling a company’s inventory of parts to estimate the total inventory value.
Sampling Methods • Nonprobabilistic Sampling • Probabilistic Sampling
Nonprobabilistic Sampling Methods • The probability of obtaining each possible sample cannot be computed. • Statistically valid statements cannot be made about the precision of the estimates. • Sampling cost is lower and implementation is easier. • Methods include convenience and judgment sampling.
Nonprobabilistic Sampling Methods • Convenience Sampling The units included in the sample are chosen because of accessibility. In some cases, convenience sampling is the only practical approach.
Nonprobabilistic Sampling Methods • Judgment Sampling A knowledgeable person selects sampling units that he/she feels are most representative of the population. The quality of the result is dependent on the person selecting the sample. Generally, no statistical statement should be made about the precision of the result.
Nonprobabilistic Sampling Methods • Example • Convenience sampling: a professor conducting a research study at a university may ask student volunteers to participate in the study simply because they are in the professor’s class.
Probabilistic Sampling Methods The probability of obtaining each possible sample can be computed. Confidence intervals can be developed which provide bounds on the sampling error. Methods include simple random, stratified simple random, cluster, and systematic sampling.
Survey Errors Two types of errors can occur in conducting a survey: • Sampling error • Nonsampling error
Survey Errors • Sampling Error It is defined as the magnitude of the difference between the point estimate, developed from the sample, and the population parameter. It occurs because not every element in the population is surveyed. It cannot occur in a census. It cannot be avoided, but it can be controlled.
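The definition of sampling error can be illustrated with a small simulation. This is a minimal sketch assuming a hypothetical population of the values 1 through 100 (not data from the text); the helper name `sampling_error` is also hypothetical. It shows that the error is generally nonzero for a sample and exactly zero for a census.

```python
import random
import statistics

# Hypothetical population of N = 100 elements with values 1, 2, ..., 100
# (illustration only; not data from the text).
population = list(range(1, 101))
mu = statistics.mean(population)  # the population parameter (50.5)


def sampling_error(observed, parameter):
    """Magnitude of the difference between the point estimate (the mean of
    the observed elements) and the population parameter."""
    return abs(statistics.mean(observed) - parameter)


rng = random.Random(7)               # fixed seed for a reproducible draw
sample = rng.sample(population, 10)  # simple random sample of n = 10

print("sampling error, n = 10:", sampling_error(sample, mu))
# A census observes every element, so the estimate equals the parameter.
print("sampling error, census:", sampling_error(population, mu))
```

Changing the seed changes the sampling error of the sample, while the census error stays at zero.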
Survey Errors • Nonsampling Error It can occur in both a census and a sample survey. Examples include: • Measurement error • Errors due to nonresponse • Errors due to lack of respondent knowledge • Selection error • Processing error
Survey Errors • Nonsampling Error • Measurement Error Measuring instruments are not properly calibrated. People taking the measurements are not properly trained.
Survey Errors • Nonsampling Error • Errors Due to Nonresponse They occur when no data can be obtained, or only partial data are obtained, for some of the units surveyed. The problem is most serious when a bias is created.
Survey Errors • Nonsampling Error • Errors Due to Lack of Respondent Knowledge These errors are common in technical surveys. Some respondents might be more capable than others of answering technical questions.
Survey Errors • Nonsampling Error • Selection Error An inappropriate item is included in the survey. For example, in a survey of “small truck owners” some interviewers include SUV owners while other interviewers do not.
Survey Errors • Nonsampling Error • Processing Error Data are incorrectly recorded. Data are incorrectly transferred from recording forms to computer files.
Simple Random Sampling A simple random sample of size n from a finite population of size N is a sample selected such that every possible sample of size n has the same probability of being selected. We begin by developing a frame, or list, of all elements in the population. Then a selection procedure, based on the use of random numbers, is used to ensure that each element in the sampled population has the same probability of being selected.
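The selection procedure can be sketched with Python's standard `random` module, which draws without replacement so that every subset of size n is equally likely. This is a minimal sketch: the frame of 200 IDs and the helper name `simple_random_sample` are hypothetical, and a pseudorandom generator stands in for a table of random numbers.

```python
import math
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame so that every possible
    sample of size n is equally likely (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical frame of N = 200 element IDs (illustration only).
frame = [f"unit-{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 40, seed=2024)

# There are C(N, n) possible samples; each has probability 1 / C(N, n).
num_possible = math.comb(len(frame), len(sample))
print(len(sample), "elements selected;", num_possible, "possible samples")
```

`math.comb` requires Python 3.8 or later.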
Simple Random Sampling We will see in the upcoming slides how to: • Estimate the following population parameters: • Population mean • Population total • Population proportion • Determine the appropriate sample size
Simple Random Sampling In a sample survey it is common practice to provide an approximate 95% confidence interval estimate of the population parameter. Assuming the sampling distribution of the point estimator can be approximated by a normal probability distribution, we use a value of t = 2 (close to the normal-distribution value of 1.96) for a 95% confidence interval.
Simple Random Sampling • The interval estimate is: Point Estimator +/- 2(Estimate of the Standard Error of the Point Estimator) • The bound on the sampling error is: 2(Estimate of the Standard Error of the Point Estimator)
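Simple Random Sampling • Population Mean • Point Estimator: \(\bar{x}\) • Estimate of the Standard Error of the Mean: \(s_{\bar{x}} = \sqrt{\frac{N-n}{N}}\left(\frac{s}{\sqrt{n}}\right)\)
Simple Random Sampling • Population Mean • Interval Estimate • Approximate 95% Confidence Interval Estimate: \(\bar{x} \pm 2s_{\bar{x}}\)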
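The estimator for a population mean under simple random sampling can be sketched as a small function: the sample mean is the point estimate, the standard error includes the finite population correction \(\sqrt{(N-n)/N}\), and the bound is twice the standard error. The function name `estimate_mean` and the inputs in the usage line are hypothetical.

```python
import math


def estimate_mean(xbar, s, n, N):
    """Return the point estimate, estimated standard error and approximate
    95% confidence interval for a population mean under simple random
    sampling from a finite population of size N."""
    # Finite population correction sqrt((N - n) / N) times s / sqrt(n).
    se = math.sqrt((N - n) / N) * (s / math.sqrt(n))
    bound = 2 * se  # bound on the sampling error
    return xbar, se, (xbar - bound, xbar + bound)


# Hypothetical inputs: N = 100, n = 25, sample mean 50, sample std dev 10.
point, se, interval = estimate_mean(xbar=50, s=10, n=25, N=100)
print(point, round(se, 4), [round(v, 3) for v in interval])
```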
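Simple Random Sampling • Population Total • Point Estimator: \(\hat{X} = N\bar{x}\) • Estimate of the Standard Error of the Total: \(s_{\hat{X}} = Ns_{\bar{x}} = N\sqrt{\frac{N-n}{N}}\left(\frac{s}{\sqrt{n}}\right)\)
Simple Random Sampling • Population Total • Interval Estimate • Approximate 95% Confidence Interval Estimate: \(\hat{X} \pm 2s_{\hat{X}}\)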
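Because the total estimator is N times the mean estimator, its standard error is N times the standard error of the mean. A minimal sketch, with the hypothetical function name `estimate_total` and hypothetical inputs:

```python
import math


def estimate_total(xbar, s, n, N):
    """Return the point estimate N * xbar of a population total, its
    estimated standard error N * s_xbar, and an approximate 95% interval."""
    s_xbar = math.sqrt((N - n) / N) * (s / math.sqrt(n))  # SE of the mean
    total = N * xbar
    se_total = N * s_xbar
    return total, se_total, (total - 2 * se_total, total + 2 * se_total)


# Hypothetical inputs: N = 100, n = 25, sample mean 50, sample std dev 10.
total, se_total, interval = estimate_total(xbar=50, s=10, n=25, N=100)
print(total, round(se_total, 2), [round(v, 1) for v in interval])
```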
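Simple Random Sampling • Population Proportion • Point Estimator: \(\bar{p}\) • Estimate of the Standard Error of the Proportion: \(s_{\bar{p}} = \sqrt{\frac{N-n}{N}}\sqrt{\frac{\bar{p}(1-\bar{p})}{n-1}}\)
Simple Random Sampling • Population Proportion • Interval Estimate • Approximate 95% Confidence Interval Estimate: \(\bar{p} \pm 2s_{\bar{p}}\)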
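The proportion estimator follows the same pattern, with \(\bar{p}(1-\bar{p})/(n-1)\) in place of \(s^2/n\). A minimal sketch; the function name `estimate_proportion` and the inputs are hypothetical.

```python
import math


def estimate_proportion(pbar, n, N):
    """Return the point estimate, estimated standard error and approximate
    95% interval for a population proportion under simple random sampling."""
    se = math.sqrt((N - n) / N) * math.sqrt(pbar * (1 - pbar) / (n - 1))
    return pbar, se, (pbar - 2 * se, pbar + 2 * se)


# Hypothetical inputs: N = 500, n = 100, sample proportion 0.40.
pbar, se, interval = estimate_proportion(pbar=0.40, n=100, N=500)
print(pbar, round(se, 4), [round(v, 4) for v in interval])
```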
Determining the Sample Size An important consideration is the choice of sample size. The best choice usually involves a tradeoff between cost and precision (the width of the confidence interval). Larger samples provide greater precision, but are more costly. A budget might dictate how large the sample can be. A specified level of precision might dictate how small a sample can be.
Simple Random Sampling Smaller confidence intervals provide more precision. The size of the approximate confidence interval depends on the bound B on the sampling error. Choosing a level of precision amounts to choosing a value for B. Given a desired level of precision, we can solve for the value of n.
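Simple Random Sampling • Necessary Sample Size for Estimating the Population Mean • Set the bound on the sampling error equal to B: \(B = 2\sqrt{\frac{N-n}{N}}\left(\frac{\sigma}{\sqrt{n}}\right)\), where \(\sigma\) is a planning value for the population standard deviation. Hence, \(n = \frac{N\sigma^2}{ND + \sigma^2}\), where \(D = \frac{B^2}{4}\).
Simple Random Sampling • Necessary Sample Size for Estimating the Population Total • \(n = \frac{N\sigma^2}{ND + \sigma^2}\), where \(D = \frac{B^2}{4N^2}\)
Simple Random Sampling • Necessary Sample Size for Estimating the Population Proportion • \(n = \frac{Np(1-p)}{ND + p(1-p)}\), where \(D = \frac{B^2}{4}\) and p is a planning value for the population proportion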
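Solving the bound for n gives \(n = N\sigma^2/(ND + \sigma^2)\) with \(D = B^2/4\); rounding up gives the smallest whole sample that meets the bound. A minimal sketch, with the hypothetical function name `sample_size_mean` and hypothetical planning inputs:

```python
import math


def sample_size_mean(N, sigma, B):
    """Smallest n whose approximate 95% bound on the error in estimating
    the population mean is at most B: n = N*sigma^2 / (N*D + sigma^2),
    with D = B^2 / 4, rounded up."""
    D = B ** 2 / 4
    return math.ceil(N * sigma ** 2 / (N * D + sigma ** 2))


# Hypothetical planning inputs: N = 1000, sigma = 10, desired bound B = 2.
print(sample_size_mean(N=1000, sigma=10, B=2))  # 91
```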
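For a population total the formula keeps the same form, but D becomes \(B^2/(4N^2)\). A minimal sketch, with the hypothetical function name `sample_size_total` and hypothetical planning inputs:

```python
import math


def sample_size_total(N, sigma, B):
    """n for estimating a population total with approximate 95% bound B:
    same form as for the mean, but with D = B^2 / (4 N^2)."""
    D = B ** 2 / (4 * N ** 2)
    return math.ceil(N * sigma ** 2 / (N * D + sigma ** 2))


# Hypothetical planning inputs: N = 1000, sigma = 10, bound on the total B = 2000.
print(sample_size_total(N=1000, sigma=10, B=2000))  # 91
```

A bound of 2,000 on the total corresponds to a bound of 2,000 / N = 2 on the mean, so the required n is the same as in the mean case with B = 2.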
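For a proportion, \(p(1-p)\) takes the place of \(\sigma^2\). A minimal sketch, with the hypothetical function name `sample_size_proportion` and hypothetical planning inputs:

```python
import math


def sample_size_proportion(N, p, B):
    """n for estimating a population proportion with approximate 95% bound B,
    given a planning value p for the proportion; D = B^2 / 4."""
    D = B ** 2 / 4
    return math.ceil(N * p * (1 - p) / (N * D + p * (1 - p)))


# Hypothetical planning inputs: N = 500, p = 0.5, bound B = 0.10.
print(sample_size_proportion(N=500, p=0.5, B=0.10))  # 84
```

Because \(p(1-p)\) is largest at p = 0.5, that planning value gives the most conservative (largest) sample size when no prior estimate of p is available.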
Simple Random Sampling • Example: Steddy Investments Ben Steddy is a financial advisor for 200 clients. A sample of 40 clients has been taken to obtain various demographic data and information about the clients’ investment objectives. Statistics of particular interest are the clients’ net worth and the proportion favoring fixed-income investments.
Simple Random Sampling • Example: Steddy Investments For the sample, the mean net worth was $480,000 (with a standard deviation of $120,000), and the proportion favoring fixed-income investments was .30.
Simple Random Sampling • Point Estimate of Total Net Worth (TNW): \(\hat{X} = N\bar{x} = 200(480) = 96{,}000\) thousand = $96,000,000 • Estimate of Standard Error of TNW: \(s_{\hat{X}} = N\sqrt{\frac{N-n}{N}}\left(\frac{s}{\sqrt{n}}\right) = 200\sqrt{\frac{200-40}{200}}\left(\frac{120{,}000}{\sqrt{40}}\right)\) = $3,394,113 • Approximate 95% Confidence Interval for TNW: 96,000,000 ± 2(3,394,113) = $89,211,774 to $102,788,226
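The total net worth arithmetic can be checked in a few lines, using only the example's figures (N = 200, n = 40, mean $480,000, standard deviation $120,000). As a sketch of how the slide's numbers arise, the standard error is rounded to the nearest dollar before forming the interval, which reproduces the stated limits.

```python
import math

N, n = 200, 40               # clients in the population and in the sample
xbar, s = 480_000, 120_000   # sample mean and std dev of net worth, in dollars

total = N * xbar  # point estimate of total net worth
# Estimated standard error of the total, rounded to the nearest dollar
# as in the example before forming the interval.
se_total = round(N * math.sqrt((N - n) / N) * (s / math.sqrt(n)))
low, high = total - 2 * se_total, total + 2 * se_total
print(f"TNW ${total:,}; SE ${se_total:,}; 95% CI ${low:,} to ${high:,}")
```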
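Simple Random Sampling • Point Estimate of Population Proportion Favoring Fixed-Income Investments: \(\bar{p} = .30\) • Estimate of Standard Error of Proportion: \(s_{\bar{p}} = \sqrt{\frac{200-40}{200}}\sqrt{\frac{.30(.70)}{40-1}} \approx .0656\) • Approximate 95% Confidence Interval: \(.30 \pm 2(.0656) = .30 \pm .1312\), or .1688 to .4312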
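The proportion calculation for the Steddy sample can be reproduced the same way. This sketch rounds the standard error to four decimals before forming the interval; without that rounding, the interval is about .1687 to .4313.

```python
import math

N, n, pbar = 200, 40, 0.30  # clients, sample size, sample proportion

se = math.sqrt((N - n) / N) * math.sqrt(pbar * (1 - pbar) / (n - 1))
se_rounded = round(se, 4)  # standard error rounded to four decimals
low, high = pbar - 2 * se_rounded, pbar + 2 * se_rounded
print(se_rounded, round(low, 4), round(high, 4))
```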
Simple Random Sampling • Example: Steddy Investments One year later Steddy wants to again survey his clients. He now has 250 clients and wants to set a bound of $30,000 on the error of the estimate of their mean net worth. What is the necessary sample size?
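Simple Random Sampling • Necessary Sample Size • Using the previous survey's standard deviation, s = 120 (thousands of dollars), as the planning value for \(\sigma\), with N = 250 and B = 30: \(D = \frac{B^2}{4} = 225\) and \(n = \frac{250(120)^2}{250(225) + (120)^2} = \frac{3{,}600{,}000}{70{,}650} \approx 50.96\) • Steddy will need a sample size of 51.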
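The sample-size arithmetic can be checked directly. This sketch assumes, as the stated answer of 51 implies, that last year's sample standard deviation ($120,000) serves as the planning value for \(\sigma\); amounts are in thousands of dollars.

```python
import math

N = 250      # clients at the time of the new survey
sigma = 120  # planning value for sigma, in $ thousands (last year's s)
B = 30       # desired bound on the error of the mean, in $ thousands

D = B ** 2 / 4
n_exact = N * sigma ** 2 / (N * D + sigma ** 2)
n = math.ceil(n_exact)  # round up so the bound is met
print(round(n_exact, 2), n)  # 50.96 51
```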