Chapter 1 Sampling and Data
What Is (Are?) Statistics? Statistics (a discipline) is the science of dealing with data. It consists of tools and methods to collect data, organize data, and interpret the information or draw conclusions from data. Note: Statistics (plural) sometimes refers to particular calculations made from data. For instance, the mean, median, and percentage are statistics, since these are numbers calculated from a set of sample data.
Basic Terms • Population: A collection, or set, of individuals or objects or events whose properties are to be analyzed. • Sample: A subset of the population. • Parameter: A numerical value summarizing all the data of an entire population, for instance, a population mean. • Statistic: A numerical value summarizing the sample data, for instance, a sample mean.
Two Areas of Statistics • Descriptive Statistics: collection, presentation, and description of sample data. • Inferential Statistics: making decisions and drawing conclusions about populations based on sample data.
What is a Variable? • Variables are characteristics recorded about each individual or thing. • Each variable should have a name that identifies what has been measured.
What is an Observational Unit? The person or thing on which a variable is observed or measured, such as a student in a class, is called the observational/experimental unit, or simply a case.
What Are Data? • Data can be numbers, record names, or other labels recorded for the observational unit. • Not all data represented by numbers are numerical data (e.g., 1=male, 2=female where 1 and 2 are the indicators of gender).
Data Tables • The following data table clearly shows the context of the data presented: • Notice that this data table tells us the variables (columns) and observational units (rows) for these data.
What is Statistics Really About? Statistics is about variation. Different observational units may have different data values for a variable. Statistics helps us to deal with variation in order to make sense of data.
Two Kinds of Variables • Qualitative, or Attribute, or Categorical, Variable: A variable that identifies a category for each case, for example, gender. Note: Arithmetic operations, such as addition and averaging, are not meaningful for data resulting from a qualitative variable. • Quantitative, or Numerical, Variable: A variable that records measurements or amounts of something and must have measuring units, for example, height measured in inches. Note: Arithmetic operations, such as addition and averaging, are meaningful for data resulting from a quantitative variable.
Subdividing Variables Further • Qualitative and quantitative variables may be further subdivided:
Variable
  Qualitative: Nominal, Ordinal
  Quantitative: Discrete, Continuous
Key Definitions • Nominal Variable: A qualitative variable that categorizes (or describes, or names) an element of a population, for example, the color of a car purchased. • Ordinal Variable: A qualitative variable that incorporates an ordered position, or ranking, for instance, the variable Age recorded with three possible categories: young, middle, and old. • Discrete Variable: A quantitative variable that can assume a countable number of values. That is, the values are counts, for example, the number of cars owned. So, a discrete variable can assume values corresponding to isolated points, such as integers, along a number line. • Continuous Variable: A quantitative variable whose values are measurements, such as height or weight. The precision of the values recorded depends on the measuring scale used. For example, a recorded weight of 120 lb may actually be 120.1 lb, 120.14 lb, or 120.143 lb if a more precise scale is used. Therefore, a continuous variable can assume any value in an interval along a number line, including every possible value between any two values.
Important Reminders! • In many cases, discrete and continuous variables can be distinguished by determining whether the variable is related to a count or a measurement. • Discrete variables are usually associated with counting. • Continuous variables are usually associated with measurements.
Example • Example: In a student evaluation of instruction at a large university, one question asks students to evaluate the statement “The instructor was generally interested in teaching” on the following scale: 1 = Disagree Strongly; 2 = Disagree; 3 = Neutral; 4 = Agree; 5 = Agree Strongly. • Question: Is interest in teaching categorical or quantitative?
Example (cont.) • Question: Is interest in teaching categorical or quantitative? • There is an order to these ratings, but adding or subtracting two ratings has no meaning. • We conclude that variables like interest in teaching are categorical and, more specifically, ordinal. Just because your variable’s values are numbers, don’t assume that it’s quantitative.
Data Collection • The first problem a statistician faces: how to obtain the data. • Usually the data are sample data collected from a portion of the population. It is important to obtain good, representative sample data. • Statistical inferences about the population are made based on statistics obtained from the sample data collected.
Biased Sampling Biased Sampling Method: A sampling method that produces data which systematically differ from the sampled population. An unbiased sampling method is one that is not biased. Sampling methods that often result in biased samples: • Convenience sample: a sample selected from elements of a population that are easily accessible • Volunteer sample: a sample collected from those elements of the population which chose to contribute the needed information on their own initiative
Process of Data Collection 1. Define the objectives of the survey or experiment • Example: Estimate the average length of time for anesthesia to wear off 2. Define the variable and population of interest • Example: Length of time for anesthesia to wear off after surgery 3. Define the data-collection and data-measuring schemes. This includes sampling procedures, sample size, and the data-measuring device (questionnaire, scale, ruler, etc.) 4. Determine the appropriate descriptive or inferential data-analysis techniques
Methods Used to Collect Data Data can be collected by performing an experiment, a survey, or a census: Experiment: The investigator controls or modifies the environment and observes the effect on the variable under study. Survey: Data are obtained by sampling some of the population of interest. The investigator does not modify the environment. Census: A 100% survey. Every element of the population is listed. Seldom used: difficult and time-consuming to compile, and expensive.
Sampling Frame: A list of the elements belonging to the population from which the sample will be drawn Note: It is important that the sampling frame be representative of the population Sample Design: The process of selecting sample elements from the sampling frame Note: There are many different types of sample designs. Usually they all fit into two categories: judgment samples and probability samples.
Two Types of Sample Designs Judgment Samples: Samples that are selected on the basis of being “typical” • Items are selected that the collector judges to be representative of the population. The validity of the results from a judgment sample reflects the soundness of the collector’s judgment. Probability Samples: Samples in which the elements to be selected are drawn on the basis of probability. Each element in a population has a certain probability of being selected as part of the sample.
Probability Sampling Probability sampling includes random sampling, systematic sampling, stratified sampling, proportional sampling, and cluster sampling.
Random Sampling Random Sample: A sample selected in such a way that every element in the population has an equal probability of being chosen. More precisely, in a simple random sample all samples of size n have an equal chance of being selected. Random samples are obtained either by sampling with replacement from a finite population or by sampling without replacement from an infinite population. Notes: • Inherent in the concept of randomness: the next result (or occurrence) is not predictable • Proper procedure for selecting a random sample: use a random number generator or a table of random numbers
Example • Example: An employer is interested in the time it takes each employee to commute to work each morning. A random sample of 35 employees will be selected and their commuting times will be recorded. 1. There are 2712 employees 2. Each employee is numbered: 0001, 0002, 0003, etc., up to 2712 3. Using four-digit random numbers, a sample is identified: 1315, 0987, 1125, etc.
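The numbering procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample` and the `seed` parameter are assumptions made for the sketch, and it draws without replacement so that no employee is picked twice, which is a choice made here rather than something the slide specifies.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick employee numbers at random, without repeats.

    Hypothetical helper: numbers run from 1 to population_size and are
    returned as four-digit labels such as '0987'.
    """
    rng = random.Random(seed)  # seed only makes the draw repeatable
    ids = range(1, population_size + 1)
    chosen = rng.sample(ids, sample_size)  # every ID equally likely
    return [f"{i:04d}" for i in sorted(chosen)]

sample = simple_random_sample(2712, 35, seed=1)
print(sample)
```

A random number generator replaces the table of random numbers here; either approach serves the same purpose.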
Systematic Sampling Systematic Sample: A sample in which every kth item of the sampling frame is selected, starting from a first element that is randomly selected from the first k elements. Note: The systematic technique is easy to execute. However, it has some inherent dangers when the sampling frame is repetitive or cyclical in nature. In these situations the results may not approximate a simple random sample.
Example Suppose you want to obtain a systematic sample of 8 houses from a street of 120 houses. • First, since 120/8 = 15, choose a random starting point between 1 and 15. Let’s say, 11. • Then, choose every 15th house after the 11th house. The houses selected are 11, 26, 41, 56, 71, 86, 101, and 116.
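The house example can be reproduced with a short Python sketch. The function name `systematic_sample` and its parameters are hypothetical, and the sketch assumes the frame size is an exact multiple of the sample size, as it is for 120 houses and a sample of 8.

```python
import random

def systematic_sample(frame_size, sample_size, start=None, seed=None):
    """Select every k-th item, where k = frame_size // sample_size.

    Assumes frame_size is an exact multiple of sample_size. If no start
    is given, one is drawn at random from the first k positions.
    """
    k = frame_size // sample_size
    if start is None:
        start = random.Random(seed).randint(1, k)  # 1..k inclusive
    return [start + i * k for i in range(sample_size)]

houses = systematic_sample(120, 8, start=11)
print(houses)  # [11, 26, 41, 56, 71, 86, 101, 116]
```

Passing `start=11` fixes the starting point used in the example; leaving it out draws a new random start each time.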
Stratified Sampling Stratified Random Sample: A sample obtained by stratifying or grouping the sampling frame and then selecting a fixed number of items from each of the strata/groups by means of a simple random sampling technique.
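As a rough sketch of this definition, the Python below draws the same fixed number of items from every stratum. The function name `stratified_sample` and the class-year strata with student labels are hypothetical, invented purely for illustration.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of per_stratum items from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical sampling frame grouped by class year (made-up labels).
strata = {
    "freshman": ["F1", "F2", "F3", "F4", "F5"],
    "sophomore": ["S1", "S2", "S3", "S4"],
    "junior": ["J1", "J2", "J3"],
}
picked = stratified_sample(strata, per_stratum=2, seed=0)
print(picked)
```

Every stratum contributes to the sample, which is the defining feature of this design.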
Proportional Sampling Proportional Sample (or Quota Sample): A sample obtained by stratifying the sampling frame and then selecting a number of items in proportion to the size of each stratum (or by quota) from each stratum by means of a simple random sampling technique.
Example Suppose that a company has 180 staff, made up of:
• male, full time: 90
• male, part time: 18
• female, full time: 9
• female, part time: 63
We are asked to take a proportional sample of 40 staff, stratified according to the above categories. The first step is to calculate the percentage of staff in each group:
% male, full time = (90/180) x 100 = 0.5 x 100 = 50%
% male, part time = (18/180) x 100 = 0.1 x 100 = 10%
% female, full time = (9/180) x 100 = 0.05 x 100 = 5%
% female, part time = (63/180) x 100 = 0.35 x 100 = 35%
This tells us that of our sample of 40, 50% should be male, full time; 10% should be male, part time; 5% should be female, full time; and 35% should be female, part time. Therefore: 50% of 40 is 20, 10% of 40 is 4, 5% of 40 is 2, and 35% of 40 is 14. We need to select 20 full-time males, 4 part-time males, 2 full-time females, and 14 part-time females.
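The allocation arithmetic in this example can be checked with a small Python sketch. The function name `proportional_allocation` is an assumption for illustration, and the group counts are the ones used in the percentage calculations (90, 18, 9, and 63 out of 180).

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate sample_size across strata in proportion to stratum size.

    Rounds each share to the nearest whole number; in general the rounded
    shares may not add up exactly to sample_size, but here they do.
    """
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

staff = {
    "male, full time": 90,
    "male, part time": 18,
    "female, full time": 9,
    "female, part time": 63,
}
allocation = proportional_allocation(staff, 40)
print(allocation)
```

Each group's members would then be chosen by simple random sampling within that group.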
Cluster Sampling Cluster Sample: A sample obtained by first dividing the sampling frame into clusters and then randomly selecting some of the clusters. The sample then includes either all elements, or a simple random sample of some of the elements, in each of the clusters selected. Note: The difference between stratified and cluster sampling: all strata are represented in the sample, but only a subset of the clusters is in the sample.
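A minimal Python sketch of the first variant (keep every element of each selected cluster) is shown below. The function name `cluster_sample` and the city-block clusters with household numbers are hypothetical, made up only to show the mechanics.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and keep all of their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # pick cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: households grouped by city block (made-up numbers).
blocks = {
    "block A": [1, 2, 3],
    "block B": [4, 5],
    "block C": [6, 7, 8, 9],
    "block D": [10],
}
selected = cluster_sample(blocks, n_clusters=2, seed=0)
print(selected)
```

Unlike a stratified sample, blocks that are not chosen contribute nothing to the sample.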
Guidelines for Planning a Statistical Study • Determine the variables and methods of measuring. • Identify the individuals or objects involved. • Decide whether to collect data from an entire population or a sample. If using a sample, decide on a sampling method. • Address issues of ethics, privacy, and confidentiality in planning for data collection. • Collect data. • Apply descriptive statistics methods (Chapters 1, 2, 3) and draw conclusions from the data collected using appropriate inferential statistics methods (Chapters 9, 10, 11). • Provide discussion and recommendations for future studies.
Probability & Statistics • Probability is the science of making statements about what will occur when samples are drawn from a known population. • Statistics is the science of organizing sample data and making inferences about the unknown population from which the sample is drawn. Probability (Chapters 4, 5, 6, 7, 8) is a vehicle of statistics: it lets us justify the accuracy of statistical inferences from sample data to a population in terms of their chance of occurring. That is, we want to know the chance that a similar result will occur if the study is repeated many more times.
Comparison of Probability & Statistics Probability: Properties of the population are assumed known. Answer questions about the sample based on these properties. Statistics: Use information in the sample to draw a conclusion about the population.
Example • Example: A jar of M&M’s contains 100 candy pieces, 15 are red. A handful of 10 is selected. Probability question: What is the probability that 3 of the 10 selected are red? • Example: A handful of 10 M&M’s is selected from a jar containing 1000 candy pieces. Three M&M’s in the handful are red. Statistics question: What is the proportion of red M&M’s in the entire jar?
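To make the contrast concrete, the Python sketch below works through both questions. It assumes the handful is a simple random sample drawn without replacement, so the probability question follows the hypergeometric formula; the function name `prob_red` is hypothetical, and the sample proportion 3/10 is only one natural point estimate for the statistics question.

```python
from math import comb

def prob_red(k, draws=10, reds=15, total=100):
    """Chance of exactly k red pieces in a handful drawn without replacement.

    Counts the handfuls with k reds and (draws - k) non-reds, divided by
    the number of all possible handfuls.
    """
    return comb(reds, k) * comb(total - reds, draws - k) / comb(total, draws)

# Probability question: population known (15 red out of 100).
p3 = prob_red(3)
print(round(p3, 4))

# Statistics question: population unknown, so estimate from the sample.
estimate = 3 / 10  # sample proportion of red in the handful
print(estimate)
```

The first calculation reasons from the known jar to the handful; the second reasons from the handful back to the unknown jar.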