Class 4: Types of Data and Data Collection • BBA 151 (บธบ 151) Statistics and Business Research Methodology • Semester 1, Academic Year 2556 BE (2013)
Learning Objectives • Know the difference between primary and secondary data and their sources • Know the advantages and disadvantages of each data collection and sampling method • Design questionnaires to collect different variables • Evaluate questionnaires
Scales of Measurement • Nominal: classification • Ordinal: ranking • Interval: equal intervals • Ratio: absolute zero
Nominal Measurement • Nominal: observations are put into categories based on some criterion • Classifies; categorizes • Dichotomous variable: has two values, e.g. male/female, yes/no • Multichotomous variable: has more than two values, e.g. ethnicity, marital status • No numerical value (even when observations are coded as numbers) • Permissible arithmetic operations: counting
Ordinal Measurement • Ordinal: a basic form of quantitative measurement that indicates a numerical order; the intervals between adjacent scale values are undetermined or unequal. • Examples: team/individual standing, socioeconomic status, level of education, Likert scales, any type of rating or ranking • Permissible arithmetic operations: greater than/less than
Interval Measurement • Interval: intervals between adjacent scale values are equal; the scale has an arbitrary zero • Hint: if a score can go below zero, or if no true zero exists, the measurement is interval. • Examples: Celsius and Fahrenheit temperature scales, IQ scores, most psychological measures • Permissible arithmetic operations: addition and subtraction (differences between values are meaningful); multiplying or dividing scale values is not meaningful, so ratio statements cannot be made
Ratio Measurement • Ratio: a measurement scale that has equal units of measurement and a true zero point (absolute zero) • Hint: an absolute zero indicates a complete absence of the attribute being measured. • Examples: Kelvin temperature scale, income in dollars, length, area or volume, height, weight • Permissible arithmetic operations: any, including ratios
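The difference between interval and ratio scales can be illustrated with the temperature examples above. The following is a minimal sketch; the two temperatures are hypothetical values chosen only for illustration.

```python
# Sketch: why ratio statements fail on an interval scale (Celsius)
# but work on a ratio scale (Kelvin). Temperatures are hypothetical.

def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvins."""
    return c + 273.15

c_low, c_high = 10.0, 20.0

# Differences are meaningful on both scales and agree with each other.
difference_c = c_high - c_low
difference_k = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# The Celsius "ratio" depends on the arbitrary zero, so it is not meaningful.
naive_ratio_c = c_high / c_low

# The Kelvin ratio is meaningful because 0 K is an absolute zero.
ratio_k = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

print(f"difference: {difference_c:.1f} C, {difference_k:.1f} K")
print(f"naive Celsius ratio: {naive_ratio_c:.2f}")
print(f"Kelvin ratio: {ratio_k:.3f}")
```

20 °C is not "twice as hot" as 10 °C: on the Kelvin scale the ratio is only about 1.035.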
Where do data come from? • Secondary data • data someone else has collected • Primary data • data you collect
Secondary Data • Data gathered by another source (e.g. research study, survey, interview) • Secondary data is gathered BEFORE primary data. WHY? • Because you want to find out what is already known about a subject before you dive into your own investigation. WHY? • Because some of your questions may already have been answered by other investigators or authors. Why “reinvent the wheel”?
Primary Data • Data never gathered before • Advantage: find data you need to suit your purpose • Disadvantage: usually more costly and time consuming than collecting secondary data • Collected after secondary data is collected
Methods of Collecting Data… • There are many methods used to collect or obtain data. Popular methods are: • Direct Observation • Interviews • Experiments • Surveys
Observations • Observing behaviors in their settings is one of the most direct ways to collect data. • Observation can range from complete participant observation, where a researcher becomes a member of the group under study, to more detached observation, where the researcher casually observes and notes occurrences of specific kinds of behavior.
Advantages of Observation • Observations are free of the biases inherent in self-report data. • They put a researcher directly in touch with the behaviors in question. • They involve real-time data, describing behavior occurring in the present rather than the past. • They are adaptive in that they can be modified depending on what is being observed.
Problems with Observation • Difficulties interpreting the meaning underlying the observations. • Observers must decide which people to observe; choose time periods, territory and events • Failure to attend to these sampling issues can result in a biased sample of data.
Interviews • They permit the interviewer to ask the respondent direct questions. • Further probing and clarification is possible as the interview proceeds. • This flexibility is invaluable for gaining private views and feelings about the organization and exploring new issues that emerge during the interview. • Interviews may be highly structured, resembling questionnaires, or highly unstructured, starting with general questions that allow the respondent to lead the way. • Interviews are usually conducted one-to-one but can be carried out in a group.
Drawbacks of Interviews • They can consume a great deal of time if interviewers take full advantage of the opportunity to hear respondents out and change their questions accordingly. • Personal biases can also distort the data. • The nature of the question and the interactions between the interviewer and the respondent may discourage or encourage certain kinds of responses. • It takes considerable skill to gather valid data.
Surveys… • A survey solicits information from people; e.g. Gallup polls; pre-election polls; marketing surveys. • The Response Rate (i.e. the proportion of all people selected who complete the survey) is a key survey parameter. • Surveys may be administered in a variety of ways, e.g. • Personal Interview, • Telephone Interview, • Self Administered Questionnaire, and • Internet
Questionnaire Design… • Over the years, a lot of thought has been put into the science of designing survey questions. Key design principles: • Keep the questionnaire as short as possible. • Ask short, simple, and clearly worded questions. • Start with demographic questions to help respondents get started comfortably. • Use dichotomous (yes/no) and multiple-choice questions. • Use open-ended questions cautiously. • Avoid leading questions. • Pretest a questionnaire on a small number of people. • Think about the way you intend to use the collected data when preparing the questionnaire.
Experimentation • Experimentation explores cause and effect relationships by manipulating independent variables in order to see if there is a corresponding effect on a dependent variable
Experimentation • Pure experimentation requires both a controlled environment and the use of a randomly assigned control group • This can be difficult to achieve in human centred experiments conducted in the real-world
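Random assignment to a control group can be sketched in a few lines. This is only an illustration: the participant labels, the group sizes, and the helper name `randomly_assign` are hypothetical, not part of the course material.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control halves."""
    rng = random.Random(seed)       # a seed makes the assignment reproducible
    shuffled = list(participants)   # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs P01..P20
participants = [f"P{i:02d}" for i in range(1, 21)]
treatment, control = randomly_assign(participants, seed=42)

print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who lands in which group, systematic differences between the groups before the manipulation are unlikely.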
Real-World Experiments • There are many experiments that can only be carried out in the messy uncontrolled environments of the real-world, so the search for cause and effect will require tradeoffs between real-world contexts and a controlled environment
Sampling… • Recall that statistical inference permits us to draw conclusions about a population based on a sample. • Sampling (i.e. selecting a sub-set of a whole population) is often done for reasons of cost (it’s less expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g. performing a crash test on every automobile produced is impractical). • In any case, the sampled population and the target population should be similar to one another.
Classification of Sampling Methods • Probability samples: Simple Random, Systematic, Stratified, Cluster • Non-probability samples: Convenience, Snowball, Quota, Judgment
Simple Random Sampling… • A government income tax auditor must choose a sample of 5 of 11 returns to audit…[Can do many different ways]
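One of the "many different ways" to draw the auditor's sample is to let a random number generator pick 5 of the 11 returns without replacement. A minimal sketch, assuming the returns are simply numbered 1 to 11 (the numbering and the seed are illustrative assumptions):

```python
import math
import random

# Assumption: the 11 tax returns are labeled 1..11
returns = list(range(1, 12))

# Number of distinct samples of size 5 that could be drawn from 11 returns
n_possible = math.comb(11, 5)

# Draw one simple random sample without replacement;
# the seed is arbitrary and only makes the draw reproducible
rng = random.Random(2013)
audit_sample = sorted(rng.sample(returns, k=5))

print("possible samples:", n_possible)
print("returns to audit:", audit_sample)
```

Under simple random sampling, each of the 462 possible samples is equally likely to be chosen.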
Simple random sampling • Advantages • Simple • Sampling error easily measured • Disadvantages • Need complete list of units • Units may be scattered and poorly accessible • In a heterogeneous population, important minorities might not be represented
Systematic Random Sampling… • Select sampling units at regular intervals (e.g. every 20th unit)
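A minimal sketch of systematic sampling follows, assuming a hypothetical list of 200 numbered units and an interval of 20; the helper name `systematic_sample` is illustrative. The random starting point is what makes the procedure a probability sample.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # random offset in 0..k-1
    return units[start::k]

# Hypothetical sampling frame of 200 units, interval k = 20
units = list(range(1, 201))
sample = systematic_sample(units, k=20, seed=1)

print(sample)
```

With 200 units and k = 20, the sample always contains 10 units spaced exactly 20 positions apart.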
Systematic sampling • Advantages • Ensures the sample is spread across the whole list • Easy to implement • Disadvantages • Need complete list of units • Periodicity: an underlying pattern in the list (characteristics occurring at regular intervals) may be a problem
Stratified Random Sampling… • A stratified random sample is obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum. • Strata 1: Gender (male, female) • Strata 2: Age (< 20, 20-30, 31-40, 41-50, 51-60, > 60) • Strata 3: Occupation (professional, clerical, blue collar, other) • We can acquire information about the total population, make inferences within a stratum, or make comparisons across strata.
Stratified Random Sampling… • After the population has been stratified, we can use simple random sampling to generate the complete sample: If we only have sufficient resources to sample 400 people total, we would draw 100 of them from the low income group… …if we are sampling 1000 people, we’d draw 50 of them from the high income group.
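The figures above imply proportional allocation: 100 of 400 suggests the low income group is 25% of the population, and 50 of 1000 suggests the high income group is 5%. The sketch below works under that reading; the shares of the two middle groups (and their names) are hypothetical placeholders chosen only so the shares sum to 1.

```python
def proportional_allocation(shares, total_n):
    """Allocate a total sample size across strata in proportion to their shares."""
    return {stratum: round(share * total_n) for stratum, share in shares.items()}

# Population shares by income stratum.
# "low" (25%) and "high" (5%) are implied by the slide's numbers;
# the two middle strata are HYPOTHETICAL values for illustration only.
shares = {
    "low": 0.25,
    "lower_middle": 0.40,   # hypothetical
    "upper_middle": 0.30,   # hypothetical
    "high": 0.05,
}

alloc_400 = proportional_allocation(shares, 400)
alloc_1000 = proportional_allocation(shares, 1000)

print(alloc_400)
print(alloc_1000)
```

With other shares, simple rounding may leave the allocations a unit or two off the intended total, so a real design would adjust the largest stratum to compensate.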
Stratified sampling • Advantages • Can acquire information about whole population and individual strata • Precision is increased if variability within strata is smaller (strata are homogeneous) than variability between strata • Disadvantages • Sampling error is difficult to measure • Different strata can be difficult to identify • Loss of precision if small numbers in individual strata (resolved by sampling proportional to stratum population)
Cluster Sampling… • A cluster sample is a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects). • This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically. Used more in the “old days”. • Cluster sampling may increase sampling error due to similarities among cluster members.
Cluster sampling • Advantages • Simple, as a complete list of sampling units within the population is not required • Less travel/resources required • Disadvantages • Members of the same cluster may be more alike than members of different clusters (within-cluster homogeneity) • This needs to be taken into account in the sample size and in the analysis (“design effect”)
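A common way to quantify the design effect is the approximation DEFF = 1 + (m − 1)ρ, where m is the average cluster size and ρ is the intracluster correlation. The sketch below applies it to inflate a simple-random-sample size; the values of m, ρ, and the starting sample size are hypothetical, and the function names are illustrative.

```python
import math

def design_effect(cluster_size, icc):
    """Approximate design effect: DEFF = 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def inflated_sample_size(n_srs, cluster_size, icc):
    """Sample size needed under cluster sampling to match an SRS of size n_srs."""
    n = n_srs * design_effect(cluster_size, icc)
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(n, 9))

# Hypothetical inputs: clusters of 20 people, intracluster correlation 0.05,
# and a simple random sample requirement of 400
deff = design_effect(cluster_size=20, icc=0.05)
n_cluster = inflated_sample_size(n_srs=400, cluster_size=20, icc=0.05)

print(f"design effect: {deff:.2f}")
print(f"required cluster-sample size: {n_cluster}")
```

Here the design effect is 1.95, so roughly 780 people are needed to match the precision of 400 people drawn by simple random sampling.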
Sampling and Non-Sampling Errors… • Two major types of error can arise when a sample of observations is taken from a population: • sampling error and nonsampling error. • Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample. It is random, and we have no control over it. • Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. They are most likely caused by poor planning, sloppy work, act of the Goddess of Statistics, etc.
Sampling Error… • Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample. • Increasing the sample size will reduce this type of error.
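How quickly sampling error shrinks can be seen from the standard error of a sample mean, σ/√n. A minimal sketch, assuming a hypothetical population standard deviation of 10:

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for population SD sigma and sample size n."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation

for n in (100, 400, 1600):
    print(f"n = {n:5d}  standard error = {standard_error(sigma, n):.2f}")
```

Quadrupling the sample size only halves the standard error, so each further gain in precision costs progressively more observations.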
Nonsampling Error… • Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. Three types of nonsampling errors: • Errors in data acquisition, • Nonresponse errors, and • Selection bias. • Note: increasing the sample size will not reduce this type of error.
Errors in Data Acquisition… • …arise from the recording of incorrect responses, due to: • incorrect measurements being taken because of faulty equipment, • mistakes made during transcription from primary sources, • inaccurate recording of data due to misinterpretation of terms, or • inaccurate responses to questions concerning sensitive issues.
Nonresponse Error… • …refers to error (or bias) introduced when responses are not obtained from some members of the sample, i.e. the sample observations that are collected may not be representative of the target population. • As mentioned earlier, the Response Rate (i.e. the proportion of all people selected who complete the survey) is a key survey parameter and helps in understanding the validity of the survey and the sources of nonresponse error.
Type 1 Error • Finding a difference between our sample and the population when there really isn’t one • Its probability is known as α (the type 1 error rate) • α is usually set at 5% (0.05)
Type 2 Error • Not finding a difference between our sample and the population when one actually exists • Its probability is known as β (the type 2 error rate) • Power is (1 − β) and is usually set at 80%
Sample Size • Quantitative • Qualitative
Problem 1 • A study is to be performed to determine a certain parameter in a community. A previous study obtained a standard deviation (SD) of 46. If a sampling error of up to 4 is to be accepted, how many subjects should be included in this study at the 99% level of confidence?
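One way to approach Problem 1 is the usual formula for estimating a mean, n = (z·σ/E)², rounded up. The sketch below assumes that formula is the intended method and uses an exact normal quantile rather than a rounded table value; the function name is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sd, margin, confidence):
    """n = (z * sd / margin)^2, rounded up, for estimating a single mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    return math.ceil((z * sd / margin) ** 2)

# Problem 1 inputs: SD = 46, acceptable error = 4, confidence = 99%
n = sample_size_for_mean(sd=46, margin=4, confidence=0.99)
print(n)  # 878
```

With the rounded table value z = 2.58 the same formula gives 881, so small differences from a hand calculation are expected.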
Problem 2 • A study is to be done to determine the effect of 2 drugs (A and B) on blood glucose level (BGL). From previous studies using those drugs, SDs of BGL of 8 and 12 mg/dl were obtained respectively. • A confidence level of 95% (significance level of 5%) and a power of 90% are required to detect a mean difference between the two groups of 3 mg/dl. How many subjects should be included in each group?
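A possible approach to Problem 2 is the standard formula for comparing two means with unequal standard deviations, n per group = (z₁₋α/₂ + z₁₋β)²(σ₁² + σ₂²)/d². The sketch assumes a two-sided test and equal group sizes, neither of which the problem states explicitly; the function name is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_two_means(sd1, sd2, difference, confidence, power):
    """Per-group n for detecting a mean difference (two-sided test, equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 * (sd1 ** 2 + sd2 ** 2) / difference ** 2
    return math.ceil(n)

# Problem 2 inputs: SDs of 8 and 12, mean difference of 3,
# 95% confidence (alpha = 0.05), 90% power
n_per_group = sample_size_two_means(sd1=8, sd2=12, difference=3,
                                    confidence=0.95, power=0.90)
print(n_per_group)  # 243
```

Under these assumptions about 243 subjects per group are needed.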
Problem 3 • It was desired to estimate the proportion of anaemic children in a certain preparatory school. In a similar study at another school a proportion of 30% was detected. Compute the minimal sample size required at a confidence level of 95%, accepting a difference of up to 4% from the true population proportion.
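Problem 3 can be approached with the usual formula for estimating a single proportion, n = z²·p(1 − p)/d², rounded up. The sketch below assumes that formula is intended and ignores any finite-population correction; the function name is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p, margin, confidence):
    """n = z^2 * p * (1 - p) / margin^2, rounded up, for one proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Problem 3 inputs: expected proportion 30%, acceptable error 4%, confidence 95%
n = sample_size_for_proportion(p=0.30, margin=0.04, confidence=0.95)
print(n)  # 505
```

The result is about 505 children; using p = 0.5 instead would give the most conservative (largest) sample size.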
Problem 4 • In previous studies, the percentage of hypertensives among diabetics was 70% and among non-diabetics was 40% in a certain community. A researcher wants to perform a comparative study of hypertension among diabetics and non-diabetics at a confidence level of 95% and power of 80%. What is the minimal sample to be taken from each group with 4% accepted difference of true value?
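One reading of Problem 4 treats it as a comparison of two proportions, using the simple unpooled formula n per group = (z₁₋α/₂ + z₁₋β)²[p₁(1 − p₁) + p₂(1 − p₂)]/(p₁ − p₂)². This is an assumption: the formula takes the difference between the two reported proportions (0.70 − 0.40) as the effect to detect and does not use the stated 4%, whose intended role is not clear from the problem. The function name is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, confidence, power):
    """Per-group n for comparing two proportions (two-sided test, unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)

# Problem 4 inputs: 70% vs 40% hypertensive, 95% confidence, 80% power.
# ASSUMPTION: the 4% figure in the problem is not used by this formula.
n_per_group = sample_size_two_proportions(p1=0.70, p2=0.40,
                                          confidence=0.95, power=0.80)
print(n_per_group)  # 40
```

Other textbook variants (for example, formulas using a pooled proportion) give slightly larger answers, so the result should be checked against whichever formula the course uses.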