Research Methodology Lecture No : 21 Data Preparation and Data Entry
Recap Lecture In the last few lectures we discussed: • Research Design • The purpose, investigation type, researcher interference, study setting, unit of analysis, time horizon, measurement of variables • Sources of Data • Sampling • Experimental Design
Lecture Objectives Getting the data ready for analysis • Data preparation • Coding, codebook, pre-coding, coding rules • Data entry • Editing data • Data transformation
Data Preparation and Description • Data preparation includes editing, coding, and data entry. • It is the activity that ensures the accuracy of the data and their conversion from raw form to reduced and classified forms that are more appropriate for analysis. • Preparing a descriptive statistical summary is another preliminary step that allows data entry errors to be identified and corrected.
Getting the Data Ready for Analysis • After data are obtained through a questionnaire, they need to be coded, keyed in, and edited. • Outliers, inconsistencies and blank responses, if any, have to be handled in some way.
Coding • Data coding involves assigning a number to the participants' responses so that they can be entered into a database. • In coding, categories are the partitions of a data set of a given variable. For instance, if the variable is gender, the categories are male and female. • Categorization is the process of using rules to partition a body of data. • Both closed and open questions must be coded.
Coding Cont. • Numeric coding simplifies the researcher’s task in converting a nominal variable like gender to a 1 or 2.
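As a small illustration of numeric coding, the sketch below maps raw gender responses to 1 or 2. The dictionary name, the function name, and the choice of which category gets which number are assumptions made for this example only.

```python
# Hypothetical code scheme for a nominal variable; which category gets 1 or 2 is arbitrary.
GENDER_CODES = {"male": 1, "female": 2}

def code_gender(response: str) -> int:
    """Return the numeric code for a raw gender response."""
    # Normalise case and stray spaces before looking up the code.
    return GENDER_CODES[response.strip().lower()]

raw_responses = ["Male", "female", " Female "]
coded = [code_gender(r) for r in raw_responses]
print(coded)  # [1, 2, 2]
```

Because the variable is nominal, the numbers are only labels: they carry no order or magnitude.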
Code Construction There are two basic rules for code construction. • First, the coding categories should be exhaustive, meaning that a coding category should exist for all possible responses. • For example, household size might be coded 1, 2, 3, 4, and 5 or more. • The “5 or more” category assures all subjects of a place in a category.
Code Construction Cont. • Second, the coding categories should be mutually exclusive and independent. • This means that there should be no overlap among the categories to ensure that a subject or response can be placed in only one category.
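The two rules can be checked mechanically. The following sketch expresses the household-size categories from the example as rules and verifies that every size falls into exactly one category. The names `CATEGORIES` and `matching_codes`, and the tested range of 1 to 20, are illustrative assumptions.

```python
# Each code is paired with a rule; the categories follow the household-size example.
CATEGORIES = {
    1: lambda n: n == 1,
    2: lambda n: n == 2,
    3: lambda n: n == 3,
    4: lambda n: n == 4,
    5: lambda n: n >= 5,  # the "5 or more" category
}

def matching_codes(size: int) -> list:
    """Return every code whose rule accepts the given household size."""
    return [code for code, rule in CATEGORIES.items() if rule(size)]

for size in range(1, 21):
    matches = matching_codes(size)
    assert len(matches) >= 1, f"not exhaustive for size {size}"
    assert len(matches) <= 1, f"not mutually exclusive for size {size}"

print(matching_codes(7))  # [5]
```

If the last rule were written as `n == 5`, the exhaustiveness check would fail for a household of six; if the fourth were `n >= 4`, the exclusivity check would fail for a household of five.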
Code Construction Cont. • Missing data should also be represented with a code. • In the “good old days” of computer cards, a numeric value such as 9 or 99 was used to represent missing data. • Today, most software will understand that either a period or a blank response represents missing data.
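A minimal sketch of how missing-data codes might be handled when reading keyed-in values. The function name `parse_response` and the idea of passing the set of missing codes per variable are assumptions for illustration, not a description of how any particular package works.

```python
from typing import Optional

def parse_response(raw: str, missing_codes=frozenset({"", "."})) -> Optional[int]:
    """Convert a keyed-in value to int, or None if it denotes missing data."""
    value = raw.strip()
    if value in missing_codes:
        return None
    return int(value)

# Current convention: a blank or a period means missing.
modern = [parse_response(v) for v in ["3", "", ".", "5"]]
print(modern)  # [3, None, None, 5]

# Older convention for a two-digit field: 99 also means missing.
legacy_codes = frozenset({"", ".", "99"})
legacy = [parse_response(v, legacy_codes) for v in ["42", "99"]]
print(legacy)  # [42, None]
```

Declaring the missing codes per variable avoids treating a legitimate 9 or 99 as missing in a field where those values can really occur.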
Codebook • A codebook contains each variable in the study and specifies the application of coding rules to the variable. • It is used by the researcher or research staff to promote more accurate and more efficient data entry. • It is the definitive source for locating the positions of variables in the data file during analysis.
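One possible way to hold a codebook in machine-readable form is a nested dictionary, sketched below. The variable names, column positions, and labels are hypothetical and are not taken from the lecture's own codebook.

```python
# Hypothetical codebook: one entry per variable with its position, label and legal codes.
CODEBOOK = {
    "gender": {
        "column": 1,
        "label": "Gender of respondent",
        "codes": {1: "Male", 2: "Female"},
    },
    "hh_size": {
        "column": 2,
        "label": "Household size",
        "codes": {1: "1", 2: "2", 3: "3", 4: "4", 5: "5 or more"},
    },
}

def decode(variable: str, code: int) -> str:
    """Translate a numeric code back to its category label."""
    return CODEBOOK[variable]["codes"][code]

def is_valid(variable: str, code: int) -> bool:
    """Check that a code is defined for the variable."""
    return code in CODEBOOK[variable]["codes"]

print(decode("hh_size", 5))   # 5 or more
print(is_valid("gender", 3))  # False
```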
Pre-coding • Pre-coding means assigning codebook codes to variables in a study and recording them on the questionnaire. • Alternatively, the questionnaire can be designed so that the appropriate code is printed next to each response option. • With a pre-coded instrument, the codes for variable categories are accessible directly from the questionnaire.
Coding Open-Ended Questions • One of the primary reasons for using open-ended questions is that insufficient information or lack of a hypothesis may prohibit preparing response categories in advance. Researchers are forced to categorize responses after the data are collected.
Coding Open-Ended Questions Cont. • In the Figure on the next slide, question 6 illustrates the use of an open-ended question. After preliminary evaluation, response categories were created for that item. They can be seen in the codebook.
Coding Rules Categories should be: • Appropriate to the research problem • Exhaustive • Mutually exclusive • Derived from one classification principle
Data Entry • After responses have been coded, they can be entered into a database. • Raw data can be entered through any software program. • For example: SPSS Data Editor.
Data Entry Cont. • Keyboarding • Database programs • Digital/barcodes • Optical recognition • Voice recognition
Editing Data • After data are entered, the blank responses, if any, have to be handled in some way, and inconsistent data have to be checked and followed up. • Data editing deals with detecting and correcting illogical, inconsistent, or illegal data and omissions in the information returned by the participants of the study.
Editing Data Cont. Criteria: • Accurate • Consistent • Uniformly entered • Complete • Arranged for simplification
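The kinds of problems named above (illegal codes, omissions, inconsistencies) can be flagged with simple checks. This is only a sketch: the variables, the legal code sets, and the consistency rule are hypothetical examples.

```python
# Hypothetical legal codes for two coded variables.
LEGAL_CODES = {"gender": {1, 2}, "satisfaction": {1, 2, 3, 4, 5}}

def edit_record(record: dict) -> list:
    """Return a list of problems found in one coded record."""
    problems = []
    for variable, legal in LEGAL_CODES.items():
        value = record.get(variable)
        if value is None:
            problems.append(f"{variable}: omitted")
        elif value not in legal:
            problems.append(f"{variable}: illegal code {value}")
    # Assumed consistency rule: years employed cannot exceed age.
    age, years = record.get("age"), record.get("years_employed")
    if age is not None and years is not None and years > age:
        problems.append("years_employed exceeds age")
    return problems

records = [
    {"id": 1, "gender": 1, "satisfaction": 4, "age": 30, "years_employed": 8},
    {"id": 2, "gender": 3, "satisfaction": None, "age": 25, "years_employed": 40},
]
for r in records:
    print(r["id"], edit_record(r))
# 1 []
# 2 ['gender: illegal code 3', 'satisfaction: omitted', 'years_employed exceeds age']
```

The function only reports problems. Deciding how to resolve each one (callback, replacement, or marking as unknown) remains the editor's job.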
Field Editing • Field Editing Review • Entry Gaps Callback • Validates Re-interviewing
Field Editing Review • In large projects, field editing review is a responsibility of the field supervisor. • It should be done soon after the data have been collected. • During the stress of data collection, data collectors often use ad hoc abbreviations and special symbols.
If the forms are not completed soon, the field interviewer may not recall what the respondent said. • Therefore, reporting forms should be reviewed regularly.
Field Editing Cont. • Entry Gaps Callback • When entry gaps are present, a callback should be made rather than guessing what the respondent probably said.
Field Editing Cont. • Validates Re-interviewing • The field supervisor also validates field results by re-interviewing some percentage of the respondents on some questions to verify that they have participated. • Ten percent is the typical amount used in data validation.
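Drawing the respondents to re-interview at random could look like the sketch below. The function name, the respondent IDs, and the fixed seed are assumptions; the 10 percent default follows the figure quoted above.

```python
import random

def validation_sample(respondent_ids, fraction=0.10, seed=None):
    """Pick a random subset of respondents to re-interview."""
    if not respondent_ids:
        return []
    rng = random.Random(seed)  # a seed makes the draw reproducible
    k = max(1, round(len(respondent_ids) * fraction))
    return sorted(rng.sample(list(respondent_ids), k))

ids = list(range(1, 201))  # 200 hypothetical respondent IDs
sample = validation_sample(ids, seed=42)
print(len(sample))  # 20
```

Sampling without replacement ensures that no respondent is contacted twice for the same validation round.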
Central Editing • Scale of Study Number of Editors • At this point, the data should get a thorough editing. • For a small study, a single editor will produce maximum consistency. • For large studies, editing tasks should be allocated by sections.
Central Editing Cont. • Wrong Entry Replacements • Sometimes it is obvious that an entry is incorrect, and the editor may be able to determine the proper answer by reviewing other information in the data set. • This should only be done when the correct answer is obvious. • If an answer given is inappropriate, the editor can replace it with “no answer” or “unknown”.
Central Editing Cont. • Fakery Open-ended Questions • The editor can also detect instances of armchair interviewing (fake interviews) during this phase. • This is easiest to spot with open-ended questions.
Central Editing Cont. Guidelines for Editors • Be familiar with instructions given to interviewers and coders • Do not destroy the original entry • Make all editing entries identifiable and in standardized form • Initial all answers changed or supplied • Place initials and date of editing on each instrument completed
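For data edited electronically, these guidelines can be mirrored by keeping the original entry alongside a log of changes. The class below is a hedged sketch; the class name, the editor initials "AK", and the date are invented for the example.

```python
from datetime import date

class EditedEntry:
    """Hold an answer together with a record of every edit made to it."""

    def __init__(self, original):
        self.original = original  # never overwritten
        self.current = original
        self.log = []             # (old value, new value, initials, date) tuples

    def change(self, new_value, initials, when=None):
        """Replace the current value and record who changed it and when."""
        when = when or date.today()
        self.log.append((self.current, new_value, initials, when))
        self.current = new_value

entry = EditedEntry("7")  # e.g. an illegal code on a 1-5 scale
entry.change("unknown", "AK", date(2024, 1, 15))
print(entry.original, entry.current)  # 7 unknown
print(len(entry.log))                 # 1
```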
Handling “Don’t Know” Responses • When the number of “don’t know” (DK) responses is low, it is not a problem. However, if there are many, it may mean that the question was poorly designed, too sensitive, or too challenging for the respondent. • The best way to deal with undesired DK answers is to design better questions at the beginning. • If a DK response is legitimate, it should be kept as a separate reply category.
Data Transformation • Data transformation, a variation of data coding, is the process of changing the original numerical representation of a quantitative value to another value. • E.g., the data give consumption per year, but we need consumption per month. • Data are typically changed to avoid problems in the next stage of the data analysis process.
Data Transformation Cont. • For example, economists often use a logarithmic transformation so that the data are more evenly distributed. • Data transformation is also necessary when several questions have been used to measure a single concept. • E.g., intention to leave is measured through 10 questions, which need to be transformed into a single value for each respondent.
Recap • Questionnaire checking involves eliminating unacceptable questionnaires. • These questionnaires may be incomplete, have instructions that were not followed, have missing pages, arrive past the cutoff date, or come from a respondent who is not qualified. • Editing looks to correct illegible, incomplete, inconsistent and ambiguous answers. • Coding typically assigns alpha or numeric codes to answers that do not already have them so that statistical techniques can be applied.
Recap Cont. • Cleaning reviews data for consistency. Inconsistencies may arise from faulty logic, or from out-of-range or extreme values. • Statistical adjustment applies to data that require weighting and scale transformations.
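The three transformations mentioned (yearly to monthly values, a logarithmic transformation, and combining several items into one score) can be sketched as follows. All numbers are made up, and averaging the ten items is one common choice assumed here; the lecture does not say which aggregation to use.

```python
import math
from statistics import mean

# 1. Rescale: yearly consumption to monthly consumption.
yearly_consumption = 1200.0
monthly_consumption = yearly_consumption / 12
print(monthly_consumption)  # 100.0

# 2. Logarithmic transformation (natural log) to compress a skewed variable.
incomes = [1_000, 10_000, 100_000]
log_incomes = [math.log(x) for x in incomes]

# 3. Composite score: ten items on a 1-5 scale reduced to one value per respondent.
item_scores = [4, 5, 3, 4, 4, 5, 2, 4, 3, 4]
intention_to_leave = mean(item_scores)  # assumption: simple average of the items
print(intention_to_leave)  # 3.8
```

After the log transformation the gaps between successive incomes are equal, which is why the transformed values are spread more evenly than the raw ones.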