Farm Household Surveys DATABASE ORGANISATION AND DATA CLEANING. Glwadys Aymone GBETIBOUO C4ECOSOLUTIONS, CAPE TOWN Economics analyses of climate change impacts workshop Accra, Ghana.
Database organisation and cleaning, or data management, is generally seen as a set of tasks belonging to the tabulation phase of the survey: in other words, activities conducted towards the end of the survey project, on computers in clean offices.
• Survey data management should instead begin concurrently with questionnaire design.
Key points to consider:
• Nature and identification of the statistical units observed
• Built-in redundancies
• Length and complexity of the questionnaire
• Sample size and design
• Survey timing and scheduling
DATA ENTRY SYSTEM
• A complex household survey typically contains hundreds of variables. For example, household survey dataset, 2003 GEF study: 1342 variables.
• After the survey instrument has been finalized, you develop the data entry system and provide a protocol for data entry.
• Coding questionnaire
• Coding sheet
• Household data: 12 worksheets
• Climate data, soil data, runoff data
Data cleaning
• Data are generally subjected to control mechanisms:
• range checks,
• consistency checks, and
• typographical checks
Range checks
Every variable in the survey should contain only data within a limited domain of valid values.

    tab farmtype, missing

       farmtype |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            -99 |          4        0.99        0.99
              1 |        191       47.16       48.15
              2 |         71       17.53       65.68
              3 |        138       34.07       99.75
              9 |          1        0.25      100.00
    ------------+-----------------------------------
          Total |        405      100.00

The single observation with farmtype 9 is flagged for checking:

           hhcode   farmtype   remark
     39.  70013308         9   CHECK DATA FOR THIS OBS.
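The range check above can also be automated so that out-of-range values are flagged without reading the frequency table by eye. The following is a minimal sketch on toy data, not the workshop's own do-file: the household codes other than 70013308 are made up, and the set of valid codes (1, 2, 3, plus -99 for missing) is an assumption read off the frequency table.

```stata
* Toy data (hypothetical apart from hhcode 70013308).
* Assumption: valid farmtype codes are 1, 2, 3, and -99 marks missing.
clear
input long hhcode farmtype
70013301   1
70013302   2
70013308   9
70013310   3
70013312 -99
end

* Flag every value that is neither a valid code nor the missing code
generate str30 remark = ""
replace remark = "CHECK DATA FOR THIS OBS." if !inlist(farmtype, 1, 2, 3, -99)

* Show only the flagged observations
list hhcode farmtype remark if remark != "", noobs

* Count the flagged observations; the result is stored in r(N)
count if remark != ""
```

Listing the permitted codes in one inlist() call keeps the rule explicit and easy to update when the coding sheet changes.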
Consistency check
Values from one question should be consistent with values from another question.
• Demographic consistency of the household
• Consistency of age and other individual characteristics

    gen test = hhmales + hhfemales
    list hhcode hhsize hhmales hhfemales test remark if test != hhsize

      hhcode   hhsize   hhmales   hhfemales   test   remark
    70013319       18         3           3      6   CHECK DATA FOR THIS OBS.
    70030507       14         4           4      8   CHECK DATA FOR THIS OBS.

    tab age5

      hhcode   age5   remark
    70041703    281   CHECK DATA FOR THIS OBS.
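The household-size check above can be sketched as a self-contained do-file. This is an illustration under stated assumptions rather than the original analysis: the first two rows reuse the flagged values shown above, the third household is hypothetical, and the variable name inconsistent is introduced here only for the example.

```stata
* Toy data: the first two rows reuse the flagged values shown above,
* the third household (70030508) is hypothetical and consistent.
clear
input long hhcode hhsize hhmales hhfemales
70013319 18 3 3
70030507 14 4 4
70030508  6 2 4
end

* Household size should equal the number of males plus females
generate test = hhmales + hhfemales

* Hypothetical indicator: 1 if the two totals disagree, 0 otherwise
generate byte inconsistent = (test != hhsize)

* assert stops with an error when any observation fails the condition;
* capture keeps the do-file running and stores the return code in _rc
capture assert test == hhsize
display "return code from assert: " _rc

* List and count the inconsistent households
list hhcode hhsize hhmales hhfemales test if inconsistent, noobs
count if inconsistent
```

Note that a missing value in any of the three variables would also be flagged by this comparison, which is usually what is wanted at the cleaning stage.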
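Typographical checks
• A typical typographical error is the transposition of digits, such as entering 41 rather than 14. This error can be detected through double data entry of all questionnaires.
• Another is entering -999 rather than -99 in a numerical input. The loop below recodes -999 to -99 and then sets -99 to missing (it assumes every variable is numeric; replace fails with a type mismatch on string variables):

    foreach var of varlist _all {
        replace `var' = -99 if `var' == -999
        replace `var' = . if `var' == -99
    }

• Use the tab command to obtain frequency tables of the data.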
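The double data entry check mentioned above can be sketched by merging the two keyed-in versions on the household identifier and comparing them. This is one possible approach, not necessarily the one used in the study: the household codes, the variable names age1, age2, and mismatch, and the tempfile name second are all hypothetical. Run it as a do-file so that the temporary file name persists across commands.

```stata
* Second data entry (hypothetical values), saved to a temporary file
clear
input long hhcode age2
1 14
2 35
3 52
end
tempfile second
save `second'

* First data entry: household 1 has 41 keyed in instead of 14
clear
input long hhcode age1
1 41
2 35
3 52
end

* Match the two entries on the household identifier
merge 1:1 hhcode using `second', nogenerate

* Flag households whose two entries disagree
generate byte mismatch = (age1 != age2)
list hhcode age1 age2 if mismatch, noobs
count if mismatch
```

Flagged households are then checked against the paper questionnaire to decide which entry is correct.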