Planning a Data Entry Operation
Creating the Application • The data entry application can be designed by one person or by several people • It is advisable that one person or a team work on the dictionary together • After the dictionary is final, people can work on different forms independently, which will then be copied and pasted together for the final product • Make sure to back up the application and data files frequently; not only might you make an unrecoverable mistake in your files, but CSPro has been known to (very rarely) render applications unusable
Operator and System Controlled Modes • CSPro has two modes of data entry, which come about due to the differences in CSPro’s parent software packages • Operator-controlled mode: as in IMPS, which was designed for census data entry, where speed of data entry is sometimes prioritized over accuracy, and where the sheer volume of keying means that office editing of questionnaires may not be possible (heads-down keying) • System-controlled mode: as in ISSA, which was designed for survey data entry, where accuracy is critical, and where office editing is often possible (heads-up keying)
Operator and System Controlled Modes (continued) • [Graphic borrowed from Macro International, with panels labeled Operator-Controlled Mode and System-Controlled Mode]
Operator-Controlled Mode • In operator-controlled mode, the keyer can use the mouse to move around the questionnaire, bypassing fields or whole sections of the data entry application • The mouse can also be used to skip to fields after having keyed in an invalid response for a value • Mouse action can cause havoc, but it can also make the keyed data more true to the data on the questionnaire, as it eliminates the need for office editing (though the data will have to be edited later) • Keyers generally like this mode, though programmers are often reluctant to give so much control to the keyers
System-Controlled Mode • System-controlled mode ensures that keyed data comes in a format that the programmer has specified, with skip patterns obeyed and all consistency checks passed • CSPro keeps track of the “path” of data entry, so that going backwards in the questionnaire faithfully returns to the previous fields keyed, which may not be the previous fields on a form in the case of skips • Requires that keyers resolve all errors before moving on in the questionnaire, which can slow down progress • A mistake in the programming of the application can ruin the integrity of the data file • Unless consistent office editing rules are followed, system-controlled mode can introduce various biases in the data file
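The "path" idea on this slide can be illustrated with a short Python sketch. This is not CSPro's implementation; the class name `EntryPath` and the field names are hypothetical, chosen only to show why going backwards follows the fields actually keyed rather than the order of fields on the form.

```python
class EntryPath:
    """Tracks the fields actually keyed, in order, so 'back' follows the path."""

    def __init__(self):
        self._visited = []  # fields keyed so far, in keying order

    def key_field(self, name):
        # Record a field as keyed; skipped fields are never recorded.
        self._visited.append(name)

    def go_back(self):
        # Leave the current field and return the field keyed before it,
        # or None if there is no earlier field on the path.
        if self._visited:
            self._visited.pop()
        return self._visited[-1] if self._visited else None


# Hypothetical form order: SEX, AGE, SCHOOL, OCCUPATION.
# Suppose a skip after AGE jumped over SCHOOL.
path = EntryPath()
for field in ["SEX", "AGE", "OCCUPATION"]:
    path.key_field(field)

previous = path.go_back()  # moves from OCCUPATION back to AGE, not SCHOOL
```

Because SCHOOL was never keyed, it never enters the path, so backing up from OCCUPATION lands on AGE.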
Skips • In operator-controlled mode, skips can be achieved in three ways: • Programmatic logic • Manual skips associated with fields and activated by the keyer pressing the + key • Use of the mouse to navigate the questionnaire • In system-controlled mode, all skips must be programmed using logic
Order of Entry • Before designing the data entry application, consider the best way for keyers to enter data from the paper questionnaires, particularly when the questionnaire has multiple modules • For example, a questionnaire has a household module (with a population roster) and then several modules for each person aged between 12 and 15 • Options: • Do you want the entire population roster keyed in and then each of the 12-15 year-old modules keyed in? This could be a one- or two-level application. • Do you want each 12-15 year-old module keyed in immediately after the keying of the person’s data in the population roster? This would require using two data files and an external form file.
Network Data Entry • In the past, each keyer entered data to a file on a computer and a supervisor had to copy the data from each machine to a centralized computer and concatenate the data • Now, with LANs very easy to set up, it may be easier to have the keyers enter data directly to a single machine • It is not possible for multiple keyers to enter data to one data file, but they can enter data to different files on a network drive • The supervisor must still concatenate the data to create the master data file, but backing up and concatenating data is much easier if using a network drive • Similarly, placing the data entry application on a network drive eliminates the prior need to redistribute the application after any modifications were made
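As a rough illustration of the supervisor's concatenation step, here is a minimal Python sketch. It assumes each keyer's data file is plain text with one record per line, and the file names are hypothetical; it is not a CSPro tool and does no checking for duplicate questionnaires.

```python
from pathlib import Path


def concatenate(keyer_files, master_file):
    """Append each keyer file, in sorted name order, to a fresh master file."""
    master = Path(master_file)
    with master.open("w", encoding="utf-8") as out:
        for source in sorted(Path(p) for p in keyer_files):
            text = source.read_text(encoding="utf-8")
            if text and not text.endswith("\n"):
                text += "\n"  # keep the last record on its own line
            out.write(text)
    return master
```

With the keyers' files on one network drive, the list passed in could be built with something like `Path(shared_folder).glob("*.dat")`, where `shared_folder` is whatever location the operation uses.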
Testing the Application • As a CSPro programmer, you should test the application thoroughly • Ensure that every skip pattern works successfully, and that all consistency checks are valid • Make sure any calls to a lookup file complete without error • It is extremely important, however, to have someone without any CSPro experience test the application • A novice can often discover problems more quickly, and can uncover different problems, than an expert user • Ideally a data entry application will be created and tested before the census or survey goes to the field • Timing several keyers entering pilot or test data will help determine how many keyers must be hired for the keying operation
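The staffing estimate mentioned in the last bullet is simple arithmetic once timing data exist. The sketch below is one way to do it; the function name and every number in the example are hypothetical, and it ignores verification, absences, and supervisor time.

```python
import math


def keyers_needed(questionnaires, minutes_each, working_days, productive_hours_per_day):
    """Estimate the number of keyers from timed test keying."""
    total_minutes = questionnaires * minutes_each
    minutes_per_keyer = working_days * productive_hours_per_day * 60
    return math.ceil(total_minutes / minutes_per_keyer)


# Hypothetical inputs: 50,000 questionnaires, 6 minutes each (from timing
# test keyers), 40 working days, 6 productive keying hours per day.
estimate = keyers_needed(50_000, 6, 40, 6)
```

If the data are to be verified by double keying, the total keying time would roughly double, and the estimate should be adjusted accordingly.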
During Data Entry • Create a logical system of data file names • You might create an operational control system so that the keyers do not have to manually name data files • Back up data files on a regular basis, ideally daily • Monitor the keying rates, both speed and accuracy, of the keyers; incentives might improve performance • If verifying data, consider whether to verify data concurrently with the first keying, or to key the data in two phases • Look at frequencies in the data set to make sure that the keyed data, starting from day one, looks good
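One possible naming scheme is sketched below in Python, as an example of what an operational control system might generate so that keyers never type file names themselves. The components (area code, keyer ID, batch number, keying pass) and the `.dat` extension are assumptions for illustration, not a CSPro requirement.

```python
def data_file_name(area_code, keyer_id, batch, pass_label="K1"):
    """Build a sortable data file name from hypothetical control fields.

    pass_label distinguishes keying passes, e.g. "K1" for first keying
    and "K2" for a verification keying (labels are illustrative).
    """
    return f"A{area_code:04d}_O{keyer_id:02d}_B{batch:03d}_{pass_label}.dat"


name = data_file_name(12, 3, 7)  # "A0012_O03_B007_K1.dat"
```

Zero-padded components keep the files in a sensible order when sorted by name, which also makes it easy to see which batches are missing from a backup.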
Keyer Efficiency • CSPro keeps a log that can be used to calculate keying rates and accuracy for each keyer (extension: .log) • The log can identify slow keyers, unusually fast keyers, inaccurate keyers, or keyers who take long breaks • The log can be read with CSPro or with Excel (by renaming it to have extension .csv) • One caveat is that keyers not fully trained in CSPro may appear to have lower keying rates (e.g., if they do not pause the application while on lunch break)
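Once the log has been loaded into a spreadsheet or script, per-keyer rates reduce to a few sums. The Python sketch below assumes the log has already been parsed into tuples of (keyer, keystrokes, errors, seconds); that layout is hypothetical and is not the actual format of the CSPro log.

```python
from collections import defaultdict


def keying_rates(sessions):
    """Return {keyer: (keystrokes_per_hour, error_rate)} from session tuples."""
    totals = defaultdict(lambda: [0, 0, 0.0])  # keystrokes, errors, seconds
    for keyer, keystrokes, errors, seconds in sessions:
        totals[keyer][0] += keystrokes
        totals[keyer][1] += errors
        totals[keyer][2] += seconds

    rates = {}
    for keyer, (keystrokes, errors, seconds) in totals.items():
        per_hour = keystrokes * 3600 / seconds if seconds else 0.0
        error_rate = errors / keystrokes if keystrokes else 0.0
        rates[keyer] = (per_hour, error_rate)
    return rates


# Hypothetical sessions: (keyer, keystrokes, errors, seconds)
sessions = [
    ("ana", 6000, 30, 1800),
    ("ana", 6000, 30, 1800),
    ("ben", 3000, 60, 3600),
]
rates = keying_rates(sessions)
```

Note that unpaused breaks inflate the seconds total, which is exactly how an untrained keyer can look slower than they really are.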
Verifying Data • Verifying census or survey data (double keying) adds significant expense to a data entry operation, but it may be necessary to ensure good quality keyed data, particularly for a survey • Two forms of verification: • Independent verification • Dependent verification
Independent Verification • Two keyers key a questionnaire to separate data files • The operational control system should ensure that the keying supervisor can easily identify in what files the two keyed questionnaires are located • The supervisor runs the Compare Data tool, which produces a report identifying differences in the keyed files • The supervisor then chooses one file as the source file for the final data file and modifies that file, resolving all errors by reexamining the paper questionnaire
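The kind of report the supervisor works from can be pictured with a small Python sketch. This is not the Compare Data tool; it assumes, purely for illustration, that each keyed file has been loaded into a dictionary mapping a questionnaire ID to its field values.

```python
def compare_keyings(first, second):
    """Return (questionnaire_id, field, first_value, second_value) mismatches."""
    differences = []
    for qid in sorted(set(first) | set(second)):
        rec_a = first.get(qid, {})
        rec_b = second.get(qid, {})
        for field in sorted(set(rec_a) | set(rec_b)):
            if rec_a.get(field) != rec_b.get(field):
                differences.append((qid, field, rec_a.get(field), rec_b.get(field)))
    return differences


# Hypothetical keyings of the same two questionnaires, keyed in different orders.
first = {"001": {"AGE": "34", "SEX": "1"}, "002": {"AGE": "12", "SEX": "2"}}
second = {"002": {"AGE": "12", "SEX": "2"}, "001": {"AGE": "84", "SEX": "1"}}
report = compare_keyings(first, second)
```

Each mismatch in the report is a field the supervisor would check against the paper questionnaire. A questionnaire present in only one file shows up as a mismatch on every field, with `None` on the missing side.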
Dependent Verification • A keyer keys a questionnaire to a data file • A second keyer takes the first keyer’s data file and keys the questionnaires in the same order as the first keyer keyed them • If the second keyer enters a value that differs from the value entered by the first keyer, the second keyer is prompted to rekey the value • The second keyer, not a supervisor, is the arbitrator of the correct value of a field
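The per-field flow described on this slide can be sketched in Python as follows. The function name is hypothetical, and the rule that the rekeyed value is accepted as final is an assumption based on the bullet about the second keyer being the arbitrator; CSPro's actual acceptance rule may differ.

```python
def verify_field(first_value, second_entries):
    """Return (final_value, was_rekeyed) for one field.

    second_entries yields the values the second keyer types, in order.
    """
    entries = iter(second_entries)
    value = next(entries)
    if value == first_value:
        return value, False   # keyings agree; no prompt needed
    rekeyed = next(entries)   # mismatch: second keyer is prompted to rekey
    return rekeyed, True      # assumed rule: the rekeyed value stands


# Second keyer mistypes "84", is prompted, then keys "34".
result = verify_field("34", ["84", "34"])
```

If the second keyer rekeys the same differing value, that value replaces the first keyer's entry under this assumed rule.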
Verifying Data (continued) • Independent verification advantages: • Keyers will not be slowed down by mistakes made by the other keyer • May be more accurate because three people look at hard-to-read fields • The second keying does not need to be in the same order as the first keying • Dependent verification advantages: • Can be faster than independent verification because of the elimination of the supervisory position and the need to look at the paper questionnaires a third time • Eliminates the need for two copies of all data files • Allows for the verification of only high priority fields