Data Processing and Tabulation Part II
Data Files
The clean data file created through this editing and variable creation process should be clearly labeled and stored.
• The file name should follow a specific format that indicates to everyone that it is the clean file, that it contains confidential data, and what time period it covers.
• If the survey is conducted repeatedly, each file should have the same name except for the date.
• These files should always be stored in the same place, so everyone who needs them knows where to find them and there is no confusion about versions.
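The naming rule above can be made mechanical with a small helper, so nobody has to compose file names by hand. The sketch below assumes a hypothetical convention (`<survey>_clean_CONF_<YYYY-MM>.<ext>`) and a hypothetical survey code `LFS`. Any agreed pattern that marks the clean file, flags confidentiality, and carries the reference period would serve equally well.

```python
from datetime import date


def clean_file_name(survey_code: str, reference_period: date, ext: str = "csv") -> str:
    """Build a standardized name for the clean, confidential data file.

    Hypothetical convention: <survey>_clean_CONF_<YYYY-MM>.<ext>
      - "clean" marks the edited file
      - "CONF" flags that the file contains confidential data
      - the date identifies the reference period, so files from repeated
        rounds of the survey differ only in this part of the name
    """
    period = reference_period.strftime("%Y-%m")
    return f"{survey_code.lower()}_clean_CONF_{period}.{ext}"


march = clean_file_name("LFS", date(2023, 3, 1))
june = clean_file_name("LFS", date(2023, 6, 1))
print(march)  # lfs_clean_CONF_2023-03.csv
print(june)   # lfs_clean_CONF_2023-06.csv
```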
Data Files
A public use data file should be created by removing confidential data from the edited file. Any perturbations, such as to age, should also be applied at this stage.
• The file name should follow a specific format that indicates that it is the public use file.
• The file, along with the documentation needed to use it, should also be made readily available to the public.
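Here is a minimal sketch of deriving the public use file from the edited file, assuming records are Python dictionaries. The confidential field names (`CONFIDENTIAL_FIELDS`) and the age threshold (`AGE_TOP_CODE`) are hypothetical. Top-coding is used here as one simple stand-in for the age perturbation mentioned above, and other disclosure-limitation methods could be substituted.

```python
# Hypothetical names of confidential fields to drop from the public use file.
CONFIDENTIAL_FIELDS = {"name", "address", "phone"}
AGE_TOP_CODE = 80  # hypothetical threshold; ages above it are reported as 80


def make_public_use(records):
    """Return a public use copy of the edited records, leaving the originals untouched.

    - drops every field listed in CONFIDENTIAL_FIELDS
    - top-codes age, one simple way of modifying a potentially identifying value
    """
    public = []
    for rec in records:
        out = {k: v for k, v in rec.items() if k not in CONFIDENTIAL_FIELDS}
        if out.get("age") is not None:
            out["age"] = min(out["age"], AGE_TOP_CODE)
        public.append(out)
    return public


edited = [
    {"id": 1, "name": "A. Smith", "address": "12 Main St", "age": 34, "employed": True},
    {"id": 2, "name": "B. Jones", "address": "9 Elm Rd", "age": 97, "employed": False},
]
public = make_public_use(edited)
print(public)
# [{'id': 1, 'age': 34, 'employed': True}, {'id': 2, 'age': 80, 'employed': False}]
```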
Documentation
Documenting how you clean and edit the data is very important.
• Documentation on cleaning is a valuable reference for future years and for other surveys, helping ensure that nothing gets missed. It is also handy to refer back to if someone asks whether a particular procedure was carried out.
• Documenting edits, such as how new variables were created, is extremely important. Every data user will need this information, and it is vital for analysis.
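One way to keep cleaning documentation from falling behind is to record each step at the moment it is applied. The sketch below assumes records are dictionaries. `EditRecord`, `apply_edit`, and `was_carried_out` are hypothetical helpers, and the age range check is only an example procedure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EditRecord:
    procedure: str        # short name of the cleaning or editing step
    description: str      # what was done and why, in plain language
    records_changed: int  # how many records the step modified
    run_date: date


def apply_edit(records, procedure, description, needs_fix, fix, log):
    """Apply `fix` to every record where `needs_fix` is true, then log the step."""
    changed = 0
    for rec in records:
        if needs_fix(rec):
            fix(rec)
            changed += 1
    log.append(EditRecord(procedure, description, changed, date.today()))
    return changed


def was_carried_out(log, procedure):
    """Answer 'was this procedure run?' from the log rather than from memory."""
    return any(entry.procedure == procedure for entry in log)


log = []
data = [{"age": 34}, {"age": 999}, {"age": 51}]  # 999 is a hypothetical invalid entry
apply_edit(
    data,
    "age_range_check",
    "Set ages outside 0 to 120 to missing (None).",
    lambda r: r["age"] is not None and not 0 <= r["age"] <= 120,
    lambda r: r.update(age=None),
    log,
)
print(was_carried_out(log, "age_range_check"))  # True
print(log[0].records_changed)                   # 1
```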
Good data are the key output
The ultimate goal of a survey is to produce high-quality, useful data.
• Documentation of the concepts and the data file is necessary for both internal and external users.
• Thinking through the desired tabulations in advance helps avoid leaving out needed questions.
• Presenting the data in a clear, consistent, well-thought-out manner makes them more usable. A tabulation plan furthers this goal.
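Part of thinking through tabulations can be checked mechanically. You list the variables each planned table needs and compare them with what the draft questionnaire collects. The table list and variable names below are hypothetical.

```python
# Hypothetical tabulation plan: the variables each planned table needs.
table_needs = {
    "Employment status by educational attainment, marital status, sex, and age": {
        "employment_status", "education", "marital_status", "sex", "age",
    },
    "Hours worked by industry": {"hours_worked", "industry"},
}

# Hypothetical variables collected by the draft questionnaire.
questionnaire_items = {"employment_status", "education", "sex", "age", "hours_worked", "industry"}


def missing_questions(needs, items):
    """Map each planned table to the variables it needs that no question collects."""
    gaps = {}
    for title, variables in needs.items():
        missing = variables - items
        if missing:
            gaps[title] = sorted(missing)
    return gaps


gaps = missing_questions(table_needs, questionnaire_items)
print(gaps)
# {'Employment status by educational attainment, marital status, sex, and age': ['marital_status']}
```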
The Tabulation Plan
The tabulation plan has several parts and is updated throughout the survey development process:
• Table shells
• Version control, file storage, and file naming conventions
• Code or pseudocode, and text describing each of the concepts that will be made from the data
   • This is particularly important for complex concepts and for those created from multiple variables.
   • It is critical that this is correct, as it will be the basis of the tabulation and estimation of data from the survey.
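Labor force status is an example of a concept built from multiple variables. The sketch below derives it from four questionnaire items. The variable names and the classification rule are simplified assumptions for illustration, not a full international definition.

```python
def labor_force_status(rec):
    """Derive labor force status from several questionnaire variables.

    Simplified, hypothetical rule for illustration; variable names are assumed:
      worked    - did any work for pay or profit in the reference week
      absent    - had a job but was temporarily absent from it
      looked    - actively looked for work in the reference period
      available - was available to start work
    """
    if rec["worked"] or rec["absent"]:
        return "Employed"
    if rec["looked"] and rec["available"]:
        return "Unemployed"
    return "Not in labor force"


people = [
    {"worked": True, "absent": False, "looked": False, "available": False},
    {"worked": False, "absent": True, "looked": False, "available": False},
    {"worked": False, "absent": False, "looked": True, "available": True},
    {"worked": False, "absent": False, "looked": True, "available": False},
]
statuses = [labor_force_status(p) for p in people]
print(statuses)
# ['Employed', 'Employed', 'Unemployed', 'Not in labor force']
```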
Table Shells
Table shells are best drafted while the questionnaire is being designed.
• Table shells are tables without data.
• They can be updated after the questionnaire is finalized.
• Once data are collected and tabulated, they may require additional revision.
• If resources permit, two sets of tables may be desirable.
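A table shell can be represented as a title, a row stub, and column headers with no cell values yet, which makes it easy to revise as the questionnaire changes. The `TableShell` class and the labels below are a hypothetical sketch.

```python
from dataclasses import dataclass, field


@dataclass
class TableShell:
    title: str
    row_stub: list   # row labels, in display order
    columns: list    # column headers
    cells: dict = field(default_factory=dict)  # (row, column) -> value; empty in a shell

    def render(self, placeholder="...", col_width=12):
        """Return the table as plain text, showing a placeholder for every empty cell."""
        stub_width = max(len(r) for r in self.row_stub) + 2
        header = " " * stub_width + "".join(c.rjust(col_width) for c in self.columns)
        lines = [self.title, header]
        for row in self.row_stub:
            values = "".join(
                str(self.cells.get((row, col), placeholder)).rjust(col_width)
                for col in self.columns
            )
            lines.append(row.ljust(stub_width) + values)
        return "\n".join(lines)


shell = TableShell(
    title="Employment status by sex: [reference period]",
    row_stub=["Total", "Men", "Women"],
    columns=["Total", "Employed", "Unemployed"],
)
text = shell.render()
print(text)
```

Once data are available, values can be stored in `cells` keyed by `(row, column)`, and the same `render` call produces the filled table.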
Table Shells
Ordering tables
• Start with tables showing basic concepts, and have later tables show more concepts or more detail.
• Group tables on the same or similar topics.
Titles, rows, and column headers should be clear, informative, and consistent
• The title should list what is in the table and should include when the data are from.
   • “Employment status by educational attainment, marital status, sex, and age” is a much more useful title than “Employment status by selected characteristics.”
• The same term should always be used to mean the same thing.
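A small helper can enforce the title advice: name every characteristic the table is broken out by, and end with the reference period. `table_title` and the period `2023` are hypothetical. The characteristic list reproduces the example title above.

```python
def join_series(items):
    """Join names as 'a', 'a and b', or 'a, b, and c' (serial comma)."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + ", and " + items[-1]


def table_title(subject, characteristics, period):
    """List exactly what the table shows, then when the data are from."""
    return f"{subject} by {join_series(characteristics)}: {period}"


title = table_title(
    "Employment status",
    ["educational attainment", "marital status", "sex", "age"],
    "2023",
)
print(title)
# Employment status by educational attainment, marital status, sex, and age: 2023
```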
Table Shells
Use indentation in a row stub to indicate groups that are part of other groups.
• When indentation is used consistently, data users learn what it means and can understand the groupings even when they are not obvious.
• Three spaces generally work well as an indentation.
• Multiple levels of indentation may be used.
• When the same set of groups is used in multiple tables, the indentation should be the same in each of them.
• Indentation is universal, while other methods might not always work:
   • Indentation displays well both in print and on computers.
   • Bold, italics, and underlining might show up as odd symbols on the user's computer rather than producing the desired effect.
   • Colors have several issues. If the table is printed in black and white or photocopied, the distinctions are lost.
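The indentation rule is simple to apply consistently in code: each nesting level adds the same three spaces. The `render_stub` helper and the row labels are hypothetical.

```python
INDENT = "   "  # three spaces per level of nesting


def render_stub(rows):
    """rows: list of (level, label) pairs, where level 0 is the outermost group."""
    return [INDENT * level + label for level, label in rows]


stub = [
    (0, "Total, 16 years and over"),
    (1, "Men"),
    (2, "16 to 24 years"),
    (2, "25 years and over"),
    (1, "Women"),
    (2, "16 to 24 years"),
    (2, "25 years and over"),
]
lines = render_stub(stub)
print("\n".join(lines))
```

Because the indentation is made of plain spaces, it survives printing, photocopying, and plain-text viewers, which is the advantage described above.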
Table Shells
In column headers, use spanners to indicate which components are part of a larger group.
• “Total” columns may be included in, or omitted from, the spanned sections.
• This is an excellent way of showing potentially complex ideas clearly.
• At first, spanners can look confusing to people who aren’t used to them, but once they are explained, people tend to appreciate being able to see how concepts nest within one another.
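The sketch below renders a spanner in plain text, assuming fixed-width columns. The spanner label is centered over the columns it groups and padded with dashes to show its extent, while the Total column sits outside it. `spanner_header` and the labels are hypothetical.

```python
def spanner_header(groups, col_width=12):
    """Return a two-line column header with spanners.

    groups: list of (spanner_label, [column labels]). Use "" as the spanner
    label for columns that sit outside any spanner, such as a Total column.
    Assumes every label fits within its column or spanned width.
    """
    top, bottom = [], []
    for spanner, columns in groups:
        width = col_width * len(columns)
        # Center the spanner label over its columns and pad it with dashes
        # so readers can see exactly which columns it covers.
        top.append(spanner.center(width, "-") if spanner else " " * width)
        bottom.append("".join(c.rjust(col_width) for c in columns))
    return "\n".join(["".join(top), "".join(bottom)])


groups = [
    ("", ["Total"]),
    ("In labor force", ["Employed", "Unemployed"]),
    ("", ["Not in labor force"]),
]
header = spanner_header(groups, col_width=20)
print(header)
```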
Table Shells
Showing several iterations of a table is often desirable.
• One may want to show iterations based on sex, labor force status, or other characteristics.
• Having the same table repeat over several pages, with each page showing data for a different group, makes comparisons easy.
• When a table spans several pages, it is good practice to include “continued” in the table title on every page after the first.
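The repeat-by-group layout can be generated rather than assembled by hand. This sketch assumes each group's rows are already computed. `paginate` is a hypothetical helper that starts a new page for each group, splits long groups across pages, and adds "(continued)" to the title of every page after the first.

```python
def paginate(title, groups, rows_per_page):
    """Lay out one table repeated for each group, rows_per_page rows per page.

    groups: dict mapping a group label (e.g. "Men") to that group's rows.
    Every page after the first carries "(continued)" in its title.
    """
    pages = []
    for group, rows in groups.items():
        for start in range(0, len(rows), rows_per_page):
            page_title = title if not pages else f"{title} (continued)"
            pages.append((page_title, group, rows[start:start + rows_per_page]))
    return pages


stub = ["Total", "16 to 24 years", "25 years and over"]
pages = paginate("Employment status by age", {"Men": stub, "Women": stub}, rows_per_page=3)
for page_title, group, rows in pages:
    print(f"{page_title} | {group} | {len(rows)} rows")
```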
Version control, file naming, and file storage conventions
File maintenance is critical, especially since multiple people may be working on the files.
• Standardized names ensure that people use the correct file and do not release confidential data or produce tables from unedited files.
• Strong version control minimizes the risk of someone updating an old file and those updates being lost.
• Storing files in a single location, and having users retrieve them from there, ensures that everyone gets the most up-to-date version.
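The sketch below shows version numbering in a single shared folder, assuming a hypothetical `<name>_v<NN>.<ext>` naming pattern. Users look up the latest version instead of keeping private copies, and a new edit is always saved under the next version number rather than overwriting an older file. The helper names and file names are illustrative.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical convention: <name>_v<NN>.<ext>, e.g. lfs_clean_CONF_2023-03_v02.csv
VERSION_RE = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)(?P<ext>\.\w+)$")


def latest_version(folder, stem):
    """Return the highest-numbered version of `stem` in the shared folder, or None."""
    best, best_num = None, -1
    for path in Path(folder).iterdir():
        m = VERSION_RE.match(path.name)
        if m and m.group("stem") == stem and int(m.group("num")) > best_num:
            best, best_num = path, int(m.group("num"))
    return best


def next_version_path(folder, stem, ext=".csv"):
    """Return the path for a new version, so an edit never overwrites an older file."""
    current = latest_version(folder, stem)
    num = 1 if current is None else int(VERSION_RE.match(current.name).group("num")) + 1
    return Path(folder) / f"{stem}_v{num:02d}{ext}"


with tempfile.TemporaryDirectory() as shared:
    for name in ["lfs_v01.csv", "lfs_v03.csv", "other_v09.csv"]:
        (Path(shared) / name).touch()
    latest = latest_version(shared, "lfs").name
    new = next_version_path(shared, "lfs").name

print(latest, new)  # lfs_v03.csv lfs_v04.csv
```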
Code or Pseudocode
Code or pseudocode detailing how to make each of the concepts is a very high priority.
• It is essential for creating the concepts from the microdata and for making tables.
• Correctly defined concepts are the basis of everything that is produced. Errors here have major repercussions.
Text descriptions help data users better understand the concepts.
• These are essential pieces of documentation, but they are also very helpful in the development of the survey.
• When internationally accepted concepts are used, the definitions are already available for adoption.
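One way to keep the code and the text descriptions in step is to store them together. The codebook is then generated from the same definitions that build the concepts. The `Concept` class, the registry, and the `age_group` rule are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Concept:
    name: str
    description: str                # text description for data users
    rule: Callable[[dict], object]  # code that builds the concept from one microdata record


CONCEPTS = {}


def register(concept):
    """Keep every concept definition in one place."""
    CONCEPTS[concept.name] = concept
    return concept


def age_group(rec):
    if rec["age"] < 25:
        return "Under 25"
    if rec["age"] <= 54:
        return "25 to 54"
    return "55 and over"


register(Concept(
    "age_group",
    "Age in completed years, grouped as under 25, 25 to 54, and 55 and over.",
    age_group,
))


def apply_concepts(records):
    """Add every registered concept to each record using its documented rule."""
    return [{**rec, **{c.name: c.rule(rec) for c in CONCEPTS.values()}} for rec in records]


def codebook():
    """Produce the text documentation from the same definitions the code uses."""
    return "\n".join(f"{c.name}: {c.description}" for c in CONCEPTS.values())


print(apply_concepts([{"age": 19}, {"age": 60}]))
# [{'age': 19, 'age_group': 'Under 25'}, {'age': 60, 'age_group': '55 and over'}]
print(codebook())
```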