Learn how to identify coding errors, implement logical tests, and select cases for cleaner data analysis in drug abuse research using SPSS skills.
GAP Toolkit 5: Training in basic drug abuse data management and analysis • Data cleaning • Training session 12
Objectives • To establish methods of uncovering coding errors • To discuss techniques for implementing logical tests • To present methods of selecting cases • To reinforce the SPSS skills presented to date
Boolean operators: AND • The AND operator is a logical operator in Boolean algebra • Imagine two statements: X and Y • For the operation (X AND Y) to be true X has to be true and Y has to be true • The rules for Boolean operators are commonly displayed in Truth Tables
Boolean operators: OR • The OR operator is a logical operator in Boolean algebra • Imagine two statements: X and Y • For the operation (X OR Y) to be true either X is true or Y is true or both X and Y are true
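The rules for AND and OR can be written out as truth tables. The following is a minimal Python sketch, not SPSS syntax; the helper name `truth_table` is chosen here for illustration only.

```python
from itertools import product


def truth_table(op):
    """Return rows (X, Y, result) for a two-input Boolean operator."""
    return [(x, y, op(x, y)) for x, y in product([True, False], repeat=2)]


# (X AND Y) is true only when both statements are true.
and_table = truth_table(lambda x, y: x and y)
# (X OR Y) is true when either statement, or both, is true.
or_table = truth_table(lambda x, y: x or y)

for name, table in (("AND", and_table), ("OR", or_table)):
    print(f"X      Y      X {name} Y")
    for x, y, result in table:
        print(f"{x!s:<6} {y!s:<6} {result}")
    print()
```

Each table has four rows, one for every combination of X and Y. AND yields a single true row; OR yields a single false row.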
Data cleaning • Check the data for errors • Clean the data before any data analysis
Types of error • There are two broad areas of error: • Coding errors • Logical errors
Coding error • Data entry errors • Out-of-range values
Detecting out-of-range values • For categorical variables, having declared valid values, frequency counts will highlight any peculiar entries • For continuous variables, descriptive statistics, in particular the range and a histogram, will highlight any peculiar values
Examples • Age: generate descriptive statistics • Treatment type: generate a frequency distribution
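The two checks can be sketched outside SPSS as well. The Python example below is illustrative only: the records are made up (not taken from main.sav), and the set of valid treatment codes is an assumed codebook.

```python
from collections import Counter

# Made-up records for illustration only; not taken from main.sav.
records = [
    {"id": 1, "age": 24, "treatment": 1},
    {"id": 2, "age": 1, "treatment": 2},
    {"id": 3, "age": 35, "treatment": 9},
    {"id": 4, "age": 77, "treatment": 1},
]

# Assumed codebook: the declared valid codes for the categorical variable.
VALID_TREATMENT_CODES = {1, 2, 3}

# Categorical variable: a frequency count exposes codes that were never declared.
treatment_counts = Counter(r["treatment"] for r in records)
undeclared = sorted(c for c in treatment_counts if c not in VALID_TREATMENT_CODES)

# Continuous variable: the minimum and maximum give the range to inspect.
ages = [r["age"] for r in records]
age_min, age_max = min(ages), max(ages)

print("Treatment frequencies:", dict(treatment_counts))
print("Undeclared treatment codes:", undeclared)
print("Age range:", age_min, "to", age_max)
```

The frequency count flags code 9 as undeclared, and the age range shows extreme values that should be checked against the questionnaires.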
Resolving errors • The questionnaires should be checked • If possible, return to the interviewer or interviewee • If still unresolved, consider setting the value as missing • Note the importance of ID numbers for linking the computer to the questionnaire
Selecting cases • The ability to select a set of cases according to a criterion is essential in data cleaning • Generating statistics for subsets of the data is also a useful analytical tool
Example: Age • Descriptive statistics of Age indicate that there is a case with a value of 1 and a case with the value 77 • It is advisable to check the extreme values • Table: Descriptive Statistics
Example: Age • It would be reasonable to check for values 10 and under and 70 and over • The task is to select those cases and display the results • Data/Select Cases generates the following dialogue box
Data/Select Cases • SPSS creates a new variable in the data set called filter_$, which equals 1 when AGE <= 10 OR AGE >= 70 • All subsequent analysis will be on the reduced data set until Data/Select Cases/All Cases is chosen • Cases that have been filtered out are identified by a slash through the case number
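The logic behind the filter variable can be sketched in Python. This is an illustration of the condition only, not of how SPSS works internally; the cases are made up, and the key `filter_` and the helper `add_filter` are names chosen here.

```python
# Made-up cases for illustration only.
cases = [
    {"id": 101, "age": 1},
    {"id": 102, "age": 23},
    {"id": 103, "age": 77},
    {"id": 104, "age": 45},
    {"id": 105, "age": 10},
]


def add_filter(cases):
    """Flag each case with 1 if AGE <= 10 OR AGE >= 70, otherwise 0."""
    for case in cases:
        case["filter_"] = 1 if case["age"] <= 10 or case["age"] >= 70 else 0
    return cases


add_filter(cases)

# Only flagged cases are analysed; the full data set is left intact.
selected = [c for c in cases if c["filter_"] == 1]
selected_ids = [c["id"] for c in selected]
print(selected_ids)
```

The selection is a view of the data rather than a deletion: all five cases remain in `cases`, which mirrors the fact that choosing All Cases restores the full file.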
Generating a report • Analyse/Reports/Case Summaries • Select the variables to be included in the summary
Table: Case summaries (limited to first 100 cases)
Note: All Cases • Don’t forget that, once certain cases have been selected, all subsequent analysis is on the selected cases only • Once you have finished working with the subset, restore the file to All Cases before doing any further analysis • Data/Select Cases… • Select the All Cases radio button • OK
Locating a case • From the Data Editor: • Data/Go To Case OR • Select a variable, then Edit/Find
Logical errors • Detecting logical errors involves comparing answers to ensure that they are consistent • The type of logical checks appropriate to identify particular errors will depend on the questions in the questionnaire
Detecting logical errors • Cross-tabulations between categorical variables can be used to highlight errors • Check criteria using conditional statements and the Compute facility • Some software, such as SPSS Databuilder, allows tests for logical and coding errors to be built into a data entry form
Example: Cross-tabulation • Cross-tabulations provide a simple method of investigating the joint distribution of two variables • The following slide is a cross-tabulation of Drug1 against Mode1 to check that appropriate modes of ingestion have been reported
Table: Most frequently used drug (cross-tabulation)
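A cross-tabulation is a count of cases for each pair of values. The Python sketch below shows the idea with made-up records. The drug and mode labels, and the rule that one combination is implausible, are assumptions for illustration and are not the coding scheme of main.sav.

```python
from collections import Counter

# Made-up records for illustration only.
records = [
    {"id": 1, "drug1": "cannabis", "mode1": "smoke"},
    {"id": 2, "drug1": "heroin", "mode1": "inject"},
    {"id": 3, "drug1": "cannabis", "mode1": "inject"},
    {"id": 4, "drug1": "heroin", "mode1": "smoke"},
    {"id": 5, "drug1": "cannabis", "mode1": "smoke"},
]

# Joint distribution: number of cases for each (drug, mode) pair.
crosstab = Counter((r["drug1"], r["mode1"]) for r in records)

drugs = sorted({r["drug1"] for r in records})
modes = sorted({r["mode1"] for r in records})
print(f"{'':<10}" + "".join(f"{m:>8}" for m in modes))
for d in drugs:
    print(f"{d:<10}" + "".join(f"{crosstab[(d, m)]:>8}" for m in modes))

# Assumed rule for this example: this combination needs checking.
IMPLAUSIBLE = {("cannabis", "inject")}
suspect_ids = [r["id"] for r in records if (r["drug1"], r["mode1"]) in IMPLAUSIBLE]
print("Cases to check:", suspect_ids)
```

A non-zero count in a cell that should be empty points to the cases to look up by ID number.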
Example: conditional statements • Main.sav contains information on the three most frequently used drugs: Drug1, Drug2 and Drug3 • In a single case, no drug should appear in more than one of the three variables • To check this, generate a test variable on the basis of a conditional statement; the test variable should take the value 0 if all three drug variables are different and the value 1 if there is any duplication
Compute: TEST = 0 • Transform/Compute • Enter the name of the new variable: TEST • Click the Type and Label button and declare the variable as numeric with the label: TEST VARIABLE FOR DRUG DUPLICATION • Set TEST = 0
Compute: TEST = 1 • If any of the drug options are the same, TEST should equal 1, EXCEPT when Drug2 = Drug3 = 77 (not applicable) • The condition is: IF • Drug1 = Drug2 OR • Drug1 = Drug3 OR • (Drug2 = Drug3 AND Drug2 ≠ 77) • THEN TEST = 1
Click If… button to define the conditional statement.
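The conditional statement for the test variable can be written as a small function. This is a Python sketch of the logic only, not SPSS syntax; the function name `duplication_test` is chosen here, and 77 is the "not applicable" code given in the slides.

```python
NOT_APPLICABLE = 77  # code for "not applicable", as stated in the slides


def duplication_test(drug1, drug2, drug3):
    """Return 1 if a drug is duplicated across the three variables, else 0.

    Drug2 = Drug3 = 77 (not applicable) does not count as a duplication.
    """
    if (
        drug1 == drug2
        or drug1 == drug3
        or (drug2 == drug3 and drug2 != NOT_APPLICABLE)
    ):
        return 1
    return 0


# Made-up drug codes for illustration only.
print(duplication_test(1, 2, 3))    # all different
print(duplication_test(1, 1, 3))    # Drug1 = Drug2
print(duplication_test(1, 77, 77))  # second and third drug not applicable
```

Starting from 0 and switching to 1 only when the condition holds matches the two Compute steps: first set TEST = 0 for every case, then set TEST = 1 for the cases that meet the condition.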
Table: Case summaries (limited to first 100 cases)
Exercise • Check for consistency between the drug reported and the method of ingestion for the second and third drugs of use • What additional logical tests could be completed on the data in main.sav?
Summary • Data entry errors • Out-of-range errors • Logical errors • Conditional statements • Selecting cases • Reports