17a. Accessing Data: Manipulating Variables in SPSS®
Prerequisites
• Recommended modules to complete before viewing this module
  • 1. Introduction to the NLTS2 Training Modules
  • 2. NLTS2 Study Overview
  • 3. NLTS2 Study Design and Sampling
  • NLTS2 Data Sources, either
    • 4. Parent and Youth Surveys or
    • 5. School Surveys, Student Assessments, and Transcripts
  • NLTS2 Documentation
    • 10. Overview
    • 11. Data Dictionaries
    • 12. Quick References
Prerequisites
• Recommended modules to complete before viewing this module (cont’d)
  • 13. Analysis Example: Descriptive/Comparative Using Longitudinal Data
  • Accessing Data
    • 14a. Files in SPSS
    • 15a. Frequencies in SPSS
Overview
• Purpose
• Modifying existing variables
• Creating new variables
• Summary
• Closing
• Important information
NLTS2 restricted-use data
NLTS2 data are restricted. Data used in these presentations are from a randomly selected subset of the restricted-use NLTS2 data. Results in these presentations cannot be replicated with the NLTS2 data licensed by NCES.
Purpose
• Learn to
  • Modify an existing variable
  • Create a new variable
  • Join/combine data from different sources
Modifying existing variables
• How to modify a variable
  • In SPSS, you must create a new variable to
    • Collapse categories
    • Break a continuous variable into categories
    • Recode a variable
• Note about created variables in the NLTS2 database
  • Our analyses were done in SAS, where this recoding step is usually unnecessary because of the external formats feature.
  • Collapsed or recategorized variables do not necessarily exist in the SAS or SPSS files, even if these items appear in published tables.
  • The NLTS2 database contains many created variables, but most of them are not simply collapsed versions of an existing variable.
These results cannot be replicated with the full dataset; all output in these modules was generated with a random subset of the full data.
Modifying existing variables
• Syntax to recode into collapsed categories
RECODE np1B2a
  (MISSING=SYSMIS)
  (Lowest thru 1=1)
  (2 thru 5=2)
  (6 thru 10=3)
  (11 thru Highest=4)
  INTO np1B2a_Cat.
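To check the RECODE rules by hand, here is a minimal Python sketch of the same logic (an illustration, not SPSS syntax). It assumes missing values arrive as None and mirrors two RECODE behaviors: value specifications are checked left to right and the first match wins, and a value that matches no range stays system-missing in a new target variable. The helper name recode_age is hypothetical.

```python
from typing import Optional

# (low, high, new_code) in the order written in the RECODE command.
# None for low/high plays the role of SPSS's Lowest/Highest keywords.
RULES = [
    (None, 1, 1),    # (Lowest thru 1=1)
    (2, 5, 2),       # (2 thru 5=2)
    (6, 10, 3),      # (6 thru 10=3)
    (11, None, 4),   # (11 thru Highest=4)
]


def recode_age(value: Optional[float]) -> Optional[int]:
    """Hypothetical helper: collapse np1B2a into np1B2a_Cat codes 1-4; None = missing."""
    if value is None:  # (MISSING=SYSMIS) is listed first, so it is checked first
        return None
    for low, high, code in RULES:
        if (low is None or value >= low) and (high is None or value <= high):
            return code  # first matching range wins
    return None  # no range matched (e.g., 1.5): stays system-missing


print([recode_age(v) for v in [0, 1, 4, 10, 15, None]])  # [1, 1, 2, 3, 4, None]
```

Because Lowest thru 1 would also capture negative codes, listing (MISSING=SYSMIS) first only protects codes that are declared as user-missing in the file; check the data dictionary for any codes that are not.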
Modifying existing variables
• Syntax to assign a variable label to the new variable
* Assign variable label to new categorical variable.
VARIABLE LABELS np1B2a_Cat '(np1B2a_cat) Age of youth when diagnosed categorized'.
EXECUTE.
• Syntax to assign value labels
* Assign value labels to new categorical variable.
VALUE LABELS np1B2a_Cat
  1 "(1) 1 or younger"
  2 "(2) 2 to 5 years of age"
  3 "(3) 6 to 10 years of age"
  4 "(4) 11 or older".
Modifying existing variables
• Menu
  • Transform: Recode into Different Variables
  • Select the variable to be recoded from the list and click the right-facing arrow.
  • Give the new variable a name in the box under “Output Variable.”
  • Assign a label to the new variable in the “Label” box under “Output Variable.”
  • Click “Change.”
  • Click the “Old and New Values” button; a new dialog box opens.
  • In the new box, under “Old Values,” click the radio button “System- or user-missing”; under “New Values,” click “System-missing”; then click “Add” next to “Old --> New.”
Modifying existing variables
• Menu (cont’d)
  • For each old-to-new value mapping
    • Under “Old Values,” click the radio button next to the box for a single value or for a range of values.
    • Enter the old value, or the range of old values, in that box.
    • Assign a new code under “New Values” and click “Add.”
  • When finished with values, click “Continue” to return to the first box.
  • In the original box, click “OK” to run the recode, or “Paste” to generate the syntax.
Modifying existing variables
• Look at results.
  • The new variable should appear at the bottom of “Variable View.”
  • Specify value labels so the values are meaningful.
    • In Variable View, click the cell in the “Values” column to bring up a new box.
    • Enter a value in the “Value” box and a label for that value in the “Label” box, then click “Add.”
    • Do this for every value.
  • Look at the frequency distribution.
  • It is useful to look at a crosstab of the original variable by the new variable.
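The frequency and crosstab checks can be sketched in Python with collections.Counter. This is an illustration of what to look for, not SPSS output; the sample values are made up and None stands for missing. A correct collapse maps each original value to exactly one new code, which is what the crosstab lets you confirm.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def frequencies(values: Sequence[Hashable]) -> Counter:
    """One-way frequency table; None counts as its own (missing) category."""
    return Counter(values)


def crosstab(original: Sequence[Hashable], new: Sequence[Hashable]) -> Counter:
    """Count (original, new) pairs, like a crosstab of the original by the new variable."""
    return Counter(zip(original, new))


def one_code_per_value(original: Sequence[Hashable], new: Sequence[Hashable]) -> bool:
    """True if every original value landed in a single new category."""
    seen: Dict[Hashable, Hashable] = {}
    for old, cat in zip(original, new):
        if seen.setdefault(old, cat) != cat:
            return False
    return True


# Made-up example: ages at diagnosis and their collapsed codes.
ages = [0, 1, 3, 3, 8, 12, None]
codes = [1, 1, 2, 2, 3, 4, None]
print(frequencies(codes))
print(crosstab(ages, codes))
print(one_code_per_value(ages, codes))
```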
Modifying existing variables: Example
• Modifying a variable
  • Open the Wave 3 parent/youth interview file.
  • Collapse np3NbrProbs into a new variable with these categories:
    • 0-1
    • 2
    • 3
    • 4-6
  • Remember to
    • Label the variable
    • Add value labels
    • Account for missing values
    • Paste your code.
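One possible answer to the exercise, sketched in Python so the mapping can be checked before writing the SPSS syntax. The new codes 1 to 4, their labels, and the helper name are assumptions; only the ranges come from the exercise. Following the np1B2a RECODE pattern, the SPSS value specifications would be (MISSING=SYSMIS) (0 thru 1=1) (2=2) (3=3) (4 thru 6=4).

```python
from typing import Optional

# Assumed new codes and labels for the collapsed np3NbrProbs categories.
VALUE_LABELS = {1: "(1) 0-1", 2: "(2) 2", 3: "(3) 3", 4: "(4) 4-6"}


def collapse_nbr_probs(value: Optional[int]) -> Optional[int]:
    """Hypothetical helper: collapse np3NbrProbs into 0-1, 2, 3, 4-6; missing or out-of-range -> None."""
    if value is None:
        return None
    if 0 <= value <= 1:
        return 1
    if value in (2, 3):
        return value  # 2 -> code 2, 3 -> code 3
    if 4 <= value <= 6:
        return 4
    return None


for v in [0, 2, 5, None]:
    code = collapse_nbr_probs(v)
    print(v, code, VALUE_LABELS.get(code, "missing"))
```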
Creating new variables
• How to create a new variable
  • The values in the new variable can be the results of calculations, assignments, or logic.
  • A new variable can be created from an existing variable or from multiple variables, including variables from other sources and/or waves.
  • Variables from other sources/waves must be added to the active data file before the new variable is created.
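Adding a variable from another wave means matching cases by an ID. The Python sketch below shows the idea with dictionaries keyed by a hypothetical case ID; in SPSS this step is usually done with Data: Merge Files: Add Variables (MATCH FILES in syntax). Cases in the active file that have no match in the other file end up missing, which is one reason joined data tend to have more missing values.

```python
from typing import Dict, Optional

Record = Dict[str, Optional[int]]


def add_variable(active: Dict[str, Record], other: Dict[str, Optional[int]], name: str) -> None:
    """Attach `name` from another file to each case in the active file, matched by ID.

    Active cases with no match get None (missing). Cases found only in the other
    file are not added, so the active file keeps its own set of cases.
    """
    for case_id, record in active.items():
        record[name] = other.get(case_id)


# Hypothetical IDs and values: a Wave 4 active file and a Wave 1 variable.
wave4: Dict[str, Record] = {
    "A1": {"np4P8_J4": 1},
    "A2": {"np4P8_J4": 0},
    "A3": {"np4P8_J4": None},
}
wave1_np1F7 = {"A1": 0, "A3": 1, "B9": 1}

add_variable(wave4, wave1_np1F7, "np1F7")
print(wave4)
```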
Creating new variables
• Be aware of any coding differences between the variables when combining values.
• Decide what to do with missing values.
• Example: Create a variable using parent interview data from Waves 1, 2, 3, and 4.
  • Has a student been suspended and/or expelled in any wave?
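A minimal Python sketch of harmonizing codes before combining waves. The codings are hypothetical (one wave using 1 = Yes/2 = No, another using 1 = Yes/0 = No); the real codes must come from the NLTS2 data dictionaries. Unrecognized codes become missing so they cannot silently fall through the combining logic.

```python
from typing import Dict, Optional

# Hypothetical codings; check the data dictionaries for the real ones.
WAVE_A_CODES: Dict[int, int] = {1: 1, 2: 0}  # 1 = Yes, 2 = No -> 1/0
WAVE_B_CODES: Dict[int, int] = {1: 1, 0: 0}  # already 1 = Yes, 0 = No


def harmonize(value: Optional[int], mapping: Dict[int, int]) -> Optional[int]:
    """Map a wave-specific code onto a common 1 = Yes / 0 = No scheme; anything else -> None."""
    if value is None:
        return None
    return mapping.get(value)


print(harmonize(2, WAVE_A_CODES), harmonize(0, WAVE_B_CODES), harmonize(-1, WAVE_A_CODES))
```

Without this step, a No coded as 2 would satisfy neither =0 nor =1 in IF conditions like those in the suspension example, so the case could quietly end up with no value.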
Creating new variables
• Syntax
IF (np1D7h=0 and np2D5d=0 and np3D5d=0 and np4D5d=0) np4D5d_ever=0.
IF (np1D7h=1 or np2D5d=1 or np3D5d=1 or np4D5d=1) np4D5d_ever=1.
IF (np1D7h=1 and np2D5d=1 and np3D5d=1 and np4D5d=1) np4D5d_ever=2.
IF (MISSING(np1D7h) or MISSING(np2D5d) or MISSING(np3D5d) or MISSING(np4D5d))
  np4D5d_ever=-999.
* Declare -999 as user-missing so it is excluded from valid percentages.
MISSING VALUES np4D5d_ever (-999).
EXECUTE.
• This code will result in a variable that
  • Requires a value for every wave (a missing value in any wave gives -999)
  • Is 0 if never suspended/expelled
  • Is 1 if suspended/expelled in any wave
  • Is 2 if suspended/expelled in all four waves.
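Because each IF runs in order and a missing operand makes a comparison missing, the result for a given case can be hard to predict by eye. The Python sketch below replays the four IF statements under SPSS's rules for missing values in logical expressions: OR is true if any part is true, AND is false if any part is false, anything else is missing, and IF assigns only when its condition is true. None stands for missing, and the helper names are hypothetical.

```python
from typing import List, Optional, Sequence

Tri = Optional[bool]  # True, False, or None (missing)


def eq(value: Optional[int], target: int) -> Tri:
    """Comparison that returns missing (None) when the operand is missing."""
    return None if value is None else value == target


def or_(results: Sequence[Tri]) -> Tri:
    """True if any part is true, False if all are false, otherwise missing."""
    if any(r is True for r in results):
        return True
    if all(r is False for r in results):
        return False
    return None


def and_(results: Sequence[Tri]) -> Tri:
    """False if any part is false, True if all are true, otherwise missing."""
    if any(r is False for r in results):
        return False
    if all(r is True for r in results):
        return True
    return None


def ever_suspended(waves: List[Optional[int]]) -> Optional[int]:
    """Hypothetical helper: replay the four IF statements in order for one case (Waves 1-4)."""
    result: Optional[int] = None  # the new variable starts system-missing
    if and_([eq(v, 0) for v in waves]) is True:
        result = 0
    if or_([eq(v, 1) for v in waves]) is True:
        result = 1
    if and_([eq(v, 1) for v in waves]) is True:
        result = 2
    if any(v is None for v in waves):  # MISSING(...) or MISSING(...) ...
        result = -999
    return result


for case in ([0, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 1], [1, None, 0, 0], [0, 2, 0, 0]):
    print(case, ever_suspended(case))
```

The last case shows why coding differences matter: a code that is neither 0, 1, nor missing (here an undeclared 2) triggers none of the IF statements, so the case stays system-missing rather than getting -999.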
Creating new variables
• Menu
  • Transform: Compute
  • Enter a variable name under “Target Variable.”
  • Click “Type & Label” and assign a label.
  • If applicable, find and select the source variable(s) and click the right-facing arrow to move each variable name into the “Numeric Expression” box.
  • Enter functions/operations from the keypad or select them from the list of functions.
  • For logical conditions, click “If…” and build the condition in the pop-up box.
  • Click “OK” or “Paste.”
• For multiple conditions (i.e., if-then-else), repeat all steps for each condition.
  • Specify the conditions in order of precedence, with the overriding condition last: when a later condition is true, its assignment overwrites the value set by an earlier one.
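The ordering rule can be seen in a small Python sketch: the same two conditions applied in opposite orders give different answers for a case that meets both. The helper names are hypothetical; each rule pairs a condition with the value assigned when it is true, as one “If…” step would.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Values = Sequence[Optional[int]]
Rule = Tuple[Callable[[Values], bool], int]


def apply_in_order(values: Values, rules: List[Rule]) -> Optional[int]:
    """Run each rule in turn; a later true condition overwrites an earlier assignment."""
    result: Optional[int] = None  # the new variable starts out missing
    for condition, new_value in rules:
        if condition(values):
            result = new_value
    return result


def any_equals(code: int) -> Callable[[Values], bool]:
    """Condition that is true when any non-missing value equals `code`."""
    return lambda values: any(v is not None and v == code for v in values)


set_zero: Rule = (any_equals(0), 0)
set_one: Rule = (any_equals(1), 1)

case = [0, 1, None]
print(apply_in_order(case, [set_zero, set_one]))  # 1: the overriding rule runs last
print(apply_in_order(case, [set_one, set_zero]))  # 0: the order is reversed
```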
Creating new variables: Example
• Creating a new variable
  • Open the Wave 4 parent/youth interview file.
  • Bring in np1F7 from the Wave 1, np2P8_J4 from the Wave 2, and np3P8_J4 from the Wave 3 interview files.
  • Create a new variable np4P8_J4_ever (ever done volunteer or community service).
    • Initialize the value to “0” if any value in np1F7, np2P8_J4, np3P8_J4, or np4P8_J4 is “0.”
    • Reassign it to “1” if any value in np1F7, np2P8_J4, np3P8_J4, or np4P8_J4 is “1.”
Creating new variables: Example
• Creating a new variable (cont’d)
  • Assign a variable label and value labels.
  • Run a frequency of np4P8_J4_ever.
  • Run a crosstabulation of np4P8_J4_ever by np4P8_J4.
Creating new variables: Example
• Code for example
IF (np1F7=0 or np2P8_J4=0 or np3P8_J4=0 or np4P8_J4=0) np4P8_J4_ever=0.
IF (np1F7=1 or np2P8_J4=1 or np3P8_J4=1 or np4P8_J4=1) np4P8_J4_ever=1.
EXECUTE.
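To see what the two IF statements produce, and what the frequency and crosstab requested in this example should reveal, here is a Python sketch with made-up cases (None = missing). It mirrors the two assignments in order; a case with no 0 or 1 in any wave stays missing. The helper name is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def ever_volunteered(waves: Sequence[Optional[int]]) -> Optional[int]:
    """Hypothetical helper: set 0 if any wave is 0, then 1 if any wave is 1."""
    result: Optional[int] = None  # stays missing if no wave has a 0 or a 1
    if any(v == 0 for v in waves):
        result = 0
    if any(v == 1 for v in waves):
        result = 1  # the second IF overrides the first
    return result


# Made-up cases: (np1F7, np2P8_J4, np3P8_J4, np4P8_J4); None = missing.
cases = [
    (0, 0, 0, 0),
    (0, None, 1, 0),
    (None, None, None, 1),
    (None, None, None, None),
    (0, None, None, None),
]
ever = [ever_volunteered(c) for c in cases]

frequency = Counter(ever)                                # frequency of np4P8_J4_ever
table = Counter((e, c[3]) for e, c in zip(ever, cases))  # np4P8_J4_ever by np4P8_J4

print(ever)
print(frequency)
print(table)
```

In the crosstab, no case should show np4P8_J4_ever = 0 alongside np4P8_J4 = 1; cases with np4P8_J4_ever = 1 and np4P8_J4 = 0 are expected, because the service was reported in an earlier wave.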
Summary
• Be aware of differences in coding between similar variables when building composite variables.
• Missing values must be considered.
  • Know how missing values are coded, particularly when using more than one variable to create another.
  • Joined data are more likely to have missing values.
• Weights
  • Generally, when combining data, the analysis weight should be the weight from the smallest sample.
  • When filling in values for a variable in an active file with values from another file, it is OK to use the weight in the active file.
• Strongly recommended: Paste your code when creating variables.
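As a rough illustration of why the weight choice matters, here is a Python sketch of a weighted percentage. It assumes the chosen weight is missing (None) for cases outside the sample it belongs to, so those cases drop out along with cases that have a missing value; the function name is hypothetical. It gives a point estimate only; standard errors for the NLTS2 complex sample design need design-based procedures such as those in the Complex Samples module.

```python
from typing import Optional, Sequence


def weighted_percent(
    values: Sequence[Optional[int]],
    weights: Sequence[Optional[float]],
    code: int,
) -> float:
    """Hypothetical helper: weighted percentage of valid cases whose value equals `code`.

    Cases with a missing value, or a missing or zero weight, are dropped, so
    cases outside the sample that the weight belongs to do not contribute.
    """
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None and w]
    total = sum(w for _, w in pairs)
    if total == 0:
        raise ValueError("no cases with both a valid value and a nonzero weight")
    return 100.0 * sum(w for v, w in pairs if v == code) / total


print(weighted_percent([1, 0, 1, None], [2.0, 2.0, None, 5.0], 1))  # 50.0
```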
Summary
Know the values, mind the missing, and watch your weights!
Closing
• Topics discussed in this module
  • Modifying existing variables
  • Creating new variables
  • Summary
• Next module
  • 18a. Complex Samples Procedures in SPSS
Important information
• The NLTS2 website contains reports, data tables, and other project-related information: http://nlts2.org/
• Information about obtaining the NLTS2 database and documentation can be found on the NCES website: http://nces.ed.gov/statprog/rudman/
• General information about restricted data licenses can be found on the NCES website: http://nces.ed.gov/statprog/instruct.asp
• E-mail address: nlts2@sri.com