17b. Accessing Data: Manipulating Variables in SAS®
Prerequisites • Recommended modules to complete before viewing this module • 1. Introduction to the NLTS2 Training Modules • 2. NLTS2 Study Overview • 3. NLTS2 Study Design and Sampling • NLTS2 Data Sources, either • 4. Parent and Youth Surveys or • 5. School Surveys, Student Assessments, and Transcripts • NLTS2 Documentation • 10. Overview • 11. Data Dictionaries • 12. Quick References
Prerequisites • Recommended modules to complete before viewing this module (cont’d) • 13. Analysis Example: Descriptive/Comparative Using Longitudinal Data • Accessing Data • 14b. Files in SAS • 15b. Frequencies in SAS
Overview • Purpose • Modifying existing variables • Creating new variables • Summary • Closing • Important information
NLTS2 restricted-use data NLTS2 data are restricted. Data used in these presentations are from a randomly selected subset of the restricted-use NLTS2 data. Results in these presentations cannot be replicated with the NLTS2 data licensed by NCES.
Purpose • Learn to • Modify an existing variable • Create a new variable • Join/combine data from different sources
Modifying existing variables • How to modify a variable. • To collapse categories, break a continuous variable into categories, or recode a variable, it is not always necessary to create a new variable in SAS. • User-assigned formats control how output prints but do not change the variable. • Syntax for categorizing an existing variable with a format:
    PROC FORMAT ;
      VALUE b2catfmt
        low-1   = "(<=1) 1 or younger"
        2-5     = "(2-5) 2 to 5 years of age"
        6-10    = "(6-10) 6 to 10 years of age"
        11-high = "(>=11) 11 or older"
      ;
    RUN ;
    PROC FREQ data = collapse ;
      TABLES np1B2a ;
      FORMAT np1B2a b2catfmt. ;
    RUN ;
These results cannot be replicated with the full dataset; all output in these modules was generated with a random subset of the full data.
Modifying existing variables • Syntax to modify an existing variable • Create a new variable rather than permanently changing the existing variable. • Create a new format so values are meaningful.
    PROC FORMAT ;
      VALUE b2catfmt
        1 = "(1) 1 or younger"
        2 = "(2) 2 to 5 years of age"
        3 = "(3) 6 to 10 years of age"
        4 = "(4) 11 or older"
      ;
    RUN ;
• Recode the variable in a data step. • This would result in a temporary change. Why? What would make it a permanent change?
    DATA collapse ;
      SET sasdb.n2w1parent ;
      /* the DATA step continues with the recode statements on the next slide */
Modifying existing variables • Syntax to recode an existing variable into a new variable with value and variable labels.
      /* create age of youth when diagnosed - with age range categories */
      /* missing values are carried through to the new variable unchanged */
      if missing(np1B2a) then np1B2a_Cat = np1B2a ;
      else if np1B2a <= 1 then np1B2a_Cat = 1 ;
      else if 2 <= np1B2a <= 5 then np1B2a_Cat = 2 ;
      else if 6 <= np1B2a <= 10 then np1B2a_Cat = 3 ;
      else if np1B2a > 10 then np1B2a_Cat = 4 ;
      FORMAT np1B2a_Cat b2catfmt. ;
      LABEL np1B2a_Cat = '(np1B2a_cat) Age of youth when diagnosed - categorized into ranges' ;
    RUN ;
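On the temporary-versus-permanent question: a data set with a one-level name such as collapse is written to the WORK library, which SAS clears when the session ends, and the source file sasdb.n2w1parent is never altered. The sketch below shows one way the recoded file could be kept, assuming you have a folder you can write to. The libref mylib, the path, and the output data set name are hypothetical, and sasdb is assumed to be assigned already as in the slides.

```sas
/* Hypothetical libref and path; point this at a folder you can write to. */
LIBNAME mylib "C:\nlts2\mydata" ;

/* A two-level name (libref.dataset) saves the result in mylib, not WORK. */
DATA mylib.n2w1parent_recode ;
  SET sasdb.n2w1parent ;
  /* same recode as on the slide; missing values carried through unchanged */
  if missing(np1B2a) then np1B2a_Cat = np1B2a ;
  else if np1B2a <= 1 then np1B2a_Cat = 1 ;
  else if 2 <= np1B2a <= 5 then np1B2a_Cat = 2 ;
  else if 6 <= np1B2a <= 10 then np1B2a_Cat = 3 ;
  else if np1B2a > 10 then np1B2a_Cat = 4 ;
  LABEL np1B2a_Cat = '(np1B2a_cat) Age of youth when diagnosed - categorized into ranges' ;
RUN ;
```

The FORMAT statement is left out here on purpose: a format created by PROC FORMAT without the LIBRARY= option is stored in the WORK library and is itself temporary, so it would need to be re-created (or saved to a permanent catalog) before being attached in a later session.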
Modifying existing variables • Look at results • Run a frequency of the new variable. • It is useful to look at a crosstab of the original variable by the new variable to check how values were coded. • The “LIST” option on the TABLES statement prints the crosstab more compactly. • A FORMAT statement without a format specified strips existing formats.
    PROC FREQ data = collapse ;
      TABLES np1B2a_Cat * np1B2a / MISSPRINT LIST ;
      FORMAT np1B2a ;
    RUN ;
Modifying existing variables: Example • Modifying a variable • Use Wave 3 parent/youth interview file • Collapse np3NbrProbs into a new variable • 0-1 • 2 • 3 • 4-6 • Remember to • Label the variable. • Add value formats. • Account for missing values.
Modifying existing variables: Example • PROC FREQ with a user-defined format (no change made to np3NbrProbs)
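A minimal sketch of the format-only approach for this exercise, assuming the Wave 3 file is sasdb.n2w3paryouth (the name used in the merge later in this module) and that the sasdb libref is already assigned. The format name nbrprobfmt and the label wording are hypothetical.

```sas
/* Group np3NbrProbs for display only; the stored values are not changed. */
PROC FORMAT ;
  VALUE nbrprobfmt            /* hypothetical format name */
    0-1 = "(0-1) 0 or 1"
    2   = "(2) 2"
    3   = "(3) 3"
    4-6 = "(4-6) 4 to 6"
  ;
RUN ;

PROC FREQ data = sasdb.n2w3paryouth ;
  TABLES np3NbrProbs ;
  FORMAT np3NbrProbs nbrprobfmt. ;   /* PROC FREQ tabulates by formatted value */
RUN ;
```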
Modifying existing variables: Example • PROC FREQ with new variable np3NbrProbs_Cat created from np3NbrProbs
Modifying existing variables: Example • Created np3NbrProbs_Cat compared with original np3NbrProbs • Stripped existing formats from np3NbrProbs with format statement • FORMAT np3NbrProbs;
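The new-variable version of the exercise and the check against the original could be sketched as follows. This mirrors the np1B2a recode earlier in the module rather than reproducing the presenters' exact code: the category codes 1 to 4, the format name nbrcatfmt, the output data set name, and the label text are all assumptions, and sasdb is assumed to be assigned.

```sas
PROC FORMAT ;
  VALUE nbrcatfmt             /* hypothetical format name */
    1 = "(1) 0 or 1"
    2 = "(2) 2"
    3 = "(3) 3"
    4 = "(4) 4 to 6"
  ;
RUN ;

DATA collapse ;
  SET sasdb.n2w3paryouth (keep=ID np3NbrProbs) ;
  /* account for missing values: carry them through unchanged */
  if missing(np3NbrProbs) then np3NbrProbs_Cat = np3NbrProbs ;
  else if 0 <= np3NbrProbs <= 1 then np3NbrProbs_Cat = 1 ;
  else if np3NbrProbs = 2 then np3NbrProbs_Cat = 2 ;
  else if np3NbrProbs = 3 then np3NbrProbs_Cat = 3 ;
  else if 4 <= np3NbrProbs <= 6 then np3NbrProbs_Cat = 4 ;
  FORMAT np3NbrProbs_Cat nbrcatfmt. ;
  LABEL np3NbrProbs_Cat = '(np3NbrProbs_Cat) np3NbrProbs collapsed into four categories' ;
RUN ;

/* Check the recode: new variable by the unformatted original. */
PROC FREQ data = collapse ;
  TABLES np3NbrProbs_Cat ;
  TABLES np3NbrProbs_Cat * np3NbrProbs / MISSPRINT LIST ;
  FORMAT np3NbrProbs ;        /* strip any existing format from the original */
RUN ;
```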
Creating new variables • How to create a new variable. • The values in the new variable can be the results of calculations, assignments, or logic. • A new variable can be created from an existing variable or from multiple variables, including variables from other sources and/or waves. • Variables from other sources/waves must be added to the active data file before creating the new variable.
Creating new variables • Be aware of any coding differences between the variables when combining values. • Decide what to do with missing values. • Example: Create a variable using parent interview data from Waves 1 through 4. • Has student been suspended and/or expelled in any wave?
Creating new variables • Create a format for the new variable and join the data needed.
    PROC FORMAT ;
      VALUE fmta
        0 = "(0) Never suspended/expelled"
        1 = "(1) Suspended or expelled in any wave"
        2 = "(2) Suspended or expelled every wave"
      ;
    RUN ;
    DATA collapse ;
      MERGE sasdb.n2w1parent   (keep=ID np1d7h)
            sasdb.n2w2paryouth (keep=ID np2d5d)
            sasdb.n2w3paryouth (keep=ID np3d5d)
            sasdb.n2w4paryouth (keep=ID np4d5d) ;
      BY ID ;
Creating new variables • Syntax
      if np1D7h>=0 and np2D5d>=0 and np3D5d>=0 and np4D5d>=0 then do ;
        if np1D7h=1 and np2D5d=1 and np3D5d=1 and np4D5d=1 then np4D5d_ever = 2 ;
        else if np1D7h=1 or np2D5d=1 or np3D5d=1 or np4D5d=1 then np4D5d_ever = 1 ;
        else np4D5d_ever = 0 ;
      end ;
      FORMAT np4D5d_ever fmta. ;
    RUN ;
• Code will result in a variable that • Requires a value for every wave • Is 0 if never suspended/expelled • Is 1 if suspended/expelled in at least one wave but not all • Is 2 if suspended/expelled in every wave.
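The "requires a value for every wave" behavior follows from how SAS compares missing values: a missing numeric value is treated as smaller than any number, so a test such as np1D7h>=0 is false when the value is missing and the DO block is skipped. The toy example below illustrates this with made-up data; the data set demo and the variables w1 to w4 and ever are hypothetical stand-ins for the NLTS2 variables.

```sas
/* Made-up records: one value per wave, "." is missing. */
DATA demo ;
  INPUT ID w1 w2 w3 w4 ;
  if w1>=0 and w2>=0 and w3>=0 and w4>=0 then do ;
    if w1=1 and w2=1 and w3=1 and w4=1 then ever = 2 ;
    else if w1=1 or w2=1 or w3=1 or w4=1 then ever = 1 ;
    else ever = 0 ;
  end ;
  DATALINES ;
1 0 0 0 0
2 0 1 0 0
3 1 1 1 1
4 1 . 0 0
;
RUN ;

PROC PRINT data = demo ;
RUN ;
```

IDs 1, 2, and 3 get ever = 0, 1, and 2. ID 4 reported a suspension in the first wave but has a missing value in the second, so ever stays missing. Whether such a case should instead count as "ever suspended" is the kind of missing-value decision to make before coding.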
Creating new variables: Example • Creating a new variable • Use the Wave 4 parent/youth interview file. • Bring in np1F7 from Wave 1, np2P8_J4 from Wave 2, and np3P8_J4 from Wave 3 interview files. • Create a new variable np4P8_J4_ever (ever done volunteer or community service). • Initialize value to “0” if any value in np1F7, np2P8_J4, np3P8_J4, or np4P8_J4 is “0.” • Reassign to “1” if any value in np1F7, np2P8_J4, np3P8_J4, or np4P8_J4 is “1.”
Creating new variables: Example • Creating a new variable (cont’d) • Assign a variable label and value labels. • Run a frequency of np4P8_J4_ever. • Run a crosstabulation of np4P8_J4_ever by np4P8_J4.
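One way the steps of this exercise might be coded is sketched below. It assumes the file names from the suspension example (sasdb.n2w1parent through sasdb.n2w4paryouth), that each file is sorted by ID, and that the sasdb libref is already assigned. The format name everfmt, the data set name volunteer, the label text, and the choice to keep only youth present in the Wave 4 file (the in= flag) are assumptions, not the presenters' code.

```sas
PROC FORMAT ;
  VALUE everfmt               /* hypothetical format name */
    0 = "(0) No"
    1 = "(1) Yes"
  ;
RUN ;

DATA volunteer ;
  MERGE sasdb.n2w4paryouth (keep=ID np4P8_J4 in=inw4)
        sasdb.n2w1parent   (keep=ID np1F7)
        sasdb.n2w2paryouth (keep=ID np2P8_J4)
        sasdb.n2w3paryouth (keep=ID np3P8_J4) ;
  BY ID ;
  if inw4 ;                   /* Wave 4 is the active file: keep its records only */
  /* initialize to 0 if any wave has a "no" */
  if np1F7=0 or np2P8_J4=0 or np3P8_J4=0 or np4P8_J4=0 then np4P8_J4_ever = 0 ;
  /* reassign to 1 if any wave has a "yes" */
  if np1F7=1 or np2P8_J4=1 or np3P8_J4=1 or np4P8_J4=1 then np4P8_J4_ever = 1 ;
  FORMAT np4P8_J4_ever everfmt. ;
  LABEL np4P8_J4_ever = '(np4P8_J4_ever) Ever done volunteer or community service' ;
RUN ;

PROC FREQ data = volunteer ;
  TABLES np4P8_J4_ever ;
  TABLES np4P8_J4_ever * np4P8_J4 / MISSPRINT LIST ;
  FORMAT np4P8_J4 ;           /* show the original values unformatted */
RUN ;
```

Unlike the suspension example, this logic does not require a value in every wave: the new variable is missing only when none of the four source variables is 0 or 1.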
Summary • Be aware of differences in coding between similar variables when building composite variables. • Missing values must be considered. • Know how missing values are being coded, particularly when using more than one variable to create another. • Joined data are more likely to have missing values. • Weights • Generally, the analysis weight would be the weight from the smallest sample when combining data. • When filling in values for a variable in an active file with values from another, it is OK to use the weight in the active file.
Summary Know the values, mind the missing, and watch your weights!
Closing • Topics discussed in this module • Modifying existing variables • Creating new variables • Summary • Next module: • 18b. PROC SURVEY Procedures in SAS
Important information • NLTS2 website contains reports, data tables, and other project-related information http://nlts2.org/ • Information about obtaining the NLTS2 database and documentation can be found on the NCES website http://nces.ed.gov/statprog/rudman/ • General information about restricted data licenses can be found on the NCES website http://nces.ed.gov/statprog/instruct.asp • E-mail address: nlts2@sri.com