Investigating the correlation between pet fish ownership and use of high cholesterol medication in women from the WHI Observational Study, using SAS software for data analysis.
Pet Fish and High Cholesterol in the WHI OS: An Analysis Example Joe Larson 5 / 6 / 09
The Question • In the WHI Observational Study, are women with pet fish less likely to have ever taken pills for high cholesterol at baseline?
What We Want to Do • Find the data • Download the appropriate zip files • Load them into SAS • Merge our sets together • Do a basic Chi-Square test
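As a rough preview of the last two steps, the sketch below merges three small data sets by ID and runs a chi-square test with PROC FREQ. Everything in it is an assumption made for illustration: the data set names (demo_toy, pets_toy, chol_toy), the variable names fish and chol_pills, and the rows themselves, which are made up and are not WHI data. Only ID and OSFLAG are names used in the walkthrough.

```sas
/* Made-up toy rows for illustration only; these are NOT WHI data. */
data demo_toy;                      /* hypothetical stand-in for the demographics file */
    input id osflag @@;             /* @@ reads several observations per line */
    datalines;
1 1  2 1  3 0  4 1  5 1  6 1  7 1  8 1
;
run;

data pets_toy;                      /* hypothetical stand-in for the pet data */
    input id fish @@;               /* fish: 1 = has pet fish, 0 = does not */
    datalines;
1 1  2 0  3 1  4 0  5 1  6 0  7 1  8 0
;
run;

data chol_toy;                      /* hypothetical stand-in for the cholesterol data */
    input id chol_pills @@;         /* chol_pills: 1 = ever took pills, 0 = never */
    datalines;
1 0  2 1  3 1  4 1  5 0  6 0  7 1  8 0
;
run;

/* A BY merge needs every input sorted by the BY variable. */
proc sort data=demo_toy; by id; run;
proc sort data=pets_toy; by id; run;
proc sort data=chol_toy; by id; run;

/* One row per participant, keeping only observational study members. */
data analysis_toy;
    merge demo_toy pets_toy chol_toy;
    by id;
    if osflag = 1;
run;

/* 2x2 table of fish ownership by cholesterol pills, with chi-square statistics. */
proc freq data=analysis_toy;
    tables fish*chol_pills / chisq;
run;
```

With so few rows SAS will warn that the expected cell counts are too small for the chi-square test to be reliable. The point here is only the shape of the merge and the test.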
A Few Notes: • The data files used for this example are subsets of the full form data. • This was done to reduce download time and ease the replication of this analysis • All processes we will go through are identical to what you would do for a normal analysis
Finding the Data • The first step is to figure out what we need to answer our question. We will need: • Pet data • Cholesterol data • Demographic data (to help us select only women in the observational study)
First we want to go to the study operations web site: www.whiops.org
The Data Screen • Data available for both WHI and WHIMS • Images of all forms • Options to look for dictionaries by category • Link to the Data Distribution Agreement - Anyone who uses the data should fill it out - PIs are responsible for data at the clinics
Let’s Look for Our Data • First, let’s hunt for the fish data. Since we don’t know what form it’s on, let’s click on the ‘Data dictionaries by analysis category’ link.
Where Would Fish Be? • Let’s take a look in the Psychosocial/Behavioral subcategory
Since there are 216 variables, it will be easier to right click on the document and search for “fish”
The Fish Variable is on Form 37 - We should also note that it is a sub-question of “Do you have a pet” and is a “Mark all that apply” question!
Now Let’s Find High Cholesterol • Going back to the ‘Data Dictionaries by Category’ screen, it will be in the Medical History section
Medical History is Broken Up into Subcategories • It should be under Cardiovascular
Now We Just Need an Indicator to Tell Us Which Participants Are in the OS • All trial flags and indicators are in the Demographics file
Now We’re Ready to Download the Data!
Back to the Data Screen • Click on ‘Datasets’
An Aside: The Datasets Page • All data is arranged by form • In addition to the zip files with the data, the .pdf files of the data dictionaries can also be downloaded separately • For more detailed info on what’s in a zip file, please see the Appendix at the end of the walkthrough
Downloading the Data • For the purposes of this demo, smaller sets have been created that anyone with a WHI password can download • Normally, only PIs can download the actual data files • Scroll down to the bottom of the Datasets page to find these files
Downloading the Data • When you click on the zip file link, you get a pop up box • Save the file in the directory of your choice
Downloading Data • For my example, I’ve saved all of the data in a directory I created called “DataTraining”
Extracting the Data from the Zip Files • Double click on the first zip file, the demographics file, and you should be able to see its contents • Click on the ‘Extract’ button
Extracting the Data from the Zip Files • Extract the files to the same directory as your zip files
Extracting the Data from the Zip Files • Repeat with the other two zip files. • The resulting directory should look like this:
Analyzing the Data • We now have everything we need to look at the data • For the purposes of this example, I’m going to use SAS • Other software such as S-Plus, Stata, R, SPSS, and others can also be used • Even if using another program, the SAS Load code provided can be used to determine the order of variables in the dataset as well as formats
Loading in the Data • From the Default SAS screen, go up to the File menu and select ‘Open Program’
Loading in the Data • Select all three of the files and click ‘Open’
Loading in the Data • Let’s start with the demographics data • One change needs to be made to each file to let SAS know where the data is located • Find where the actual file is being read in; this is the line in the file that begins with INFILE • We can also change the name of the file we are creating, which is set in the line above the INFILE statement
Loading in the Data • In the example, we’ve put the data in ‘S:\DataTraining’ • I’ve also renamed the file ‘demographics’ instead of the default, which was ‘dem_ctos_train’
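The two edits described above might look like the sketch below. The data file name and the INPUT list are hypothetical placeholders, since the walkthrough does not show them. In the supplied load program, leave the INFILE options and the INPUT statement as they are and change only the data set name and the path.

```sas
/* Sketch only: the two lines to edit in the supplied load program.        */
/* The data file name and the INPUT list are hypothetical placeholders;    */
/* keep the INFILE options and INPUT statement of the real program as is.  */
data demographics;                                 /* default was dem_ctos_train */
    infile 'S:\DataTraining\dem_ctos_train.dat';   /* hypothetical file name     */
    input id osflag;                               /* placeholder variable list  */
run;
```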
Loading in the Data • Now that the location of the datafile has been updated, we can run the SAS Code • Go to the ‘running man’ icon, which is the button to submit code
Loading in the Data • If you are unsure whether it worked, you can look at the SAS log. The tab is at the bottom of the screen. • Any errors show up in red in the log
Loading in the Data • Now we want to repeat the process for the other two files. • First for Form 30
Loading in the Data • Then for Form 37
Looking at What We Have • Let’s make a new SAS program file to look at the data • Go to the File Menu and select ‘New Program’
Looking at What We Have • We can also now close the three files used to load the data into SAS • You should now have your new program, the log, and the output tabs
Looking at What We Have • To know the names of the files we’ve loaded we can use some PROC DATASETS code.
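A minimal sketch of such code, assuming the three files were loaded with one-level names and therefore sit in the default WORK library:

```sas
/* List the members of the WORK library.                                  */
/* In this walkthrough the listing is read from the log; depending on the */
/* SAS version it may appear in the output window instead.                */
proc datasets library=work;
run;
quit;
```

PROC DATASETS is interactive, so the QUIT statement is what ends the procedure.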
Looking at What We Have • Once the code is typed in, click the submit button again and then go to the LOG tab
Looking at What We Have • In the log we see the three files we’ve loaded: - DEMOGRAPHICS (The Demographics File) - FORM30 (The Cholesterol Data) - FORM37 (The Pet Fish Data) • Now we need to do some data manipulation to pull this all together
The Demographics File • Let’s look at the demographics file (DEMOGRAPHICS) first • PROC CONTENTS can be used to determine what variables are in a file • Highlight the code and then hit the submit button
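A minimal sketch of the PROC CONTENTS call, assuming the data set was named demographics when it was loaded:

```sas
/* Print the variable names, types, lengths, and labels of the data set. */
proc contents data=demographics;
run;
```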
The Demographics File • On the output screen we see what variables are available • We only want to keep OS participants, so we will need the OSFLAG variable, which has a value of 1 for participants in the observational study • We also want to keep the ID variable for merging the files later
The Demographics File • Let’s look at the code to do this: • We are manipulating the ‘demographics’ file and creating a new file ‘demographics_2’ with our changes • We only want to keep OS participants (OSFLAG = 1) and only the ID and OSFLAG variables
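One way to write this step is sketched below. It uses only the names given above (demographics, demographics_2, ID, OSFLAG), but it is a sketch and the exact statements on the original slide may differ.

```sas
/* Create demographics_2 from demographics, keeping only OS participants. */
data demographics_2;
    set demographics;
    if osflag = 1;      /* subsetting IF: rows with any other value are dropped */
    keep id osflag;     /* only these two variables are written out */
run;
```

The log reports how many observations were read and how many were written, which is a quick check that the filter did something.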
The Form 30 File • This is our medical history data • Looking at the data dictionary, we see that this file is a baseline file with one row per participant
The Form 30 File • Let’s look at the contents of the Form 30 file
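A short sketch for this, assuming the file was loaded as FORM30. The VARNUM option is an addition here, not something the walkthrough specifies. It lists the variables in the order they are stored instead of alphabetically, which makes it easier to compare the output against the data dictionary.

```sas
/* Show the Form 30 variables in their stored order. */
proc contents data=form30 varnum;
run;
```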