Pet Fish & High Cholesterol: An Analysis

Pet Fish and High Cholesterol in the WHI OS: An Analysis Example • Joe Larson, 5/6/09

The Question • In the WHI Observational Study, are women with pet fish less likely to have ever taken pills for high cholesterol at baseline?

What We Want to Do • Find the data • Download the appropriate zip files • Load them into SAS • Merge our sets together • Do a basic Chi-Square test
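
Where we are headed, as a rough sketch: once the three files are loaded, the merge and the Chi-Square test could look something like the SAS code below. The dataset names (demographics, form30, form37) and the ID and OSFLAG variables are the ones used later in this walkthrough; pet_fish and chol_pills are hypothetical placeholders, so look up the real variable names in the Form 37 and Form 30 data dictionaries before running anything like this.

```sas
/* Sketch only: pet_fish and chol_pills are hypothetical variable names. */

/* MERGE with a BY statement requires each dataset to be sorted by the key. */
proc sort data=demographics; by id; run;
proc sort data=form30;       by id; run;
proc sort data=form37;       by id; run;

data analysis;
    merge demographics (keep=id osflag where=(osflag=1) in=in_os)
          form30       (keep=id chol_pills)
          form37       (keep=id pet_fish);
    by id;
    if in_os;   /* keep only participants flagged as OS */
run;

/* Cross-tabulate pet fish by cholesterol pills and request the Chi-Square test. */
proc freq data=analysis;
    tables pet_fish * chol_pills / chisq;
run;
```

Because the fish item is a "Mark all that apply" sub-question, it is worth checking how missing values are coded before interpreting the table.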

A Few Notes: • The data files used for this example are subsets of the full form data. • This was done to reduce download time and ease the replication of this analysis • All processes we will go through are identical to what you would do for a normal analysis

Finding the Data • The first step is to figure out what we need to answer our question. We will need: • Pet data • Cholesterol data • Demographic data (to help us select only women in the observational study)

First we want to go to the study operations web site: www.whiops.org

Select the Study Operations Link

Click on the “Data” Tab

The Data Screen

The Data Screen • Data available for both WHI and WHIMS • Images of all forms • Options to look for dictionaries by category • Link to the Data Distribution Agreement - Anyone who uses the data should fill it out - PIs are responsible for data at the clinics

Let’s Look for Our Data • First, let’s hunt for the fish data. Since we don’t know what form it’s on, let’s click on the ‘Data dictionaries by analysis category’ link.

Where Would Fish Be? • Let’s take a look in the Psychosocial/Behavioral subcategory

Since there are 216 variables, it will be easier to right click on the document and search for “fish”

Searching for Fish

Found It!

The Fish Variable is on Form 37 - We should also note that it is a sub-question of ‘Do you have a pet’ and is a “Mark all that apply” question!

Now Let’s Find High Cholesterol • Going back to the ‘Data Dictionaries by Category’ screen, it will be in the Medical History section

Medical History is Broken Up into Subcategories • It should be under Cardiovascular

It looks like it is on Form 30

Now We Just Need an Indicator to Tell Us Which Participants Are in the OS • All trial flags and indicators are in the Demographics file

Now We’re Ready to Download the Data!

Back to the Data Screen • Click on ‘Datasets’

The Datasets Screen

An Aside: The Datasets Page • All data is arranged by form • In addition to the zip files with the data, the .pdf files of the data dictionaries can also be downloaded separately • For more detailed info on what’s in a zip file, please see the Appendix at the end of the walkthrough

Downloading the Data • For the purposes of this demo, smaller sets have been created that anyone with a WHI password can download • Normally, only PIs can download the actual data files • Scroll down to the bottom of the Datasets page to find these files

WHI Example Files for Downloading

Downloading the Data • When you click on the zip file link, you get a pop up box • Save the file in the directory of your choice

Downloading Data • For my example, I’ve saved all of the data in a directory I created called “DataTraining”

Extracting the Data from the Zip Files • Double-click on the first zip file, the demographics file; you should then be able to see its contents • Click on the ‘Extract’ button

Extracting the Data from the Zip Files • Extract the files to the same directory as your zip files

Extracting the Data from the Zip Files • Repeat with the other two zip files. • The resulting directory should look like this:

Analyzing the Data • We now have everything we need to look at the data • For the purposes of this example, I’m going to use SAS • Other software, such as S-Plus, Stata, R, or SPSS, can also be used • Even if you use another program, the SAS load code provided can be used to determine the order of the variables in the dataset as well as their formats

Loading in the Data • From the Default SAS screen, go up to the File menu and select ‘Open Program’

Loading in the Data • Select all three of the files and click ‘Open’

Loading in the Data • Let’s start with the demographics data • One change needs to be made to each file to let SAS know where the data is located • Find where the actual file is being read in; this is the line in the file that begins with INFILE • We can also change the name of the file we are creating in the line above the INFILE statement

Loading in the Data • In the example, we’ve put the data in ‘S:\DataTraining’ • I’ve also renamed the file ‘demographics’ instead of the default, which was ‘dem_ctos_train’
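
As a rough illustration, the two edited lines might look like the sketch below. Only the ‘S:\DataTraining’ directory and the two dataset names come from this example; the data file name, the INFILE options, and the short INPUT list are assumptions standing in for whatever the supplied load program actually contains, so leave the rest of that program as it is.

```sas
/* Sketch only: the file name, INFILE options, and INPUT list are assumed. */
data demographics;                                /* default was: data dem_ctos_train; */
    infile 'S:\DataTraining\dem_ctos_train.dat'   /* change the path to your own directory */
           dlm='09'x dsd firstobs=2 truncover;    /* assumed: tab-delimited with a header row */
    input id osflag;                              /* assumed: the real INPUT list is longer */
run;
```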

Loading in the Data • Now that the location of the datafile has been updated, we can run the SAS Code • Go to the ‘running man’ icon, which is the button to submit code

Loading in the Data • If you are concerned or unsure whether it worked or not, you can look at the SAS log. The tab is at the bottom of the screen. • Any errors would show up as RED in the log

SAS Log for Loading in Demographics

Loading in the Data • Now we want to repeat the process for the other two files. • First for Form 30

Loading in the Data • Then for Form 37

Looking at What We Have • Let’s make a new SAS program file to look at the data • Go to the File Menu and select ‘New Program’

Looking at What We Have • We can also now close the three files used to load the data into SAS • You should now have your new program, the log, and the output tabs

Looking at What We Have • To know the names of the files we’ve loaded we can use some PROC DATASETS code.
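
A minimal version of that code, assuming the three files were loaded into the default WORK library (which is where a one-level name such as ‘demographics’ ends up):

```sas
/* List the datasets currently stored in the WORK library. */
proc datasets library=work memtype=data;
quit;
```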

Looking at What We Have • Once the code is typed in, click the submit button again and then go to the LOG tab

Looking at What We Have • In the log we see the three files we’ve loaded: - DEMOGRAPHICS (The Demographics File) - FORM30 (The Cholesterol Data) - FORM37 (The Pet Fish Data) • Now we need to do some data manipulation to pull this all together

The Demographics File • Let’s look at the demographics file (DEMOGRAPHICS) first • PROC CONTENTS can be used to determine what variables are in a file • Highlight the code and then hit the submit button
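
A minimal sketch of the PROC CONTENTS call, assuming the dataset was named ‘demographics’ when it was loaded:

```sas
/* Print the variable names, types, and labels in the demographics dataset. */
proc contents data=demographics;
run;
```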

The Demographics File • On the output screen we see what variables are available • We only want to keep OS participants, so we will need the OSFLAG variable, which has a value of 1 for participants in the observational study • We also want to keep the ID variable for merging the files later

The Demographics File • Let’s Look at the Code to do this: • We are manipulating the ‘demographics’ file and creating a new file ‘demographics_2’ with our changes • We only want to keep the ID and OSFLAG variables
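
The code from the slide is not reproduced in this text, so the following is only a sketch consistent with the description above. Filtering on OSFLAG in this same step is an assumption; the selection could equally be done later, at the merge.

```sas
/* Create demographics_2 from demographics, keeping only OS participants. */
data demographics_2;
    set demographics;
    if osflag = 1;      /* subsetting IF: OSFLAG is 1 for OS participants */
    keep id osflag;     /* ID is needed later for merging */
run;

/* Quick check: every remaining row should have OSFLAG = 1. */
proc freq data=demographics_2;
    tables osflag / missing;
run;
```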

The Form 30 File • This is our medical history data • Looking at the data dictionary, we see that this file is a baseline file with one row per participant

The Form 30 File • Let’s look at the contents of the Form 30 file
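
A minimal sketch, assuming the dataset was named ‘form30’ when it was loaded and that it carries the same ID variable used for merging. The second step is an optional check that the file really has one row per participant: the two counts should match.

```sas
/* List the variables available in the Form 30 dataset. */
proc contents data=form30;
run;

/* Compare the number of rows with the number of distinct IDs. */
proc sql;
    select count(*)           as n_rows,
           count(distinct id) as n_ids
    from form30;
quit;
```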
