
Using the 2008 OFHS Public Use File A Self Guided Tutorial *SAS Version*


Presentation Transcript


  1. Using the 2008 OFHS Public Use File A Self Guided Tutorial *SAS Version*

  2. Introduction • This tutorial is intended for persons who wish to use the 2008 OFHS Public Use File (PUF). • The PUF excludes any information that could identify a respondent, whether intentionally or unintentionally. Geographic information below the county level has been removed. • The dataset is a record of the responses to the survey questions at the respondent level. • The dataset is in a format that requires SAS, a statistical analysis software package from SAS Institute. • The dataset is also available for Stata and SPSS. There is a separate tutorial for Stata users.

  3. SAS Users • Prerequisites • User has SAS version 9.1 or later. • User has experience writing SAS programs and running them in the SAS Display Manager user interface, or in SAS Enterprise Guide. • User has an understanding of basic statistics, including analysis of univariate data using nominal and ordinal level variables. • User is comfortable with statistical terms such as proportions, standard error, confidence level, and confidence interval.

  4. OFHS Background • The 2008 OFHS is the largest state-sponsored health survey in the U.S. • Previous surveys were completed in 1998 and 2004. • The survey had a sample size of 50,993. • The sample was stratified so that there are enough respondents to support some analysis for each county in the state.

  5. Documents that you may download before you get started. • OFHS Questionnaire • OFHS Codebook • OFHS Methods Report These documents are available on the OFHS web site. http://grc.osu.edu/ofhs Look on the Reports page.

  6. What you need to know about the survey. • Survey Design • Survey Questions • Imputation of Missing Values • Weighting of Responses • Constructed Variables

  7. Survey Design • The survey is a stratified random sample of Ohio’s non-institutional population. • Conducted through telephone interviews. • Land Lines (49,000 respondents) • Cell Phone (2,000 respondents) • Random Digit Dialing (land lines) within exchange numbers associated with each county. • Exchanges are the first 3 digits of a seven digit phone number. • The last four digits within each exchange are randomly selected.

  8. Survey Design • Cell Phones • Exchanges are at state level. • Over Samples • African Americans - Some exchanges in the 6 largest urban counties have a higher proportion of African Americans in the population. These exchanges were sampled at a higher rate. • Asians and Hispanics - The survey was supplemented with lists of persons with Hispanic or Asian surnames. • Household clusters • Each household/family forms a cluster within the sample. • One adult and one child are randomly selected within the family. • Each response includes information on the adult, and on the child (if there are any children). • The adult who is most knowledgeable about the child’s health responds for the child.

  9. Survey Design • The population of persons within each of the strata (State, County, telephone exchange, household, etc.) is already known or is collected as a part of the survey. • A weight is established for each child and adult which reflects the inverse of the probability of being selected for the survey. • Indicators of the strata and the weights are used in the SAS programs. We will come back to this later on.

  10. Survey Questions • In the survey questionnaire there are different kinds of questions. They include: • Qs that help to establish the weights for the survey. • How many children are in the family? • How many phone numbers are in the home?

  11. Survey Questions • Qs that identify the demographic and socioeconomic characteristics of the individuals and the family. • Age, gender, race, ethnicity. • Family income, employment, industry. • Education

  12. Survey Questions • Qs that identify the insurance status of the adult and child respondents. • Source of Coverage (Job based, Medicare, Medicaid, etc.) • If no insurance, the length of time without insurance and the reason for being uninsured. • If insured, length of time covered by current plan. • Types of Coverage (dental, prescriptions, vision, mental health)

  13. Survey Questions • Health Status of Adult and Child • General health status • Chronic health conditions • Special Health Care needs • Functional disability • Height and weight

  14. Survey Questions • Health Care Access, Utilization, Satisfaction and Unmet needs. • Usual source of care • Care coordination • Specialists • Emergency room use • Hospitalizations • Types of unmet needs.

  15. Survey Questions • Questions are at multiple levels. • Anchor Questions are questions that are asked of everyone. • Qualifying Questions are questions that help to narrow down who should be responding to an in-depth question. • In-depth questions probe the dimensions of the respondent’s experience with a particular phenomenon.

  16. Example of Question levels • D43. //Have you/Has person in S1// ever been told by a doctor or any other health professional that //you/he// had diabetes or sugar diabetes? • 01 YES • 02 (Skip to D45) NO • 03 [VOLUNTEERED:] BORDERLINE • 98 DK • 99 REFUSED • D43a. //Have you/Has person in S1// ever been told by a doctor or any other health professional that //you/he/she// had TYPE 1 CHILD ONSET DIABETES or TYPE 2 ADULT ONSET DIABETES? [INTERVIEWER NOTE: PROBE FOR TYPE, AND IF RESPONDENT SAYS ‘BORDERLINE’ CODE AS ‘03’] //Display response option 97, only if S15 = 02, 99.// • 97 (Skip to D45) [VOLUNTEERED:] YES, “GESTATIONAL” OR “ONLY WHEN PREGNANT” MENTIONED • 01 YES - TYPE I (JUVENILE) • 02 YES - TYPE II (ADULT ONSET) • 03 [VOLUNTEERED:] BORDERLINE DIAGNOSIS ONLY • 04 (Skip to D45) NO, NEVER DIAGNOSED WITH DIABETES • 98 (Skip to D45) DK • 99 (Skip to D45) REFUSED • Slide label: Anchor Question

  17. Example of Question levels • D43b. //If (s15 = 02) then ask:// • //Was your/Was person in S1’s// DIABETES only during a time associated with a pregnancy? [INTERVIEWER: PROBE FOR PROPER CODE] • 01 (Skip to D45) YES ONLY WHEN PREGNANT • 02 NO • 98 (Skip to D45) DK • 99 (Skip to D45) REFUSED • D44. //Is your/Is person in S1’s// blood sugar or glucose level, which affects diabetes, USUALLY under control or where a physician wants it, even if medication is required: Always, Usually, Sometimes, Rarely, or Never? • 01 ALWAYS • 02 USUALLY • 03 SOMETIMES • 04 RARELY • 05 NEVER • 98 DK • 99 REFUSED • Slide labels: Qualifying Question, In-Depth Question

  18. Question levels • Notice in the example that some answers, such as “no”, carry an instruction to skip to another question. • Anchor questions and qualifying questions screen out persons who should not answer the in-depth questions. • As a result, when a question is not asked of a respondent, that respondent has a missing value for the question, which is MISSING BY DESIGN.

  19. Missing Values • Some data are missing in the survey because the respondent refused to answer the question, or did not know the answer. • These kinds of missing values need to be treated differently than those that are ‘missing by design’.

  20. Missing Values • There are some types of questions which are very important to the survey design or for public policy issues, for which it is not acceptable to have values missing. • These include questions like: • Number of children in the family (design) • Family Income (public policy)

  21. Imputation of Missing Values • Where it is important for the survey to not have any missing values, the survey statisticians have replaced each missing value with an imputed one, derived from other respondents who answered the remaining survey questions the way this respondent did. • Survey statisticians use very sophisticated models and processes to do imputation, and the practice is well accepted. • When using this survey for analysis, the user is expected to decide whether to use the form of a variable that includes the imputed values. • Imputed variables have a suffix of “_imp”.
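One way to see which variables carry imputed values is to list every variable whose name ends in “_imp”. The following is a minimal sketch, not part of the original tutorial; it assumes the libref OFHS has already been assigned to the directory that holds OFHS2008.sas7bdat, and the work dataset name ofhs_vars is made up for illustration.

```sas
/* Sketch: list the variables in the PUF whose names end in "_imp".     */
/* Assumes the OFHS libref already points at the unzipped dataset.      */
/* WORK.OFHS_VARS is a hypothetical scratch dataset name.               */
proc contents data=ofhs.ofhs2008
              out=work.ofhs_vars(keep=name label) noprint;
run;

/* Keep only names whose last four characters are "_imp" (any case). */
proc print data=work.ofhs_vars noobs;
  where upcase(substr(name, max(1, length(name) - 3))) = '_IMP';
run;
```

The OFHS Codebook remains the authoritative description of each variable; this listing only shows which names follow the suffix rule.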

  22. Weighting • A weight is constructed for each adult and child response. It reflects the inverse of the probability of being selected for the survey and should be used in all analyses. • When the weights are used, the results accurately reflect the entire population.

  23. Weighting • If the weights for children in the OFHS were summed up across all responses, the total would be equal to the child population of Ohio. The same is true of the adult weights. • The variable name for the adult weight is “wt_a”. • The variable name for the child weight is “wt_c”.
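A quick way to check the statement about weight totals is to sum the two weight variables over all responses. This is a minimal sketch under the assumption that the libref OFHS is already assigned; it is a sanity check only, not a survey estimate.

```sas
/* Sketch: sum the adult and child weights across all responses.        */
/* Assumes the OFHS libref already points at the unzipped dataset.      */
/* N counts non-missing weights; SUM should approximate the adult and   */
/* child populations of Ohio, as described on the slide.                */
proc means data=ofhs.ofhs2008 n sum maxdec=0;
  var wt_a wt_c;
run;
```

Proc Means is used here only because the check is a plain total of the weights. Estimates and their standard errors should still come from the survey procedures.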

  24. Constructed Variables • There are many variables in the OFHS file that are constructed from the responses to the survey questions that make it easier to use the OFHS. These variables include: • BMI – Body mass index. BMI is an indicator of adult and child obesity constructed from height and weight. The formula is complicated, especially for children. We make it easier for the user to do analysis of obesity by pre-calculating it.

  25. Constructed Variables • Insurance Type – In many instances, respondents to the survey had more than one source of insurance. For example, many seniors have insurance from their private pension plans and Medicare. For the purpose of creating an unduplicated count of the population by their insurance status, we have created a variable which imposes a hierarchy of insurance sources to classify the population.

  26. Using SAS with the OFHS • Step 1. Make your PC Ready. • Step 2. Download and Un-zip the SAS dataset. • Step 3. Assign a SAS Library name and restore SAS formats. • Step 4. Build and run your first OFHS SAS Program

  27. Make Your PC Ready • Create a directory for the OFHS Public Use File. It should look like this: C:\sasdata\ofhs2008 • Make sure that you have software to decompress the SAS dataset. WinZip is a popular product which works well for this. • Make sure there is enough room on the drive for the OFHS file after it is unzipped. You will need at least 800 megabytes of storage space. You will need additional temporary work space for when the file is processing. You may want to put the file on a separate drive from the drive which houses the temporary work space (typically Drive C).

  28. Download and Unzip the SAS dataset. • You will find the OFHS Public Use Dataset at: http://grc.osu.edu/ofhs/datadownloads/index.htm • Right click on the file name and select ‘save target as’. • Save the ZIP file to the directory where you will store the data (C:\sasdata\ofhs2008). • After the file has been saved, run WinZip and save the unzipped files to the same directory.

  29. Download and Unzip the SAS dataset • After you download and unzip the data, the directory will contain the following files: • Formats.sas7bdat • Restore_formats.sas • OFHS2008.sas7bdat

  30. Assign a SAS Library name and restore SAS formats • First, you must start SAS or SAS Enterprise Guide. • Open the Restore_Formats.sas in the program editor window.

  31. Assign a SAS Library name and restore SAS formats • The Restore_formats file will look like this: /* This program creates a formats catalog from an existing formats dataset, formats.sas7bdat, which should be in the current directory. The resulting formats catalog file will be created in the current directory. */ LIBNAME ofhs 'D:\final_data_delivery_021109'; libname library 'D:\SASFORMATS'; proc format library=library cntlin=library.formats; run; • You need to change the LIBNAME ofhs statement to reflect the drive and directory location of the files that you unzipped. • The libname library statement must point to the directory that holds Formats.sas7bdat, because the cntlin= option reads the dataset library.formats. • You can now ‘submit’ or ‘run’ the restore formats program.
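As an illustration, the edited program might look like the sketch below. It assumes the files were unzipped to the example directory used earlier in this tutorial (C:\sasdata\ofhs2008); substitute your own path. The closing Proc Contents step is an optional addition, not part of Restore_formats.sas.

```sas
/* Sketch: Restore_formats.sas edited for the tutorial's example path.  */
/* Assumption: all three unzipped files are in C:\sasdata\ofhs2008.     */
libname ofhs    'C:\sasdata\ofhs2008';
libname library 'C:\sasdata\ofhs2008';

/* Build the formats catalog from the formats dataset (Formats.sas7bdat). */
proc format library=library cntlin=library.formats;
run;

/* Optional check: list the variables and the formats attached to them. */
proc contents data=ofhs.ofhs2008;
run;
```

Using the libref named LIBRARY is convenient because SAS searches LIBRARY.FORMATS by default when it looks up user-defined formats.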

  32. Build and run your first OFHS SAS Program • You should only use SAS procedures that support complex survey designs. These include: • Proc Surveymeans • Proc Surveyfreq • Proc Surveylogistic • Proc Surveyreg • Most newcomers will use Proc Surveymeans to start out. If you are familiar with Surveylogistic or Surveyreg, you probably do not need this tutorial.

  33. Proc Surveymeans • Here is a simple program which calculates the percent of children by Insurance Type, with a 95% confidence interval around each estimate. • Note the Stratum and Weight statements, which describe the complex sampling design to SAS. The Stratum variable will always stay the same. There are different weights for children (WT_C) and adults (WT_A). • Proc Surveymeans data=ofhs.ofhs2008 ALPHA=.05 nobs mean CLM SUMWGT CLSUM; Stratum STRATUM; Weight WT_c; Var i_type_c; Class i_type_c; run;
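The same pattern applies to adults once the weight is switched to WT_A. The sketch below is an illustration, not part of the original slides. It assumes the OFHS libref is assigned and the formats are restored, and it uses INSRD_A, the adult variable that appears in the Proc Surveyfreq examples of this tutorial (presumably adult insured status; check the Codebook).

```sas
/* Sketch: adult analogue of the child insurance example.               */
/* Assumes the OFHS libref is assigned and formats have been restored.  */
proc surveymeans data=ofhs.ofhs2008 alpha=.05 nobs mean clm sumwgt clsum;
  strata stratum;   /* same stratum variable as in the child example */
  weight wt_a;      /* adult weight instead of WT_C                  */
  class insrd_a;    /* categorical: one estimate per level           */
  var insrd_a;
run;
```

Only the Weight, Class, and Var statements change between the child and adult versions; the design statements stay the same.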

  34. Proc Surveymeans results (with a little cutting and pasting and formatting of values)

  35. Proc Surveymeans • Now you might add some domain analysis to this, breaking out insurance status for children by poverty level. The Data step creates the domain variable FPL200 in a new dataset, and Proc Surveymeans must read that new dataset. • DATA ofhs.children; SET OFHS.OFHS2008; if h87_imp in ('01','02','03','04') then FPL200='0 to 200% FPL'; else fpl200='201+ % FPL'; run; Proc Surveymeans data=ofhs.children ALPHA=.05 nobs mean CLM SUMWGT CLSUM; Stratum STRATUM; Weight WT_c; Var i_type_c; Class i_type_c; Domain fpl200; run;

  36. Surveymeans with a Domain Statement

  37. Proc Surveyfreq • proc surveyfreq data=ofhs.ofhs2008; stratum stratum; Tables h87_imp*insrd_a / alpha=.05 cl clwt deff; weight WT_A; run;

  38. Results of Proc Surveyfreq

  39. Domain Analysis in Proc Surveyfreq • There is no Domain statement in Proc Surveyfreq. Instead, add the domain variable(s) to the front of the table request in the Tables statement. • proc surveyfreq data=ofhs.ofhs2008; stratum stratum; Tables h87_imp*s15_imp*insrd_a / alpha=.05 cl clwt deff; weight WT_A; run;

  40. Domain Analysis in Proc Surveyfreq

  41. The END
