
Biostat 201: Winter 2011


  1. Biostat 201: Winter 2011 Lab Session 1 Week 1 and Week 2

  2. Introduction • Wendy Shih, wendyshi@ucla.edu • Office Hours: • Tues 2-3pm or by appointment • A1-228 or Biostat Consulting Room (two doors to the left of the Lab)

  3. Access to SAS/STATA • In the lab: login=sph, password=hello • One-year SAS student license • Check with your department • www.softwarecentral.ucla.edu • Computers/laptops at UCLA library • TLC lab at Biomed library • STATA Only • shortcut.clicc.ucla.edu

  4. Typical lab session • 4 assignments total • Brief (very brief!) overview of the assignment • Introduce statistical tools/methods that may be helpful with accompanying SAS/STATA code fragments • Further discussion (time permitting) • Go analyze!

  5. Some additional notes • Both SAS and STATA code will be introduced, but you only need to know how to use one (so use whichever is more familiar to you) • Code will not be given to you in electronic format • You might want to bring a USB drive or have another way to save your documents • No raw outputs from SAS or STATA. All submitted results must be formatted.

  6. Please Do NOT Paste Raw Outputs
. tabstat dage, by(grad) stat(n mean semean min max)

Summary for variables: dage
     by categories of: grad (Center Grade)

    grad |         N      mean  se(mean)       min       max
---------+--------------------------------------------------
excellen |        36  29.13889  1.993702        18        68
    good |        36  30.27778  1.581446        18        60
    fair |        36  37.13889  1.792911        18        55
    poor |        36  37.97222  1.853134        19        69
---------+--------------------------------------------------
   Total |       144  33.63194  .9552307        18        69
------------------------------------------------------------

The MEANS Procedure

Analysis Variable : dage

         N      N
grad   Obs   Miss          Mean     Std Error       Minimum       Maximum
-------------------------------------------------------------------------------------------
1       36      0    29.1388889     1.9937015    18.0000000    68.0000000
2       36      0    30.2777778     1.5814455    18.0000000    60.0000000
3       36      0    37.1388889     1.7929105    18.0000000    55.0000000
4       36      0    37.9722222     1.8531338    19.0000000    69.0000000
-------------------------------------------------------------------------------------------

  7. Formatted Results Table 1: Summary Statistics for Donor Age (Years) by Center Grades

  8. The assignments • All four assignments are reports, not problem sets • Introduction • Methods • Results • Can be submitted via e-mail as a Microsoft Word file • E-mail: wendyshi@ucla.edu • Subject: Biostat 201 W10 hw# Last First • Filename: Biostat 201 W10 hw# Last First • ex: Biostat 201 hw1 Shih Wendy

  9. Assignment grades • Graded on a 0.0 – 4.0 scale • 0.0 to 1.9: major errors / misunderstandings • 2.0 to 2.5: a few major or multiple minor errors • 2.6 to 3.0: a few minor errors • 3.1 to 3.5: good/excellent job • 3.6 to 4.0: very impressive!

  10. Assignment expectations • Brief • 2.5-3.5 pages (with tables and figures), 12pt, double-spaced is often sufficient • Complete • Requested analyses were performed and properly interpreted • Logical • Has an easy-to-follow flow • Easy to see how the analyses guided each step of the investigation • No ambiguity about what you were thinking

  11. Common pitfalls • Lack of explanation • Why are you doing what you are doing? • Example: • We run a multiple linear regression. (why?) • We run a multiple linear regression to evaluate the association between crime rate and depression while adjusting for socioeconomic factors. (ah, that’s better!)

  12. Common pitfalls • Lack of interpretation • On what basis are you making your claims? • Example: • There is a significant difference between the SAT scores of UCLA and USC freshmen. (what makes you say this?) • The two-sample t-test result indicates that the mean SAT scores of UCLA and USC freshmen differ significantly (p=0.0032), with UCLA students having an average SAT score that is 220 points higher than that of USC students. (note: method used, measure used, statistical significance, magnitude, direction)
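As a sketch of where each piece of such a sentence comes from, the block below runs a two-sample t-test in Stata and pulls the group means, their difference, and the p-value from the results `ttest` stores in `r()`. It assumes Stata's built-in `auto` dataset (mpg compared between domestic and foreign cars) purely as stand-in data; these are not the SAT data from the example.

```stata
* Illustrative only: Stata's built-in auto data stand in for the SAT example.
sysuse auto, clear

* Method: pooled two-sample t-test of mpg between domestic (foreign == 0)
* and foreign (foreign == 1) cars
ttest mpg, by(foreign)

* Measure, magnitude, and direction: group 1 is the lower value of foreign (domestic)
display "Mean mpg, domestic:       " %6.2f r(mu_1)
display "Mean mpg, foreign:        " %6.2f r(mu_2)
display "Difference (dom. - for.): " %6.2f (r(mu_1) - r(mu_2))

* Statistical significance: two-sided p-value
display "Two-sided p-value:        " %6.4f r(p)
```

The sign of the difference gives the direction; a negative value here means the second group (foreign cars) has the higher mean.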

  13. Common pitfalls • Lack of follow-up • How exactly did your findings guide you in your investigation? • Example: • A scatterplot of SAT score vs. GPA suggests a positive linear relationship among males, but a negative linear relationship among females. (How does this finding influence your analysis?) • A scatterplot of SAT score vs. GPA suggests a positive linear relationship among males, but a negative linear relationship among females. Therefore, the association between SAT score and GPA was evaluated separately for males and females.

  14. Questions to ask yourself • What are you investigating? • What analytical method are you using to investigate it? • What do the results of that analysis tell you? • How do those results guide your subsequent analyses, or what conclusions do you draw from it?

  15. SAS/STATA code key • I will use the following convention in these slides: • statements: bold • keywords: italics • options: underlined • Variables, or something you specify yourself: courier font

  16. Assignment 1

  17. What do we need to do? • Import data • Summary statistics and plots • Choose and specify a model • Investigate if the model is appropriate • Predicted mean differences for covariate profiles • Conduct and interpret the model results

  18. SAS: Importing data • http://www.ats.ucla.edu/stat/sas/faq/rwxls8.htm • http://www.ats.ucla.edu/stat/sas/faq/read_delim.htm • Can use the Import Wizard: File > Import Data… • proc import out=dataset datafile="directory_of_excel_file" dbms=excel replace; sheet="sheet_name"; run;

  19. SAS: Importing data • http://www.ats.ucla.edu/stat/sas/faq/rwxls8.htm • http://www.ats.ucla.edu/stat/sas/faq/read_delim.htm • Can use the Import Wizard: File > Import Data… • proc import out=hdldata datafile="C:\SAS\data\hdltable.csv" dbms=csv replace; run; • Note: the SHEET statement applies only to Excel files, so it is left out when importing a CSV file.

  20. STATA: Importing data • http://www.ats.ucla.edu/stat/stata/faq/readcommatab.htm • cd "directory_of_csv_file" • insheet using file_name

  21. Example: Kidney Data • SAS • proc import datafile="G:\TA - Biostat 201 Winter 2011\KIDNEY.csv" out=kidney dbms=csv replace; run; • STATA • cd "G:\TA - Biostat 201 Winter 2011" • insheet using "KIDNEY.csv"
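A quick round trip shows the `insheet` step end to end and the checks worth running after any import. This is a minimal sketch that assumes Stata's built-in `auto` data and a hypothetical file name, auto_check.csv, in the current working directory. The `clear` option lets `insheet` replace data already in memory even if they have unsaved changes, and `nolabel` writes the numeric codes of labeled variables (such as foreign) rather than their value labels.

```stata
* Illustrative round trip with Stata's built-in auto data;
* auto_check.csv is a hypothetical file name in the working directory.
sysuse auto, clear
outsheet using "auto_check.csv", comma nolabel replace

* clear: allow insheet to replace the data currently in memory
insheet using "auto_check.csv", clear

* Quick checks after any import: variable names/types and value ranges
describe, short
summarize mpg price foreign
```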

  22. SAS: Summary statistics • proc means data=dataset [options]; var var1 var2 var3; run; • proc means data=dataset [options]; class grpvar; var var1 var2 var3; run; • proc univariate data=dataset; var var1 var2 var3; run;

  23. SAS: Summary statistics • proc means data=kidney nmiss mean stderr min max; var dage cith; run; • proc means data=kidney nmiss mean stderr min max; class grad; var dage cith; run; • proc univariate data=kidney; var dage cith; run; • proc univariate data=kidney; class grad; var dage cith; run;

  24. STATA: Summary statistics • summarize var1 var2 • bysort grpvar: summarize var1 var2 • summarize var1 var2, detail • sum dage cith • sum dage cith, detail • bysort grad: sum dage cith, detail
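For by-group summaries destined for a formatted table like Table 1, `tabstat` (the command behind the raw Stata output on slide 6) puts all requested statistics in one compact display. The sketch below assumes Stata's built-in `auto` data, with mpg by foreign standing in for dage by grad. The `save` option also stores the numbers as matrices, so they can be listed and copied into the report without retyping.

```stata
* Illustrative only: mpg by foreign stands in for dage by grad.
sysuse auto, clear
tabstat mpg, by(foreign) stat(n mean semean min max) save

* save stores one matrix per group (r(Stat1), r(Stat2), ...) plus r(StatTotal)
matrix overall = r(StatTotal)
matrix list overall
```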

  25. SAS: Bivariate statistics (continuous variables) • proc ttest data=dataset; class grpvar; var var1 var2 var3; run; • proc npar1way data=dataset; class grpvar; var var1 var2 var3; run;

  26. SAS: Bivariate statistics (continuous variables) • proc ttest data=kidney; class cens; var cith; run; • proc npar1way data=kidney; class cens; var cith; run;

  27. STATA: Bivariate statistics (continuous variables) • ttest var1, by(grpvar) • kwallis var1, by(grpvar) • ttest cith, by(cens) • ttest cith, by(cens) unequal • kwallis cith, by(cens)
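Slide 27 lists both the pooled `ttest` and the `unequal` (Satterthwaite) version. One way to inform that choice is to compare the group standard deviations with `sdtest`. Because its F test is sensitive to non-normality, `robvar` (Levene-type tests) is a common alternative. The sketch assumes Stata's built-in `auto` data, with mpg by foreign standing in for cith by cens. Treat the variance test as one input to the decision, alongside the group SDs themselves.

```stata
* Illustrative only: mpg by foreign stands in for cith by cens.
sysuse auto, clear

* Variance-ratio F test of H0: equal SDs in the two groups
sdtest mpg, by(foreign)

* Levene-type alternatives that are less sensitive to non-normality
robvar mpg, by(foreign)

* Both versions of the t-test, for comparison
ttest mpg, by(foreign)
ttest mpg, by(foreign) unequal
```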

  28. SAS: Plots • proc gplot data=dataset; plot yvar * xvar = grpvar; run; quit; • proc gplot data=kidney; plot dage*cith=cens; run; quit;

  29. STATA: Plots • twoway (scatter yvar xvar if grpvar==value, mcolor(color)) • twoway (scatter dage cith if cens==0, ms(o) mcolor(red)) (lfit dage cith if cens==0, clcolor(red)) (scatter dage cith if cens==1, ms(o) mcolor(blue)) (lfit dage cith if cens==1, clcolor(blue)), legend(off)
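Since reports are submitted as Word files, plots need to be saved as image files rather than pasted from the Graph window. A minimal sketch, assuming Stata's built-in `auto` data (mpg vs. weight by foreign, standing in for dage vs. cith by cens) and a hypothetical output file name. It follows the slide 29 pattern, adds a legend so readers can tell the two groups apart, and saves the graph with `graph export`. Run it from a do-file, because the `///` line continuations do not work when typed interactively. If PNG export is unavailable on your system, `help graph export` lists the other formats.

```stata
* Illustrative only: mpg vs. weight by foreign stands in for dage vs. cith by cens.
* mpg_weight.png is a hypothetical output file name.
sysuse auto, clear
twoway (scatter mpg weight if foreign == 0, ms(o) mcolor(red))   ///
       (lfit    mpg weight if foreign == 0, lcolor(red))         ///
       (scatter mpg weight if foreign == 1, ms(o) mcolor(blue))  ///
       (lfit    mpg weight if foreign == 1, lcolor(blue)),       ///
       legend(order(1 "Domestic" 3 "Foreign"))

* Save the current graph as an image that can be inserted into Word
graph export "mpg_weight.png", replace
```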

  30. Choose a model • Right now, we assume that this assignment is driving toward a linear regression model. Just know that this may not always be appropriate in real-world situations.

  31. SAS: Linear model • proc reg data=dataset; model yvar = x1 x2 x3; run; quit; • proc reg data=kidney; model cith = cens dage; run; quit;

  32. STATA: Linear model • regress yvar x1 x2 x3 • regress cith cens dage
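Slide 17 lists investigating whether the model is appropriate, but the slides do not show code for that step. Below is a minimal sketch of two common checks after `regress`, assuming Stata's built-in `auto` data (price ~ mpg + foreign standing in for cith ~ cens + dage). The first is a residual-versus-fitted plot (`rvfplot`), used to look for curvature or non-constant spread. The second is a normal quantile plot of the residuals (`qnorm`). These are a starting point, not a complete set of diagnostics.

```stata
* Illustrative only: price ~ mpg + foreign stands in for cith ~ cens + dage.
sysuse auto, clear
regress price mpg foreign

* Residuals vs. fitted values: look for curvature or a funnel shape
rvfplot, yline(0)

* Save residuals (double precision) and check them for approximate normality
predict double resid, residuals
qnorm resid

* With an intercept in the model, OLS residuals average (numerically) zero
summarize resid
```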

  33. SAS: Stratified model • proc sort data=dataset; by grpvar; run; proc reg data=dataset; model yvar = x1 x2 x3; by grpvar; run; quit; • You must sort by the grouping variable before you run the stratified model.

  34. SAS: Stratified model • proc sort data=kidney; by cens; run; • proc reg data=kidney; model cith = dage; by cens; run; quit;

  35. STATA: Stratified model • bysort grpvar: regress yvar x1 x2 • bysort cens: regress cith dage

  36. SAS: Dummy encoded model • proc reg data=dataset; model yvar = x1 x2 x3 z1 z2; run; quit; • Note: “z” represents dummy-encoded variables • proc reg data=kidney; model cith = dage cens excel good fair; run; quit; • excel, good, and fair are newly created dummy variables.

  37. STATA: Dummy encoded model • regress yvar x1 x2 z1 z2 • Note: “z” represents dummy-encoded variables • regress cith cens dage excel good fair • excel, good, and fair are newly created dummy variables.
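Slides 36-37 use dummy variables (excel, good, fair) without showing how they were created. Below is a minimal Stata sketch, assuming the built-in `auto` data, where rep78 (repair record, coded 1-5, with 5 missing values) stands in for grad. `tabulate, generate()` creates one indicator per category. The hand-built version shows the `if !missing()` guard, which keeps a missing code from being silently counted in the reference group. For the kidney data, the hand-built form would be `generate excel = (grad == 1) if !missing(grad)`, and similarly for good and fair, with poor as the reference. This assumes grad is stored as 1 = excellent, ..., 4 = poor, as in the SAS output on slide 6.

```stata
* Illustrative only: rep78 (1-5, 5 missing values) stands in for grad.
sysuse auto, clear

* Option 1: one 0/1 indicator per category, named rep1-rep5;
* observations with missing rep78 get missing indicators
tabulate rep78, generate(rep)

* Option 2: build an indicator by hand. Without "if !missing(rep78)",
* a missing rep78 would be coded 0 and lumped into the reference group.
* (Kidney analog, assuming grad is coded 1-4: generate excel = (grad == 1) if !missing(grad))
generate rep5_hand = (rep78 == 5) if !missing(rep78)

* Include k - 1 indicators; the omitted category (rep78 == 1) is the reference
regress price mpg rep2 rep3 rep4 rep5
```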

  38. SAS: Interaction model • data dataset; set dataset; intnvar = x1 * x2; run; proc reg data=dataset; model yvar = x1 x2 intnvar; run; quit;

  39. SAS: Interaction model • data kidney; set kidney; d_c = dage*cens; run; • proc reg data=kidney; model cith = dage cens d_c; run; quit;

  40. STATA: Interaction model • gen intnvar = x1 * x2 • regress yvar x1 x2 intnvar • gen d_c = dage*cens • regress cith dage cens d_c
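In the interaction model, the dage slope among cens == 1 is the sum of the dage and d_c coefficients. The sketch below assumes Stata's built-in `auto` data, with price, mpg, and foreign standing in for cith, dage, and cens. It estimates that sum with `lincom` and compares it with the slope from a stratified fit like slide 35. When every term is interacted with the group indicator, as here, the two point estimates agree. The standard errors generally differ, because the interaction model pools the residual variance across groups.

```stata
* Illustrative only: price, mpg, foreign stand in for cith, dage, cens.
sysuse auto, clear
generate mpg_for = mpg * foreign          // analog of d_c = dage*cens

* Stratified fit: slope of mpg among foreign cars only
regress price mpg if foreign == 1
scalar slope_strat = _b[mpg]

* Interaction model: slope among foreign cars = _b[mpg] + _b[mpg_for]
regress price mpg foreign mpg_for
lincom mpg + mpg_for
display "Stratified slope: " slope_strat
```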

  41. Predicted mean differences • Question: Observation 1 has “this” particular profile, and observation 2 has “that” particular profile. Is there a difference in their predicted mean response/outcome? • Example: Obs1: 56 years old and censored; Obs2: 61 years old and censored

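  42. Predicted mean differences • Strategy • Add observations with the specified covariate profiles and a missing outcome (because the outcome is missing, these observations are excluded when the model is fit, so they do not change the estimates) • Run the linear regression model and request the predicted outcome with the standard error of the prediction • Look at the results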

  43. SAS: Predicted mean differences • Add observations (each data line after cards; must start on its own line) •
data profiles;
  input dage cens;
  cards;
56 0
61 0
;
run;
data kidney; set kidney profiles; run;

  44. SAS: Predicted mean differences • Analyze and request the standard error of the prediction • proc reg data=kidney; model cith = dage cens; output out=kidney_new p=ypred stdp=yprese; run; quit; • Now if you open the “kidney_new” dataset, you can scroll down and view the predicted values and the standard error of the prediction

  45. STATA: Predicted mean differences • Add observations • It’s probably easiest to do this using the Data Editor • Suppose our dataset has 144 observations: • set obs 146 • replace dage=56 in 145 • replace cens=0 in 145 • replace dage=61 in 146 • replace cens=0 in 146

  46. STATA: Predicted mean differences • Analyze and request the standard error of the prediction • regress cith cens dage • predict ypred • predict yprese, stdp • Now if you open the data browser, you can scroll down and view the predicted values and the standard error of the prediction
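Each `stdp` value is the standard error of one predicted mean. Because the two predictions come from the same fitted model they are correlated, so their separate standard errors cannot simply be combined into a standard error for the difference. Below is a sketch of one way to get that difference directly. It assumes Stata's built-in `auto` data and two hypothetical profiles: domestic cars (foreign = 0) with mpg = 20 and mpg = 25, standing in for 56- and 61-year-old censored donors. Since the profiles differ only in mpg, the difference in predicted means is 5 times the mpg coefficient, and `lincom` estimates it with a standard error and confidence interval.

```stata
* Illustrative only: price ~ mpg + foreign stands in for cith ~ dage + cens.
* The two profiles below are hypothetical.
sysuse auto, clear                  // 74 observations
set obs 76
replace mpg = 20 in 75
replace foreign = 0 in 75
replace mpg = 25 in 76
replace foreign = 0 in 76

* price is missing for the new rows, so they are not used in the fit
regress price mpg foreign
predict double yhat
predict double yhat_se, stdp
list mpg foreign yhat yhat_se in 75/76

* Difference in predicted means (profile 76 minus profile 75) = 5 * _b[mpg]
lincom 5*mpg
```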
