
Exporting Data for Analysis

Tom Newman and Josh Senyak, 15 August 2013. Topics: loose ends and Access questions (Josh); Assignment 3; Lab 3: Exporting and Analyzing Data (8/15/2013).


Presentation Transcript


  1. Exporting Data for Analysis. Tom Newman, Josh Senyak. 15 August 2013

  2. Loose Ends, Access questions (Josh)

  3. Assignment 3. Lab 3: Exporting and Analyzing Data (8/15/2013). Determine whether neonatal jaundice was associated with the 5-year IQ scores, and create a table or paragraph appropriate for the “Results” section of a manuscript summarizing the association. Extra Credit: Write a sentence or two for the “Methods” or “Results” section on inter-rater reliability. (Use Bland and Altman, BMJ 1996; 313:744)

  4. Newman T et al. N Engl J Med 2006;354:1889-1900

  5. Essential Elements • Sample size (N1 jaundiced, N0 non-jaundiced) • Indication of effect size (report both means, and the difference between them) • Get direction of effect right. • Indication of variability: sample SDs, SEs of means, CIs of means, and (most important) CI of difference between means.
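The elements listed above can be computed in a few lines. The following is a minimal Python sketch, not the course's Stata workflow: the function name `summarize_two_groups` and the IQ values in the usage note are made up for illustration, and the confidence interval uses a large-sample normal approximation (a t-based interval, as Stata's `ttest` reports, is slightly wider in small samples).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize_two_groups(exposed, unexposed, level=0.95):
    """Return the pieces a Results sentence needs for a two-group comparison."""
    n1, n0 = len(exposed), len(unexposed)
    m1, m0 = mean(exposed), mean(unexposed)
    s1, s0 = stdev(exposed), stdev(unexposed)  # sample SDs

    # Direction of effect: exposed minus unexposed, stated explicitly.
    diff = m1 - m0

    # Standard error of the difference, not assuming equal variances.
    se_diff = sqrt(s1 ** 2 / n1 + s0 ** 2 / n0)

    # Normal critical value (about 1.96 for 95%); an approximation for large N.
    z = NormalDist().inv_cdf(0.5 + level / 2)

    return {
        "n1": n1, "n0": n0,
        "mean1": m1, "mean0": m0,
        "sd1": s1, "sd0": s0,
        "diff": diff,
        "ci_diff": (diff - z * se_diff, diff + z * se_diff),
    }
```

For example, `summarize_two_groups([102, 98, 110, 105, 95], [100, 97, 103, 99, 101])` (made-up scores) returns a difference of 2 points with an interval that spans zero, which is the kind of statement the Results paragraph should carry.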

  6. A few JIFee disclaimers • It would be better to refer to the “Jaundice” variable as “tb_ge25” (what we actually used), because jaundice is normal in newborns, so many controls were probably jaundiced • We only did one exam of each type per child • We did have repeated measures of the PEDS (Parents' Evaluation of Developmental Status) via phone calls

  7. What is wrong with this picture?

  8. Browner on Figures “Figures should have a minimum of four data points. A figure that shows that the rate of colon cancer is higher in men than in women…[or that jaundiced babies had higher IQs at age 5 years than non-jaundiced babies] is not worth the ink required to print it. Use text instead.” Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 90

  9. Figure: Percent of children who maintained/reduced vs. increased BMI for age over 6 months. (a) P < .05 using chi-squared test. (AOM = America on the Move; SM = Self Monitor) Pediatrics 2007; 120:e869-75

  10. Figure 1: Box plots of IQ scores at age 5 in N1 infants in the hyperbilirubinemia group and N0 controls.

  11. Box Plot • Line at the median • Box extends from the 25th to the 75th percentile • Whiskers extend to the upper and lower adjacent values • Adjacent values = the most extreme observations within 1.5 × IQR (interquartile range) of the box, i.e., no higher than the 75th percentile + 1.5 × IQR and no lower than the 25th percentile − 1.5 × IQR; these limits are roughly the mean ± 2.7 SD and include ~99.3% of values if normally distributed • Values outside the adjacent values are graphed individually • It would be nice if the area of the box were proportional to sample size (N). In some box plots the width of the box is proportional to log N, but not in Stata.
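The box plot rules above can be checked numerically. This is a rough Python sketch under stated assumptions: `box_plot_stats` and `normal_fence_in_sd` are illustrative names, and the quartiles use Python's "inclusive" interpolation, which may differ slightly from the percentile definition Stata uses.

```python
from statistics import NormalDist, quantiles


def box_plot_stats(values):
    """Quartiles, adjacent values, and individually plotted points."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr

    # Adjacent values are the most extreme observations inside the limits.
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outside = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_adjacent": inside[0],
        "upper_adjacent": inside[-1],
        "outside": outside,
    }


def normal_fence_in_sd():
    """Where the 1.5 x IQR limit falls for a standard normal distribution."""
    z75 = NormalDist().inv_cdf(0.75)   # 75th percentile, about 0.674 SD
    limit = z75 + 1.5 * (2 * z75)      # IQR is 2 * z75, so limit is about 2.7 SD
    coverage = NormalDist().cdf(limit) - NormalDist().cdf(-limit)
    return limit, coverage
```

With the made-up values `[85, 90, 95, 100, 105, 110, 115, 160]`, the whiskers stop at 85 and 115 and the value 160 is plotted individually; `normal_fence_in_sd()` reproduces the "2.7 SD, ~99.3%" rule of thumb.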

  12. Extra Credit • Report within-subject SD as a measure of reliability • Calculate repeatability • Bland-Altman plot with mean difference and 95% limits of agreement
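For two ratings per child, the first two extra-credit quantities reduce to short formulas from Bland and Altman (BMJ 1996; 313:744): the within-subject variance is the sum of squared paired differences divided by 2n, and repeatability is √2 × 1.96 × the within-subject SD (about 2.77 × SD). A minimal Python sketch follows; the function names and the scores in the usage note are made up, and the calculation treats the two ratings as interchangeable replicates.

```python
from math import sqrt


def within_subject_sd(first, second):
    """Within-subject SD from paired measurements (two per subject)."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty lists")
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(first, second))
    # For pairs, each subject's variance is d**2 / 2; average over subjects.
    return sqrt(sum_sq_diff / (2 * len(first)))


def repeatability(sw):
    """Difference that two measurements on one subject rarely (5%) exceed."""
    return sqrt(2) * 1.96 * sw
```

For example, `within_subject_sd([100, 104, 98, 110], [102, 100, 98, 106])` gives about 2.12 points, and `repeatability(2.12)` is about 5.9 points.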

  13. N = N(S&R) (children examined by both Satcher and Richmond) = 142. Mean difference = 0.49 (95% CI -0.41 to 1.38)

  14. With “notrend” option, Limits of agreement = (-11.0,10.1)
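The mean difference, its confidence interval, and the limits of agreement all come from the paired differences. The sketch below is an illustration in Python with made-up scores, not a reimplementation of `batplot`: it uses the usual "mean difference ± 1.96 × SD of the differences" definition with no trend adjustment and a normal critical value, so results may differ slightly from Stata's output.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(rater_a, rater_b):
    """Mean difference (a - b), its 95% CI, and 95% limits of agreement."""
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    n = len(diffs)
    d_bar = mean(diffs)
    sd = stdev(diffs)      # SD of the individual differences
    se = sd / sqrt(n)      # SE of the mean difference

    return {
        "n": n,
        "mean_diff": d_bar,
        # Precision of the average bias between raters.
        "ci_mean_diff": (d_bar - 1.96 * se, d_bar + 1.96 * se),
        # Range expected to contain about 95% of individual differences.
        "limits_of_agreement": (d_bar - 1.96 * sd, d_bar + 1.96 * sd),
    }
```

The confidence interval describes the average bias between raters and narrows as N grows; the limits of agreement describe how far apart two raters can be on a single child and do not narrow with N. Whichever order of subtraction is chosen, it should match the axis label on the plot.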

  15. Bland-Altman in Stata • Find and install the user-written command: findit Bland-Altman (or: ssc install batplot) • Then run: batplot richmondscore satcherscore, notrend title(Agreement between Richmond and Satcher) ytitle(Difference (Richmond - Satcher)) xtitle(Average of Richmond and Satcher)

  16. Access vs Stata • Use Access for data collection, ongoing reports, data management and monitoring • Short, discrete queries or reports done many times • Use Access to link tables to create flat files for Stata • Use Stata to create or recode variables and do most analyses, especially those requiring multiple steps • Goals: complete transparency, effortless reproducibility
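"Linking tables to create a flat file" amounts to a join query. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Access; the table names (`subject`, `exam`), column names, and rows are all hypothetical and only illustrate the shape of the result.

```python
import sqlite3


def build_flat_rows():
    """Join a subject table to an exam table: one flat row per subject."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, tb_ge25 INTEGER);
        CREATE TABLE exam (exam_id INTEGER PRIMARY KEY,
                           subject_id INTEGER, iq INTEGER);
        INSERT INTO subject VALUES (1, 1), (2, 0), (3, 0);
        INSERT INTO exam VALUES (10, 1, 104), (11, 2, 99);
        """
    )
    # LEFT JOIN keeps subjects with no exam; their iq comes back as NULL.
    rows = con.execute(
        """
        SELECT s.subject_id, s.tb_ge25, e.iq
        FROM subject AS s
        LEFT JOIN exam AS e ON e.subject_id = s.subject_id
        ORDER BY s.subject_id
        """
    ).fetchall()
    con.close()
    return rows
```

Each returned tuple is one line of the flat file. Subject 3 has no exam, so the outer join keeps that subject with a missing IQ instead of silently dropping the row.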

  17. Newman tips 1: Work vs Play • Work: always done with a do-file • Play: run any commands you want except “save.” • If you make any changes to the dataset while playing (that you want to keep), they MUST be in a do-file • Fewer long do-files are better than more short ones

  18. Newman tips 2: variable names • Make variable names informative • E.g., tb_ge25 is more informative than “HighBili”; “male” is more informative than “sex” • Generally put round numbers at the bottom of intervals • I use ge for ≥ and lt for < • I avoid gt and le • Write down variable naming conventions early in the study
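The naming convention can be made concrete with a tiny sketch. This Python function is illustrative only: the 25 threshold comes from the variable name, while the argument name and the mg/dL unit are assumptions. The name says exactly what a 1 means, and the round number sits at the bottom of the interval.

```python
def tb_ge25(total_bilirubin_mg_dl):
    """Indicator: 1 if total bilirubin >= 25, 0 if < 25, None if missing."""
    if total_bilirubin_mg_dl is None:
        # Keep missing as missing; never let it fall into either category.
        return None
    return 1 if total_bilirubin_mg_dl >= 25 else 0
```

The explicit missing-value check matters in Stata too: Stata treats missing numeric values as larger than any number, so an expression such as `tb >= 25` evaluates to 1 for missing values unless they are excluded (for example with `if !missing(tb)`, where `tb` is a hypothetical variable name).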

  19. Newman tips 3: some useful Stata commands • Quickly divide a continuous variable into categories with labels: egen iq_cat=cut(iq), at(40(10)160) lab • User-written (Estie Hudes) ado files to document how variables were created: ggen, rreplace, rrename, rrecode • Create labeled indicator variables: tab race, gen(race) • Find your variables: lookfor race • Encode string variables: encode stringvar, gen(intvar)
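To make the behavior of two of these commands concrete, here is a hedged Python sketch of what `egen ... cut(), at()` and `encode` are understood to do; it is an approximation for illustration, not Stata's implementation. Breakpoints define left-closed intervals, values below the first or at or above the last breakpoint become missing, and strings are numbered from 1 in alphabetical order. The function names are made up.

```python
from bisect import bisect_right

# Same breakpoints as at(40(10)160): 40, 50, ..., 160.
IQ_BREAKS = list(range(40, 161, 10))


def cut(value, breaks):
    """Return the left end of the interval [low, high) containing value."""
    if value is None or value < breaks[0] or value >= breaks[-1]:
        return None  # outside the breakpoints: treated as missing
    return breaks[bisect_right(breaks, value) - 1]


def encode(strings):
    """Map each distinct string to an integer code, alphabetically from 1."""
    codes = {s: i for i, s in enumerate(sorted(set(strings)), start=1)}
    return [codes[s] for s in strings], codes
```

For example, `cut(109.9, IQ_BREAKS)` returns 100 (the 100 to <110 category), while `cut(160, IQ_BREAKS)` returns `None` because 160 is the upper limit, so the last breakpoint should sit above the largest possible score.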

  20. Lab 3
