
Stata Workshop #1



  1. Stata Workshop #1 Chiu-Hsieh (Paul) Hsu, Associate Professor, College of Public Health, pchhsu@email.arizona.edu

  2. Outline • Do files • Data entry • Data management • Data description • Estimation: Confidence Interval • Hypothesis testing

  3. Do files • Stata programs • Easy to add or skip comments • One click/command can run the whole program • Reproducible • Don’t need to retype all of the commands • Interactive work vs. do files
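
As an illustration of the points above, here is a minimal do-file sketch. The file names example.do and example.log are hypothetical, and the sketch assumes the auto example dataset that ships with Stata is available through sysuse.

```stata
* example.do : a minimal do-file sketch (file names are hypothetical)
capture log close              // close any log left open by an earlier run
log using example.log, text replace

sysuse auto, clear             // example dataset shipped with Stata
describe
summarize price, detail

/* a block comment makes it easy to skip commands temporarily
histogram price
*/

log close
```

Typing do example in the Command window (from the folder holding the file) reruns every step, which is what makes the analysis reproducible.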

  4. Data Entry

  5. Stata Commands • cd: Change directory • dir or ls: Show files in current directory • insheet: Read ASCII (text) data created by a spreadsheet • infile: Read unformatted ASCII (text) data • infix: Read ASCII (text) data in fixed format • input: Enter data from keyboard • save: Store the dataset currently in memory on disk in Stata data format • use: Load a Stata-format dataset • count: Show the number of observations • list: List values of variables • clear: Clear the entire dataset and everything else • memory: Display a report on memory usage • set memory: Set the size of memory

  6. Ways to enter data • Change the directory to the folder you like • cd c:\Stata • Comma-separated values (.csv) format files • insheet using test.csv, clear (with variable names) • infile gender id race ses schtyp str10 prgtype read write math science socst using hs0.raw, clear (without variable names) • Stata (.dta) files • use test • Type in data one by one • input id female race ses str3 schtype prog read write math science socst • end (when you are done; Stata commands are case-sensitive) • What’s in the dataset? • describe • list

  7. Data Management

  8. Stata Commands • pwd: show current directory (pwd = print working directory) • keep if: keep observations if condition is met • keep: keep variables or observations • drop: drop variables or observations • append: append a data file to current file • sort: sort observations • merge: merge a data file with current file • codebook: show codebook information for file • label data: apply a label to a data set • order: order the variables in a data set • label variable: apply a label to a variable • label define: define a set of labels for the levels of a categorical variable • label values: apply value labels to a variable • encode: create numeric version of a string variable • rename: rename a variable • recode: recode the values of a variable • notes: apply notes to the data file • generate: creates a new variable • replace: replaces one value with another value • egen: extended generate, with special functions that can be used when creating a new variable

  9. Merging two datasets • test1 and test2 have the same variables but different subjects • use test1 • append using test2 • save test12 • test3 and test4 have the same subjects and only share a link variable, e.g. id • use test3, clear • sort id • save test3, replace • use test4, clear • sort id • save test4, replace • use test3 • merge id using test4 • save test34
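
The slide uses the older merge syntax. As a hedged sketch, the same match-merge can be written with the merge 1:1 syntax introduced in Stata 11, which does not require sorting first. The dataset names come from the slide. That id uniquely identifies subjects in both files is an assumption.

```stata
* Sketch of the match-merge with the newer syntax (Stata 11 and later)
use test3, clear
merge 1:1 id using test4       // assumes id is unique in both files
tabulate _merge                // 1 = test3 only, 2 = test4 only, 3 = matched
save test34, replace
```

Checking _merge before saving is a cheap way to confirm that every subject was matched as expected.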

  10. Play with Variables • use test • label variable gender "Male" • rename gender male • gen female=1-male • order id male female • encode prgtype, gen(prog) • codebook prog • keep if female==1 (drops the males) • drop female

  11. Dummy Variables • A categorical variable with K possible levels • Need K-1 dummy variables (one level serves as the reference) • Dummy variables are convenient for regression analysis • How to create dummy variables? • Use generate command • gen female=1-gender (assumes gender is coded 0/1) • Use tabulate command • tabulate gender, gen(male) (creates male1, male2, ...) • Use the xi command to expand a categorical variable into indicator variables • xi i.gender • list, clean

  12. Data Description

  13. Stata Commands • describe: describe a dataset • log: create a log file • summarize: descriptive statistics • tabstat: table of descriptive statistics • table: create a table of statistics • stem: stem-and-leaf plot • graph: high resolution graphs • kdensity: kernel density plot • histogram: histogram for continuous and categorical variables • tabulate: one- and two-way frequency tables • correlate: correlations • pwcorr: pairwise correlations

  14. Example: raw data • log using test.txt, text replace • use lead • describe • sum maxfwt, detail • histogram maxfwt, by(Group) normal • graph box maxfwt, by(Group) • stem maxfwt • kdensity maxfwt • tab Group sex • cor ageyrs maxfwt • pwcorr ageyrs maxfwt, sig (sig is a pwcorr option; correlate does not report significance levels) • pwcorr ageyrs maxfwt if sex==1, sig (males only) • pwcorr ageyrs maxfwt fwt_r, sig • log close

  15. Example: grouped data • use group (a grouped dataset) • sum age [fweight=freq], detail • hist age [fweight=freq] • Pretty much the same as raw data. Just need to specify the weight.

  16. Some Review • Use both location and spread measures to summarize a dataset • Mean, standard deviation and range are easily affected by extreme observations • Median and inter-quartile range are less affected by extreme observations • Coefficient of variation (standard deviation divided by mean) removes the scale effect.

  17. Estimation

  18. Estimation of Parameters • Binomial distribution • Parameters n (usually known) and p • How to estimate p? • Poisson distribution • Parameter λ • How to estimate λ? • Normal distribution • Parameters µ and σ² • How to estimate µ and σ²? • σ² unknown → use the t distribution
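
The slide poses these as questions. The standard point estimators are sketched below. The notation is an assumption introduced here for illustration: x is the number of successes in n trials for the binomial case, and x̄ and s² are the sample mean and sample variance of observations x₁, ..., xₙ.

```latex
\hat{p} = \frac{x}{n}, \qquad
\hat{\lambda} = \bar{x}, \qquad
\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2

% With sigma^2 unknown and replaced by s^2, the standardized mean
% of normal data follows a t distribution:
\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}
```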

  19. Stata Commands • Raw data • ci [varlist] [if] [in] [weight] [, options] • confidence intervals for a mean (default), a proportion (option b, binomial) or a count (option p, Poisson) • Summary statistics • cii #obs #mean #sd [, ciin_option] • Normal • cii #obs #succ [, ciib_options] • Binomial

  20. Examples • gen female=sex-1 • tab female Group • What’s the average maxfwt for females in the exposed group? • ci maxfwt if female==1 & Group==2 (raw data) • sum maxfwt if female==1 & Group==2 • cii 16 59 20.887, level(95) (summary statistics) • What proportion of females are in the exposed group? • gen expose=Group-1 • ci expose if female==1, b • cii 48 16, level(95)
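
As a rough check on the summary-statistics example (n = 16, mean 59, standard deviation 20.887), the 95% t-based interval can be worked by hand. The critical value 2.131 is the rounded 0.975 quantile of the t distribution with 15 degrees of freedom, so the endpoints are approximate.

```latex
\bar{x} \pm t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}
  = 59 \pm t_{15,\,0.975}\,\frac{20.887}{\sqrt{16}}
  \approx 59 \pm 2.131 \times 5.222
  \approx (47.87,\ 70.13)
```

The binomial interval from cii 48 16 is not reproduced here, because it is typically an exact interval rather than this simple formula.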

  21. Hypothesis Testing

  22. Stata Commands (mean) • ttest • Raw data • ttest varname == # [if] [in] [, level(#)] • ttest varname1 == varname2 [if] [in], unpaired [unequal welch level(#)] • ttest varname1 == varname2 [if] [in] [, level(#)] • ttest varname [if] [in] , by(groupvar) [options1] • Summary statistics • ttesti #obs #mean #sd #val [, level(#)] • ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2 [, options2]

  23. Examples • One sample • Is the average maxfwt for females in the exposed group significantly lower than 45? • ttest maxfwt==45 if female==1 & Group==2 • ttesti 16 59 20.887 45 (summary statistics) • Two samples • Do females have a higher average maxfwt than males in the exposed group? • ttest maxfwt if Group==2, by(female) • sum maxfwt if female==0 & Group==2 • ttesti 16 59 20.887 30 60.167 27.28
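
The statistics behind the two ttesti calls can be sketched by hand from the numbers on the slide. The two-sample version below assumes equal variances (the pooled test, which is what ttesti uses unless unequal is specified). Values are rounded.

```latex
% One sample: n = 16, mean 59, sd 20.887, null value 45
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
  = \frac{59 - 45}{20.887/\sqrt{16}}
  \approx 2.68, \qquad \text{df} = 15

% Two samples (pooled): n_1 = 16, n_2 = 30
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
      = \frac{15(20.887)^2 + 29(27.28)^2}{44} \approx 639.2

t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}
  = \frac{59 - 60.167}{25.28 \times 0.3096}
  \approx -0.15, \qquad \text{df} = 44
```

For a directional question such as "lower than 45", the relevant p-value is the one-sided one in the ttest output, not the two-sided one.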

  24. Stata Commands (variance) • sdtest • Raw data • sdtest varname == # [if] [in] [, level(#)] • sdtest varname1 == varname2 [if] [in] [, level(#)] • sdtest varname [if] [in] , by(groupvar) [level(#)] • Summary statistics • sdtesti #obs {#mean | . } #sd #val [, level(#)] • sdtesti #obs1 {#mean1 | . } #sd1 #obs2 {#mean2 | . } #sd2 [, level(#)]

  25. Examples • One sample • Is the variance of maxfwt for females in the exposed group significantly greater than 100 (i.e., standard deviation greater than 10)? • sdtest maxfwt==10 if female==1 & Group==2 • sdtesti 16 59 20.887 10 (summary statistics) • Two samples • Do females have a greater variation in maxfwt than males in the exposed group? • sdtest maxfwt if Group==2, by(female) • sum maxfwt if female==0 & Group==2 • sdtesti 16 59 20.887 30 60.167 27.28
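
For reference, the test statistics behind these sdtesti calls can be sketched from the slide's numbers, assuming normally distributed data. Values are rounded.

```latex
% One sample: H_0: sigma = 10 (variance 100), n = 16, s = 20.887
\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}
       = \frac{15 \times (20.887)^2}{10^2}
       \approx 65.44, \qquad \text{df} = 15

% Two samples: ratio of sample variances, first group over second
F = \frac{s_1^2}{s_2^2}
  = \frac{(20.887)^2}{(27.28)^2}
  \approx 0.586, \qquad \text{df} = (15,\ 29)
```

Note that sdtest is specified on the standard deviation scale, which is why the null value typed is 10 rather than 100.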

  26. Stata Commands (proportion) • prtest • Raw data • prtest varname == #p [if] [in] [, level(#)] • prtest varname1 == varname2 [if] [in] [, level(#)] • prtest varname [if] [in] , by(groupvar) [level(#)] • Summary statistics • prtesti #obs1 #p1 #p2 [, level(#) count] • prtesti #obs1 #p1 #obs2 #p2 [, level(#) count]

  27. Examples • One sample • Are more than 50% of females in the exposed group? • prtest expose==0.5 if female==1 • prtesti 48 0.3333333 0.5 • Two samples • Is the proportion of females higher in the exposed group than in the control group? • prtest female, by(expose) • tab expose female, r • prtesti 78 0.4103 46 0.3478
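
The large-sample z statistics behind the two prtesti calls can be sketched as follows. The counts 32/78 and 16/46 are an assumption, inferred from the proportions 0.4103 and 0.3478 and the sample sizes on the slide. Values are rounded.

```latex
% One sample: n = 48, observed proportion 16/48, null value 0.5
z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}
  = \frac{0.3333 - 0.5}{\sqrt{0.25/48}}
  \approx -2.31

% Two samples: pooled proportion under H_0: p_1 = p_2
\hat{p} = \frac{32 + 16}{78 + 46} \approx 0.3871

z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
  = \frac{0.4103 - 0.3478}{\sqrt{0.3871 \times 0.6129 \times (1/78 + 1/46)}}
  \approx 0.69
```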

  28. Power and Sample Size

  29. Stata Command (sample size) • One sample • continuous • sampsi μ0 μ1, sd(.) p(.) a(.) onesam • sampsi 3500 3800, sd(420) p(.9) onesam • Binary proportions • sampsi p0 p1, p(.) onesam • sampsi 0.4 0.25, p(0.9) onesam • Two samples • continuous • sampsi μ1 μ2, p(.) sd1(.) sd2(.) a(.) • sampsi 132.86 127.44, p(0.8) sd1(15.34) sd2(18.23) • Binary proportions • sampsi p1 p2, p(.) • sampsi 0.4 0.25, p(0.9)
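
To see roughly where the one-sample continuous result comes from, the usual normal-approximation formula can be applied to sampsi 3500 3800, sd(420) p(.9) onesam. This sketch assumes a two-sided test at α = 0.05 (no a() option is given on the slide) and uses rounded normal quantiles, so Stata's reported figure may differ slightly.

```latex
n = \left( \frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{\mu_1 - \mu_0} \right)^{2}
  = \left( \frac{(1.960 + 1.282) \times 420}{3800 - 3500} \right)^{2}
  \approx 20.6
```

Rounding up gives a required sample size of about 21.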

  30. Stata Command (power) • One sample • continuous • sampsi μ0 μ1, sd(.) n(.) a(.) onesam • sampsi 84.4 90.1, sd(10.3) n(5) onesam onesided • Binomial proportion • sampsi p0 p1, n1(.) onesam • sampsi 0.25 0.4, n1(100) onesam • Two samples • continuous • sampsi μ1 μ2, n1(.) n2(.) sd1(.) sd2(.) a(.) • sampsi 9 14, n1(100) n2(100) sd1(15.34) sd2(18.23) • Binomial proportions • sampsi p1 p2, n1(.) n2(.) • sampsi 0.4 0.25, n1(100) n2(150)

  31. Useful links • http://www.ats.ucla.edu/stat/stata/ • Once the D2L site is created, all of the handouts and related materials will be posted on the D2L site.
