270 likes | 413 Views
Computing for Research I Spring 2012. Stata Graphics February 16. Primary Instructor: Elizabeth Garrett-Mayer. Basic syntax for commands. p refix: command varlist , options Examples: regress y x , level(90) by race: sum y x , detail t test y , by(x) unequal.
E N D
Computing for Research ISpring 2012 Stata Graphics February 16 Primary Instructor: Elizabeth Garrett-Mayer
Basic syntax for commands • prefix: command varlist, options • Examples: • regress y x, level(90) • by race: sum y x, detail • ttesty, by(x) unequal
Stata Graphics • Maybe we can just end class now! • Check out these links: • http://www.ats.ucla.edu/stat/stata/library/GraphExamples/default.htm • http://www.ats.ucla.edu/stat/stata/topics/graphics.htm • http://data.princeton.edu/stata/graphics.html • http://www.stata.com/capabilities/graphics.html
Basic univariate displays • Boxplots • Stem and leaf • Histograms • Density plots
Ceramide Data • Let’s look at the ceramide markers • What are their distributions? • Are there outliers? • Should we consider taking logs, or using % change? Results of a phase II trial of gemcitabine plus doxorubicin in patients with recurrent head and neck cancers: serum C₁₈-ceramide as a novel biomarker for monitoring response. Saddoughi SA, Garrett-Mayer E, Chaudhary U, O'Brien PE, Afrin LB, Day TA, Gillespie MB, Sharma AK, Wilhoit CS, Bostick R, Senkal CE, Hannun YA, Bielawski J, Simon GR, Shirai K, Ogretmen B. ClinCancer Res. 2011 Sep 15;17(18):6097-105. Epub 2011 Jul 26.
Histogram • hist c18
Let’s make it prettier * prettier histograms hist c18 , freqxaxis(1 2) ylabel(0(2)24) xlabel(20 "Twenty" 40 "Forty") hist c18, title("Histogram of C18 Ceramide") subtitle("PI: K. Shirai") hist c18, ytitle("number of patients") freqyline(0(10)20) hist c18, xaxis(1 2) xlabel(19.6 "mean" 11.9 "median", axis(2) grid) finding help on these can sometimes be tricky! e.g. help axis_choice_options
Boxplots • graph box c18
Boxplots graph box c18, by(cycle) graph box c18, over(cycle) tab cycle graph box c18 if cycle<7, over(cycle) sort patient cycle merge m:1 patient using "Ptdata.GemDox.dta" graph box c18 if cycle<7, over(cycle) over(gender) graph hbox c18, over(initial) capsize(5)
graph hbox c18, over(initial) capsize(5) graph hbox c18, over(initial) medtype(marker)medmarker(msymbol(+) msize(large)) graph hbox c18, over(initial) ytitle(“C18”)
Labels • Sometimes xlabels cannot be applied (e.g. boxplots) • need to label your values • Example: cycle for boxplots • label define cycle 1 "cycle 1" 3 "cycle 3" 5 "cycle 5" 7 "cycle 7" • label values cycle cycle • graph box c18 if cycle<7, over(cycle) • (Hint: use this on the homework!)
Stem and Leaf . stem c18 Stem-and-leaf plot for c18ceramide (C18 ceramide) c18ceramide rounded to nearest multiple of .1 plot in units of .1 0** | 42,43,44,46 0** | 57,57,67,81,89,90,96,98,99,99 1** | 01,06,08,08,14,15,19,20,35,44 1** | 62 2** | 03,15,16,18,19,19,22 2** | 82 3** | 17 3** | 4** | 23,49 4** | 58,68,68 5** | 5** | 6** | 37 6** | 86
Dotplot • Excellent way to show data across groups when you have a relatively small dataset • dotplot y, over(group) dotplot c18, over(cycle) dotplot c18, over(gender) dotplot c18, over(gender) nogroup dotplot c18, over(gender) nogroup jitter(3) dotplot c18, over(gender) nogroup median center
Scatterplots • Two way graph • Syntax: • graph twoway scatter y x1 x2 • graph twoway scatter y x1 • Example: • graph twoway scatter c18 totalceramide
Regression example • Scatterplot • Residual plots • Leverage • Fitted line with raw data
Code * generate new variable gen logc18=log(c18) scatter logc18 totalcer scatter logc18 totalcer, mlabel(gender) scatter logc18 totalcer, mlabel(gender) s(i) scatter logc18 totalcer, s(Oh) * redo regression regress logc18 totalcer rvfplot, yline(0) lvr2plot predict logfit * make plot of fitted model and raw data scatter logfit logc18 totalcer scatter logfit logc18 totalcer, s(i o) c(l .) graph twoway scatter logfittotalcer, s(i) c(l) || scatter logc18 totalcer, s(o) c(.) graph twoway scatter c18 totalcer regress c18 totalcer * residual plot * (residual vs. fitted) rvfplot * the long way * 1. generate a new variable from the regression, residuals predict resid, res * 2. generate a new variable from the regression, fitted values predict fit scatter res fit, yline(0) * leverage vs. residual plot lvr2plot * take transform of C18? gladder c18 boxcox c18
Fancier way to put regression lines infile str14 country setting effort change /// using http://data.princeton.edu/wws509/datasets/effort.raw graph twoway scatter change setting graph twoway (scatter change setting ) (lfit change setting ) graph twoway (scatter change setting ) (qfit change setting ) graph twoway (scatter change setting ) (lfitci change setting ) • scatter makes a scatterplot of the two variables • lfit plots the regression line of y on x • qfit plots a fitted quadratic model of y on x • lfitci plots the line AND a confidence interval!
Fancier way to put regression lines Plot using qfit Plot using lfitci
graph twoway (lfitci change setting) (scatter change setting, mlabel(country) ) • One slight problem with the labels is the overlap of Costa Rica and Trinidad Tobago (and to a lesser extent Panama and Nicaragua). • We can solve this problem by specifying the position of the label relative to the marker using a 12-hour clock (so 12 is above, 3 is to the right, 6 is below and 9 is to the left) and the mlabv() option. • We create a variable to hold the position set by default to 3 o'clock and then move Costa Rica to 9 o'clock and Trinidad Tobago to just a bit above that at 11 o'clock (we can also move Nicaragua and Panama up a bit, say to 2 o'clock).
gen pos=3 replace pos = 11 if country == "TrinidadTobago" replace pos = 9 if country == "CostaRica" replace pos = 2 if country == "Panama" | country == "Nicaragua“ graph twoway (lfitci change setting) /// (scatter change setting, mlabel(country) mlabv(pos) )
Legends graph twoway (lfitci change setting) /// (scatter change setting, mlabel(country) mlabv(pos) ) /// , title("Fertility Decline by Social Setting") /// ytitle("Fertility Decline") /// legend(ring(0) pos(5) order(2 "linear fit" 1 "95% CI")) graph twoway (lfitci change setting) /// (scatter change setting, mlabel(country) mlabv(pos) ) /// , title("Fertility Decline by Social Setting") /// ytitle("Fertility Decline") /// legend(off)
Spaghetti plots Command available from UCLA: spagplot * spaghetti plots clear insheet using "I:\MUSC Oncology\Shirai, Keisuke\October2010\ceramide.csv" finditspagplot spagplot c18 cycle, id(patient) spagplot c18 cycle, id(patient) nofit * remove patients who only have cycle=1 sort patient cycle by patient: gen visit=_n egenmaxvis=max(visit), by(patient) spagplot c18 cycle if maxvis>1, id(patient) nofit * or, use c(L) graph twowayscatter c18 cycle if maxvis>1, c(L) help connectstyle
other neat stuff • graph matrix • saving graphs: click and save as desired format • saving and combining (see princetonsite, section 3.3) • http://data.princeton.edu/stata/graphics.html • See GraphExamples on ucla site: • http://www.ats.ucla.edu/stat/stata/library/GraphExamples/