Teaching with Stata Peter A. Lachenbruch & Alan C. Acock Oregon State University peter.lachenbruch@oregonstate.edu alan.acock@oregonstate.edu
First Course Requirement—Data Entry • I want a first course to cover the things I want students to be able to do: • Enter and edit data – must be a “want to know” topic • Students can do a small survey to get data on topics of interest to them. • Voter poll • Attitudes toward diversity issues on campus • Beliefs about regulating the internet • Learn how to create a codebook; use codebook and codebook, compact • Where possible use “real” data WCSUG Presentation
First Course Requirement—Data Management • Balance statistical content with proper data management content – a hard decision • Storing the original dataset and creating a working dataset • Keeping a record of every data modification they make in a do-file • Menu system is an aid • Do-files are the requirement • Missing values – distinguish types • Variable names, labels, and value labels
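A minimal sketch of the do-file habits listed above. It assumes a hypothetical poll variable vote_raw in which -8 and -9 are made-up codes for "refused" and "not asked"; every name, code, and label here is illustrative only.

```stata
* Sketch of a data-management do-file; all names and codes are hypothetical.
clear
input id vote_raw
1  1
2  2
3 -8
4 -9
end

* Keep the original variable untouched and work on a copy.
clonevar vote = vote_raw

* Distinguish types of missing data with extended missing values.
replace vote = .a if vote_raw == -8   // .a = refused (assumed code)
replace vote = .b if vote_raw == -9   // .b = not asked (assumed code)

* Variable label and value labels, including labels for the missing codes.
label variable vote "Vote intention"
label define votelbl 1 "Candidate A" 2 "Candidate B" .a "Refused" .b "Not asked"
label values vote votelbl

tabulate vote, missing
```

Because the recoding lives in a do-file and vote_raw is never overwritten, every modification can be rerun and checked.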
First Course Requirements—Data Management • Transformations – log, square root, exp • Logical editing – beware of logical expressions when missing values are present (gen y = x < 10 sets y to 0 when x is “.”, because Stata treats missing as larger than any number) • Appending • Append student generated datasets • Merging • Merging two waves of data
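The missing-value trap in logical expressions can be shown with three observations. This is a small sketch; the variable names low_naive and low_safe are made up for the illustration.

```stata
* Stata treats missing (.) as larger than any number, so x < 10 is false for it.
clear
input x
5
12
.
end

generate byte low_naive = x < 10                  // missing x becomes 0
generate byte low_safe  = x < 10 if !missing(x)   // missing x stays missing

list
```

The naive version silently classifies the missing observation as "not below 10"; the guarded version keeps it missing.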
First Course Requirements—Data Management • Constructing Measures • When to use egen newvar = rowtotal(var1 var2 var3) • When to use egen newvar = rowmean(var1 var2 var3) • When to use the misschk command, what it does • Suppose the variable category is 0 or 1 • If there are missing values in category, there is a difference between • gen y = 1 if category • gen y = 1 if (category==1) • gen y = 1 if (category>0) • The first and third give a score of 1 for missing values of category, because missing is nonzero and larger than any number. The second leaves y missing for them (as it does for category 0); it does not give 0 – BEWARE
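A small sketch of the points on this slide, using made-up items v1-v3 and three observations, one of which is missing on everything. The data and variable names are illustrative only.

```stata
clear
input category v1 v2 v3
1 2 4 .
0 1 . .
. . . .
end

* egen row functions take a varlist (no commas).
egen total = rowtotal(v1 v2 v3)   // missing items count as 0; an all-missing row gives 0
egen avg   = rowmean(v1 v2 v3)    // mean of the nonmissing items; an all-missing row gives .

generate y1 = 1 if category        // true for 1 and for missing
generate y2 = 1 if category == 1   // true for 1 only
generate y3 = 1 if category > 0    // true for 1 and for missing

list
```

Note that rowtotal() returns 0 for the respondent who answered nothing, which is one reason to think about rowmean() or a missing-data check before building a scale.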
First Course Requirements—Data Management • edit command, insheet, input, infile (csv files) • gen newvar = ln(oldvar) • Rarely use replace oldvar = sqrt(oldvar) – only when correcting an error – don’t replace data • merge ptid assessment using file, update (the data need to be sorted)
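The merge line above uses the older syntax. As a hedged sketch, assuming Stata 11 or later (where merge takes an explicit match type such as 1:1 and no longer needs presorted data), an update merge on two tiny made-up files might look like this. The variable score and the tempfile name are hypothetical.

```stata
* Using file: holds a value that is missing in the master data.
clear
input ptid assessment score
1 1 10
2 1 .
end
tempfile wave2
save `wave2'

* Master file.
clear
input ptid assessment score
1 1 .
2 1 7
3 1 5
end

* update fills in missing master values from the using file;
* nonmissing master values are left alone.
merge 1:1 ptid assessment using `wave2', update

list, sepby(ptid)
```

The _merge variable created by the command records how each observation was matched and is worth tabulating before it is dropped.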
First Course Requirement (2) • Data presentation, numerical summary measures – summarize, detail; list; browse; edit; describe; codebook; codebook, compact • Graphic presentation – bar chart, histogram, box plot seem the minimum • Probability computations – binomial, binomialtail, chi2, chi2tail, F, Ftail, normal – use of the inverse functions for these.
Examples • summarize sp, detail; list sp; describe s*; codebook s* • display binomial(10,3,0.1) for cumulative or display Binomial(10,3,.1) for reverse cumulative; note that disp 1-binomial(10,2,.1) gives the same result (as does binomialtail(10,3,.1)) • display normal(1.2) • gen y = invnormal(uniform())*5+20
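A short sketch of the probability functions named above, runnable with no data in memory. The use of rnormal() and set seed assumes a reasonably recent Stata; uniform() on the slide is the older name for what is now runiform().

```stata
* Cumulative and upper-tail binomial probabilities, n = 10, p = 0.1
display binomial(10, 3, 0.1)        // P(X <= 3)
display binomialtail(10, 3, 0.1)    // P(X >= 3)
display 1 - binomial(10, 2, 0.1)    // same quantity as the line above

* Normal distribution function and its inverse
display normal(1.2)                 // P(Z <= 1.2)
display invnormal(normal(1.2))      // returns 1.2

* Simulate 100 draws from a normal with mean 20 and sd 5
clear
set seed 12345
set obs 100
generate y = rnormal(20, 5)
summarize y
```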
First Course Requirement (3) • Confidence intervals • Binomial – ci variable, binomial • Normal – ci variable • Poisson – ci variable, poisson • Percentiles • summarize, d • centile price, c(10(10)90)
Examples • cii 20 4 • cii 20 4, agresti • Sometimes we want to use the Agresti formulation. The exact interval is usually preferable • ci varname, level(99) • summarize weakness, detail • Can use su weakn, d (i.e. abbreviate commands, options and variables) • centile weakness, c(20,40,60,80) • Or centile weakness, c(20(20)80)
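For students without the course data, the percentile commands can be tried on the auto dataset that ships with Stata. This is only a sketch; price stands in for the slide's weakness variable.

```stata
sysuse auto, clear

* Percentiles reported as part of the detailed summary
summarize price, detail
display "Median price: " r(p50)

* Chosen percentiles with confidence intervals
centile price, centile(10(10)90)
centile price, centile(20 40 60 80)
```

summarize, detail reports a fixed set of percentiles, while centile lets students pick any set and adds a confidence interval for each.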
First Course Requirements (4) • Hypothesis Testing: • Normal r.v.s • One sample (including paired data) – ttest • Two sample – ttest • K samples – ANOVA • Binomial variables • One sample – proportion • Two samples – tabulate, chi2
Examples • ttest sp = 120 [one-sample] • ttest spmen = spfem [paired] • ttest spmen = spfem, unpaired unequal welch • ttest sp, by(sex) [unequal welch etc.] • Also immediate form – see help • anova sp agegrp
Examples • bitest success = 0.8 [one sample binomial] • tabulate success group, chi2 row col • prtest success, by(group) [two sample binomial]
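The tests on the last two Examples slides can be rehearsed on the auto dataset shipped with Stata. This is a sketch only: mpg and foreign stand in for sp and sex, and highmpg is a made-up 0/1 variable created just for the binomial tests.

```stata
sysuse auto, clear

* One-sample and two-sample t tests
ttest mpg == 20
ttest mpg, by(foreign)
ttest mpg, by(foreign) unequal welch

* K samples: one-way ANOVA across repair-record categories
anova mpg rep78

* Binomial outcomes: build a 0/1 indicator, guarding against missing values
generate byte highmpg = mpg > 25 if !missing(mpg)
bitest highmpg == 0.2
prtest highmpg, by(foreign)
tabulate highmpg foreign, chi2 row col
```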
First Course Requirements (5) • Hypothesis Testing (cont.) • Power considerations – sampsi (or spreadsheet – nice exercise for some good ones) • Nonparametric methods – sign, signrank, ranksum • Contingency tables – tabulate, epitab
Examples • sampsi 132.86 127.44, p(0.8) r(2) sd1(15.34) sd2(18.23) • ranksum sp, by(survive) • signrank before = after • When should we supplement Stata with other software, such as G*Power 3 (free and more flexible than sampsi) or packages such as PASS or nQuery Advisor?
First Course Requirements (6) • Simple linear regression – regress, rvfplot, other diagnostics • Correlation – corr, spearman, ktau – I tend not to use corr because of the sensitivity to the normality assumption for tests and confidence intervals • Only pwcorr, not corr, provides tests of significance
Examples • regress mpg weight • rvfplot • Stata’s “type a little, get a little” is very different from other packages • correlate mpg weight or pwcorr mpg weight (especially when you have more than 2 variables – can specify sig and obs; note that these options only work with pwcorr) • spearman mpg weight – would be nice to have Stata produce a Spearman correlation matrix
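These commands run as written on the auto dataset shipped with Stata. The sketch below adds length as a third variable only to show pwcorr with more than two variables; that choice is illustrative.

```stata
sysuse auto, clear

* Simple linear regression and a residual-versus-fitted plot
regress mpg weight
rvfplot, yline(0)

* Pearson correlations: sig and obs are pwcorr options, not correlate options
correlate mpg weight length
pwcorr mpg weight length, sig obs

* Rank correlation
spearman mpg weight
```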
Examples • It’s easy to use permutation tests
. permute anyhcq t=r(t): ttest ald7 if adult==1 & assnum==1, by(anyhcq)
(running ttest on estimation sample)

Monte Carlo permutation results                 Number of obs      =        97

      command:  ttest ald7, by(anyhcq)
            t:  r(t)
  permute var:  anyhcq

------------------------------------------------------------------------------
T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |   1.648305      13     100  0.1300  0.0336  .071073   .2120407
------------------------------------------------------------------------------
Note: confidence interval is with respect to p=c/n.
Note: c = #{|T| >= |T(obs)|}
• One can do similar things with the bootstrap • These are easy to use and intuitive for students
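The same idea can be tried without the course data. A sketch on the auto dataset, where the number of replications and the seed are arbitrary choices and diff is a made-up name for the bootstrapped statistic:

```stata
sysuse auto, clear

* Permutation test: shuffle the group labels and recompute the t statistic
permute foreign t = r(t), reps(200) seed(2024): ttest mpg, by(foreign)

* Bootstrap: resample cars and recompute the difference in mean mpg
bootstrap diff = (r(mu_2) - r(mu_1)), reps(200) seed(2024): ttest mpg, by(foreign)
```

Setting the seed makes the handout reproducible, so students who rerun the do-file see the same numbers as the instructor.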
Use of Stata in the Classroom • Use Stata sparingly • It’s not easy to follow commands typed or used from menus – students will get confused • Have handouts of what you do – make spacing large enough that students can annotate – even if only to write nasty things about the instructor • Balancing coverage of Stata, e.g. data management with coverage of Statistics is a constant issue • Remember – it’s a course in statistics, not in Stata
Data Sets • Place data sets on a LAN or common drive or available for copying to flash drive or CD • Use real data • Not too many variables • May have missing values – but should not affect main analyses – unless you want to demonstrate the problems with missing values
In the Classroom • Using CD rather than flash drive is better(?) • Many desktops have USB port located inconveniently (darn you Dell!) • Sometimes newer PCs have USB port on monitor, and laptops usually have an easy slot for the flash drive • Light level in the room should allow students to read easily • Days of dim projectors are over
In the Classroom (2) • Enlarge the Stata font by using right mouse button • I have found that 14 point is pretty good • Be careful about wraparound of output – if needed, reduce point size temporarily • Don’t ever use red on blue font • See what I mean? It’s more difficult to read • Show how to move and fix windows
In the Classroom (2) • Optimizing visibility with projector • Use rich color background • Edit > Preferences > General Preferences. The Blue background option is good, but it relies on red for errors and green for standard text, and doesn’t bold fonts. • Custom may be better because you can make fonts bold and pick colors that do not disadvantage students who are colorblind.
Virtual Lab • A server supporting 30 simultaneous sessions of Stata is remarkably inexpensive. • A department can require students to have laptops or provide a cart with enough laptops • Because the laptops are really “dumb” terminals for the server, they can be cheap and not updated very often • Any room becomes a lab • Students should have 24/7 access to the server
Handouts and Data Sets • Have handouts of your lecture notes • Have handouts of your data analysis demonstrations • Include commands as well as output! • Data sets • Online – LAN or CD or floppy disk – lots of laptops don’t have floppy drives any more; flash drives are inexpensive • Include • Student generated datasets • Datasets with large Ns and relatively few variables
Emphasis in Course • Lectures devoted to statistics • Labs devoted to learning Stata, working on homework, and discussion • Proper printing of output • Don’t split output between two pages if possible (at least, find a good break point) • Always use a monospaced font (such as Courier New)
Some Final Issues • Multiple testing can distort inference (e.g. doing 100 tests at the 5% level virtually guarantees some significant results – but they may be meaningless) – Worry about this • Controlling the digits in the output. Use outreg, estout, esttab
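The multiple-testing warning can be turned into a three-line calculation for students. The sketch assumes 100 independent tests at the 5% level with every null hypothesis true; the Bonferroni line is one simple correction, shown only as an illustration.

```stata
local alpha = 0.05
local k     = 100

display "Expected number of false positives:  " `k' * `alpha'
display "P(at least one false positive):      " 1 - (1 - `alpha')^`k'
display "Bonferroni per-test threshold:       " `alpha' / `k'
```

Under these assumptions about five tests come out "significant" by chance alone, and the probability of at least one is above 0.99.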
The End