Topics

Topics • Introduction to Stata • Files / directories • Stata syntax • Useful commands / functions • Logistic regression analysis with Stata • Estimation • Goodness of fit • Coefficients • Checking assumptions

Introduction to Stata • Note: we did this interactively for the larger part …

Stata file types • .ado • Programs that add commands to Stata • .do • Batch files that execute a set of Stata commands • .dta • Data file in Stata’s format • .log • Output saved as plain text by the log using command

The working directory • The working directory is the default directory for any file operations such as using and saving data, or logging output cd "d:\mywork\"

Saving output to log files • Syntax for the log command log using [filename], replace text • To close a log file log close

Using and saving datasets • Load a Stata dataset use d:\myproject\data.dta, clear • Save save d:\myproject\data, replace • Using change directory cd d:\myproject use data, clear save data, replace

Entering data • Data in other formats • You can use SPSS to convert data (read in or save as a data file in another format, for instance Stata’s .dta format) • You can use the infile and insheet commands to import data in ASCII format • Entering data by hand • Type edit or just click on the Data Editor button
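A minimal sketch of importing ASCII data, assuming Stata 13 or later, where import delimited supersedes insheet. The dataset auto.dta (shipped with Stata) and the file name auto_copy.csv are stand-ins chosen for illustration and are not part of the lecture.

```stata
* Stand-in data: auto.dta ships with Stata and is not the lecture's dataset
sysuse auto, clear

* Write a comma-separated text file to the working directory
* (auto_copy.csv is a hypothetical file name)
export delimited using auto_copy.csv, replace

* Read the text file back in; in older versions of Stata,
* "insheet using auto_copy.csv, clear" does the same job
import delimited using auto_copy.csv, clear

* Check what was imported
describe
```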

Do-files • You can create a text file that contains a series of commands. It is the equivalent of SPSS syntax (but way easier to memorize) • Use the do-file editor to work with do-files

Adding comments in do-files • // or * denote comments Stata should ignore • Stata ignores whatever follows after /// and treats the next line as a continuation • Example II
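The three comment styles can be sketched as follows. This is meant to be run from a do-file, since // and /// are not recognized when typed into the Command window. The dataset auto.dta (shipped with Stata) is a stand-in, not the lecture's data.

```stata
* A line that starts with an asterisk is a comment
sysuse auto, clear          // text after // is ignored

* The /// below tells Stata that the command continues on the next line
summarize price mpg ///
    weight length           // still part of the summarize command
```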

A recommended template for do-files
capture log close                // if a log file is open, close it, otherwise disregard
set more off                     // don't pause when output scrolls off the page
cd d:\myproject                  // change directory to your working directory
log using myfile, replace text   // log results to file myfile.log
… here you put the rest of your Stata commands …
log close                        // close the log file

Serious data analysis • Ensure replicability: use do-files and log files • Document your do-files • What is obvious today is baffling in six months • Keep a research log • Diary that includes a description of every program you run • Develop a system for naming files

Serious data analysis • New variables should be given new names • Use variable labels and notes • Double-check every new variable • ARCHIVE

Stata syntax examples

Stata syntax example regress y x1 x2 if x3<20, cluster(x4) • regress = Command • What action do you want performed? • y x1 x2 = Names of variables, files or other objects • On what things is the command performed? • if x3<20 = Qualifier on observations • On which observations should the command be performed? • , cluster(x4) = Options • What special things should be done in executing the command?
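The command / variables / qualifier / options pattern can be tried on any dataset. The sketch below uses auto.dta (shipped with Stata) as a stand-in; the variables price, mpg, weight, rep78 and foreign belong to that file, not to the lecture's data.

```stata
sysuse auto, clear

* command | variable list    | if qualifier    | options
regress     price mpg weight   if weight < 4000, cluster(rep78)

* Same pattern with other commands
tabulate rep78 foreign if price > 4000, row
summarize mpg if foreign == 1 & weight < 3000
```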

More examples tabulate smoking race if agemother>30, row More elaborate if-statements: sum agemother if smoking==1 & weightmother<100

Elements used for logical statements
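For reference, a short sketch of Stata's relational and logical operators in if qualifiers. The dataset auto.dta (shipped with Stata) and its variables are stand-ins for illustration only.

```stata
sysuse auto, clear

count if foreign == 1                  // ==  equal to (note the double equals sign)
count if foreign != 1                  // !=  not equal to (~= also works)
count if price > 10000                 // >   greater than
count if mpg <= 20                     // <=  less than or equal to
count if price > 5000 & mpg >= 25      // &   and
count if rep78 == 1 | rep78 == 2       // |   or
count if !(foreign == 1)               // !   not
```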

Missing values • Automatically excluded when Stata fits models (same as in SPSS); they are stored as the largest positive values • Beware!! • The expression "age>65" can thus also include missing values (these are also larger than 65) • To be sure, type: "age>65 & age!=."
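The pitfall can be seen directly by counting observations. This sketch uses auto.dta (shipped with Stata) as a stand-in, because its variable rep78 contains some missing values; it is not the lecture's dataset.

```stata
sysuse auto, clear

* Missing values count as larger than any number, so this includes them
count if rep78 > 4

* Excluding missing values explicitly gives a smaller count
count if rep78 > 4 & rep78 != .

* The difference between the two counts is the number of missing values
count if missing(rep78)
```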

Selecting observations drop [variable list] keep [variable list] drop if age<65 Note: they are then gone forever. This is not SPSS’s [filter] command.

Creating new variables Generating new variables generate age2 = age*age (for more complicated functions, there also exists a command "egen", as we will see later)
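A brief sketch of the difference between generate and egen: generate works observation by observation, while egen offers functions that summarize across observations. The dataset auto.dta (shipped with Stata) and the new variable names are illustrative assumptions.

```stata
sysuse auto, clear

* generate: a transformation computed within each observation
generate mpg2 = mpg*mpg

* egen: the overall mean of mpg, stored in every observation
egen mean_mpg = mean(mpg)

* egen with by(): the mean of mpg within each category of foreign
egen mean_mpg_grp = mean(mpg), by(foreign)

summarize mpg mpg2 mean_mpg mean_mpg_grp
```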

Useful functions

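Replace command • replace has the same syntax as generate but is used to change values of a variable that already exists gen age_dum = . replace age_dum = 0 if age < 5 replace age_dum = 1 if age >= 5 & age != . (without the check on missing values, observations with a missing age would also be coded 1)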

Recode • Change values of existing variables • Change 1 to 2 and 3 to 4 in origvar, and call the new variable myvar1: recode origvar (1=2)(3=4), gen(myvar1) • Change 1’s to missings in origvar, and call the new variable myvar2: recode origvar (1=.), gen(myvar2)

Logistic regression

Logistic regression • We use a set of data collected by the state of California from 1200 high schools measuring academic achievement. • Our dependent variable is called hiqual. • Our predictor variable will be a continuous variable called avg_ed, which is a measure of the average education (ranging from 1 to 5) of the parents of the students in the participating high schools.

OLS in Stata
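As a rough sketch, OLS on a binary outcome would be run as below. With the lecture's data the call would be regress hiqual avg_ed; the lecture's dataset is not available here, so auto.dta (shipped with Stata) stands in, with foreign as the 0/1 outcome and mpg as the predictor.

```stata
sysuse auto, clear

* Linear regression with a 0/1 dependent variable
regress foreign mpg

* Fitted values from OLS are not restricted to the 0-1 range,
* which can be checked by looking at their minimum and maximum
predict yhat
summarize yhat
```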

Logistic regression in Stata
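A minimal sketch of fitting the logistic model. With the lecture's data the call would be logit hiqual avg_ed; here auto.dta (shipped with Stata) stands in, with foreign as the binary outcome and mpg as the predictor.

```stata
sysuse auto, clear

* Logistic regression; coefficients are reported on the log-odds scale
logit foreign mpg

* Predicted probabilities always lie between 0 and 1
predict phat
summarize phat
```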

Multiple predictors

MODEL FIT Consider model fit using: The likelihood ratio test The pseudo-R2 (proportional change in log-likelihood) The classification table

Model fit: the likelihood ratio test

Model fit: LR test
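One way to carry out the likelihood ratio test by hand is to fit the model without and with predictors, store both, and compare them. The sketch uses auto.dta (shipped with Stata) as a stand-in; the stored names m0 and m1 are arbitrary.

```stata
sysuse auto, clear

* Null model: constant only
logit foreign
estimates store m0

* Model with predictors
logit foreign mpg weight
estimates store m1

* LR test of m1 against m0; this should agree with the
* "LR chi2" line that logit prints above its coefficient table
lrtest m0 m1
```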

Pseudo R2: proportional change in LL
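The proportional change in log-likelihood can be computed from the results Stata stores after logit, as a sketch of where the reported pseudo R2 comes from. The dataset auto.dta (shipped with Stata) is a stand-in for the lecture's data.

```stata
sysuse auto, clear
logit foreign mpg weight

* Log-likelihood of the constant-only model and of the fitted model
display "LL null model   = " e(ll_0)
display "LL fitted model = " e(ll)

* Pseudo R2 = 1 - LL(fitted)/LL(null)
display "Pseudo R2       = " 1 - e(ll)/e(ll_0)

* The same quantity as stored by logit
display "e(r2_p)         = " e(r2_p)
```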

A second measure of fit: the classification table

Classification table for the model with the predictors
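A classification table can be requested after logit with estat classification. The sketch below uses auto.dta (shipped with Stata) as a stand-in; the alternative cutoff of 0.3 is an arbitrary value chosen for illustration.

```stata
sysuse auto, clear
logit foreign mpg weight

* Observed versus predicted outcome, using the default cutoff of 0.5
estat classification

* The same table with a different cutoff for the predicted probability
estat classification, cutoff(0.3)
```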

Interpreting coefficients

Interpreting coefficients: significance -16.29 ≈ -12.05/0.74 (z = coefficient / standard error; the inputs shown are rounded)
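The z statistic in the output is the coefficient divided by its standard error, which can be reproduced from the stored results. This is a sketch on auto.dta (shipped with Stata), not the lecture's data; the last line simply repeats the arithmetic from the slide.

```stata
sysuse auto, clear
logit foreign mpg

* z = coefficient / standard error
display "z = " _b[mpg]/_se[mpg]

* Two-sided p-value from the standard normal distribution
display "p = " 2*normal(-abs(_b[mpg]/_se[mpg]))

* The division shown on the slide, using its rounded values
display -12.05/0.74
```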

Interpretation of coefficients: direction

Interpretation of coefficients: magnitude

Interpretation of coefficients: Magnitude
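Magnitude is often easier to read as an odds ratio, which is the exponentiated logit coefficient. The sketch below uses auto.dta (shipped with Stata) as a stand-in for the lecture's data.

```stata
sysuse auto, clear
logit foreign mpg

* Odds ratio for a one-unit increase in mpg
display "odds ratio = " exp(_b[mpg])

* Redisplay the same model with odds ratios instead of coefficients
logit, or
```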

Assumptions and outliers

The link test (sort of equivalent to the linearity assumption in MR)
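A sketch of running the link test after a logistic regression, again on the stand-in dataset auto.dta (shipped with Stata). The test refits the model on the linear prediction (_hat) and its square (_hatsq); a significant _hatsq suggests the model is misspecified.

```stata
sysuse auto, clear
logit foreign mpg weight

* Specification check: _hat should be significant, _hatsq should not
linktest
```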

Multicollinearity (here we cheat a little)
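One common workaround, which may or may not be the "cheat" the slide has in mind, is to refit the model with regress purely to obtain variance inflation factors, since collinearity concerns the predictors and not the outcome. The sketch uses auto.dta (shipped with Stata) as a stand-in.

```stata
sysuse auto, clear

* Fit the model with OLS only to get collinearity diagnostics
regress foreign mpg weight

* Variance inflation factors for the predictors
estat vif
```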

Influential observations: check the residuals

Have a closer look at the outlier residual
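Residuals and influence statistics are available through predict after logit. The sketch below uses auto.dta (shipped with Stata) as a stand-in; the new variable names and the threshold of 2 for a "large" residual are illustrative assumptions.

```stata
sysuse auto, clear
logit foreign mpg weight

* Predicted probability, standardized Pearson residual, and influence
predict p
predict rstd, rstandard
predict db, dbeta

* Plot the residuals against an observation index to spot outliers
generate id = _n
scatter rstd id, mlabel(id) yline(0)

* Have a closer look at observations with large residuals
list id make foreign mpg weight p rstd db if abs(rstd) > 2 & rstd < .
```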

And this helps a little (but not much)

Assumptions (continued): The model should fit equally well everywhere

Goodness of fit: Hosmer & Lemeshow (average probability in the j-th group)
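The Hosmer and Lemeshow test groups observations by predicted probability and compares observed with expected counts in each group. A sketch on the stand-in dataset auto.dta (shipped with Stata), using the conventional 10 groups:

```stata
sysuse auto, clear
logit foreign mpg weight

* Hosmer-Lemeshow goodness-of-fit test with 10 groups;
* the table option lists observed and expected counts per group
estat gof, group(10) table
```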
