Topics

Topics • Introduction to Stata • Files / directories • Stata syntax • Useful commands / functions • Logistic regression analysis with Stata • Estimation • GOF • Coefficients • Checking assumptions

Introduction to Stata • Note: we did this interactively for the larger part …

Stata file types • .ado • Programs that add commands to Stata • .do • Batch files that execute a set of Stata commands • .dta • Data file in Stata’s format • .log • Output saved as plain text by the log using command

The working directory • The working directory is the default directory for any file operations such as loading and saving data, or logging output • cd "d:\my work\"

Saving output to log files • Syntax for the log command • log using filename [, append replace [smcl|text]] • To close a log file • log close

Using and saving datasets • Load a Stata dataset • use d:\myproject\data.dta, clear • Save • save d:\myproject\data, replace • Using change directory • cd d:\myproject • use data, clear • save data, replace

Entering data • Data in other formats • You can use SPSS to convert data • You can use the infile and insheet commands to import data in ASCII format • Entering data by hand • Type edit or just click on the data-editor button

Do-files • You can create a text file that contains a series of commands • Use the do-editor to work with do-files • Example I

Adding comments • // or * denotes a comment that Stata should ignore (* only at the start of a line) • Stata ignores whatever follows after /// and treats the next line as a continuation • Example II
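The comment styles listed above can be sketched as follows. This is only an illustration; it assumes the auto dataset that ships with Stata, not the data used in this course.

```stata
* Load an example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

// A whole-line comment can start with // or with *
summarize price mpg   // a trailing comment needs a space before the slashes

* Three slashes at the end of a line continue the command on the next line
regress price mpg weight ///
    foreign
```

Long commands split this way stay readable in the do-editor without changing what Stata executes.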

A recommended structure • capture log close // if a log file is open, close it, otherwise disregard • set more off // don't pause when output scrolls off the page • cd d:\myproject // change directory to your working directory • log using myfile, replace text // log results to file myfile.log • … here you put the rest of your Stata commands … • log close // close the log file

Serious data analysis • Ensure replicability: use do-files and log files • Document your do-files • What is obvious today is baffling in six months • Keep a research log • Diary that includes a description of every program you run • Develop a system for naming files

Serious data analysis • New variables should be given new names • Use labels and notes • Double check every new variable • ARCHIVE

Stata syntax examples

The Stata syntax • regress y x1 x2 if x3 < 20, cluster(x4) • regress = Command • What action do you want performed • y x1 x2 = Names of variables, files or other objects • On what things is the command performed • if x3 < 20 = Qualifier on observations • On which observations should the command be performed • , cluster(x4) = Options • What special things should be done in executing the command

Examples • tabulate smoking race if agemother > 30, row • Example of the if qualifier • sum agemother if smoking == 1 & weightmother < 100

Elements used for logical statements
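Stata's standard relational and logical operators can be sketched as follows. The example assumes the auto dataset shipped with Stata; the variables are not those of the course data.

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Relational operators: == (equal), != (not equal), >, >=, <, <=
count if foreign == 1
count if rep78 != 3 & !missing(rep78)

* Logical operators: & (and), | (or), ! (not)
list make price mpg if price > 10000 | mpg >= 35
summarize weight if !(foreign == 1) & mpg < 20
```

Note the double equals sign for comparisons; a single = is reserved for assignment, as in generate.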

Missing values • Automatically excluded when Stata fits models; they are stored as the largest positive values • Beware!! • The expression ‘age > 65’ can thus also include missing values • To be sure, type: ‘age > 65 & age != .’
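A short sketch of this pitfall, assuming the auto dataset shipped with Stata, in which rep78 has some missing values (the course data are not used here):

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Missing values count as larger than any number, so this includes them
count if rep78 > 3
local with_missing = r(N)

* Excluding the missing value . explicitly
count if rep78 > 3 & rep78 != .
local without_missing = r(N)

* missing() also excludes the extended missing values .a to .z
count if rep78 > 3 & !missing(rep78)

display "Difference due to missing values: " `with_missing' - `without_missing'
```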

Selecting observations • drop variable list • keep variable list • drop if age < 65

Creating new variables • generate command • generate age2 = age * age • generate can use functions • see help functions • Sometimes the command egen is a useful alternative, for instance: • egen meanage = mean(age)
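The difference between generate and egen can be sketched like this, assuming the auto dataset shipped with Stata; the new variable names are made up for the illustration.

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* generate works observation by observation and can use functions
generate weight2 = weight * weight
generate lnprice = ln(price)

* egen computes summaries across observations
egen meanmpg = mean(mpg)                       // overall mean, same value on every row
bysort foreign: egen meanmpg_grp = mean(mpg)   // mean within each group

* The two can be combined
generate mpg_dev = mpg - meanmpg
```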

Useful functions

Replace command • replace has the same syntax as generate but is used to change values of a variable that already exists • gen age_dum = . • replace age_dum = 0 if age < 5 • replace age_dum = 1 if age >= 5 & age != .

Recode • Change values of existing variables • Change 1 to 2 and 3 to 4: recode origvar (1=2)(3=4), gen(myvar1) • Change missings to 1: recode origvar (.=1), gen(myvar2)
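Both approaches to changing values can be sketched as follows, assuming the auto dataset shipped with Stata. The variable names heavy and rep3 and the cut-off of 3000 are hypothetical choices for the illustration.

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Dummy variable built with generate + replace
generate heavy = .
replace heavy = 0 if weight < 3000
replace heavy = 1 if weight >= 3000 & weight != .   // guard against missing values

* The same kind of regrouping with recode, written to a new variable
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

* Double check every new variable against its source
tabulate rep78 rep3, missing
```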

Logistic regression

Logistic regression • Let's use a set of data collected by the state of California from 1200 high schools measuring academic achievement. • Our dependent variable is called hiqual. • Our predictor variable is avg_ed, a continuous measure of the average education (ranging from 1 to 5) of the parents of the students in the participating high schools.

OLS in Stata

Logistic regression in Stata
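A minimal sketch of fitting a logistic regression. Because the course data are not reproduced here, it assumes the auto dataset shipped with Stata, with foreign as the binary outcome and mpg as a continuous predictor; with the data described above the command would be logit hiqual avg_ed.

```stata
* Example dataset shipped with Stata (assumption: stands in for the course data)
sysuse auto, clear

* Binary outcome first, then the predictor(s)
logit foreign mpg

* Predicted probabilities from the fitted model
predict phat, pr
summarize phat
```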

Multiple predictors
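Adding predictors only means listing more variables after the outcome. The sketch below assumes the auto dataset shipped with Stata; foreign, mpg and weight stand in for the course variables.

```stata
* Example dataset shipped with Stata (assumption: stands in for the course data)
sysuse auto, clear

* Two predictors; each coefficient is adjusted for the other
logit foreign mpg weight

* Joint Wald test that both coefficients are zero
test mpg weight
```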

Model fit • Consider model fit using: • The likelihood ratio test • The pseudo-R2 (proportional change in log-likelihood) • The classification table

Model fit: the likelihood ratio test

Model fit: LR test
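One way to carry out the likelihood ratio test by hand is to compare the fitted model with an intercept-only model. This is a sketch assuming the auto dataset shipped with Stata; the stored-estimate names full and null are arbitrary.

```stata
* Example dataset shipped with Stata (assumption: stands in for the course data)
sysuse auto, clear

* Full model
logit foreign mpg
estimates store full
generate byte touse = e(sample)   // keep both models on the same observations

* LR statistic computed directly: 2 * (LL model - LL null)
display "LR chi2 = " 2 * (e(ll) - e(ll_0))

* Intercept-only model and the formal test
logit foreign if touse
estimates store null
lrtest full null
```

The statistic shown by lrtest should match the LR chi2 that logit prints in its header for the full model.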

Pseudo R2: proportional change in LL
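The proportional change in log-likelihood can be reproduced from the stored results of logit. A sketch, assuming the auto dataset shipped with Stata; the scalar name pr2 is hypothetical.

```stata
* Example dataset shipped with Stata (assumption: stands in for the course data)
sysuse auto, clear

logit foreign mpg

* McFadden's pseudo-R2 = 1 - LL(model) / LL(intercept only)
scalar pr2 = 1 - e(ll) / e(ll_0)
display "Pseudo R2 by hand:    " pr2
display "Pseudo R2 from Stata: " e(r2_p)
```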

Classification Table
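After a logistic regression, a classification table can be requested with estat classification. A sketch assuming the auto dataset shipped with Stata; the alternative cut-off of 0.3 is an arbitrary choice for illustration.

```stata
* Example dataset shipped with Stata (assumption: stands in for the course data)
sysuse auto, clear

logit foreign mpg weight

* Observed versus predicted outcome, classifying as 1 when Pr >= 0.5 (the default)
estat classification

* The same table with a different cut-off
estat classification, cutoff(0.3)
```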

Interpreting coefficients

Interpreting coefficients: significance

Interpretation of coefficients: direction

Interpretation of coefficients: Magnitude
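Magnitude is usually easier to read as odds ratios or predicted probabilities than as raw logit coefficients. A sketch, assuming the auto dataset shipped with Stata; the mpg values passed to margins are arbitrary.

```stata
* Example dataset shipped with Stata (assumption: stands in for the course data)
sysuse auto, clear

* Coefficients on the log-odds scale
logit foreign mpg

* Odds ratio by hand: exponentiate the coefficient
display "Odds ratio for mpg: " exp(_b[mpg])

* The same model reported as odds ratios
logit foreign mpg, or

* Predicted probabilities at selected values of the predictor
margins, at(mpg = (15(5)35))
```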

Ok now

Multicollinearity
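One common way to screen for multicollinearity is to look at correlations among the predictors and at variance inflation factors. Since VIFs depend only on the predictors, the sketch below obtains them from a linear regression with the same right-hand side; this is one possible approach, not necessarily the one used in the lecture, and it assumes the auto dataset shipped with Stata.

```stata
* Example dataset shipped with Stata (assumption: stands in for the course data)
sysuse auto, clear

* Pairwise correlations among the predictors
correlate mpg weight length

* VIFs from an auxiliary linear regression with the same predictors
quietly regress foreign mpg weight length
estat vif
```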

Influential observations
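Influence diagnostics are available through predict after a logistic regression. A sketch, assuming the auto dataset shipped with Stata; the variable names p and db are hypothetical.

```stata
* Example dataset shipped with Stata (assumption: stands in for the course data)
sysuse auto, clear

logit foreign mpg weight

* Predicted probability and Pregibon's delta-beta influence statistic
predict p, pr
predict db, dbeta

* Plot influence against the predicted probability
scatter db p

* List the observations with the largest influence
gsort -db
list make foreign mpg weight p db in 1/5
```

Observations with a much larger db than the rest are worth inspecting individually before deciding how to handle them.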

To do • Perform a logistic regression analysis • Use apilog.dta • Awards = dependent variable
