Labor Economics Exercise session # 1: Artificial Data Generation
TA: Natalia Shestakova, October 2007
Overview
• Generating random variables
• Graphing
• Throwing seeds
• Generating random dummy variables from sample
• Drawing from multivariate distributions
• Loops and distribution of estimated coefficients
Generating random variables-1
Random-number functions:
• uniform() returns uniformly distributed pseudorandom numbers on the interval [0,1). uniform() takes no arguments, but the parentheses must be typed.
• invnormal(uniform()) returns normally distributed random numbers with mean 0 and standard deviation 1.
Reminder:
• Uniform distribution: in the discrete case, every value in a finite set of possible values is equally probable; in the continuous case, all intervals of the same length within the support are equally probable.
• Normal distribution: a family of continuous probability distributions. Each member of the family is defined by two parameters, location and scale: the mean ("average") and the standard deviation ("variability"), respectively.
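A quick way to check the properties listed above is to draw a sample and summarize it. This is a minimal sketch; the seed (2) and the variable names u and e are arbitrary choices for illustration. Newer Stata releases also offer runiform() and rnormal(), but the functions from the slide are used here.

```stata
clear
set obs 500
set seed 2                       // arbitrary seed so the draws can be reproduced
gen u = uniform()                // pseudorandom uniform numbers on [0,1)
gen e = invnormal(uniform())     // standard normal numbers (mean 0, sd 1)
summarize u e                    // u: mean near 0.5, sd near 0.289; e: mean near 0, sd near 1
```

The standard deviation of a uniform variable on [0,1) is 1/sqrt(12), about 0.289, which is what summarize should approximately report for u.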
Generating random variables-2
Examples:
• 500 draws from the uniform distribution on [0,1):
    set obs 500
    gen x1 = uniform()
• 500 draws from the standard normal distribution (mean 0, variance 1):
    gen x2 = invnormal(uniform())
• 500 draws from the distribution N(1,2), i.e. mean 1 and variance 2 (standard deviation sqrt(2)):
    gen x3 = 1 + sqrt(2)*invnormal(uniform())
• 500 draws from the uniform distribution on [3,12):
    gen x4 = 3 + 9*uniform()
• 500 observations of a variable that is a linear combination of other variables:
    gen z = 4 - 3*x4 + 8*x2
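The transformations in these examples follow from two standard facts about shifting and scaling a random variable. The moments of z below assume that x2 and x4 are independent draws, which is how they are generated here. In particular, a draw from N(1,2), whose variance is 2, needs a scale factor of sqrt(2), not 2.

```latex
\begin{align*}
U \sim \mathcal{U}[0,1),\; Z \sim N(0,1)
  &\;\Rightarrow\; a + bU \sim \mathcal{U}[a,\, a+b), \qquad \mu + \sigma Z \sim N(\mu, \sigma^2) \quad (b, \sigma > 0)\\
x_4 = 3 + 9U
  &\;\Rightarrow\; E[x_4] = \tfrac{3 + 12}{2} = 7.5, \qquad \operatorname{Var}(x_4) = \tfrac{(12 - 3)^2}{12} = 6.75\\
E[z] &= 4 - 3\,E[x_4] + 8\,E[x_2] = 4 - 3(7.5) + 8(0) = -18.5\\
\operatorname{Var}(z) &= (-3)^2 \operatorname{Var}(x_4) + 8^2 \operatorname{Var}(x_2) = 9(6.75) + 64(1) = 124.75
\end{align*}
```

With 500 draws, summarize z should therefore report a mean near -18.5 and a standard deviation near sqrt(124.75), about 11.17.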
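Throwing seeds
Setting the seed of the random-number generator lets you regenerate exactly the same sample at any later time:
    set obs 500
    set seed 2
    gen z1 = invnormal(uniform())
    set seed 2
    gen z2 = invnormal(uniform())    // same seed as z1, so z2 is identical to z1
    set seed 19840607
    gen z3 = invnormal(uniform())    // different seed, so a different sample
    dotplot z1 z2 z3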
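The reproducibility claim can also be checked directly instead of only visually with dotplot. This sketch reuses the seeds from the slide and adds an assert and a count; nothing beyond standard Stata commands is assumed.

```stata
clear
set obs 500
set seed 2
gen z1 = invnormal(uniform())
set seed 2
gen z2 = invnormal(uniform())
set seed 19840607
gen z3 = invnormal(uniform())
// z1 and z2 were drawn right after the same seed, so they must be identical
assert z1 == z2
// z3 used a different seed, so its values should differ from z1
count if z1 != z3
display "observations where z1 and z3 differ: " r(N)
```

assert stops the do-file with an error if the condition fails for any observation, so a do-file that runs to the end confirms that z1 and z2 coincide.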
Generating random dummy variables from sample
Task: generate a variable that indicates whether an individual smokes (smoke=1) or does not smoke (smoke=0):
(a) in period 1, assume that (s)he smokes with probability 30%;
(b) in each of the following 30 periods, a smoker keeps smoking with probability 65% and a non-smoker starts smoking with probability 5%.
Solution:
• Note that a variable uniformly distributed on [0,1) is less than 0.3 with probability 30%. Then:
    gen smoke = uniform()<.3
• Next, give every individual an ID and create observations for 30 years (initially they are all the same). Then, year by year, update the smoking status of every ID: the probability of smoking is .05 + .6 = .65 if the individual smoked in the previous year and .05 otherwise. With the data sorted by pid (and by year within pid):
    by pid: replace smoke=uniform()<(.05+.6*smoke[_n-1]) if _n>1
  Because replace works through the observations in order, smoke[_n-1] already holds the updated value for the previous year.
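A minimal runnable version of both steps is sketched below. The number of individuals (500), the seed, and the variable names pid and period are hypothetical choices for illustration; expand creates the 30 identical copies per individual described above.

```stata
clear
set seed 12345                     // arbitrary seed (assumption) so the run is reproducible
set obs 500                        // hypothetical number of individuals
gen pid = _n                       // individual identifier
gen smoke = uniform() < .3         // period 1: smokes with probability 0.3
expand 30                          // 30 identical copies of each individual
bysort pid: gen period = _n        // number the copies 1..30 within each individual
// from period 2 on: P(smoke) = .65 if smoked in the previous period, .05 otherwise
by pid: replace smoke = uniform() < (.05 + .6*smoke[_n-1]) if _n > 1
// share of smokers in each period
tabulate period, summarize(smoke)
```

The share of smokers should drift from about 0.3 toward the stationary share pi that solves pi = .65*pi + .05*(1 - pi), i.e. pi = .05/.4 = .125.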
Drawing from multivariate distributions
Task: generate several variables that are correlated with each other (i.e., have a joint multivariate distribution).
Solution:
(a) drawnorm draws a sample from a multivariate normal distribution with the desired means and covariance (or correlation) matrix:
    drawnorm x y, n(1000) means(m) corr(C)
(b) corr2data creates an artificial dataset with a specified correlation structure; the data have exactly the specified means and correlations, but they are not a random sample from an underlying population with those summary statistics:
    corr2data x y, n(1000) means(m) corr(C)
Note: the matrices m and C can be defined with the matrix (mat) command.
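The difference between (a) and (b) shows up in the sample correlation. In this sketch the means (0, 0) and the correlation of 0.5 are hypothetical values, and the seed is arbitrary; the standard deviations are left at their default of 1, so the correlation matrix is also the covariance matrix.

```stata
clear
set seed 2
matrix m = (0, 0)               // hypothetical means of x and y
matrix C = (1, .5 \ .5, 1)      // hypothetical correlation matrix: corr(x, y) = 0.5
// (a) random sample from a bivariate normal population with means m and correlation C
drawnorm x y, n(1000) means(m) corr(C)
correlate x y                   // sample correlation is near 0.5, not exactly 0.5
// (b) artificial data whose sample means and correlation equal m and C
clear                           // plain clear removes the data but keeps matrices m and C
corr2data x y, n(1000) means(m) corr(C)
correlate x y                   // sample correlation is 0.5 up to storage precision
summarize x y                   // sample means are 0 up to storage precision
```

drawnorm is the right tool for Monte Carlo experiments, where sampling variation is the point; corr2data suits cases where the data must reproduce given moments exactly.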
Loops and distribution of estimated coefficients
Why use loops?
• A single randomly drawn sample is unlikely to coincide with the population, so the coefficient estimated from one sample can be far from the true value.
• Drawing many samples, estimating the coefficient of interest in each one, and averaging these estimates gives a result closer to the true coefficient (provided the estimator is unbiased); the spread of the estimates shows the estimator's sampling distribution.
How to use loops?
    set more off                  // so we do not have to hit Enter every time a regression runs
    tempname sim
    postfile `sim' b1 using b1_estimates, replace   // keeps the estimates outside the data that drop _all clears (file name is illustrative)
    local i = 1                   // i is the counter of the loop
    while `i' <= 500 {            // start a loop of 500 repetitions
        drop _all                 // drop all observations so we can randomly generate them again
        * set the number of observations and generate the random variables here
        * run the regression here
        post `sim' (_b[x1])       // store the estimated coefficient on x1 from this regression
        local i = `i' + 1         // add 1 to the counter
    }                             // end of the loop
    postclose `sim'
    use b1_estimates, clear       // b1 now holds the 500 estimated coefficients
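The same experiment can also be run with Stata's simulate command, which repeats a program and collects the returned results itself. This is a swapped-in alternative to the while loop, not the method from the slide, and the data-generating process y = 1 + 2*x1 + e, the sample size, the seed, and the program name onedraw are all hypothetical choices for illustration.

```stata
clear all                           // also drops any earlier program named onedraw
set seed 2                          // reproducible replications
// hypothetical DGP: y = 1 + 2*x1 + e, with x1 and e standard normal
program define onedraw, rclass
    drop _all                       // start each replication from an empty dataset
    set obs 500
    gen x1 = invnormal(uniform())
    gen y = 1 + 2*x1 + invnormal(uniform())
    regress y x1
    return scalar b1 = _b[x1]       // estimated slope from this sample
end
// run onedraw 500 times and keep b1 from each run
simulate b1 = r(b1), reps(500) nodots: onedraw
summarize b1                        // the mean of the 500 estimates should be close to 2
```

After simulate, the data in memory contain one observation per replication, so the distribution of the estimates can be inspected directly, for example with histogram b1.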