200 likes | 333 Views
WRDS Workshop. Part 2 UNC – Charlotte Department of Accounting. By : Keejae Hong. COMPUSTAT. Compustat changed variable names (#18 -> ib ) and database structure a few years ago. Tables , funda or fundq (old format compann ( compqtr ,) Industrial Annual
E N D
WRDS Workshop Part 2 UNC – Charlotte Department of Accounting By: Keejae Hong
COMPUSTAT • Compustat changed variable names (#18 -> ib) and database structure a few years ago. • Tables, fundaor fundq (old format compann(compqtr,) • Industrial Annual • NYE, American Stock Exchange, S&P500, utility subsidiaries • Primary, Supplementary and Tertiary (NO distinction in WRDS) • Full Coverage Annual • NASDAQ companies, firms listed in regional exchanges • Backdata (20 year + 20 years Industrial and Full coverage ) • Research • We don’t have to worry about this in WRDS.
COMPUSTAT • Unique Identifier • Gvkey • CUSIP is not an identifier • Annual Compustat (Gvkeyyeara) (gvkeydatadate) + others ( if indfmt='INDL'; • if datafmt='STD'; • if POPSRC='D'; • if Consol='C'; • Quarterly Compustat (Gvkeyyear qtr) (gvkeydatadate) + others • Delete duplicates to begin with… • CUSIP available now: Cnum (6digit) + first 2 digits of CIC(3digit) = CUSIP (8digit number) • CUSIPstands for Committee on Uniform Securities Identification Procedures. A CUSIP number identifies most securities. • When there is a change in Cnum (or cusip), compustat changes all previous years’cnum retrospectively. this may create a problem when you combine the data with CRSP or IBES Per-share-basis numbers (e.g. EPS ) are reported as shown on the current financial statements. In other words, they are unadjusted numbers. Use adjustment factor (#27 ajx, annual compustat) to adjust number. • Firms with fiscal year end January through May have yeara +1 will give the calendar year. For example, • CONAME FYR YEARA • AAA 12 1994 199401 - 199412 • BBB 9 1994 199310 - 199409 • CCC 5 1994 199406 - 199505
CRSP • Many different Tables (mse, msf, and etc.) • Unique Identifier • PERMNO • CUSIP or NCUSIP is not an identifier • Use CUSIP(current cusip) to merge crsp data with compustat • Use NCUSIP (historical cusip) to merge crsp data with IBES • Web Interface option often gives you limited choice. Be prepared to use crsp tables directly when needed.
IBES • Unique Identifier • IBES Ticker (not the same as the stock market ticker, which is referred as “official ticker”) • CUSIP in IBES is a historical CUSIP. • Forecast numbers are already split-adjusted. BE VERY CAREFUL when you merge IBES and COMPUSTAT data. • You don’t want to merge 1993 IBES with 1995 IBES. • Unadjusted numbers are available at request • SAS code available through wrds
WRDS Data Structure • Relational Database • Stores data in a structure consisting of one or more tables of rows and columns, which may be interconnected. • A row corresponds to a record; columns correspond to attributes (fields) in the record. • Each database (i.e., COMPUSTAT or CRSP) consists of many tables. • Each table consists of many rows and columns.
SQL (Structured Query Language) • The advantage of using SQL • Combined functionality • Faster for smaller tables • Not require presorting • Not require common variable names to join on (need same type, length)
1. Creating a new variable of grand mean proc sql; create table kids1 as select *, mean(age) as mean_age from kids; quit; proc print data=kids1 noobs; run;
2. Creating a new variable of group mean proc sql; create table kids2 as select *, mean(age) as mean_age from kids group by famid; quit;
3. Creating multiple variables of summary statistics at once proc sql; create table kids3 as select famid, mean(age) as mean_age , std(age) as std_age, mean(wt) as mean_wt, std(wt) as std_wt from kids group by famid; quit;
4. Creating multiple summary statistics variables in the original data set proc sql; create table fam5 as select *, mean(age) as mean_age, std(age) as std_age, mean(wt) as mean_wt, std(wt) as std_wt from kids group by famid order by famid, kidname desc; quit;
Proc SQL : one-to-one merge proc sql; create table dadfam1 as select * from dads,faminc where dads.famid=faminc.famid order by dads.famid; quit;
Proc SQL : one-to-many merge proc sql; create table dadkid2 as select * from dads, kids where dads.fid=kids.famid order by dads.fid, kids.kidname; quit;
Using full join to handle mismatching records in a one-to-one merge proc sql; create table dadkid3 as select * from dads, faminc where dads.famid=faminc.famid order by dads.famid;
Using full join to handle mismatching records in a one-to-one merge proc sql; create table dadkid4 as select *, (dads.famid=faminc.famid) as indic, (dads.famid ~=.) as dadind, (faminc.famid ~=.) as famind, coalesce(dads.famid, faminc.famid) as fid from dads full join faminc on dads.famid=faminc.famid; quit;
Producing all the possible distinct pairs of the values in a column proc sql; create table discity as select distinct city from places; quit;
All the possible distinct pairs of the values in a column (con’t) proc sql; create table pair_places as select f1.city as orig , f2.city as dest from discity as f1 , discity as f2 where f1.city ne ' ' & f1.city < f2.city order by f1.city, f2.city; quit;
Source • http://www.ats.ucla.edu/stat/SAS/modules/sqlmerge.htm • SAS Reference books • The little SAS book by Delwiche and Slaughter • Using SAS in Financial Research by Boehmer, Broussard, Kallunki • http://belkcollegeofbusiness.uncc.edu/khong5/workshop_sch.htm