1 / 23

WRDS SAS User Guide

WRDS SAS User Guide . West Virginia University. Available Data in WRDS . Advantage for using SAS. WRDS is built using SAS data sets, so manipulating data through SAS is easier than almost any other querying tool. Any combination of two databases can be constructed.

tyme
Download Presentation

WRDS SAS User Guide

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. WRDS SAS User Guide West Virginia University

  2. Available Data in WRDS

  3. Advantage for using SAS • WRDS is built using SAS data sets, so manipulating data through SAS is easier than almost any other querying tool. • Any combination of two databases can be constructed. • The web interface deletes observations for which the chosen variables have missing values and there is no simple way of finding out what observations were deleted

  4. SAS Sample programs • For simple SAS codes: ► SAS Sample Programs on WRDS Info Home • For advanced SAS codes: ►Support → WRDS Datasets and Sample program → SAS ►Support → Research Applications

  5. Connect to WRDS %let wrds = wrds.wharton.upenn.edu 4016; options comamid=TCP remote=wrds; signon username=_prompt_; rsubmit; *------------------- * * your code here *------------------------- *; endrsubmit; ► Note: you always need this code since it has SAS connected to WRDS.

  6. Autoexe.sas data _NULL_; file 'autoexec.sas'; put "%include '!SASROOT/wrdslib.sas';"; run; ► A list of important libnames already is assigned by WRDS through this statement. ► You may run this code only when libname error happens.

  7. Libname • SAS library names are already defined in all user accounts of Unix. • For example, ► Bank regulatory bank: /wrds/bank/sasdata ► Compustat: comp /wrds/compustat/sasdata ► CRSP CCM: crsp /wrds/crsp/sasdata/cc ► CRSP Monthly stock: crsp /wrds/crsp/sasdata/sm • Unix home directory: ► Temp /home/wvu/min06/temp

  8. Data set • To set up data steps, use the LIBNAME statement and then name the dataset. This is enough to create it. Example: • For CRSP monthly stock file: ► set crsp.msf • For Compustat Industrial Fundamental file: ► set comp.ina • All data set are from SAS data files stored in Unix. ► A good way to fix your error is checking variables and directory name ofSAS files in Unix.

  9. Finding variables • Web – Based: ► Documents ► Tools (Searching variables) • Using SAS: ►proc contents data=crsp.dsf;

  10. Finding identifiers • Web – Based: ► Code lookup ► Tools • Using SAS: ► For example, to find identifiers in Compustat; data names; set comp.namesann; where coname contains 'IBM' or SMBL contains 'IBM'; run; proc print data = names; run; ► For CRSP: the file name is “stocknames”. • Using Unix command: grep

  11. Merge CRSP / Compustat using CUSIP When merging two databases, we need a common ID • Best way is to match them with CUSIP: ► Names and Tickers are problematic since they change though time, can be re-used. and therefore have different entries in different databases. ► CUSIP changes through time but are not re-used. ► should be historical one (NCUSIP)

  12. Compustat: ► CNUM=459200 (6 digits of CUSIP) CRSP: ► CUSIP=45920010 Understanding CUSIP 459200101 Example for IBM

  13. ► To create a common identifier (cnum), we use CUSIP and subtract 6 digits from it. Matching identifiers

  14. Step 1. Headers from Compustat • From Compustat header file “namesann”: ► Find “cnum” and “gvkey” for IBM ► Then exclude missing data ► Sort data by cnum • procsort data=comp.namesann(keep = gvkey cnum) out=comp nodupkey; where missing(cnum)=0 where smbl in IBM; by cnum gvkey; run;

  15. Step 2. Headers from CRSP • From CRSP header file “stocknames”; ► Find “ncusip” and “permno” for IBM ► Then exclude missing data ► Sort data by ncusip ► Define the output “mse” • procsort data=crsp.stocknames(keep = permco ncusip) out=mse nodupkey; where missing(ncusip)=0; by permco ncusip; run;

  16. Step 3. Creating cnum from ncusip • Create 6 digits identifier (cnum) from ncusip in order CRSP and Compustat to be matched on cnum: ► Using functions “length” and “subtr”, create cnum from ncusip in “mse” ► Sort data by cnum ► Define the output as “mse3” data mse2; length cnum $6.; set crsp.mse; cnum= substr(ncusip,1,6); run; procsort data=mse2 out=mse3(keep = permco cnum) nodupkey; by cnum permco; run;

  17. Step 4. Merging ► Create temporary variables “aa” and “bb” using option “in” in order to track whether that data set contributed to the current observation data joint2; merge comp(in=aa) mse3(in=bb); by cnum; /* Create Dummies to test source of merging*/ if aa=1 then compustat=1; else compustat=0; if bb=1 then crsp=1; else crsp=0; run;

  18. b. Extract data from CCM Concepts needed: • Historical identifier (NPERMNO) • Linking file (cstlink2) ► see CCM guide • Most of the SAS procedures on WRDS use SQL ►see SQL references

  19. Libname: ‘wrds/crsp/sasdata/cc’ Sepecify beginning and ending years: %let beg_yr = 1995; %let end_yr = 2003; Step 1. Libname & Years Step 1 BEGFYR ENDFYR

  20. Step 2. Link file (cstlink2) Specify link information: ► Create a data (temp1) which is set on “cstlink2” /* data temp1; set crsp.CSTLINK2; run; */ ► Select link types and link dates (1995 < date < 2004) ► Sort data by GVKEY ► Define the output as “link” /* procsort data=temp1 out=lnk; where NPERMNO in (12490, 11081, 10107) and LINKTYPE in ("LU", "LC", "LD", "LF", "LN", "LO", "LS", "LX") and (&end_yr+1 >= year(LINKDT) or LINKDT = .B) and (&beg_yr-1 <= year(LINKENDDT) or LINKENDDT = .E); by GVKEY LINKDT; run; */ Step 2 LINKENDDT LINKDT BEGFYR ENDFYR

  21. A,B part →(LINKDT <= FYENDDT or LINKDT = .B) and (FYENDDT <= LINKENDDT or LINKENDDT =.E) B part →(LINKDT <= FYBEGDT or LINKDT = .B) and (LINKENDDT >= FYENDDT or LINKENDDT = .E) A,B,C part→(LINKDT <= FYENDDT or LINKDT = .B) and (LINKENDDT >= FYBEGDT or LINKENDDT = .E) Step 3.a LINKDT LINKENDDT BEGFYR ENDFYR BEGFYR ENDFYR A B C Step 3

  22. Step 3b. Specify overlapping periods ► Create table (defined “mydata”) which has following variables from the file “link” ► Name “crsp.CSTANN” as “cst” (* CSTANN is a file which contains all compustat data) ► Specify date requirements. (Select A, B or C) ► With GVEKY we found, extract data we need from “CSTANN“ by the corresponding GVKEY (lnk.GVKEY = cst.GVKEY) proc sql; create table mydata(keep=GVKEY NPERMNO NPERMCO SMBL YEARA LINKDT LINKENDDT LINKTYPE DATA6) as select * from lnk, crsp.CSTANN as cst where lnk.GVKEY = cst.GVKEY and (&beg_yr <= YEARA <= &end_yr) and (LINKDT <= cst.FYENDDT or LINKDT = .B) and (cst.FYENDDT <= LINKENDDT or LINKENDDT = .E); quit;

  23. References To see all references and SAS programs: ► http://www.be.wvu.edu/wrds/home/index.html

More Related