100 likes | 289 Views
Benefits. Access native CRSPAccess Databases.Most SAS Dataset options work without modification (including keep, rename, and drop).Native SAS datasets and SASECRSP dataset can be used in the same statements.Procedures can directly access SASECRSP datasets.Supports random access on PERMNO.Suppor
E N D
1. The New SAS Engine For CRSP Benefits of using SAS engine for CRSP
Setting up the interface between SAS and CRSP
Examples of use
Performance Benchmarks
2. Benefits Access native CRSPAccess Databases.
Most SAS Dataset options work without modification (including keep, rename, and drop).
Native SAS datasets and SASECRSP dataset can be used in the same statements.
Procedures can directly access SASECRSP datasets.
Supports random access on PERMNO.
Supports global & PERMNO specific date restrictions.
Supports functions and formats specific to CRSP trading calendars.
Often requires no intermediate files.
Saving coding time, CPU time, I/O, & Disk space.
3. Setting up the Interface After CRSP Stock Data and SAS/ETSTM 8.2 are installed independently. Two notes on installation -
CRSPDB_SASCAL environment variable must be set to you latest CRSP data.
SASECRSP is currently only available for Windows and HP/UX.
After that, the interface can be accessed using simple extensions to the SAS libname statement. For instance,
To access CRSP daily or monthly sample database respectively
libname dstk sasecrsp ‘c:\dasample\' setid=10;libname mstk sasecrsp ‘c:\masample\' setid=20;
The data files being accessed are the same ones that are accessed by FORTRAN, ‘C’, ts_print, or stk_print. Therefore disk space is only needed for one installation regardless of how users access the data. This saves not only disk space, but labor each time new data is shipped.
4. Printing HeadersExample 1: Use of keep statement, and direct use of the proc print procedure proc print data = mstk.stkhead (keep = permno hcomnam begdt enddt);
Obs PERMNO BEGDT ENDDT HCOMNAM
1 10107 19860331 20010731 MICROSOFT CORP 2 10823 19251231 19980529 KEYSPAN ENERGY CORP 3 12079 19251231 20010731 GENERAL MOTORS CORP 4 12490 19251231 20010731 INTERNATIONAL BUSINESS MACHS COR 5 14218 19251231 19890428 R J R NABISCO INC 6 14322 19251231 20010731 SEARS ROEBUCK & CO 7 15069 19251231 20010731 U S X MARATHON GROUP 8 18948 19300430 19961231 ROCKWELL INTERNATIONAL CORP 9 24360 19500731 20010731 KEYSPAN CORP10 28353 19620131 20010731 RALSTON PURINA GROUP11 52070 19710331 19841031 ELECTRONIC DATA SYS CORP12 66931 19841031 19960628 GENERAL MOTORS CORP13 68451 19851231 20010731 GENERAL MOTORS CORP14 75228 19850830 20010731 IVAX CORP15 75390 19881031 19990831 DEVON ENERGY CORP16 76597 19910328 20001229 NABISCO GROUP HOLDING CORP17 78946 19930226 19970530 DEAN WITTER DISCOVER & CO18 79323 19930630 20010731 ALLSTATE CORP19 79348 19930831 19950531 RALSTON CONTINENTAL BAKING GROUP20 81191 19950131 20001229 NABISCO HOLDINGS CORP21 83596 19960628 20010731 ELECTRONIC DATA SYSTEMS CORP22 84381 19961231 20010731 ROCKWELL INTERNATIONAL CORP NEW23 85349 19971031 20010731 ARVINMERITOR INC24 86496 19990129 20010731 CONEXANT SYSTEMS INC25 87137 19990831 20010731 DEVON ENERGY CORP NEW
5. Average Returns Example 2: Demonstrates date restriction on a libname, direct use of proc means, and output of native SAS datasets It is easy to restrict data to a specified date range before passing it to SAS for calculation.
To Calculate Average monthly returns for all the securities in the sample database for the period of January, 1996 – December, 2000, use the following commands.
libname mth5yr sasecrsp 'c:\masample\' range='19960131-20001229' setid=20;
proc means data=mth5yr.ret noprint; var ret; by permno; output mean=avgret out=avgret; run;
6. Works with Proc SQLExample 3: Demonstrates SASECRSP and native SAS dataset in the same statement. /* Join with the company names and restrict to securities with exactly five years (60 months) of non-missing returns, and sort output by the avgret. */
proc sql;
select a.permno, a.hcomnam, b.avgret
from mth5yr.stkhead a, avgret b
where a.permno=b.permno order and b._FREQ_ = 60 by avgret;
run;
PERMNO Company Name Header Returns
__________________________________________________________________________
14322 SEARS ROEBUCK & CO 0.004601
12079 GENERAL MOTORS CORP 0.010279
28353 RALSTON PURINA GROUP 0.013447
15069 U S X MARATHON GROUP 0.013855
79323 ALLSTATE CORP 0.018369
68451 GENERAL MOTORS CORP 0.022518
24360 KEYSPAN CORP 0.024243
75228 IVAX CORP 0.026936
12490 INTERNATIONAL BUSINESS MACHS COR 0.027325
10107 MICROSOFT CORP 0.031976
7. Performance MetricsExample 4: The SASECRSP engines performs at near CRSPAccess ‘C’ native speed. This was run on the full December 2000 daily database on a local drive of Pentium III 733 MHz running Windows 2000. This example reads over 50 million return observations, and calculates the minimum, maximum, mean, and standard deviation in approximately 2 ˝ minutes. libname dlyall sasecrsp 'c:\da200012\' setid=10;
proc means data=dlyall.ret;
var ret;
run;
N Mean Std Dev Minimum Maximum
___________________________________________________________________
51398971 0.000845599 0.0442346 -0.9716981 12.7777777
___________________________________________________________________
(Reads 205 MB and over 51 million observations in 2 minutes and 34 seconds)
WARNING: Defaulting to selecting all PERMNOs in CRSP database.
NOTE: There were 52365201 observations read from the dataset DLYALL.RET.
NOTE: PROCEDURE MEANS used:
real time 2:34.78
cpu time 2:31.12
8. Performance MetricsExample 5: This example demonstrates that random access by PERMNO is very efficient. Accessing the same underlying 51,000,000 observation dataset as the previous example, accessing just a single permno, takes just a faction of a second.
9. Performance MetricsExample 6: This example demonstrates how a full pass of the header data can be made to restrict the data and then used to create an input list with date ranges. This input list is then used to restrict access to the 51,000,000 observation dataset from the first performance example, and it access just the securities and dates requested. Creating the list of securities, takes less than two seconds. The restricted list takes a little more than ˝ minute. /* This example has two parts. First it does a subset to just companies that delist due to mergers and creates a dataset of PERMNO, delist date, and 200 trading days before the delist date. The second part on the next screen, then then subsets based on that universe of perms */
data mrgdt (keep=permno date1 date2);
set dlyall.stkhead;
where dlstcd >= 200 and dlstcd <= 299;
crspenddt = crspdicd(enddt,0); /* convert enddt to a CRSP calendar date */
date1 = crspdcid(crspenddt - 200); /* calculate 200 trading days prior to enddt and store in YYYYMMDD format */
if date1 ne . and date1 >= begdt; /* subsetting if - ignores companies that do not have 200 trading days of CRSP data available. */
date2 = enddt;
run;
NOTE: There were 7790 observations read from the dataset DLYALL.STKHEAD.
WHERE ((dlstcd>=200 and dlstcd<=299));
NOTE: The dataset WORK.MRGDT has 7640 observations and 3 variables.
NOTE: DATA statement used:
real time 1.21 seconds
cpu time 0.42 seconds