Regression Lab 2 • All About Procs • Proc print • Proc Sort • Proc means • Proc univariate • Proc plot
Proc Print • Provides: • Printout of raw data set • Screen data for errors • Qualitative & quantitative data • Some or all variables • Good if data are not separated initially
Format PROC PRINT DATA=dataset; VAR variable1 variable2;
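A minimal sketch of the format above in use. The data set name scores and its variables name and score are hypothetical, chosen only for illustration.

```sas
/* Hypothetical data set with one character and one numeric variable */
data scores;
  input name $ score;
  datalines;
Ann 82
Bob 75
Cal 91
;
run;

/* Print only the variables listed on the VAR statement */
proc print data=scores;
  var name score;
run;
```

Leaving out the VAR statement prints every variable in the data set.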
Proc Sort • Provides sorting by your choice of variable(s) • Necessary before matching or using a BY statement • Default is ascending; you must put DESCENDING before each variable you want sorted in descending order. • Example: PROC SORT DATA=dataset OUT=newdataset; BY DESCENDING variable1 variable2;
Proc Sort Example
data k1;
  input a1;
  cards;
4
5
6
7
8
9
;
proc print;
proc sort data=k1 out=k12;
  by descending a1;
proc print;
run;
Proc Sort Example (cont)
Before sorting:
Obs    a1
  1     4
  2     5
  3     6
  4     7
  5     8
  6     9
After sorting (descending a1):
Obs    a1
  1     9
  2     8
  3     7
  4     6
  5     5
  6     4
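Sorting on two variables with mixed order can be sketched as follows. The data set k2 and the variable grp are hypothetical additions for illustration. DESCENDING applies only to the variable that immediately follows it.

```sas
/* Hypothetical data: a grouping variable and a score */
data k2;
  input grp a1;
  datalines;
1 4
2 9
1 7
2 5
;
run;

/* grp sorted high-to-low; a1 sorted low-to-high within each grp */
proc sort data=k2 out=k2sorted;
  by descending grp a1;
run;

proc print data=k2sorted;
run;
```

The printed rows should appear in the order (2, 5), (2, 9), (1, 4), (1, 7).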
PROC MEANS • Calculates: • # of useful cases utilized • Mean • Standard deviation • Minimum value observed • Maximum value observed
Proc Means • Provides a compact table of means • Allows data integrity checks • # non-missing values • minimum and maximum values • PROC MEANS DATA=dataset; VAR variable;
Proc Means Example
options nodate linesize=64;
data k1;
  input a1;
  cards;
4
5
6
7
8
9
;
proc means data=k1;
  var a1;
run;
Proc means example cont.
The SAS System
The MEANS Procedure
Analysis Variable : a1
N         Mean      Std Dev      Minimum      Maximum
6    6.5000000    1.8708287    4.0000000    9.0000000
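To illustrate the data integrity check on non-missing values, here is a small sketch with one missing value. The data set k3 is hypothetical. The statistic keywords on the PROC MEANS statement request the non-missing and missing counts explicitly.

```sas
/* Hypothetical data with one missing value (the period) */
data k3;
  input a1;
  datalines;
4
5
.
7
;
run;

/* N counts non-missing values; NMISS counts missing values */
proc means data=k3 n nmiss mean std min max;
  var a1;
run;
```

Here N should be 3 and NMISS should be 1, so comparing N with the number of cases you expected is a quick way to spot missing data.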
PROC UNIVARIATE • Options: • Tests for normality (if requested) • Stem-and-leaf plots • Box plots • Descriptives (mean, SD, etc.) PROC UNIVARIATE DATA=dataset NORMAL PLOT; VAR variable1 variable2; BY variable-list;
Proc Univariate Example
options nodate linesize=64;
data k1;
  input a1;
  cards;
4
5
6
6
7
11
;
proc univariate data=k1 plot;
  var a1;
run;
Proc Univariate Output
Basic Statistical Measures
    Location                    Variability
Mean      6.500000     Std Deviation            2.42899
Median    6.000000     Variance                 5.90000
Mode      6.000000     Range                    7.00000
                       Interquartile Range      2.00000
Proc Univariate Output
Extreme Observations
----Lowest----     ----Highest---
Value     Obs      Value     Obs
    4       1          5       2
    5       2          6       3
    6       4          6       4
    6       3          7       5
    7       5         11       6

Stem Leaf     #     Boxplot
  11 0        1        0
  10
   9
   8
   7 0        1     +-----+
   6 00       2     *--+--*
   5 0        1     +-----+
   4 0        1        |
     ----+----+----+----+
PROC UNIVARIATE Output • Moments table: • Mean, SD, variance, skewness, kurtosis, normality test, and more • Quartiles table: • Mode, median, 25th & 75th percentiles • Stem-and-leaf plot • Box plot (mean: +, median: *--*) • Probability plot
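A short sketch that requests both the normality tests and the plots in one call, reusing the small k1 data from the earlier example. The exact appearance of the plots may differ by SAS version and by whether ODS Graphics is enabled, so treat this as an illustration rather than a guaranteed layout.

```sas
data k1;
  input a1;
  datalines;
4
5
6
6
7
11
;
run;

/* NORMAL adds the tests for normality;                           */
/* PLOT adds the stem-and-leaf, box, and normal probability plots */
proc univariate data=k1 normal plot;
  var a1;
run;
```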
Proc Univariate – By function
options nodate linesize=64;
data k1;
  input a1 sex;
  cards;
4 0
5 0
6 0
6 1
7 1
11 1
;
proc univariate data=k1 plot;
  var a1 sex;
  by sex;
run;
Proc Univariate Output
[Side-by-side box plots of a1 by sex; vertical axis runs from 4 to 11. sex = 0: box from 4 to 6, median and mean at 5. sex = 1: box from 6 to 11, median at 7, mean at 8.]
Proc Freq
options nodate linesize=64;
data k1;
  input a1 sex;
  cards;
4 0
5 0
6 0
6 1
7 1
11 1
;
proc freq data=k1;
  tables a1 sex;
run;
Proc Freq Output
The FREQ Procedure
                                Cumulative    Cumulative
 a1    Frequency     Percent     Frequency       Percent
---------------------------------------------------------
  4           1       16.67             1         16.67
  5           1       16.67             2         33.33
  6           2       33.33             4         66.67
  7           1       16.67             5         83.33
 11           1       16.67             6        100.00

                                Cumulative    Cumulative
sex    Frequency     Percent     Frequency       Percent
---------------------------------------------------------
  0           3       50.00             3         50.00
  1           3       50.00             6        100.00
PROC PLOT • Creates: • Scatter plot • Criterion: vertical axis • Predictor: horizontal axis PROC PLOT DATA=dataset; PLOT variable1*variable2; variable1 will be on the vertical axis
Example proc plot
options nodate linesize=64 pagesize=100;
data k1;
  input a b;
  cards;
4 5
5 6
6 7
6 7
7 8
11 2
;
proc plot;
  plot b*a;
run;
Output Proc Plot • The SAS System • Plot of b*a. Legend: A = 1 obs, B = 2 obs, etc.
[Scatter plot: b on the vertical axis (2 to 8), a on the horizontal axis (4 to 11). Plotted symbols by row: b = 8: A, b = 7: B, b = 6: A, b = 5: A, b = 2: A.]
Proc GPLOT • Produces nicer graphs • Graphs can be copied into other documents. PROC GPLOT DATA=dataset; PLOT a*b b*a /overlay; • Overlay option causes output from two or more plot statements to be placed on a single graph.
Example GPLOT
data k1;
  input a b;
  cards;
4 5
5 6
6 7
6 7
7 8
11 2
;
proc gplot;
  plot b*a a*b / overlay;
run;
Set Statement • Used to combine datasets • Combines vertically (stacks) D1, D2, D3 • In the DATA statement, you name the new dataset that will hold the stacked result. • Combines subjects that have the same variables. DATA dataset; SET D1 D2 D3;
Example: Data set
data a1;
  input a;
  cards;
4
5
6
;
data a2;
  input a;
  cards;
7
8
9
;
data a3;
  input a;
  cards;
10
11
12
;
data all;
  set a1 a2 a3;
proc print;
run;
Set Output
The SAS System
Obs     a
  1     4
  2     5
  3     6
  4     7
  5     8
  6     9
  7    10
  8    11
  9    12
Example: Data Set cont. • Try changing the name of the variable in dataset a1 to “b” instead of “a.”
data a1;
  input b;
  cards;
4
5
6
;
Set Output
The SAS System
Obs    b     a
  1    4     .
  2    5     .
  3    6     .
  4    .     7
  5    .     8
  6    .     9
  7    .    10
  8    .    11
  9    .    12
Merge statement • Used to combine datasets horizontally (add variables) • Matches cases on specified variable • Produces missing values for cases not matched DATA NEW; MERGE D1 D2; BY SSN;
Merge data Example
data a2;
  input idnum c;
  cards;
3 7
2 8
1 9
;
data a3;
  input idnum b;
  cards;
1 10
2 11
3 12
;
proc sort data=a2;
  by idnum;
proc sort data=a3;
  by idnum;
data all;
  merge a2 a3;
  by idnum;
proc print;
run;
Merge Output
The SAS System
Obs    idnum    c     b
  1        1    9    10
  2        2    8    11
  3        3    7    12
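The merge example above matches every case. To see the missing values produced for unmatched cases, here is a sketch in which each data set has one idnum the other lacks. The data set names d1, d2, and both are hypothetical.

```sas
/* idnum 1 appears only in d1; idnum 4 appears only in d2 */
data d1;
  input idnum c;
  datalines;
1 9
2 8
3 7
;
run;

data d2;
  input idnum b;
  datalines;
2 11
3 12
4 13
;
run;

/* Both inputs must be sorted by the BY variable before merging */
proc sort data=d1; by idnum; run;
proc sort data=d2; by idnum; run;

data both;
  merge d1 d2;
  by idnum;
run;

proc print data=both;
run;
```

In the result, idnum 1 should have b missing (.) and idnum 4 should have c missing (.), while idnum 2 and 3 carry both variables.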
Delete Statement • Comparison operators for the IF condition: • EQ (=): equal to • NE (~=): not equal to • GT (>): greater than • LT (<): less than • GE (>=): greater than or equal to • LE (<=): less than or equal to
Delete Statement • If _n_ = 2 then delete; • If a >= 3 then delete; • If b = . then delete;
Delete Statement
data a3;
  input idnum b;
  if _n_ = 2 then delete;
  cards;
1 10
2 11
3 12
;
proc print;
run;
Delete output
The SAS System
Obs    idnum     b
  1        1    10
  2        3    12
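Deleting on a data value rather than on the row number can be sketched the same way. This example uses the missing-value condition from the earlier slide. The data set a4 is hypothetical.

```sas
/* Drop any case whose b value is missing */
data a4;
  input idnum b;
  if b = . then delete;
  datalines;
1 10
2 .
3 12
;
run;

proc print data=a4;
run;
```

Only the cases with idnum 1 and 3 should remain, since the case with idnum 2 has a missing b.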