
Regression Lab 2


tokala

Presentation Transcript


  1. Regression Lab 2
     • All About Procs
       • Proc Print
       • Proc Sort
       • Proc Means
       • Proc Univariate
       • Proc Plot

  2. Proc Print
     • Provides:
       • Printout of raw data set
       • Screen data for errors
       • Qualitative & quantitative data
       • Some or all variables
       • Good if data are not separated initially

  3. Format
       PROC PRINT DATA=dataset;
         VAR variable1 variable2;
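
     A minimal sketch of the format above, assuming a made-up data set: the name scores and the variables id, grade, and hours are hypothetical and do not come from the lab. It prints two of the three variables, which is one way to screen a subset of the raw data.

     ```sas
     /* Hypothetical data set; names are assumptions for illustration only */
     data scores;
       input id grade $ hours;
       cards;
     1 A 12
     2 B 7
     3 A 9
     ;
     run;

     /* VAR limits the printout to the listed variables, in the listed order */
     proc print data=scores;
       var id hours;
     run;
     ```

     Leaving out the VAR statement prints every variable in the data set.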

  4. Proc Sort
     • Provides sorting by your choice of variable(s)
     • Necessary before matching or using a BY statement
     • Default is ascending; put DESCENDING before each variable you want sorted in descending order.
     • Example:
       PROC SORT DATA=dataset OUT=newdataset;
         BY DESCENDING variable1 variable2;

  5. Proc Sort Example
       data k1;
         input a1;
         cards;
       4
       5
       6
       7
       8
       9
       ;
       proc print;
       proc sort data=k1 out=k12;
         by descending a1;
       proc print;
       run;

  6. Proc Sort Example (cont)
       Obs    a1        Obs    a1
         1     4          1     9
         2     5          2     8
         3     6          3     7
         4     7          4     6
         5     8          5     5
         6     9          6     4

  7. PROC MEANS
     • Calculates:
       • Number of non-missing cases used (N)
       • Mean
       • Standard deviation
       • Minimum value observed
       • Maximum value observed

  8. Proc Means
     • Provides a compact table of means
     • Allows data integrity checks
       • Number of non-missing values
       • Minimum and maximum values
       PROC MEANS DATA=dataset;
         VAR variable;

  9. Proc Means Example
       options nodate linesize=64;
       data k1;
         input a1;
         cards;
       4
       5
       6
       7
       8
       9
       ;
       proc means data=k1;
         var a1;
       run;

  10. Proc Means Example (cont.)
                      The SAS System                       7
                    The MEANS Procedure
                   Analysis Variable : a1
        N          Mean       Std Dev       Minimum       Maximum
        6     6.5000000     1.8708287     4.0000000     9.0000000
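
      As a check on the table above, the mean and standard deviation can be reproduced by hand. This assumes PROC MEANS is using its default divisor of n - 1 (the sample standard deviation), which matches the printed value.

      ```latex
      \bar{x} = \frac{4+5+6+7+8+9}{6} = \frac{39}{6} = 6.5
      \qquad
      \sum_{i=1}^{6} (x_i - \bar{x})^2 = 2\,(2.5^2 + 1.5^2 + 0.5^2) = 17.5
      \qquad
      s = \sqrt{\frac{17.5}{6-1}} = \sqrt{3.5} \approx 1.8708
      ```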

  11. PROC UNIVARIATE
      • Options:
        • Tests for normality (if requested)
        • Stem-and-leaf plots
        • Box plots
        • Descriptives (mean, SD, etc.)
        PROC UNIVARIATE DATA=dataset NORMAL PLOT;
          VAR variable1 variable2;
          BY variable-list;

  12. Proc Univariate Example
        options nodate linesize=64;
        data k1;
          input a1;
          cards;
        4
        5
        6
        6
        7
        11
        ;
        proc univariate data=k1 plot;
          var a1;
        run;

  13. Proc Univariate Output
                    Basic Statistical Measures
            Location                    Variability
        Mean     6.500000     Std Deviation            2.42899
        Median   6.000000     Variance                 5.90000
        Mode     6.000000     Range                    7.00000
                              Interquartile Range      2.00000

  14. Proc Univariate Output
                Extreme Observations
        ----Lowest----        ----Highest---
        Value      Obs        Value      Obs
            4        1            5        2
            5        2            6        3
            6        4            6        4
            6        3            7        5
            7        5           11        6

        Stem Leaf       #  Boxplot
          11 0          1     0
          10
           9
           8
           7 0          1  +-----+
           6 00         2  *--+--*
           5 0          1  +-----+
           4 0          1     |
             ----+----+----+----+

  15. PROC UNIVARIATE Output
      • Moments table:
        • Mean, SD, variance, skewness, kurtosis, normality test, and more
      • Quartiles table:
        • Mode, median, 25th & 75th percentile
      • Stem-and-leaf plot
      • Box plot (mean: +, median: *--*)
      • Probability plot
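
      The normality tests are only printed when requested. A minimal sketch, assuming the k1 data set from the Proc Univariate example is still available in the session: adding the NORMAL option to the PROC UNIVARIATE statement asks for the "Tests for Normality" table alongside the plots.

      ```sas
      /* NORMAL requests the normality tests; PLOT requests the    */
      /* stem-and-leaf, box, and normal probability plots.         */
      proc univariate data=k1 normal plot;
        var a1;
      run;
      ```

      With only six observations the tests have very little power, so treat the result as a demonstration of the syntax rather than evidence about the data.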

  16. Proc Univariate – BY statement
        options nodate linesize=64;
        data k1;
          input a1 sex;
          cards;
        4 0
        5 0
        6 0
        6 1
        7 1
        11 1
        ;
        proc univariate data=k1 plot;
          var a1 sex;
          by sex;
        run;

  17. Proc Univariate Output (side-by-side box plots of a1 by sex)
           |
        11 +              +-----+
           |              |     |
        10 +              |     |
           |              |     |
         9 +              |     |
           |              |     |
         8 +              |  +  |
           |              |     |
         7 +              *-----*
           |              |     |
         6 +  +-----+     +-----+
           |  |     |
         5 +  *--+--*
           |  |     |
         4 +  +-----+
            ------------+-----------+-----------
        sex             0           1

  18. Proc Freq
        options nodate linesize=64;
        data k1;
          input a1 sex;
          cards;
        4 0
        5 0
        6 0
        6 1
        7 1
        11 1
        ;
        proc freq data=k1;
          tables a1 sex;
        run;

  19. Proc Freq Output
                       The FREQ Procedure

                                        Cumulative    Cumulative
         a1    Frequency     Percent     Frequency      Percent
        ---------------------------------------------------------
          4           1       16.67             1        16.67
          5           1       16.67             2        33.33
          6           2       33.33             4        66.67
          7           1       16.67             5        83.33
         11           1       16.67             6       100.00

                                        Cumulative    Cumulative
        sex    Frequency     Percent     Frequency      Percent
        ---------------------------------------------------------
          0           3       50.00             3        50.00
          1           3       50.00             6       100.00

  20. PROC PLOT
      • Creates:
        • Scatter plot
          • Criterion: vertical axis
          • Predictor: horizontal axis
        PROC PLOT DATA=dataset;
          PLOT variable1*variable2;
      • variable1 will be on the vertical axis

  21. Example Proc Plot
        options nodate linesize=64 pagesize=100;
        data k1;
          input a b;
          cards;
        4 5
        5 6
        6 7
        6 7
        7 8
        11 2
        ;
        proc plot;
          plot b*a;
        run;

  22. Output Proc Plot
                            The SAS System                      14
        Plot of b*a.  Legend: A = 1 obs, B = 2 obs, etc.

        b |
          |
        8 +                          A
        7 +                  B
        6 +          A
        5 +  A
        4 +
        3 +
        2 +                                                          A
          |
          +--+-------+-------+-------+-------+-------+-------+-------+--
             4       5       6       7       8       9      10      11
                                       a

  23. Proc GPLOT
      • Produces nicer graphs
      • Graphs can be copied into other documents.
        PROC GPLOT DATA=dataset;
          PLOT a*b b*a / OVERLAY;
      • The OVERLAY option places the plots from two or more plot requests on a single graph.

  24. Example GPLOT
        data k1;
          input a b;
          cards;
        4 5
        5 6
        6 7
        6 7
        7 8
        11 2
        ;
        proc gplot;
          plot b*a a*b / overlay;
        run;

  25. Set Command
      • Used to combine datasets
      • Combines vertically (stacks D1, D2, D3)
      • In the DATA statement, you give a name to the new dataset that holds the stacked result.
      • Combining subjects with the same variables.
        DATA dataset;
          SET D1 D2 D3;

  26. Example: Data Set
        data a1;
          input a;
          cards;
        4
        5
        6
        ;
        data a2;
          input a;
          cards;
        7
        8
        9
        ;
        data a3;
          input a;
          cards;
        10
        11
        12
        ;
        data all;
          set a1 a2 a3;
        proc print;
        run;

  27. Set Output
        The SAS System          45
        Obs     a
          1     4
          2     5
          3     6
          4     7
          5     8
          6     9
          7    10
          8    11
          9    12

  28. Example: Data Set (cont.)
      • Try changing the name of the variable in dataset a1 to “b” instead of “a”.
        data a1;
          input b;
          cards;
        4
        5
        6
        ;

  29. Set Output
        The SAS System          46
        Obs     b     a
          1     4     .
          2     5     .
          3     6     .
          4     .     7
          5     .     8
          6     .     9
          7     .    10
          8     .    11
          9     .    12

  30. Merge Statement
      • Used to combine datasets horizontally (add variables)
      • Matches cases on specified variable
      • Produces missing values for cases not matched
        DATA NEW;
          MERGE D1 D2;
          BY SSN;

  31. Merge Data Example
        data a2;
          input idnum c;
          cards;
        3 7
        2 8
        1 9
        ;
        data a3;
          input idnum b;
          cards;
        1 10
        2 11
        3 12
        ;
        proc sort data=a2;
          by idnum;
        proc sort data=a3;
          by idnum;
        data all;
          merge a2 a3;
          by idnum;
        proc print;
        run;

  32. Merge Output
        The SAS System          48
        Obs    idnum    c     b
          1      1      9    10
          2      2      8    11
          3      3      7    12
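
      Every idnum in the example above appears in both data sets, so no missing values show up. The sketch below illustrates the unmatched case. The data sets d1 and d2, the flag variables from_d1 and from_d2, and the use of the IN= data set option are additions for illustration, not part of the lab.

      ```sas
      /* Hypothetical data: idnum 1 is only in d1, idnum 4 is only in d2. */
      /* Both data sets are already in idnum order, so no sort is needed. */
      data d1;
        input idnum c;
        cards;
      1 9
      2 8
      3 7
      ;
      data d2;
        input idnum b;
        cards;
      2 11
      3 12
      4 13
      ;
      data both;
        merge d1 (in=in1) d2 (in=in2);
        by idnum;
        from_d1 = in1;   /* 1 if this idnum was found in d1, else 0 */
        from_d2 = in2;   /* 1 if this idnum was found in d2, else 0 */
      run;
      proc print data=both;
      run;
      ```

      In the result, idnum 1 should have b missing and idnum 4 should have c missing, and the two flags show which data set each case came from.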

  33. Delete Statement
      • Comparison operators for the IF condition:
        • EQ (=): equal to
        • NE (~=): not equal to
        • GT (>): greater than
        • LT (<): less than
        • GE (>=): greater than or equal to
        • LE (<=): less than or equal to

  34. Delete Statement
      • if _n_ = 2 then delete;
      • if a >= 3 then delete;
      • if b = . then delete;

  35. Delete Statement
        data a3;
          input idnum b;
          if _n_ = 2 then delete;
          cards;
        1 10
        2 11
        3 12
        ;
        proc print;
        run;

  36. Delete Output
        The SAS System          49
        Obs    idnum    b
          1      1     10
          2      3     12
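
      The example above deletes by observation number. A minimal sketch of the missing-value form (if b = . then delete;), assuming two small data sets with different variable names, as in the earlier stacking example: after SET, the rows that came from a2 have b missing and are dropped.

      ```sas
      data a1;
        input b;
        cards;
      4
      5
      6
      ;
      data a2;
        input a;
        cards;
      7
      8
      9
      ;
      data all;
        set a1 a2;
        if b = . then delete;   /* drops the three rows that came from a2 */
      run;
      proc print data=all;
      run;
      ```

      Only the three observations from a1 remain, each with a missing value for a.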
