1 / 36

Sorting, Printing, and Summarizing Your Data (Chapter in the 4 Little SAS Book)

Sorting, Printing, and Summarizing Your Data (Chapter in the 4 Little SAS Book). Animal Science 500 Lecture No. 5 September 14, 2010. Using a Procedure (PROC). Using the procedure statement is like filling out a form You fill in the blanks Choose from the list of options PROC statement

hamal
Download Presentation

Sorting, Printing, and Summarizing Your Data (Chapter in the 4 Little SAS Book)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sorting, Printing, and Summarizing Your Data(Chapter in the 4 Little SAS Book) Animal Science 500 Lecture No. 5 September 14, 2010

  2. Using a Procedure (PROC) • Using the procedure statement is like filling out a form • You fill in the blanks • Choose from the list of options • PROC statement • All procedures start with this statement • Is followed by the name of the procedure desired (Print, Means, Tabulate, etc.)

  3. PROC statements • If you have many SAS dataset you are using you can specify which dataset you want the procedure to applied to • PROC Contents Data = Pig12; • This statement is optional • If the data statement is not present the Procedure will be applied to the last data set created (not necessarily used).

  4. By statement • The by statement is required for only one Procedure, the PROC Sort • Obviously if you are sorting your data set you want to sort it by some variable contained in your data. • In all other Procedures the by statement is optional.

  5. Title and Footnote statements • Title prints at the top of your output • 10 line limit • Footnote prints at the bottom of your output • 10 line limit • You can place these anywhere you want • Applies to the Procedure you are working with so you might want to just include it in this particular section

  6. Title and Footnote statements • When using the title and footnote statements SAS does not care if you use the single quotes ‘test’ or the double quotes “test” as long as you are consistent. • Do not mix types of quotes

  7. Label statement • The default situation in SAS uses the variable names to label your output • Can create more descriptive labels • Limit up to 256 characters long limit • Example Label = DOT ‘date on test’ ADG = ‘Average Daily Gain’;

  8. Subsetting in Procedures with the Where Statement • The Where statement can be used with almost any PROC statement • The Where statement can be used to subset the data much like an If – Then statement • Advantage for the Where statement • The IF – Then statement only works in the Data step • The Where statement works in both the Data and the Procedure step.

  9. Subsetting in Procedures with the Where Statement • Examples using the Where statement • In the Data step • Where Backfat le .50 backfat group =1; • Where Backfat ge .50 AND le .75 backfat group =2; • Where Backfat ge .75 AND le 1.00 backfat group =3; • Else Backfat group =.; • Run; • Quit;

  10. Subsetting in Procedures with the Where Statement • Examples using the Where statement • In the Proc Step • Where Backfat Group = 1 • Title ‘Leanest Pigs From The Experiment 1’ • Footnote ‘Experimental Pigs with .50 inches of backfat and less’; • Run; • Quit;

  11. Subsetting in Procedures with the PROC Sort Statement • Many reasons to sort your data • Think of examples from labs • You can use the PROC Sort statement with the by option; • In the by statement you can include as many variables as you want • Example – Proc sort; by pen sex trt; • So it would sort by pen, then within pen sort sex and then, within pen and sex sort by treatment; • Often more useful to sort by one variable so that other procedures can be performed.

  12. Subsetting in Procedures with the PROC Sort Statement • The NODUPKEY eliminates duplicate observations that have the same values for the BY variable • Use this with the DUPOUT option • This option will put the deleted observations in new data set • Example • Data sort; by pen; Out = Pig13 NODUPKEY DUPOUT = duplicates;

  13. Subsetting in Procedures with the PROC Sort Statement • Example • Data new Proc sort; by pen; Out = Pig13 NODUPKEY DUPOUT = duplicates; • This is useful for working with field data • DHIA records • Records from Breed Associations (Yorkshire, Landrace, Angus, Hereford, etc.) • How might you find? • Data= name; Proc Sort; by herd id; Out = cleandata NODUPKEY DUPOUT = duplicates;

  14. Printing your Data with PROC PRINT • We have used PROC PRINT many times in lab • What is new is that there are several options in SAS associated with the PROC PRINT • Like other statements you can tell SAS what data set to use for printing • PROC PRINT DATA = PIG12

  15. Printing your Data with PROC PRINT • PROC PRINT Data = PIG12 NOOBS LABEL; • The NOOBS results in SAS NOT printing the observation number by each line • Might want the observation number to check counts for some reason so you would omit the NOOBS option • Use the NOOBS because it just takes up space

  16. Printing your Data with PROC PRINT • Variable option used with PROC PRINT • ID statement prints out the IDs rather than the observation number. • Useful to compare a printed list from SAS output to your original data that you have stored electronically or on paper. • The SUM option • This option sums the variable specified • Could be particularly useful when looking at pen weights rather than individual wts. when pen is the experimental unit.

  17. Printing your Data with PROC PRINT • The SUM option • PROC Print Data = Pig 12; by pen; Sum wtgain; VAR ID Pen Sex ADG wtgain; Title ‘Weight Gain Summed by Pens’; RUN; QUIT;

  18. SAS Can be Used to Write Simple Reports • For example you just wanted to summerize the data from Pig12 • Example • Data _Null_; Infile ‘some data file source’; Input variable names; Created variable like ADG; Tell where to put it; File ‘c:\ somefile\data.txt’ Print; Title; Put @5 ‘Sale Report for ‘Name’ from classroom’ Class // @5 ‘Congratulations! You sold ‘ Quantity ’ boses of candy’ / @5 ‘and earned ‘ Profit DOLLAR6.2 ‘ for out field trip.’; Put_PAGE_;

  19. Summarizing Your Data Using PROC MEANS • We have started to use PROC MEANS in LAB; • Numerous options to use with PROC MEANS • Use by including them behind PROC MEANS Option • The default will print the number of non-missing values, the mean, the standard deviation, and the minimum – maximum value for each observation. • The options available include; • Median –the numeric value separating the higher half of a sample • Mode – value occurring most frequently

  20. Summarizing Your Data Using PROC MEANS • Options cont’ • Nmiss – number of missing values by variable • Range – range of values • Sum – the sum • Use the default PROC MEANS; without the var option; • If you do not wish to obtain means for every trait you can indicate which variable means you wish to obtain • PROC MEANS var ADG, Backfat, LMA;

  21. Summarizing Your Data Using PROC MEANS • You can subset the data and obtain means using the by statement • You will get n sets of means depending on how many levels the by variable you sorted by

  22. Counting your Data Using PROC FREQ • PROC FREQ • A frequency table gives you simple counts or number of variables you have for each • When you have counts for one variable it is called a one-way frequencies • When you have two or more variables the counts are called two-way, three-way and so forth • Alternatively the multiple variables are called Cross tabulations • Used most frequently to show distribution of data variables • Can be used to identify data irregularities

  23. Counting your Data Using PROC FREQ • Basis form is PROC FREQ; Tables variable – combinations; • Options if any are included after a / after the tables statement i.e. • Tables variable – combinations / list; • The options available are: • List – prints cross-tabulations in list format rather than grid • Missing – Includes missing values in frequency statistics • NOCOL – suppresses printing column percentages in corss-tabulations • NOPERCENT- suppresses printing of percentages • NOROW - suppresses printing row percentages in cross-tabulations • Out = data-set writes a data set containing frequencies

  24. Counting your Data Using PROC FREQ • Options if any are included after a / after the tables statement i.e. • Tables variable – combinations / list; • The options cont’ are: • Chisq - performs the standard Pearson chi-square test on the table(s) requested. • Expected – prints the expected number of observations in each cell under the null hypothesis. • Exact - requests Fisher's exact test for the table(s). This is automatically computed for 2 x 2 tables. • Sparse - produces a full table, even if the table has many cells containing no observations.

  25. Example of One–Way Contingency Table

  26. Example of Two –Way Contingency Table • Table of hichol1 by hichol2 • hichol1 hichol2 • | 1 | 2 | Total • -----------+-----------+-------- • Frequeny 1 | 21 | 21 | 42 • Percent | 22.83 | 22.83 | 45.65 • -----------+------------+-------- • 2 | 23 | 27 | 50 • | 25.00 | 29.35 | 54.35 • -----------+------------+-------- • Total 44 48 92 • 47.83 52.17 100.00 • Frequency Missing = 2

  27. Using PROC TABULATE • Every summary statistic the TABULATE computes can be produced by other procedures • Print • Means, and • Freq But some people think PROC TABULATE output is much “prettier”.

  28. Using PROC TABULATE • General Form • PROC TABULATE; • CLASS –classification variable list; • TABLE page-dimension, row-dimension, column-dimension CLASS statement tell SAS which variables contain categorical data to be used for dividing the observations into groups • In our example data set this would mean things like test, sex, pen, treatment, breed

  29. Using PROC TABULATE • General Form • PROC TABULATE; • CLASS –classification variable list; • TABLE page-dimension, row-dimension, column-dimension • TABLE statement defines only one table but you might have multiple TABLE statements • If a variable is listed in the CLASS statement, then, by default, PROC TABULATE produces simple counts of the number of observations in each category fo that variable. • PROC TABULATE offers many other statistics too.

  30. Using PROC TABULATE • General Form • PROC TABULATE; • CLASS –classification variable list; • TABLE page-dimension, row-dimension, column-dimension • Each TABLE statement can specify up to 3 dimensions • The dimensions separated by commas , and tell SAS which variables are used for the pages, rows, and columns • Specify one dimension then it becomes the column dimension • Specify two dimensions then they become rows and columns • Specify three dimensions then they become row, columns and pages

  31. Using PROC TABULATE • General Form • PROC TABULATE; • CLASS –classification variable list; • TABLE page-dimension, row-dimension, column-dimension • The default results in observations that are missing are not included in the in the tables • To keep the missing values included in the report then the code PROC TABULATE MISSING; must be used

  32. Example form Generated by PROC TABULATE • PROC TABULATE; • CLASS GENDER; • VAR AGE INCOME EDUC; • TABLE (AGE INCOME EDUC)*MEAN, GENDER ALL; • RUN; • QUIT;

  33. Example form Generated by PROC TABULATE • PROC TABULATE; • CLASS GENDER; • VAR AGE INCOME EDUC; • TABLE (AGE INCOME EDUC)*MEAN, GENDER ALL; • RUN; • QUIT;

  34. Output from a PROC TABULATE example Output 1.1 Variable Label N Mean Std Dev Minimum Maximum -------------------------------------------------------------------------------------------------------------------------- AGE Age 6639 48.614 16.598 25.000 90.000 INCOME Income 6639 25065.797 23850.488 0.000 263253.000 EDUC Education 6639 13.040 2.953 4.000 19.000 --------------------------------------------------------------------------------------------------------------------------

  35. Output from a PROC TABULATE example Output 1.2 GENDER = Female Variable Label N Mean Std Dev Minimum Maximum -------------------------------------------------------------------------------------------------------------------------- AGE Age 3559 49.528 17.158 25.000 90.000 INOME Income 3559 17780.087 17070.596 0.000 263253.000 EDUC Education 3559 12.932 2.899 4.000 19.000 ------------------------------------------------------------------------------------------------------------------------- GENDER = Male Variable Label N Mean Std Dev Minimum Maximum -------------------------------------------------------------------------------------------------------------------------- AGE Age 3080 47.558 15.864 25.000 90.000 INCOME Income 3080 33484.577 27520.481 0.000 251998.000 EDUC Education 3080 13.165 3.011 4.000 19.000 -------------------------------------------------------------------------------------------------------------

  36. Output from a PROC TABULATE example Output 1.3 ------------------------------------------------------------------------------------------------------------------------- | | GENDER | | | |-------------------------------| | | | Female | Male | ALL | |-----------------------------------------------------+---------------+--------------+-------------------------------- | Age MEAN | 49.53 | 47.56 | 48.61 | |-----------------------------------------------------+---------------+--------------+--------------------------------| | Income MEAN | 17780.09 | 33484.58 | 25065.80 | |-----------------------------------------------------+---------------+--------------+--------------------------------| | Education MEAN | 12.93 | 13.17 | 13.04 | -------------------------------------------------------------------------------------------------------------------------

More Related