Section 3.9: RETAIN & sum statements
• Because variables read with INPUT or created by assignment statements are set to missing at the start of each iteration of the DATA step, we need a statement to override this. RETAIN does the job: it keeps the previous iteration’s value of whatever variables you list in the RETAIN statement.
• The sum statement (variable + expression;) is used to accumulate values of a variable. The variable is automatically retained, starts at 0, and a missing value of the expression adds nothing. See the example on p. 93.
• Now use the RETAIN and sum statements to find the maximum job cost and the total cost of all jobs for the data in section 3.5, p. 85. Use:
  RETAIN maxcost;
  maxcost = MAX(maxcost, Cost);
  totcost + Cost;
DATA homeimprovements;
   INPUT Owner $ 1-7 Description $ 9-33 Cost;
   IF Cost = . THEN CostGroup = 'missing';
      ELSE IF Cost < 2000 THEN CostGroup = 'low';
      ELSE IF Cost < 10000 THEN CostGroup = 'medium';
      ELSE CostGroup = 'high';
   /* Owner in columns 1-7, Description in columns 9-33, Cost after column 33 */
   DATALINES;
Bob     kitchen cabinet face-lift 1253.00
Shirley bathroom addition         11350.70
Silvia  paint exterior            .
Al      backyard gazebo           3098.63
Norm    paint interior            647.77
Kathy   second floor addition     75362.93
;
PROC PRINT DATA = homeimprovements;
   TITLE 'Home Improvement Cost Groups';
RUN;
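One possible solution to the RETAIN/sum exercise is sketched below. It assumes the same column layout as the section 3.5 program. The data set name homecosts, the variable names maxcost and totcost, and the title are our own choices, not the textbook's.

```sas
/* Sketch: running maximum and running total of Cost for the
   section 3.5 data. HOMECOSTS, maxcost and totcost are our names. */
DATA homecosts;
   INPUT Owner $ 1-7 Description $ 9-33 Cost;
   RETAIN maxcost;                 /* keep the previous value          */
   maxcost = MAX(maxcost, Cost);   /* MAX ignores missing arguments    */
   totcost + Cost;                 /* sum statement: retained, starts  */
                                   /* at 0, a missing Cost adds 0      */
   DATALINES;
Bob     kitchen cabinet face-lift 1253.00
Shirley bathroom addition         11350.70
Silvia  paint exterior            .
Al      backyard gazebo           3098.63
Norm    paint interior            647.77
Kathy   second floor addition     75362.93
;
RUN;

PROC PRINT DATA = homecosts;
   TITLE 'Running Maximum and Total Cost';
RUN;
```

On Silvia's row, where Cost is missing, both maxcost and totcost carry forward unchanged. The last row (Kathy) holds the answers: maxcost = 75362.93 and totcost = 91713.03.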
Now let’s consider arrays in SAS. There are two types, implicit and explicit. We’ll look at explicit arrays only, since they are the recommended type to use:
• The ARRAY statement defines a set of variables (either all character or all numeric) so you may process them all at one time. An explicit ARRAY statement contains a name for the array, a number that tells how many elements there are in the array, and a list of the elements (variables) in the array.
• Arrays are often processed in DO loops so that the same thing is done to every element of the array.
• See the example in section 3.10 on pp. 94-95.
• i is used as an index variable to refer to members of the array. i is incremented by 1 each time through the DO loop.
• The array name itself does not become part of the data set (it exists only while the DATA step runs), but the index variable i does, unless you DROP it.
DATA songs;
   INPUT City $ 1-15 Age domk wj hwow simbh kt aomm libm tr filp ttr;
   * ratings on a scale of 1 to 5 for each of 10 songs;
   * missing values denoted by a 9 - not usual for SAS;
   * so create an array and replace 9 by . in each case;
   ARRAY song (10) domk wj hwow simbh kt aomm libm tr filp ttr;
   DO i = 1 TO 10;
      IF song(i) = 9 THEN song(i) = .;
   END;
   DATALINES;
Albany          54 4 3 5 9 9 2 1 4 4 9
Richmond        33 5 2 4 3 9 2 9 3 3 3
Oakland         27 1 3 2 9 9 9 3 4 2 3
Richmond        41 4 3 5 5 5 2 9 4 5 5
Berkeley        18 3 4 9 1 4 9 3 9 3 2
;
PROC PRINT DATA = songs;
   TITLE 'WBRK Song Survey';
RUN;
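A variation on the song survey program is sketched below. It lets SAS count the elements with (*), uses the DIM function for the loop bound, and drops the index variable i so it does not end up in the data set. The data set name songs2 is hypothetical, and only two data lines are repeated for brevity.

```sas
/* Sketch: same 9-to-missing recode, with three changes:
   (*) lets SAS count the array elements, DIM(song) returns
   that count (10), and DROP keeps the index i out of the data. */
DATA songs2;
   INPUT City $ 1-15 Age domk wj hwow simbh kt aomm libm tr filp ttr;
   ARRAY song (*) domk wj hwow simbh kt aomm libm tr filp ttr;
   DO i = 1 TO DIM(song);
      IF song(i) = 9 THEN song(i) = .;
   END;
   DROP i;   /* without this, i would be stored as 11 in every row */
   DATALINES;
Albany          54 4 3 5 9 9 2 1 4 4 9
Richmond        33 5 2 4 3 9 2 9 3 3 3
;
RUN;

PROC PRINT DATA = songs2;
RUN;
```

Using DIM instead of a hard-coded 10 means the loop still covers every element if you later add or remove songs from the ARRAY statement.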
There are a few different ways to shortcut variable names in SAS:
• Make your variables start with the same characters and end with consecutive numbers: a1, a2, a3, ..., a24. You may then abbreviate them as a1-a24.
• If your variables are not set up as above, you may refer to consecutive ones with the double-hyphen method. For example, after INPUT x y r ca cb cc; you could refer to y through cb as:
  PROC PRINT; VAR y--cb; RUN;
  The double hyphen follows the internal order of variables in your data set. To find that order, use PROC CONTENTS POSITION;
• There are a few special SAS name lists: _ALL_ refers to all the variables, _CHARACTER_ refers to all the character variables, and _NUMERIC_ refers to all the numeric variables in the data set. (PUT _ALL_; and MEAN(OF _NUMERIC_) are examples of how you might use these.)
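The sketch below tries each kind of name list on a small made-up data set. The data set name shortcuts, the variables asum and ysum, and the data values are hypothetical, chosen only for illustration.

```sas
/* Sketch: numbered range lists, name range lists, and a special
   name list on a made-up one-row data set.                       */
DATA shortcuts;
   INPUT x y r ca cb cc a1-a3;    /* a1-a3 means a1 a2 a3             */
   asum = SUM(OF a1-a3);          /* numbered range list: 7+8+9 = 24  */
   ysum = SUM(OF y--cb);          /* name range list: y r ca cb,      */
                                  /* by position, 2+3+4+5 = 14        */
   DATALINES;
1 2 3 4 5 6 7 8 9
;
RUN;

/* Show the internal (positional) order that the double hyphen uses */
PROC CONTENTS DATA = shortcuts POSITION;
RUN;

/* Special name list: write every variable and its value to the log */
DATA _NULL_;
   SET shortcuts;
   PUT _ALL_;
RUN;

PROC PRINT DATA = shortcuts;
   VAR y--cb;
RUN;
```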
Notice how these abbreviations are used in the example program on page 97:
DATA songs;
   INPUT City $ 1-15 Age domk wj hwow simbh kt aomm libm tr filp ttr;
   ARRAY new (10) Song1 - Song10;
   ARRAY old (10) domk -- ttr;
   DO i = 1 TO 10;
      IF old(i) = 9 THEN new(i) = .;
      ELSE new(i) = old(i);
   END;
   AvgScore = MEAN(OF Song1 - Song10);
   DATALINES;
Albany          54 4 3 5 9 9 2 1 4 4 9
Richmond        33 5 2 4 3 9 2 9 3 3 3
Oakland         27 1 3 2 9 9 9 3 4 2 3
Richmond        41 4 3 5 5 5 2 9 4 5 5
Berkeley        18 3 4 9 1 4 9 3 9 3 2
;
PROC PRINT DATA = songs;
   TITLE 'WBRK Song Survey';
RUN;
Homework for Wednesday:
• Complete your reading of the textbook through Chapter 3.
• Look at the “oscars” dataset:
  • read the Excel data into SAS
  • print it back out
  • begin exploring this dataset using the various methods we’ve talked about so far: SORTing and PRINTing; MEANS for the numeric variables as appropriate; FREQ for the categorical variables, with crosstabulations as you think might be interesting.
• We’ll talk about it some more on Wednesday; this data will be part of your midterm.
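One way to do the first homework steps is PROC IMPORT, as in the sketch below. It assumes the workbook is saved as oscars.xlsx in SAS’s current folder. That file name is a placeholder, so change it to your actual path. It also assumes your SAS installation can read XLSX files, since DBMS=XLSX relies on SAS/ACCESS Interface to PC Files. The variable names are not known here, so the sketch uses the _NUMERIC_ and _CHARACTER_ name lists. SORT and crosstabulations need real variable names, so they are left for you.

```sas
/* Sketch: read the oscars workbook and take a first look.
   'oscars.xlsx' is a placeholder path; change it to your file. */
PROC IMPORT DATAFILE = 'oscars.xlsx'
            OUT = oscars
            DBMS = XLSX
            REPLACE;
RUN;

/* Learn the variable names and their order */
PROC CONTENTS DATA = oscars POSITION;
RUN;

PROC PRINT DATA = oscars;
   TITLE 'Oscars Data';
RUN;

/* Summary statistics for every numeric variable */
PROC MEANS DATA = oscars;
   VAR _NUMERIC_;
RUN;

/* One-way frequency tables for every character variable */
PROC FREQ DATA = oscars;
   TABLES _CHARACTER_;
RUN;
```

Once PROC CONTENTS shows the variable names, you can replace the name lists with specific variables, for example in a TABLES a*b; crosstabulation.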