Lab 4 Wednesday, February 04, 2004

daire

Presentation Transcript


  1. Lab 4, Wednesday, February 04, 2004: Manipulating your data

  2. Example Problem

     Suppose you have the GPAs of 45 students and you want to answer two questions:
     • Do psychology majors have higher GPAs than other majors?
     • Do psychology students who work full time have lower GPAs than those who do not work at all?

  3. Part of the Data

          113389
          213400
          311268
          411156
          511203
          622245
          721100
          821356
          911210
         1012310

     Columns 1-2: participant ID number
     Column 3: major (1 = psychology, 2 = math, 3 = English)
     Column 4: employment status (1 = full-time, 2 = part-time, 3 = don’t work)
     Columns 5-7: GPA

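  4. Infile and input statement

         data d1;
           /* Path must be in straight quotes, not curly quotes */
           infile 'C:\My Documents\exdatalab4.txt';
           /* id: cols 1-2, major: col 3, work: col 4, gpa: 3 cols from col 5, 2 implied decimals */
           input id 1-2 major 3 work 4 @5 (gpa) (3.2);
         proc print;
         run;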
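
To try the INPUT layout from slide 4 without the external file, the ten records shown on slide 3 can be read inline. This is a minimal sketch rather than part of the original lab: the data set name `demo` is hypothetical, and it assumes single-digit ids are right-aligned in columns 1-2, so those records begin with a blank.

```sas
/* Sketch: same column layout as slide 4, reading the slide 3 records inline. */
/* The data set name "demo" is hypothetical.                                  */
data demo;
  /* id: columns 1-2, major: column 3, work: column 4.             */
  /* gpa: 3 columns starting at column 5. With the 3.2 informat,   */
  /* a field with no decimal point gets two implied decimals,      */
  /* so 389 is read as 3.89.                                       */
  input id 1-2 major 3 work 4 @5 gpa 3.2;
  datalines;
 113389
 213400
 311268
 411156
 511203
 622245
 721100
 821356
 911210
1012310
;
run;

proc print data=demo;
run;
```

The printed values should match the proc print output on slide 5, which is a quick way to confirm the column positions before pointing the program at the full file.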

  5. Proc print output

         Obs    id    major    work     gpa
           1     1      1        3     3.89
           2     2      1        3     4.00
           3     3      1        1     2.68
           4     4      1        1     1.56
           5     5      1        1     2.03
           6     6      2        2     2.45
           7     7      2        1     1.00
           8     8      2        1     3.56
           9     9      1        1     2.10
          10    10      1        2     3.10

  6. Clean data

         data d1;
           infile 'C:\My Documents\exdatalab4.txt';
           input id 1-2 major 3 work 4 @5 (gpa) (3.2);
         /* Each id should appear exactly once */
         proc freq;
           tables id;
         /* Extreme observations reveal out-of-range values */
         proc univariate;
           var work major gpa;
         run;

  7. Clean data (cont)

         The FREQ Procedure

                                       Cumulative    Cumulative
         id    Frequency    Percent     Frequency      Percent
         ------------------------------------------------------
          1            1       2.22             1         2.22
          2            1       2.22             2         4.44
          3            1       2.22             3         6.67
          4            1       2.22             4         8.89
          5            1       2.22             5        11.11
          6            1       2.22             6        13.33
          7            1       2.22             7        15.56
          8            1       2.22             8        17.78
          9            1       2.22             9        20.00
         10            1       2.22            10        22.22

  8. Clean data (cont)

         The UNIVARIATE Procedure
         Variable: work

         Extreme Observations
         ----Lowest----        ----Highest---
         Value      Obs        Value      Obs
             1       42            3       32
             1       41            3       35
             1       38            3       39
             1       37            3       40
             1       36            3       45

  9. Clean data (cont)

         The UNIVARIATE Procedure
         Variable: major

         Extreme Observations
         ----Lowest----        ----Highest---
         Value      Obs        Value      Obs
             1       45            3       28
             1       44            3       29
             1       43            3       34
             1       40            3       35
             1       39            6       23

  10. Clean data (cont)

         GPA - Extreme Observations
         ----Lowest----        ----Highest---
         Value      Obs        Value      Obs
          1.00        7         3.95       23
          1.56        4         3.95       41
          1.67       33         4.00        2
          1.80       26         4.00       39
          2.03        5         5.56       31

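  11. Clean data (cont)

         Obs    id    major    work     gpa
          22    22      2        1     3.67
          23    23      6        1     3.95
          24    24      3        3     3.80
          25    25      3        1     2.80
          26    26      1        1     1.80
          27    27      3        3     2.50
          28    28      3        3     2.90
          29    29      3        3     3.15
          30    30      1        3     3.00
          31    31      1        3     5.56
          32    32      1        3     3.89

     Observation 23 has major = 6, which is outside the valid codes 1-3, and observation 31 has gpa = 5.56, which is above the highest valid GPA in the data (4.00).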

  12. Delete bad items

         data d1;
           infile 'C:\My Documents\exdatalab4.txt';
           input id 1-2 major 3 work 4 @5 (gpa) (3.2);
         data d2;
           set d1;
           /* _n_ is the observation number: drop the two out-of-range records */
           if _n_ = 23 then delete;
           if _n_ = 31 then delete;
         proc print data=d2;
         run;
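
Deleting by `_n_` works only as long as the file keeps the same row order. An alternative is to delete by value. The sketch below is an illustration, not the lab's method: it re-encodes rows 22-24 and 30-32 from slide 11 in the slide 3 column layout, the data set names `raw` and `clean` are hypothetical, and the valid ranges (major coded 1-3, GPA no higher than 4) are assumptions based on the coding scheme and the values shown in the slides.

```sas
/* Sketch: drop out-of-range records by value rather than by _n_.    */
/* Records are rows 22-24 and 30-32 from slide 11, re-encoded in the */
/* slide 3 column layout. "raw" and "clean" are hypothetical names.  */
data raw;
  input id 1-2 major 3 work 4 @5 gpa 3.2;
  datalines;
2221367
2361395
2433380
3013300
3113556
3213389
;
run;

data clean;
  set raw;
  /* Assumed valid ranges: major in 1-3, GPA at most 4.              */
  /* A missing major is dropped too, since it is not in the list.    */
  /* A missing gpa is kept: SAS treats missing as smaller than any   */
  /* number, so "gpa > 4" is false for it.                           */
  if major not in (1, 2, 3) or gpa > 4 then delete;
run;

proc print data=clean;
run;
```

With these six records, ids 23 (major = 6) and 31 (gpa = 5.56) are removed, the same two records that slide 12 deletes by position.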

  13. Output

         Obs    id    major    work     gpa    majorr
          21    21      2        1     3.05       1
          22    22      2        1     3.67       1
          23    24      3        3     3.80       1
          24    25      3        1     2.80       1
          25    26      1        1     1.80       0
          26    27      3        3     2.50       1
          27    28      3        3     2.90       1
          28    29      3        3     3.15       1
          29    30      1        3     3.00       0
          30    32      1        3     3.89       0

  14. Recode major to dichotomous: psychology major vs. not psychology

         data d1;
           infile 'C:\My Documents\exdatalab4.txt';
           input id 1-2 major 3 work 4 @5 (gpa) (3.2);
         data d2;
           set d1;
           if _n_ = 23 then delete;
           if _n_ = 31 then delete;
           /* majorr: 0 = psychology, 1 = math or English */
           if major = 1 then majorr = 0;
           if major = 2 then majorr = 1;
           if major = 3 then majorr = 1;
         proc print data=d2;
         run;
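
A recode is easy to get wrong silently, so it can help to cross-tabulate the old variable against the new one. The sketch below is one way to do that under stated assumptions: it reads a handful of records taken from slides 3 and 11 inline, the data set names `raw` and `recoded` are hypothetical, and the IF/ELSE IF chain is a stylistic variation on the three separate IF statements in slide 14.

```sas
/* Sketch: recode major and verify the result with a crosstab.        */
/* Records come from slides 3 and 11; data set names are hypothetical. */
data raw;
  input id 1-2 major 3 work 4 @5 gpa 3.2;
  datalines;
 113389
 311268
 622245
 721100
1012310
2433380
;
run;

data recoded;
  set raw;
  /* The chain stops at the first match. Any code other than 1, 2, */
  /* or 3 leaves majorr missing instead of guessing a group.       */
  if major = 1 then majorr = 0;
  else if major in (2, 3) then majorr = 1;
run;

/* Each major code should fall in exactly one majorr column.    */
/* The MISSING option shows missing values as their own level.  */
proc freq data=recoded;
  tables major*majorr / missing;
run;
```

If the recode is right, the table has nonzero counts only where major = 1 meets majorr = 0 and where major = 2 or 3 meets majorr = 1.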

  15. Proc print

         Obs    id    major    work     gpa    majorr
           1     1      1        3     3.89       0
           2     2      1        3     4.00       0
           3     3      1        1     2.68       0
           4     4      1        1     1.56       0
           5     5      1        1     2.03       0
           6     6      2        2     2.45       1
           7     7      2        1     1.00       1
           8     8      2        1     3.56       1
           9     9      1        1     2.10       0
          10    10      1        2     3.10       0

  16. Hypothesis
     • H0: psychology majors’ GPAs = other majors’ GPAs
     • H1: psychology majors have higher GPAs than other majors

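  17. Testing means

         /* BY-group processing requires the data to be sorted by the BY variable */
         proc sort data=d2;
           by majorr;
         /* Normality test (W) and plots for gpa within each group */
         proc univariate normal plot data=d2;
           var gpa;
           by majorr;
         /* One-way ANOVA with a homogeneity-of-variance test */
         proc glm data=d2;
           class majorr;
           model gpa = majorr;
           means majorr / hovtest;
         run;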

  18. Proc GLM output

         The GLM Procedure
         Dependent Variable: gpa

                                         Sum of
         Source             DF          Squares     Mean Square    F Value    Pr > F
         Model               1       0.01065652      0.01065652       0.02    0.8905
         Error              41      22.78548766      0.55574360
         Corrected Total    42      22.79614419

         R-Square     Coeff Var      Root MSE      gpa Mean
         0.000467      25.58931      0.745482      2.913256

  19. Proc GLM output (cont)

         The GLM Procedure

         Level of           -------------gpa-------------
         majorr       N           Mean         Std Dev
         0           22     2.92863636      0.74777317
         1           21     2.89714286      0.74306893

     Therefore, no support was found for the hypothesis that psychology majors have higher GPAs than English and math majors (F(1, 41) = .02, ns).
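
The F value and R-Square in the GLM table follow directly from the printed sums of squares, which is useful when checking a write-up. A worked check using the values from slides 18 and 19, with N = 22 + 21 = 43 observations and k = 2 groups (the `align*` environment assumes the amsmath package):

```latex
\begin{align*}
df_{\text{error}} &= N - k = 43 - 2 = 41,\\
F &= \frac{MS_{\text{model}}}{MS_{\text{error}}}
   = \frac{0.01065652}{0.55574360} \approx 0.019
   \quad (\text{printed as } 0.02),\\
R^2 &= \frac{SS_{\text{model}}}{SS_{\text{total}}}
   = \frac{0.01065652}{22.79614419} \approx 0.000467.
\end{align*}
```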

  20. Sample problem

     Do psychology students who work full time have lower GPAs than students who do not work at all?
     • H0: full-time GPAs = no-work GPAs
     • H1: students who don’t work have higher GPAs than students who work full time

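  21. Create data set of only psychology majors and delete part-time data

         data d3;
           set d2;
           /* Keep psychology majors only: majorr = 0 (equivalently major = 1) */
           if majorr = 0;
           /* Drop part-time workers, leaving full-time (1) and no work (3) */
           if work = 2 then delete;
         proc print data=d3;
         run;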

  22. Output

         Obs    id    major    work     gpa    majorr
           1     1      1        3     3.89       0
           2     2      1        3     4.00       0
           3     3      1        1     2.68       0
           4     4      1        1     1.56       0
           5     5      1        1     2.03       0
           6     9      1        1     2.10       0
           7    18      1        3     3.00       0
           8    19      1        3     3.15       0
           9    20      1        3     2.56       0
           …
          18    45      1        3     3.15       0

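  23. Compare means

         /* BY-group processing requires the data to be sorted by the BY variable */
         proc sort data=d3;
           by work;
         proc univariate normal plot data=d3;
           var gpa;
           by work;
         proc glm data=d3;
           class work;
           model gpa = work;
           means work / hovtest;
         run;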

  24. Proc GLM output

         The GLM Procedure
         Dependent Variable: gpa

                                         Sum of
         Source             DF          Squares     Mean Square    F Value    Pr > F
         Model               1       5.10034028      5.10034028      15.98    0.0010
         Error              16       5.10768750      0.31923047
         Corrected Total    17      10.20802778

         R-Square     Coeff Var      Root MSE      gpa Mean
         0.499640      19.86733      0.565005      2.843889

  25. Proc GLM output (cont)

         The GLM Procedure

         Level of           -------------gpa-------------
         work         N           Mean         Std Dev
         1            8     2.24875000      0.54971259
         3           10     3.32000000      0.57661850

     Therefore, students who don’t work have significantly higher GPAs (M = 3.32, SD = .58) than students who work full time (M = 2.25, SD = .55; F(1, 16) = 15.98, p < .05).
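
The group standard deviations on slide 25 and the error line of the GLM table on slide 24 describe the same quantity, so one can be checked against the other. A worked check, with group sizes 8 and 10, N = 18, and k = 2 groups (the `align*` environment assumes the amsmath package):

```latex
\begin{align*}
SS_{\text{error}} &= \sum_{j} (n_j - 1)\, s_j^2
  = 7\,(0.54971259)^2 + 9\,(0.57661850)^2 \approx 5.1077,\\
MS_{\text{error}} &= \frac{SS_{\text{error}}}{N - k}
  = \frac{5.1077}{16} \approx 0.3192,\\
F &= \frac{MS_{\text{model}}}{MS_{\text{error}}}
  = \frac{5.10034028}{0.31923047} \approx 15.98.
\end{align*}
```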

  26. Notes on what to include in the write-up
     • Talk about tests of assumptions, and if there is a violation, talk about the consequences. Include test statistics (W for normality and F for homogeneity of variance).
     • Report means, SDs, and the F value. What do the results mean in terms of the hypotheses?
     • Overall conclusions

  27. Exercises
     • 4 variables:
        • Id (participant id number)
        • Gender (female = 0, male = 1)
        • Depress (depression scale ranging from 29 to 45)
        • Age (range 6 to 98)
     • Open “lab4.sas” on Dr. Brannick’s website

  28. Exercise (cont)
     • Clean the data. Check the gender variable with a proc univariate statement to see if there are out-of-bounds values.
     • Delete the out-of-bounds value.

  29. Program

         cards;
         ;
         data d2;
           set d1;
           if _n_ = 15 then delete;
         proc print;
         run;

  30. Exercise 2
     • Create a data set of females who are under 21.

  31. Program

         data d2;
           set d1;
           if _n_ = 15 then delete;
         data d3;
           set d2;
           /* Subsetting IF: keep females (gender = 0) under 21 */
           if gender = 0 and age < 21;
         proc print;
         run;
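
One caveat with a condition such as `age < 21`: SAS treats a missing numeric value as smaller than any number, so a record with a missing age would pass the filter. The sketch below shows a guarded version of the subsetting IF. It is an illustration under stated assumptions: the first four records are taken from the output on slides 32 and 35, the record with id 99 is hypothetical and exists only to show the missing-value behavior, and the data set names `d1_demo` and `d3_demo` are also hypothetical.

```sas
/* Sketch: subsetting IF with a guard against missing ages.             */
/* Records 1, 9, 32, 45 come from slides 32 and 35.                     */
/* Record 99 is HYPOTHETICAL, added only to demonstrate a missing age.  */
data d1_demo;
  input id gender depres age;   /* list input: values separated by blanks */
  datalines;
1 0 39 18
9 0 34 14
32 1 36 79
45 1 32 18
99 0 40 .
;
run;

data d3_demo;
  set d1_demo;
  /* Without the missing() check, id 99 would be kept, because a */
  /* missing age compares as less than 21.                       */
  if gender = 0 and not missing(age) and age < 21;
run;

proc print data=d3_demo;
run;
```

With these five records, only ids 1 and 9 remain. Whether records with a missing age should be kept or dropped is an analysis decision; the guard just makes that choice explicit.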

  32. Output

         Obs    id    gender    depres    age
           1     1       0        39       18
           2     9       0        34       14
           3    10       0        37       19
           4    23       0        33        7
           5    24       0        39       10
           6    25       0        36       12
           7    26       0        45       18
           8    27       0        40       16
           9    28       0        41       12
          10    29       0        33        8

  33. Exercise 3
     • Create a data set of males who have a score on the depress scale under 38.

  34. Program

         data d4;
           set d2;
           /* Subsetting IF: keep males (gender = 1) scoring under 38 */
           if gender = 1 and depres < 38;
         proc print;
         run;

  35. Output

         Obs    id    gender    depres    age
           1    32       1        36       79
           2    33       1        35       72
           3    35       1        29       63
           4    37       1        33       56
           5    38       1        34       51
           6    39       1        36       46
           7    40       1        32       41
           8    41       1        32       35
           9    42       1        31       30
          10    45       1        32       18
          11    46       1        37       25
          12    47       1        36       36
          13    49       1        36        5
