Computing normal probabilities in SAS; PROC CORR; PROC PLOT or PROC GPLOT
Example: Data on the time between machine failures were collected during a study on machine performance that involved 39 similar machines. From the data we compute the sample mean = 23.35 hours and the sample standard deviation = 1.67 hours. Treating the failure time as normally distributed with this mean and standard deviation:
  1. What is the percentage of machines that failed before 20 hours?
  2. What is the percentage of machines that failed after 24 hours?
  3. What is the percentage of machines with failure time between 20 and 22 hours?
  4. How short should the failure time be for a machine to be in the bottom 10%?
Computing the normal probabilities using SAS

    data normal;
      input x @@;
      mean = 23.35;
      stdev = 1.67;
      z = (x - mean) / stdev;      /* standardized value of x */
      pn = probnorm(z);            /* area to the left of x   */
      invpn = 1 - probnorm(z);     /* area to the right of x  */
      label pn = "the normal dist. func. at x";
      label z = "the standardized value";
      label invpn = "the area to the right of x";
      datalines;
    20 24
    ;
    run;

    proc print label;
    run;
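SAS output

                              the            the normal     the area
                              standardized   dist. func.    to the
    Obs   x    mean    stdev  value          at x           right of x
     1    20   23.35   1.67   -2.00599       0.02243        0.97757
     2    24   23.35   1.67    0.38922       0.65144        0.34856

Answer to Question 1: the normal distribution function at x = 20 is 0.02243, so about 2.24% of machines failed before 20 hours.
Answer to Question 2: the area to the right of x = 24 is 0.34856, so about 34.86% of machines failed after 24 hours.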
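Question 3 is not answered by the output above. A minimal sketch of one way to get it, using the same PROBNORM approach: the area between two cut points is the difference of the two left-tail areas. The data set name q3 and the variable names lo, hi and p_between are illustrative assumptions, not part of the original slides.

```sas
/* Sketch (assumed names): P(20 < X < 22) for X ~ Normal(23.35, 1.67) */
data q3;
  mean  = 23.35;
  stdev = 1.67;
  lo    = 20;
  hi    = 22;
  /* area between lo and hi = left-tail area at hi minus left-tail area at lo */
  p_between = probnorm((hi - mean) / stdev)
            - probnorm((lo - mean) / stdev);
  label p_between = "the area between lo and hi";
run;

proc print data=q3 label;
run;
```

The left-tail area at 20 is the 0.02243 already shown in the output, so the result should come out at roughly 0.19, that is, somewhat under a fifth of the machines.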
Computing the normal percentiles using SAS

    data percentile;
      input p @@;
      mean = 23.35;
      stdev = 1.67;
      ip = probit(p);              /* standard normal percentile z_p */
      ip = mean + stdev * ip;      /* rescale to the failure-time scale */
      label ip = "the p-th percentile";
      datalines;
    0.1
    ;
    run;

    proc print label;
    run;
SAS output

                                   the p-th
    Obs   p     mean    stdev      percentile
     1    0.1   23.35   1.67       21.2098

Answer to Question 4: a machine is in the bottom 10% if its failure time is below about 21.21 hours.
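The percentile step in the code can be written out by hand as a check. The worked equation below uses the same mean and standard deviation; the symbols \mu, \sigma and z_p are notation introduced here, with z_p standing for the value returned by probit(p).

```latex
x_p = \mu + \sigma\, z_p ,
\qquad
z_{0.10} \approx -1.2816
\quad\Longrightarrow\quad
x_{0.10} \approx 23.35 + 1.67 \times (-1.2816) \approx 21.21 \text{ hours}
```

This agrees with the value 21.2098 in the SAS output.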
SAS procedures for scatter plots and correlation

PROC CORR
The CORR procedure is a statistical procedure for numeric random variables that computes Pearson correlation coefficients and some descriptive statistics. The correlation statistics include

    PROC CORR DATA=dataset-name;
      BY <DESCENDING> variable-1 <variable-n> <NOTSORTED>;
      VAR variable(s);
      WITH variable(s);
    data one;
      input time line step device;
      linet = line / 1000;
      datalines;
    0.0893 266 2 1
    0.0386 120 1 1
    0.0988 245 2 1
    0.026  102 1 2
    0.041  307 2 2
    0.0196 143 1 2
    ;

    proc corr;
      var time line step;
    run;
The CORR Procedure

3 Variables: time line step

Simple Statistics
    Variable   N   Mean        Std Dev    Sum       Minimum    Maximum
    time       6   0.05222     0.03349    0.31330   0.01960    0.09880
    line       6   197.16667   86.06374   1183      102.0000   307.00000
    step       6   1.50000     0.54772    9.0       1.0000     2.00000

Pearson Correlation Coefficients, N = 6
Prob > |r| under H0: Rho=0
              time      line      step
    time      1.00000   0.61490   0.78996
                        0.1939    0.0615
    line      0.61490   1.00000   0.96099
              0.1939              0.0023
    step      0.78996   0.96099   1.00000
              0.0615    0.0023
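In each cell of the correlation table, the upper number is the Pearson correlation r and the lower number is the p-value for testing H0: Rho=0. As a rough check on how the two relate, the sketch below assumes the usual t test for a Pearson correlation with n - 2 degrees of freedom; the slides do not state which test is used, so treat this as an assumption. It is worked for time and line.

```latex
t = r \sqrt{\frac{n-2}{1-r^{2}}}
  = 0.61490 \sqrt{\frac{6-2}{1-0.61490^{2}}}
  \approx 1.56 ,
\qquad
\text{df} = n - 2 = 4
```

A two-sided p-value for t of about 1.56 on 4 degrees of freedom is a little under 0.20, which is consistent with the 0.1939 reported in the table.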
    proc corr;
      var time;
      with line step;
    run;

Produces the correlations between time and line, and between time and step, only.

The CORR Procedure

2 With Variables: line step
1 Variables: time

Simple Statistics
    Variable   N   Mean        Std Dev    Sum       Minimum    Maximum
    line       6   197.16667   86.06374   1183      102.0000   307.000
    step       6   1.50000     0.54772    9.00      1.0000     2.0000
    time       6   0.05222     0.03349    0.313     0.0196     0.0988

Pearson Correlation Coefficients, N = 6
Prob > |r| under H0: Rho=0
              time
    line      0.61490
              0.1939
    step      0.78996
              0.0615
    proc sort;
      by device;
    run;

    proc corr;
      by device;
      var time line;
    run;

The BY statement specifies the variable that the procedure uses to form BY groups. The data need to be sorted by the BY variable first. This code computes the correlation between time and line separately for the two groups of data defined by the variable "device".
----------------------- device=1 --------------------------

The CORR Procedure

2 Variables: time line

Pearson Correlation Coefficients, N = 3
Prob > |r| under H0: Rho=0
              time      line
    time      1.00000   0.96086
                        0.1787
    line      0.96086   1.00000
              0.1787

----------------------- device=2 --------------------------

The CORR Procedure

2 Variables: time line

Pearson Correlation Coefficients, N = 3
Prob > |r| under H0: Rho=0
              time      line
    time      1.00000   0.88433
                        0.3092
    line      0.88433   1.00000
              0.3092
PROC PLOT: Low-level graphics (e.g. for Unix)
PROC PLOT creates a scatter plot for two variables. In the PLOT statement the first variable goes on the vertical axis and the second on the horizontal axis.

    PROC PLOT <DATA=input-data-set>;
      PLOT yvar*xvar;

Or

    PROC PLOT <DATA=input-data-set>;
      BY variables;
      PLOT yvar*xvar='character' $ labelvariable;

BY variables: construct a different plot for each group defined by the BY variables.
'character': the plotting symbol. Possible characters: * + - a b ...
labelvariable: must be defined in the data set and is used to label the points in the graph.
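As an illustration of the syntax above, the sketch below draws one plot per device for the data set one created earlier. It assumes that data set is still available in the session; the choice of time and line as the plotted variables is only an example.

```sas
/* Sketch: one scatter plot of time against line for each device.  */
/* Assumes the data set ONE from the PROC CORR example exists.     */
proc sort data=one;
  by device;           /* BY-group processing needs sorted data */
run;

proc plot data=one;
  by device;
  plot time*line='*';  /* time on the vertical axis, line on the horizontal */
run;
```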
High-level graphics: PROC GPLOT (Pg. 77 of SAS manual)
SYMBOL is a global statement that controls the plot display.

    SYMBOL<1 2 3…99>
      <COLOR=symbol-color>                             control the point color
      <VALUE=special-symbol | text-string | NONE>      change the plotting symbol
      <INTERPOL=JOIN>                                  join the points with a line
      <INTERPOL=R>;                                    draw the regression line through the cloud of points

PROC GPLOT creates a scatter plot for two variables. As in PROC PLOT, the first variable in the PLOT statement goes on the vertical axis.

    PROC GPLOT <DATA=input-data-set>;
      PLOT yvar*xvar<=zvar>;
      BY variables;

BY variables: construct a different plot for each group defined by the BY variables.
    symbol interpol=r value=dot color=red;

    proc gplot;
      plot brate*lgnp;
    run;
Add a categorical variable

    proc sort;
      by cg;
    run;

    proc gplot;
      plot brate*gnp=cg;
    run;
    symbol1 interpol=none value=E color=red;
    symbol2 interpol=none value=S color=black;
    symbol3 interpol=none value=G color=green;
    symbol4 interpol=none value=M color=blue;
    symbol5 interpol=none value=A color=magenta;
    symbol6 interpol=none value=F color=brown;

    proc gplot;
      plot brate*gnp=cg;
    run;