SAS Methods for Descriptive Statistics
Dr. Baokun Li
Economics Experimental Teaching Center, Business Data Mining Center
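Steps for Creating and Running a SAS Program
1. Create the SAS program in the Program Editor window or in a text editor.
2. Run the SAS program by clicking the icon on the toolbar.
3. Check the log for errors and warnings.
4. If there are errors, return to step 1 and repeat steps 1-3.
5. If there are no errors, view the results in the Output window.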
SAS Descriptive Procedures
• PROC PRINT
• PROC MEANS
• PROC UNIVARIATE
• PROC FREQ
• PROC PLOT
• PROC CHART
• PROC GPLOT
• PROC GCHART
Syntax for Procedures

PROC PROCNAME DATA=datasetname <options> ;
  substatements / <options> ;

The WHERE statement is a useful substatement available to all procedures.

PROC PRINT DATA=demo ;
  VAR marstat ;
  WHERE state = 'MN';
RUN;
DATA demo;
  INFILE DATALINES;
  INPUT gender $ age marstat $ credits state $ ;
  * Flag full-time students (more than 12 credits) and MN residents;
  IF credits > 12 THEN fulltime = 'Y'; ELSE fulltime = 'N';
  IF state = 'MN' THEN resid = 'Y'; ELSE resid = 'N';
  DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
RUN;
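PROC FREQ appears in the list of descriptive procedures but is not demonstrated in these slides. The following is a minimal sketch, not part of the original deck, that combines it with a WHERE statement. The data set name demo_small, the age cutoff, and the title are illustrative assumptions; the four rows are copied from the demo data above.

```sas
* Hypothetical mini data set: four rows copied from the demo data;
DATA demo_small;
  INPUT gender $ age marstat $ credits state $ ;
  DATALINES;
F 23 S 15 MN
F 21 S 15 WI
M 26 M 15 WI
M 27 S 05 MN
;
RUN;

* Cross-tabulate gender by state, restricted by a WHERE statement;
PROC FREQ DATA = demo_small;
  TABLES gender*state;
  WHERE age >= 22;          * illustrative cutoff, not from the slides;
  TITLE 'Proc Freq sketch';
RUN;
```

The WHERE statement filters observations before the procedure sees them, so the table counts only the rows that satisfy the condition.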
* PROGRAM 3;
DATA weight;
  INFILE 'd:\...\tomhs.txt' ;
  INPUT ptid $ clinic $ sex $ height weight;
  bmi = (weight*703.0768)/(height*height);
  * bmi is in kg/m2;
RUN;
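As a quick sanity check of the BMI formula, the sketch below applies it to a single hand-entered observation. The height and weight values (71.5 and 205.5) are taken from the first row of the Proc Print listing in this deck; the use of DATA _NULL_ with PUT to write to the log is an illustrative choice, not something the slides show.

```sas
* Check the bmi formula on one observation without creating a data set;
DATA _NULL_;
  height = 71.5;    * value from the first listed TOMHS observation;
  weight = 205.5;   * value from the first listed TOMHS observation;
  bmi = (weight*703.0768)/(height*height);
  PUT height= weight= bmi= ;   * writes the values to the SAS log;
RUN;
```

The bmi value written to the log should agree, up to rounding, with the 28.2620 reported for that observation.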
PROC PRINT DATA = weight (OBS=5) NOOBS;
  TITLE 'Proc Print: Five observations from the TOMHS data';
RUN;

PROC MEANS DATA = weight;
  VAR height weight bmi;
  TITLE 'Proc Means Example 1';
RUN;

PROC MEANS DATA = weight MEAN MEDIAN STD MAXDEC=2;
  VAR height weight bmi;
  TITLE 'Proc Means Example 2 (specifying options)';
RUN;
Proc Print: Five observations from the TOMHS Study

patid     clinic    sex    height    weight       bmi
C03615      C        1      71.5     205.5    28.2620
B00979      B        1      69.5     247.3    35.9963
B00644      B        1      60.0     138.5    27.0489
D01348      D        1      71.5     205.5    28.2620
A01088      A        1      72.0     244.8    33.2008

Proc Means Example 1

The MEANS Procedure

Variable      N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------
height      100     68.0750000      3.8536189     58.0000000     77.0000000
weight      100    191.7560000     34.5107254    128.5000000    279.3000000
bmi         100     28.9808397      3.9911476     21.4572336     37.5178852
--------------------------------------------------------------------------
Proc Means Example 2 (specifying options)

The MEANS Procedure

Variable            Mean          Median         Std Dev
--------------------------------------------------------
height             68.08           67.50            3.85
weight            191.76          192.65           34.51
bmi                28.98           28.02            3.99
--------------------------------------------------------
FW= sets the field width of the printed statistics.

PROC MEANS DATA = weight N MEAN STD MAXDEC=2 FW=8;
  CLASS clinic;
  TITLE 'Proc Means Example 3 (using the CLASS statement)';
RUN;

            N
clinic    Obs    Variable      N        Mean     Std Dev
----------------------------------------------------------
A          18    height       18       67.89        3.04
                 weight       18      192.73       37.68
                 bmi          18       29.24        4.50

B          29    height       29       67.76        4.76
                 weight       29      185.58       34.00
                 bmi          29       28.39        4.22

C          36    height       36       69.08        3.36
                 weight       36      202.91       33.74
                 bmi          36       29.76        3.62

D          17    height       17       66.68        3.61
                 weight       17      177.65       28.05
                 bmi          17       28.06        3.79
----------------------------------------------------------
PROC UNIVARIATE DATA = weight PLOT ;
  ID ptid;
  VAR bmi;
  TITLE 'Proc Univariate Example 1';
RUN;

* Note: PROC UNIVARIATE produces a lot of output ;
Proc Univariate Example 1

The UNIVARIATE Procedure
Variable: bmi

                          Moments
N                         100    Sum Weights                100
Mean               28.9808397    Sum Observations    2898.08397
Std Deviation      3.99114757    Variance            15.9292589
Skewness           0.27805446    Kurtosis            -0.8987587
Uncorrected SS     85565.9037    Corrected SS        1576.99663
Coeff Variation    13.7716768    Std Error Mean      0.39911476

              Basic Statistical Measures
    Location                    Variability
Mean      28.98084     Std Deviation            3.99115
Median    28.01524     Variance                15.92926
Mode      28.26198     Range                   16.06065
                       Interquartile Range      6.68654

            Tests for Location: Mu0=0
Test           -Statistic-    -----p Value------
Student's t    t   72.6128    Pr > |t|     <.0001
Sign           M        50    Pr >= |M|    <.0001
Signed Rank    S      2525    Pr >= |S|    <.0001
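Several entries in the Moments table follow directly from N, the mean, and the standard deviation. The sketch below recomputes them, assuming the usual textbook definitions (standard error = s / sqrt(n), t = mean / standard error when Mu0 = 0, coefficient of variation = 100 * s / mean). The variable names are illustrative; the input values are copied from the output above.

```sas
* Recompute derived statistics from the printed N, mean and std deviation;
DATA _NULL_;
  n    = 100;
  mean = 28.9808397;
  std  = 3.99114757;
  se_mean = std / SQRT(n);     * compare with Std Error Mean;
  t       = mean / se_mean;    * compare with Student t for Mu0=0;
  cv      = 100 * std / mean;  * compare with Coeff Variation;
  PUT se_mean= t= cv= ;        * writes the values to the SAS log;
RUN;
```

The values in the log can be compared against Std Error Mean, the Student's t statistic, and Coeff Variation in the listing.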
Quantile        Estimate
100% Max         37.5179
99%              37.4385
95%              35.8871
90%              34.3378
75% Q3           32.6299
50% Median       28.0152
25% Q1           25.9433
10%              24.1495
5%               22.9373
1%               21.8969
0% Min           21.4572

                  Extreme Observations
------------Lowest------------    ------------Highest-----------
  Value    patid       Obs          Value    patid       Obs
21.4572    A00083       64        35.9963    B00979        2
22.3365    C04206       49        36.3726    B03077       67
22.4057    B00714        8        37.2037    A01166        9
22.6773    A00312       21        37.3592    C05323       92
22.8387    B00262       27        37.5179    B02059       25
Stem Leaf             #  Boxplot
  37 245              3     |
  36 04               2     |
  35 28               2     |
  34 3357             4     |
  33 000222344789    12     |
  32 135677           6  +-----+    75th Percentile
  31 3344588          7  |     |
  30 159              3  |     |
  29 26               2  |  +  |    Mean
  28 00023335789     11  *-----*
  27 000334466678    12  |     |
  26 02345566889     11  |     |
  25 223344789        9  +-----+    25th Percentile
  24 1235688          7     |
  23 0459             4     |
  22 3478             4     |
  21 5                1     |
     ----+----+----+----+
The UNIVARIATE Procedure
Variable: bmi

[Normal Probability Plot: bmi (vertical axis, 21.5 to 37.5) against normal quantiles (horizontal axis, -2 to +2); asterisks mark the data, plus signs mark the reference line]

Points that follow a straight line indicate that the data are normally distributed.
* High-resolution graphs can also be produced. The following makes a histogram ;
PROC UNIVARIATE DATA = weight;
  VAR bmi;
  HISTOGRAM bmi / NORMAL MIDPOINTS=20 to 40 by 2;
  INSET N = 'N' (5.0) MEAN = 'Mean' (5.1) STD = 'Sdev' (5.1)
        MIN = 'Min' (5.1) MAX = 'Max' (5.1) / POS=lm HEADER='Summary Statistics';
  LABEL bmi = 'Body Mass Index (kg/m2)';
  TITLE 'Histogram of BMI';
RUN;
Using Comment Statements in SAS
• Two purposes
  • Documenting your program
  • Temporarily disabling part of a program
• See pages 15-18 of C & S
Examples of Comment Code

* Run proc univariate for variable BMI;

*---------------------------------------------------------------------*
 High resolution graphs can also be produced. The following makes a
 pdf file containing a histogram with the best fit normal curve and
 summary statistics. Other types of files such as GIF
*---------------------------------------------------------------------*;

PROC UNIVARIATE DATA = weight PLOT ;
  * ID patid ;
  VAR bmi;

PROC UNIVARIATE DATA = weight /*PLOT*/;
  VAR bmi;
Temporarily Removing Code: you do not want to produce the histogram now but may want to run it at another time

PROC UNIVARIATE DATA = weight;
  VAR bmi;
/*
  HISTOGRAM bmi / NORMAL MIDPOINTS=20 to 40 by 2;
  INSET N = 'N' (5.0) MEAN = 'Mean' (5.1) STD = 'Sdev' (5.1)
        MIN = 'Min' (5.1) MAX = 'Max' (5.1) / POS=lm HEADER='Summary Statistics';
*/
  LABEL bmi = 'Body Mass Index (kg/m2)';
  TITLE 'Histogram of BMI';
RUN;
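The two comment styles behave differently: a statement comment (beginning with an asterisk) ends at the first semicolon, while a block comment runs until the closing delimiter and can therefore enclose whole statements. The sketch below illustrates both. It assumes the sashelp.class sample data set that ships with standard SAS installations, which is not used in the original slides.

```sas
* A statement comment starts with an asterisk and ends at the first semicolon;

/* A block comment may contain semicolons, so it can disable
   one or more complete statements:
PROC PRINT DATA = sashelp.class;
RUN;
*/

* sashelp.class is an assumed sample data set, not part of the slides;
PROC MEANS DATA = sashelp.class MEAN MAXDEC=2;
  VAR height /* weight */ ;   * weight is disabled inside the statement;
RUN;
```

Because a statement comment stops at the first semicolon, it is best suited to one-line notes or to disabling a single statement; the block form is the safer choice for disabling several statements at once.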