Hypothesis Testing

Principles of Statistics
Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman
Former Director, Centre for Real Estate Studies
Faculty of Geoinformation Science and Engineering, Universiti Teknologi Malaysia, Skudai, Johor.
E-mail: hamid@fksg.utm.my

Hypothesis Testing Content: • Concepts of hypothesis testing • Test of statistical significance • Hypothesis testing one variable at a time

Hypothesis • Unproven proposition • Supposition that tentatively explains certain facts or phenomena • Assumption about the nature of the world • E.g. the mean price of three-bedroom single-storey houses in Skudai is RM 155,000.

Hypothesis (contd.) • A hypothesis is tested in two competing forms: • Null hypothesis • Alternative hypothesis • The null hypothesis is that there is no systematic relationship between independent variables (IVs) and dependent variables (DVs). • The research (alternative) hypothesis is that any relationship observed in the data is real.

Null Hypothesis • Statement about the status quo • No difference • Statistically expressed as: Ho: b = 0, where b is any sample statistic used to describe the population.

Alternative Hypothesis • Statement that indicates the opposite of the null hypothesis • There is a difference • Statistically expressed as: H1: b ≠ 0 (two-tail), or H1: b < 0 or H1: b > 0 (one-tail)

Significance Level • Critical probability in choosing between Ho and H1. • Put simply, the cut-off point (COP) beyond which a sample result is judged too unlikely to have arisen by chance if Ho were true. • Tells how likely a result at least this extreme would be due to chance alone • The most common standard for “something is good enough to be believed” is a confidence level of .95, i.e. a significance level of .05. • It means that a true null hypothesis would be wrongly rejected only 5% of the time. • What is the COP at 95% confidence?

Significance Level (contd.) • Denoted as α • Tells how much of the probability mass is in the tails of a given distribution • The significance level selected is typically .05 or .01 • A result whose probability under Ho falls below this level is too unlikely to warrant support for the null hypothesis • In other words, it lends support to the alternative hypothesis • Main purpose of statistical testing: to decide whether the null hypothesis can be rejected

Significance Level (contd.)
$P[-1.96 \le Z \le 1.96] = 1 - \alpha = 0.95$
$P[Z \ge Z_c] = P[Z \le -Z_c] = \alpha/2$
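The critical value Zc can be looked up in a Z-table or computed. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `two_tailed_critical_z` is only an illustrative choice.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha: float) -> float:
    """Return Zc such that P[Z >= Zc] = P[Z <= -Zc] = alpha / 2."""
    # The upper tail holds alpha/2, so Zc is the (1 - alpha/2) quantile of N(0, 1).
    return NormalDist().inv_cdf(1 - alpha / 2)


z_05 = two_tailed_critical_z(0.05)
z_01 = two_tailed_critical_z(0.01)
print(round(z_05, 2), round(z_01, 3))  # 1.96 2.576
```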

Suppose we have the following relationship:
$Y_i = \beta + e_i, \quad i = 1, \dots, T, \quad e_i \sim N(0, \sigma^2)$ (1)
The least squares estimator for β is:
$b = \sum_{i=1}^{T} Y_i / T$ (2)
with the following properties:
1) $E[b] = \beta$ (3a)
2) $\mathrm{Var}(b) = E[(b - \beta)^2] = \sigma^2/T$ (3b)
3) $b \sim N(\beta, \sigma^2/T)$ (3c)

The “standardized” normal random variable for b is:
$Z = \frac{b - \beta}{\sqrt{\sigma^2/T}} \sim N(0, 1)$ (4)
The critical value of Z, i.e. Zc, such that α = 0.05 of the probability mass is in the tails of the distribution, is given by:
$P[Z \ge 1.96] = P[Z \le -1.96] = 0.025$ (5a)
and
$P[-1.96 \le Z \le 1.96] = 1 - 0.05 = 0.95$ (5b)

Substituting the standardized normal variable (Eqn. 4) into Eqn. (5b), we get:
$P\left[-1.96 \le \frac{b - \beta}{\sqrt{\sigma^2/T}} \le 1.96\right] = 0.95$ (6)
Solving for β, we get:
$P[b - 1.96\,\sigma/\sqrt{T} \le \beta \le b + 1.96\,\sigma/\sqrt{T}] = 0.95$ (7)
In general:
$P[b - Z_c\,\sigma/\sqrt{T} \le \beta \le b + Z_c\,\sigma/\sqrt{T}] = 1 - \alpha$ (8a)
Also:
$P\left[\frac{b - \beta}{\sigma/\sqrt{T}} \le -Z_c\right] = P\left[\frac{b - \beta}{\sigma/\sqrt{T}} \ge Z_c\right] = \alpha/2$ (2-tail test) (8b)
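The interval result can be checked by simulation: draw many samples of size T, form the interval around each sample mean b, and count how often it contains the true β. This is a minimal sketch; the values of β, σ, T, the number of trials and the seed are arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(beta, sigma, T, alpha=0.05, trials=4000, seed=0):
    """Share of simulated intervals b -/+ Zc*sigma/sqrt(T) that contain beta."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_c = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_c * sigma / sqrt(T)
    hits = 0
    for _ in range(trials):
        # b is the least squares estimator: the mean of T draws from N(beta, sigma^2)
        b = fmean(rng.gauss(beta, sigma) for _ in range(T))
        if b - half_width <= beta <= b + half_width:
            hits += 1
    return hits / trials


rate = coverage(beta=3.0, sigma=1.5, T=25)
print(rate)  # expected to be close to 1 - alpha = 0.95
```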

Example
You suspect that the mean rental of 225 purpose-built office units in Johor is RM 3.00/sq.ft. If the std. dev. is RM 1.50/sq.ft., what is the 95% confidence interval of the mean?
The null hypothesis is that the mean is equal to 3.0: Ho: μ = 3.0
The alternative hypothesis is that the mean is not equal to 3.0: H1: μ ≠ 3.0

A Sampling Distribution
[Figure: sampling distribution centred on μ = 3.0, with α/2 = .025 in each tail; lower limit XL = ? and upper limit XU = ?]

Critical values of μ
[Slides: critical value, upper limit; critical value, lower limit]

Region of Rejection
[Figure: lower limit, μ = 3.0, upper limit]

Hypothesis Test
[Figure: μ = 3.0, lower limit 2.804, upper limit 3.196; the value 3.78 is also marked]
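The limits 2.804 and 3.196 follow from μ ± Zc·σ/√n with the figures given in the office rental example. The sketch below reproduces them; the variable and function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 3.0    # hypothesised mean rental, RM/sq.ft.
sd = 1.5      # standard deviation, RM/sq.ft.
n = 225       # number of office units
alpha = 0.05  # two-tail significance level

se = sd / sqrt(n)                          # standard error of the mean
z_c = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
lower = mu_0 - z_c * se
upper = mu_0 + z_c * se
print(round(lower, 3), round(upper, 3))  # 2.804 3.196


def in_rejection_region(x_bar: float) -> bool:
    """True when a sample mean falls outside the two critical limits."""
    return x_bar < lower or x_bar > upper


print(in_rejection_region(3.78))  # True: 3.78 lies above the upper limit
```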

Type I and Type II Errors in Hypothesis Testing
State of null hypothesis        Decision
in the population               Accept Ho             Reject Ho
Ho is true                      Correct (no error)    Type I error
Ho is false                     Type II error         Correct (no error)
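The two error probabilities can be put in numbers for the office rental example (Ho: μ = 3.0, σ = 1.5, n = 225). The chance of a Type I error is α by construction. The chance of a Type II error depends on the true mean, so the sketch below assumes a true mean of RM 3.20/sq.ft., a hypothetical value that does not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def prob_accept(mu_0, mu_true, sd, n, alpha=0.05):
    """Probability that the sample mean falls between the critical limits."""
    se = sd / sqrt(n)
    z_c = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu_0 - z_c * se, mu_0 + z_c * se
    sampling = NormalDist(mu_true, se)  # sampling distribution of the mean
    return sampling.cdf(upper) - sampling.cdf(lower)


# Type I error: Ho is true (true mean = 3.0) but the test rejects it.
type_i = 1 - prob_accept(mu_0=3.0, mu_true=3.0, sd=1.5, n=225)

# Type II error: Ho is false (assumed true mean = 3.2) but the test accepts it.
type_ii = prob_accept(mu_0=3.0, mu_true=3.2, sd=1.5, n=225)

print(round(type_i, 3), round(type_ii, 2))  # 0.05 0.48
```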

Example You estimate the average price, μ, of single- and double-storey houses in Malaysia’s major industrialised towns to be RM 1,600/sq.m. Based on a sample of 101 houses, you found that the sample mean price, x̄, is RM 1,579.44/sq.m. with a std. dev. of RM 350.13/sq.m. • Would you reject your initial estimate at the 0.05 significance level? • What is the confidence interval of the price at the 5% significance level?

Answer (a)
Ho: μ = 1,600
H1: μ ≠ 1,600
Test statistic: $Z = \frac{1{,}579.44 - 1{,}600}{350.13/\sqrt{101}} = \frac{-20.56}{34.84} \approx -0.59$
For a two-tail test at α = 0.05: $P[Z \ge Z_c] = P[Z \le -Z_c] = \alpha/2 = 0.025$
From the Z-table, Zc = 1.96.
Since |Z| = 0.59 < Zc = 1.96, do not reject Ho.
∴ The sample does not contradict the initial estimate of RM 1,600/sq.m.

Answer (b)
1,579.44 − 1.96(34.84) = RM 1,511.15 (lower limit)
1,579.44 + 1.96(34.84) = RM 1,647.73 (upper limit)
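The house price example can be checked numerically. The sketch below assumes a two-tail test at α = 0.05, for which the critical value is about 1.96, and also reports a two-tail p-value, which the slides do not use. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 1600.0   # hypothesised mean price, RM/sq.m.
x_bar = 1579.44 # sample mean price
s = 350.13      # sample standard deviation
n = 101         # sample size
alpha = 0.05    # two-tail significance level

se = s / sqrt(n)                           # standard error of the mean
z = (x_bar - mu_0) / se                    # test statistic
z_c = NormalDist().inv_cdf(1 - alpha / 2)  # two-tail critical value
p_value = 2 * NormalDist().cdf(-abs(z))    # two-tail p-value
reject = abs(z) > z_c

# Confidence interval around the sample mean at the same level
lower = x_bar - z_c * se
upper = x_bar + z_c * se

print(round(se, 2), round(z, 2), reject)  # 34.84 -0.59 False
```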

NONPARAMETRIC STATISTICS PARAMETRIC STATISTICS

t-Distribution • Symmetrical, bell-shaped distribution • Mean of zero and a standard deviation slightly greater than one, approaching one as the degrees of freedom increase • Shape influenced by degrees of freedom

Degrees of Freedom • Abbreviated d.f. • Number of observations minus the number of constraints • For a test on a single mean, d.f. = n − 1

Confidence Interval Estimate Using the t-distribution
$\mu = \bar{X} \pm t_{c.l.} S_{\bar{X}}$
or
$\bar{X} - t_{c.l.} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{c.l.} \frac{S}{\sqrt{n}}$
where
μ = population mean
$\bar{X}$ = sample mean
$t_{c.l.}$ = critical value of t at a specified confidence level
$S_{\bar{X}}$ = standard error of the mean
S = sample standard deviation
n = sample size

Hypothesis Test Using the t-Distribution

Univariate Hypothesis Test Utilizing the t-Distribution Suppose that a production manager believes the average number of defective assemblies each day to be 20. The factory records the number of defective assemblies for each of the 25 days it was open in a given month. The mean was calculated to be 22, and the standard deviation, S, to be 5.

Univariate Hypothesis Test Utilizing the t-Distribution The researcher desires 95 percent confidence, so the significance level is .05. The researcher must then find the upper and lower limits of the confidence interval to determine the region of rejection. Thus, the value of t is needed. For 24 degrees of freedom (n − 1 = 25 − 1), the t-value is 2.064.

Univariate Hypothesis Test t-Test
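The defective-assemblies example can be worked through as below. This is a sketch under the figures stated above (hypothesised mean 20, sample mean 22, S = 5, n = 25, critical t = 2.064). The critical value is taken from the text, not computed, and the variable names are illustrative.

```python
from math import sqrt

mu_0 = 20.0     # manager's hypothesised mean number of defectives per day
x_bar = 22.0    # sample mean over the 25 days
s = 5.0         # sample standard deviation
n = 25          # number of days observed
t_crit = 2.064  # critical t for 24 d.f. at the .05 level (value quoted in the text)

se = s / sqrt(n)  # standard error of the mean = 1.0

# Limits of the acceptance region around the hypothesised mean
lower = mu_0 - t_crit * se
upper = mu_0 + t_crit * se

# Equivalent test on the t scale
t_obs = (x_bar - mu_0) / se
reject = abs(t_obs) > t_crit

print(round(lower, 3), round(upper, 3))  # 17.936 22.064
print(t_obs, reject)                     # 2.0 False
```

The sample mean of 22 lies inside the limits and the observed t of 2.0 is below 2.064, so on these figures the manager's belief is not rejected at the .05 level.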

Testing a Hypothesis about a Distribution • Chi-Square test • Test for significance in the analysis of frequency distributions • Compare observed frequencies with expected frequencies • “Goodness of Fit”

Chi-Square Test
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
where
χ² = chi-square statistic
Oi = observed frequency in the ith cell
Ei = expected frequency in the ith cell
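The statistic is the sum over cells of the squared gap between observed and expected frequency, divided by the expected frequency. The sketch below uses made-up counts (60 and 40 observed against 50 and 50 expected), which are hypothetical and not from the slides. The critical value 3.841 is the standard table value for 1 degree of freedom at the .05 level.

```python
def chi_square(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [60, 40]   # hypothetical observed frequencies
expected = [50, 50]   # hypothetical expected frequencies under Ho
CRITICAL_1DF_05 = 3.841  # chi-square table value, 1 d.f., .05 level

stat = chi_square(observed, expected)
print(stat, stat > CRITICAL_1DF_05)  # 4.0 True
```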

Chi-Square Test Estimation for Expected Number for Each Cell
$E_{ij} = \frac{R_i C_j}{n}$
where
Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size
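Each expected count is its row total times its column total, divided by the sample size. The sketch below applies this to a 2 × 2 table of hypothetical counts (not from the slides) and then sums the cell contributions to the chi-square statistic. The function name is illustrative.

```python
def expected_counts(table):
    """Expected frequency for each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


table = [[20, 30],
         [40, 10]]  # hypothetical observed frequencies

expected = expected_counts(table)
print(expected)  # [[30.0, 20.0], [30.0, 20.0]]

# Chi-square statistic summed over every cell of the table
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
print(round(stat, 2))  # 16.67
```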

Univariate Hypothesis Test Chi-square Example

Hypothesis Test of a Proportion • π is the population proportion • p is the sample proportion • π is estimated with p • Ho: π = .5 • H1: π ≠ .5
