Proportion Data

Harry R. Erwin, PhD, School of Computing and Technology, University of Sunderland

Resources • Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley. • Freund, RJ, and WJ Wilson (1998) Regression Analysis. Academic Press. • Gentle, JE (2002) Elements of Computational Statistics. Springer. • Gonick, L, and W Smith (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

Introduction • These four demonstration sessions of this class address special types of data: • Counts • Proportions (this lecture) • Survival analysis • Binary responses

Frequencies and Proportions • With frequency data, we know how often something happened, but not how often it didn’t happen. • With proportion data, we know both. • Applied to: • Mortality and infection rates • Response to clinical treatment • Voting • Sex ratios • Proportional response to experimental treatments

Working With Proportions • Traditionally, proportion data was modelled by using the percentage as the response variable. • This is bad for four reasons: • Errors are not normally distributed. • Non-constant variance. • Response is bounded by 0.0 and 1.0. • The size of the sample, n, is lost.

General Approach • Use a generalised linear model (glm). • family = binomial (i.e., an unfair coin flip) • The response is built from two vectors, one of the success counts and the other of the failure counts. • number of failures + number of successes = binomial denominator, n • y <- cbind(successes, failures) • model <- glm(y ~ whatever, binomial)
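A minimal sketch of the approach above, under stated assumptions: the data are invented for illustration, and the names dose, successes and failures are hypothetical, not taken from the lecture or the book.

```r
# Hypothetical data: successes and failures at five dose levels (20 trials each).
dose      <- c(1, 2, 3, 4, 5)
successes <- c(2, 5, 9, 14, 18)
failures  <- c(18, 15, 11, 6, 2)

# Two-column response: successes first, failures second.
# The row sums are the binomial denominators, n.
y <- cbind(successes, failures)

# Binomial GLM; the logit link is the default for family = binomial.
model <- glm(y ~ dose, family = binomial)
summary(model)

# Fitted proportions, back on the 0 to 1 scale.
fitted(model)
```

Because the response carries both counts, the sample size behind each proportion is kept rather than lost.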

How R Handles Proportions • Weighted regression (weighted by the individual sample sizes). • logit link to ensure linearity • If the data are percentage cover: do an arcsine transformation, followed by conventional modelling (normal errors, constant variance). • If the data are percentage change in a continuous measurement, either: use ANCOVA with final weight as the response and initial weight as a covariate, or use the relative growth rate (log(final/initial)) as the response. • Both of these produce normal errors.
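The two transformations mentioned above might look like this in R. This is a sketch with invented numbers; it assumes the arcsine transformation means the usual arcsine of the square root of the proportion, which the slide does not spell out.

```r
# Hypothetical percentage cover values.
cover_pct <- c(5, 20, 50, 80, 95)

# Assumed form of the arcsine transformation: asin(sqrt(proportion)).
cover_arcsine <- asin(sqrt(cover_pct / 100))

# Hypothetical initial and final weights for five individuals.
initial <- c(10.2, 11.5, 9.8, 12.0, 10.9)
final   <- c(14.1, 15.0, 13.7, 17.2, 14.4)

# Relative growth rate, as given on the slide: log(final / initial).
rgr <- log(final / initial)

cover_arcsine
rgr
```

Either transformed variable can then be used as the response in a conventional model such as lm().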

Tests • To compare a single binomial proportion to a constant, use binom.test. • To compare two samples, use prop.test. • Only use the following methods for complex models: • Regression tables • Contingency tables
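As a rough illustration of the two tests named above, using invented counts (the numbers are assumptions, not data from the lecture):

```r
# Single proportion against a constant:
# hypothetical 12 successes in 40 trials, compared with p = 0.5.
one_sample <- binom.test(x = 12, n = 40, p = 0.5)
one_sample$estimate
one_sample$p.value

# Two samples: hypothetical 12 of 40 versus 25 of 45.
two_sample <- prop.test(x = c(12, 25), n = c(40, 45))
two_sample$estimate
two_sample$p.value
```

binom.test is an exact test; prop.test uses a chi-squared approximation with a continuity correction by default.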

Count Data on Proportions • R supports the usual arcsine and probit transformations: • arcsine makes the error distribution normal • probit linearises the relationship between percentage mortality and log(dose) • However, it is usually better to use the logit transformation and assume you have binomial data.

Odds • The logistic model for p as a function of x is: p = exp(a+bx)/(1 + exp(a+bx)) • The book notes that this is obviously non-linear. To linearise it, consider instead the odds p/q (as in gambling, where q is 1-p): p/q = exp(a+bx) • Or: ln(p/q) = a + bx • ln(p/q) is called the logit transformation of p
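The algebra above can be checked numerically. In this sketch the values of a and b are arbitrary assumptions chosen for illustration; plogis() and qlogis() are base R's inverse logit and logit functions.

```r
# Hypothetical intercept and slope.
a <- -2
b <- 0.5
x <- seq(0, 8, by = 1)

eta <- a + b * x                   # linear predictor, a + bx
p   <- exp(eta) / (1 + exp(eta))   # logistic model for p
q   <- 1 - p

odds    <- p / q                   # equals exp(a + bx)
logit_p <- log(odds)               # equals a + bx

# The logit of p recovers the linear predictor.
all.equal(logit_p, eta)

# Base R equivalents of the two transformations.
all.equal(p, plogis(eta))
all.equal(logit_p, qlogis(p))
```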

R and logit • R does not simply do a linear regression of ln(p/q) against x. It also handles: • non-constant binomial variance • logit(p) going to -∞ and +∞. • differences between sample sizes using weighted regression.

Over-dispersion and Hypothesis Testing • Everything addressed earlier is still available for proportions data. This includes ANOVA, ANCOVA, and regression analysis. • Significance is assessed using χ² tests. • Hypothesis testing with binomial errors is less clear-cut than normal errors. Large samples (>30) are necessary. The degree to which the approximation is satisfactory is unknown. p will not be exactly known. • Over-dispersion must usually be addressed. The residual scaled deviance should be about the residual df. Use family = quasibinomial for over-dispersion.
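One way to carry out the over-dispersion check described above is sketched here. The data are invented and deliberately noisy, and the variable names are hypothetical.

```r
# Hypothetical data with more scatter than a binomial model expects.
x         <- c(1, 2, 3, 4, 5, 6, 7, 8)
n         <- rep(20, 8)
successes <- c(2, 12, 3, 15, 6, 18, 9, 19)
failures  <- n - successes

# Ordinary binomial fit.
model_bin <- glm(cbind(successes, failures) ~ x, family = binomial)

# Rule of thumb from the slide: residual deviance should be about the
# residual degrees of freedom, so this ratio should be close to 1.
deviance(model_bin) / df.residual(model_bin)

# Refit with quasibinomial errors, which estimates a dispersion parameter.
model_quasi <- glm(cbind(successes, failures) ~ x, family = quasibinomial)
summary(model_quasi)$dispersion

# Model comparison under quasibinomial errors uses an F test.
anova(model_quasi, test = "F")
```

The coefficient estimates are the same in both fits; the quasibinomial fit only rescales the standard errors by the estimated dispersion.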

Book Examples • See discussion of how to model with binomial errors. • Logistic regression example. • Categorical explanatory variables example. • ANCOVA example.
