Computing for Research I Spring 2011

Computing for Research ISpring 2011 R: Random number generation & Simulations March 21 Presented by: Yanqiu Weng

Outline How to sample from common distribution: Uniform distribution Binomial distribution Normal distribution Pre-specified vector Examples: • Randomization code generation • Simulation 1 (explore the relationship between power and effect size) • Simulation 2 (explore the relationship between power and sample size) • Simulation 3 (generate binary outcome using logistic regression model)

Uniform distribution PDF Mean: Variance:

[R] Uniform distribution runif(n, min=0, max=1) See R code …

[R] Uniform distribution Use UNIFORM distribution to generate BERNOULLI distribution Basic idea: 0 Bernoulli distribution Uniform distribution 1 See R code …

[R] Binomial distribution rbinom(n, size, prob) e.g. generate 10 Binomial random number with Binom(100, 0.6) n = 10 size = 100 prob = 0.6 rbinom(10, 100, 0.6) e.g. generate 100 Bernoulli random number with p=0.6 n = 100 size = 1 prob = 0.6 rbinom(100, 1, 0.6) See R code …

[R] Normal distribution dnorm(x, mean, sd) #density pnorm(q, mean, sd) #probability P(X<=q) qnorm(p, mean, sd) #quantile rnorm(n, mean, sd) #random number See R code …

[R] Normal distribution dnorm(x, mean, sd) #density e.g. plot a standard normal curve pnorm(q, mean, sd) #probability P(X<=x) e.g. calculate the p-value for a one sides test with standardized test statistic H0: X<=0 H1: X>0 Reject H0 if “Z” is very large If from the test, we got the Z value = 3.0, what’s the p-value? P-value = P(Z>=z) = 1 - P(Z<=z) 1 - pnorm(3, 0, 1)

[R] Normal distribution qnorm(p, mean, sd) #quantile See R code … rnorm(n, mean, sd) #random number See R code …

[R] Another useful command for sampling from a vector – “sample()” e.g. randomly choose two number from {2,4,6,8,10} with/without replacement sample(x, size, replace = FALSE, prob = NULL) sample(c(2,4,6,8,10), 2, replace = F) 4 8 2 6 10

[R] Another useful command for sampling from a vector – “sample()” e.g. A question from our THEORY I CLASS: “Draw a histogram of all possible average of 6 numbers selected from {1,2,7,8,14,20} with replacement” Answer: A quick way to solve this question is to do a simulation: That is: we assume we repeat selection of 6 balls with replacement from left urn for many many times, and plot their averages The R code is looked like: 14 20 8 a <- NULL for (i in 1:10000){ a[i] <- mean(sample(c(1,2,7,8,14,20),6, replace = T)) } hist(a) 1 2 7

[R] Another useful command for sampling from a vector – “sample()” e.g. Generate 1000 Bernoulli random number with P = 0.6 sample(x, size, replace = T, prob =) Answer: Let x = (0, 1), Let size = 1, Let replace = T/F, Let prob = (0.4, 0.6). Repeat 1000 times 0 1

Example 1 Generate randomization sequence Goal: randomize 100 patient to 2 treatment using following method: 1. Simple randomization (like flipping a coin) – Bernoulli distribution 0 0 1 0 0 1 0 1 0 0 …. 1 0 1 0 runif(), rbinom(), sample(). 2. Permuted block randomization AABB BABA BBAA BABA BAAB … BBAA Block size = 4 sample()

Example 2 Investigate the relationship between effect size and power – drug lowing BP Linear model: Y = b0 + b1X + e Y: Blood Pressure (response) X: treatment (1 = drug vs. 0 = placebo) e: random error = var(Y) b1 represent the effect size For instance, if previous study shows us that the BP in placebo population are distributed as N(100, 49), what is the power when real effect of new drug on BP reduction is 0, 1, 2, 3, 4 and 5 for study with sample size = 100 (50 in drug, 50 in placebo) b0 = 100 e ~ N(0, 49) Important information: Y (placebo) ~ N(100, 49)

Example 2 Investigate the relationship between effect size and power - drug lowing BP Y: Blood Pressure (response) X: treatment (1 = drug vs. 0 = placebo) e: random error = var(Y) Linear model: Y = b0 + b1X + e b0 = 100 e ~ N(0, 49) Important information: Y (placebo) ~ N(100, 49) We are try to answer: What’s the power given b1 (the real effect size of the treatment) is 0, 1, 2, 3, 4 or 5 Note: if we set alpha = 0.05, power means the probability of b1 (treatment effect) showing significant (P<0.05) in the linear model among the repeated simulations

Example 2 Investigate the relationship between effect size and power - drug lowing BP Y: Blood Pressure (response) X: treatment (1 = drug vs. 0 = placebo) e: random error = var(Y) Linear model: Y = b0 + b1X + e • Simulation steps (E.g. sample size = 50/ per group, 1000 simulations): • Generate X according to study design (50 “1”s and 50 “0”s); • Generate 100 “e” from N(0, 49); • Given b0 and b1, generate Y using Y = b0 + b1X + e; • Use 100 pairs of (Y, X) to refit a new linear model, and get the b0 and b1 and their p-value; • Repeat these steps for 1000 times. • If alpha = 0.05, for a two-sided test

Example 3 Investigate the relationship between sample size and power Linear model: Y = b0 + b1X + e We are try to answer: What’s the power give b1 = 2 and sample size = 25, 50, 75, 100, 125 per group

Example 4 Generate binary response in logistic regression Simple Logistic model: logit(p) = b0 + b1X where logit(p) = log(p/1-p) For a study, we know b0, b1, and x, we want to predict the Y, Y = 0 or 1. To do that, we can consider each Yi as a Bernoulli trial with e.g. if p = 0.7, that mean Y have 70% percent chance to be 1 and 30 percent chance to be 0

Example 4 Generate binary response in logistic regression For example: For simple Logistic model: logit(p) = b0 + b1X, we know b0 = 1 b1 = 0.5 X ~ N(0, 1) In simulation, we randomly choose x from N(0, 1), Say if x = 2, how to predict the binary response Y? First we need find its “P”, the probability of “1” in a Bernoulli distribution Using previous knowledge, we can generate a Y from a Bernoulli distribution with p = 0.88

Computing for Research I Spring 2011