Parametric Distributions
Definitions • A random variable, X, is a map from the result of an experiment or observation to the real numbers. • The cumulative distribution function of a random variable is defined through the probability measure as $F_X(z) = P(X \le z)$. • This is often written F(z).
Properties of F • F() is non-decreasing. • F(z) tends to 0 as z → −∞ and to 1 as z → +∞. • Note that F() is right continuous. • For any such F(), a random variable with that cdf can be constructed (Skorokhod representation).
pdf • Where F(x) can be written as the integral from minus infinity to x of some function f(z), that is $F(x) = \int_{-\infty}^{x} f(z)\,dz$, then f(z) is termed a density (or pdf). • Where F(x) is instead expressible as a discrete sum, the discrete function f(j) is also termed a pdf (more commonly, a probability mass function). • A pdf tells us which values of the R.V. are most likely.
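The link between a cdf and its pdf can be checked numerically. The following is a minimal sketch in R; the standard Normal and the Binomial with n = 9, q = 0.5 are used purely as convenient examples and are assumptions of this illustration.

```r
# Continuous case: the cdf is the integral of the pdf up to x0.
# Assumption: the standard Normal is used only as an example.
x0   <- 1.0
area <- integrate(dnorm, lower = -Inf, upper = x0)$value  # integral of f up to x0
cdf  <- pnorm(x0)                                         # F(x0) directly
print(c(integral = area, cdf = cdf))

# Discrete case: the cdf is a running sum of the discrete pdf.
# Assumption: Binomial with n = 9, q = 0.5 as the example.
pmf_sum  <- sum(dbinom(0:4, size = 9, prob = 0.5))
cdf_disc <- pbinom(4, size = 9, prob = 0.5)
print(c(sum = pmf_sum, cdf = cdf_disc))
```

In both cases the two printed numbers should agree up to numerical error.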
Important Note • Thus, this idea is very general. • Lots of F()s are possible. • A closed functional form for F() and f() is not required. • Exercise: • Draw some ‘possible’ cdfs. • Check that they fulfil the conditions. • A note about empirical cdfs.
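An empirical cdf is one example of a valid F() with no closed functional form. A rough sketch in R, assuming a small made-up sample (the numbers below are illustrative only), shows how the conditions can be checked:

```r
# Assumption: a small made-up sample, used only to illustrate an empirical cdf.
obs  <- c(22, 19, 25, 22, 15, 30, 21, 24)
Fhat <- ecdf(obs)          # step function: proportion of observations <= z

z    <- seq(10, 35, by = 0.5)
vals <- Fhat(z)

# Check the defining conditions on this grid.
non_decreasing <- all(diff(vals) >= 0)
starts_at_zero <- Fhat(min(obs) - 1) == 0
ends_at_one    <- Fhat(max(obs)) == 1
print(c(non_decreasing, starts_at_zero, ends_at_one))

plot(Fhat, main = "Empirical cdf (illustrative data)")
```

The plot is a right-continuous step function that jumps at each observed value.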
Example • A lecturer is thinking of doing building work in his house, but is waiting to hear about the profits from a venture he was involved in before deciding whether to proceed. • He knows he will get at least €10,000 net from the project. • Things are going well, and it is likely that the actual returns will be around €20,000. • There is an outside chance that €40,000 could be returned, but this is unlikely.
Example II • A lecturer takes about 22 minutes to cycle to work. • On a good day, and pedaling hard, he can make it in 15 minutes. The fastest he has done it is 12 minutes. • It would take 90 minutes to walk, so this is a realistic upper bound for cycling.
Example III • A (male) Senior Sophister management science student is interested in ‘meeting’ with incoming (female) JF students in ESS throughout the year. • Previous experience tells him his ‘success’ rate is about 1 in 10. • Around 50 opportunities present themselves a year. • Summarise the annual promiscuity.
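One possible way to summarise Example III is with a Binomial model. This is a sketch only: it assumes the 50 opportunities are independent and share a common success probability of 0.1, which the example does not state.

```r
# Assumption: 50 independent trials with common success probability 0.1,
# i.e. a Binomial(n = 50, q = 0.1) model.
n <- 50
q <- 0.1

mean_count <- n * q                        # expected number of successes
var_count  <- n * q * (1 - q)              # variance of the count
p_none     <- dbinom(0, n, q)              # probability of no successes in the year
central90  <- qbinom(c(0.05, 0.95), n, q)  # 5% and 95% quantiles of the count

print(c(mean = mean_count, var = var_count, p_none = p_none))
print(central90)
```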
Parametric Forms • Over the years, mathematicians have examined functions that have the properties described. • Many of these have arisen through considering combinations of other simple functions. • These functions have parameters, which can be modified to change the shape of the curve. • However, the overall functional form stays the same.
Advantages of Parametric Distributions • Properties and behaviours well understood. • Moments can readily be calculated. • Black box software available. • Can readily communicate models to colleagues. • Sufficiently flexible for most purposes. • As realistic as empirical functions and may be more physically justifiable.
Disadvantages • May not exactly match application (ease of use vs tool availability compromise.) • Results may be sensitive to distributional assumptions. • Sometimes easy to program without a full understanding of what is going on – downside of black box.
Some Models • Bernoulli - Br(x|q) - dbinom(x, 1, q) • Binomial - Bi(x|q,n) - dbinom() • Poisson - Pn(x|l) - dpois() • Beta - Be(x|a,b) - dbeta() • Uniform - Un(x|a,b) - dunif() • Gamma - Ga(x|a,b) - dgamma() • Exponential - Ex(x|q) - dexp() • Normal - N(x|m,s) - dnorm()
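Each R function listed belongs to a family of four sharing a prefix convention: d (density), p (cdf), q (quantile) and r (random draws). A short sketch, using a Poisson with an arbitrarily chosen rate of 3 as the illustration:

```r
# R's naming convention, shown for the Poisson with rate 3
# (the rate value is an arbitrary illustration).
lambda <- 3
print(dpois(2, lambda))     # d: density / mass, P(X = 2)
print(ppois(2, lambda))     # p: cdf, P(X <= 2)
print(qpois(0.5, lambda))   # q: quantile function (inverse cdf)
set.seed(1)                 # fixed seed so the draws are reproducible
print(rpois(5, lambda))     # r: random draws

# The Bernoulli is the Binomial with size 1.
bern <- dbinom(c(0, 1), size = 1, prob = 0.3)
print(bern)
```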
Binomial • Bi(x|q,n) • Pdf $f(x) = \binom{n}{x} q^{x} (1-q)^{n-x}$ • E(x) = nq • Var(x) = nq(1-q) • Graph for n=9 and q=0.5.
Binomial • Cdf • This is a step function, since X can only take integer values.
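The Binomial pdf and cdf for n = 9 and q = 0.5 can be reproduced along the following lines. This is a sketch rather than the code behind the original graphs; the plotting choices (spikes and steps) are assumptions.

```r
n <- 9
q <- 0.5
x <- 0:n

pmf <- dbinom(x, size = n, prob = q)   # f(x) = choose(n, x) q^x (1 - q)^(n - x)
cdf <- pbinom(x, size = n, prob = q)   # F(x) = P(X <= x)

# Moments computed directly from the pmf, for comparison with nq and nq(1 - q).
mean_x <- sum(x * pmf)
var_x  <- sum((x - mean_x)^2 * pmf)
print(c(mean = mean_x, var = var_x))

# Spikes for the pmf, steps for the cdf.
par(mfrow = c(1, 2))
plot(x, pmf, type = "h", main = "Binomial pmf", ylab = "f(x)")
plot(stepfun(x, c(0, cdf)), main = "Binomial cdf", ylab = "F(x)")
```

The cdf jumps only at the integers 0, 1, ..., 9 and is flat in between.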
Normal (Gaussian) • N(x|m,s) • Pdf $f(x) = c \exp\{-0.5\, s^{-2} (x-m)^{2}\}$, where $c = 1/(s\sqrt{2\pi})$ is the normalising constant • E(x) = m • Var(x) = $s^{2}$ • Graph for m=0 and s=1.0.
Normal • Cdf • This is smooth since the underlying rv is continuous. • Note that neither 0 nor 1 is reached in the plotted region.
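A short check of the Normal pdf and cdf in R follows. It assumes the plotted region is [-3, 3], which is a guess at the range of the original graph.

```r
m <- 0
s <- 1
x <- seq(-3, 3, by = 0.01)   # assumption: plotted region taken as [-3, 3]

# Density written out by hand, with c = 1 / (s * sqrt(2 * pi)).
c_norm <- 1 / (s * sqrt(2 * pi))
f_hand <- c_norm * exp(-0.5 * s^(-2) * (x - m)^2)
f_r    <- dnorm(x, mean = m, sd = s)
print(max(abs(f_hand - f_r)))      # difference should be negligible

# The cdf at the edges of the region is close to, but not equal to, 0 and 1.
edges <- pnorm(c(-3, 3), mean = m, sd = s)
print(edges)
```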
Choosing Models • Thus, for example, if one is interested in a smoothly varying quantity, such as response rate, then one might consider ‘modeling’ it using a Normal distribution. • If an ‘expert’ tells you that response rate is likely to be around 7%, but could go from 5% to 9%, neither of which is very likely, what values of parameters for a Normal model might represent this ‘belief’?
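One way to turn such a statement into parameters is sketched below. It assumes that "5% to 9%" can be read as a central 95% interval around a mean of 7%; that reading is a modelling choice, not something the expert has said, and other coverage levels would give a different standard deviation.

```r
# Assumption: "5% to 9%" is read as a central 95% interval around 7%.
m     <- 0.07
lower <- 0.05
upper <- 0.09

s <- (upper - m) / qnorm(0.975)   # sd chosen so the 97.5% quantile sits at 9%
print(s)

# Check the implied interval and the probability mass inside it.
print(qnorm(c(0.025, 0.975), mean = m, sd = s))
inside <- pnorm(upper, m, s) - pnorm(lower, m, s)
print(inside)
```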
Aside – Using ‘R’ • A handy piece of statistical software that is known to be well programmed, and is good for plots etc., is ‘R’. • This is an open source implementation of the S language, of which S-Plus is the commercial version. • Some short input on the use of ‘R’ is worthwhile.
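R • Access via web page - also on lab machines. • Command line interface.
    x <- (1:1000)/200                  # 1000 equally spaced values from 0.005 to 5
    y <- dgamma(x, 2, 2)               # Gamma density with shape 2 and rate 2 at each x
    plot(x, y, type = "l", col = 1)    # line plot in black
• Sets up a vector, x, taking sequential values between 0 and 5. • Sets up y to be the Gamma pdf evaluated at x. • Plots y as a function of x, as a line plot, in black.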
Issues • What if the ‘belief’ says that high response rates are more likely than low ones (skew)? • Can you draw a density that might match? • What if there is likely to be a response rate of around 6%, but if by chance a marketing stunt that is being run next week gets air time on radio, then the rate will be around 10%?
Exercise • Write down a pdf for • Skewed distribution • Truncated distribution • Mixture of distributions • Show (in outline) that there exists a random variable, which has as its pdf the quantity that you have written down.
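Candidate answers can be checked numerically: a function that is non-negative and integrates to 1 defines a cdf with the properties listed earlier, so a random variable with that pdf exists. The sketch below uses assumed parameter values throughout (the Gamma shape and rate, the truncation points, and the mixture weight of 0.2 are all illustrative).

```r
# Skewed: a Gamma density (assumed shape 2, rate 2), which has a long right tail.
f_skew <- function(x) dgamma(x, shape = 2, rate = 2)

# Truncated: a standard Normal restricted to [lo, hi] and renormalised.
lo <- -1; hi <- 2                      # assumed truncation points
f_trunc <- function(x) {
  ifelse(x >= lo & x <= hi, dnorm(x) / (pnorm(hi) - pnorm(lo)), 0)
}

# Mixture: response rate near 6% or, with assumed probability 0.2, near 10%.
w <- 0.2                               # assumed chance the stunt gets air time
f_mix <- function(x) (1 - w) * dnorm(x, 0.06, 0.005) + w * dnorm(x, 0.10, 0.005)

# Each candidate should integrate to 1 (up to numerical error).
area_skew  <- integrate(f_skew, 0, Inf)$value
area_trunc <- integrate(f_trunc, lo, hi)$value
area_mix   <- integrate(f_mix, 0, 0.2)$value
print(c(area_skew, area_trunc, area_mix))
```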
Gamma Distribution • Ga(x|a,b) – shape, a, and rate, b • Pdf $f(x) = c\, x^{a-1} \exp(-bx)$ for x > 0, where $c = b^{a}/\Gamma(a)$ is the normalising constant • E(x) = a/b • Var(x) = $a/b^{2}$ • Graph a=2, b=2
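The Gamma formulae can be verified numerically for a = 2, b = 2. This is a sketch using numerical integration, not a derivation:

```r
a <- 2   # shape
b <- 2   # rate

# Density by hand, with c = b^a / gamma(a), compared against dgamma().
x <- seq(0.01, 5, by = 0.01)
f_hand   <- b^a / gamma(a) * x^(a - 1) * exp(-b * x)
max_diff <- max(abs(f_hand - dgamma(x, shape = a, rate = b)))
print(max_diff)

# Moments by numerical integration, for comparison with a/b and a/b^2.
m1 <- integrate(function(x) x   * dgamma(x, shape = a, rate = b), 0, Inf)$value
m2 <- integrate(function(x) x^2 * dgamma(x, shape = a, rate = b), 0, Inf)$value
print(c(mean = m1, var = m2 - m1^2))
```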
Use in modeling • Thus, instead of fixing deterministic aspects of the model, we can allow inputs to be defined by parametric distributions. • We still need to fix the parameters of the distributions, but this may be much more realistic than fixing values. • Elicitation is the term given to the assignment of parameters based on ‘expert opinion.’
Method • Thus, we have the following method at the modeling step: • Determine a ‘realistic’ model for the situation (conditional on particular values of inputs). • Examine which inputs have the biggest impact on the output variable of main interest. • Model the uncertainty of the inputs through a probability distribution. • Examine the impact on outputs.
Practicalities • This can be done by: • Examining the moments of the combinations of random variables. • Analytically (gives exact answer, but messy). • Simulation from the distributions of interest.
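The moment-based and simulation approaches can be compared on a simple combination. The sketch below assumes an output Z = X + Y with independent Normal and Gamma inputs; the model and all parameter values are made up for illustration.

```r
set.seed(42)   # fixed seed for reproducibility
n_sim <- 100000

# Assumed inputs, chosen only for illustration.
m <- 1; s <- 0.5      # X ~ Normal(mean m, sd s)
a <- 2; b <- 2        # Y ~ Gamma(shape a, rate b)

x <- rnorm(n_sim, mean = m, sd = s)
y <- rgamma(n_sim, shape = a, rate = b)
z <- x + y

# Moments of the combination: exact for independent X and Y.
mean_exact <- m + a / b
var_exact  <- s^2 + a / b^2

print(c(exact = mean_exact, simulated = mean(z)))
print(c(exact = var_exact,  simulated = var(z)))
```

The simulated values approximate the exact ones, and the simulation also gives the whole distribution of Z (for example through quantile(z, ...)), which the moments alone do not.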
Simulation • In order to ‘simulate’ values from the distribution of interest we need a system of generating random numbers. • It suffices to be able to generate numbers from a uniform[0,1) distribution. • Prove that if this can be done, then any random variable can be simulated. • Example (Excel): NORMSINV(RAND())
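The idea behind the proof is the inverse transform: if U is uniform and Q is the quantile function (inverse cdf) of F, then P(Q(U) ≤ z) = P(U ≤ F(z)) = F(z), so Q(U) has cdf F. A sketch in R, where qnorm(runif(n)) plays the role of the spreadsheet example; the Gamma parameters are illustrative assumptions:

```r
set.seed(1)            # fixed seed for reproducibility
u <- runif(100000)     # uniform draws

z <- qnorm(u)                          # standard Normal draws via the inverse cdf
g <- qgamma(u, shape = 2, rate = 2)    # same uniforms, Gamma quantile function

print(c(mean = mean(z), sd = sd(z)))   # compare with 0 and 1
print(c(mean = mean(g), var = var(g))) # compare with a/b = 1 and a/b^2 = 0.5
```

In practice rnorm() and rgamma() would be used directly; the point here is only that uniform draws suffice.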
Exercise • Examine each of the distributions listed earlier in lectures. • For each one, you should produce a pdf and cdf for various parameters of interest. • These graphs can readily be constructed in R.
Exercise II • For the Norseman problem, examine the impact of a response rate which is unknown, but a priori believed to be Normal, with mean 6% and standard deviation 0.6%. • Additionally, you might consider the impact of Gamma distributed orders, with shape 10 and rate 12.
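A possible starting point for the simulation is sketched below. The two input distributions are those given in the exercise. The output formula and the mailing size are hypothetical placeholders, since the Norseman model itself is not reproduced here; they should be replaced by the actual model.

```r
set.seed(2024)   # fixed seed for reproducibility
n_sim <- 50000

rate       <- rnorm(n_sim, mean = 0.06, sd = 0.006)   # response rate, as in the exercise
order_size <- rgamma(n_sim, shape = 10, rate = 12)    # orders, as in the exercise

# HYPOTHETICAL output: a placeholder "mailing size x response rate x order"
# stands in for the real Norseman model.
mailing_size <- 10000                                 # hypothetical value
output <- mailing_size * rate * order_size

print(summary(rate))
print(summary(order_size))
print(quantile(output, c(0.05, 0.5, 0.95)))
```

Comparing the spread of the output when only one input is random against the spread when both are random indicates which input matters more.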