Elementary statistics for foresters Lecture 3 Socrates/Erasmus Program @ WAU Spring semester 2005/2006
Statistical distributions • Empirical distributions • Why distributions? • Variable types • Sample theoretical distributions • Normal distribution • Binomial distribution
Empirical distributions • Graphical representation of the data in the form of a frequency distribution, histogram, frequency polygon, etc.
Why distributions? • In some cases it is necessary to formulate hypotheses about the specific distribution of the investigated variable. • For example, we can think of wood density as following the normal distribution, and use this information for modeling and inferential statistics.
Why distributions? • When using distributions for predictive purposes it is often desirable to understand the shape of the underlying distribution of the population. • To determine this distribution, it is common to fit a theoretical distribution to the observed one and to compare the observed frequencies with the frequencies expected under the theoretical distribution.
Why distributions? • To estimate the parameters of the theoretical distribution, the maximum likelihood method or the method of moments is used. • Another common application of theoretical distributions is verifying the assumption of normality before using a parametric test.
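The fitting idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wood density values and the class edges are made up, and the normal distribution is assumed as the theoretical model. For the normal distribution the method of moments and maximum likelihood give the same estimates (sample mean and the standard deviation computed with divisor n).

```python
import math
import statistics

# Hypothetical wood density measurements (kg/m^3); invented for illustration only.
densities = [412, 431, 445, 450, 458, 463, 470, 474,
             481, 489, 495, 502, 510, 523, 541]

# Parameter estimates: sample mean and standard deviation with divisor n.
mu_hat = statistics.fmean(densities)
sigma_hat = statistics.pstdev(densities)

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical class edges used to group the data.
edges = [400, 440, 480, 520, 560]
n = len(densities)

observed_counts = []
expected_counts = []
for lo, hi in zip(edges, edges[1:]):
    # Observed frequency: number of values falling in [lo, hi).
    observed_counts.append(sum(lo <= d < hi for d in densities))
    # Expected frequency: n times the fitted probability of the class.
    expected_counts.append(
        n * (normal_cdf(hi, mu_hat, sigma_hat) - normal_cdf(lo, mu_hat, sigma_hat))
    )

for (lo, hi), obs, exp in zip(zip(edges, edges[1:]), observed_counts, expected_counts):
    print(f"{lo}-{hi}: observed {obs}, expected {exp:.2f}")
```

Large gaps between the observed and expected columns would suggest that the chosen theoretical distribution is a poor description of the data.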
Variable types • Variables can be qualitative (describing membership in a group or category, e.g. sex, hair color, tree species) or quantitative (measurable on a numerical scale, i.e. numeric values for which addition and averaging make sense, e.g. DBH, height, crown ratio, ...).
Variable and distribution types • If a variable can take only a finite (or countable) set of distinct values, it is a discrete variable (e.g. age in years, DBH class, ...), and it is described by a probability distribution that assigns a probability to each value. • If a variable can take any value (or any value from a given interval), it is a continuous variable (e.g. height, DBH, ...), and it is described by a probability density function.
Variable and distribution types • In many cases, due to measurement limitations or simplifications, continuous variables can be treated as discrete (e.g. when DBH is recorded rounded to the nearest 1 mm).
Sample distributions • Beta distribution is used to model the distribution of order statistics, and to represent processes with natural lower and upper limits. • Binomial distribution is used for describing binomial events, such as the number of males or females in a random sample, or the number of defective components in samples of n units taken from a production process.
Sample distributions • Chi-square distribution is most frequently used in modeling random variables representing frequencies. • Exponential distribution is frequently used to model the time interval between successive random events. • Logistic distribution is used to model binary responses.
Sample distributions • Normal distribution is a theoretical function commonly used in inferential statistics as an approximation to sampling distributions. • Poisson distribution is used to model rare events. • Weibull distribution is often used as a model of failure time or in reliability testing. • ...
Normal distribution • The most frequently used distribution in statistics • A basic assumption of many statistical methods, such as estimation, hypothesis testing, regression and correlation, analysis of variance, ...
Normal distribution • Variables whose values result from a very large number of independent random effects, each contributing only a small amount, usually follow the normal distribution at least approximately. • The normal distribution is an example of a distribution of a continuous variable. Its probability density function is: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Normal distribution • where: • x is the variable of interest • µ is the mean (expected value) of the distribution • σ is the standard deviation
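As a minimal sketch, the normal probability density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ can be evaluated directly in Python. The function name `normal_pdf` and the example values (tree height with mean 25 m and standard deviation 3 m) are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Hypothetical example: tree height with mean 25 m and standard deviation 3 m.
mu, sigma = 25.0, 3.0
for height in (19.0, 22.0, 25.0, 28.0, 31.0):
    print(f"f({height}) = {normal_pdf(height, mu, sigma):.5f}")
```

The printed values rise up to x = µ and fall afterwards, and values at equal distances on either side of µ coincide.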
Normal distribution properties: • the probability density function increases for x < µ and decreases for x > µ • the probability density function has its maximum at x = µ • the expected value of the variable X is E(X) = µ • the variance of the variable X is D²(X) = σ²
Normal distribution properties • at x = µ the probability density function has the value $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$ • the distribution has 2 inflection points (where the curve changes from convex to concave or from concave to convex) at x = µ - σ and x = µ + σ • the normal distribution is symmetric, and the axis of symmetry is the line x = µ
Normal distribution properties: • the smaller the variance (standard deviation), the narrower and taller the probability density function • the cumulative distribution function of the normal distribution is the integral of the probability density function
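The relation between the density and the cumulative distribution function can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the parameters (mean 0, standard deviation 2), the evaluation point and the helper `integrate_pdf` are assumptions chosen for illustration, and the lower integration limit of µ - 10σ stands in for minus infinity.

```python
from statistics import NormalDist

# Hypothetical distribution: mean 0, standard deviation 2.
dist = NormalDist(mu=0.0, sigma=2.0)

def integrate_pdf(dist, upper, steps=20000):
    """Trapezoidal integral of the density from mu - 10*sigma up to `upper`."""
    lower = dist.mean - 10 * dist.stdev
    h = (upper - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    for i in range(1, steps):
        total += dist.pdf(lower + i * h)
    return total * h

x = 1.5
numeric = integrate_pdf(dist, x)   # area under the density up to x
exact = dist.cdf(x)                # library value of the distribution function
print(f"numerical integral: {numeric:.6f}, cdf: {exact:.6f}")
```

The two numbers should agree to several decimal places, which is what "the distribution function is the integral of the density" means in practice.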
Standardized normal distribution • Every normal distribution can be standardized, i.e. transformed into the distribution with mean equal to 0 and standard deviation equal to 1: N(0,1). • The expected value of the standardized normal distribution equals zero (E(Z) = 0) and its variance equals 1 (D²(Z) = 1).
Standardized normal distribution • Standardization is simply a change of variable from x to z, where: $z = \frac{x - \mu}{\sigma}$ • The probability density function of the resulting distribution is: $f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}$
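A short sketch of standardization, using the transformation z = (x - µ)/σ. The numbers are hypothetical (tree height with mean 25 m and standard deviation 3 m); the point is that a probability computed for the original variable equals the probability computed for z under N(0,1).

```python
from statistics import NormalDist

# Hypothetical parameters: tree height with mean 25 m and standard deviation 3 m.
mu, sigma = 25.0, 3.0
x = 29.5

# Standardization: change of variable from x to z.
z = (x - mu) / sigma

# P(X <= x) in the original distribution and P(Z <= z) in N(0, 1).
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist(0.0, 1.0).cdf(z)

print(f"z = {z}")
print(f"P(X <= {x}) = {p_original:.4f}, P(Z <= {z}) = {p_standard:.4f}")
```

This is why a single table of the standardized normal distribution is enough for any normal variable.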
Normal distribution properties: • About 68% of all values of the variable lie between µ - σ and µ + σ • About 95% of all values lie in the interval from µ - 2σ to µ + 2σ • About 99.7% of all values lie in the interval from µ - 3σ to µ + 3σ
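These percentages can be reproduced with the error function from the Python standard library, since for a normal variable P(µ - kσ < X < µ + kσ) = erf(k/√2) regardless of µ and σ. The helper name `prob_within` is an assumption made for this sketch.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * prob_within(k):.2f}%")
```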
Binomial distribution • An example of a probability distribution of a discrete variable • Describes the probability of getting k successes in n independent trials, where the probability of success in a single trial equals p
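As a minimal sketch, the standard binomial formula P(X = k) = C(n, k) · p^k · (1 - p)^(n - k) can be computed with `math.comb`. The function name `binomial_pmf` and the example (10 planted seedlings, each assumed to survive with probability 0.5) are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: 10 seedlings, each surviving with assumed probability 0.5.
n, p = 10, 0.5
for k in range(n + 1):
    print(f"P(X = {k}) = {binomial_pmf(k, n, p):.4f}")

# The probabilities of all possible outcomes add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(f"sum = {total:.4f}")
```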
Binomial distribution properties • the graph of the distribution is symmetric for p = 0.5 • for p < 0.5 the distribution is positively skewed • for p > 0.5 the distribution is negatively skewed
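The direction of skewness can be checked by computing the skewness coefficient (third central moment divided by the cube of the standard deviation) directly from the binomial probabilities. This is an illustrative sketch; n = 10 and the three values of p are arbitrary choices, and the helper names are assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_skewness(n, p):
    """Skewness coefficient computed from the probabilities themselves."""
    ks = range(n + 1)
    mean = sum(k * binomial_pmf(k, n, p) for k in ks)
    variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in ks)
    third_moment = sum((k - mean) ** 3 * binomial_pmf(k, n, p) for k in ks)
    return third_moment / variance ** 1.5

n = 10
for p in (0.2, 0.5, 0.8):
    print(f"p = {p}: skewness = {binomial_skewness(n, p):+.3f}")
```

A positive value indicates a longer right tail, a negative value a longer left tail, and a value of essentially zero indicates symmetry.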
Binomial distribution properties • Expected value E(X) = n p • Variance D²(X) = n p q, where q = 1 - p • Standard deviation $\sigma = \sqrt{n p q}$ • Sample exercises using the binomial distribution
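A possible worked exercise, sketched in Python: the formulas E(X) = np and D²(X) = npq (with q = 1 - p) are compared with the mean and variance computed directly from the probabilities. The scenario (20 inspected trees, each damaged with an assumed probability of 0.3) is invented for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical exercise: 20 trees inspected, each damaged with assumed probability 0.3.
n, p = 20, 0.3
q = 1 - p

# Values from the closed-form formulas.
mean_formula = n * p
variance_formula = n * p * q
sd_formula = math.sqrt(variance_formula)

# The same quantities computed directly from the distribution.
mean_direct = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance_direct = sum((k - mean_direct) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(f"E(X): formula {mean_formula:.3f}, direct {mean_direct:.3f}")
print(f"D2(X): formula {variance_formula:.3f}, direct {variance_direct:.3f}")
print(f"standard deviation: {sd_formula:.3f}")
```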