AAEC 4302 ADVANCED STATISTICAL METHODS IN AGRICULTURAL RESEARCH

Descriptive Statistics: Chapter 3

Univariate Statistics of Central Tendency • Univariate statistics focus on a single variable, X, which has n available observations; for example, deer weight in the biological data set • Measures of central tendency attempt to measure the typical value taken by a given variable

Univariate Statistics of Central Tendency • There are three alternative statistics (i.e. formulas) to measure the central tendency of a variable: • The Mean • The Median • The Mode

Univariate Statistics of Central Tendency • The mean (or average) is the most common and useful measure of central tendency • Mean of X: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ • To calculate the mean, all of the observations (values) of X are added and the result is divided by the number of observations (mean deer weight = 61.77 Kg)
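A minimal Python sketch of the mean formula. The five weights below are hypothetical illustration values, not the deer data set, so the result does not reproduce the 61.77 Kg reported above.

```python
# Hypothetical deer weights in Kg (illustration only; NOT the course data set)
weights = [52, 58, 64, 66, 70]

def mean(x):
    """Arithmetic mean: add all n observations, then divide by n."""
    return sum(x) / len(x)

print(mean(weights))   # 62.0
```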

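Univariate Statistics of Central Tendency • Proof: The sum of the deviations of the observations from the mean is always equal to zero: $\sum_{i=1}^{n}(X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0$, because $\sum_{i=1}^{n} X_i = n\bar{X}$ by the definition of the mean.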
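The zero-sum property can be checked numerically. This sketch uses the same hypothetical weights as before and Python's `fractions.Fraction`, so the arithmetic is exact; with ordinary floats the sum may come out as a tiny rounding residue rather than exactly 0.

```python
from fractions import Fraction

# Hypothetical observations in Kg (illustration only), stored as exact rationals
x = [Fraction(v) for v in (52, 58, 64, 66, 70)]

x_bar = sum(x) / len(x)                 # exact mean (62)
deviations = [xi - x_bar for xi in x]   # X_i - X_bar for every observation
print(sum(deviations))                  # 0: positive and negative deviations cancel
```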

Univariate Statistics of Central Tendency • The median value of X ($X_{med}$) is simply the value taken by the middle observation on X after the observations have been ordered from smallest to largest. • If there is an odd number of observations, the median is unambiguous • If there is an even number of observations, there is no single middle observation • In the latter case, by convention, the median is calculated by averaging the values of the two middle observations on X (median deer weight = (64+64)/2 = 64 Kg)
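The ordering-plus-convention rule can be sketched as follows. The function name `median` and the sample values are hypothetical; the even-n example is chosen so its two middle values are both 64, mirroring the (64+64)/2 calculation above.

```python
def median(x):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    s = sorted(x)          # order the observations from smallest to largest
    n = len(s)
    mid = n // 2
    if n % 2 == 1:         # odd n: a single, unambiguous middle observation
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # even n: average the two middle observations

print(median([70, 52, 66, 58, 64]))       # 64   (odd n)
print(median([70, 52, 64, 66, 58, 64]))   # 64.0 (even n)
```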

Univariate Statistics of Central Tendency • The mode is the most frequently occurring value of X, which may not be unique • In the deer data set, the mode of X (weight) is 66 Kg
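Because the mode need not be unique, a sketch that returns every most-frequent value is safer than one that returns a single number. The helper `modes` and the sample lists are hypothetical; it relies on the standard-library `collections.Counter`.

```python
from collections import Counter

def modes(x):
    """Return all values tied for the highest frequency (the mode may not be unique)."""
    counts = Counter(x)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([52, 66, 64, 66, 70]))   # [66]      a single mode
print(modes([52, 52, 66, 66, 70]))   # [52, 66]  two modes (bimodal)
```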

Univariate Statistics of Central Tendency • In statistics, the mean is the most common measure of the central tendency or typical value taken by a given variable, while the median and the mode are used far less often. • However, the median can sometimes describe the typical value of X better than the mean, because the mean is very sensitive to extreme values of X.

Univariate Statistics of Central Tendency • For example, if the 15 smallest deer weights are ignored, the mean increases markedly, from 61.77 Kg to 64.0 Kg, while the median only goes from 64 Kg to 65 Kg • The mode may be a useful statistic for a discrete variable, but not for a continuous variable, because each observed value is likely to be unique
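The same effect can be reproduced on a small hypothetical sample containing one unusually light animal (these are not the deer data). Dropping that single extreme value raises the mean by about 7.3 Kg but the median by only 1 Kg. The sketch uses Python's standard `statistics.mean` and `statistics.median`.

```python
from statistics import mean, median

# Hypothetical weights in Kg with one extreme low value (illustration only)
weights = [20, 60, 62, 64, 66, 68]
trimmed = sorted(weights)[1:]          # ignore the smallest observation

print(round(mean(weights), 2), median(weights))   # with the extreme value
print(round(mean(trimmed), 2), median(trimmed))   # without it
```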

Univariate Statistics of Dispersion (p. 45) • A measure of dispersion is a statistic (formula) that indicates how spread out (i.e., dispersed) the values of a given variable are • The range is a measure of dispersion given by the difference between the greatest and the smallest value of X in the n observations available • For example, in the Deer Data Set, the range is 61 Kg, the difference between the maximum weight of 93 Kg and the minimum weight of 32 Kg.
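A one-line sketch of the range. Only the extremes (93 Kg and 32 Kg) come from the slide; the middle values are hypothetical filler, and the name `value_range` is chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(x):
    """Range: greatest observation minus smallest observation."""
    return max(x) - min(x)

print(value_range([32, 58, 64, 66, 93]))   # 61
```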

Univariate Statistics: Dispersion • As demonstrated before, the mean or average deviation of X from its mean is always zero (the positive and negative deviations cancel out in the summation), which makes it a useless measure of dispersion.

Univariate Statistics: Dispersion • The mean absolute deviation (MAD), calculated by $MAD = \frac{1}{n}\sum_{i=1}^{n} \lvert X_i - \bar{X} \rvert$, solves the “canceling out” problem.

Univariate Statistics: Dispersion • MAD in deer weight = 9.00 Kg; the largest deviation above the mean is 93 Kg - 61.77 Kg = 31.23 Kg, and the largest deviation below the mean is 32 Kg - 61.77 Kg = -29.77 Kg (an absolute deviation of 29.77 Kg) • The MAD has an intuitive appeal since it represents the “typical deviation without regard to sign”
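A minimal sketch of the MAD formula on the hypothetical five-weight sample (mean 62 Kg, deviations -10, -4, 2, 4, 8). Taking absolute values prevents the cancellation seen in the raw deviations. Note that some software packages use the abbreviation MAD for the *median* absolute deviation instead, so it is worth checking definitions before comparing outputs.

```python
def mean_absolute_deviation(x):
    """MAD: average of |X_i - X_bar|, so positive and negative deviations cannot cancel."""
    x_bar = sum(x) / len(x)
    return sum(abs(xi - x_bar) for xi in x) / len(x)

print(mean_absolute_deviation([52, 58, 64, 66, 70]))   # 5.6
```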

Univariate Statistics: Dispersion • An alternative way to address the canceling-out problem is to square the deviations from the mean, which gives the mean squared deviation (MSD): $MSD = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ • MSD in deer weight = 143.54 Kg²
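A sketch of the MSD on the same hypothetical sample. The squared deviations are 100, 16, 4, 16 and 64 (all in Kg²), so their average is 40 Kg².

```python
def mean_squared_deviation(x):
    """MSD: average of the squared deviations (X_i - X_bar)**2; units are squared (e.g. Kg**2)."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x) / len(x)

print(mean_squared_deviation([52, 58, 64, 66, 70]))   # 40.0
```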

Univariate Statistics: Dispersion • Squaring expresses the MSD in squared units (Kg² for deer weight), which are hard to interpret; this problem is solved by taking the square root of the MSD to obtain the root mean squared deviation (RMSD): $RMSD = \sqrt{MSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2} = 11.98$ Kg • When calculating the RMSD, squaring the deviations gives greater importance to the deviations that are larger in absolute value, which may or may not be desirable
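To illustrate the extra weight that squaring gives to large deviations, this sketch compares two hypothetical samples with the same mean (60 Kg) and the same MAD (2 Kg). In `even` every deviation is ±2; in `concentrated` the deviations are -4, 0, 0, 4. The MAD cannot tell them apart, but the RMSD is larger for the second sample.

```python
import math

def mad(x):
    """Mean absolute deviation."""
    x_bar = sum(x) / len(x)
    return sum(abs(xi - x_bar) for xi in x) / len(x)

def rmsd(x):
    """Root mean squared deviation: square root of the MSD, back in the original units."""
    x_bar = sum(x) / len(x)
    return math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / len(x))

even = [58, 58, 62, 62]          # hypothetical: deviations -2, -2, 2, 2
concentrated = [56, 60, 60, 64]  # hypothetical: deviations -4, 0, 0, 4

print(mad(even), mad(concentrated))     # 2.0 2.0
print(rmsd(even), rmsd(concentrated))   # 2.0 and about 2.83
```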

Univariate Statistics: Dispersion • For statistical reasons, it turns out that a slight variation of the RMSD, known as the standard deviation (S or $S_X$), is more desirable as a measure of dispersion: $S_X = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2} = 12.01$ Kg (3.6)
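The only change from the RMSD is the divisor n - 1. This sketch (helper name `sample_sd` is hypothetical) checks it against Python's standard `statistics.stdev`, which also divides by n - 1, and `statistics.pstdev`, which divides by n and therefore equals the RMSD. For the hypothetical sample, S = √50 ≈ 7.07 Kg versus RMSD = √40 ≈ 6.32 Kg.

```python
import math
import statistics

def sample_sd(x):
    """Standard deviation S_X: like the RMSD, but the sum of squares is divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    return math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

data = [52, 58, 64, 66, 70]       # hypothetical weights in Kg
print(sample_sd(data))            # divides by n - 1 = 4
print(statistics.stdev(data))     # standard-library sample SD (n - 1)
print(statistics.pstdev(data))    # divides by n, i.e. the RMSD
```

`statistics.stdev` needs at least two observations, which is consistent with n - 1 having to be positive.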

Univariate Statistics of Dispersion (p. 46) • n - 1 is known as the degrees of freedom in calculating $S_X$: intuitively, once $\bar{X}$ is known, only n - 1 observation values are free to vary; the remaining one is predetermined by $\bar{X}$ (because the deviations from the mean must sum to zero) • When a sample of data is taken to learn about the population from which it is drawn, $S_X$ is often the best estimate of the degree of dispersion of the data in the population
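The "one value is predetermined" intuition can be made concrete: given $\bar{X}$ and any n - 1 observations, the last one must equal $n\bar{X}$ minus the sum of the others. The helper `last_value_given_mean` and the numbers are hypothetical, chosen to match the earlier five-weight sample.

```python
def last_value_given_mean(x_bar, others):
    """Given the mean and n - 1 observations, the remaining one is fixed: X_n = n*X_bar - sum(others)."""
    n = len(others) + 1
    return n * x_bar - sum(others)

print(last_value_given_mean(62, [52, 58, 64, 66]))   # 70
```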
