Types of variables

• Continuous
  • can take on any value within a range (height, yield, etc.)
  • measurements are approximate
  • often normally distributed
• Discrete
  • only certain values are possible (e.g., counts, scores)
  • not normally distributed, but means may be
• Categorical
  • qualitative; no natural order
  • often called classification variables
  • generally interested in frequencies of individuals in each class
  • binomial and multinomial distributions are common

Rounding and Reporting Numbers
To reduce measurement error:
• Standardize the way that you collect data and try to be as consistent as possible
• Actual measurements are better than subjective readings
• Minimize the necessity to recopy original data
• Avoid “rekeying” data for electronic data processing
  • Most software has ways of “importing” data files so that you don’t have to manually enter the data again
• When collecting data, examine out-of-line figures immediately and recheck

Significant Digits
• Round means to the decimal place corresponding to 1/10th of the standard error (ASA recommendation)
• Take measurements to the same or a greater level of precision
• Maintain precision in calculations
If the standard error of a mean is 6.96 grams, then 6.96/10 = 0.696 → round means to the nearest 1/10th gram; for example, 74.263 → 74.3
But if the standard error of a mean is 25.6 grams, then 25.6/10 = 2.56 → round means to the closest gram; for example, 74.263 → 74
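The rounding rule above can be sketched in Python. This is only an illustration: the function name `round_mean` is hypothetical, and it assumes that "the decimal place corresponding to 1/10th of the standard error" means the position of the leading digit of SE/10, which is consistent with both worked examples.

```python
import math

def round_mean(mean, std_error):
    """Round a mean to the decimal place of the leading digit of SE/10."""
    tenth = std_error / 10.0
    # Leading-digit position of SE/10: 0.696 -> -1 (tenths), 2.56 -> 0 (units)
    exponent = math.floor(math.log10(tenth))
    return round(mean, -exponent)

print(round_mean(74.263, 6.96))   # 74.3
print(round_mean(74.263, 25.6))   # 74.0
```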

Rounding in ANOVA
• In doing an ANOVA, it is best to carry the full number of figures obtained from the uncorrected sum of squares
  • If, for example, the original data contain one decimal, the sum of squares will contain two places: 2.2 × 2.2 = 4.84
• Do not round closer than this until reporting final results
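A small sketch of this point, using Python's `decimal` module so that the decimal places are tracked exactly. The three data values are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# Hypothetical observations recorded to one decimal place
data = [Decimal("2.2"), Decimal("3.1"), Decimal("4.7")]

# Uncorrected sum of squares: each square carries two decimal places
uncorrected_ss = sum(y * y for y in data)
print(uncorrected_ss)   # 36.54
```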

Experimental Design
• An Experimental Design is a plan for the assignment of the treatments to the plots in the experiment
• Designs differ primarily in the way the plots are grouped before the treatments are applied
  • How much restriction is imposed on the random assignment of treatments to the plots

Why do I need a design?
• To provide an estimate of experimental error
• To increase precision (blocking)
• To provide information needed to perform tests of significance and construct interval estimates
• To facilitate the application of treatments, particularly cultural operations

Factors to be Considered
• Physical and topographic features
• Soil variability
• Number and nature of treatments
• Experimental material (crop, animal, pathogen, etc.)
• Duration of the experiment
• Machinery to be used
• Size of the difference to be detected
• Significance level to be used
• Experimental resources
• Cost (money, time, personnel)

Cardinal Rule: • Choose the simplest experimental design that will give the required precision within the limits of the available resources

Completely Randomized Design (CRD)
• Simplest and least restrictive
• Every plot is equally likely to be assigned to any treatment
Example layout (12 plots; treatments A to D, 3 replications each): A B D A C D B C B A D C

Advantages of a CRD
• Flexibility
  • Any number of treatments and any number of replications
  • Don’t have to have the same number of replications per treatment (but more efficient if you do)
• Simple statistical analysis
  • Even if you have unequal replication
• Missing plots do not complicate the analysis
• Maximum error degrees of freedom

Disadvantage of CRD
• Low precision if the plots are not uniform

Uses for the CRD
• If the experimental site is relatively uniform
• If a large fraction of the plots may not respond or may be lost
• If the number of plots is limited

Design Construction
• No restriction on the assignment of treatments to the plots
• Each treatment is equally likely to be assigned to any plot
• Should use some sort of mechanical procedure to prevent personal bias
• Assignment of random numbers may be by:
  • lot (draw a number)
  • computer assignment
  • using a random number table

Random Assignment by Lot
• We have an experiment to test three varieties: the top line from each of Oregon, Washington, and Idaho, to find which grows best in our area (t=3, r=4)
[Figure: plots numbered 1 to 12, with four plots (12, 6, 5, 1) marked A.]

Random Assignment by Computer (Excel)
• In Excel, type 1 in cell A1 and 2 in A2. Select cells A1 and A2, then use the ‘fill handle’ to drag down through A12 (or through the total number of plots in your experiment).
• In cell B1, type =RAND(); copy cell B1 and paste to cells B2 through B12 (or Bn).
• Select cells B1 through B12 (or Bn) and Copy; from the Edit menu choose Paste Special and select Values (otherwise the random numbers will continue to change).

Random numbers in Excel (cont’d.)
• Sort columns A and B (A1..B12) by column B
• Assign the first treatment to the first r (4) cells in column C, the second treatment to the second r (4) cells, etc.
• Re-sort columns A, B, and C (A1..C12) by column A if desired
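The same procedure can be sketched in Python for readers who prefer a script to a spreadsheet. This mirrors the Excel steps rather than reproducing them exactly; the function name `randomize_crd` and the seed value are hypothetical.

```python
import random

def randomize_crd(treatments, reps, seed=None):
    """Return {plot number: treatment} for a completely randomized design."""
    rng = random.Random(seed)
    n_plots = len(treatments) * reps
    # Columns A and B: each plot number paired with a random number
    keyed = [(rng.random(), plot) for plot in range(1, n_plots + 1)]
    keyed.sort()  # sort by the random number (column B)
    layout = {}
    for position, (_, plot) in enumerate(keyed):
        # Column C: the first `reps` rows get the first treatment, and so on
        layout[plot] = treatments[position // reps]
    return dict(sorted(layout.items()))  # re-sort by plot number (column A)

# Three varieties, four replications (t=3, r=4)
layout = randomize_crd(["A", "B", "C"], reps=4, seed=2024)
for plot, treatment in layout.items():
    print(plot, treatment)
```

Fixing the seed plays the role of "Paste Special, Values": the assignment no longer changes each time the script is run.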

The Statistical Analysis
• Partitions the total variation in the data into components associated with sources of variation
  • For a Completely Randomized Design (CRD): Treatments, Error
  • For a Randomized Complete Block Design (RBD): Treatments, Blocks, Error
• Provides an estimate of experimental error ($s^2$)
  • Used to construct interval estimates and significance tests
• Provides a way to test the significance of variance sources

Analysis of Variance (ANOVA) Assumptions
• The error terms are randomly, independently, and normally distributed, with a mean of zero and a common variance
• The main effects are additive
Linear additive model for a Completely Randomized Design (CRD):
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$
where $Y_{ij}$ is the observation, $\mu$ is the mean, $\tau_i$ is the treatment effect, and $\varepsilon_{ij}$ is the random error
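To make the additive model concrete, the sketch below generates observations as mean plus treatment effect plus normally distributed random error. The function name `simulate_crd` and all numeric values are hypothetical, chosen only for illustration.

```python
import random

def simulate_crd(mu, effects, reps, sigma, seed=None):
    """Generate observation = mean + treatment effect + random error."""
    rng = random.Random(seed)
    data = {}
    for i, tau in enumerate(effects, start=1):
        # Errors are independent draws from N(0, sigma^2), common to all treatments
        data[i] = [mu + tau + rng.gauss(0.0, sigma) for _ in range(reps)]
    return data

# Hypothetical mean, treatment effects, and error standard deviation
sample = simulate_crd(mu=50.0, effects=[-5.0, 0.0, 5.0], reps=4, sigma=3.0, seed=42)
for treatment, values in sample.items():
    print(treatment, [round(v, 1) for v in values])
```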

The CRD Analysis
We can:
• Estimate the treatment means
• Estimate the standard error of a treatment mean
• Test the significance of differences among the treatment means

$\sum_i \sum_j Y_{ij} = Y_{..}$ What?
• i represents the treatment number (varies from 1 to t=3)
• j represents the replication number (varies from 1 to r=4)
• $\sum$ is the symbol for summation

Treatment (i)   Replication (j)   Observation (Yij)
1               1                 47.9
1               2                 50.6
1               3                 43.5
1               4                 42.6
2               1                 62.8
2               2                 50.9
2               3                 61.8
2               4                 49.1
3               1                 66.4
3               2                 60.6
3               3                 64.0
3               4                 64.0
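As a quick check on the notation, the double sum over the observations listed above can be computed directly. This is a minimal sketch; the variable names are illustrative.

```python
# Observations listed above, keyed by treatment i; list order is replication j
Y = {
    1: [47.9, 50.6, 43.5, 42.6],
    2: [62.8, 50.9, 61.8, 49.1],
    3: [66.4, 60.6, 64.0, 64.0],
}

# Inner sum over j gives each treatment total; outer sum over i gives Y..
treatment_totals = {i: sum(obs) for i, obs in Y.items()}
grand_total = sum(treatment_totals.values())

for i, total in treatment_totals.items():
    print(f"Treatment {i} total = {total:.1f}")
print(f"Y.. = {grand_total:.1f}")   # 664.2
```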

The CRD Analysis - How To:
• Set up a table of observations and compute the treatment means and deviations
  • grand mean: $\bar{Y}_{..}$
  • mean of the i-th treatment: $\bar{Y}_{i.}$
  • deviation of the i-th treatment mean from the grand mean: $\bar{Y}_{i.} - \bar{Y}_{..}$

The CRD Analysis, cont’d.
• Separate sources of variation
  • Variation between treatments
  • Variation within treatments (error)
• Compute degrees of freedom (df)
  • 1 less than the number of observations
  • total df = N-1
  • treatment df = t-1
  • error df = N-t, or t(r-1) if each treatment has the same r

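Skeleton ANOVA for CRD

Source       df     SS      MS                 F
Total        N-1    SSTot
Treatments   t-1    SST     MST = SST/(t-1)    FT = MST/MSE
Error        N-t    SSE     MSE = SSE/(N-t)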

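The CRD Analysis, cont’d.
• Compute Sums of Squares
  • Total: SSTot = $\sum_i \sum_j (Y_{ij} - \bar{Y}_{..})^2$
  • Treatment: SST = $\sum_i r_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$
  • Error: SSE = SSTot - SST
  • Here $\bar{Y}_{..}$ is the grand mean, $\bar{Y}_{i.}$ is the mean of the i-th treatment, and $r_i$ is its number of replications
• Compute Mean Squares
  • Treatment: MST = SST / (t-1)
  • Error: MSE = SSE / (N-t)
• Calculate F statistic for treatments
  • FT = MST/MSE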
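The computations on this slide can be collected into one short function. This is a sketch under the definitions above, not a replacement for a statistics package; the name `crd_anova` and the dictionary keys are hypothetical. It is applied here to the three-treatment, four-replication observations tabulated earlier.

```python
def crd_anova(groups):
    """One-way ANOVA for a CRD; `groups` holds one list of observations per treatment."""
    all_obs = [y for group in groups for y in group]
    n_total, t = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total

    # Total SS: squared deviations of every observation from the grand mean
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    # Treatment SS: replications times squared deviation of each treatment mean
    ss_treat = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_error = ss_total - ss_treat  # SSE = SSTot - SST

    df_treat, df_error = t - 1, n_total - t
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error
    return {
        "df": (n_total - 1, df_treat, df_error),  # total, treatment, error
        "SSTot": ss_total, "SST": ss_treat, "SSE": ss_error,
        "MST": ms_treat, "MSE": ms_error, "F": ms_treat / ms_error,
    }

varieties = [
    [47.9, 50.6, 43.5, 42.6],
    [62.8, 50.9, 61.8, 49.1],
    [66.4, 60.6, 64.0, 64.0],
]
result = crd_anova(varieties)
for name, value in result.items():
    print(name, value)
```

Because the treatment sum of squares weights each squared deviation by that treatment's own number of replications, the same function handles unequal replication.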

Using the ANOVA
• Use FT to judge whether treatment means differ significantly
  • If FT is greater than F in the table, then differences are significant
• MSE = $s^2$, the sample estimate of the experimental error
  • Used to compute standard errors and interval estimates
  • Standard Error of a treatment mean: $s_{\bar{Y}} = \sqrt{MSE / r}$
  • Standard Error of the difference between two means: $s_{\bar{d}} = \sqrt{MSE \, (1/r_1 + 1/r_2)}$

Numerical Example
• A set of on-farm demonstration plots was located throughout an agricultural district. A single plot was located within a lentil field on each of 20 farms in the district.
• Each plot was fertilized and treated to control weevils and weeds.
• A portion of each plot was harvested for yield and the farms were classified by soil type.
• A CRD analysis was used to see if there were yield differences due to soil type.

Table of Observations, Means, and Deviations

Soil type   1        2        3         4        5        Overall
            42.2     28.4     18.8      41.5     33.0
            34.9     28.0     19.5      36.3     26.0
            29.7     22.8     13.1      31.7     30.6
                     18.5     10.1      31.0
                     19.4               28.2
Mean        35.600   23.420   15.375    33.740   29.867   27.185
ri          3        5        4         5        3        20
Dev         8.415    -3.765   -11.810   6.555    2.682
Dev²        70.812   14.175   139.476   42.968   7.191

ANOVA Table

Source      df   SS           MS         F
Total       19   1,439.2055
Soil Type   4    1,077.6313   269.4078   11.18**
Error       15   361.5742     24.1049

Fcritical(α=0.05; 4,15 df) = 3.06
** Significant at the 1% level
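The ANOVA table can be reproduced from the raw yields. The sketch below uses the computing formula (uncorrected sum of squares minus a correction term), which is algebraically equivalent to summing squared deviations; the slides do not say which form was used, so that choice is an assumption. The critical F value is taken from the slide rather than computed.

```python
# Lentil yields by soil type, from the table of observations
yields = {
    1: [42.2, 34.9, 29.7],
    2: [28.4, 28.0, 22.8, 18.5, 19.4],
    3: [18.8, 19.5, 13.1, 10.1],
    4: [41.5, 36.3, 31.7, 31.0, 28.2],
    5: [33.0, 26.0, 30.6],
}
F_CRITICAL = 3.06  # from the slide: alpha = 0.05 with 4 and 15 df

n_total = sum(len(v) for v in yields.values())
grand_total = sum(sum(v) for v in yields.values())
correction = grand_total ** 2 / n_total  # (grand total)^2 / N

# Carry full precision in the uncorrected sum of squares
uncorrected_ss = sum(y * y for v in yields.values() for y in v)
ss_total = uncorrected_ss - correction
ss_soil = sum(sum(v) ** 2 / len(v) for v in yields.values()) - correction
ss_error = ss_total - ss_soil

df_soil = len(yields) - 1
df_error = n_total - len(yields)
ms_soil = ss_soil / df_soil
ms_error = ss_error / df_error
f_stat = ms_soil / ms_error

print(f"Total     {n_total - 1:3d} {ss_total:12.4f}")
print(f"Soil Type {df_soil:3d} {ss_soil:12.4f} {ms_soil:10.4f} {f_stat:6.2f}")
print(f"Error     {df_error:3d} {ss_error:12.4f} {ms_error:10.4f}")
print("Significant at 5%:", f_stat > F_CRITICAL)
```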

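Formulae and Computations
• Coefficient of Variation: $CV = (\sqrt{MSE} / \bar{Y}_{..}) \times 100 = (\sqrt{24.1049} / 27.185) \times 100 = 18.1\%$
• Standard Error of a Mean: $s_{\bar{Y}} = \sqrt{MSE / r}$; for soil type 4, $\sqrt{24.1049 / 5} = 2.20$
• Confidence Interval Estimate of a Mean (soil type 4): $\bar{Y}_{i.} \pm t_{0.05,\,15\,df} \, s_{\bar{Y}} = 33.74 \pm (2.131)(2.20) = 33.74 \pm 4.69$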

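Formulae for Mean Comparisons
• Standard Error of the Difference between Two Means (for soils 1 and 2): $s_{\bar{d}} = \sqrt{MSE \, (1/r_1 + 1/r_2)} = \sqrt{24.1049 \, (1/3 + 1/5)}$
• Test statistic with N-t df: $t = (\bar{Y}_{1.} - \bar{Y}_{2.}) / s_{\bar{d}}$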

Mean Yields and Standard Errors

Soil Type        1       2       3       4       5
Mean Yield       35.60   23.42   15.38   33.74   29.87
Replications     3       5       4       5       3
Standard error   2.83    2.20    2.45    2.20    2.83

CV = 18.1%
95% confidence interval estimate for soil type 4 = 33.74 ± 4.69
Standard error of difference between 1 and 2 = 3.58
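These summary statistics follow from the error mean square and the replication counts. The sketch below recomputes them; the t value of 2.131 (two-sided 5% level, 15 df) is a standard tabulated value that does not appear on the slide, so treat it as an assumption.

```python
import math

MSE = 24.1049        # error mean square from the ANOVA table
GRAND_MEAN = 27.185  # overall mean yield
T_CRIT = 2.131       # assumed tabulated t (two-sided 5%, 15 df); not on the slide
reps = {1: 3, 2: 5, 3: 4, 4: 5, 5: 3}
means = {1: 35.60, 2: 23.42, 3: 15.38, 4: 33.74, 5: 29.87}

cv = math.sqrt(MSE) / GRAND_MEAN * 100
se_mean = {soil: math.sqrt(MSE / r) for soil, r in reps.items()}
half_width = T_CRIT * se_mean[4]  # 95% CI half-width for soil type 4
se_diff = math.sqrt(MSE * (1 / reps[1] + 1 / reps[2]))
t_stat = (means[1] - means[2]) / se_diff  # compare with t at N-t = 15 df

print(f"CV = {cv:.1f}%")
for soil, se in se_mean.items():
    print(f"Soil {soil}: mean {means[soil]:.2f}, SE {se:.2f}")
print(f"Soil 4 interval: {means[4]:.2f} +/- {half_width:.2f}")
print(f"SE of difference (1 vs 2) = {se_diff:.3f}, t = {t_stat:.2f}")
```

Working from unrounded standard errors gives an interval half-width near 4.68; multiplying the rounded standard error 2.20 by 2.131 gives the 4.69 shown above. Small differences in the last digit elsewhere are likewise a matter of when rounding is applied.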

Report of Analysis
• Analysis of yield data indicates highly significant differences in yield among the five soil types
• Soil type 1 produces the highest yield of lentil seed, though not significantly different from type 4
• Soil type 3 is clearly inferior to the others
