This study aims to determine the optimum sample size for stratified random sampling in a pond, where samples are collected to measure pollution levels. Three approaches are considered: pre-specified margin of error, pre-specified fixed cost, and correlation structure among the strata. The Bayesian methodology is used to estimate the sample sizes. The results show that the Bayesian analysis leads to a reduction in required samples and does not significantly impact the accuracy of the estimates.
SAMPLE SIZE REQUIREMENTS FOR STRATIFIED RANDOM SAMPLING OF AGRICULTURAL RUN OFF POLLUTANTS IN POND WATER WITH COST CONSIDERATIONS USING A BAYESIAN METHODOLOGY
A.A. Bartolucci, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama 35294-0022 USA
S. Bae and K.P. Singh, Department of Biostatistics, School of Public Health, University of North Texas Health Science Center at Fort Worth, Fort Worth, Texas 76107-2699 USA
GOAL: USING A BAYESIAN APPROACH, WE WISH TO DETERMINE THE OPTIMUM TOTAL SAMPLE SIZE, $n$, AND THE SAMPLE SIZE, $n_h$, FOR SAMPLING WITHIN STRATUM $h$, WHERE $h = 1, 2, \ldots, L$ AND $n = n_1 + n_2 + \cdots + n_L$. THE STRATA ARE DEPTH LEVELS IN A POND. SAMPLING IS TO DETERMINE THE AMOUNT OF POLLUTION IN THE POND.
Three Approaches
1. Pre-specified margin of error (PMOE)
2. Pre-specified fixed cost
3. Correlation structure among the strata
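TRADITIONAL SETUP
$N$ = total number of population units in the target population. For $L$ strata, $h = 1, \ldots, L$, $N = \sum_{h=1}^{L} N_h$, where $N_h$ is the number of population units in stratum $h$. $n$ = total number of sampling units in the target sample: $n = n_1 + n_2 + \cdots + n_L = \sum_{h=1}^{L} n_h$.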
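Weight of stratum $h$: $W_h = N_h / N$. The mean, $\mu$, of the population of $N$ units: $\mu = \sum_{h=1}^{L} W_h \mu_h$, where $\mu_h$ is the mean of stratum $h$. Estimate $\mu_h$ by $m_h = \frac{1}{n_h} \sum_{i=1}^{n_h} x_{hi}$, where $x_{hi}$ is the $i$th observation in stratum $h$.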
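An unbiased estimator of $\mu$ is $m_{st} = \sum_{h=1}^{L} W_h m_h$. Let $N_h / N = n_h / n$ in all strata (proportional allocation); then $m_{st} = \frac{1}{n} \sum_{h=1}^{L} n_h m_h$, the ordinary sample mean.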
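Variance: $\mathrm{Var}(m_{st}) = \sum_{h=1}^{L} W_h^2 \frac{\sigma_h^2}{n_h} \left(1 - \frac{n_h}{N_h}\right)$. Estimate the stratum variance, $\sigma_h^2$, by $s_h^2 = \frac{1}{n_h - 1} \sum_{i=1}^{n_h} (x_{hi} - m_h)^2$. It can be shown that for large $N$, $s^2(m_{st}) \approx \sum_{h=1}^{L} W_h^2 s_h^2 / n_h$.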
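As a minimal sketch of the traditional setup, the stratified mean and its large-N variance estimate can be computed as follows. It assumes the standard forms $m_{st} = \sum_h W_h m_h$ and $s^2(m_{st}) = \sum_h W_h^2 s_h^2 / n_h$ (no finite population correction). The function name `stratified_estimate` and the two-stratum data are hypothetical and are not taken from the pond study.

```python
from statistics import mean, variance


def stratified_estimate(samples, weights):
    """Return (m_st, s2_mst) under the large-N approximation.

    samples: list of lists, one list of observations x_hi per stratum h
    weights: list of stratum weights W_h = N_h / N (should sum to 1)
    """
    m_st = 0.0
    s2_mst = 0.0
    for x_h, w_h in zip(samples, weights):
        n_h = len(x_h)
        m_h = mean(x_h)          # stratum sample mean
        s2_h = variance(x_h)     # sample variance, divisor n_h - 1
        m_st += w_h * m_h
        s2_mst += w_h ** 2 * s2_h / n_h
    return m_st, s2_mst


# Hypothetical illustration data (not from the slides).
samples = [[1.0, 2.0, 3.0], [4.0, 6.0]]
weights = [0.5, 0.5]
m_st, s2_mst = stratified_estimate(samples, weights)
print(m_st, s2_mst)
```

Each stratum needs at least two observations here, because the sample variance is undefined for a single observation.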
Optimum n: Using Pre-specified Margin of Error (PMOE)
Let $d$ = the pre-specified margin of error, i.e. the largest value of $|m_{st} - \mu|$ that can be tolerated, and let $\alpha$ be a small probability of exceeding that error, i.e. $P(|m_{st} - \mu| \ge d) = \alpha$. Then by Cochran an optimum n is:
For large N, Thus the optimal nh for each h is: For our example we let d = 0.2 and α = 0.10.
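The sample-size formulas on these slides did not survive, so the following is only a sketch under stated assumptions: Neyman (optimum) allocation, large N, and a normal approximation. Under those assumptions the target variance is $V = (d / z_{\alpha/2})^2$, the total sample size is $n = (\sum_h W_h s_h)^2 / V$, and $n_h = n \, W_h s_h / \sum_k W_k s_k$. The authors may have used a different allocation. The function name, the weights, and the standard deviations are hypothetical; only d = 0.2 and α = 0.10 come from the slides.

```python
from statistics import NormalDist


def pmoe_sample_size(weights, sds, d, alpha):
    """Sample sizes for a pre-specified margin of error d.

    Assumes Neyman allocation, large N, and a normal approximation.
    Returns (n, n_h, v) where v is the target variance of m_st.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided normal quantile
    v = (d / z) ** 2                          # target Var(m_st)
    total = sum(w * s for w, s in zip(weights, sds))
    n = total ** 2 / v                        # total sample size (unrounded)
    n_h = [n * w * s / total for w, s in zip(weights, sds)]
    return n, n_h, v


# Hypothetical inputs for five depth strata.
weights = [0.2, 0.2, 0.2, 0.2, 0.2]
sds = [1.0, 1.2, 1.5, 1.8, 2.0]
n, n_h, v = pmoe_sample_size(weights, sds, d=0.2, alpha=0.10)
print(round(n, 1), [round(k, 1) for k in n_h])
```

In practice each $n_h$ would be rounded up to a whole number of samples, which makes the achieved margin of error slightly smaller than d.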
Optimum n: Using Pre-specified Fixed Cost
The total cost is $C = c_0 + \sum_{h=1}^{L} c_h n_h$, where $c_h$ is the cost per population unit in the $h$th stratum and $c_0$ is the fixed overhead cost. Thus the optimum n is:
As above, the optimum nh per stratum is: Our examples will reflect both conditions of prespecified margin of error and prespecified fixed cost.
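A minimal sketch of the fixed-cost case, assuming the linear cost function $C = c_0 + \sum_h c_h n_h$ and the standard minimum-variance result that $n_h$ is proportional to $W_h s_h / \sqrt{c_h}$, which gives $n = (C - c_0) \sum_h (W_h s_h / \sqrt{c_h}) / \sum_h (W_h s_h \sqrt{c_h})$. The function name and all numbers below are hypothetical. The unit costs simply increase with depth, in line with the assumption stated in the example.

```python
from math import sqrt


def fixed_cost_allocation(weights, sds, unit_costs, total_cost, overhead=0.0):
    """Minimum-variance allocation for a fixed total cost.

    Assumes cost = overhead + sum(c_h * n_h) and large N.
    Returns (n, n_h) as unrounded values.
    """
    budget = total_cost - overhead
    # a_h = W_h * s_h / sqrt(c_h) drives the share of each stratum.
    a = [w * s / sqrt(c) for w, s, c in zip(weights, sds, unit_costs)]
    b = sum(w * s * sqrt(c) for w, s, c in zip(weights, sds, unit_costs))
    n = budget * sum(a) / b
    n_h = [n * a_h / sum(a) for a_h in a]
    return n, n_h


# Hypothetical inputs for five depth strata; cost rises with depth.
weights = [0.2, 0.2, 0.2, 0.2, 0.2]
sds = [1.0, 1.2, 1.5, 1.8, 2.0]
unit_costs = [1.0, 2.0, 3.0, 4.0, 5.0]
n, n_h = fixed_cost_allocation(weights, sds, unit_costs, total_cost=100.0, overhead=10.0)
spent = sum(c * k for c, k in zip(unit_costs, n_h))
print(round(n, 1), round(spent, 1))
```

Because the allocation is derived from the cost constraint, the variable cost `spent` equals the budget net of overhead before any rounding of $n_h$.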
Correlation Among the Depth Strata
Let ρc = the average correlation among the depth strata, i.e. the average of all possible pairwise correlations. Let ns = the number of strata, so ns = L. Let nh = the number to be sampled in each of the L strata, i.e. the stratum sample size. Thus:
Bayesian Considerations
The posterior variance is derived using the Bayesian approach to the solution of the Behrens-Fisher problem for inference on the mean ($\mu$) and variance ($\sigma^2$) of the normal distribution when both parameters are unknown. Likelihood function for n observations: $L(\mu, \sigma^2) \propto \sigma^{-n} \exp\{-[\upsilon s^2 + n(m - \mu)^2] / (2\sigma^2)\}$, where $\upsilon = n - 1$, $nm = x_1 + x_2 + \cdots + x_n$, and $\upsilon s^2 = (x_1 - m)^2 + (x_2 - m)^2 + \cdots + (x_n - m)^2$.
Consider the t-density: $\varphi(x; s^2) = s^{-1} [\upsilon^{1/2} B(\upsilon/2, 1/2)]^{-1} (1 + \upsilon^{-1} (x/s)^2)^{-(\upsilon+1)/2}$, where $B(\cdot, \cdot)$ is the beta function. The prior for the mean, μ, is: normal for υo.
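Prior for the variance: $p(\sigma^2) \sim \tau g^2 / \chi^2_\tau$. Let $B = \upsilon + \tau$. The posterior variance is $\varepsilon^2 = (\upsilon s^2 + \tau g^2) / B$. Thus substitute $\varepsilon_h^2$ for $s_h^2$ in the above computations of $n$ and $n_h$.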
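The substitution step can be sketched directly from the stated formula $\varepsilon^2 = (\upsilon s^2 + \tau g^2) / (\upsilon + \tau)$. The function name `posterior_variance` and the numeric inputs are hypothetical. The formula is a weighted average of the sample variance and the prior scale $g^2$, with weights υ and τ.

```python
def posterior_variance(s2, nu, tau, g):
    """Posterior variance eps^2 = (nu * s2 + tau * g**2) / (nu + tau).

    s2:  sample variance s^2 of the stratum
    nu:  weight on the sample variance (upsilon on the slides)
    tau: weight on the prior (0 means no prior information)
    g:   prior scale, so g**2 plays the role of a prior variance
    """
    b = nu + tau
    return (nu * s2 + tau * g ** 2) / b


# With tau = 0 the prior drops out and the sample variance is returned.
classical = posterior_variance(s2=2.5, nu=1, tau=0, g=1.0)

# Hypothetical informative prior: the result is pulled toward g**2 = 1.
bayes = posterior_variance(s2=4.0, nu=4, tau=4, g=1.0)
print(classical, bayes)
```

When the prior scale is smaller than the sample variance, $\varepsilon_h^2$ is smaller than $s_h^2$, so substituting it into the sample-size formulas lowers the required $n$ and $n_h$.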
Example: Estimate the average phosphorus concentration (μg/100 ml) in pond water. The phosphorus concentration of a 100-ml aliquot from each 1-liter sample will be measured. N = total number of 100-ml water samples in the pond. Nh = number of aliquots in stratum h. There are five strata of depth levels, h = 1, 2, ..., 5.
Table 1. Data for stratified random sampling to estimate samples per stratum (PMOE)
Classical approach (υ = 1, τ = 0, g = 1): s2(mst) = 0.0140, cost = 74. (For cost per stratum, please see the next slide.)
The unit cost to sample each depth level is: The assumption is that the cost is higher at greater depths.
Table 3. Pre-specified fixed cost (Bayesian results in bottom row)
Table 4. Example Using the Correlation Structure, ρc.
Conclusions: Compared to the classical sampling analysis, for the pre-specified margin of error approach as well as the correlational approach, the Bayesian analysis resulted in:
1. A reduction in the required number of samples, thus lowering the cost, especially when realistic (empirical) prior hyperparameters are utilized.
2. No serious adverse impact on the standard error of the estimates of the mean concentration.
- There were no real differences between the classical and Bayesian approaches in the pre-specified fixed cost analysis.
- Given current computational tools, the Bayesian calculations proved to be fairly straightforward.
- Given the current availability of databases, future Bayesian approaches to environmental sampling should be given serious consideration.