Geo479/579: Geostatistics Ch15. Cross Validation
Why is Cross Validation Useful? • Cross validation (CV) allows us to compare estimated and true values using only the information available in the sample data set • CV may help us to choose between different weighting procedures, search strategies, variogram models, or estimation methods
Why is Cross Validation Useful.. • In practice, CV results are often used simply to compare the distribution of the estimation errors or residuals from different estimation procedures and choose the one that works better • A careful study of the spatial distribution of cross validated residuals (estimated minus true values) can provide insights into where an estimation procedure may run into trouble
Cross Validation Method • The sample value at a particular location is temporarily removed from the sample data set
Cross Validation Method.. • The value at the same location is then estimated using the remaining samples • Once the estimate has been calculated, we can compare it to the true sample value that was initially removed from the sample data set • This procedure is repeated for all available samples
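The remove, estimate, compare, repeat loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the estimator is a simple inverse-distance-weighting function (a stand-in chosen for brevity, not kriging), the names `idw_estimate` and `cross_validate` are hypothetical, and the four sample values are made up.

```python
import math

def idw_estimate(x, y, samples, power=2.0, radius=None):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) samples.

    Assumption: IDW stands in for whatever estimator is being tested.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(sx - x, sy - y)
        if radius is not None and d > radius:
            continue  # sample lies outside the search neighbourhood
        if d == 0.0:
            return value  # coincident sample: use its value directly
        w = 1.0 / d ** power
        num += w * value
        den += w
    if den == 0.0:
        return None  # no samples found within the search radius
    return num / den

def cross_validate(samples, estimator=idw_estimate, **kwargs):
    """Leave-one-out CV: returns (x, y, true, estimate, residual) tuples.

    Residual = estimated minus true value, as defined in the slides.
    """
    results = []
    for i, (x, y, true_value) in enumerate(samples):
        remaining = samples[:i] + samples[i + 1:]  # temporarily remove sample i
        estimate = estimator(x, y, remaining, **kwargs)
        if estimate is None:
            continue  # nothing to compare at this location
        results.append((x, y, true_value, estimate, estimate - true_value))
    return results

# Made-up toy data on a unit square: (x, y, value)
toy = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0), (1.0, 1.0, 40.0)]
cv = cross_validate(toy)
for x, y, true_value, estimate, residual in cv:
    print(f"({x}, {y}) true={true_value:.1f} est={estimate:.1f} resid={residual:+.1f}")
```

Passing `radius=...` through `cross_validate` gives a simple way to rerun the same study with a different search strategy and compare the residuals.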
CV as a Quantitative Tool • Table 15.2 shows that ordinary kriging performs better: its estimation errors have a mean closer to 0 and less spread
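A comparison of this kind comes down to a few summary statistics of the residuals. The sketch below is a hedged illustration only: the function name `summarize_residuals` is hypothetical, and the two residual lists are invented numbers, not the values from Table 15.2.

```python
import statistics

def summarize_residuals(residuals):
    """Summary statistics for a list of residuals (estimated minus true)."""
    residuals = list(residuals)
    return {
        "mean": statistics.fmean(residuals),             # bias: closer to 0 is better
        "std": statistics.pstdev(residuals),             # spread: smaller is better
        "mae": statistics.fmean([abs(r) for r in residuals]),
        "mse": statistics.fmean([r * r for r in residuals]),
    }

# Invented residuals for two hypothetical estimation methods
method_a = [-2.0, 1.0, 0.0, 1.0]
method_b = [5.0, -1.0, 8.0, 0.0]

summary_a = summarize_residuals(method_a)
summary_b = summarize_residuals(method_b)
for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, {key: round(val, 3) for key, val in summary.items()})
```

With these made-up numbers, method A has both the smaller bias and the smaller spread, which is the pattern the slide describes for ordinary kriging.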
CV as a Quantitative Tool.. • Smoothing effect!
CV as a Quantitative Tool.. • One of the factors that limits the conclusions that can legitimately be drawn from a cross validation exercise is the recurring problem of clustering • If our original sample data set is spatially clustered, then so, too, are our cross validated residuals. Therefore, some conclusions drawn from them may be applicable to the entire map area, while others may not
CV as a Qualitative Tool • Figure 15.4 shows a map of the ordinary kriging residuals from the cross validation study. A “+” symbol indicates an overestimation and a “-” symbol an underestimation • We prefer the residuals to be conditionally unbiased with respect to their location, so on this type of display we hope to see the “+” and “-” symbols well mixed
Type 1 and Type 2 Samples • These are two values of an indicator variable, T. This variable is explained on pp. 4-6. Its statistical and spatial distribution is displayed on pp. 73-75
CV as a Qualitative Tool.. • In Figure 15.4 there is a fairly large patch of positive residuals around 110E, 180N • Most of the samples in this area are type 1 samples (type 1: T=1; type 2: T=2), so we need to consider how the ordinary kriging approach performs for the other type 1 samples
CV as a Qualitative Tool.. • We focus on type 1 because of the specific goal. To improve the estimation, we expand the search radius from 25 m to 30 m. The improved residuals are shown in Figure 15.6 • CV can also bring frustration, since it often reveals problems that do not have straightforward solutions
CV as a Goal-Oriented Tool • Imagine the Walker Lake data set is an ore deposit and suppose that the economic cutoff is 300 ppm: material with a grade greater than 300 ppm will be classified as ore, and material with a grade less than 300 ppm as waste • Figure 15.7: There are two types of misclassification: false negative errors (ore classified as waste) and false positive errors (waste classified as ore)
CV as a Goal-Oriented Tool.. • For applications in which misclassification has important consequences, the minimization of the misclassification may be a much more relevant criterion than the various statistical criteria • The magnitude of misclassification is less important than the misclassification itself
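Counting misclassifications from cross validated pairs is straightforward. The sketch below assumes the usual reading of the two error types (false positive: waste estimated as ore; false negative: ore estimated as waste) and treats a grade of exactly 300 ppm as waste, which the slides do not specify. The function name `classify_errors` and the grades are hypothetical.

```python
def classify_errors(true_values, estimates, cutoff=300.0):
    """Count ore/waste classification outcomes at a given cutoff.

    Assumption: a value strictly greater than the cutoff is ore; a value
    exactly at the cutoff is treated as waste.
    """
    counts = {
        "correct_ore": 0,
        "correct_waste": 0,
        "false_positive": 0,  # estimated as ore, truly waste
        "false_negative": 0,  # estimated as waste, truly ore
    }
    for true_value, estimate in zip(true_values, estimates):
        truly_ore = true_value > cutoff
        estimated_ore = estimate > cutoff
        if truly_ore and estimated_ore:
            counts["correct_ore"] += 1
        elif not truly_ore and not estimated_ore:
            counts["correct_waste"] += 1
        elif estimated_ore:
            counts["false_positive"] += 1
        else:
            counts["false_negative"] += 1
    return counts

# Made-up grades in ppm
true_grades = [350.0, 250.0, 310.0, 100.0]
estimated_grades = [320.0, 305.0, 280.0, 150.0]
counts = classify_errors(true_grades, estimated_grades)
print(counts)
```

Only which side of the cutoff each value falls on matters here, not the size of the residual, which matches the point that the misclassification itself matters more than its magnitude.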
Limitations of Cross Validation • CV can generate pairs of true and estimated values only at sample locations • Clustering problem in the sample data set • In practice, the residuals may be more representative of only certain regions or particular ranges of values
Limitations of Cross Validation.. • The clustering problem can be overcome either by calculating a declustered mean of the residuals or by performing CV at a selected subset of locations that is representative of the entire study area • If very close samples will not be available in the actual estimation, it makes little sense to include them in CV • The problem areas identified by cross validation may warrant additional sampling, especially when there are major consequences
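One way to compute a declustered mean of the residuals is cell declustering, where each residual is weighted by the inverse of the number of samples sharing its grid cell. This is a minimal sketch assuming that technique; the slides do not say which declustering method is intended, and the function name `cell_declustered_mean`, the cell size, and the data are all hypothetical.

```python
import math
from collections import Counter

def cell_declustered_mean(points, values, cell_size):
    """Cell-declustered mean: weight each value by 1 / (samples in its cell)."""
    cells = [
        (math.floor(x / cell_size), math.floor(y / cell_size))
        for x, y in points
    ]
    per_cell = Counter(cells)  # number of samples falling in each grid cell
    weights = [1.0 / per_cell[cell] for cell in cells]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Made-up example: three clustered samples plus one isolated sample
locations = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (15.0, 15.0)]
residuals = [6.0, 6.0, 6.0, -2.0]

naive_mean = sum(residuals) / len(residuals)
declustered_mean = cell_declustered_mean(locations, residuals, cell_size=10.0)
print(naive_mean, declustered_mean)
```

In this toy case the three clustered residuals share one cell and together count as much as the single isolated residual, so the cluster no longer dominates the mean. The result depends on the chosen cell size, which would need to be tuned for real data.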