
Geo479/579: Geostatistics Ch15. Cross Validation


chaz

Presentation Transcript


  1. Geo479/579: Geostatistics Ch15. Cross Validation

  2. Why is Cross Validation Useful? • Cross validation (CV) allows us to compare estimated and true values using only the information available in the sample data set • CV may help us to choose between different weighting procedures, search strategies, variogram models, or estimation methods

  3. Why is Cross Validation Useful.. • In practice, CV results are often used simply to compare the distribution of the estimation errors or residuals from different estimation procedures and choose the one that works better • A careful study of the spatial distribution of cross validated residuals (estimated minus true values) can provide insights into where an estimation procedure may run into trouble

  4. Cross Validation Method • The sample value at a particular location is temporarily removed from the sample data set

  5. Cross Validation Method.. • The value at the same location is then estimated using the remaining samples • Once the estimate is calculated, we can compare it to the true sample value that was initially removed from the sample data set • This procedure is repeated for all available samples
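The remove, estimate, compare loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: inverse distance weighting stands in for the estimator (the chapter itself works with kriging), and the function names `idw_estimate` and `leave_one_out`, the `power` parameter, and the toy samples are all hypothetical.

```python
import math

def idw_estimate(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, vi) samples.

    Stand-in estimator (an assumption); any estimator with the same
    signature, such as a kriging routine, could be swapped in.
    """
    num = 0.0
    den = 0.0
    for xi, yi, vi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return vi  # coincident sample: use its value directly
        w = 1.0 / d ** power
        num += w * vi
        den += w
    return num / den

def leave_one_out(samples, estimator=idw_estimate):
    """Return one (x, y, true, estimate) tuple per sample."""
    results = []
    for i, (x, y, v) in enumerate(samples):
        # Temporarily remove sample i, then estimate at its location
        remaining = samples[:i] + samples[i + 1:]
        results.append((x, y, v, estimator(x, y, remaining)))
    return results

# Hypothetical toy data: (easting, northing, value)
samples = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0), (1.0, 1.0, 40.0)]
cv = leave_one_out(samples)

# Residual convention used in these slides: estimated minus true
residuals = [est - true for _, _, true, est in cv]
```

Passing the estimator as an argument keeps the cross validation loop independent of the estimation method, so the same loop can be reused to compare procedures.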

  6. CV as a Quantitative Tool • Table 15.2 shows that ordinary kriging performs better: its estimation errors have a mean closer to 0 and less spread
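A comparison of this kind comes down to a few summary statistics of the residuals. The sketch below assumes residuals are already available as plain lists; the function name `summarize` and both residual lists are hypothetical and are not the values from Table 15.2.

```python
import statistics

def summarize(residuals):
    """Mean (bias), standard deviation (spread), MAE and MSE of residuals."""
    n = len(residuals)
    return {
        "mean": statistics.fmean(residuals),
        "std": statistics.pstdev(residuals),
        "mae": sum(abs(r) for r in residuals) / n,
        "mse": sum(r * r for r in residuals) / n,
    }

# Hypothetical residuals (estimated minus true) from two procedures
method_a = [12.0, -30.0, 45.0, -8.0, 21.0]
method_b = [5.0, -11.0, 9.0, -4.0, 6.0]

summary_a = summarize(method_a)
summary_b = summarize(method_b)

# Prefer the procedure whose mean is closer to 0 and whose spread is smaller
b_is_better = (abs(summary_b["mean"]) < abs(summary_a["mean"])
               and summary_b["std"] < summary_a["std"])
```

When the two criteria disagree (less bias but more spread, or the reverse), a combined measure such as the MSE may help, though the choice depends on the goal of the study.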

  7. CV as a Quantitative Tool.. Smoothing effect!

  8. CV as a Quantitative Tool.. • One of the factors that limits the conclusions that can legitimately be drawn from a cross validation exercise is the recurring problem of clustering • If our original sample data set is spatially clustered, then so, too, are our cross validated residuals. Therefore, some conclusions drawn from them may be applicable to the entire map area, while others may not

  9. CV as a Qualitative Tool • Figure 15.4 shows a map of the ordinary kriging residuals from the cross validation study. A “+” symbol indicates an overestimation, and a “-” symbol indicates an underestimation. • We prefer the residuals to be conditionally unbiased with respect to their location. On this type of display we hope to see the “+” and “-” symbols well mixed.

  10. Type 1 and Type 2 Samples • These are two values of an indicator variable, T. This variable is explained on pp. 4-6. Its statistical and spatial distribution is displayed on pp. 73-75

  11. CV as a Qualitative Tool.. • In Figure 15.4 there is a fairly large patch of positive residuals around 110E, 180N • Most of the samples in this area are type 1 samples (type 1: T=1; type 2: T=2), so we need to consider how the ordinary kriging approach performs for the other type 1 samples

  12. CV as a Qualitative Tool.. • We focus on type 1 because of the specific goal. To improve the estimation, we expand the search radius from 25 m to 30 m. The residuals improved, as shown in Figure 15.6 • CV can also bring frustration since it often reveals problems that do not have straightforward solutions

  13. CV as a Goal-Oriented Tool • Imagine that the Walker Lake data set is an ore deposit and that the economic cutoff is 300 ppm: material with a grade above 300 ppm is classified as ore, and material below 300 ppm is classified as waste. • Figure 15.7: There are two types of misclassification: a false negative error (ore misclassified as waste) and a false positive error (waste misclassified as ore)

  14. CV as a Goal-Oriented Tool.. • For applications in which misclassification has important consequences, the minimization of the misclassification may be a much more relevant criterion than the various statistical criteria • The magnitude of misclassification is less important than the misclassification itself
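Counting misclassifications against the 300 ppm cutoff can be sketched as follows. The function name `classify_errors` and the (true, estimated) pairs are hypothetical. Treating a value of exactly 300 ppm as waste is also an assumption, since the slides only specify "greater than" and "less than".

```python
def classify_errors(pairs, cutoff=300.0):
    """Count classification outcomes for (true, estimated) pairs.

    Assumption: a value exactly equal to the cutoff counts as waste.
    """
    counts = {"ore_correct": 0, "waste_correct": 0,
              "false_negative": 0, "false_positive": 0}
    for true, est in pairs:
        is_ore = true > cutoff       # what the material really is
        called_ore = est > cutoff    # what the estimate says it is
        if is_ore and called_ore:
            counts["ore_correct"] += 1
        elif is_ore:
            counts["false_negative"] += 1  # ore sent to waste
        elif called_ore:
            counts["false_positive"] += 1  # waste treated as ore
        else:
            counts["waste_correct"] += 1
    return counts

# Hypothetical cross validation pairs in ppm: (true, estimated)
pairs = [(350.0, 320.0), (310.0, 280.0), (120.0, 90.0),
         (290.0, 330.0), (500.0, 450.0), (250.0, 305.0)]
counts = classify_errors(pairs)
misclassified = counts["false_negative"] + counts["false_positive"]
```

Only the side of the cutoff matters here: a pair such as (310, 280) counts as one error regardless of how far apart the two values are, which matches the point that the misclassification itself matters more than its magnitude.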

  15. Limitations of Cross Validation • CV can generate pairs of true and estimated values only at sample locations • The sample data set may be spatially clustered • In practice, the residuals may be representative of only certain regions or particular ranges of values

  16. Limitations of Cross Validation.. • The clustering problem can be overcome either by calculating a declustered mean of the residuals or by performing CV at a selected subset of locations that is representative of the entire study area • If very close samples will not be available in the actual estimation, it makes little sense to include them in CV • The problem areas identified by cross validation may warrant additional sampling, especially when there are major consequences
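One possible way to compute a declustered mean of the residuals is cell declustering, sketched below. The slides do not say which declustering method is intended, so the choice of cell declustering is an assumption. The function names, the cell size, and the data are all hypothetical.

```python
from collections import Counter

def cell_declustering_weights(coords, cell_size):
    """Weight each sample by 1 / (occupied cells * samples in its cell)."""
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in coords]
    per_cell = Counter(cells)
    n_cells = len(per_cell)
    return [1.0 / (n_cells * per_cell[c]) for c in cells]

def declustered_mean(values, weights):
    """Weighted mean of the values using the declustering weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical data: three clustered samples and one isolated sample
coords = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0), (15.0, 15.0)]
residuals = [30.0, 30.0, 30.0, -10.0]

weights = cell_declustering_weights(coords, cell_size=10.0)
naive = sum(residuals) / len(residuals)
declustered = declustered_mean(residuals, weights)
```

In this toy case the three clustered samples share one cell, so together they carry the same weight as the single isolated sample, and the clustered residuals no longer dominate the mean. The result depends on the chosen cell size, which would need to be tuned for real data.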
