A Probabilistic-Spatial Approach to the Quality Control of Climate Observations. Christopher Daly, Wayne Gibson, Matthew Doggett, Joseph Smith, and George Taylor Spatial Climate Analysis Service Oregon State University Corvallis, Oregon, USA.
Traditional QC Systems are Categorical and Deterministic • Data subjected to categorical quality checks • Designed to uncover mistakes • Validity determined from test results • Mistake = flag / toss • No mistake = no flag / keep Designed to Work With Human Observing Systems
Alien Electronic Devices are Invading the Climate Observing World! They’re Everywhere! They’re Everywhere!
Electronic Sensors and Modern Applications Create Challenges for Traditional QC Systems Situation: • Errors tend to be continuous drift rather than categorical mistakes • Increasing use of computer applications that rely on climate observations Need: • Continuous estimates, rather than categorical tests, of observation validity • Quantitative estimates of observational uncertainty, not just flags
More Challenges… Situation: • The range of applications is increasing rapidly, and each has a different tolerance for outliers • Data are often more voluminous and disseminated in a more timely manner Need: • Probabilistic information from which a decision to use an observation can be made, rather than an up-front decision • Automated QC methods
An Opportunity: Advances in climate mapping technology now make it possible to estimate a reasonably accurate “expected value” for an observation based on surrounding stations. • Assumption: spatial consistency is related to observation validity
Useful Characteristics for a Next-Generation Climate QC System • Continuous • Quantitative • Probabilistic • Automated • Spatial
PRISM Probabilistic-Spatial QC (PSQC) System for SNOTEL Data: Uses climate mapping technology and climate statistics to provide a continuous, quantitative confidence probability for each observation, estimate a replacement value, and provide a confidence interval for that replacement. • Start with daily max/min temperature for all SNOTEL sites, period of record • Move to precipitation, SWE, soil temperature and moisture • Develop an automated system for near-real-time operation at NRCS
Climatological Grid Development • PRISM must produce a high-quality estimate of temperature at each SNOTEL station each day • The highest interpolation skill is obtained by using a high-quality predictive grid that represents the long-term climatological temperature for that day, rather than a digital elevation grid • Climatological grid: 0.8 km resolution, 1971-2000 (Figure: 4 km and 0.8 km grids)
Leveraging Information Content of High-Quality Climatologies to Create New Maps with Fewer Data and Less Effort • Climatology used in place of DEM as PRISM predictor grid (Figure: Oregon annual precipitation)
PRISM Regression of “Weather vs Climate” 20 July 2000 Tmax vs 1971-2000 Mean July Tmax
PRISM Parameter-elevation Regressions on Independent Slopes Model • Generates gridded estimates of climatic parameters • Moving-window regression of climate vs. elevation for each grid cell • Uses nearby station observations • Spatial climate knowledge-based system (KBS) weights stations in the regression function by their climatological similarity to the target grid cell
PRISM Parameter-elevation Regressions on Independent Slopes Model PRISM KBS accounts for spatial variations in climate due to: • Elevation • Terrain orientation • Terrain steepness • Moisture regime • Coastal proximity • Inversion layer • Long-term climate patterns
PRISM Moving-Window Regression Function: 1961-90 Mean April Precipitation, Qin Ling Mountains, China (weighted linear regression)
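The regression slides describe fitting a weighted line of a climate variable against a predictor (elevation, or a climatological grid as in the “Weather vs Climate” example) using nearby stations. A minimal sketch of that weighted least-squares step for one target cell follows. The station weights are passed in directly as an assumption; the KBS weighting itself is not reproduced, and the function name is hypothetical.

```python
from typing import Sequence


def weighted_regression_predict(
    x: Sequence[float],
    y: Sequence[float],
    w: Sequence[float],
    x_target: float,
) -> tuple[float, float, float]:
    """Fit y = b0 + b1 * x by weighted least squares; predict at x_target.

    x: predictor at each station (e.g. 1971-2000 mean July Tmax from the grid)
    y: observed value at each station (e.g. Tmax on one day)
    w: non-negative station weights (ASSUMED given; stand-in for the KBS)
    Returns (b0, b1, prediction).
    """
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("need at least two stations with matching x, y, w")
    sw = sum(w)
    if sw <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted means of predictor and response.
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and spread; the common normalization cancels.
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("predictor has no weighted spread")
    b1 = sxy / sxx
    b0 = ym - b1 * xm
    return b0, b1, b0 + b1 * x_target
```

With predictors 20, 25, 30 and daily values 22, 27, 32 at equal weight, the fit has slope 1 and intercept 2, so a target cell whose climatology is 28 is predicted at 30. Giving a station zero weight removes it from the fit entirely.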
Rain Shadows: 1961-90 Mean Annual Precipitation, Oregon Cascades • Dominant PRISM KBS components: elevation, terrain orientation, terrain steepness, moisture regime (map figure)
Coastal Effects: 1971-00 July Maximum Temperature, Central California Coast • Dominant PRISM KBS components: elevation, coastal proximity, inversion layer (map figure)
Inversions: 1971-00 July Minimum Temperature, Northwestern California • Dominant PRISM KBS components: elevation, inversion layer, topographic index, coastal proximity (map figure)
PRISM PSQC System: Confidence Probability (CP) • Definition of CP: Given the difference between an observation and an expected value (the residual), CP is the probability that another observation and expected value from the same time of year would differ by at least as much • Residual distribution: +/- 15-day, +/- 2-year window = 5 years, 31 days each (N ~ 155)
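Read literally, the CP definition is an exceedance fraction over the residuals in the 5-year, 31-day window. The sketch below builds that window (5 x 31 = 155 dates) and returns the percentage of other residuals whose magnitude is at least that of the target residual. This empirical version is an illustrative reading of the definition, not the z-test computation described on the QC Iteration slide; the function names are hypothetical, and mapping 29 February to 28 February in non-leap years is an assumption.

```python
from datetime import date, timedelta
from typing import Mapping


def window_dates(target: date, half_days: int = 15, half_years: int = 2) -> list[date]:
    """Dates within +/- half_days of the target's calendar day, in every year
    from target.year - half_years to target.year + half_years."""
    dates = []
    for dy in range(-half_years, half_years + 1):
        year = target.year + dy
        try:
            center = target.replace(year=year)
        except ValueError:  # 29 Feb in a non-leap year: ASSUMED fallback to 28 Feb
            center = target.replace(year=year, day=28)
        dates.extend(center + timedelta(days=dd) for dd in range(-half_days, half_days + 1))
    return dates


def empirical_cp(residuals: Mapping[date, float], target: date) -> float:
    """Percent of other residuals in the window with |R| >= |R(target)|."""
    r0 = abs(residuals[target])
    pool = [residuals[d] for d in window_dates(target) if d != target and d in residuals]
    if not pool:
        raise ValueError("no residuals available in the window")
    exceed = sum(1 for r in pool if abs(r) >= r0)
    return 100.0 * exceed / len(pool)
```

A residual larger than every other residual in its window gets CP = 0%; one smaller than all of them gets 100%. Missing days simply shrink the pool below 155.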
Confidence Probability Takes into Account Uncertainty in the System (Figure panels: Low Overall Skill, High Overall Skill; X = residual (P-O)) • The p-value for a given deviation from the mean is higher when Sx is large (low skill)
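The skill effect on this slide can be shown with a two-sided z-test p-value, P(|Z| >= |x - mean| / sd), computed with the complementary error function. In the sketch below, which assumes a normal residual distribution and uses a hypothetical function name, the same 4-degree residual is judged much less severely when the spread is large (low skill) than when it is small (high skill).

```python
import math


def two_sided_p(x: float, mean: float, sd: float) -> float:
    """Two-sided z-test p-value: P(|Z| >= |x - mean| / sd) for standard normal Z."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = abs(x - mean) / sd
    # For standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Same residual (4 degrees from a mean of 0), two levels of system skill.
p_high_skill = two_sided_p(4.0, 0.0, 2.0)  # about 0.046
p_low_skill = two_sided_p(4.0, 0.0, 4.0)   # about 0.317
```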
Interpreting Confidence Probability • Continuous values from 0 to 100% • 0% = highly spatially inconsistent observation, reflected in a PRISM prediction that is unusually different from the observation • 100% = highly consistent observation, reflected in a PRISM prediction that is relatively close to the observation Guidelines to date: • CP > 30: use observation as-is • 10 < CP < 30: blend prediction and observation • CP < 10: use prediction instead of observation
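The guidelines translate into a small decision rule. The slide does not say how to blend in the 10-30 band, so the sketch below assumes a linear blend whose weight on the observation rises from 0 at CP = 10 to 1 at CP = 30; with that choice the rule is continuous at both thresholds. The function name and the blending scheme are hypothetical.

```python
def qc_value(obs: float, pred: float, cp: float, low: float = 10.0, high: float = 30.0) -> float:
    """Value to use for a station-day given its CP (percent).

    CP >= high: keep the observation; CP <= low: use the PRISM prediction;
    in between: linear blend (ASSUMED scheme; the guideline only says "blend").
    """
    if not 0.0 <= cp <= 100.0:
        raise ValueError("cp must be a percentage in [0, 100]")
    if cp >= high:
        return obs
    if cp <= low:
        return pred
    w = (cp - low) / (high - low)  # weight on the observation: 0 at low, 1 at high
    return w * obs + (1.0 - w) * pred
```

For example, an observation of 10 with a prediction of 20 and CP = 20 would be replaced by 15 under this assumed blend.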
PRISM PSQC Process 1. Create Database Records Goal: Enter daily tmax/tmin observations for all networks into the database and prepare the data Current Actions: • Ingest daily tmin/tmax observations from SNOTEL, COOP, RAWS, Agrimet, ASOS, and first-order networks. • Shift AM COOP observations of tmax to the previous day (assumes a standard diurnal curve, which does not always apply). • Convert units to degrees Celsius.
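The two preparation steps can be sketched as follows: convert Fahrenheit readings to Celsius, and file the tmax of a morning COOP reading under the previous day, on the assumption that the maximum occurred the previous afternoon. The record layout (`DailyObs`, its fields, and the `am_reading` flag) and the `prepare` function are hypothetical; the real database schema is not shown.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DailyObs:
    # Hypothetical record layout; field names are illustrative only.
    station: str
    network: str      # e.g. "COOP", "SNOTEL"
    day: date         # date the observation was reported
    tmax: float
    tmin: float
    am_reading: bool  # True if the observer read the thermometers in the morning
    units: str = "F"


def f_to_c(t: float) -> float:
    return (t - 32.0) * 5.0 / 9.0


def prepare(records: list[DailyObs]) -> tuple[dict, dict]:
    """Return (tmax, tmin) dicts keyed by (station, day), in degrees Celsius."""
    tmax_c: dict[tuple[str, date], float] = {}
    tmin_c: dict[tuple[str, date], float] = {}
    for r in records:
        tmax, tmin = r.tmax, r.tmin
        if r.units == "F":
            tmax, tmin = f_to_c(tmax), f_to_c(tmin)
        # Morning COOP reading: assign tmax to the previous day
        # (assumes a standard diurnal curve, which does not always hold).
        shift = r.network == "COOP" and r.am_reading
        tmax_day = r.day - timedelta(days=1) if shift else r.day
        tmax_c[(r.station, tmax_day)] = tmax
        tmin_c[(r.station, r.day)] = tmin
    return tmax_c, tmin_c
```

Keeping tmax and tmin in separate day-keyed tables is a design choice for the sketch: after the shift, a single reported record contributes to two different days.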
PRISM PSQC Process 2. Single-Station Checks Goal: Take all QC actions possible at the single-station level before entering the spatial QC process. Current Checks: • Temperature observation is well above the all-time record maximum or well below the all-time record minimum for the state: flag set and CP set to 0 • Maximum temperature is less than the minimum temperature: flag set and CP set to 0 • First daily tmax/tmin observation after a period of missing data: flag set and CP set to 0 (COOP only?) • More than 10 consecutive observations with the same value (<+/-1F COOP, <+/-0.1C others), or more than 5 consecutive zero values, indicates a definite flatliner: flag set and CP set to 0 • 5-10 consecutive observations with the same value indicate a potential flatliner, to be assessed by the spatial QC system: flag set and CP unchanged
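A sketch of the flatliner rules and two of the categorical checks follows. Assumptions, all labeled in the code: "same value" means each value differs from the previous one by less than `tol` (0.1 C by default; 1 F would apply to COOP), "zero" means exactly 0.0, missing values (None) break a run, and `margin` is a hypothetical stand-in for "well above/below" the state record. Function names are hypothetical.

```python
from typing import Callable, Iterator, Optional, Sequence

DEFINITE, POTENTIAL, OK = "definite", "potential", "ok"


def _runs(values: Sequence[Optional[float]],
          same: Callable[[float, float], bool]) -> Iterator[tuple[int, int]]:
    """Yield (start, end_exclusive) of maximal runs in which same(prev, cur)
    holds for every consecutive pair; None values break runs."""
    i, n = 0, len(values)
    while i < n:
        if values[i] is None:
            i += 1
            continue
        j = i + 1
        while j < n and values[j] is not None and same(values[j - 1], values[j]):
            j += 1
        yield i, j
        i = j


def flatliner_flags(values: Sequence[Optional[float]], tol: float = 0.1) -> list[str]:
    flags = [OK] * len(values)
    # Runs of (near-)identical values: >10 definite, 5-10 potential.
    for s, e in _runs(values, lambda a, b: abs(a - b) < tol):
        n = e - s
        label = DEFINITE if n > 10 else POTENTIAL if n >= 5 else None
        if label:
            flags[s:e] = [label] * n
    # More than 5 consecutive zeros is a definite flatliner.
    for s, e in _runs(values, lambda a, b: a == 0.0 and b == 0.0):
        if e - s > 5:
            flags[s:e] = [DEFINITE] * (e - s)
    return flags


def day_check(tmax: float, tmin: float, record_max: float, record_min: float,
              margin: float = 5.0) -> bool:
    """True if the station-day fails a categorical check (CP would be set to 0).
    margin is HYPOTHETICAL; the slide says only "well above/below" the record."""
    if tmax < tmin:
        return True
    return tmax > record_max + margin or tmin < record_min - margin
```

Potential flatliners keep their CP here and are passed on to the spatial system, matching the last rule on the slide.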
PRISM PSQC Process 3. Spatial QC System Goal: Through a series of iterations, gradually and systematically “weed out” spatially inconsistent observations from consistent ones Overview: • PRISM is run for each station location for each day, and summary statistics are accumulated • Once all days have been run, frequency distributions are developed and confidence probabilities (CP) for each daily station observation are estimated • These CP values are used to weight the daily observations in a second iteration of PRISM daily runs • Observations with lower CP values are given lower weight, and thus have less influence, in the second set of PRISM predictions, and are also given lower weight in the calculation of the second set of summary statistics • CP values are again calculated and passed back to the daily PRISM runs • This iterative process continues for about 5 iterations, by which time the CP values have reached equilibrium
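The feedback loop can be illustrated with a toy single-day example. In this sketch, which is not the PRISM implementation, each station's "prediction" is the CP-weighted mean of all other stations (a stand-in for a PRISM run), sigma is a fixed spread rather than the effective sigma derived from summary statistics, and the down-weighting of summary statistics is not modeled. The function name is hypothetical.

```python
import math


def toy_spatial_qc(obs: list[float], sigma: float = 2.0, iterations: int = 5) -> list[float]:
    """Toy CP-weighted iteration for one day; returns CP (percent) per station.

    ASSUMPTIONS: prediction = CP-weighted mean of the other stations (stand-in
    for PRISM); sigma is fixed instead of being estimated from the window.
    """
    n = len(obs)
    if n < 2:
        raise ValueError("need at least two stations")
    cp = [100.0] * n  # first pass: every observation fully trusted
    for _ in range(iterations):
        new_cp = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            wsum = sum(cp[j] for j in others)
            if wsum > 0:
                pred = sum(cp[j] * obs[j] for j in others) / wsum
            else:  # every neighbour rejected: fall back to an unweighted mean
                pred = sum(obs[j] for j in others) / len(others)
            z = abs(pred - obs[i]) / sigma  # scaled residual |P - O| / sigma
            new_cp.append(100.0 * math.erfc(z / math.sqrt(2.0)))
        cp = new_cp  # CP from this pass weights the next pass
    return cp
```

With observations 10.0, 10.5, 9.5, 10.2, and 18.0, the first pass lowers the CP of the good stations because the 18.0 reading contaminates their predictions; once that reading's CP drops near zero, its influence vanishes and the good stations' CP values recover, which is the "weeding out" behavior the slide describes.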
QC Iteration For each station-day: • Run PRISM for each station location in its absence, estimating its obs for each day • PRISM omits nearby stations, singly and in pairs, to try to better match the observation • Prediction closest to obs is accepted • Raw PRISM variables: Observation (O), Prediction (P), Residual (R=P-O), PRISM Regression Standard Deviation (S) Once all station-days are run: • Calculate summary statistics for each station for each day • Mean and std dev of O (Os), P (Ps), R (Rs), and S (Ss) • +/- 15-day, +/- 2-year window = 5 years, 31 days each (N ~ 155) • 5-day running standard deviation (RunSD) as a measure of day-to-day variability (time shifting) • Potential flatliners: calculate V, the ratio of the station’s RunSD (set to 0.3) to that of surrounding stations • Determine “effective” standard deviation for frequency distribution • Sigma = max(Rs, S, Ss, RunSD, 2) • Calculate probability statistics for O, P, R, S, and V for each day • Probability statistics are p-values from z-tests • Residual Probability (RP) used as an estimate of overall Confidence Probability (CP) for an observation • Except in the case of potential flatliners, where CP = min(RP, VP) • CP used to weight stations in next iteration
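Three pieces of this slide lend themselves to a short sketch: the omission search (drop nearby stations singly and in pairs, keep the prediction closest to the observation), the effective sigma, and CP as a z-test p-value capped by VP for potential flatliners. Assumptions: `predict` is a hypothetical callable standing in for a PRISM run on a subset of neighbours; the residual is centred on the window's mean residual; the floor of 2 is taken to be in degrees C; and VP is supplied as a percent because its derivation is not given. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Optional, Sequence


def best_prediction(obs: float, neighbours: Sequence[float],
                    predict: Callable[[list[float]], float]) -> Optional[float]:
    """Omit no, one, or two neighbours; keep the prediction closest to obs."""
    best = None
    for k in (0, 1, 2):
        for dropped in combinations(range(len(neighbours)), k):
            subset = [v for i, v in enumerate(neighbours) if i not in dropped]
            if not subset:
                continue
            p = predict(subset)
            if best is None or abs(p - obs) < abs(best - obs):
                best = p
    return best


def effective_sigma(rs: float, s: float, ss: float, run_sd: float, floor: float = 2.0) -> float:
    # Sigma = max(Rs, S, Ss, RunSD, 2); units of the floor ASSUMED to be degrees C.
    return max(rs, s, ss, run_sd, floor)


def confidence_probability(residual: float, mean_residual: float, sigma: float,
                           vp: Optional[float] = None) -> float:
    """CP (percent): two-sided z-test p-value of the residual (RP), or
    min(RP, VP) for a potential flatliner."""
    z = abs(residual - mean_residual) / sigma  # ASSUMED centring on the window mean
    rp = 100.0 * math.erfc(z / math.sqrt(2.0))
    return rp if vp is None else min(rp, vp)
```

For example, a 4-degree residual against a zero mean with Sigma = 2 gives RP of about 4.6%; if the same day were a potential flatliner with VP = 1%, its CP would be 1%.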
Drifting Sensor: MCKENZIE PASS (21E07S), Observations and CP Values, Date: 1996-02-08
Drifting Sensor: MCKENZIE PASS (21E07S), Climatology vs. Observation and Prediction, Date: 1996-02-08
Warm Bias: SALT CREEK FALLS (22F04S), Observations, Date: 2000-07-14
Warm Bias: SALT CREEK FALLS (22F04S), Anomalies and CP Values, 7-21 July 2000
Warm Bias: SALT CREEK FALLS (22F04S), Scatter Plot: Climatology vs. Observation, 14 July 2000 (series: 22F04S and Odell Lake COOP)
Computing Obstacles • Computing: it currently takes about 60 hours to run the PRISM PSQC system for SNOTEL sites in the western US on a 14-processor cluster • Disk space: we now have > 1 TB but will probably need more • Funds are insufficient to “do it right”
Issues to Consider • How far can the assumption be taken that spatial consistency equates with validity? • Are continuous and probabilistic QC systems useful for manual observing systems? • Can a high-quality QC system ever be completely automated?