INTERPOLATED CLIMATE LAYERS FOR USE IN SPECIES MODELING: Interpolation of maximum temperature in Venezuela
03-24-2013, Benoit Parmentier
• Predictions for year 2000 will be rerun using 1980-2010 and 2000-2010 averages for comparison.
• Check the number of stations (daily and monthly) and report it in the outputs.
• This is work in progress as of 03-24-2013.
Temperature predictions: Coding Status
• There are currently 9 scripts:
  • 8 scripts are made up of R functions.
  • 1 Python script is called from R.
• GAM fusion is implemented.
• Tmin LST climatology average calculated (checking results).
[Diagram: workflow stages: Data preparation, Master script, Raster prediction, Assessment]
Scripts have been reorganized and rewritten as functions to improve design and speed, in preparation for running on a supercomputer.
#master_script_temp_03192013.R
#script_path must point to the directory holding the scripts below
#CALLED FROM MASTER SCRIPT:
#climatology_03192013.py (Python script, called from R)
source(file.path(script_path,"covariates_production_temperatures_03212013.R"))
source(file.path(script_path,"Database_stations_covariates_processing_function_03132013.R"))
source(file.path(script_path,"GAM_fusion_analysis_raster_prediction_multisampling_03182013.R"))
source(file.path(script_path,"results_interpolation_date_output_analyses_03182013.R"))
#CALLED FROM GAM FUSION ANALYSIS RASTER PREDICTION
source(file.path(script_path,"sampling_script_functions_03122013.R"))
source(file.path(script_path,"GAM_fusion_function_multisampling_03142013.R")) #Include GAM_CAI
source(file.path(script_path,"GAM_fusion_function_multisampling_validation_metrics_03182013.R"))
TEMPERATURE PREDICTIONS: STATUS
1) Tmax, VE, monthly average 2000-2010, year 2010 (on Redmine)
2) Tmax, VE, monthly average 1980-2010, year 2010
3) Tmax, VE, monthly average 2000, year 2000
4) Tmax, VE, monthly average 1980-2000, year 2000
In green: missing predictions across the 4 runs. These are due to the lack of data for fitting at the monthly stage.
Models predicted:
Mod1: y_var ~ s(elev_1)
Mod2: y_var ~ s(LST)
Mod3: y_var ~ s(elev_1,LST)
Mod4: y_var ~ s(lat) + s(lon) + s(elev_1)
Mod5: y_var ~ s(lat,lon,elev_1)
Mod6: y_var ~ s(lat,lon) + s(elev_1) + s(N_w,E_w) + s(LST)
Mod7: y_var ~ s(lat,lon) + s(elev_1) + s(N_w,E_w) + s(LST) + s(LC2)
Mod8: y_var ~ s(lat,lon) + s(elev_1) + s(N_w,E_w) + s(LST) + s(LC6)
Mod9: y_var ~ s(lat,lon) + s(elev_1) + s(N_w,E_w) + s(LST) + s(DISTOC)
Mod_kr: kriging
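Mod6 to Mod9 share the same four smooth terms and each adds one covariate, so the specifications are nested. The sketch below is written in Python (the workflow's other language; the actual model code is in R with mgcv-style formulas) and only assembles the formula strings listed above. The helpers `smooth` and `build_formulas` are hypothetical and no model is fitted.

```python
def smooth(*covariates):
    """Return an mgcv-style smooth term, e.g. smooth("lat", "lon") -> "s(lat,lon)"."""
    return "s(" + ",".join(covariates) + ")"


def build_formulas(response="y_var"):
    """Return a dict mapping model names (mod1 to mod9) to formula strings."""
    # Terms shared by Mod6 to Mod9.
    base = [smooth("lat", "lon"), smooth("elev_1"), smooth("N_w", "E_w"), smooth("LST")]
    terms = {
        "mod1": [smooth("elev_1")],
        "mod2": [smooth("LST")],
        "mod3": [smooth("elev_1", "LST")],
        "mod4": [smooth("lat"), smooth("lon"), smooth("elev_1")],
        "mod5": [smooth("lat", "lon", "elev_1")],
        "mod6": base,
        "mod7": base + [smooth("LC2")],     # base plus land cover LC2
        "mod8": base + [smooth("LC6")],     # base plus land cover LC6
        "mod9": base + [smooth("DISTOC")],  # base plus DISTOC
    }
    return {name: f"{response} ~ " + " + ".join(t) for name, t in terms.items()}


if __name__ == "__main__":
    for name, formula in build_formulas().items():
        print(name, formula)
```

Generating the strings from one shared base keeps Mod6 to Mod9 consistent if a base covariate changes.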
BOXPLOTS FOR PREDICTIONS OVER A FULL YEAR
[Figure panels: 2010 with average 2000-2010; 2010 with average 1980-2010; 2000 with average 2000; 2000 with average 1980-2000]
TEMPERATURE PREDICTIONS: ACCURACY METRICS (MEAN AND MEDIAN OVER A FULL YEAR)
[Figure panels: 2010 with average 1980-2010; 2010 with average 2000-2010; 2000 with average 1980-2000; 2000 with average 2000]
Note that only 93 days were predicted for model 3 for year 2000 when using a 10-year monthly station average!
Venezuela region
• 8,640,000 pixels lie in the 6 tiles (compared to 399,320 pixels in the Oregon region).
• 3,569,481 pixels are valid, compared to 357,363 in Oregon (i.e., about 10 times more).
• 41.31% of the pixels are valid for prediction, compared to about 90% in the Oregon case study.
• There are 357 GHCN stations in the region (within the 6 tiles).
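The pixel figures above can be checked with a few lines of arithmetic. This sketch assumes each tile is a standard 1200 x 1200 pixel MODIS 1 km tile, which reproduces the 8,640,000 total for 6 tiles; the other counts are copied from the slide.

```python
# Assumption: 6 MODIS 1 km tiles of 1200 x 1200 pixels each.
TILE_SIDE = 1200
N_TILES = 6

total_ve = N_TILES * TILE_SIDE * TILE_SIDE  # total pixels, Venezuela
valid_ve = 3_569_481                        # valid pixels, Venezuela
total_or = 399_320                          # total pixels, Oregon
valid_or = 357_363                          # valid pixels, Oregon

pct_valid_ve = 100 * valid_ve / total_ve    # share of valid pixels, Venezuela
pct_valid_or = 100 * valid_or / total_or    # share of valid pixels, Oregon
ratio_valid = valid_ve / valid_or           # valid pixels, Venezuela vs Oregon

print(f"total VE pixels: {total_ve:,}")
print(f"valid VE: {pct_valid_ve:.2f}%, valid Oregon: {pct_valid_or:.2f}%")
print(f"valid-pixel ratio VE/Oregon: {ratio_valid:.2f}")
```

This gives 41.31% for Venezuela, 89.49% for Oregon (the "about 90%" above) and a valid-pixel ratio of 9.99.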
Maximum temperatures: RUN 2
• Tmax, VE, monthly average 1980-2010, year 2010
• Results stored in: raster_prediction_obj__365d_GAM_fus5_all_lstd_03132013.Rdata
• Models predicted: Mod1 to Mod9 and Mod_kr (same specifications as listed in the status slide)
Daily tmax for 20100101
I used the 30-year window that is standard in climatology. For January, 69 stations are available, compared to 52 before: by extending the time window back to 1980, we gain 17 GHCN stations. FAOCLIM data could be added, but keep in mind that its climatology covers 1961-1990 and does not overlap the current prediction period.
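Widening the averaging window adds stations because a station only needs usable data somewhere in the window to contribute a January average. A minimal sketch of that count, assuming a hypothetical flat list of `(station_id, year, month, monthly_mean_tmax)` records; the record layout, the function name and the `min_years` rule are illustrative assumptions, not the GHCN processing in the scripts.

```python
from collections import defaultdict


def stations_with_monthly_average(records, month, start_year, end_year, min_years=1):
    """Count stations that can contribute a monthly average for `month`.

    A station is counted if it has monthly values for `month` in at least
    `min_years` distinct years between start_year and end_year (inclusive).
    The min_years threshold is a hypothetical criterion.
    """
    years_by_station = defaultdict(set)
    for station, year, m, value in records:
        if m == month and start_year <= year <= end_year and value is not None:
            years_by_station[station].add(year)
    return sum(1 for years in years_by_station.values() if len(years) >= min_years)


# Hypothetical records: station C only has January data in 1990.
records = [
    ("A", 1985, 1, 30.1), ("A", 2005, 1, 30.4),
    ("B", 2003, 1, 28.0),
    ("C", 1990, 1, 31.2), ("C", 1990, 2, 32.0),
]
narrow = stations_with_monthly_average(records, 1, 2000, 2010)  # A and B
wide = stations_with_monthly_average(records, 1, 1980, 2010)    # A, B and C
print(narrow, wide, wide - narrow)
```

Raising `min_years` trades station count for more robust averages, which is one way to balance the 52 versus 69 station choice.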
Daily tmax for 20100101
Daily tmax predictions with the corresponding RMSE metrics.
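The RMSE reported for each model compares predicted and observed tmax at validation stations. A minimal sketch of RMSE together with MAE and mean error (bias), assuming residuals are defined as predicted minus observed; the function name is hypothetical, and the validation-metrics script may compute additional or differently defined metrics.

```python
import math


def validation_metrics(observed, predicted):
    """Return RMSE, MAE and mean error (bias) for paired station values in °C."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [p - o for o, p in zip(observed, predicted)]  # predicted - observed
    n = len(residuals)
    return {
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "me": sum(residuals) / n,
    }


obs = [30.0, 28.5, 32.0, 25.0]    # hypothetical observed tmax at test stations
pred = [31.0, 28.5, 30.0, 26.0]   # hypothetical predictions
print(validation_metrics(obs, pred))
```

Reporting the mean error next to the RMSE shows whether a model is systematically too warm or too cold, which RMSE alone hides.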
Daily tmax for 20100101
Add the number of stations!
Daily tmax for 20100101
Islands will need to be handled in the workflow. Stations available in January (averages for 1980-2010).
3) Maximum temperatures
• Tmax, VE, monthly average 2000-2010, year 2000
• Results stored in: raster_prediction_obj__365d_GAM_fus5_all_lstd_03142013.RData
Daily tmax for 20000101
Note that there are no predictions for model 3 because of a lack of data at the monthly fitting stage, due to the use of station data from year 2000 only. I'm rerunning the predictions today.
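One plausible way to catch this before fitting is to compare the number of training stations with the number of model coefficients, since mgcv will not fit a GAM with more coefficients than observations. A rough sketch, assuming each smooth uses a fixed basis size per number of covariates (the `BASIS` values are placeholders, not the settings in the scripts) and that each smooth loses one coefficient to its centering constraint, plus one intercept. The function names are hypothetical.

```python
import re


def n_coefficients(formula, basis_size):
    """Approximate coefficient count for an mgcv-style formula string.

    Each s(...) term with d covariates gets basis_size[d] basis functions,
    minus one for the centering (identifiability) constraint; one more
    coefficient is added for the intercept.
    """
    terms = re.findall(r"s\(([^)]*)\)", formula)
    total = 1  # intercept
    for term in terms:
        d = len([c for c in term.split(",") if c.strip()])
        total += basis_size[d] - 1
    return total


def can_fit(formula, n_stations, basis_size):
    """Rough pre-check: require more training stations than coefficients."""
    return n_stations > n_coefficients(formula, basis_size)


# Placeholder basis sizes by smooth dimension (illustrative values only).
BASIS = {1: 10, 2: 30, 3: 60}

print(n_coefficients("y_var ~ s(elev_1)", BASIS))
print(n_coefficients("y_var ~ s(elev_1,LST)", BASIS))
print(can_fit("y_var ~ s(elev_1,LST)", 25, BASIS))
```

A check like this could skip or simplify a model (for example by lowering `k`) instead of leaving gaps in the daily predictions.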
Daily tmax for 20000101
Stations available for fitting (year 2000 only).
Daily tmax for 20000101
Overall average boxplot for year 2000.
4) Maximum temperatures
• Tmax, VE, monthly average 1980-2000, year 2000
• Information stored in: raster_prediction_obj__365d_GAM_fus5_all_lstd_03182013.RData
Daily tmax for 20000101
Add a map of residuals to locate outliers!
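Before mapping, candidate outlier stations can be flagged numerically and then highlighted on the residual map. This sketch swaps in a robust z-score (median and median absolute deviation), a common choice for small samples because a single large residual does not inflate the scale estimate. The function name, the example IDs and residuals, and the 3.5 cut-off are assumptions, not part of the workflow.

```python
import statistics


def flag_outlier_stations(station_ids, residuals, threshold=3.5):
    """Return IDs of stations whose residual is far from the median residual.

    Robust z-score: |r - median| / (1.4826 * MAD), where MAD is the median
    absolute deviation. The 3.5 cut-off is a rule of thumb.
    """
    med = statistics.median(residuals)
    mad = statistics.median(abs(r - med) for r in residuals)
    if mad == 0:
        return []  # no spread: robust scores are undefined
    scale = 1.4826 * mad
    return [sid for sid, r in zip(station_ids, residuals)
            if abs(r - med) / scale > threshold]


ids = ["s1", "s2", "s3", "s4", "s5", "s6"]   # hypothetical station IDs
res = [0.1, -0.2, 0.3, -0.1, 0.0, 4.0]       # hypothetical residuals (°C)
print(flag_outlier_stations(ids, res))       # -> ['s6']
```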
Daily tmax for 20000101
Daily tmax prediction using monthly averages over 1980-2000. I am currently rerunning it with monthly averages over 1980-2010.
Daily tmax for 20000101
Add the number of stations used for training and testing!
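Reporting training and testing counts is easiest if the split function returns both sets and logs their sizes. A minimal sketch of a random hold-out split by station, assuming a hypothetical 70/30 proportion and fixed seed; the sampling functions in the scripts may use other proportions or repeated sampling.

```python
import random


def split_stations(station_ids, train_fraction=0.7, seed=100):
    """Randomly split station IDs into training and testing sets and report counts."""
    ids = sorted(set(station_ids))   # de-duplicate, fixed order before shuffling
    rng = random.Random(seed)        # local generator for a reproducible split
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    train, test = ids[:n_train], ids[n_train:]
    print(f"training stations: {len(train)}, testing stations: {len(test)}")
    return train, test


# 52 hypothetical station IDs.
train, test = split_stations([f"VE{i:03d}" for i in range(52)])
```

With 52 stations this prints `training stations: 36, testing stations: 16`.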
Daily tmax for 20000101
January, using monthly averages for the 1980-2000 time window.
Daily tmax for 20000101
Overall average of the metrics. An improvement is expected when using monthly tmax averages for the 1980-2010 period.
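The yearly summaries are the mean and median of each model's daily metric. Counting the days that actually have a value makes gaps visible, such as the 93 predicted days for model 3 in year 2000. A minimal sketch, assuming hypothetical `(date, model, value)` rows with `None` for days without a prediction.

```python
import statistics
from collections import defaultdict


def summarize_daily_metric(rows):
    """Summarize a daily metric (e.g. RMSE) over a year, per model.

    rows: iterable of (date, model, value); value is None when a model
    produced no prediction that day. Missing days are skipped, and the
    number of days with a value is reported.
    """
    values = defaultdict(list)
    for _date, model, value in rows:
        if value is not None:
            values[model].append(value)
    return {
        model: {"n_days": len(v), "mean": statistics.mean(v), "median": statistics.median(v)}
        for model, v in values.items()
    }


rows = [  # hypothetical daily RMSE values (°C)
    ("20000101", "mod1", 2.0), ("20000102", "mod1", 3.0), ("20000103", "mod1", 4.0),
    ("20000101", "mod3", 2.0), ("20000102", "mod3", None), ("20000103", "mod3", 2.5),
]
print(summarize_daily_metric(rows))
```

Comparing `n_days` across models before comparing means or medians avoids ranking a model on a smaller, possibly easier, subset of days.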
SOME THOUGHTS AND PRELIMINARY CONCLUSIONS FROM THE CURRENT PREDICTIONS
At this stage, after running four one-year predictions:
1) We are in the same RMSE range as in Oregon (2.30 °C): 2.07 to 2.7 °C for mod_kr.
2) Using a longer time window to calculate monthly averages per station increases the number of stations but does not improve the RMSE in all cases.
3) Due to the paucity of data and the demands of the GAM method, it is hard to fit models with more than 3 variables or with interactions.
4) Best models so far: GAM + simple kriging, and GAM with LST and elevation with nesting. Note that GAM with LST and elevation may suffer from a lack of data (see the predictions for year 2000).
Predictions for year 2000 will be rerun using 1980-2010 and 2000-2010 averages for comparison.