NPS • Robert DeFeo, chief horticulturalist for the National Park Service, is responsible for predicting when the cherry blossoms will bloom. • He has been making predictions and recording observations since 1992 • He defined and recorded 6 phases of development: Green Color in Buds, Florets Visible, Extension of Florets, Peduncle Elongation, Puffy White, Peak Bloom
NPS • DeFeo attempts to predict when the trees will reach peak bloom so that it can be matched to the Cherry Blossom Festival • The default prediction is April 4th • He announces his prediction 2 weeks before the bloom
NPS • What factors does DeFeo consider? • Bloom timing of other plants • High and low temperatures • Photoperiod
Collecting the Data • DeFeo provided bloom dates going back to 1921 and dates for all 6 stages going back to 1992
Collecting the Data • Weather data was obtained from the Weather Underground website • Data was collected by screen scraping • Weather records begin in 1948 • Data is incomplete for 1948, 1994, 1995, 1996, 2000, and 2001
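One way to flag incomplete years like those listed above is to count the distinct days recorded in each year and compare that with the length of the calendar year. This is a minimal sketch, assuming the scraped records have already been parsed into `datetime.date` objects; `incomplete_years` is a hypothetical helper, not part of the original program.

```python
from datetime import date


def incomplete_years(observation_dates):
    """Return the sorted years (among those with any records) missing at least one day.

    `observation_dates` is an iterable of datetime.date objects, one per daily record;
    duplicate dates are counted once.
    """
    days_recorded = {}
    for d in set(observation_dates):
        days_recorded[d.year] = days_recorded.get(d.year, 0) + 1
    missing = []
    for year, n_days in days_recorded.items():
        # Length of the calendar year: 366 in leap years, 365 otherwise.
        year_length = (date(year + 1, 1, 1) - date(year, 1, 1)).days
        if n_days < year_length:
            missing.append(year)
    return sorted(missing)
```

Years with no records at all never appear in the input, so they would need a separate check against the expected range of years.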
Visualizing the Data • Year vs. average March temperature
Visualizing the Data • Year vs. bloom date
Visualizing the Data • Bloom date vs. temperature
What can we learn from this? • The data “looks” linear • Strong correlation between temperature and the date of peak bloom
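The strength of a linear relationship like this one can be quantified with the Pearson correlation coefficient r; for a simple linear regression with one predictor and an intercept, R² equals r². The sketch below is illustrative only, and `pearson_r` is a hypothetical helper rather than part of the original analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two spreads (un-normalized standard deviations).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

Passing March average temperatures as `xs` and bloom day-of-year values as `ys` would give a negative r if warmer Marches bring earlier blooms; the slides describe the correlation as strong but do not report an r value.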
Heuristics • Growing degree days (GDD) • Given by the equation: • GDD = (Thigh + Tlow)/2 - Tbaseline • Calculated accumulated GDD using baseline temperatures from 0°F to 60°F in 10° increments • Used linear regression • Found that a 0°F baseline produced the best adjusted R² value
Calculated the accumulated GDD from Jan 1 to the bloom date • Wrote a program that set aside records for 26 of the 53 years • Performed linear regression on the remaining years • Calculated RMSE on the held-out (cross-validation) set
Heuristics • Regression: -0.0242095469666847x+88.8862936319553 • Total error: 7.2 • March 1st error: 5.7 • March 15th error: 5.8 • Bloom Date error: 8.5
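The slides do not define x and y in the regression equations. Judging by the ANN slide, whose output is days until bloom, a plausible reading is that x is accumulated GDD (0°F baseline) and y is the number of days remaining until peak bloom. Under that assumption, a model like this one can be turned into a calendar prediction; `predict_bloom_date` is a hypothetical helper.

```python
from datetime import date, timedelta

# Coefficients copied from the first GDD regression slide.
SLOPE = -0.0242095469666847
INTERCEPT = 88.8862936319553


def predict_bloom_date(as_of: date, accumulated_gdd: float,
                       slope: float = SLOPE, intercept: float = INTERCEPT) -> date:
    """Predict the peak-bloom date from GDD accumulated since January 1 up to `as_of`.

    Assumption (not stated on the slides): x is accumulated GDD with a 0°F baseline
    and y is the number of days remaining until peak bloom.
    """
    days_left = slope * accumulated_gdd + intercept
    return as_of + timedelta(days=round(days_left))
```

For example, with no GDD accumulated on January 1, 2001, only the intercept contributes: it rounds to 89 days, so `predict_bloom_date(date(2001, 1, 1), 0.0)` returns March 31, 2001.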
Heuristics • Used linear regression on the same data but excluded January and February from the regression model • Regression: -0.0216093918184858x+84.1086368228915 • Total error: 5.9 • March 1st error: 5.9 • March 15th error: 6.3 • Bloom Date error: 5.6
Heuristics • Used linear regression on the same data but excluded January from the regression model • Regression: -0.0217918555433408x+85.0674318813992 • Total error: 6.2 • March 1st error: 6.6 • March 15th error: 6.2 • Bloom Date error: 5.9
Heuristics • Recalculate GDD excluding January and February • Regression: -0.0189188233985223x+33.4894170814518 • Total error: 5.7 • March 1st error: 6.9 • March 15th error: 6.4 • Bloom Date error: 6.4
Calculate GDD beginning February 1st • Regression: -0.022215061163981x+61.0588653664869 • Total error: 5.8 • March 1st error: 6.1 • March 15th error: 5.8 • Peak Bloom error: 5.2
Heuristics • Calculate GDD beginning February 1st. Create regression model starting March 1st. • Regression: -0.021961372553719x+59.6303409305679 • Total error: 5.5 • March 1st error: 5.3 • March 15th error: 5.0 • Peak Bloom error: 4.8
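The variants on these slides differ in two choices: the day GDD accumulation starts and the first day whose record is used to fit the regression. A minimal sketch that exposes both as parameters, assuming each training pair is (accumulated GDD, days until bloom) and daily temperatures are indexed from January 1; the variant labels and `training_pairs` are hypothetical, and where a slide does not say when the regression data starts, the fit start in `VARIANTS` is a guess.

```python
from datetime import date, timedelta

# (GDD start, fit start) as (month, day) per slide variant.
VARIANTS = {
    "GDD from Jan 1, fit all days": ((1, 1), (1, 1)),
    "GDD from Jan 1, fit from Feb 1": ((1, 1), (2, 1)),
    "GDD from Jan 1, fit from Mar 1": ((1, 1), (3, 1)),
    "GDD from Mar 1": ((3, 1), (3, 1)),  # fit start assumed
    "GDD from Feb 1": ((2, 1), (2, 1)),  # fit start assumed
    "GDD from Feb 1, fit from Mar 1": ((2, 1), (3, 1)),
}


def training_pairs(year, highs, lows, bloom, gdd_start, fit_start, t_base=0.0):
    """(accumulated GDD, days until bloom) pairs for one year's records up to bloom.

    highs/lows: daily °F temperatures indexed from January 1 of `year`.
    gdd_start: (month, day) on which GDD accumulation begins.
    fit_start: (month, day) of the first record kept for fitting the regression.
    """
    jan1 = date(year, 1, 1)
    gdd_from = date(year, *gdd_start)
    fit_from = date(year, *fit_start)
    pairs, running = [], 0.0
    for offset, (t_high, t_low) in enumerate(zip(highs, lows)):
        day = jan1 + timedelta(days=offset)
        if day > bloom:
            break  # records after peak bloom are not used
        if day >= gdd_from:
            running += (t_high + t_low) / 2 - t_base
        if day >= fit_from:
            pairs.append((running, (bloom - day).days))
    return pairs
```

Keeping both dates as parameters means every variant reuses the same accumulation loop, so only the `(gdd_start, fit_start)` tuple changes between models.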
Heuristics • Use GDD with an artificial neural network (ANN) • Use accumulated GDD since January 1st as input • Preprocessed the data to create a single lag file covering all years • Processed the data using the CortexPro Neural Networks tool, v.5.0 • Output is days until bloom
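The slides do not describe CortexPro's input format. Assuming "lag file" means a table in which each row holds the last few days of accumulated GDD plus the days-until-bloom target, a single file covering all years could be built as follows; the column names, the `n_lags` default, and both functions are hypothetical.

```python
import csv
import io


def lag_rows(gdd_by_year, days_to_bloom_by_year, n_lags=3):
    """Combine all years into one table of lagged accumulated-GDD inputs.

    Each row is [gdd(t - n_lags + 1), ..., gdd(t), days_to_bloom(t)]; windows never
    span two different years.
    """
    rows = []
    for year in sorted(gdd_by_year):
        gdd = gdd_by_year[year]
        target = days_to_bloom_by_year[year]
        for t in range(n_lags - 1, len(gdd)):
            rows.append(list(gdd[t - n_lags + 1 : t + 1]) + [target[t]])
    return rows


def lag_file_text(rows, n_lags=3):
    """Render the rows as CSV with a header (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f"gdd_lag{k}" for k in range(n_lags - 1, -1, -1)] + ["days_to_bloom"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Building windows per year keeps the end of one season from being fed in as history for the start of the next.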
Heuristics • Use average temperature as an indicator of bloom date • Use linear regression on the average temperature in March • Regression: -1.1550140341924x+147.033044143914 • RMSE: 4.8 • Use linear regression on the average temperature of the first 15 days of March • Regression: -0.505593057443286x+115.921494363343 • RMSE: 6.0
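Assuming the output of these temperature regressions is the bloom day of the year (which fits the April 4th default falling on day 94 of a non-leap year), and that the March average is the mean of daily (high + low)/2 values, the full-March model could be applied as follows. Both assumptions, and the helper names, are not from the slides.

```python
import statistics
from datetime import date, timedelta

# Coefficients copied from the full-March average-temperature slide.
MARCH_SLOPE = -1.1550140341924
MARCH_INTERCEPT = 147.033044143914


def march_average(highs, lows):
    """Mean of the daily (high + low)/2 values for the supplied March days (°F).

    How the slides averaged March temperatures is not stated; this is an assumption.
    """
    return statistics.fmean((t_high + t_low) / 2 for t_high, t_low in zip(highs, lows))


def predict_bloom(year, march_avg_f, slope=MARCH_SLOPE, intercept=MARCH_INTERCEPT):
    """Assumes the regression output is the bloom day of the year (1 = January 1)."""
    day_of_year = round(slope * march_avg_f + intercept)
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)
```

For an illustrative (not measured) March average of 45°F, the model gives about day 95, which `predict_bloom(2001, 45.0)` converts to April 5, 2001.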
Heuristics • Use average bloom date (April 4th) as prediction. • RMSE: 6.5
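The constant-date baseline is simple to score: take each year's offset in days from April 4 and compute the RMSE of those offsets. A minimal sketch; `baseline_rmse` is hypothetical, and it measures offsets against the calendar date in each bloom's own year, so leap years need no special handling.

```python
import math
from datetime import date


def baseline_rmse(bloom_dates, month=4, day=4):
    """RMSE, in days, of predicting the same calendar date (default April 4) every year."""
    offsets = [(bloom - date(bloom.year, month, day)).days for bloom in bloom_dates]
    return math.sqrt(sum(o * o for o in offsets) / len(offsets))
```

Any model worth using should beat this baseline's RMSE on the same years.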
Conclusions • The usefulness of each model depends on the data available • While DeFeo’s model is accurate, effective models were created that do not rely on direct observation of the blossoms • The models were “good enough” for their predictions to fall within the festival’s timespan
Future Work • The models created can be refined as the knowledge base grows • Include a standard measure of error for all models • Include photoperiod as a factor • Incorporate electronic GDD recordings • Include image data with pattern recognition