Introduction to Weather Data Cleaning

Learn essential strategies for cleaning weather data, solve common problems like missing values and errors, and ensure data reliability for accurate forecasting. Explore organization, redundancy, flexibility, and human interaction in the data cleaning process. Contact Speedwell Weather for quality historical weather data solutions worldwide.


Presentation Transcript


  1. Introduction to Weather Data Cleaning
     Weather data cleaning is fundamental to the provision of high quality weather data. Speedwell Weather offer extensive packs of cleaned historical weather data for sites around the world as well as cleaned weather data feeds. We believe we offer a high quality product. This document shows how we approach the task.

  2. Data Cleaning
     Problem: Weather data is not perfect
     • Missing values
     • Erroneous observations
     • Consistency problems
     • Multiple data sets claiming to be for the same weather station
     Solution (1): Ignore the problem
     • Erroneous observations will lead to inaccurate pricing
     Solution (2): Use only “good” stations
     • Difficult to determine what is good without cleaning it
     • Greatly limits your ability to trade
     Solution (3): Clean / fill the data
     • Fill missing values
     • Detect and replace erroneous observations
     • Confirm the consistency of the data
     Never purchase data from anyone unless they can provide satisfactory answers to these questions:
     • What is the original source of the data?
     • What is the observation convention?
     • What are the attributes (lat, lon, elevation)?
     • What has been done to the data?
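The "missing values" problem listed on slide 2 can be made concrete with a small sketch. It assumes a daily record held as a dictionary from date to value, where a day is either absent or stored as None when no observation was reported. The function name `find_missing_days` and this data layout are illustrative assumptions, not part of the Speedwell process.

```python
from datetime import date, timedelta


def find_missing_days(record, start, end):
    """Return every date in [start, end] with no usable observation.

    `record` maps datetime.date -> observed value; a missing key or a
    value of None both count as a missing observation (an assumption
    made for this sketch).
    """
    missing = []
    day = start
    while day <= end:
        if record.get(day) is None:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Hypothetical daily mean temperatures with one empty and one absent day.
sample_record = {
    date(2020, 1, 1): 3.2,
    date(2020, 1, 2): None,
    date(2020, 1, 4): 1.0,
}
gaps = find_missing_days(sample_record, date(2020, 1, 1), date(2020, 1, 4))
```

Days found this way would then be candidates for the filling step described under Solution (3).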

  3. Data Cleaning
     • The quality of meteorological observations varies significantly
     • Missing / erroneous observations are commonplace
     • To safeguard against data problems, use cleaned data where available
     Fundamentals of a proper data cleaning
     • Organization
     • Redundancy
     • Flexibility
     • Human interaction
     • Transparency
     Fundamental to satisfying the above is the implementation of a software systems infrastructure, but data cleaning cannot and SHOULD not be FULLY automated (see 4)
     [Figure: Part of the Speedwell data cleaning process diagram]

  4. Data Cleaning..Organization
     Fundamentals of a proper data cleaning
     • Organization
       - logical flow
       - data management
     • Redundancy
     • Flexibility
     • Human interaction
     • Transparency
     Data preparation → Initial Review → In-depth analysis / data filling → Manual Review → Data delivery
     [Figure: Speedwell data quality types]

  5. Data Cleaning..Redundancy
     Fundamentals of a proper data cleaning
     • Organization
     • Redundancy
       - data sources
       - testing
       - estimates
       - delivery
     • Flexibility
     • Human interaction
     • Transparency
     Data sources: bring in as much as possible and keep what is useful. If one source fails there are others. Typical processing includes:
     • Climate data (daily / hourly)
     • Synoptic data
     • METAR
     • ECMWF forecast data
     • Climatology
     Testing: no one test is applicable for all situations.
     • Comparison against itself
     • Physical consistency
     • Statistical probability
     • Comparison against neighbors
       - Observations are compared against the median of a basket of proxies and the MAD (median absolute deviation). If the observation is statistically different from the surrounding stations, it is sent to the filling process.
     Estimates (filling): why have one when you can have many? Useful for more in-depth manual analysis.
     Data delivery
     • Multiple FTP deliveries
     • 24-hour support
     • Logging of all deliveries
     • Description of data quality and type
     A fundamental prerequisite for effective data cleaning is access to a library of weather data covering nearby sites, which allows plausibility testing for the site being cleaned.
     Speedwell Weather maintains a very large inventory of weather data for over 50 different weather elements. This is all warehoused by us in a manner that fully respects differing data types (Synoptic/Climate, Cleaned/Raw etc.) with a full audit trail. This allows us to document data point changes which may occur when national met offices change data records to reflect their internal QC procedures.
     [Figure: Example of weather variables stored for a single site]
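The neighbor comparison on slide 5 (median of a basket of proxies plus the MAD) can be sketched as follows. This is a minimal illustration, not Speedwell's implementation: the function name `differs_from_neighbours`, the cut-off of 3.5, and the `min_scale` floor are assumptions chosen for the example. The factor 1.4826 is the standard constant that makes the MAD comparable to a standard deviation for normally distributed data.

```python
from statistics import median


def differs_from_neighbours(observation, proxies, threshold=3.5, min_scale=0.1):
    """Flag an observation that sits far from the median of its proxies.

    `proxies` holds same-day values from surrounding stations.
    `threshold` (robust z-score cut-off) and `min_scale` (floor on the
    spread, guarding against a MAD of zero) are assumed values for
    this sketch, not documented settings.
    """
    centre = median(proxies)
    # Median absolute deviation of the proxy basket around its median.
    mad = median([abs(p - centre) for p in proxies])
    # Scale the MAD to standard-deviation units and apply the floor.
    scale = max(1.4826 * mad, min_scale)
    robust_z = abs(observation - centre) / scale
    return robust_z > threshold


# Hypothetical basket of neighbouring daily maximum temperatures (deg C).
basket = [12.1, 11.8, 12.4, 12.0, 11.9]
plausible = differs_from_neighbours(12.3, basket)   # close to the basket
suspicious = differs_from_neighbours(19.5, basket)  # far from the basket
```

An observation flagged by a test like this would be passed to the filling process rather than silently replaced, which keeps the manual review step in the loop.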

  6. Data Cleaning..Flexibility
     Fundamentals of a proper data cleaning
     • Organization
     • Redundancy
     • Flexibility
       - consider the situation
       - appropriateness of tests
     • Human interaction
     • Transparency
     Estimate #1: Surrounding station regression using deseasonalized data
     Estimate #2: Estimates of daily observations from hourly observations (curve fitting)
     Estimate #3: Estimates of daily observations by manipulating other data types (Synoptic, METAR, ½ hourly)
     Estimate #4: Day +1 forecasts can actually be very good…
     Estimate #5: Climatology – worst case scenario
     Estimate #6, #7, #8,…: Flexibility allows you to add any appropriate estimates. The possibilities are unlimited.
     • satellite derived values
     • installed stations
     • reanalysis
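Estimate #1 on slide 6 (surrounding station regression using deseasonalized data) can be illustrated with a single-neighbor sketch. It assumes that "deseasonalized" means the observation minus a climatological normal for that calendar day, and it uses ordinary least squares with one neighbor; the slide does not specify either choice. The function names and the sample numbers are hypothetical.

```python
def fit_anomaly_regression(target_anoms, neighbour_anoms):
    """Least-squares fit: target anomaly = intercept + slope * neighbour anomaly.

    Both inputs are deseasonalized series (observation minus the
    climatological normal for that day) over the same days.
    """
    n = len(target_anoms)
    mean_x = sum(neighbour_anoms) / n
    mean_y = sum(target_anoms) / n
    sxx = sum((x - mean_x) ** 2 for x in neighbour_anoms)
    if sxx == 0:
        raise ValueError("neighbour anomalies have no variance")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(neighbour_anoms, target_anoms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_from_neighbour(neighbour_obs, neighbour_clim, target_clim,
                            slope, intercept):
    """Estimate the target value, then add the target's seasonal normal back."""
    neighbour_anom = neighbour_obs - neighbour_clim
    return target_clim + intercept + slope * neighbour_anom


# Hypothetical training anomalies (deg C) on days where both stations reported.
slope, intercept = fit_anomaly_regression(
    target_anoms=[-1.0, -0.5, 0.0, 0.5, 1.0],
    neighbour_anoms=[-2.0, -1.0, 0.0, 1.0, 2.0],
)
# Fill a missing target day: neighbour observed 18.0 against a normal of 15.0,
# while the target station's normal for that day is 14.0.
filled = estimate_from_neighbour(18.0, 15.0, 14.0, slope, intercept)
```

Working in anomalies keeps the shared seasonal cycle from inflating the apparent relationship between the two stations. A production version would presumably use several neighbors and sit alongside the other estimates listed on the slide.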

  7. Data Cleaning..the Human element and Transparency
     Fundamentals of a proper data cleaning
     • Organization
     • Redundancy
     • Flexibility
     • Human interaction
       - meteorology is complicated
       - introduction of non-automated information
     • Transparency
       - explanation of the process
       - share what has been cleaned
       - no-one likes “black boxes”

  8. Contact Us
     Regarding world-wide weather data and forecast matters please see www.SpeedwellWeather.com or contact:
     • Phil Hayes: phil.hayes@SpeedwellWeather.com
     • David Whitehead (U.S): david.whitehead@SpeedwellWeather.com
     Telephone: UK office: +44 (0) 1582 465 551; US office: +1 (0) 703 535 8801
     Address UK: Mardall House, Vaughan Rd, Harpenden, Herts, AL5 4HU
     Address USA: 101 N Columbus Street, Second Floor, Alexandria VA 22314 USA
     Regarding software and consultancy services please see www.SpeedwellWeather.com or contact:
     • Stephen Doherty: stephen.doherty@SpeedwellWeather.com
     • Dr Michael Moreno: michael.moreno@SpeedwellWeather.com
     • David Whitehead (U.S): david.whitehead@SpeedwellWeather.com
     Telephone: UK office: +44 (0) 1582 465 569; US office: +1 (0) 703 535 8800
     Speedwell Weather Derivatives Limited is authorised and regulated by the Financial Services Authority. Registered Offices: Mardall House, 9-11 Vaughan Road, Harpenden, Herts AL5 4HU, UK. Company No 3790989.
