Some remarks to the derivation of mean hourly values with incomplete data
Pavel Hejda, Institute of Geophysics of the ASCR, Prague, Czech Republic
Introduction
• The papers and discussions during the XIIIth IAGA Observatory Workshop in Golden revealed a diverse set of views on how to compute mean hourly values (MHVs) from one-minute data when some minutes of the hour are missing. As a result, a task force was established to look into the problem and report back to IAGA. Its members are Pavel Hejda (Prague), Don Herzog (Boulder), Hans-Joachim Linthe (Niemegk), Mioara Mandea (Potsdam), Jean-Jacques Schott (Strasbourg) and Leif Svalgaard (Stanford).
• The group has been focusing on three main areas of concern:
  • How different institutes compute hourly means when data are missing.
  • The various parameters involved in determining the accuracy of MHVs.
  • How to determine the level of accuracy that can be accepted as a standard.
MHVs with incomplete data
IAGA resolutions on MHV production and publication

Resolution No. 6 (1967): Making all hourly value data available through the World Data Centers
The IAGA recommends that, to prevent unnecessary repetition of effort in the preparation of data for determinations of the main field, all mean hourly value data of the geomagnetic field elements be made available in a machine-readable form through the World Data Centers.

Resolution No. 1 (1969): Continuous publication of magnetic observatory yearbook
The IAGA, recognising the availability of many observatory results on microfilm through the World Data Centers, nevertheless recommends the continued publication of magnetic observatory yearbooks containing hourly values and important related data, as described in a recommendation of the Rome IATME 1954 Meeting (IATME Bulletin No. 15, p. 392).

Resolution No. 26 (1971): World Magnetic Archive
The IAGA recommends that, as a contribution to the World Magnetic Archive, numerical magnetic observatory data, past and current, be put into machine-readable form whenever practicable for transmittal to a WDC, and that pre-IGY magnetograms and hourly-value tables be microfilmed for transmittal to a WDC.

Mean hourly values were the basic data in the pre-digital era.
Publication of mean hourly values
• INTERMAGNET: always together with 1-minute data
• World Data Centres (WDC), WDC Data Exchange Format: separately for 1-minute and hourly values

HOURLY VALUES
DIGITAL WDC EXCHANGE FORMAT FOR OBSERVATORY HOURLY MEAN VALUES
Magnetic data organized in 1-year files, with the information coded in ASCII.

COLUMNS   FORMAT   DESCRIPTION
1-3       A3       Observatory 3-letter code.
4-5       I2       Year. Last 2 digits, 1996 = 96. See also columns 15-16.
. . .
21-116    24I4     24 4-digit hourly mean values for the day. The values are in
                   tenth-minutes for D and I, and in nanoTeslas for the intensity
                   elements. The first hourly mean value represents the mean value
                   between 00:00 UT and 01:00 UT, ..., the 24th value represents the
                   mean between 23:00 UT and 24:00 UT. A missing value is identified
                   by 9999.
117-120   I4       Daily Mean. If any of the hourly mean values for the day are
                   missing, 9999 will appear as the daily mean.
121-122            Record end marker. Two chars 'cr' = 13 and 'nl' = 10.
. . .
There is a rule for missing data in the WDC hourly format: a missing hourly value is coded as 9999, and the daily mean is 9999 if any hourly value of the day is missing.
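The column layout and the missing-data rule of the hourly format can be illustrated with a minimal Python sketch. It reads only the fields listed on the slide (columns 1-3, 4-5, 21-116 and 117-120); the elided columns are skipped, and the function names and the rounding of the daily mean are assumptions made for illustration, not part of the format description.

```python
MISSING = 9999  # missing-value code stated in the format description


def parse_hourly_record(record):
    """Parse the fields of one WDC hourly-mean record that the slide lists."""
    line = record.rstrip("\r\n")          # drop the 'cr' 'nl' record end marker
    code = line[0:3]                      # columns 1-3, A3
    year2 = int(line[3:5])                # columns 4-5, I2
    # columns 21-116, 24I4: 24 four-digit hourly mean values
    hourly = [int(line[20 + 4 * i: 24 + 4 * i]) for i in range(24)]
    daily = int(line[116:120])            # columns 117-120, I4
    return {"code": code, "year2": year2, "hourly": hourly, "daily": daily}


def daily_mean(hourly):
    """Apply the stated rule: 9999 if any hourly value of the day is missing."""
    if any(v == MISSING for v in hourly):
        return MISSING
    # ASSUMPTION: rounding to the nearest integer; the slide does not say how
    # the daily mean is rounded.
    return round(sum(hourly) / len(hourly))
```

A record with a single hourly value of 9999 therefore yields a daily mean of 9999, however complete the rest of the day is.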
1-MINUTE VALUES
DIGITAL WDC EXCHANGE FORMAT FOR OBSERVATORY 1-MINUTE VALUES
Magnetic data organized in 1-month files, with the information coded in ASCII.

COLUMNS   FORMAT   DESCRIPTION
1-6       I6       Observatory's North Polar distance.
. . .
35-394    60I6     60 6-digit 1-minute values for the given element for that data
                   hour. The values are in tenth-minutes for D and I, and in
                   nanoTeslas for the intensity elements.
395-400   I6       Hourly Mean. The average of the preceding 60 1-minute values.
401-402            Record end marker. Two chars 'cr' = 13 and 'nl' = 10.
. . .
All values required? The 1-minute format defines the hourly mean as the average of the preceding 60 1-minute values and does not say what to do when some of them are missing.
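The open question in the 1-minute format can be made concrete with a short Python sketch that reads the 60 minute values and the stored hourly mean from one record and counts the missing minutes. The excerpt does not state the missing-value code for 1-minute values, so the sentinel 999999 used here is an assumption, as are the function names.

```python
# ASSUMPTION: the slide excerpt does not give the missing-value code for
# 1-minute values; 999999 (the 6-digit analogue of 9999) is used for illustration.
MINUTE_MISSING = 999999


def parse_minute_record(record):
    """Return (60 one-minute values, stored hourly mean) from one record."""
    line = record.rstrip("\r\n")
    # columns 35-394, 60I6: sixty 6-digit 1-minute values
    minutes = [int(line[34 + 6 * i: 40 + 6 * i]) for i in range(60)]
    hourly = int(line[394:400])           # columns 395-400, I6
    return minutes, hourly


def count_missing(minutes):
    """Number of minutes in the hour that carry the missing-value code."""
    return sum(1 for v in minutes if v == MINUTE_MISSING)
```

Whether the stored hourly mean should be a value or a missing code when `count_missing` returns, say, 1 or 5 is exactly what the format description leaves undecided.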
Publication of mean hourly values
• INTERMAGNET: always together with 1-minute data
• World Data Centres (WDC), IAGA2002 Data Exchange Format: no link between hourly and 1-minute data

The 12 mandatory file header records
• This format is designated IAGA-2002.
• Source of Data is the name of the institute responsible for collecting the data.
• Please spell the entire station name; do not use abbreviations. Capitalize the first letter.
• The IAGA Code is the official IAGA 3-letter station code. It should be in capital letters and correspond to the IAGA list of magnetic observatories. Variation stations must check observer-suggested 3-letter codes against the IAGA list (WDC SEG, Boulder) and confirm through the IAGA Division V WG1, or leave the code blank.
• Location of the station is reported to one thousandth of a degree in geodetic latitude (positive north) from -90 to 90 degrees and in geodetic longitude (positive east) from -180 to 180 or 0 to 360 degrees. Report elevation in meters above mean sea level.
• Reported refers to the magnetic field elements contained in the data record, in the order recorded in the data record. Valid values are DHIF, DHZF, and XYZF. Use E/V instead of D/I for declination/inclination given in intensity units (ONLY if data type is variation).
• Sensor Orientation is the physical orientation of the observing instruments, e.g. XYZF, HDZ.
• Digital Sampling is the rate (in seconds) of the data sampling of the magnetic field sensor (instrument) or the digitizing interval for analogue data.
• Data interval type is the mean or instantaneous time interval of the data. Common values include 1-minute (00:30-01:29), 1-minute (00:00-00:59), 1-hour (00-59), 1-day (00-23) and 1-month (01-31); the last day could also be 30, 29, or 28. There are many possible intervals, including a fraction of a second (instant value), averages by 1-second (501-1500), 1-second (0-1000), 10 second, or 2.5 minute. Define the type of mean and how values are centered in the comment section.
• Data type is provisional, definitive, or variation.
IAGA-2002: no rule for missing data.
MHVs with incomplete data
There is no IAGA resolution on this issue.
From the Working Group V-OBS meeting held during the XXIV IUGG General Assembly, Perugia 2007:
'The IAGA requirements for the calculation of hourly mean values have been discussed by Jean-Jacques Schott, using the Kerguelen observatory as an example during the period 1999-2003. It was shown that a correlation exists between data scatter and well-chosen magnetic activity, arguing that no simple rule for hourly mean values does exist. It was proposed that means can only be determined if sufficient data exist (> 90%), and that higher frequency data may be another solution. A uniform standard, preferably INTERMAGNET, should be applied by all observatories.'
What is the practice?
Gaps in observatory data and MHVs calculation
How serious is the problem? To answer this question, I have analyzed the INTERMAGNET CD-ROMs for the years 2004, 2005 and 2006 (about one hundred observatories). I was interested not only in data completeness but also in how observatory staff treat the calculation of MHVs for incomplete data sets.
In this analysis I do not take into account the fact that data gaps are sometimes present in only one component, because that would make the results less transparent. A gap means that values were missing in at least one component.
The tables below demonstrate the variety of gap patterns. The next slide shows the most gappy yearly records in 2006.
Gaps in observatory data and MHVs calculation
Summary for INTERMAGNET CD-ROMs 2004-2006
Number of observatories with complete data sets: 19 (2006), 28 (2005), 27 (2004).
The effects of missing data on mean hourly values: review
• Mandea, M., 2003, 60, 59, 58, … How many minutes for a reliable hourly mean?, Proceedings of the Xth IAGA Workshop, Hermanus, 112-120.
• Schott, J., and Linthe, H.J., 2007, The hourly mean computation problem revisited, Publication of the Institute of Geophysics, Polish Academy of Sciences, C-99 (398).
• Herzog, D.C., 2009, The effects of missing data on mean hourly values, Proceedings of the XIIIth IAGA Workshop, Golden CO.
Mandea, M., 2003, 60, 59, 58, … How many minutes for a reliable hourly mean?, Proceedings of the Xth IAGA Workshop, Hermanus, 112-120.
120 days of 1999 were analysed, chosen as the quietest and the most disturbed days of each month. To get an idea of how the field amplitude influences the result, artificial gaps (from one minute to half an hour) were created and statistical comparisons with the complete datasets were made.
[Figures: differences between MHVs and mean values with 1 to 30 missing data, for one quiet (green) and one disturbed (red) day.]
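The kind of comparison described above can be sketched in a few lines of Python. This is not Mandea's code: the function name is hypothetical, and the slide does not say where in the hour the artificial gaps were placed, so the sketch assumes one contiguous gap and scans every possible start position, reporting the worst case for each gap length.

```python
def gap_differences(minute_values, max_gap=30):
    """Worst-case |gappy mean - full mean| for each contiguous gap length.

    minute_values: list of the 60 one-minute values of a complete hour.
    ASSUMPTION: a single contiguous gap, tried at every start position.
    """
    n = len(minute_values)
    full_mean = sum(minute_values) / n
    worst = {}
    for gap in range(1, max_gap + 1):
        diffs = []
        for start in range(n - gap + 1):
            # Drop the minutes start .. start+gap-1 and average the rest.
            kept = minute_values[:start] + minute_values[start + gap:]
            diffs.append(abs(sum(kept) / len(kept) - full_mean))
        worst[gap] = max(diffs)
    return worst
```

For a perfectly constant hour every difference is zero; for a steady ramp the error grows with the gap length, which is the dependence on field variation that the study examines on quiet and disturbed days.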
Mandea (2003): Conclusions
This study clearly shows that it is difficult to give a solution that is valid everywhere. However, a general rule can be considered: reliable hourly means can be computed from one-minute values if less than 10% of the data are missing. Nevertheless, individual users and research teams must themselves inspect the data before using them in different studies.
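The general rule quoted above translates directly into code. The following Python sketch assumes that missing minutes are represented by `None` and that the function name and the percentage parameter are free choices; it is an illustration of the 10% rule, not a standard adopted by IAGA or INTERMAGNET.

```python
def hourly_mean(minute_values, max_missing_percent=10):
    """Mean of the valid minutes, or None if too many minutes are missing.

    minute_values: the 60 values of one hour, with None marking a missing
    minute (ASSUMPTION: the representation of gaps is not fixed by the source).
    The mean is computed only if strictly less than max_missing_percent
    of the values are missing.
    """
    n = len(minute_values)
    valid = [v for v in minute_values if v is not None]
    missing = n - len(valid)
    # Integer comparison avoids floating-point trouble at the threshold.
    if missing * 100 >= max_missing_percent * n:
        return None
    return sum(valid) / len(valid)
```

With 60 minutes per hour, the rule accepts up to 5 missing values and rejects the hour from 6 missing values onwards.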
Schott, J., and Linthe, H.J., 2007, The hourly mean computation problem revisited, Publication of the Institute of Geophysics, Polish Academy of Sciences, C-99 (398)
The issue is revisited from a statistical point of view. In a first step, relevant statistics for gap and data distribution within hourly intervals are built up using actual time series. Then the statistics are applied to full one-minute field values in order to evaluate the dispersion of the hourly means due to the statistical distribution of gaps and data.
Data from the PAF observatory (Port-Aux-Français, Kerguelen Island) in the range 1999 to 2003 are used for illustration. One quiet and one disturbed day were analyzed.
Schott and Linthe (2007): Conclusions
The study confirms and amplifies conclusions already drawn by Mandea (2002), namely that the error involved depends, besides the length of the gap, on the shape of the field variations and the level of the magnetic disturbance. These features, in turn, depend on the position of the observatory. The influence of the magnetic disturbance upon the confidence interval might be quantified with the help of an appropriate magnetic activity index. Overall, however, the problem of fixing a limit to the tolerable length of gap in an hourly set of data is probably ill-posed. One way of circumventing the difficulty would be to rely on statistical methods that deal properly with missing data.
Herzog, D.C., 2009, The effects of missing data on mean hourly values, Proceedings of the XIIIth IAGA Workshop, Golden CO
• Used 3 USGS stations from the INTERMAGNET CD-ROMs.
• Used the X-component instead of H or D.
• Selection criteria:
  • 3 latitudes: High (College), Mid (Boulder), Low (San Juan)
  • 3 magnetic activity levels (based on K-index): Active (K = 8), Moderate (K = 5), Quiet (K = 0)
• Constructed non-missing data sets of 24-hour days using 3-hour intervals with the same K-index.
• Generated sets of random numbers between 1 and 60 for 5-minute, 10-minute, … up to 40-minute deletions.
Deletion sets were produced using a random number generator.
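A random-deletion experiment of this kind can be sketched in Python as follows. This is not Herzog's code: the function names, the number of trials per deletion size and the seed are assumptions, and the sketch assumes that each deletion set consists of distinct minute positions within the hour.

```python
import random


def deletion_error(minute_values, n_delete, rng):
    """MHV error (gappy mean minus full mean) for one random deletion set."""
    n = len(minute_values)
    # ASSUMPTION: n_delete distinct minute positions are drawn at random.
    deleted = set(rng.sample(range(n), n_delete))
    kept = [v for i, v in enumerate(minute_values) if i not in deleted]
    return sum(kept) / len(kept) - sum(minute_values) / n


def error_profile(minute_values, deletions=range(5, 45, 5), trials=100, seed=0):
    """Largest absolute MHV error per deletion size (5, 10, ..., 40 minutes)."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    return {
        k: max(abs(deletion_error(minute_values, k, rng)) for _ in range(trials))
        for k in deletions
    }
```

Running `error_profile` on complete hours from quiet, moderate and active intervals would give one error profile per activity level, which is the comparison the study makes across the three stations.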
Herzog (2009): Conclusion
It seems clear that a one-rule-fits-all approach to the question of when an MHV should or should not be computed will not be adequate either. There are cases where a relatively small amount of data during an hour will produce a reasonable MHV. In the majority of cases considered here, for example, the errors proved to be less than a few nT. This is not to say that an MHV should always be computed.
Further study is needed to more clearly identify the profile of MHV errors that result from missing data. And the geomagnetism community needs to decide what size errors for MHVs in these cases will be acceptable, and under what conditions.
The usage of mean hourly values
IAGA resolutions

Resolution No. 9 (1967): Automatically constructing ionospheric current charts
The IAGA recommends an investigation of the feasibility of automatically constructing ionospheric current charts for any U.T. epoch, using mean hourly values from a well distributed group of magnetic observatories. After feasibility is proven the IAGA recommends that arrangements be made with a suitable agency to construct and publish such charts for 3 or 4 epochs each Greenwich day, as a routine procedure.

Resolution No. 10 (1979): Supply of observatory magnetic data for MAGSAT project
IAGA, recognising the value of the MAGSAT geomagnetic measurements, urges the continuing support of the geomagnetic observatories and measurements at repeat stations to maintain the high precision of world magnetic charts and recommends that, until the end of 1980, observatories send hourly values (preliminary values, if need be) to the World Data Centers not later than two months after the end of the recording period.
The usage of mean hourly values
ISI Web of Knowledge
The search string was "hourly mean* geomagnetic". The search returned 135 hits.
• Centennial studies of geomagnetic activity (A(h) or IHV indices), aimed at obtaining the longest possible series of homogeneous indices that can characterize the variability of the geomagnetic field. Hourly means were partly derived from 1-minute digital data in order to be comparable with measurements carried out many decades ago. (5 hits)
• Observatory hourly means used in studies of geomagnetic activity, daily variation, the magnetosphere, the auroral electrojet and magnetotellurics. Most papers used data from the sixties and seventies; some papers used recent data from non-INTERMAGNET observatories. At least part of the data came from analogue magnetometers. (40 hits)
• IGRF, jerks: (2 hits)
• Satellite data underpinned by observatory hourly means: (3 hits)
• Reports on observatory systems: (2 hits)
Very preliminary concluding remarks
• A one-rule-fits-all approach to the question of when an MHV should or should not be computed will not be adequate.
• The possibility of assigning some type of quality flag to the MHV has also been discussed.
• The accuracy of MHVs from gappy digital data should be at least as good as that of MHVs from non-gappy analogue data.
• Use original 1-minute data whenever possible.
Your contribution is welcome.