Introduction to BUFR TRAINING ON METEOROLOGICAL TELECOMMUNICATIONS WMO RTC-Turkey facilities, Alanya, Turkey 22-30 September 2010
What is BUFR? • Binary Universal Form for the Representation of meteorological data • Used for data that are not on a regular grid, such as observations • Conceptually equivalent to CREX, but the format is binary rather than alphanumeric
What does a BUFR message look like? 01000010010101010100011001010010000000000000000000110100000000110000000000000000 00010010000000000000000000111000000000000000000000000000000000000000100100000001 00000001000001000001110100001100000000000000000000000000000000000000111000000000 00000000000000011000000000000001000000010000000100000010000011000000010000000000 00000000000000000000100000000000100100001111010111011100010000000011011100110111 0011011100110111 (In other words, just an apparently random string of 0’s and 1’s!)
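The string only looks random. It is a sequence of octets, and grouping the bits eight at a time reveals the structure. The sketch below does that grouping in Python. The slides contain no code, so the language choice is ours, and the function name `bits_to_octets` is hypothetical.

```python
# The bit string exactly as printed on the slide, split into lines only for readability.
BITS = (
    "01000010010101010100011001010010000000000000000000110100000000110000000000000000"
    "00010010000000000000000000111000000000000000000000000000000000000000100100000001"
    "00000001000001000001110100001100000000000000000000000000000000000000111000000000"
    "00000000000000011000000000000001000000010000000100000010000011000000010000000000"
    "00000000000000000000100000000000100100001111010111011100010000000011011100110111"
    "0011011100110111"
)


def bits_to_octets(bits: str) -> bytes:
    """Group a string of '0'/'1' characters into big-endian octets."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length is not a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = bits_to_octets(BITS)
print(len(message))      # 52 octets
print(message[:4])       # b'BUFR'
print(message[-4:])      # b'7777'
print(message.hex(" "))  # the same octets written in hexadecimal
```

The first four octets already decode as the ASCII text "BUFR", and the last four decode as "7777". Both markers are explained in the following slides.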
Sections of a BUFR message • 0 Indicator section • 1 Identification section • 2 Optional local use section • 3 Data description section • 4 Data section • 5 End of message
Section 0 – Indicator section This section contains: • The character string “BUFR” indicating the start of the message • The total length of the message • The BUFR edition number
Section 0 - Details • Length always 8 • Octets 1-4 “BUFR” (in CCITT IA5) • Octets 5-7 Total length of message (including Section 0) • Octet 8 Edition number (currently 4, but 3 is still used)
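The Section 0 layout above can be read in a few lines. Below is a minimal Python sketch, assuming the whole message is already in memory as `bytes`. `EXAMPLE` is our own hexadecimal transcription of the example bit string from these slides, and `parse_section0` is a hypothetical helper name.

```python
from typing import NamedTuple

# Example message from the slides, re-expressed in hexadecimal (our own conversion).
EXAMPLE = bytes.fromhex(
    "42 55 46 52 00 00 34 03 "                                # Section 0
    "00 00 12 00 00 38 00 00 00 00 09 01 01 04 1d 0c 00 00 "  # Section 1
    "00 00 0e 00 00 01 80 01 01 01 02 0c 04 00 "              # Section 3
    "00 00 08 00 90 f5 dc 40 "                                # Section 4
    "37 37 37 37"                                             # Section 5
)


class Section0(NamedTuple):
    total_length: int  # octets 5-7: length of the whole message, Section 0 included
    edition: int       # octet 8


def parse_section0(msg: bytes) -> Section0:
    """Read the fixed 8-octet indicator section."""
    if len(msg) < 8 or msg[:4] != b"BUFR":  # octets 1-4, CCITT IA5
        raise ValueError("not a BUFR message")
    total_length = int.from_bytes(msg[4:7], "big")  # 24-bit unsigned integer
    return Section0(total_length, msg[7])


print(parse_section0(EXAMPLE))  # Section0(total_length=52, edition=3)
```

So the example is a 52-octet, edition 3 message.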
Now, let’s go back and look at that BUFR message again…
Section 0, octets 1-8 (octets 1-4 are ‘B’ ‘U’ ‘F’ ‘R’):
01000010 01010101 01000110 01010010 00000000 00000000 00110100 00000011
Remainder of the message:
00000000 00000000 00010010 00000000 00000000 00111000 00000000 00000000 00000000 00000000
00001001 00000001 00000001 00000100 00011101 00001100 00000000 00000000 00000000 00000000
00001110 00000000 00000000 00000001 10000000 00000001 00000001 00000001 00000010 00001100
00000100 00000000 00000000 00000000 00001000 00000000 10010000 11110101 11011100 01000000
00110111 00110111 00110111 00110111
Section 1 – Identification section This section contains: • The table versions referred to by this message • An overall description of the message contents, including: • The originating centre and sub-centre • The data category and sub-category • A representative date and time • Whether or not the optional section is included
Section 1 – Details, BUFR edition 3 • Length at least 18 • Octets 1-3 Length of section • Octet 4 Master table (0 for WMO, 10 for IOC, etc.) • Octets 5-6 Originating sub-centre and centre • Octet 7 Update sequence number • Octet 8 Flag (Optional section?) • Octets 9-10 Data category and local data sub-category • Octets 11-12 Master and local table version numbers • Octets 13-17 Date and time typical of message contents • Octets 18-?? Reserved for local use
Section 1 – Details, BUFR edition 4 • Length at least 22 • Octets 1-3 Length of section • Octet 4 Master table (0 for WMO, 10 for IOC, etc.) • Octets 5-8 Originating centre and sub-centre • Octet 9 Update sequence number • Octet 10 Flag (Optional section?) • Octets 11-12 International data category and sub-category • Octet 13 Local data sub-category • Octets 14-15 Master and local table version numbers • Octets 16-22 Date and time typical of message contents • Octets 23-?? Reserved for local use
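The two octet layouts above map directly onto byte offsets: octet k of Section 1 is `s[k - 1]`. The Python sketch below decodes both editions. `parse_section1` and the dictionary keys are hypothetical names of our own. The example message is edition 3, so only that branch is exercised here. The date is reported as stored (year of century for edition 3), and the sketch does not guess the century.

```python
# Example message from the slides (edition 3), in hexadecimal (our own conversion).
EXAMPLE = bytes.fromhex(
    "42 55 46 52 00 00 34 03 "
    "00 00 12 00 00 38 00 00 00 00 09 01 01 04 1d 0c 00 00 "
    "00 00 0e 00 00 01 80 01 01 01 02 0c 04 00 "
    "00 00 08 00 90 f5 dc 40 "
    "37 37 37 37"
)


def parse_section1(msg: bytes) -> dict:
    """Decode Section 1 for BUFR edition 3 or 4 (octet k of the section is s[k - 1])."""
    edition = msg[7]
    s = msg[8:]  # Section 1 starts right after the 8-octet Section 0
    info = {"length": int.from_bytes(s[0:3], "big"), "master_table": s[3]}
    if edition == 3:
        info.update(
            sub_centre=s[4],
            centre=s[5],
            update_sequence=s[6],
            has_section2=bool(s[7] & 0x80),  # bit 1 (most significant) of octet 8
            data_category=s[8],              # Table A
            local_sub_category=s[9],
            master_table_version=s[10],
            local_table_version=s[11],
            datetime=tuple(s[12:17]),        # year of century, month, day, hour, minute
        )
    elif edition == 4:
        info.update(
            centre=int.from_bytes(s[4:6], "big"),
            sub_centre=int.from_bytes(s[6:8], "big"),
            update_sequence=s[8],
            has_section2=bool(s[9] & 0x80),  # bit 1 of octet 10
            data_category=s[10],
            international_sub_category=s[11],
            local_sub_category=s[12],
            master_table_version=s[13],
            local_table_version=s[14],
            # four-digit year, then month, day, hour, minute, second
            datetime=(int.from_bytes(s[15:17], "big"), *s[17:22]),
        )
    else:
        raise ValueError(f"edition {edition} is not handled in this sketch")
    return info


sec1 = parse_section1(EXAMPLE)
print(sec1["length"], sec1["centre"], sec1["data_category"])  # 18 56 0
print(sec1["datetime"])                                        # (1, 4, 29, 12, 0)
```

Data category 0 is "Surface data – land" in Table A. The optional-section flag is clear, so Section 3 follows Section 1 directly in this message.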
Section 2 – Optional section This section is defined by the ADP (Automated Data Processing) centre generating or using the message • It typically contains additional information of use to the ADP centre, such as • Database keys to aid searching for specific data without decoding the message • Anything else a processing centre may find useful
Section 3 – Data description section This section contains: • A count of the number of data subsets (typically individual observations) • Flags indicating whether or not the data are compressed or uncompressed and observed or forecast • A list of descriptors that describe data elements contained in each data subset
Section 3 - Details • Length at least 10 • Octets 1-3 Length of section • Octet 4 Set to zero • Octets 5-6 Number of subsets • Octet 7 Flag (Obs?, Compressed?) • Octets 8-?? List of descriptors • Each descriptor 2 bits F, 6 bits X, 8 bits Y
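To reach Section 3, a decoder skips Section 1 and, if the Section 1 flag says so, Section 2. It then splits each 2-octet descriptor into F, X and Y. The Python sketch below does this, assuming edition 3 or 4. The helper names are hypothetical. The sketch assumes that a trailing odd octet is padding: edition 3 sections have an even length, and that is the case in the example.

```python
# Example message from the slides, in hexadecimal (our own conversion).
EXAMPLE = bytes.fromhex(
    "42 55 46 52 00 00 34 03 "
    "00 00 12 00 00 38 00 00 00 00 09 01 01 04 1d 0c 00 00 "
    "00 00 0e 00 00 01 80 01 01 01 02 0c 04 00 "
    "00 00 08 00 90 f5 dc 40 "
    "37 37 37 37"
)


def section3_offset(msg: bytes) -> int:
    """Byte offset of Section 3: skip Section 0, Section 1 and, if flagged, Section 2."""
    edition = msg[7]
    sec1 = 8
    len1 = int.from_bytes(msg[sec1:sec1 + 3], "big")
    flag_octet = msg[sec1 + (7 if edition == 3 else 9)]  # octet 8 (ed. 3) or 10 (ed. 4)
    offset = sec1 + len1
    if flag_octet & 0x80:  # optional Section 2 present: skip it too
        offset += int.from_bytes(msg[offset:offset + 3], "big")
    return offset


def parse_section3(msg: bytes) -> dict:
    off = section3_offset(msg)
    length = int.from_bytes(msg[off:off + 3], "big")
    flags = msg[off + 6]  # octet 7
    descriptors = []
    # Descriptors start at octet 8, two octets each; an odd trailing octet is padding.
    for i in range(off + 7, off + length - 1, 2):
        value = int.from_bytes(msg[i:i + 2], "big")
        descriptors.append((value >> 14, (value >> 8) & 0x3F, value & 0xFF))
    return {
        "length": length,
        "subsets": int.from_bytes(msg[off + 4:off + 6], "big"),  # octets 5-6
        "observed": bool(flags & 0x80),    # bit 1
        "compressed": bool(flags & 0x40),  # bit 2
        "descriptors": descriptors,
    }


sec3 = parse_section3(EXAMPLE)
print(sec3["subsets"], sec3["observed"], sec3["compressed"])  # 1 True False
print(sec3["descriptors"])  # [(0, 1, 1), (0, 1, 2), (0, 12, 4)]
```

The example carries one uncompressed, observed subset described by 0 01 001, 0 01 002 and 0 12 004.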
Now, let’s go back and look at that BUFR message again…
Section 0, octets 1-8 (octets 1-4 are ‘B’ ‘U’ ‘F’ ‘R’):
01000010 01010101 01000110 01010010 00000000 00000000 00110100 00000011
Section 1, octets 1-18:
00000000 00000000 00010010 00000000 00000000 00111000 00000000 00000000 00000000 00000000
00001001 00000001 00000001 00000100 00011101 00001100 00000000 00000000
Section 3, octets 1-14:
00000000 00000000 00001110 00000000 00000000 00000001 10000000 00000001 00000001 00000001
00000010 00001100 00000100 00000000
Remainder of the message:
00000000 00000000 00001000 00000000 10010000 11110101 11011100 01000000 00110111 00110111
00110111 00110111
Section 4 – Data section This section contains: • The actual data as specified by Section 3 • One of two formats is used • Compressed • Uncompressed • Such data are still packed, but not as efficiently as compressed data usually are
Section 4 - Details • Octets 1-3 Length of section • Octet 4 Set to zero • Octets 5-?? Binary data as specified by Section 3
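For uncompressed data, Section 4 is read as a continuous bit stream. Each descriptor consumes as many bits as its Table B data width, and the value is recovered with the scale and reference value. The slides do not list the Table B entries for the example's descriptors. The sketch below therefore **assumes** these entries: 0 01 001 (WMO block number, 7 bits), 0 01 002 (WMO station number, 10 bits), and 0 12 004 (dry-bulb temperature at 2 m, K, scale 1, reference 0, 12 bits). Check them against the official tables. The helper names are hypothetical.

```python
# Example message from the slides, in hexadecimal (our own conversion).
EXAMPLE = bytes.fromhex(
    "42 55 46 52 00 00 34 03 "
    "00 00 12 00 00 38 00 00 00 00 09 01 01 04 1d 0c 00 00 "
    "00 00 0e 00 00 01 80 01 01 01 02 0c 04 00 "
    "00 00 08 00 90 f5 dc 40 "
    "37 37 37 37"
)

# ASSUMED Table B entries: descriptor -> (name, scale, reference value, data width)
TABLE_B = {
    (0, 1, 1): ("WMO block number", 0, 0, 7),
    (0, 1, 2): ("WMO station number", 0, 0, 10),
    (0, 12, 4): ("Dry-bulb temperature at 2 m (K)", 1, 0, 12),
}
DESCRIPTORS = [(0, 1, 1), (0, 1, 2), (0, 12, 4)]  # as carried in the example's Section 3


def section4_offset(msg: bytes) -> int:
    edition = msg[7]
    off = 8 + int.from_bytes(msg[8:11], "big")            # end of Section 1
    if msg[8 + (7 if edition == 3 else 9)] & 0x80:         # Section 2 present?
        off += int.from_bytes(msg[off:off + 3], "big")
    return off + int.from_bytes(msg[off:off + 3], "big")   # skip Section 3


def read_bits(data: bytes, start: int, width: int) -> int:
    """Return `width` bits starting at bit `start` (bit 0 = MSB of the first octet)."""
    shift = len(data) * 8 - start - width
    if shift < 0:
        raise ValueError("read past the end of the data")
    return (int.from_bytes(data, "big") >> shift) & ((1 << width) - 1)


def decode_uncompressed_subset(msg: bytes, descriptors):
    off = section4_offset(msg)
    length = int.from_bytes(msg[off:off + 3], "big")
    data = msg[off + 4:off + length]  # octets 5 onwards
    values, pos = [], 0
    for d in descriptors:
        name, scale, ref, width = TABLE_B[d]
        raw = read_bits(data, pos, width)
        pos += width
        if raw == (1 << width) - 1:  # all bits set: "missing"
            values.append((name, None))
            continue
        n = raw + ref
        values.append((name, n / 10 ** scale if scale > 0 else n * 10 ** -scale))
    return values


for name, value in decode_uncompressed_subset(EXAMPLE, DESCRIPTORS):
    print(name, value)
```

Under these assumptions the 29 data bits (padded to 32) decode to block 72, station 491, and 295.2 K. Those values are plausible, which suggests the assumed widths line up with the message.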
Now, let’s go back and look at that BUFR message again…
Section 0, octets 1-8 (octets 1-4 are ‘B’ ‘U’ ‘F’ ‘R’):
01000010 01010101 01000110 01010010 00000000 00000000 00110100 00000011
Section 1, octets 1-18:
00000000 00000000 00010010 00000000 00000000 00111000 00000000 00000000 00000000 00000000
00001001 00000001 00000001 00000100 00011101 00001100 00000000 00000000
Section 3, octets 1-14:
00000000 00000000 00001110 00000000 00000000 00000001 10000000 00000001 00000001 00000001
00000010 00001100 00000100 00000000
Section 4, octets 1-8:
00000000 00000000 00001000 00000000 10010000 11110101 11011100 01000000
Remainder of the message:
00110111 00110111 00110111 00110111
Section 5 – End section This section contains: • The character string “7777” indicating the end of the message • Checking for this indicator can be useful to detect some types of data corruption (especially missing bytes in the rest of the message) since the total length of the message is known from Section 0
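The check described above compares two things: where Section 0 says the message ends, and where "7777" actually appears. Below is a minimal Python sketch. `end_section_ok` is a hypothetical name, and `EXAMPLE` is our hexadecimal transcription of the example message.

```python
# Example message from the slides, in hexadecimal (our own conversion).
EXAMPLE = bytes.fromhex(
    "42 55 46 52 00 00 34 03 "
    "00 00 12 00 00 38 00 00 00 00 09 01 01 04 1d 0c 00 00 "
    "00 00 0e 00 00 01 80 01 01 01 02 0c 04 00 "
    "00 00 08 00 90 f5 dc 40 "
    "37 37 37 37"
)


def end_section_ok(msg: bytes) -> bool:
    """True if '7777' sits exactly where the Section 0 total length says it should."""
    if len(msg) < 8 or msg[:4] != b"BUFR":
        return False
    total = int.from_bytes(msg[4:7], "big")
    if total < 12 or len(msg) < total:
        return False  # the message is shorter than it claims to be
    return msg[total - 4:total] == b"7777"


print(end_section_ok(EXAMPLE))                               # True
print(end_section_ok(EXAMPLE[:20] + EXAMPLE[21:]))           # False: one octet lost
print(end_section_ok(EXAMPLE[:20] + EXAMPLE[21:] + b"\0"))   # False: lost octet, same length
```

The third call shows why the position matters. After one octet is lost and a stray octet is appended, the length is right again, but "7777" is no longer where Section 0 says it should be.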
Now, let’s go back and look at that BUFR message one last time!
Section 0, octets 1-8 (octets 1-4 are ‘B’ ‘U’ ‘F’ ‘R’):
01000010 01010101 01000110 01010010 00000000 00000000 00110100 00000011
Section 1, octets 1-18:
00000000 00000000 00010010 00000000 00000000 00111000 00000000 00000000 00000000 00000000
00001001 00000001 00000001 00000100 00011101 00001100 00000000 00000000
Section 3, octets 1-14:
00000000 00000000 00001110 00000000 00000000 00000001 10000000 00000001 00000001 00000001
00000010 00001100 00000100 00000000
Section 4, octets 1-8:
00000000 00000000 00001000 00000000 10010000 11110101 11011100 01000000
Section 5, octets 1-4 (‘7’ ‘7’ ‘7’ ‘7’):
00110111 00110111 00110111 00110111
BUFR Descriptors • Section 3 contains a list of BUFR descriptors • These describe the data elements that are contained in Section 4 • Most descriptors are references to BUFR Tables B, C and D • Using the list of descriptors in Section 3, together with the tables, it is possible to unpack the data in Section 4
Types of BUFR descriptors • Element descriptors (Table B) • Replication descriptors • Operator descriptors (Table C) • Sequence descriptors (Table D) • Specified by 3 numbers in 16 bits (2 octets) • F: 2 bits 0-3 • X: 6 bits 0-63 • Y: 8 bits 0-255
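The 2 + 6 + 8 bit split means a descriptor is just a 16-bit unsigned integer. The Python sketch below packs and unpacks one and prints it in the conventional "F XX YYY" spelling used throughout these slides. The function names are hypothetical.

```python
def pack_descriptor(f: int, x: int, y: int) -> int:
    """Pack F (2 bits), X (6 bits) and Y (8 bits) into one 16-bit value."""
    if not (0 <= f <= 3 and 0 <= x <= 63 and 0 <= y <= 255):
        raise ValueError("F, X or Y out of range")
    return (f << 14) | (x << 8) | y


def unpack_descriptor(value: int) -> tuple:
    """Split a 16-bit descriptor into (F, X, Y)."""
    return value >> 14, (value >> 8) & 0x3F, value & 0xFF


def fxy(f: int, x: int, y: int) -> str:
    """Conventional 'F XX YYY' spelling, e.g. '0 12 004'."""
    return f"{f} {x:02d} {y:03d}"


raw = pack_descriptor(0, 12, 4).to_bytes(2, "big")         # two octets in Section 3
print(raw.hex())                                            # 0c04
print(fxy(*unpack_descriptor(int.from_bytes(raw, "big"))))  # 0 12 004
```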
Element descriptors • Defined by entries in Table B • F is 0 • Each element descriptor describes an encoded value, such as: • The value of a meteorological parameter (e.g. mean sea level pressure, temperature, wind speed) • Instrument details • Location or date and time information • Quality control information
Replication Descriptors • Describe the repetition of one or more element, operator, sequence or other replication descriptors • Used for repetitive data such as the individual levels in vertical soundings or temperature profiles • Can be: • Fixed - the number of repetitions is pre-determined and the same for all data subsets • Variable - the number of repetitions can differ from one subset to the next (i.e. delayed replication)
Replication descriptors - continued • Replication descriptors are defined by three numbers F X Y • F is 1 • X is an integer between 1 and 63 • Defines the number of descriptors to be repeated • Y is an integer between 0 and 255 • Defines how many times the X descriptors are to be repeated • A count of zero indicates delayed replication, where the repeat count is stored in the data section and can change from one data subset to another.
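Fixed replication can be expanded before any data are read, because X and Y are both known from Section 3. The Python sketch below does this expansion. It assumes that X counts the following descriptors literally as they appear in the list. Delayed replication (Y = 0) is deliberately not handled, because its count only becomes known while reading Section 4. The descriptor list in the usage line is hypothetical.

```python
from typing import List, Tuple

Descriptor = Tuple[int, int, int]  # (F, X, Y)


def expand_fixed_replication(descs: List[Descriptor]) -> List[Descriptor]:
    """Expand fixed replication descriptors (F=1, Y>0) into a flat descriptor list."""
    out: List[Descriptor] = []
    i = 0
    while i < len(descs):
        f, x, y = descs[i]
        if f != 1:
            out.append(descs[i])
            i += 1
            continue
        if y == 0:
            raise NotImplementedError("delayed replication: the count is read from Section 4")
        group = descs[i + 1:i + 1 + x]  # the X descriptors to be repeated
        if len(group) < x:
            raise ValueError("replication extends past the end of the list")
        out.extend(expand_fixed_replication(group) * y)  # repeat Y times
        i += 1 + x
    return out


# Hypothetical list: repeat (pressure, temperature) 3 times, then one wind speed.
example = [(1, 2, 3), (0, 7, 4), (0, 12, 1), (0, 11, 2)]
print(expand_fixed_replication(example))
```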
Operator descriptors • Defined by entries in Table C • F is 2 • Describe changes to be made to other descriptors • Operators exist for applications such as: • Changing scale or reference value or data width • Adding quality control or other associated fields • Describing the descriptors to which quality control information applies • Substituting a better value for an element, while retaining the original value
Sequence descriptors • Defined by entries in Table D • F is 3 • Shorthand notations for pre-defined lists of other element, replication, sequence and operator descriptors • Not really necessary, but useful in reducing the overhead involved in transmitting data in BUFR: • Replace a commonly-used sequence of descriptors with a single descriptor, and thereby reduce the overall length of Section 3
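A sequence descriptor is a lookup into Table D, and entries may themselves contain sequence descriptors. The Python sketch below expands them recursively using a tiny stand-in table. The entry (3, 1, 1) is intended to mirror 3 01 001 (WMO block and station numbers) and should be checked against the official Table D. The entry (3, 1, 192) is entirely hypothetical and was made up for this illustration.

```python
from typing import Dict, List, Tuple

Descriptor = Tuple[int, int, int]  # (F, X, Y)

# Tiny stand-in for Table D; (3, 1, 192) is a HYPOTHETICAL sequence for illustration.
TABLE_D: Dict[Descriptor, List[Descriptor]] = {
    (3, 1, 1): [(0, 1, 1), (0, 1, 2)],
    (3, 1, 192): [(3, 1, 1), (0, 12, 4)],
}


def expand_sequences(descs: List[Descriptor]) -> List[Descriptor]:
    """Replace every F=3 descriptor by its Table D contents, recursively."""
    out: List[Descriptor] = []
    for d in descs:
        if d[0] == 3:
            if d not in TABLE_D:
                raise KeyError(f"no Table D entry for {d}")
            out.extend(expand_sequences(TABLE_D[d]))  # sequences may nest
        else:
            out.append(d)
    return out


print(expand_sequences([(3, 1, 192)]))  # [(0, 1, 1), (0, 1, 2), (0, 12, 4)]
```

Here one 2-octet descriptor stands in for the three element descriptors of the example message, which take 6 octets in Section 3. A full decoder must interleave this expansion with replication and operator handling, which this sketch leaves out.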
BUFR tables There are many different tables involved in BUFR: • Table A • Data Categories, used in Section 1 • Table B • Element descriptors, used in Section 3 • Table C • Operator descriptors, used in Section 3 • Table D • Sequence descriptors, used in Section 3 • Code and Flag tables • Numerical values to be encoded where the element values are qualitative, used in Section 4
Table A • Defines the general category of the data contained in the BUFR message • Encoded in Section 1 • Examples of typical entries:
Code figure   Meaning
0             Surface data – land
1             Surface data – sea
2             Vertical soundings (non-satellite)
3             Vertical soundings (satellite)
…             …
6             Radar data
…             …
10            Radiological data
12            Surface data (satellite)
…             …
31            Oceanographic data
Table B • Describes the individual values that are encoded • Element descriptors are grouped according to classes (i.e. X value):
Class number   Class name
01             Identification
02             Instrumentation
…              …
04             Location (time)
05             Location (horizontal-1)
06             Location (horizontal-2)
07             Location (vertical)
…              …
11             Wind and turbulence
12             Temperature
13             Hydrological
14             Radiation and radiance
…              …
19             Synoptic features
20             Observed phenomena
21             Radar data
…              …
33             Quality information
Table B • Columns are: • Table reference • Element name • Unit • Scale • Reference value • Data width (in bits) • Scale, reference value, and bit width are chosen so that the desired range of possible data values can be stored in BUFR as non-negative integers • Preserves the machine-independence of BUFR
Table B reference • Expressed as 3 small numbers F X Y • Used to refer to this descriptor • F is always 0 for an element descriptor • X is in the range 0 to 63 and refers to a broad class of elements • Classes 48 to 63 are reserved for local use • Y is in the range 0 to 255 and refers to the individual descriptor in the class • Within all classes, descriptors 192 to 255 are reserved for local use
Table B element name • Natural language description of the meaning of the value • English (and French, Russian, Spanish) • For example: • Brightness temperature • Total precipitation past 24 hours • Wind speed
Table B unit • The units used for the value • Normally SI units are used • “CCITT IA5” (the international version of ASCII) is used for character data such as identifiers • “Code Table” is used for qualitative data where only one of a set of possible values can be applicable in a given data subset • “Flag Table” is used for qualitative data where more than one of a set of possible values may be applicable in a given data subset • For qualitative data, the coded values are references to the Code and Flag tables
Table B scale • Scale • Power of 10 by which to multiply the data value before packing • Determines the precision with which the data are encoded • A scale of 2 means 2 decimal places of precision (e.g. 273.15) • A scale of –1 means that the data values are rounded to the nearest multiple of 10
Table B reference value • Used to subtract an offset where negative data have to be encoded • Table B contains the value of the offset to be subtracted, already multiplied by 10 to the power of the scale • For example, scale=2 and reference value -9000 mean that -90.00 is to be subtracted before scaling (i.e. -9000 after scaling), allowing values as negative as -90.00 to be represented
Table B data width • The number of bits to be used to encode the value • If all bits are set to ones when encoding (i.e. a value of $2^n - 1$ when $n$ is the data width), then this denotes a “missing” value. • If the scale is $s$, the reference value is $r$, and the data width is $n$, then the representable range of values is: • Minimum: $10^{-s}\, r$ • Maximum: $10^{-s}\,(2^n - 2 + r)$, while $10^{-s}\,(2^n - 1 + r)$ denotes the “missing” value.
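The scale, reference value and data width rules above amount to a small pair of functions. A value is packed as $\mathrm{round}(v \cdot 10^{s}) - r$, and an all-ones field means "missing". The Python sketch below implements both directions and the range formulas, then checks them against the three worked examples that follow. The function names are hypothetical. Rounding uses Python's `round()`, which rounds halves to even. That is our choice, not something the slides specify.

```python
def unscale(n: int, scale: int):
    """Multiply by 10**(-scale); keeps integers when scale <= 0."""
    return n / 10 ** scale if scale > 0 else n * 10 ** -scale


def encode(value, scale: int, ref: int, width: int) -> int:
    """Value -> packed non-negative integer; None is encoded as 'missing' (all ones)."""
    if value is None:
        return (1 << width) - 1
    packed = round(value * 10 ** scale) - ref
    if not 0 <= packed <= (1 << width) - 2:
        raise ValueError(f"{value} is outside the representable range")
    return packed


def decode(packed: int, scale: int, ref: int, width: int):
    """Packed integer -> value; the all-ones pattern decodes as None ('missing')."""
    if packed == (1 << width) - 1:
        return None
    return unscale(packed + ref, scale)


def value_range(scale: int, ref: int, width: int):
    """(minimum, maximum, value corresponding to the 'missing' bit pattern)."""
    return (unscale(ref, scale),
            unscale((1 << width) - 2 + ref, scale),
            unscale((1 << width) - 1 + ref, scale))


print(value_range(1, 0, 12))   # 0 11 002 wind speed:      (0.0, 409.4, 409.5)
print(value_range(1, -1, 14))  # 0 13 023 precipitation:   (-0.1, 1638.1, 1638.2)
print(value_range(0, 0, 9))    # 0 20 003 present weather: (0, 510, 511)
print(encode(12.3, 1, 0, 12), decode(123, 1, 0, 12))  # 123 12.3
```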
Table B examples - continued • 0 11 002 - Wind speed • Scale=1, Reference value=0, Data width=12 • Precision is one decimal place (i.e. 0.1 m s⁻¹) • Minimum representable value is: $10^{-1} \times 0 = 0.0$ m s⁻¹ • Maximum representable value is: $10^{-1} \times (2^{12} - 2 + 0) = 409.4$ m s⁻¹ • “Missing” value is: $10^{-1} \times (2^{12} - 1 + 0) = 409.5$ m s⁻¹
Table B examples - continued • 0 13 023 - Total precipitation past 24 hours • Scale=1, Reference value=-1, Data width=14 • Precision is one decimal place (i.e. 0.1 kg m⁻²) • For this descriptor, -0.1 kg m⁻² is a special value for trace, according to a specific note in Table B • Minimum representable value is: $10^{-1} \times (-1) = -0.1$ kg m⁻² (= trace) • Maximum representable value is: $10^{-1} \times (2^{14} - 2 - 1) = 1638.1$ kg m⁻² • “Missing” value is: $10^{-1} \times (2^{14} - 1 - 1) = 1638.2$ kg m⁻²
Table B examples - continued • 0 20 003 - Present weather • Scale=0, Reference value=0, Data width=9 • Coded values are integers since Scale=0 • Minimum representable value is: $10^{0} \times 0 = 0$ • Maximum representable value is: $10^{0} \times (2^{9} - 2 + 0) = 510$ • “Missing” value is: $10^{0} \times (2^{9} - 1 + 0) = 511$ • One must refer to Code Table 0 20 003 in order to discover the actual meaning of each coded value
0 20 003 – Present Weather Code Table (excerpted)
Code figure   Meaning
0             Cloud development not observed or not observable
1             Clouds generally dissolving or becoming less developed
…             …
10            Mist
11            Patches of shallow fog or ice fog
13            Lightning visible, but no thunder heard
…             …
171           Snow, slight (reported from an AWS)
172           Snow, moderate (reported from an AWS)
173           Snow, heavy (reported from an AWS)
…             …
511           Missing
Code tables vs. Flag tables (choice of one vs. choice of more than one)
0 01 003 – WMO region number (code table)
Code figure   Meaning
0             Antarctica
1             Region I
2             Region II
3             Region III
4             Region IV
5             Region V
6             Region VI
7             Missing value
0 08 001 – Vertical sounding significance (flag table)
Bit number    Meaning
1             Surface
2             Standard level
3             Tropopause level
4             Maximum wind level
5             Significant temperature level
6             Significant wind level
All 7         Missing value
For a Code table, the value stored in Section 4 is the code figure corresponding to the applicable meaning. For a Flag table of N bits, the value stored in Section 4 is $\sum_{b} 2^{N-b}$, summed over the bit numbers $b$ of each applicable meaning. Bit No. 1 is the most significant bit. The least significant bit is set to 0 in order to distinguish “all meanings applicable” from “missing”.
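The flag table rule ($\sum_b 2^{N-b}$, with bit 1 as the most significant bit) can be checked with a short Python sketch. It uses the 7-bit 0 08 001 table above. The helper names are hypothetical.

```python
def flag_value(bits, width: int) -> int:
    """Value for a flag table of `width` bits with the given bit numbers set (bit 1 = MSB)."""
    value = 0
    for b in bits:
        if not 1 <= b < width:  # bit `width` (the LSB) is only set in the 'missing' pattern
            raise ValueError(f"bit {b} cannot carry a meaning")
        value |= 1 << (width - b)
    return value


def flag_bits(value: int, width: int):
    """Inverse of flag_value; None for the all-ones 'missing' pattern."""
    if value == (1 << width) - 1:
        return None
    return [b for b in range(1, width + 1) if value & (1 << (width - b))]


# 0 08 001 (7 bits): surface only, then tropopause + maximum wind level
print(flag_value({1}, 7))     # 64
print(flag_value({3, 4}, 7))  # 24
print(flag_bits(24, 7))       # [3, 4]
print(flag_bits(127, 7))      # None (missing)
```

With all six meanings applicable the value is 126, one less than the "missing" pattern 127. That gap is exactly what keeping the least significant bit at 0 provides.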
Some other important regulations pertaining to Table B • Elements in classes 01 – 09 are “coordinate” descriptors, which remain in effect until redefined or until the end of the subset • Exception: when two identical descriptors from classes 04 – 07 are listed consecutively, they define the boundaries of a range • Similar descriptors exist in “coordinate” vs. “non-coordinate” classes • Example: 0 07 004 and 0 10 004 are both “Pressure” with identical scale, reference value and bit width; however, the former is a “coordinate” for use when pressure is the main defining coordinate measured in the vertical direction (e.g. in radiosondes), whereas the latter is a “non-coordinate” for use when pressure is a derived value (e.g. an aircraft calculating pressure as a function of an observed or measured height) • Class 08 contains significance qualifiers, which can be used to report qualitative information and which can be explicitly “cancelled” • Example: 0 08 011 with value 12 can indicate that we are talking about a “cloud”
Table C • Describes the various operators • Columns are: • Table reference F X • F is 2 • X is an integer between 0 and 63 • There is no sub-range of X values reserved for local use • Operand • A number between 0 and 255 • Operator name • A short name describing the operation • Operator definition • A detailed description of the operation and its effects
Table C • This is just an excerpt – there are many other (even more complicated!) operators in Table C. • There are also many important notes to Table C describing, e.g. how to cancel an operator.
Table C example • Table reference F=2 X=01 • Operand, in this case represented as Y • Operator name “Change data width” • Operator description: “Add Y-128 bits to the data width given for each data element in Table B, other than CCITT IA5 (character) data, code or flag tables” • According to a note under Table C, this operator is cancelled (i.e. effect is turned off) by repeating the operator with Y=0, or at the end of each data subset
Table C example - continued • The “Change data width” operator causes the data width to be changed for subsequent elements, in effect giving them a larger (or smaller) range than is otherwise prescribed within Table B. Thus, it can be used to: • encode values that exceed the usual representable range for a descriptor, instead of having to introduce a new Table B descriptor (note: in such cases, Y > 128) • reduce the size of the data (and thus the overall encoded message as well!) if the required data range can be encoded using a smaller data width than provided within Table B (note: in such cases, Y < 128)
Table C example - continued • As an example, one of the standard descriptors for the height coordinate of an observation is 0 07 007 with unit=m, scale=0, reference=-1000, data width=17, giving a representable range of –1000 m to 130070 m. If one needed to encode a value larger than this, then the 2 01 operator could be used to increase the data width. For example, use of the operator 2 01 130 before the 0 07 007 descriptor would increase its data width from 17 to 19 bits and therefore allow values up to 523286 m.
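A decoder applies 2 01 Y by tracking an extra bit count while it walks the descriptors. The extra count is cleared again by 2 01 000 and is not applied to character, code table or flag table elements. The Python sketch below follows that rule and reproduces the 0 07 007 figures above. The 0 07 007 entry is taken from this slide. The 0 01 015 entry (station or site name, CCITT IA5, 160 bits) is an **assumption** included to show the character-data exception. Cancellation at the end of a subset is not modelled.

```python
# Table B entries: descriptor -> (unit, scale, reference value, data width).
# 0 07 007 is quoted on the slide; 0 01 015 is an ASSUMED entry for illustration.
TABLE_B = {
    (0, 7, 7): ("m", 0, -1000, 17),
    (0, 1, 15): ("CCITT IA5", 0, 0, 160),
}
UNCHANGED_UNITS = {"CCITT IA5", "Code table", "Flag table"}


def effective_widths(descriptors):
    """Data width of each element after applying any 2 01 Y (change data width) operator."""
    extra = 0
    widths = []
    for f, x, y in descriptors:
        if (f, x) == (2, 1):
            extra = 0 if y == 0 else y - 128  # Y = 0 cancels the operator
            continue
        unit, _scale, _ref, width = TABLE_B[(f, x, y)]
        widths.append(width if unit in UNCHANGED_UNITS else width + extra)
    return widths


def max_value(scale: int, ref: int, width: int):
    """Largest representable value, 10**(-scale) * (2**width - 2 + ref)."""
    n = (1 << width) - 2 + ref
    return n / 10 ** scale if scale > 0 else n * 10 ** -scale


print(effective_widths([(2, 1, 130), (0, 7, 7), (0, 1, 15), (2, 1, 0), (0, 7, 7)]))  # [19, 160, 17]
print(max_value(0, -1000, 17))  # 130070
print(max_value(0, -1000, 19))  # 523286
```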