
3rd May 2006

A Proposal for a Reference Implementation of the WMO Core Metadata Profile based on ISO-19115/19139, by Jeremy Tandy (UK Met Office), Jürgen Seib (Deutscher Wetterdienst) and Michael Burek (National Center for Atmospheric Research). 3rd May 2006.


Presentation Transcript


  1. A Proposal for a Reference Implementation of the WMO Core Metadata Profile based on ISO-19115/19139 • by Jeremy Tandy, UK Met Office; Jürgen Seib, Deutscher Wetterdienst; Michael Burek, National Center for Atmospheric Research • 3rd May 2006

  2. A revised WMO Core Metadata Profile • WMO Core Metadata Profile … ISO 19115 compliant? (also note that the standard itself has undergone some revision: ISO 19115:2003/Cor. 1:2006) • Motivation • version 0.2 is NOT compliant with ISO 19115 • XML schemas of version 0.2 are ‘bespoke’ • UML model for WMO extensions does NOT exist

  3. ISO 19115 compliance • WMO Core Profile v0.2 breaks the extension rules defined in ISO19115 Annex C • WMO encoding does not match ISO 19115 • Changes to ISO 19115 (extensions) not in WMO-governed namespace • Examples: • Cardinality restricted • Optional attributes removed • … and much worse … (DQ_DataQuality) • Further issues were noted in the WMO IPET-MI report prepared by Clemens Portele

  4. A revised WMO Core Metadata Profile • Characteristics of WMO metadata documents • simple • human readable • (optionally) self-contained; i.e. the metadata record *should* be able to be expressed as a single document • multilingual • XML-validity should imply semantic correctness • ISO 19115 compliant ‘community profile’

  5. ISO 19115 - Extensions • How do we create an ISO 19115 compliant metadata profile? • Guidance: • ISO 19115 Annex C: Metadata extensions and profiles • ISO 19115 Annex F: Metadata extension methodology • In order to ensure interoperability beyond the community where the extensions are implemented: • You must document the extension via the ‘extension metadata’ described in ISO 19115, and • Add an isoType attribute to the class indicating which ISO 19115 class was sub-classed

  6. ISO 19115 Clause C.2: Types of extensions • adding a new metadata section • creating a new metadata code list to replace the domain of an existing metadata element that has “free text” listed as its domain value • creating new metadata code list elements (expanding a code list) • adding a new metadata element • adding a new metadata entity • imposing a more stringent obligation on an existing metadata element • imposing a more restrictive domain on an existing metadata element

  7. ISO 19115 Clause C.4: Rules for creating an extension • the name, definition or data type of an existing element cannot be changed • a new element may include extended and existing metadata elements as components • metadata elements can be more stringent • domains can be more restrictive • the use of domain values can be restricted • code lists of type «CodeList» can be expanded • an extension shall not permit anything not allowed by the standard
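One of the rules above (obligations may be tightened but never relaxed) lends itself to a mechanical check. The following is a minimal sketch only: the ranking optional < conditional < mandatory is a simplifying assumption, and the element names and obligation values are hypothetical, not quoted from ISO 19115.

```python
# Simplified stringency ranking (an assumption): optional < conditional < mandatory.
STRINGENCY = {"O": 0, "C": 1, "M": 2}


def relaxed_obligations(base, profile):
    """Return the names of elements whose obligation the profile has weakened."""
    return sorted(
        name
        for name, obligation in profile.items()
        if name in base and STRINGENCY[obligation] < STRINGENCY[base[name]]
    )


# Hypothetical obligation tables, for illustration only.
base = {"elementA": "M", "elementB": "O", "elementC": "C"}
profile = {"elementA": "O", "elementB": "M", "elementC": "C", "newElement": "O"}

violations = relaxed_obligations(base, profile)
print(violations)
```

Here only `elementA` is reported, because the profile downgrades it from mandatory to optional; tightening `elementB` and adding `newElement` are both permitted kinds of extension.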

  8. Rules for creating a profile • check registered profiles • adhere to the rules for defining an extension • A profile shall include: • the core metadata • all mandatory metadata elements • all conditional metadata elements, if the dataset meets the condition • use UML diagrams to describe a profile • use your own namespace • publish the profile

  9. WMO namespace • namespace: collection of names, identified by a URI reference • is needed for extensions • XML schemas of WMO extensions should reside in the http://www.wmo.int/metadata/2006 namespace: xmlns:wmo="http://www.wmo.int/metadata/2006"
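As a rough illustration of what residing in the WMO namespace means for an instance document, the sketch below uses Python's standard `xml.etree.ElementTree` to create a namespace-qualified element. Only the namespace URI comes from the slide; the element name `WMO_Example` is hypothetical.

```python
import xml.etree.ElementTree as ET

WMO_NS = "http://www.wmo.int/metadata/2006"  # namespace URI taken from the slide

# Bind the 'wmo' prefix so that serialised output uses wmo:... instead of ns0:...
ET.register_namespace("wmo", WMO_NS)


def wmo_element(local_name, text=""):
    """Create an element qualified by the WMO namespace."""
    elem = ET.Element(f"{{{WMO_NS}}}{local_name}")
    elem.text = text
    return elem


# 'WMO_Example' is a hypothetical element name used only for illustration.
example = wmo_element("WMO_Example", "demo")
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

The serialised element carries the `xmlns:wmo` declaration, which is what lets a parser tell WMO extensions apart from elements in the ISO namespaces.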

  10. So where do we go from WMO Core v0.2? • From the extension rules it’s clear that WMO Core v0.2 has problems • Furthermore, even *before* we extend ISO 19115 we need to identify an XML encoding of the content model • ISO 19115 is an ‘abstract specification’ … • it specifies what information is required but not how to encode it • We developed our own XML encoding of ISO 19115 – as did many others • Interoperability is impaired by having numerous inconsistent encodings of ISO 19115 …

  11. The ISO standard 19139 • XML schema implementation for ISO 19115 • provides a common specification for describing, validating and exchanging metadata • defines implementation guidelines for general-purpose metadata • includes XML schema implementations of other ISO 191xx series (including ISO 19136 / GML) according to ISO 19118 ‘Geographic Information – Encoding’ • fulfils 99% of the needs for a WMO metadata standard

  12. WMO metadata profile – key elements • Whilst ISO 19139 resolves a ‘large proportion’ (99%?) of metadata-related concerns from the WMO community … there are still a number of outstanding elements worthy of discussion: • Time • Internationalisation / multilingual support • Codelists and keywords • Service metadata • Catalogues – incl. Feature Catalogues

  13. The future? • Whilst it is recommended that we adopt ISO 19139 as the ‘default’ encoding for WMO Core metadata … • We must ensure that WMO Core is fit for purpose within the WMO community – • where the existing ISO standard information models and encodings are not appropriate we should *adapt* them to our needs (e.g. ISO 19111 and parametric spatial referencing systems) • The ISO standards are still in flux; so long as we *engage* with ISO/TC 211 we can adapt the standards to suit WMO • This ‘profile’ is a *big* step forward, but it should not be considered a final version … it will continue to evolve

  14. ISO 19118 Geographic Information - Encoding • ISO 19118 “Geographic information – Encoding” specifies rules for encoding geographic information … i.e. converting from the UML model to XML schema • ISO 19139 Clause 8 presents a good ‘summary’ in relation to encoding the ISO 19115 information model

  15. ISO 19139 – Encoding summary (1) • Each UML class is encoded into 3 XML constructs: • XML Class Type (XCT) • describing the content (attributes) of the UML class • XML Class Global Element (XCGE) • ensuring the class has global visibility within the XML schema for ‘import’ etc. • XML Class Property Type (XCPT) • containment of a class is managed through the XML Class Property Type of its data type, enabling both “by Value” and “by Ref” implementations of content • Note: when an XCT is of xs:simpleType, “by Ref” is not permitted

  16. ISO 19139 – Encoding summary (2) • Naming conventions: • UML class » Class1 • XCT » Class1_Type • XCGE » Class1 • XCPT » Class1_PropertyType • Special case encodings: • Abstract classes • Inheritance and sub-classes (note: to achieve interoperability only ‘extension’ sub-classing, i.e. adding attributes, is permitted; restriction and multiple inheritance are not allowed) • Enumerations • Codelists • Unions
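The naming conventions above are regular enough to express as a small helper. This is a sketch of the convention as listed on the slide, not part of any ISO 19139 tooling; the function and type names are hypothetical.

```python
from typing import NamedTuple


class XmlNames(NamedTuple):
    xct: str   # XML Class Type
    xcge: str  # XML Class Global Element
    xcpt: str  # XML Class Property Type


def iso19139_names(uml_class):
    """Derive the three XML construct names for a UML class name."""
    return XmlNames(
        xct=f"{uml_class}_Type",
        xcge=uml_class,
        xcpt=f"{uml_class}_PropertyType",
    )


names = iso19139_names("CI_ResponsibleParty")
print(names)
```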

  17. Polymorphism and substitution groups (1) • Polymorphism: the ability to assume different forms • Example: [CI_ResponsibleParty] » individualName [CharacterString] could be specialized such that ‘name’ is compartmentalized into first and last names • Polymorphism provides communities with a mechanism to better refine metadata to meet organizational need • individualName is extended within a ‘community’ namespace … • whilst still utilizing ISO 19139 schemas (gmd namespace), and • still providing usable & understandable instance documents • In OO, the specialized class “is a” type of the general class; e.g. a dog “is a” type of animal

  18. Polymorphism and substitution groups (2) • The “is a” relationship implies semantic consistency & substitutability • ISO 19118 *only* permits simple (extension-only) sub-classing / specialization • ISO 19139 allows polymorphism primarily through the _PropertyType encodings; e.g. gmd:PT_FreeText_PropertyType is substituted for the more general gco:CharacterString_PropertyType for multi-lingual support • Need to inform the XML parser of the substitution: • via a substitutionGroup directive in the schema, OR • via an xsi:type directive in the instance document

  19. GML pattern: by-value or by-reference (1) • GML properties are defined such that the ‘content’ can be referenced EITHER • ‘by-value’ within the scope of the containing XML element, OR • ‘by-ref’ (xlink) to an instance of the content residing elsewhere; either within the document or external <xs:attributeGroup name="ObjectReference"> <xs:attributeGroup ref="xlink:simpleLink"/> <xs:attribute name="uuidref" type="xs:string"/> </xs:attributeGroup> <xs:attribute name="nilReason" type="gml:NullType"/> <xs:complexType name="ObjectReference_PropertyType"> <xs:sequence/> <xs:attributeGroup ref="gco:ObjectReference"/> <xs:attribute ref="gco:nilReason"/> </xs:complexType>

  20. GML pattern: by-value or by-reference (2) • Example from an “Observations & Measurements” GML instance: <om:location xlink:href="#ot2p"/> <om:observedProperty xlink:href="urn:x-ogc:def:phenomenon:OGC:Species"/> <om:featureOfInterest> <om:Station gml:id="ot2s"> <gml:name>8903</gml:name> <om:position> … </om:position> </om:Station> </om:featureOfInterest> • om:location and om:observedProperty are given by-reference (xlink:href); om:featureOfInterest is given by-value (inline om:Station)

  21. «xlink» semantics <attributeGroup name="simpleLink"> <attribute name="type" type="string" fixed="simple" form="qualified"/> <attribute ref="xlink:href" use="optional"/> <attribute ref="xlink:role" use="optional"/> <attribute ref="xlink:arcrole" use="optional"/> <attribute ref="xlink:title" use="optional"/> <attribute ref="xlink:show" use="optional"/> <attribute ref="xlink:actuate" use="optional"/> </attributeGroup> • How should parsers interpret xlink? • XML validation will *only* check the grammar of the xlink statement • xlink has ‘deferred binding’ semantics …

  22. GML pattern: object-type-object-type ‘striping’ <om:location xlink:href="#ot2p"/> <om:observedProperty xlink:href="urn:x-ogc:def:phenomenon:OGC:Species"/> <om:featureOfInterest> <om:Station gml:id="ot2s"> <gml:name>8903</gml:name> <om:position> <gml:Point gml:id="ot2p"> <gml:pos srsName="urn:x-ogc:def:crs:EPSG:6.3:62836405"> -30.7025065 134.1997256 </gml:pos> </gml:Point> </om:position> </om:Station> </om:featureOfInterest> [Diagram annotations label the successive nesting levels alternately as ‘object’ and ‘type’]
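To make the by-reference case concrete, the sketch below resolves a local `xlink:href` (one beginning with `#`) to the element carrying the matching `gml:id`, using a cut-down copy of the instance above. The xlink and GML namespace URIs are the standard ones; the `om` namespace URI is an assumption, since the slides do not give it, and external references such as the `urn:` value are deliberately not handled.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
GML = "http://www.opengis.net/gml"
OM = "http://www.opengis.net/om"  # assumed URI; not stated in the slides

DOC = f"""<om:Observation xmlns:om="{OM}" xmlns:gml="{GML}" xmlns:xlink="{XLINK}">
  <om:location xlink:href="#ot2p"/>
  <om:featureOfInterest>
    <om:Station gml:id="ot2s">
      <gml:name>8903</gml:name>
      <om:position>
        <gml:Point gml:id="ot2p">
          <gml:pos>-30.7025065 134.1997256</gml:pos>
        </gml:Point>
      </om:position>
    </om:Station>
  </om:featureOfInterest>
</om:Observation>"""


def resolve_local_href(root, prop):
    """Return the element referenced by a same-document xlink:href, else None."""
    href = prop.get(f"{{{XLINK}}}href")
    if href is None or not href.startswith("#"):
        return None  # by-value property or external reference: out of scope here
    target_id = href[1:]
    for elem in root.iter():
        if elem.get(f"{{{GML}}}id") == target_id:
            return elem
    return None


root = ET.fromstring(DOC)
location = root.find(f"{{{OM}}}location")
point = resolve_local_href(root, location)
pos = point.find(f"{{{GML}}}pos").text.split()
print(point.tag, pos)
```

Note that XML validation alone would accept a dangling `#ot2p`; a lookup like this only happens when an application chooses to dereference the link.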

  23. Interoperability outside WMO community • Users outside WMO community will (probably) NOT understand WMO extensions … • Need to ‘remove’ them??? • Document with WMO extensions » XSL Transformation » Document without WMO extensions
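The slide proposes an XSL transformation for this step. As a stand-in that needs only the Python standard library, the sketch below removes every element in the WMO namespace from a parsed document. The sample record and its element names are hypothetical, and the approach only covers elements that were *added*; extension sub-classes would instead have to be mapped back to their ISO parent class.

```python
import xml.etree.ElementTree as ET

WMO_NS = "http://www.wmo.int/metadata/2006"


def strip_wmo_extensions(elem):
    """Remove, in place, every descendant element in the WMO namespace."""
    for child in list(elem):  # copy the child list because we mutate while iterating
        if child.tag.startswith(f"{{{WMO_NS}}}"):
            elem.remove(child)
        else:
            strip_wmo_extensions(child)
    return elem


# Hypothetical record; none of these element names come from the WMO profile.
DOC = f"""<record xmlns:wmo="{WMO_NS}">
  <title>Surface observations</title>
  <wmo:extra>community-specific detail</wmo:extra>
  <contact><wmo:note>hypothetical</wmo:note><name>A. N. Other</name></contact>
</record>"""

root = strip_wmo_extensions(ET.fromstring(DOC))
remaining = [e.tag for e in root.iter()]
print(remaining)
```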

  24. Open Issues • Worked examples & implementation test-bed • Finalize extensions • which are required? Time, ServiceIdentification etc. • encode as ISO 19118-compliant XML schema • ISO 19115 / 19139 have a multitude of options for describing information … which makes writing parsing applications complex • do we need to *restrict* WMO Core metadata? • ISO/TC 211 & OGC community peer review

  25. Thank you

  26. Explanation of times for observations and simulations Encoding in ISO 19115 / 19139 metadata April 2006

  27. Times relevant to observations … a single observation • eventTime (te): time at which the observation ‘event’ occurred (from om:Event); often referred to as “validTime” within GML schemas • issueTime (ti): time at which the observation was published / issued … allowing for amendments to a previously published observation (we need to extend O&M for this) • real time axis (t): time in the “real world”; default temporal reference system: ISO-8601 (Gregorian calendar) • eventTime encoded in ‘Observations & Measurements’ as: <om:Observation gml:id="obsTest1"> … <om:time> <gml:TimeInstant gml:id="ot1t"> <gml:timePosition>2005-01-11T16:22:25.00</gml:timePosition> </gml:TimeInstant> </om:time> … </om:Observation>

  28. Times relevant to observations … what about accumulations? • [Diagram: issueTime (ti), eventTime (te) and a time interval marked on the real time axis (t)] • time interval: interval relative to the observation time for calculating accumulations, averages etc. • An ‘instantaneous’ observation is conceptually simple; e.g. measuring temperature at a specific instant in time • However, what about accumulations (or averages etc.) … e.g. measuring a 3-hr average temperature? • We assert that you can’t evaluate the average until the end of the 3-hr period. • We assert that the time interval is a property of the measured phenomenon (a 3-hr average) and that the eventTime (te) of the observation occurs at the end of the time interval.
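The assertion that eventTime falls at the end of the averaging interval can be sketched as follows. The sample values are invented, and treating the interval as half-open (excluding its start, including its end) is an assumption made here, not something the slides specify.

```python
from datetime import datetime, timedelta
from statistics import mean


def interval_average(samples, event_time, interval):
    """Average the samples in (event_time - interval, event_time].

    The interval is a property of the phenomenon (e.g. '3-hr average');
    the observation's eventTime is the end of that interval.
    """
    start = event_time - interval
    values = [value for time, value in samples if start < time <= event_time]
    return {"eventTime": event_time, "interval": interval, "value": mean(values)}


# Invented hourly temperature samples.
samples = [
    (datetime(2006, 1, 11, 13, 0), 4.0),
    (datetime(2006, 1, 11, 14, 0), 5.0),
    (datetime(2006, 1, 11, 15, 0), 6.0),
    (datetime(2006, 1, 11, 16, 0), 7.0),
]
obs = interval_average(samples, datetime(2006, 1, 11, 16, 0), timedelta(hours=3))
print(obs)
```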

  29. Times relevant to observations … what about maxima & minima? • [Diagram: temperature fluctuation during a time interval on the real time axis (t), with eventTime (te) and the time instant at which the maximum temperature occurs during the interval] • Primary observation: maximum temperature recorded during the time interval • Related observation: temperature recorded at the time instant of the maximum • The primary observation (in this example) is the maximum temperature during a time interval. The maximum temperature cannot be evaluated until the end of the time interval. • We assert that the time interval is a property of the measured phenomenon (a daily maximum) and that the eventTime (te) of the observation occurs at the end of the time interval. • In many cases, the exact instant when the maximum temperature occurred cannot be determined. However, automated weather stations now offer near-continuous measurement and can record the time instant at which an event (i.e. the maximum temperature) occurred. • We assert that the maximum temperature ‘event’ can be modelled as a related observation.
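A minimal sketch of the primary / related observation split described above, with invented sample data. The dictionary layout is hypothetical and is not an O&M encoding.

```python
from datetime import datetime


def maximum_observations(samples, interval_end):
    """Return (primary, related) observations for the maximum of the samples.

    primary: the maximum over the interval, with eventTime at the interval end.
    related: the reading at the instant the maximum actually occurred.
    """
    peak_time, peak_value = max(samples, key=lambda sample: sample[1])
    primary = {"eventTime": interval_end, "value": peak_value}
    related = {"eventTime": peak_time, "value": peak_value}
    return primary, related


# Invented near-continuous temperature readings for one day.
samples = [
    (datetime(2006, 1, 11, 6, 0), 2.5),
    (datetime(2006, 1, 11, 14, 20), 9.1),
    (datetime(2006, 1, 11, 21, 0), 4.0),
]
primary, related = maximum_observations(samples, datetime(2006, 1, 12, 0, 0))
print(primary, related)
```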

  30. Times relevant to observations … what about collections of obs? • [Diagram: collectionPeriod and eventTime (te) marked on the real time axis (t)] • collectionPeriod is the time interval bounding all discrete observations within the collection (note: this would probably be encoded as an element of the spatio-temporal bounding box for the collection)
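Since collectionPeriod is defined as the interval bounding all observations in the collection, it can be derived from the individual eventTimes. A small sketch with invented times:

```python
from datetime import datetime


def collection_period(event_times):
    """Return (begin, end) of the interval bounding all eventTimes."""
    if not event_times:
        raise ValueError("a collection needs at least one observation")
    return min(event_times), max(event_times)


# Invented eventTimes for three observations in one collection.
event_times = [
    datetime(2006, 1, 11, 18, 0),
    datetime(2006, 1, 11, 16, 22, 25),
    datetime(2006, 1, 12, 16, 22, 25),
]
begin, end = collection_period(event_times)
print(begin.isoformat(), end.isoformat())
```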

  31. Which time values are needed in observation feature types? • Time properties: • eventTime (te): time at which the observation event occurred • issueTime (ti): time at which the observation was published … allowing for amendments • collectionPeriod: time interval bounding all discrete observations within the collection (where appropriate) • Not valid: time interval (for calculating accumulations, averages, max/min etc.), which is part of the ‘phenomenon’ definition

  32. Time metadata in ISO 19115 (1)

  33. Time metadata in ISO 19115 (2)

  34. Encoding eventTime in ISO 19115 & ISO 19139 <MD_Metadata> <identificationInfo> <MD_DataIdentification> <extent> <EX_Extent> <temporalElement> <EX_TemporalExtent> <extent> <gml:TimeInstant> <gml:timePosition> 2005-01-11T16:22:25.00 </gml:timePosition> </gml:TimeInstant> </extent> </EX_TemporalExtent> </temporalElement> </EX_Extent> </extent> </MD_DataIdentification> </identificationInfo> </MD_Metadata> • For a single observation, or a collection of observations with a common eventTime (te) … • metadata [MD_Metadata] » • identificationInfo [MD_DataIdentification] » • extent [EX_Extent] » • temporalElement [EX_TemporalExtent] » • extent [TM_Instant]
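For illustration, the sketch below reads eventTime back out of a record shaped like the one above. The slide omits namespace prefixes; the `gmd` and `gml` namespace URIs used here are assumptions based on the published ISO 19139 and GML schemas, and the record is a fragment that omits elements a valid record would require.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed namespace bindings; the slide shows the elements without prefixes.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gml": "http://www.opengis.net/gml",
}

RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gml="http://www.opengis.net/gml">
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:extent><gmd:EX_Extent>
      <gmd:temporalElement><gmd:EX_TemporalExtent><gmd:extent>
        <gml:TimeInstant>
          <gml:timePosition>2005-01-11T16:22:25.00</gml:timePosition>
        </gml:TimeInstant>
      </gmd:extent></gmd:EX_TemporalExtent></gmd:temporalElement>
    </gmd:EX_Extent></gmd:extent>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""

# Path from the MD_Metadata root down to the time position, as on the slide.
PATH = (
    "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/"
    "gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/"
    "gml:TimeInstant/gml:timePosition"
)


def event_time(record_xml):
    """Return the first temporal-extent instant as a datetime, or None."""
    node = ET.fromstring(record_xml).find(PATH, NS)
    if node is None or node.text is None:
        return None
    return datetime.strptime(node.text.strip(), "%Y-%m-%dT%H:%M:%S.%f")


te = event_time(RECORD)
print(te)
```

The length of `PATH` gives a feel for the remark under "Open Issues" that the many options in ISO 19115 / 19139 make parsing applications complex.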

  35. Encoding collectionPeriod in ISO 19115 & ISO 19139 <MD_Metadata> <identificationInfo> <MD_DataIdentification> <extent> <EX_Extent> <temporalElement> <EX_TemporalExtent> <extent> <gml:TimePeriod> <gml:beginPosition> 2006-01-11T16:22:25.00 </gml:beginPosition> <gml:endPosition> 2006-01-12T16:22:25.00 </gml:endPosition> </gml:TimePeriod> </extent> </EX_TemporalExtent> </temporalElement> </EX_Extent> </extent> </MD_DataIdentification> </identificationInfo> </MD_Metadata> For a collection of observations spanning multiple times, collectionPeriod … metadata [MD_Metadata] » identificationInfo [MD_DataIdentification] » extent [EX_Extent] » temporalElement [EX_TemporalExtent]» extent [TM_Period]

  36. Encoding issueTime in ISO 19115 & ISO 19139 • issueTime … • metadata [MD_Metadata] » dateStamp [Date] • metadata [MD_Metadata] » identificationInfo [MD_DataIdentification] » citation [CI_Citation] » date [CI_Date] » date [Date] and dateType [CI_DateTypeCode] {creation, publication, revision} • citation [CI_Citation] » edition [CharacterString] and editionDate [Date] • Problem: • date field only … • in meteorology we need to differentiate on smaller time scales • EITHER • extend metadata standard, OR • use ‘edition’ information to capture metadata information for amends etc. <MD_Metadata> <dateStamp> <gco:Date>2006-02-20</gco:Date> </dateStamp> <identificationInfo> <MD_DataIdentification> <citation> <CI_Citation> <date> <CI_Date> <date> <gco:Date>1993-01-01</gco:Date> </date> <dateType> <CI_DateTypeCode codeList="codeList.xml?CI_DateTypeCode" codeListValue="creation">creation</CI_DateTypeCode> </dateType> </CI_Date> </date> </CI_Citation> </citation> </MD_DataIdentification> </identificationInfo> </MD_Metadata>

  37. Indeterminate times in GML – ISO 19136 • Inexact temporal positions may be expressed using the optional indeterminatePosition attribute. This takes a value from an enumeration defined as follows: <simpleType name="TimeIndeterminateValueType"> <restriction base="string"> <enumeration value="after"/> <enumeration value="before"/> <enumeration value="now"/> <enumeration value="unknown"/> </restriction> </simpleType> • These values are interpreted as follows: • “unknown” indicates that no specific value for temporal position is provided. • “now” indicates that the specified value shall be replaced with the current temporal position whenever the value is accessed. • “before” indicates that the actual temporal position is unknown, but it is known to be before the specified value. • “after” indicates that the actual temporal position is unknown, but it is known to be after the specified value. • Example: <gml:TimeInstant> <gml:timePosition indeterminatePosition="now"/> </gml:TimeInstant>
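One way an application might act on these four values is to turn each time position into a pair of bounds. This is a sketch of the interpretation quoted above, not GML library code; representing "unknown" or an open side as `None` is a choice made here.

```python
from datetime import datetime


def time_bounds(value, indeterminate, now):
    """Return (earliest, latest) for a time position; None means unbounded.

    value: the specified temporal position (may be None for 'now'/'unknown')
    indeterminate: None, 'after', 'before', 'now' or 'unknown'
    now: the current time, passed in so the function stays deterministic
    """
    if indeterminate is None:
        return value, value          # an exact position
    if indeterminate == "unknown":
        return None, None            # no specific value is provided
    if indeterminate == "now":
        return now, now              # replaced by the current position on access
    if indeterminate == "before":
        return None, value           # known only to precede the value
    if indeterminate == "after":
        return value, None           # known only to follow the value
    raise ValueError(f"not a TimeIndeterminateValueType value: {indeterminate!r}")


t = datetime(2005, 1, 11, 16, 22, 25)
now = datetime(2006, 5, 3, 12, 0)
print(time_bounds(t, "before", now), time_bounds(None, "now", now))
```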

  38. Times relevant to a simulation … time in the real world • eventTime (te): time of an ‘event’ that has been simulated … i.e. a ‘simulated’ observation • datumTime (td): the ‘origin’ (datum) time used in the simulation • creationTime (tc): time at which the simulation was executed • issueTime (ti): time at which results of the simulation were published / issued • [Diagram: these times marked on the time axis (t)]

  39. Times relevant to a simulation … assimilation vs. prediction • Consider a forecast … • the simulation is run to give a prediction from the datumTime (td) to some arbitrary time in the future • however, the simulation begins earlier at the initialisationTime (t0) … allowing data to be assimilated into the simulation • the analysis is produced at the end of the assimilation period • the analysisTime (ta) corresponds with the datumTime (td) • [Diagram: initialisationTime (t0), the assimilation period, datumTime (td) or analysisTime (ta), and eventTime (te) marked on the time axis (t)]

  40. Times relevant to a simulation … usage periods • [Diagram: datumTime (td), eventTime (te), usageStartTime, validUsagePeriod and usageExpiryTime marked on the time axis (t)] • validUsagePeriod may extend beyond the eventTime (te) • validUsagePeriod is the interval during which the results of the simulation should be used. • Most often, simulation results do not have a valid usage period. However, we must include the concept as it is required for TAFs. • Whilst not obvious, the issueTime (ti) and the start of the usage period do not need to coincide.

  41. A set of results from a simulation (a ‘model run’) • collectionPeriod: collection of discrete simulation result-sets that share the same datumTime • finalTime (tf): end of the simulation • [Diagram: collectionPeriod, datumTime (td), eventTime (te) and finalTime (tf) marked on the time axis (t)]

  42. Times relevant to a simulation … the axes of time for simulations • real time axis (t): time in the “real world”; default temporal reference system: ISO-8601 (Gregorian calendar); carries eventTime (te) and datumTime (td) • simulation time axis (T): time in the “conceptual world” of the numerical simulation; temporal coordinate reference system defined locally for the simulation using <gml:TimeCoordinateSystem> • simulationDatumTime (Td): the origin time for the simulation; e.g. T+0 • simulationTime (Ts): example: T+60 • simulation time (T) is ‘projected’ onto real time (t) … although this is not always straightforward with 360-day calendars for climate models!

  43. Example encoding of a local temporal reference system • Temporal coordinate reference system (hours since midnight UTC 9th March 2006) encoded as: <gml:TimeCoordinateSystem gml:id="simTRS1"> <gml:description>Number of hours since midnight UTC, 9th March 2006</gml:description> <gml:name>Simulation time axis</gml:name> <gml:domainOfValidity>global</gml:domainOfValidity> <gml:origin>2006-03-09T00:00:00.00</gml:origin> <gml:interval>H</gml:interval> </gml:TimeCoordinateSystem> • Time instant “T+60” (relating to local temporal reference system) encoded as: <gml:TimeInstant> <gml:timePosition frame="#simTRS1">60</gml:timePosition> </gml:TimeInstant>
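The projection of simulation time onto real time for this example reduces to simple arithmetic. The sketch below hard-codes the origin and the hourly interval from the `simTRS1` definition above, and assumes a Gregorian calendar; it would not be correct for the 360-day calendars used by some climate models.

```python
from datetime import datetime, timedelta

# Values taken from the simTRS1 example: origin 2006-03-09T00:00 UTC, interval 'H'.
ORIGIN = datetime(2006, 3, 9, 0, 0, 0)
INTERVAL = timedelta(hours=1)


def to_real_time(simulation_time):
    """Project a simulation time T (in intervals since the origin) onto real time t."""
    return ORIGIN + simulation_time * INTERVAL


def to_simulation_time(real_time):
    """Inverse projection: real time t to simulation time T (as a float)."""
    return (real_time - ORIGIN) / INTERVAL


t_plus_60 = to_real_time(60)
print(t_plus_60.isoformat())  # 2006-03-11T12:00:00
```

So “T+60” corresponds to midday on 11th March 2006, two and a half days after the origin.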

  44. Which time values are needed in simulation ‘metadata’? (at least for operational meteorology!) • primary metadata: eventTime (te), simulatedTime (Ts), collectionPeriod • ‘publication’ metadata: creationTime (tc), issueTime (ti), usageStartTime, validUsagePeriod, usageExpiryTime • ‘simulation’ metadata: initialisationTime (t0), datumTime (td) or analysisTime (ta), finalTime (tf)

  45. Encoding primary time metadata in ISO 19115 & ISO 19139 (1) • Elements: eventTime (te), simulatedTime (Ts), collectionPeriod • Encode as for observations … except you *may* want to use two temporal extent definitions to describe time instants in both ‘Gregorian’ and ‘simulated’ reference frames: <EX_Extent> <temporalElement> <EX_TemporalExtent> <extent> <gml:TimeInstant> <gml:timePosition> 2005-01-11T16:22:25.00 </gml:timePosition> </gml:TimeInstant> </extent> </EX_TemporalExtent> </temporalElement> <temporalElement> <EX_TemporalExtent> <extent> <gml:TimeInstant> <gml:timePosition frame="#simTRS1"> 60 </gml:timePosition> </gml:TimeInstant> </extent> </EX_TemporalExtent> </temporalElement> </EX_Extent>

  46. Encoding primary time metadata in ISO 19115 & ISO 19139 (2) • Problem: • it *may* be difficult for parsers to differentiate between multiple temporal extents (here eventTime (te) and simulatedTime (Ts)); e.g. when attempting to build an index on the content of the metadata record • there is no convenient placeholder for the local temporal reference system definition • it *may* be necessary to extend the ISO 19115 metadata standard to allow alternate time extents to be specified, along with their local reference system definition • use a ‘similar’ pattern to expressing multi-lingual alternatives for character strings …
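Without any extension, one heuristic a parser could use to tell the two temporal extents apart is the `frame` attribute on `gml:timePosition`. The sketch below indexes time positions by frame. The namespace URIs are assumptions (the slides omit prefixes), and treating a missing `frame` as `#ISO-8601` reflects our understanding of the GML schema default rather than anything stated in the slides.

```python
import xml.etree.ElementTree as ET

# Assumed namespace bindings; the slides show these elements without prefixes.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gml": "http://www.opengis.net/gml",
}

EXTENT = """<gmd:EX_Extent xmlns:gmd="http://www.isotc211.org/2005/gmd"
               xmlns:gml="http://www.opengis.net/gml">
  <gmd:temporalElement><gmd:EX_TemporalExtent><gmd:extent>
    <gml:TimeInstant>
      <gml:timePosition> 2005-01-11T16:22:25.00 </gml:timePosition>
    </gml:TimeInstant>
  </gmd:extent></gmd:EX_TemporalExtent></gmd:temporalElement>
  <gmd:temporalElement><gmd:EX_TemporalExtent><gmd:extent>
    <gml:TimeInstant>
      <gml:timePosition frame="#simTRS1"> 60 </gml:timePosition>
    </gml:TimeInstant>
  </gmd:extent></gmd:EX_TemporalExtent></gmd:temporalElement>
</gmd:EX_Extent>"""

DEFAULT_FRAME = "#ISO-8601"  # assumed default when no frame attribute is given


def index_time_positions(extent_xml):
    """Group time-position values by their temporal reference frame."""
    index = {}
    root = ET.fromstring(extent_xml)
    for pos in root.iterfind(".//gml:timePosition", NS):
        frame = pos.get("frame", DEFAULT_FRAME)
        index.setdefault(frame, []).append(pos.text.strip())
    return index


index = index_time_positions(EXTENT)
print(index)
```

This separates eventTime from simulatedTime, but the index still has nowhere to find the definition of `#simTRS1` itself, which is the second problem the slide raises.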

  47. Encoding ‘publication’ time metadata in ISO 19115 & ISO 19139 • Elements: creationTime (tc), issueTime (ti), usageStartTime, validUsagePeriod, usageExpiryTime • Encode as for observations: metadata [MD_Metadata] » identificationInfo [MD_DataIdentification] » citation [CI_Citation] » date [CI_Date] » date [Date] and dateType [CI_DateTypeCode] {creation, publication, revision}; citation [CI_Citation] » edition [CharacterString] and editionDate [Date]; employing the CI_DateTypeCode to describe the ‘type’ of the publication • Encode validUsagePeriod as <gml:TimePeriod>: <validUsagePeriod> <gml:TimePeriod> <gml:beginPosition> 2006-01-11T16:22:25.00 </gml:beginPosition> <gml:endPosition> 2006-01-12T16:22:25.00 </gml:endPosition> </gml:TimePeriod> </validUsagePeriod>

  48. Temporal usage constraints – extension required to ISO 19115? • ISO 19115 defines constraint information; the metadata required for managing rights to information including restrictions on access and use. • The metadata entity seems the most appropriate placeholder for temporal usage constraints … however, ISO 19115 only details: • MD_LegalConstraints • MD_SecurityConstraints • To record temporal usage constraints in the metadata record, we will need to extend ISO 19115; e.g. WMO_TemporalUsageConstraints • Question: is this type of metadata required in WMO core?

  49. Encoding ‘simulation’ time metadata in ISO 19115 & ISO 19139 • Elements: initialisationTime (t0), datumTime (td) or analysisTime (ta), finalTime (tf), … ensembleID • These metadata elements are artefacts of the numerical simulation itself. • Furthermore, there are potentially many more metadata elements which *may* prove useful to discriminate between datasets (e.g. ensembleID). • The British Atmospheric Data Centre (BADC) is working on an ISO 19115 extension to define metadata about numerical simulations: http://proj.badc.rl.ac.uk/ndg/wiki/NumSim

  50. NDG NumSim extension • Numerical Simulation Discovery Metadata (aka NumSim): The DIF describes datasets at the discovery level, but where simulations are involved, discovery metadata needs more information than is available in existing schema. A new schema which is being trialled at the BADC should be accessible to both DIF and ISO19115 parent discovery schema, although at the moment it is rather standalone. … • Proposal: WMO adopts NumSim as a formal mechanism for describing simulation metadata within WMO Core metadata profile. Prior to adoption, WMO should liaise with BADC to ensure the profile is fit for purpose and has been rigorously tested.
