NCDC Station Metadata System



  1. NCDC Station Metadata System • Jeff Arnfield • Active Archive Branch, National Climatic Data Center, Asheville, NC

  2. NCDC’s Role • Nation’s focal point and scorekeeper for information about weather and climate variations and changes • Currently ingest and archive 150 terabytes (150,000 gigabytes) of data every year • Maintain the metadata necessary to interpret these data NCDC - scorekeeper for the nation's climate
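As a rough scale check, assume decimal units (1 TB = 1,000 GB) and a 365-day year. The stated annual ingest then works out to about 411 GB per day:

$$\frac{150\ \text{TB/yr}}{365\ \text{days/yr}} = \frac{150{,}000\ \text{GB}}{365\ \text{days}} \approx 411\ \text{GB/day}$$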

  3. Metadata: Data About Data • Information necessary to describe and interpret collections of data and the observing systems that report them • Station location • Station configuration • Observing standards • Reporting protocols • Data inventories • Data product documentation • History of changes over time

  4. Current Observing Systems • NWS cooperative observing system • National surface observing system • Upper air observing system • Climate Reference Network (CRN) • Marine observing system • Profiler • Precipitation observing network • Radar • Global Climate Observing System • Satellite systems • Special national/international experiments

  5. Why Is Metadata Important? • Critical to NCDC's ingest, archive and access systems • Gives data users perspective on reported data values • Station moves & changes to surroundings • Sensor changes • Quality control algorithms • Helps in selection of stations, data products for study

  6. The Rest Is Just Metadata 42

  7. 2000: Need and opportunity meet • Reviewed metadata holdings and systems • Strengths • Shortcomings • Opportunities for improvement • Climate Database Modernization Program (CDMP) • Partnership with private industry • Increase digital data holdings • Improve database quality • Improve access to and utilization of data

  8. Reality Can Be Ugly

  9. Initial Situation • Metadata distributed in a combination of formal and ad hoc systems • Different systems may contain information about the same station • Multiple sources and procedures for the same metadata result in discrepancies • Data freshness and accuracy vary • Updates may not affect all similar data
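Here is a small sketch of the discrepancy problem. Suppose each source system's holdings could be flattened into hypothetical (source, station, field, value) facts. Grouping those facts by station and field then exposes every place where two systems disagree. The record layout, source names, station ID, and sample values are illustrative only. They are not NCDC's actual formats or data.

```python
from collections import defaultdict
from typing import NamedTuple


class MetadataRecord(NamedTuple):
    """One fact about a station as held by one source system (hypothetical layout)."""
    source: str      # illustrative source name, e.g. "SHIPS" or an ad hoc list
    station_id: str
    field: str       # e.g. "elevation_m"
    value: str


def find_conflicts(records):
    """Return {(station_id, field): {value: [sources]}} for facts with >1 distinct value."""
    # (station, field) -> value -> set of sources reporting that value
    seen = defaultdict(lambda: defaultdict(set))
    for r in records:
        seen[(r.station_id, r.field)][r.value].add(r.source)
    return {
        key: {value: sorted(sources) for value, sources in values.items()}
        for key, values in seen.items()
        if len(values) > 1  # more than one distinct value means the sources disagree
    }


if __name__ == "__main__":
    # Made-up placeholder facts, not real station metadata.
    records = [
        MetadataRecord("SHIPS", "STN001", "elevation_m", "661"),
        MetadataRecord("ad_hoc_list", "STN001", "elevation_m", "664"),
        MetadataRecord("SHIPS", "STN001", "name", "EXAMPLE"),
        MetadataRecord("ad_hoc_list", "STN001", "name", "EXAMPLE"),
    ]
    for (station, field), values in find_conflicts(records).items():
        print(station, field, values)
```

This only detects disagreement. Choosing which source to trust is a separate, domain-specific step.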

  10. The Problem of Scope • The human mind is so complex and things are so tangled up with each other that, to explain a blade of straw, one would have to take to pieces an entire universe. . . . A definition is a sack of flour compressed into a thimble. • Rémy de Gourmont (1858–1915)

  11. Existing Station History Systems • SHIPS – Station History Information Production System • Highly detailed, with heavy quality control • Coop, ASOS and some surface weather observing sites • UNIX on old Sun workstation • Empress database • Other ad hoc and project-specific systems • Database • Flat text files, word processing files • Many paper records • Some are essentially static • Access via lists, CliServ and Web CliServ

  12. Shortcomings of SHIPS system • Developed to meet Coop data ingest and publication needs • Database design not normalized • New networks may require structure changes • Lack of keys, data integrity checks • Cumbersome interface with limited queries • No “query only” option, no outside access • Ad hoc queries are complex

  13. Challenges • Technical, cultural and logistical • Metadata conflicts and inconsistencies • Complex table key, versioning, attribution • Security and audit requirements • Informal knowledge base, imprecise terms • Loose system documentation • Geographically dispersed team • Resource competition

  14. Metadata Project Goals • Strategic architecture to manage metadata • Leverage CDMP project tasks and resources • Accommodate imperfect, real world metadata • Accept new information without modification • Flexible queries for dispersed users • Modular for multi-organization participation • Deliver useful releases within a year

  15. Technological Foundation • Normalized relational database • Oracle 8i database, CASE design tools • Model entire subject, not one instance • Surrogate keys minimize dependencies • Enforce business rules in database • Declarative, triggers, stored procedures • Accept flawed data, identify and correct • Separate database, application servers
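Here is a minimal sketch of several principles from this slide. It uses Python's built-in sqlite3 module as a stand-in for Oracle 8i. All tables, columns, and the no-overlap rule are hypothetical illustrations:

- A meaningless surrogate key identifies each station.
- External IDs are stored as ordinary data.
- Coordinate and date rules are declared as CHECK constraints.
- A trigger enforces a rule that a CHECK cannot express, because the rule depends on other rows.

```python
import sqlite3

# SQLite stands in here for the Oracle 8i database named on the slide.
# Every table, column, and rule below is a hypothetical illustration.
SCHEMA = """
-- Surrogate key: station_key carries no meaning, so a changed external ID
-- never ripples through dependent tables.
CREATE TABLE station (
    station_key INTEGER PRIMARY KEY
);

-- External identifiers (for example a Coop number) are data, not keys.
CREATE TABLE station_identifier (
    station_key INTEGER NOT NULL REFERENCES station(station_key),
    id_type     TEXT    NOT NULL,
    id_value    TEXT    NOT NULL,
    UNIQUE (id_type, id_value)
);

-- Declarative business rules: valid coordinate ranges and ordered dates.
-- Dates are ISO-8601 text, so string comparison matches date order.
CREATE TABLE station_location (
    location_key INTEGER PRIMARY KEY,
    station_key  INTEGER NOT NULL REFERENCES station(station_key),
    latitude     REAL    NOT NULL CHECK (latitude BETWEEN -90 AND 90),
    longitude    REAL    NOT NULL CHECK (longitude BETWEEN -180 AND 180),
    begin_date   TEXT    NOT NULL,
    end_date     TEXT,  -- NULL means the location is still current
    CHECK (end_date IS NULL OR end_date >= begin_date)
);

-- Trigger-enforced rule that compares against other rows, which a CHECK
-- constraint cannot do: location periods for one station must not overlap.
CREATE TRIGGER location_no_overlap
BEFORE INSERT ON station_location
WHEN EXISTS (
    SELECT 1 FROM station_location
    WHERE station_key = NEW.station_key
      AND begin_date <= COALESCE(NEW.end_date, '9999-12-31')
      AND NEW.begin_date <= COALESCE(end_date, '9999-12-31')
)
BEGIN
    SELECT RAISE(ABORT, 'overlapping location period');
END;
"""


def open_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    db = open_db()
    db.execute("INSERT INTO station (station_key) VALUES (1)")
    # Placeholder identifier and coordinates, not real station metadata.
    db.execute("INSERT INTO station_identifier VALUES (1, 'COOP', '000000')")
    db.execute(
        "INSERT INTO station_location "
        "(station_key, latitude, longitude, begin_date, end_date) "
        "VALUES (1, 35.6, -82.5, '1990-01-01', '1999-12-31')"
    )
    try:
        # Starts before the previous period ends, so the trigger rejects it.
        db.execute(
            "INSERT INTO station_location "
            "(station_key, latitude, longitude, begin_date) "
            "VALUES (1, 35.4, -82.5, '1999-06-01')"
        )
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

Hard constraints like these reject bad rows outright. The slide, however, also calls for accepting flawed data and correcting it later. One possible reconciliation, not shown here, is to load raw input into a staging area with looser rules before promoting it.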

  16. Technological Foundation (Cont’d) • Similar query needs for research and maintenance • Web-based solution • Distributed access • Easy administration, maintenance • Standard interface minimizes training • ColdFusion web-based environment • Crystal Reports for flexible output

  17. Station History Subject Areas • Identity • Names • IDs • Period of record • Location • Lat/Lon, elevation • Geographic descriptors • Exposure, topography • Classification • Observers • Equipment • Observing Practices • Phenomena • Schedule • Reporting protocols • Data programs • Administration • Supporting documents • Forms • Photos
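Most of these subject areas change over time: a station moves, gets new equipment, or changes observers. A common station-history query is therefore "what was true on a given date?" The sketch below answers that question with a binary search. It assumes a hypothetical location history stored as periods that are sorted, do not overlap, and have inclusive end dates. The structure and all sample values are illustrative placeholders.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LocationPeriod:
    """One row of a station's location history (hypothetical structure)."""
    begin: date
    end: Optional[date]  # inclusive last day; None means still in effect
    latitude: float
    longitude: float
    elevation_m: float


def location_on(history, when):
    """Return the period in effect on `when`, or None if no period covers it.

    `history` must be sorted by `begin`, with non-overlapping periods.
    """
    begins = [p.begin for p in history]
    i = bisect_right(begins, when) - 1  # last period that began on or before `when`
    if i < 0:
        return None  # `when` precedes the period of record
    period = history[i]
    if period.end is not None and when > period.end:
        return None  # gap in the record, e.g. a station closed and later reopened
    return period


if __name__ == "__main__":
    # Made-up placeholder values, not real station history.
    history = [
        LocationPeriod(date(1948, 1, 1), date(1975, 6, 30), 35.60, -82.55, 650.0),
        LocationPeriod(date(1975, 7, 1), None, 35.43, -82.54, 660.0),
    ]
    print(location_on(history, date(1960, 1, 1)))
```

In a relational design the same lookup would be a date-range predicate in SQL. The in-memory version only makes the logic explicit.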

  18. Current Status • Modeling and requirements workshops held • Contractors familiarized with subject • Hardware, development software installed • SHIPS ported to Oracle, Web-accessible • Anomalies being identified and corrected • First cut database design completed • Interface prototyping in progress • Testing NWS Coop metadata acquisition

  19. 2001: A Metadata Odyssey • Design document complete – 1st Qtr • Physical DB design complete – 1st Qtr • Initial release of new system – 2nd Qtr • Automated Coop QC, ingest – 3rd Qtr • Access to document images – 3rd Qtr • Intensive manual QC, updates – 4th Qtr • Merge pre-1948, pre-1890 Coop – 4th Qtr