Trends Project Spaghetti & Linguine (aka Trends Data Store)

Trends Project Spaghetti & Linguine (aka Trends Data Store). Mark Servilla servilla@lternet.edu 14 September 2006. Table of Contents. Background System Architecture System Workflow and Architecture Details Demonstration Screen Examples. Message from IMExec - Feb 2006.

Presentation Transcript


  1. Trends Project Spaghetti & Linguine (aka Trends Data Store)
     Mark Servilla, servilla@lternet.edu
     14 September 2006

  2. Table of Contents
     • Background
     • System Architecture
     • System Workflow and Architecture Details
     • Demonstration Screen Examples
     LNO NIS

  3. Message from IMExec - Feb 2006
     • “IMExec suggests that this activity be used to scope and determine the feasibility of using EML in the development of NIS modules for solving general synthesis problems.”
     • “The premise of this project is that EML will adequately describe the data set (e.g., entities, attributes, physical characteristics) to allow the capture of distributed data sets into a central SQL database.”
     • “Determining the nature of this model for dynamic data delivery – whether it is more site-loaded or more (network) service-loaded – is critical.”
     • “IMExec suggests that the near-term Trends NIS module activity be focused on development of a prototype for demonstration at the ASM in September.”

  4. Prerequisites
     • Site data is documented with “rich” and “complete” EML
     • Time-series data must be captured as “snapshots” for EML temporal coverage, i.e., no “continuous end date”
     • Site data is open and accessible through a standard protocol such as HTTP
     • Site EML documents are harvested on a regular basis into the LTER Metacat

  5. What is EML?
     Ecological Metadata Language is…
     • An ecological metadata standard
     • Very extensible; it can be used to describe many different types of data
     • Comprehensive and supports a rich set of constructs to fully describe data, including
         - how to access distributed data
         - its logical and physical structure
     • Defined by an XML Schema
     • For further information: http://knb.ecoinformatics.org/software/eml/

  6. What is Metacat?
     Metacat is…
     • A storage system for metadata and data (optimized for use with EML)
     • Built on top of a relational database system using Java servlets
     • Requires metadata to be in XML format
     • Provides a customizable web interface
     • Supports point-to-point replication
     • For further information: http://knb.ecoinformatics.org/software/metacat/

  7. Trends Data Store Architecture
     [Diagram] Components shown:
     • Source A, Source B, Source C
     • Metacat/Harvester (EML)
     • Dataset Registry
     • EML Parser/Loader
     • Primary Database (1°, source data)
     • Data Integration/Transformation, f(x)
     • Secondary Database (2°, derived data)
     • EML Factory (EML.xml)
     • Trends Metadata: Derived Metadata, Source Provenance, Integration Methods, Trends Contact
     • Store Front (HTML, SOAP)
     • Trends Data Warehouse

  8. Generalized Workflow
     • Sites collect and document time-series data (e.g., climate, social-economics, …)
     • Sites update EML with a new revision
     • EML is harvested into Metacat
     • EML Loader/Parser loads new/updated dataset into primary database
     • Data integration/transformation converts “raw” data into “derived” data
     • Derived data is stored in secondary database
     • EML is generated for derived data and is stored in Metacat
     • Derived data is made available to store front

  9. Decomposed Workflow (the same eight steps as slide 8, Generalized Workflow)

  10. LTER Site Data Collection
      • Time-series data
          - Physical environment (e.g., climate, …)
          - Human population and economy
          - Biogeochemistry
          - Biotic structure
      • Data/metadata
          - Relational Database
          - Spreadsheet
          - Text file
          - HTML/XML

  11. Generalized Workflow (repeats the eight steps from slide 8)

  12. EML, Metacat, and the Harvester (“independent of the Trends Project”)
      • EML Package ID: knb-lter-site.XX.YY (e.g., knb-lter-sev.354.1, knb-lter-sev.354.2, knb-lter-sev.354.3)
      • Metacat stores the XML of EML; new revisions take precedence, and old revisions are deprecated but not deleted
      • The Harvester is a time-based update process that “pulls” site EML and inserts it into Metacat
      [Diagram] Source A, Source B, Source C; EML; Metacat/Harvester
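
The package ID pattern on slide 12 can be illustrated with a minimal Python sketch. It assumes the three dot-separated parts are scope, identifier, and revision, and that "new revisions take precedence" means the highest revision number wins. The `PackageId` type and both function names are hypothetical and are not part of Metacat or the Harvester.

```python
from typing import NamedTuple


class PackageId(NamedTuple):
    scope: str       # e.g. "knb-lter-sev"
    identifier: int  # e.g. 354
    revision: int    # e.g. 3


def parse_package_id(text: str) -> PackageId:
    """Split 'scope.identifier.revision' from the right; the scope has hyphens but no dots."""
    scope, identifier, revision = text.rsplit(".", 2)
    return PackageId(scope, int(identifier), int(revision))


def latest_revisions(package_ids):
    """Return the highest revision seen for each (scope, identifier) pair."""
    latest = {}
    for pid in map(parse_package_id, package_ids):
        key = (pid.scope, pid.identifier)
        if key not in latest or pid.revision > latest[key].revision:
            latest[key] = pid
    return latest


# Harvest order is arbitrary; older revisions stay stored but are superseded.
harvested = ["knb-lter-sev.354.1", "knb-lter-sev.354.3", "knb-lter-sev.354.2"]
current = latest_revisions(harvested)[("knb-lter-sev", 354)]
print(current)
```

Comparing revisions as integers rather than strings keeps revision 10 ahead of revision 9.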

  13. Generalized Workflow (repeats the eight steps from slide 8)

  14. EML Loader/Parser
      • Dataset registry identifies Trends data in Metacat
      • New revisions assert a “new” data load
      • The EML parser/loader
          - Translates the site EML into the RDBMS DDL
          - Creates a new DB table in the primary database based on the revision
          - Loads the new data into the primary database
      • Trigger to continue workflow
      [Diagram] Source A, Source B, Source C; EML; Metacat/Harvester; Dataset Registry; EML Parser/Loader; Primary Database (1°)
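
The translate, create, and load steps on slide 14 might look roughly like the following Python sketch. It is not the project's loader. The EML fragment is trimmed and un-namespaced, the `storageType` to SQL type mapping is an assumption, naming the table after the package ID is one guess at "based on the revision", and SQLite stands in for the primary database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Trimmed, hypothetical EML fragment; real documents are namespaced and far richer.
EML = """
<eml packageId="knb-lter-sev.354.3">
  <dataset>
    <dataTable>
      <attributeList>
        <attribute><attributeName>obs_date</attributeName><storageType>date</storageType></attribute>
        <attribute><attributeName>air_temp_c</attributeName><storageType>float</storageType></attribute>
        <attribute><attributeName>station</attributeName><storageType>string</storageType></attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml>
""".strip()

# Assumed mapping from EML storage types to SQL column types.
SQL_TYPES = {"date": "DATE", "float": "REAL", "integer": "INTEGER", "string": "TEXT"}


def eml_to_ddl(eml_text: str) -> str:
    """Translate the attribute list of one EML document into a CREATE TABLE statement."""
    root = ET.fromstring(eml_text)
    # One table per revision: fold the package ID into the table name (assumption).
    table = root.get("packageId").replace("-", "_").replace(".", "_")
    columns = []
    for attr in root.iter("attribute"):
        name = attr.findtext("attributeName")
        sql_type = SQL_TYPES.get(attr.findtext("storageType"), "TEXT")
        columns.append(f'"{name}" {sql_type}')
    return f'CREATE TABLE "{table}" ({", ".join(columns)})'


ddl = eml_to_ddl(EML)
primary = sqlite3.connect(":memory:")  # stand-in for the primary database
primary.execute(ddl)
primary.execute(
    'INSERT INTO "knb_lter_sev_354_3" VALUES (?, ?, ?)', ("2006-09-14", 21.5, "sev-40")
)
print(ddl)
```

Unknown storage types fall back to TEXT so a load never fails on an unmapped type; a production loader would more likely reject or log them.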

  15. Generalized Workflow (repeats the eight steps from slide 8)

  16. Data Transformation
      • Primary DB (1°) stores site data in native schema
      • Transformation module reads native schema, performs transformation/integration, and writes to global schema
      • Secondary DB (2°) stores derived data in consistent global schema
      “triggered by data load” …
      [Diagram] Primary DB (1°); f(x); Secondary DB (2°)
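
As an illustration of the native-to-global transformation on slide 16, here is a small Python sketch with SQLite standing in for both databases. Every table, column, and unit is hypothetical; the slides do not show the actual schemas or transformation rules.

```python
import sqlite3

primary = sqlite3.connect(":memory:")    # 1°: site data in native schemas
secondary = sqlite3.connect(":memory:")  # 2°: derived data in the global schema

# Two sites with different native layouts and units (all names are hypothetical).
primary.execute("CREATE TABLE site_a (obs_date TEXT, temp_c REAL)")
primary.execute("CREATE TABLE site_b (day TEXT, temp_f REAL)")
primary.executemany(
    "INSERT INTO site_a VALUES (?, ?)", [("2006-01-01", 4.0), ("2006-01-02", 6.0)]
)
primary.executemany("INSERT INTO site_b VALUES (?, ?)", [("2006-01-01", 50.0)])

secondary.execute("CREATE TABLE air_temperature (site TEXT, obs_date TEXT, temp_c REAL)")

# f(x): one query per native table that renames columns and converts units.
TRANSFORMS = {
    "site_a": "SELECT obs_date, temp_c FROM site_a",
    "site_b": "SELECT day, (temp_f - 32.0) * 5.0 / 9.0 FROM site_b",
}


def transform(site: str) -> int:
    """Read one native table and write its rows to the global schema."""
    rows = [(site, day, temp) for day, temp in primary.execute(TRANSFORMS[site])]
    secondary.executemany("INSERT INTO air_temperature VALUES (?, ?, ?)", rows)
    secondary.commit()
    return len(rows)


for site in TRANSFORMS:  # on the slide, this step is triggered by a data load
    transform(site)

print(secondary.execute("SELECT * FROM air_temperature ORDER BY site, obs_date").fetchall())
```

Keeping one transform per source table means a new site only adds an entry to `TRANSFORMS`; the global table and its consumers stay unchanged.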

  17. Global Schema
      [Diagram] The name knb_eco_trends_1_1, with callouts labeling its scope, identifier, and revision parts
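
One way to read the name on slide 17 is as a package ID made safe for SQL. The sketch below assumes the parts appear in scope, identifier, revision order (mirroring the EML package ID) and that hyphens and dots become underscores. The slide does not confirm either point, and the hyphenated scope `knb-eco-trends` and both function names are hypothetical.

```python
import re


def schema_name(scope: str, identifier: int, revision: int) -> str:
    """Build a SQL-safe name from the three package ID parts."""
    name = f"{scope}_{identifier}_{revision}".replace("-", "_")
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"not a safe SQL identifier: {name!r}")
    return name


def split_name(name: str):
    """Recover (scope, identifier, revision) from a name such as knb_eco_trends_1_1."""
    scope, identifier, revision = name.rsplit("_", 2)
    return scope, int(identifier), int(revision)


print(schema_name("knb-eco-trends", 1, 1))
print(split_name("knb_eco_trends_1_1"))
```

Splitting from the right matters because the scope itself contains underscores once it has been converted.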

  18. Generalized Workflow (repeats the eight steps from slide 8)

  19. EML for the “derived”
      • EML Factory generates EML metadata for the derived data and inserts into Metacat
      • Derived data is now accessible through the Metacat user interface
      [Diagram] Secondary Database (2°); Trends Metadata (Derived Metadata, Source Provenance, Integration Methods, Trends Contact); EML Factory; EML.xml; Metacat/Harvester
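
The EML Factory step on slide 19 could be sketched as a function that assembles an XML document from the four kinds of Trends metadata named in the diagram. This is only an illustration: the output is EML-like but would not validate against the EML schema, and the element names `integrationMethod`, `sourceProvenance`, and `sourcePackageId` are invented for the example.

```python
import xml.etree.ElementTree as ET


def build_derived_eml(package_id, title, sources, method, contact):
    """Assemble a small EML-like document for one derived data product."""
    eml = ET.Element("eml", packageId=package_id)
    dataset = ET.SubElement(eml, "dataset")
    ET.SubElement(dataset, "title").text = title                 # derived metadata
    contact_el = ET.SubElement(dataset, "contact")               # Trends contact
    ET.SubElement(contact_el, "organizationName").text = contact
    methods = ET.SubElement(dataset, "methods")
    ET.SubElement(methods, "integrationMethod").text = method    # hypothetical element
    provenance = ET.SubElement(methods, "sourceProvenance")      # hypothetical element
    for source_id in sources:
        ET.SubElement(provenance, "sourcePackageId").text = source_id
    return ET.tostring(eml, encoding="unicode")


xml_text = build_derived_eml(
    package_id="knb-eco-trends.1.1",        # hypothetical package ID
    title="Derived air temperature",
    sources=["knb-lter-sev.354.3"],
    method="Converted to degrees Celsius",
    contact="Trends Project",
)
root = ET.fromstring(xml_text)
print(xml_text)
```

Recording the source package IDs, including their revisions, lets a reader of the derived record trace each product back to the exact site data it came from.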

  20. Generalized Workflow (repeats the eight steps from slide 8)

  21. Store Front
      • Store Front provides API to derived data products in secondary DB
          - HTML – today
          - Web service – tomorrow
      • Issues: Authentication, Authorization, Provenance, Quality, Interactive Plots
      • http://fire.lternet.edu/Trends (beta site location)
      [Diagram] Secondary Database (2°); HTML Store Front; SOAP Store Front
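
A minimal sketch of the "HTML – today" path on slide 21: read one derived table from the secondary database and render it as an HTML table. SQLite stands in for the secondary database, the table and function names are hypothetical, and none of the listed issues (authentication, authorization, and so on) are addressed.

```python
import sqlite3
from html import escape


def render_product(db, table):
    """Render one derived table from the secondary database as an HTML table."""
    # The table name must come from trusted code, never from user input.
    cursor = db.execute(f'SELECT * FROM "{table}"')
    header = "".join(f"<th>{escape(col[0])}</th>" for col in cursor.description)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(value))}</td>" for value in row) + "</tr>"
        for row in cursor
    )
    return f"<table><tr>{header}</tr>{body}</table>"


secondary = sqlite3.connect(":memory:")  # stand-in for the secondary database
secondary.execute("CREATE TABLE air_temperature (site TEXT, temp_c REAL)")
secondary.execute("INSERT INTO air_temperature VALUES ('sev', 21.5)")
page = render_product(secondary, "air_temperature")
print(page)
```

A web-service front end could reuse the same query and swap the HTML rendering for a structured response.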

  22. HTML Store Front (evolution in progress)

  23. Animated Workflow
      [Diagram] The architecture diagram annotated with Step 1 through Step 6. Components shown: Source A, Source B, Source C; Metacat/Harvester (EML); Dataset Registry; EML Parser/Loader; Primary Database (1°); f(x); Secondary Database (2°); EML Factory (EML.xml); Trends Metadata (Derived Metadata, Source Provenance, Integration Methods, Trends Contact); Store Front (HTML, SOAP)

  24. Thank You – The End
