Trends Project: Spaghetti & Linguine (aka Trends Data Store)
Mark Servilla, servilla@lternet.edu
14 September 2006
Table of Contents
• Background
• System Architecture
• System Workflow and Architecture Details
• Demonstration Screen Examples
LNO NIS
Message from IMExec - Feb 2006
• “IMExec suggests that this activity be used to scope and determine the feasibility of using EML in the development of NIS modules for solving general synthesis problems.”
• “The premise of this project is that EML will adequately describe the data set (e.g., entities, attributes, physical characteristics) to allow the capture of distributed data sets into a central SQL database.”
• “Determining the nature of this model for dynamic data delivery – whether it is more site-loaded or more (network) service-loaded – is critical.”
• “IMExec suggests that the near-term Trends NIS module activity be focused on development of a prototype for demonstration at the ASM in September.”
Prerequisites
• Site data is documented with “rich” and “complete” EML
• Time-series data must be captured as “snapshots” for EML temporal coverage, i.e., no “continuous end date”
• Site data is open and accessible through a standard protocol such as HTTP
• Site EML documents are harvested on a regular basis into the LTER Metacat
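The “snapshot” prerequisite can be checked mechanically. The following is a minimal sketch, assuming a trimmed EML coverage fragment that uses the EML temporalCoverage/rangeOfDates/beginDate and endDate elements; the accepted date patterns and the helper name snapshot_range are hypothetical illustrations, not part of the Trends code.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical, trimmed EML <coverage> fragment used only for illustration.
EML_COVERAGE = """
<coverage>
  <temporalCoverage>
    <rangeOfDates>
      <beginDate><calendarDate>1989-01-01</calendarDate></beginDate>
      <endDate><calendarDate>2005-12-31</calendarDate></endDate>
    </rangeOfDates>
  </temporalCoverage>
</coverage>
"""

# Assumption: calendarDate values look like YYYY, YYYY-MM, or YYYY-MM-DD.
_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def snapshot_range(coverage_xml: str) -> Optional[Tuple[str, str]]:
    """Return (begin, end) if the coverage is a closed snapshot, else None."""
    root = ET.fromstring(coverage_xml.strip())
    base = "temporalCoverage/rangeOfDates"
    begin = (root.findtext(f"{base}/beginDate/calendarDate") or "").strip()
    end = (root.findtext(f"{base}/endDate/calendarDate") or "").strip()
    # An open-ended series (missing end date, or text such as "present") fails.
    if not (_DATE.fullmatch(begin) and _DATE.fullmatch(end)):
        return None
    return begin, end
```

A document whose end date is missing or reads like “present” would be flagged before it ever reaches the loader.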
What is EML?
Ecological Metadata Language is…
• An ecological metadata standard
• Very extensible; it can be used to describe many different types of data
• Comprehensive, supporting a rich set of constructs to fully describe data, including:
  • how to access distributed data
  • its logical and physical structure
• Defined by an XML Schema
• For further information:
  • http://knb.ecoinformatics.org/software/eml/
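To make the “access” and “logical and physical structure” constructs concrete, here is a minimal sketch that reads them out of a trimmed EML dataTable fragment with Python's standard ElementTree. The fragment, URL, attribute names, and the EntitySummary type are hypothetical; a real document is namespaced at the root and carries required elements (e.g., dataFormat) that are omitted here, so this fragment would not validate against the EML schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical, trimmed <dataTable> fragment (not schema-valid EML).
DATATABLE_XML = """
<dataTable>
  <entityName>sev_climate</entityName>
  <physical>
    <objectName>sev_climate.txt</objectName>
    <distribution>
      <online><url>http://example.org/data/sev_climate.txt</url></online>
    </distribution>
  </physical>
  <attributeList>
    <attribute><attributeName>year</attributeName><storageType>integer</storageType></attribute>
    <attribute><attributeName>precip_mm</attributeName><storageType>float</storageType></attribute>
  </attributeList>
</dataTable>
"""


@dataclass
class EntitySummary:
    name: str
    url: Optional[str]                # where the data can be fetched (access)
    attributes: List[Tuple[str, str]]  # (attributeName, storageType) pairs


def summarize_datatable(xml_text: str) -> EntitySummary:
    table = ET.fromstring(xml_text.strip())
    attrs = [
        (a.findtext("attributeName", "").strip(),
         a.findtext("storageType", "string").strip())  # default type is an assumption
        for a in table.iterfind("attributeList/attribute")
    ]
    return EntitySummary(
        name=table.findtext("entityName", "").strip(),
        url=table.findtext("physical/distribution/online/url"),
        attributes=attrs,
    )
```

The URL answers “how to access” the data, and the attribute list is the logical structure a loader would need.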
What is Metacat?
Metacat is…
• A storage system for metadata and data (optimized for use with EML)
• Built on top of a relational database system using Java servlets
• Requires metadata to be in XML format
• Provides a customizable web interface
• Supports point-to-point replication
• For further information:
  • http://knb.ecoinformatics.org/software/metacat/
Trends Data Store Architecture
[Figure: Sources A, B, C; Metacat/Harvester; Dataset Registry; EML Parser/Loader; Primary Database (1°, source data); Data Integration/Transformation f(x); Secondary Database (2°, derived data); EML Factory producing EML.xml Trends metadata (derived metadata, source provenance, integration methods, Trends contact); HTML and SOAP Store Fronts; Trends Data Warehouse]
Generalized Workflow
• Sites collect and document time-series data (e.g., climate, socio-economics, …)
• Sites update EML with a new revision
• EML is harvested into Metacat
• EML Loader/Parser loads the new/updated dataset into the primary database
• Data integration/transformation converts “raw” data into “derived” data
• Derived data is stored in the secondary database
• EML is generated for the derived data and is stored in Metacat
• Derived data is made available to the store front
Decomposed Workflow
LTER Site Data Collection
• Time-series data:
  • Physical environment (e.g., climate, …)
  • Human population and economy
  • Biogeochemistry
  • Biotic structure
• Data/metadata: relational database, spreadsheet, text file, HTML/XML
EML, Metacat, and the Harvester (“independent of the Trends Project”)
• EML Package ID: knb-lter-site.XX.YY (e.g., knb-lter-sev.354.1, knb-lter-sev.354.2, knb-lter-sev.354.3)
• Metacat stores the EML XML; new revisions take precedence, and old revisions are deprecated but not deleted
• The Harvester is a time-based update process that “pulls” site EML and inserts it into Metacat
[Diagram: Sources A, B, C; EML; Metacat/Harvester]
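The revision rule (new revisions take precedence, old ones are kept but deprecated) can be expressed directly on package IDs. This is a minimal sketch, assuming IDs always follow scope.identifier.revision with numeric identifier and revision, as in the knb-lter-sev.354.N examples; PackageId and split_revisions are hypothetical names, not Metacat's API.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PackageId(NamedTuple):
    scope: str       # e.g. "knb-lter-sev"
    identifier: int  # e.g. 354
    revision: int    # e.g. 3


def parse_package_id(package_id: str) -> PackageId:
    """Split 'scope.identifier.revision'; the scope itself may contain hyphens."""
    scope, identifier, revision = package_id.rsplit(".", 2)
    return PackageId(scope, int(identifier), int(revision))


def split_revisions(package_ids: Iterable[str]) -> Tuple[List[PackageId], List[PackageId]]:
    """Return (current, deprecated): newest revision per (scope, identifier) vs. the rest."""
    ids = [parse_package_id(p) for p in package_ids]
    latest: Dict[Tuple[str, int], PackageId] = {}
    for pid in ids:
        key = (pid.scope, pid.identifier)
        if key not in latest or pid.revision > latest[key].revision:
            latest[key] = pid
    current = sorted(latest.values())
    deprecated = sorted(pid for pid in ids if pid not in current)  # kept, not deleted
    return current, deprecated
```

A loader could compare the current list against the revisions it has already processed to decide what needs a new load.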
EML Loader/Parser
• The dataset registry identifies Trends data in Metacat
• New revisions assert a “new” data load
• The EML parser/loader:
  • Translates the site EML into RDBMS DDL
  • Creates a new DB table in the primary (1°) database based on the revision
  • Loads the new data into the primary database
  • Fires a trigger to continue the workflow
[Diagram: Sources A, B, C; EML; Metacat/Harvester; Dataset Registry; EML Parser/Loader; 1° database]
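The loader steps above can be sketched with SQLite standing in for the primary RDBMS. This assumes the EML has already been reduced to (attributeName, storageType) pairs; the storageType-to-SQL mapping, the per-revision table naming, the registry shape, and the on_loaded callback (playing the role of the workflow trigger) are all hypothetical choices for illustration.

```python
import re
import sqlite3
from typing import Callable, Iterable, Optional, Sequence, Set, Tuple

# Hypothetical mapping from EML storageType values to SQLite column types;
# anything unrecognized falls back to TEXT.
SQL_TYPES = {"integer": "INTEGER", "int": "INTEGER", "long": "INTEGER",
             "float": "REAL", "double": "REAL", "decimal": "REAL"}


def is_trends_dataset(package_id: str, registry: Set[Tuple[str, str]]) -> bool:
    """Registry lookup keyed on (scope, identifier), ignoring the revision."""
    scope, identifier, _revision = package_id.rsplit(".", 2)
    return (scope, identifier) in registry


def table_name(package_id: str) -> str:
    """One table per revision, e.g. knb-lter-sev.354.3 -> knb_lter_sev_354_3."""
    return re.sub(r"\W", "_", package_id)


def eml_to_ddl(package_id: str, attributes: Sequence[Tuple[str, str]]) -> str:
    """Translate (attributeName, storageType) pairs into a CREATE TABLE statement."""
    cols = ", ".join(f'"{name}" {SQL_TYPES.get(stype.lower(), "TEXT")}'
                     for name, stype in attributes)
    return f'CREATE TABLE "{table_name(package_id)}" ({cols})'


def load_revision(conn: sqlite3.Connection, package_id: str,
                  attributes: Sequence[Tuple[str, str]], rows: Iterable[tuple],
                  on_loaded: Optional[Callable[[str], None]] = None) -> None:
    conn.execute(eml_to_ddl(package_id, attributes))  # new table for this revision
    marks = ", ".join("?" for _ in attributes)
    conn.executemany(f'INSERT INTO "{table_name(package_id)}" VALUES ({marks})', rows)
    conn.commit()
    if on_loaded is not None:
        on_loaded(package_id)  # stands in for the trigger that continues the workflow
```

Creating a fresh table per revision mirrors the slide's “new DB table … based on the revision”, so earlier loads are never overwritten.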
Data Transformation
• Primary DB (1°) stores site data in its native schema
• Transformation module f(x) reads the native schema, performs transformation/integration, and writes to the global schema
• Secondary DB (2°) stores derived data in a consistent global schema
• “Triggered by data load” …
Global Schema
• knb_eco_trends_1_1 (scope: knb_eco_trends; identifier: 1; revision: 1)
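The transformation step can be sketched with SQLite, attaching a second in-memory database to play the secondary (2°) store under the versioned global-schema name. Everything about the schemas here is hypothetical: the native table has year and precip_mm columns, and the global trend_value table is a long format with precipitation converted to centimeters and the source package recorded for provenance.

```python
import sqlite3

GLOBAL_SCHEMA = "knb_eco_trends_1_1"  # versioned global schema name


def open_stores() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # "main" plays the primary (1°) database
    # A second in-memory database plays the secondary (2°) database.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {GLOBAL_SCHEMA}")
    conn.execute(
        f"CREATE TABLE {GLOBAL_SCHEMA}.trend_value ("
        "site TEXT, variable TEXT, year INTEGER, value REAL, source_package TEXT)"
    )
    return conn


def transform_precip(conn: sqlite3.Connection, site: str,
                     source_table: str, source_package: str) -> None:
    """Read a native-schema table, convert mm to cm, write global-schema rows."""
    rows = conn.execute(f'SELECT year, precip_mm FROM "{source_table}"').fetchall()
    conn.executemany(
        f"INSERT INTO {GLOBAL_SCHEMA}.trend_value "
        "VALUES (?, 'precipitation_cm', ?, ?, ?)",
        [(site, year, precip_mm / 10.0, source_package) for year, precip_mm in rows],
    )
    conn.commit()
```

Keeping the source package ID on every derived row is one simple way to retain the source provenance that the derived EML later needs.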
EML for the “Derived” Data
• The EML Factory generates EML metadata (EML.xml) for the derived data in the secondary (2°) database and inserts it into Metacat
• The generated Trends metadata covers derived metadata, source provenance, integration methods, and the Trends contact
• Derived data is now accessible through the Metacat user interface
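An EML Factory could assemble such a document with ElementTree. This is a skeletal sketch, assuming the EML 2.0.1 namespace and recording provenance and integration methods as plain methods/methodStep paragraphs; the function name, the package ID, and the contact value are hypothetical, and required elements such as creator are omitted, so the output would not validate against the EML schema as-is.

```python
import xml.etree.ElementTree as ET
from typing import List

EML_NS = "eml://ecoinformatics.org/eml-2.0.1"
ET.register_namespace("eml", EML_NS)


def derived_eml(package_id: str, title: str, sources: List[str],
                method_text: str, contact_name: str) -> str:
    """Build a skeletal EML document describing a derived data product."""
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "knb"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # Trends contact (hypothetical: only a surname is recorded here).
    contact = ET.SubElement(dataset, "contact")
    ET.SubElement(ET.SubElement(contact, "individualName"), "surName").text = contact_name
    # Integration method and source provenance, recorded as method paragraphs.
    step = ET.SubElement(ET.SubElement(dataset, "methods"), "methodStep")
    desc = ET.SubElement(step, "description")
    ET.SubElement(desc, "para").text = method_text
    ET.SubElement(desc, "para").text = "Derived from: " + ", ".join(sources)
    return ET.tostring(root, encoding="unicode")
```

The resulting string is what would be inserted into Metacat alongside the site EML it was derived from.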
Store Front
• Provides an API to the derived data products in the secondary (2°) database
  • HTML: today
  • Web service (SOAP): tomorrow
• Open issues: authentication, authorization, provenance, quality, interactive plots
• Beta site location: http://fire.lternet.edu/Trends
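The “HTML today” store front can be sketched as a small WSGI application over the derived-data table. This assumes a hypothetical trend_value table with site, variable, year, and value columns, and it deliberately ignores the open issues listed above (no authentication, authorization, or quality flags).

```python
import html
import sqlite3
from typing import List, Sequence


def render_table(rows: Sequence[tuple], headers: List[str]) -> str:
    """Render rows as an HTML table, escaping every cell."""
    head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def make_store_front(conn: sqlite3.Connection):
    """Return a WSGI app that serves the derived data as a single HTML page."""
    def app(environ, start_response):
        rows = conn.execute(
            "SELECT site, variable, year, value FROM trend_value ORDER BY year"
        ).fetchall()
        page = ("<html><body>"
                + render_table(rows, ["site", "variable", "year", "value"])
                + "</body></html>")
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [page.encode("utf-8")]
    return app
```

For a local trial, the app can be served with the standard library's wsgiref.simple_server.make_server("localhost", 8000, make_store_front(conn)).serve_forever(); a SOAP or other web-service front end would sit beside it and reuse the same query.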
Animated Workflow
[Figure: the Trends Data Store architecture annotated with Steps 1 to 6 across Sources A, B, C; Metacat/Harvester; Dataset Registry; EML Parser/Loader; 1° and 2° databases; f(x) transformation; EML Factory (EML.xml Trends metadata: derived metadata, source provenance, integration methods, Trends contact); HTML and SOAP Store Fronts]
Thank You – The End