3SAQS Technical Workshop October 31 – November 1, 2013 Data Warehouse Status and Planning Update

Presentation Transcript


  1. 3SAQS Technical Workshop October 31 – November 1, 2013 Data Warehouse Status and Planning Update Zac Adelman (UNC-IE) Shawn McClure (CSU-CIRA) Tom Moore (WGA-WRAP)

  2. Summary of Past Quarter Activities
     • Researched and experimented with large data transfer technologies (iRODS, Globus Connect, etc.)
     • Configured a large dual RAID array (~20TB) on the primary file server and designed a third RAID array to bring the total storage capacity to 50TB+
     • Imported the WestJump source data files onto the primary file server and organized them into a uniform folder structure (meteorology, emissions, results)
     • Created an FTP site on the primary file server to provide direct, basic access to the source data files
     • Made the current inventory of source data files available on the new FTP site
     • Began designing the content, format, and coding protocols for submitting model results and other data to the TSDW
     • Began designing the schema and code infrastructure for the “project overview and tracking” system
     • Continued to refine the database, software, and website infrastructure supporting the data warehouse
     • Continued to refine various pre-processing components:
       • XML Generator for metadata
       • Boundary Conditions Generator
       • CAMx Post-Processing Utility
       • RDBMS data import system
     • Refined the logical and physical file system design
     • Refined the data verification and validation system

  3. Operational Website Components
     • User Login Form
     • User Registration/Modification Form
     • User Profile/Account Form
     • User Feedback Form
     • Dataset Request Form
     • Database Query Wizard
       • Raw Data Download
       • Interactive Charts
       • Dynamic Contour Maps
       • Site Metadata Reports
     • Monitoring Site Metadata Browser
     • File Explorer
     • FTP site Authentication and Authorization System

  4. Possible Future Website Components
     • Modeled Emissions Summary Tool
     • Modeled-to-Observed Data Comparison Tool
     • Air Quality Summary Reports
       • Visibility
       • Deposition
       • Ozone
       • Other
     • Model Data Mapping Tool
     • Source Apportionment Tool
     • Various Unpublished Monitoring Data Tools
     • Backend Web Services and Processing Components

  5. Summary of Coming Activities
     • Conduct additional use case tests
     • Finalize the large data transfer system
     • Import preexisting/legacy air quality studies and results
     • Commence production-level data warehouse operations (hosting, data analysis and processing, maintenance, et cetera)
     • Design visualization and analysis tools for modeling results and performance evaluation
     • Design the “project overview and tracking” interface for the TSDW website

  6. TSDW Architecture Diagram - Overview

  7. TSDW Data Flow Diagram - Overview

  8. TSDW Use Cases
     • Definition of "Use Case": A list of steps defining the interactions between a user and a system to achieve a specific goal. The "user" can be a human or an external system, depending on context.
     • Scopes of Use Cases: The subset of users to whom the functionality of a given use case is made available
       • Internal: The TSDW administration and development team
       • External: A subset of external users who have been granted a specific role
       • Public: The general public (anyone who visits the TSDW website)
     • Potential User Roles:
       • Administrators
       • Project Managers
       • Project Team Members
       • Stakeholders
       • Data Providers
       • Planners
       • Public

  9. Use Case Description
     • Obtain and Manage Model Input Data (Scope: Internal)
       • Obtain model input data from data provider(s)
       • Copy model input data files to the file server
       • Organize model input data on the file server
         • File and folder naming convention
         • Physical file system organization (what developers see)
         • Logical file system organization (what the user sees)
         • Dataset partitioning (temporal, spatial, functional, etc.)
       • Perform periodic backup of "active" model input data
       • Perform periodic archival of "inactive" model input data
       • Track and manage the versioning of the model input data
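The organization items above (naming convention, physical versus logical views, partitioning) can be illustrated with a small sketch. It assumes a hypothetical layout of root/category/dataset/domain/YYYY/MM/filename, reusing the meteorology/emissions/results categories from the WestJump folder structure; the root path, field names, and grid label are illustrative only, not the TSDW's actual convention.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath

# Functional partitions: the top-level folders used for the WestJump data.
CATEGORIES = {"meteorology", "emissions", "results"}


@dataclass(frozen=True)
class InputFile:
    """One model input file; all field names here are hypothetical."""
    category: str   # functional partition, e.g. "emissions"
    dataset: str    # dataset label, e.g. "WestJump_Base08b"
    domain: str     # spatial partition, e.g. a modeling grid name
    day: date       # temporal partition
    filename: str


def physical_path(f: InputFile, root: str = "/data/tsdw") -> PurePosixPath:
    """Developer-facing path: root/category/dataset/domain/YYYY/MM/filename."""
    if f.category not in CATEGORIES:
        raise ValueError(f"unknown category: {f.category!r}")
    return PurePosixPath(root, f.category, f.dataset, f.domain,
                         f"{f.day:%Y}", f"{f.day:%m}", f.filename)


def logical_path(f: InputFile) -> str:
    """User-facing path that hides the spatial and temporal partitioning."""
    return f"{f.dataset}/{f.category}/{f.filename}"
```

For example, an emissions file for 1 July 2008 on a grid labeled "12km" would land at /data/tsdw/emissions/WestJump_Base08b/12km/2008/07/, while the user would see only WestJump_Base08b/emissions/ and the file name.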

  10. Use Case Description
      • Harvest File Metadata Using the XML Metadata Generator (Scope: Internal)
        • An administrator locates the desired root folder in the file system
        • An administrator executes the XML Generator program to produce XML files containing file metadata
        • (Ideally, the above two tasks could be automatically run as a "cron" task on a regular, periodic basis, rather than as a two-step manual process.)
        • The File Indexing Utility (FIU) processes the newly-generated files to extract the relevant file metadata
        • The FIU updates the RDBMS with the file metadata
        • The new file metadata is automatically reflected in the TSDW File Explorer Tool
      • Dependencies:
        • The XML File Metadata Generator program
        • The File Indexing Utility (FIU)
        • The appropriate RDBMS schema, SQL scripts, and software libraries for managing source file metadata
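A minimal sketch of the harvest-and-index pipeline, assuming a hypothetical XML layout (a files root element holding one file element with path, size, and modified attributes per file) and a hypothetical file_metadata table. SQLite stands in for the TSDW's RDBMS; running both functions from a single script scheduled with cron would give the automated variant described in the parenthetical note.

```python
import os
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def generate_metadata_xml(root_dir: str) -> str:
    """XML Generator step: one <file> element per file under root_dir."""
    doc = ET.Element("files", root=os.path.abspath(root_dir))
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            ET.SubElement(
                doc, "file",
                path=os.path.relpath(full, root_dir).replace(os.sep, "/"),
                size=str(st.st_size),
                modified=datetime.fromtimestamp(st.st_mtime, timezone.utc).isoformat(),
            )
    return ET.tostring(doc, encoding="unicode")


def index_metadata(xml_text: str, conn: sqlite3.Connection) -> int:
    """FIU step: parse the XML and upsert one row per file into the RDBMS."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_metadata ("
        " path TEXT PRIMARY KEY, size INTEGER, modified TEXT)"
    )
    rows = [
        (e.get("path"), int(e.get("size")), e.get("modified"))
        for e in ET.fromstring(xml_text).iter("file")
    ]
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT OR REPLACE INTO file_metadata VALUES (?, ?, ?)", rows)
    return len(rows)
```

Using the relative path as the primary key makes re-harvesting idempotent: running the job again refreshes existing rows instead of duplicating them.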

  11. Use Case Description
      • Download Model Input Data from TSDW, Online Method (Scope: External)
        • User logs into the TSDW website
        • User fills out the Dataset Request (DR) form
        • The user is redirected to the Dataset Request confirmation message/page
        • The DR form is passed to the Dataset Packaging System (DPS)
        • The DPS registers metadata about the request into the RDBMS
        • The DPS locates the physical files that are needed to fulfill the order
        • The DPS assembles, organizes, and compresses the component files into a downloadable "package"
        • The DPS creates a unique "PackageID" that will be linked with this package throughout its lifecycle
        • The DPS registers metadata about the package (including the "PackageID") into the RDBMS
        • The DPS notifies the requesting user of the package's availability
        • The user logs back into the TSDW website (if necessary)
        • The user initiates a session of the Dataset Transfer System (DTS) to download the files
        • The DTS registers metadata about the package "receipt" into the RDBMS
        • The DTS notifies the appropriate TSDW administrator(s) of the download
      • Dependencies:
        • Dataset Request Form
        • Dataset Request confirmation message/page
        • Dataset Packaging System (DPS) (could be one and the same as iRODS or Globus)
        • Appropriate RDBMS schema and SQL scripts/commands for managing Dataset Request metadata
        • Appropriate RDBMS schema for associating Dataset Requests with Users and Projects
        • A high-volume data transfer program such as iRODS or Globus Connect Server
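The core DPS steps above (locate, assemble and compress, assign a PackageID, register the package) can be sketched as follows. This is an illustration under stated assumptions: a plain .tar.gz archive stands in for whatever iRODS or Globus would provide, SQLite stands in for the RDBMS, and the packages table and build_package function are hypothetical.

```python
import sqlite3
import tarfile
import uuid
from pathlib import Path


def build_package(request_id: int, files: list, base_dir: str,
                  out_dir: str, conn: sqlite3.Connection) -> str:
    """Bundle the requested files into one .tar.gz and register it; returns the PackageID."""
    package_id = uuid.uuid4().hex  # unique ID that follows the package through its lifecycle
    archive = Path(out_dir) / f"{package_id}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for f in files:
            p = Path(f)
            # Keep each file's path relative to base_dir so the folder layout survives.
            tar.add(str(p), arcname=p.relative_to(base_dir).as_posix())
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages ("
        " package_id TEXT PRIMARY KEY, request_id INTEGER, archive TEXT,"
        " n_files INTEGER, status TEXT)"
    )
    with conn:
        conn.execute(
            "INSERT INTO packages VALUES (?, ?, ?, ?, ?)",
            (package_id, request_id, str(archive), len(files), "ready"),
        )
    # Notifying the requesting user (e.g. by email) is left out of this sketch.
    return package_id
```

Naming the archive after its PackageID keeps the file on disk and its database row trivially linked.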

  12. Use Case Description
      • "Download" Model Input Data from TSDW, Offline Method (Scope: External)
        • User logs into the TSDW website
        • User fills out the Dataset Request form
        • The DR form is passed to the Dataset Packaging System (DPS)
        • The DPS registers metadata about the request into the RDBMS
        • The DPS locates the physical files that are needed to fulfill the order
        • The DPS creates a unique "PackageID" that will be linked with this package throughout its lifecycle
        • The DPS registers metadata about the package (including the "PackageID") into the RDBMS
        • The DPS notifies the requesting user of the order receipt and future hard drive shipment
        • The DPS sends a list of the files that comprise the order to a TSDW administrator
        • A TSDW administrator copies the selected files onto a hard disk drive (HDD) or drives
        • A TSDW administrator mails the drive(s) to the requesting user
        • A TSDW administrator records the shipment in the RDBMS
      • Dependencies:
        • Dataset Request Form
        • Dataset Request confirmation message/page
        • Dataset Packaging System (DPS)
        • Appropriate RDBMS schema and SQL scripts/commands for managing Dataset Request metadata
        • Appropriate RDBMS schema for associating Dataset Requests with Users and Projects
        • A manual process for copying data files onto hard disks and mailing them to users
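When an offline order is larger than one drive, the administrator has to decide which files go on which hard drive. One simple way to plan this, shown here as a hypothetical helper rather than part of the TSDW design, is first-fit decreasing bin packing: sort the files by size, largest first, and place each on the first drive that still has room. Sizes and capacity just need to share a unit (bytes, GB, etc.).

```python
def assign_to_drives(file_sizes: dict, capacity: int) -> list:
    """First-fit decreasing: returns a list of drives, each a list of file names."""
    drives = []  # file names placed on each drive
    free = []    # remaining capacity on each drive
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name} ({size}) does not fit on a single drive")
        for i, room in enumerate(free):
            if size <= room:
                drives[i].append(name)
                free[i] -= size
                break
        else:  # no existing drive had room, so start a new one
            drives.append([name])
            free.append(capacity - size)
    return drives
```

First-fit decreasing is a heuristic: it is fast and usually close to the minimum number of drives, but it is not guaranteed to be optimal.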

  13. Use Case Description
      • Download Boundary Conditions Generator (Scope: External)
        • User logs into the TSDW website
        • User navigates to the Modeling Utilities section of the website
        • User fills out the Boundary Conditions Generator (BCG) download form
        • The BCG download form is passed to the Utility Tracking System (UTS)
        • The UTS extracts information from the metadata file associated with the current BCG
        • The UTS associates this metadata with the appropriate User record in the RDBMS
        • The UTS redirects the user to a download link for the BCG
        • The user downloads the BCG and any associated instructions and configuration files
        • The UTS notifies the appropriate TSDW administrator(s) of the download
      • Dependencies:
        • Boundary Conditions Generator (BCG) program
        • BCG user guide
        • BCG download form
        • BCG download confirmation message/page and installation file link
        • The appropriate RDBMS schema, SQL scripts, and software libraries for managing BCG download metadata

  14. Use Case Description
      • Download the CAMx Post-Processing Utility (Scope: External)
        • User logs into the TSDW website
        • User navigates to the Modeling Utilities section of the website
        • User fills out the CAMx Post-Processing Utility (CPPU) download form
        • The CPPU download form is passed to the Utility Tracking System (UTS)
        • The UTS extracts information from the metadata file associated with the current CPPU
        • The UTS associates this metadata with the appropriate User record in the RDBMS
        • The UTS redirects the user to a download link for the CPPU
        • The user downloads the CPPU and any associated instructions and configuration files
        • The UTS notifies the appropriate TSDW administrator(s) of the download
      • Dependencies:
        • CAMx Post-Processing Utility (CPPU) program
        • CPPU user guide
        • CPPU download form
        • CPPU download confirmation message/page and installation file link
        • The appropriate RDBMS schema, SQL scripts, and software libraries for managing CPPU download metadata
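Slides 13 and 14 describe the same Utility Tracking System flow for two different utilities, so one routine could serve both. A minimal sketch, assuming the per-release metadata file is JSON with hypothetical name, version, and file keys, and using SQLite in place of the RDBMS. Recording the version at download time is also what would later let an uploader's "Version ID" be traced back to a specific BCG or CPPU release.

```python
import json
import sqlite3
from datetime import datetime, timezone


def record_utility_download(conn: sqlite3.Connection, user_id: int,
                            metadata_json: str, base_url: str) -> str:
    """Log which user downloaded which utility version; return the download link."""
    meta = json.loads(metadata_json)  # hypothetical keys: "name", "version", "file"
    conn.execute(
        "CREATE TABLE IF NOT EXISTS utility_downloads ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER NOT NULL,"
        " utility TEXT NOT NULL, version TEXT NOT NULL, downloaded_at TEXT NOT NULL)"
    )
    with conn:
        conn.execute(
            "INSERT INTO utility_downloads (user_id, utility, version, downloaded_at)"
            " VALUES (?, ?, ?, ?)",
            (user_id, meta["name"], meta["version"],
             datetime.now(timezone.utc).isoformat()),
        )
    return f"{base_url.rstrip('/')}/{meta['file']}"
```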

  15. Use Case Description
      • Upload Model Results (Scope: External)
        • User logs into the TSDW website
        • User navigates to the Modeling Results Upload section of the website
        • User fills out the Modeling Results Upload form:
          • User provides a standard description of the model results
          • User provides the "PackageID" of the model input data used
          • User provides the Boundary Conditions Generator "Version ID", if relevant
          • User provides the CAMx Post-Processing Utility "Version ID", if relevant
          • User selects the files to upload
        • User clicks the "Submit" button on the form
        • The Modeling Results Upload form is passed to the Data Import System (DIS)
        • The data files are uploaded and cataloged by the DIS
        • The DIS creates a unique "DatasetID" that will be linked to this upload throughout its lifecycle
        • The DIS registers metadata about the upload (including the "DatasetID") into the RDBMS
        • The DIS notifies the uploading user of the upload's success or failure (generally, its "status")
        • The DIS places the file(s) into the appropriate location(s) on the TSDW file system
        • The DIS notifies the appropriate TSDW administrator(s) of the upload
      • Dependencies:
        • Modeling Results Upload (MRU) form
        • MRU system
        • Appropriate RDBMS schema and SQL scripts/commands for managing MRU metadata
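The DIS side of an upload can be sketched as below. The assumptions are labeled: the datasets and dataset_files tables and the results/DatasetID destination folder are hypothetical, SQLite stands in for the RDBMS, and SHA-256 checksums are one possible way to confirm each file arrived intact (the slide does not specify a verification method). For simplicity, this sketch places the files before registering the upload, which is the reverse of the order in the slide.

```python
import hashlib
import shutil
import sqlite3
import uuid
from pathlib import Path
from typing import Optional


def sha256sum(path: Path) -> str:
    """Stream the file in 1 MiB chunks so large model outputs need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def ingest_upload(conn: sqlite3.Connection, files: list, dest_root: str,
                  package_id: str, description: str,
                  bcg_version: Optional[str] = None,
                  cppu_version: Optional[str] = None) -> str:
    """Assign a DatasetID, copy and checksum the files, and register the upload."""
    dataset_id = uuid.uuid4().hex
    dest = Path(dest_root) / "results" / dataset_id
    dest.mkdir(parents=True)
    conn.executescript(
        "CREATE TABLE IF NOT EXISTS datasets (dataset_id TEXT PRIMARY KEY,"
        " package_id TEXT, description TEXT, bcg_version TEXT,"
        " cppu_version TEXT, status TEXT);"
        "CREATE TABLE IF NOT EXISTS dataset_files (dataset_id TEXT,"
        " name TEXT, sha256 TEXT);"
    )
    records = []
    for f in files:
        src = Path(f)
        digest = sha256sum(src)
        target = dest / src.name
        shutil.copy2(src, target)
        if sha256sum(target) != digest:  # verify the copy before cataloging it
            raise IOError(f"checksum mismatch after copying {src.name}")
        records.append((dataset_id, src.name, digest))
    with conn:  # dataset row and file rows are committed together
        conn.execute(
            "INSERT INTO datasets VALUES (?, ?, ?, ?, ?, ?)",
            (dataset_id, package_id, description, bcg_version, cppu_version, "uploaded"),
        )
        conn.executemany("INSERT INTO dataset_files VALUES (?, ?, ?)", records)
    return dataset_id
```

Storing the PackageID and utility versions on the dataset row is what ties each set of results back to the exact inputs and tools that produced it.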

  16. Use Case Description
      • Import Database-Ready Model Results (Scope: Internal)
        • An administrator locates the newly uploaded model results (which have been generated by the CPPU and uploaded to the TSDW)
        • An administrator executes the appropriate scripts/commands using the Data Import System (DIS)
        • The DIS reads and imports the database-ready model results into the RDBMS:
          • The DIS verifies that all the necessary metadata is present in the RDBMS
          • The DIS transforms the data into the appropriate schema for import
          • The DIS maps source codes and names to internal codes and names, as needed
          • The DIS imports the data from the source file(s) into the RDBMS
        • The DIS creates/updates the appropriate metadata records in the RDBMS for tracking the imported model dataset
        • The imported model results automatically become available via the relevant tools on the TSDW website
      • Dependencies:
        • The CAMx Post-Processing Utility (CPPU) for generating the database-ready model results
        • The Data Import System (DIS)
        • Appropriate RDBMS schema and SQL scripts/commands for managing Model Results metadata
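The import steps above (verify metadata, transform, map codes, load, update tracking records) can be sketched in one function. This assumes, hypothetically, that the CPPU's database-ready output is a CSV with site, date, species, and value columns, and that SPECIES_MAP is an illustrative source-to-internal code table; SQLite stands in for the RDBMS. Every row is validated before anything is written, so a bad code leaves the database untouched.

```python
import csv
import io
import sqlite3

# Hypothetical mapping from source (model) species names to internal parameter codes.
SPECIES_MAP = {"O3": "OZONE", "PM25": "PM2.5", "NO2": "NO2"}


def import_results(conn: sqlite3.Connection, dataset_id: str, csv_text: str) -> int:
    """Verify metadata, map codes, and load database-ready results; returns rows imported."""
    conn.executescript(
        "CREATE TABLE IF NOT EXISTS datasets (dataset_id TEXT PRIMARY KEY, status TEXT);"
        "CREATE TABLE IF NOT EXISTS model_results (dataset_id TEXT, site_id TEXT,"
        " date TEXT, parameter TEXT, value REAL);"
    )
    # Verify that the upload's metadata is already registered.
    if conn.execute("SELECT 1 FROM datasets WHERE dataset_id = ?", (dataset_id,)).fetchone() is None:
        raise LookupError(f"no metadata registered for dataset {dataset_id}")
    # Transform each row to the target schema and map source codes to internal codes.
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = SPECIES_MAP.get(row["species"])
        if code is None:
            raise ValueError(f"unmapped source code: {row['species']!r}")
        records.append((dataset_id, row["site"], row["date"], code, float(row["value"])))
    # Import the data and update the tracking metadata in one transaction.
    with conn:
        conn.executemany("INSERT INTO model_results VALUES (?, ?, ?, ?, ?)", records)
        conn.execute("UPDATE datasets SET status = 'imported' WHERE dataset_id = ?", (dataset_id,))
    return len(records)
```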

  17. Use Case Description
      • Visualize and Analyze Monitoring Data (Scope: External)
        • User logs into the TSDW website
        • The user chooses an appropriate visualization and/or analysis tool to use
        • Using the tool, the user specifies spatial, temporal, and other dimensional filters for the data as well as display and formatting options
        • The tool displays monitoring data in various output products, such as:
          • Data summary tables
          • Bar charts
          • Line charts
          • Pie charts
          • Contour maps
      • Dependencies:
        • An appropriate collection of monitoring data
        • Specific design specifications for monitoring data output products
        • An appropriate collection of online visualization tools and technologies
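Behind such a tool, the user's spatial, temporal, and parameter filters have to become a database query. A minimal sketch, assuming hypothetical sites (site_id, lon, lat) and measurements (site_id, date, parameter, value) tables: optional filters add clauses only when set, and every user-supplied value is bound as a ? placeholder rather than pasted into the SQL text, which guards against SQL injection.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class MonitoringFilter:
    """User-selected filters; all field names are hypothetical."""
    parameter: str                      # e.g. "OZONE"
    start_date: str                     # ISO dates compare correctly as text
    end_date: str
    site_ids: Optional[Sequence[str]] = None
    bbox: Optional[Tuple[float, float, float, float]] = None  # (min_lon, min_lat, max_lon, max_lat)


def build_query(f: MonitoringFilter):
    """Return (sql, params) for the measurements matching the filter."""
    sql = ["SELECT m.site_id, m.date, m.value FROM measurements m",
           "JOIN sites s ON s.site_id = m.site_id",
           "WHERE m.parameter = ? AND m.date BETWEEN ? AND ?"]
    params = [f.parameter, f.start_date, f.end_date]
    if f.site_ids:
        sql.append("AND m.site_id IN (" + ", ".join("?" for _ in f.site_ids) + ")")
        params.extend(f.site_ids)
    if f.bbox:
        min_lon, min_lat, max_lon, max_lat = f.bbox
        sql.append("AND s.lon BETWEEN ? AND ? AND s.lat BETWEEN ? AND ?")
        params.extend([min_lon, max_lon, min_lat, max_lat])
    sql.append("ORDER BY m.site_id, m.date")
    return " ".join(sql), params
```

The resulting rows could then feed any of the output products listed above (tables, line charts, contour maps).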

  18. Use Case Description
      • Visualize and Analyze Model Results (Scope: External)
        • The user logs into the TSDW website
        • The user chooses an appropriate visualization and analysis tool to use
        • Using the tool, the user specifies spatial, temporal, and other dimensional filters for the data as well as display and formatting options
        • The tool displays model performance and evaluation results in various output products, such as:
          • Normalized mean error and bias
          • Mean normalized error and bias
          • Root mean square error
          • Correlation coefficients
          • Soccer plots
          • Box and whisker plots
          • Bugle plots
          • Spatial statistical plots
          • Spatial concentration plots with observation overlays
      • Dependencies:
        • An appropriate collection of model results data
        • Specific design specifications for model results output products
        • An appropriate collection of online visualization tools and technologies
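The numeric statistics in that list have standard definitions over N paired model (M) and observed (O) values: NMB = Σ(M−O)/ΣO, NME = Σ|M−O|/ΣO, MNB = (1/N)Σ(M−O)/O, MNE = (1/N)Σ|M−O|/O, RMSE = √[(1/N)Σ(M−O)²], and r is the Pearson correlation coefficient. A plain-Python sketch of those formulas follows; it is an illustration, not the TSDW's implementation, and the plots (soccer, bugle, etc.) are out of scope.

```python
import math
from typing import Dict, Sequence


def performance_stats(model: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """Paired model/observation statistics, returned as fractions (multiply by 100 for %)."""
    if len(model) != len(obs) or len(obs) == 0:
        raise ValueError("model and obs must be non-empty and of equal length")
    if any(o <= 0 for o in obs):
        raise ValueError("normalized metrics need strictly positive observations")
    n = len(obs)
    diffs = [m - o for m, o in zip(model, obs)]
    sum_obs = sum(obs)
    mean_m, mean_o = sum(model) / n, sum_obs / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return {
        "NMB": sum(diffs) / sum_obs,                             # normalized mean bias
        "NME": sum(abs(d) for d in diffs) / sum_obs,             # normalized mean error
        "MNB": sum(d / o for d, o in zip(diffs, obs)) / n,       # mean normalized bias
        "MNE": sum(abs(d) / o for d, o in zip(diffs, obs)) / n,  # mean normalized error
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),        # root mean square error
        # Pearson r is undefined when either series is constant.
        "r": cov / math.sqrt(var_m * var_o) if var_m > 0 and var_o > 0 else float("nan"),
    }
```

The normalized-mean forms (NMB, NME) divide by the total observed value, so they are less sensitive to very small observations than the mean-normalized forms (MNB, MNE), which divide pair by pair.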

  19. Use Case Description
      • View Project Data and Metadata (Scope: External)
        • A user logs into the TSDW website
        • The user navigates to the Projects and Studies section of the TSDW website
        • The user views metadata associated with the projects that he/she has permission to view:
          • Name, purpose, description
          • Contact information: project manager(s), contractors, etc.
          • Associated datasets: Model input data downloaded, model results uploaded, etc.
          • Analysis products: Charts, graphs, summaries, etc.
        • The user views data associated with the projects that he/she has permission to view:
          • Model input data
            • Meteorological inputs
            • Emissions inputs
            • Initial and Boundary Conditions
            • Ancillary inputs (land use, land cover, photolysis)
            • Model configuration metadata
          • Model results
            • Gridded results
            • Observation-paired results
          • Monitoring data
      • Dependencies:
        • Appropriate RDBMS schema and SQL scripts/commands for managing Project metadata:
          • Projects
          • Users
          • Downloaded/Uploaded Datasets
          • Documents
          • Analysis products
        • An online user interface for the Projects and Studies section of the TSDW website
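The rule "projects that he/she has permission to view" could rest on a simple membership table. A minimal sketch under stated assumptions: the users, projects, and project_members tables are hypothetical, administrators see every project, other users see only the projects they belong to, and the role column is meant to hold role names such as those listed on slide 8.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id INTEGER PRIMARY KEY, name TEXT, is_admin INTEGER DEFAULT 0);
CREATE TABLE IF NOT EXISTS projects (
    project_id INTEGER PRIMARY KEY, name TEXT, purpose TEXT);
CREATE TABLE IF NOT EXISTS project_members (
    project_id INTEGER REFERENCES projects(project_id),
    user_id INTEGER REFERENCES users(user_id),
    role TEXT,
    PRIMARY KEY (project_id, user_id));
"""


def visible_projects(conn: sqlite3.Connection, user_id: int) -> list:
    """Projects the user may view: all of them for administrators, else memberships only."""
    sql = """
        SELECT p.project_id, p.name FROM projects p
        WHERE EXISTS (SELECT 1 FROM users u WHERE u.user_id = ? AND u.is_admin = 1)
           OR EXISTS (SELECT 1 FROM project_members pm
                      WHERE pm.project_id = p.project_id AND pm.user_id = ?)
        ORDER BY p.project_id"""
    return conn.execute(sql, (user_id, user_id)).fetchall()
```

Keeping the permission check in a single query means every page in the Projects and Studies section can reuse it rather than re-implementing the rule.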

  20. Use Case Summary
      • Obtain and Manage Model Input Data (Scope: Internal)
      • Harvest File Metadata Using the XML Metadata Generator (Scope: Internal)
      • Download Model Input Data from TSDW, Online Method (Scope: External)
      • "Download" Model Input Data from TSDW, Offline Method (Scope: External)
      • Download Boundary Conditions Generator (Scope: External)
      • Download the CAMx Post-Processing Utility (Scope: External)
      • Upload Model Results (Scope: External)
      • Import Database-Ready Model Results (Scope: Internal)
      • Visualize and Analyze Monitoring Data (Scope: External)
      • Visualize and Analyze Model Results (Scope: External)
      • View Project Data and Metadata (Scope: External)

  21. Thanks.

  22. Review of the 3SDW Overall System Ecosystem and Architecture

  23. User Login Form

  24. User Registration/Modification Form

  25. User Profile/Account Form

  26. User Feedback Form

  27. Dataset Request Form

  28. Raw Data Download (Query Wizard)

  29. Time Series Charts (Query Wizard)

  30. Dynamic Contour Maps (Query Wizard)

  31. Site Metadata Report (Query Wizard)

  32. Monitoring Site Browser

  33. Modeled Emissions Summary Tool

  34. Modeled-to-Obs Comparison Tool

  35. Air Quality Summary Reports

  36. Model Data Mapping Tool

  37. Future Online Visualization Tools

  38. First TSDW Modeling Use Case Report and Results

  39. First Use Case - Beta Test Steps
      • Testers visited the TSDW website and registered with the system to create an account
      • Testers visited the Data Request web page and entered their requests for the WestJump Base08b dataset
      • Each request was stored in the database
      • The system determined whether each request could be filled automatically or had to be assembled manually
      • The system sent emails to the appropriate TSDW team members to notify them of the data requests
      • TSDW team members assembled the requested datasets (copied the relevant data files onto hard drives)
      • The datasets (hard drives) were delivered to the beta testers
      • The system updated the dataset requests to reflect their “filled” status

  40. First Use Case - Beta Test Steps (cont’d)
      • Using the delivered datasets, testers ran the models and generated results
      • Testers returned the model output results to the 3SDW
      • TSDW team members assessed the test results
      • Testing outcomes were summarized for the May 3-State AQ Study Technical Workshop
      • The TSDW team refines the dataset ordering, download, packaging, and delivery system according to lessons learned
      • The TSDW team develops the next use case testing scenario(s)

  41. Summer (June – October) 2013 3SAQS Technical Work Review: Data Warehouse Activities

  42. Summary of Coming Activities
      • Implement the collaborative components of the warehouse
      • Implement the ongoing news and updates section
      • Bring the 3SDW online for NEPA air quality analysis projects by the end of October
      • Outbound data delivery and inbound data ingestion for NEPA and other air quality studies
      • Data warehouse operations (hosting, data analysis and processing, maintenance, et cetera)
      • Plans for storage/access/visualization for modeling results and evaluation tools
      • Store UT BLM ARMS and other studies’ data in the 3SDW after evaluation using protocols

  43. Testing and Refinement Help
      • All users, collaborators, and partners can help with testing
      • Please report bugs – don’t endure them
        • Use the website Feedback form
        • Send direct email to team members
        • Provide as much information as possible up-front
      • Stay abreast of ongoing additions and updates
      • Be an active part of the design process: make suggestions for features and refinements
        • Don’t assume it can’t be done
        • Don’t assume it can be done
