3SAQS Technical Workshop, October 31 – November 1, 2013
Data Warehouse Status and Planning Update
Zac Adelman (UNC-IE), Shawn McClure (CSU-CIRA), Tom Moore (WGA-WRAP)
Summary of Past Quarter Activities
• Researched and experimented with large data transfer technologies (iRODS, Globus Connect, etc.)
• Configured a large dual RAID array on the primary file server (~20TB) and designed a third RAID array to bring the total storage capacity to 50TB+
• Imported the WestJump source data files onto the primary file server and organized them into a uniform folder structure (meteorology, emissions, results)
• Created an FTP site on the primary file server for facilitating direct, basic access to the source data files
• Made available the current inventory of source data files on the new FTP site
• Began the design of the content, format, and coding protocols for submitting model results and other data to the TSDW
• Began the design of the schema and code infrastructure for the “project overview and tracking” system
• Continued to refine the database, software, and website infrastructure supporting the data warehouse
• Continued to refine various pre-processing components
  • XML Generator for metadata
  • Boundary Conditions Generator
  • CAMx Post Processing Utility
  • RDBMS data import system
• Refined the logical and physical file system design
• Refined the data verification and validation system
Operational Website Components
• User Login Form
• User Registration/Modification Form
• User Profile/Account Form
• User Feedback Form
• Dataset Request Form
• Database Query Wizard
• Raw Data Download
• Interactive Charts
• Dynamic Contour Maps
• Site Metadata Reports
• Monitoring Site Metadata Browser
• File Explorer
• FTP site Authentication and Authorization System
Possible Future Website Components
• Modeled Emissions Summary Tool
• Modeled-to-Observed Data Comparison Tool
• Air Quality Summary Reports
  • Visibility
  • Deposition
  • Ozone
  • Other
• Model Data Mapping Tool
• Source Apportionment Tool
• Various Unpublished Monitoring Data Tools
• Backend Web Services and Processing Components
Summary of Coming Activities
• Conduct additional use case tests
• Finalize the large data transfer system
• Import preexisting/legacy air quality studies and results
• Commence production-level data warehouse operations (hosting, data analysis and processing, maintenance, et cetera)
• Design visualization and analysis tools for modeling results and performance evaluation
• Design the “project overview and tracking” interface for the TSDW website
TSDW Use Cases
Definition of "Use Case": A list of steps defining the interactions between a user and a system to achieve a specific goal. The "user" can be a human or an external system, depending on context.
Scopes of Use Cases: The subset of users to which the functionality of a given use case is made available
  Internal: The TSDW administration and development team
  External: A subset of external users that have been granted a specific role
  Public: The general public - anyone who visits the TSDW website
Potential User Roles: Administrators, Project Managers, Project Team Members, Stakeholders, Data Providers, Planners, Public
Use Case Description
• Obtain and Manage Model Input Data (Scope: Internal)
  • Obtain model input data from data provider(s)
  • Copy model input data files to file server
  • Organize model input data on the file server
    • File and folder naming convention
    • Physical file system organization (what developers see)
    • Logical file system organization (what the user sees)
    • Dataset partitioning (temporal, spatial, functional, etc.)
  • Perform periodic backup of "active" model input data
  • Perform periodic archival of "inactive" model input data
  • Track and manage the versioning of the model input data
Use Case Description
• Harvest File Metadata Using the XML Metadata Generator (Scope: Internal)
  • An administrator locates the desired root folder in the file system
  • An administrator executes the XML Generator program to produce XML files containing file metadata
  • (Ideally, the above two tasks could be automatically run as a "cron" task on a regular, periodic basis, rather than as a two-step manual process.)
  • The File Indexing Utility (FIU) processes the newly-generated files to extract the relevant file metadata
  • The FIU updates the RDBMS with the file metadata
  • The new file metadata is automatically reflected in the TSDW File Explorer Tool
  • Dependencies:
    • The XML File Metadata Generator program
    • The File Indexing Utility (FIU)
    • The appropriate RDBMS schema, SQL scripts, and software libraries for managing source file metadata
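The metadata-harvesting step above can be sketched in a few lines of Python. This is only an illustration, not the actual XML Generator: the element and attribute names (`files`, `file`, `path`, `size_bytes`, `modified_utc`) and both function names are assumptions, since the slides do not describe the real XML layout.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_file_metadata_xml(root_folder):
    """Walk root_folder and return an XML element describing every file found.

    The XML layout used here is hypothetical, chosen only for illustration.
    """
    root_elem = ET.Element("files", {"root": os.path.abspath(root_folder)})
    for dirpath, dirnames, filenames in os.walk(root_folder):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rel_path = os.path.relpath(full_path, root_folder)
            ET.SubElement(root_elem, "file", {
                "path": rel_path.replace(os.sep, "/"),
                "size_bytes": str(info.st_size),
                "modified_utc": modified.isoformat(),
            })
    return root_elem


def write_file_metadata_xml(root_folder, output_path):
    """Write the harvested metadata for root_folder to an XML file."""
    tree = ET.ElementTree(build_file_metadata_xml(root_folder))
    tree.write(output_path, encoding="utf-8", xml_declaration=True)
```

A script like this could be scheduled as the "cron" task mentioned above, with a downstream indexer reading the resulting XML and updating the database.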
Use Case Description
• Download Model Input Data from TSDW, Online Method (Scope: External)
  • User logs into the TSDW website
  • User fills out the Dataset Request form
  • The user is redirected to the Dataset Request confirmation message/page
  • The DR form is passed to the Dataset Packaging System (DPS)
  • The DPS registers metadata about the request into the RDBMS
  • The DPS locates the physical files that are needed to fulfill the order
  • The DPS assembles, organizes, and compresses the component files into a downloadable "package"
  • The DPS creates a unique "PackageID" that will be linked with this package throughout its lifecycle
  • The DPS registers metadata about the package (including the "PackageID") into the RDBMS
  • The DPS notifies the requesting user of the package's availability
  • The user logs back into the TSDW website (if necessary)
  • The user initiates a session of the Dataset Transfer System (DTS) to download the files
  • The DTS registers metadata about the package "receipt" into the RDBMS
  • The DTS notifies the appropriate TSDW administrator(s) of the download
  • Dependencies:
    • Dataset Request Form
    • Dataset Request confirmation message/page
    • Dataset Packaging System (DPS) (could be one-and-the-same with iRODS or Globus)
    • Appropriate RDBMS schema and SQL scripts/commands for managing Dataset Request metadata
    • Appropriate RDBMS schema for associating Dataset Requests with Users and Projects
    • A high volume data transfer program such as iRODS or Globus Connect Server
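The packaging steps in this use case (compress the files, create a "PackageID", register the package) might look roughly like the following. It is a minimal sketch under stated assumptions: the `packages` table, its columns, the UUID-based ID format, and the function names are all hypothetical, and SQLite stands in for whatever RDBMS the TSDW actually uses.

```python
import os
import sqlite3
import tarfile
import uuid


def build_package(file_paths, base_dir, output_dir):
    """Compress the given files into one .tar.gz; return (package_id, package_path)."""
    package_id = uuid.uuid4().hex  # hypothetical PackageID format
    package_path = os.path.join(output_dir, f"package_{package_id}.tar.gz")
    with tarfile.open(package_path, "w:gz") as archive:
        for path in file_paths:
            # Store paths relative to base_dir so the folder structure is preserved.
            archive.add(path, arcname=os.path.relpath(path, base_dir))
    return package_id, package_path


def register_package(conn, package_id, request_id, package_path):
    """Record package metadata in a hypothetical `packages` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages ("
        "package_id TEXT PRIMARY KEY, request_id INTEGER, "
        "path TEXT, size_bytes INTEGER)"
    )
    conn.execute(
        "INSERT INTO packages VALUES (?, ?, ?, ?)",
        (package_id, request_id, package_path, os.path.getsize(package_path)),
    )
    conn.commit()
```

Keeping the ID generation separate from the registration step means the same "PackageID" can later be attached to download receipts and to uploaded model results.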
Use Case Description
• "Download" Model Input Data from TSDW, Offline Method (Scope: External)
  • User logs into the TSDW website
  • User fills out the Dataset Request form
  • The DR form is passed to the Dataset Packaging System (DPS)
  • The DPS registers metadata about the request into the RDBMS
  • The DPS locates the physical files that are needed to fulfill the order
  • The DPS creates a unique "PackageID" that will be linked with this package throughout its lifecycle
  • The DPS registers metadata about the package (including the "PackageID") into the RDBMS
  • The DPS notifies the requesting user of the order receipt and future hard drive shipment
  • The DPS sends a list of the files that comprise the order to a TSDW administrator
  • A TSDW administrator copies the selected files onto a hard disk drive (HDD) or drives
  • A TSDW administrator mails the drive(s) to the requesting user
  • A TSDW administrator records the shipment in the RDBMS
  • Dependencies:
    • Dataset Request Form
    • Dataset Request confirmation message/page
    • Dataset Packaging System (DPS)
    • Appropriate RDBMS schema and SQL scripts/commands for managing Dataset Request metadata
    • Appropriate RDBMS schema for associating Dataset Requests with Users and Projects
    • A manual process for copying data files onto hard disks and mailing them to users
Use Case Description
• Download Boundary Conditions Generator (Scope: External)
  • User logs into the TSDW website
  • User navigates to the Modeling Utilities section of the website
  • User fills out the Boundary Conditions Generator (BCG) download form
  • The BCG download form is passed to the Utility Tracking System (UTS)
  • The UTS extracts information from the metadata file associated with the current BCG
  • The UTS associates this metadata with the appropriate User record in the RDBMS
  • The UTS redirects the user to a download link for the BCG
  • The user downloads the BCG and any associated instructions and configuration files
  • The UTS notifies the appropriate TSDW administrator(s) of the download
  • Dependencies:
    • Boundary Conditions Generator (BCG) program
    • BCG user guide
    • BCG download form
    • BCG download confirmation message/page and installation file link
    • The appropriate RDBMS schema, SQL scripts, and software libraries for managing BCG download metadata
Use Case Description
• Download the CAMx Post-Processing Utility (Scope: External)
  • User logs into the TSDW website
  • User navigates to the Modeling Utilities section of the website
  • User fills out the CAMx Post-Processing Utility (CPPU) download form
  • The CPPU download form is passed to the Utility Tracking System (UTS)
  • The UTS extracts information from the metadata file associated with the current CPPU
  • The UTS associates this metadata with the appropriate User record in the RDBMS
  • The UTS redirects the user to a download link for the CPPU
  • The user downloads the CPPU and any associated instructions and configuration files
  • The UTS notifies the appropriate TSDW administrator(s) of the download
  • Dependencies:
    • CAMx Post-Processing Utility (CPPU) program
    • CPPU user guide
    • CPPU download form
    • CPPU download confirmation message/page and installation file link
    • The appropriate RDBMS schema, SQL scripts, and software libraries for managing CPPU download metadata
Use Case Description
• Upload Model Results (Scope: External)
  • User logs into the TSDW website
  • User navigates to the Modeling Results Upload section of the website
  • User fills out the Modeling Results Upload form
  • User provides a standard description of the model results
  • User provides the "Package ID" of the model input data used
  • User provides the Boundary Conditions Generator "Version ID", if relevant
  • User provides the CAMx Post-Processing Utility "Version ID", if relevant
  • User selects the files to upload
  • User clicks the "Submit" button on the form
  • The Model Results Upload form is passed to the Data Import System (DIS)
  • The data files are uploaded and cataloged by the DIS
  • The DIS creates a unique "DatasetID" that will be linked to this upload throughout its lifecycle
  • The DIS registers metadata about the upload (including the "DatasetID") into the RDBMS
  • The DIS notifies the uploading user of the upload success or failure (generally, its "status")
  • The DIS places the file(s) into the appropriate location(s) on the TSDW file system
  • The DIS notifies the appropriate TSDW administrator(s) of the upload
  • Dependencies:
    • Modeling Results Upload (MRU) form
    • MRU system
    • Appropriate RDBMS schema and SQL scripts/commands for managing MRU metadata
Use Case Description
• Import Database-Ready Model Results (Scope: Internal)
  • An administrator locates the newly-imported model results (which have been generated by the CPPU and uploaded to the TSDW)
  • An administrator executes the appropriate scripts/commands using the Data Import System (DIS)
  • The DIS reads and imports the database-ready model results into the RDBMS
  • The DIS verifies that all the necessary metadata is present in the RDBMS
  • The DIS transforms the data into the appropriate schema for import
  • The DIS maps source codes and names to internal codes and names, as needed
  • The DIS imports the data from the source file(s) into the RDBMS
  • The DIS makes/updates the appropriate metadata records in the RDBMS for tracking the imported model Dataset
  • The imported model results become automatically available via the relevant tools on the TSDW website
  • Dependencies:
    • The CAMx Post-Processing Utility (CPPU) for generating the database-ready model results
    • The Data Import System (DIS)
    • Appropriate RDBMS schema and SQL scripts/commands for managing Model Results metadata
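The verify, map, and import steps of this use case can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the `datasets` and `results` tables, the `PARAMETER_CODES` mapping, and the row layout are hypothetical, and SQLite stands in for the actual RDBMS.

```python
import sqlite3

# Hypothetical mapping from source parameter names to internal codes.
PARAMETER_CODES = {"O3": "ozone", "PM25": "pm2_5"}


def create_schema(conn):
    """Create the hypothetical tables used by this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets "
        "(dataset_id TEXT PRIMARY KEY, description TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(dataset_id TEXT, site TEXT, parameter TEXT, value REAL)"
    )


def import_results(conn, dataset_id, rows):
    """Import (site, source_parameter, value) rows; return the number imported."""
    # Verify that the dataset's metadata record already exists.
    found = conn.execute(
        "SELECT 1 FROM datasets WHERE dataset_id = ?", (dataset_id,)
    ).fetchone()
    if found is None:
        raise ValueError(f"no metadata registered for dataset {dataset_id!r}")
    # Map source codes to internal codes before touching the database.
    mapped = []
    for site, source_param, value in rows:
        if source_param not in PARAMETER_CODES:
            raise KeyError(f"unmapped parameter code: {source_param!r}")
        mapped.append((dataset_id, site, PARAMETER_CODES[source_param], float(value)))
    # Insert all rows in one transaction so a failure leaves nothing half-imported.
    with conn:
        conn.executemany(
            "INSERT INTO results (dataset_id, site, parameter, value) "
            "VALUES (?, ?, ?, ?)",
            mapped,
        )
    return len(mapped)
```

Mapping every row before the insert means an unknown code aborts the import before any data is written.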
Use Case Description
• Visualize and Analyze Monitoring Data (Scope: External)
  • User logs into the TSDW website
  • The user chooses an appropriate visualization and/or analysis tool to use
  • Using the tool, the user specifies spatial, temporal, and other dimensional filters for the data as well as display and formatting options
  • The tool displays monitoring data in various output products, such as:
    • Data summary tables
    • Bar charts
    • Line charts
    • Pie charts
    • Contour maps
  • Dependencies:
    • An appropriate collection of monitoring data
    • Specific design specifications for monitoring data output products
    • An appropriate collection of online visualization tools and technologies
Use Case Description
• Visualize and Analyze Model Results (Scope: External)
  • The user logs into the TSDW website
  • The user chooses an appropriate visualization and analysis tool to use
  • Using the tool, the user specifies spatial, temporal, and other dimensional filters for the data as well as display and formatting options
  • The tool displays model performance and evaluation results in various output products, such as:
    • Normalized mean error and bias
    • Mean normalized error and bias
    • Root mean square error
    • Correlation coefficients
    • Soccer plots
    • Box and whisker plots
    • Bugle plots
    • Spatial statistical plots
    • Spatial concentration plots with observation overlays
  • Dependencies:
    • An appropriate collection of model results data
    • Specific design specifications for model results output products
    • An appropriate collection of online visualization tools and technologies
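For reference, the summary statistics named in this use case can be computed from paired modeled and observed values as sketched below. The formulas are the commonly used definitions (the slides do not spell them out), the function name and dictionary keys are hypothetical, and the sketch assumes the observed values are nonzero with a nonzero sum and that neither series is constant.

```python
import math


def performance_stats(modeled, observed):
    """Return common model performance statistics for paired values.

    Assumes every observed value is nonzero, the observed sum is nonzero,
    and neither series is constant; otherwise a division by zero occurs.
    """
    if len(modeled) != len(observed) or not modeled:
        raise ValueError("modeled and observed must be non-empty and equal length")
    n = len(observed)
    diffs = [m - o for m, o in zip(modeled, observed)]
    obs_sum = sum(observed)
    mean_m = sum(modeled) / n
    mean_o = obs_sum / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modeled, observed))
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return {
        # Normalized mean bias and error: sums of differences over the sum of observations.
        "nmb": sum(diffs) / obs_sum,
        "nme": sum(abs(d) for d in diffs) / obs_sum,
        # Mean normalized bias and error: averages of per-pair relative differences.
        "mnb": sum(d / o for d, o in zip(diffs, observed)) / n,
        "mne": sum(abs(d) / abs(o) for d, o in zip(diffs, observed)) / n,
        # Root mean square error.
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
        # Pearson correlation coefficient.
        "r": cov / math.sqrt(var_m * var_o),
    }
```

The normalized mean statistics weight each pair by the size of the observation, while the mean normalized statistics weight every pair equally, so the two can differ noticeably when observations span a wide range.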
Use Case Description
• View Project Data and Metadata (Scope: External)
  • A user logs into the TSDW website
  • The user navigates to the Projects and Studies section of the TSDW website
  • The user views metadata associated with the projects that he/she has permission to view
    • Name, purpose, description
    • Contact information: project manager(s), contractors, etc.
    • Associated datasets: Model input data downloaded, model results uploaded, etc.
    • Analysis products: Charts, graphs, summaries, etc.
  • The user views data associated with the projects that he/she has permission to view
    • Model input data
      • Meteorological inputs
      • Emissions inputs
      • Initial and Boundary Conditions
      • Ancillary inputs (land use, land cover, photolysis)
    • Model configuration metadata
    • Model results
      • Gridded results
      • Observation-paired results
    • Monitoring data
  • Dependencies:
    • Appropriate RDBMS schema and SQL scripts/commands for managing Project metadata
      • Projects
      • Users
      • Downloaded/Uploaded Datasets
      • Documents
      • Analysis products
    • An online user interface for the Projects and Studies section of the TSDW website
Use Case Summary
• Obtain and Manage Model Input Data (Scope: Internal)
• Harvest File Metadata Using the XML Metadata Generator (Scope: Internal)
• Download Model Input Data from TSDW, Online Method (Scope: External)
• "Download" Model Input Data from TSDW, Offline Method (Scope: External)
• Download Boundary Conditions Generator (Scope: External)
• Download the CAMx Post-Processing Utility (Scope: External)
• Upload Model Results (Scope: External)
• Import Database-Ready Model Results (Scope: Internal)
• Visualize and Analyze Monitoring Data (Scope: External)
• Visualize and Analyze Model Results (Scope: External)
• View Project Data and Metadata (Scope: External)
Review of the 3SDW Overall System Ecosystem and Architecture
First TSDW Modeling Use Case Report and Results
First Use Case - Beta Test Steps
• Testers visited the TSDW website and registered with the system to create an account
• Testers visited the Data Request web page and entered their requests for the WestJump Base08b dataset
• Each request was stored in the database
• The system determined whether or not each request could be automatically filled or had to be manually assembled
• The system sent emails to the appropriate TSDW team members to notify them of the data requests
• TSDW team members assembled the dataset requests (copied the relevant data files onto hard drives)
• The datasets (hard drives) were delivered to the beta testers
• The system updated the dataset requests to reflect their “filled” status
First Use Case - Beta Test Steps (cont’d)
• Using the delivered datasets, testers ran the models and generated results
• Testers returned the model output results to the 3SDW
• The test results were assessed by TSDW team members
• Testing outcomes were summarized for the May 3-State AQ Study Technical Workshop
• The TSDW team refines the dataset ordering, download, packaging, and delivery system according to lessons learned
• The TSDW team develops the next Use Case testing scenario(s)
Summer (June – October) 2013 3SAQS Technical Work Review Data Warehouse Activities
Summary of Coming Activities
• Implement the collaborative components of the warehouse
• Implement the ongoing news and updates section
• 3SDW on-line for NEPA air quality analysis projects by end of October
• Out-bound data delivery and in-bound data ingestion for NEPA and other air quality studies
• Data warehouse operations (hosting, data analysis and processing, maintenance, et cetera)
• Plans for storage/access/visualization for modeling results and evaluation tools
• Store UT BLM ARMS and other studies’ data in 3SDW after evaluation using protocols
Testing and Refinement Help
• All users, collaborators, and partners can help with testing
• Please report bugs – don’t endure them
  • Use the website Feedback form
  • Send direct email to team members
  • Provide as much information as possible up-front
• Stay abreast of ongoing additions and updates
• Be an active part of the design process - make suggestions for features and refinements
  • Don’t assume it can’t be done
  • Don’t assume it can be done