Analysis Engine: Model Evaluation & Uncertainty - Carolina Environmental Program UNC Chapel Hill

The Analysis Engine is a new tool developed by the Carolina Environmental Program at UNC Chapel Hill to support data analysis through creating plots and tables. It offers analysis configurations for repeated analyses and is part of the Multimedia Integrated Modeling System. The Java application contains three main components: a table application, a plotting engine, and a statistics package. Users can import data files in various formats and utilize functions like sorting, filtering, plotting, and statistics computation. The tool allows for multi-column sorting, filtering rows based on criteria, and creating various plot types. It also provides statistical analysis capabilities and an option for automation through a command line interface. Potential applications include model evaluation, sensitivity and uncertainty analysis, emissions modeling quality assurance, and general data analysis. The Analysis Engine aims to enhance data analysis processes and offer a user-friendly interface for environmental research.

Presentation Transcript


  1. The Analysis Engine – A New Tool for Model Evaluation, Sensitivity and Uncertainty Analysis, and more… Alison M. Eyth, Prashant P. Pai Carolina Environmental Program University of North Carolina at Chapel Hill October 19, 2004 Carolina Environmental Program UNC Chapel Hill

  2. Background
     • Supports data analysis by creating plots and tables
     • “Analysis Configurations” facilitate repeated analyses
     • Developed as part of the Multimedia Integrated Modeling System (but can be used standalone)
     • Java application that runs on Windows, Linux, …
     • Open source, available from http://sourceforge.net/projects/mimsfw
     • Three main components:
       • Table application
       • Plotting engine
       • Statistics package

  3. Table Application
     • Provides the top-level user interface
     • File menu accesses import and export functions
     • Currently supported file formats include:
       • Comma separated (.csv), custom and tab delimited, fixed column width, SMOKE Report, and ARFF
     • Data files are imported as rows and columns
     • Each file is shown in its own tab with file name, header, data table, and footer
     • Toolbar and popup menus provide access to functions (e.g. sort, filter, format, plot, statistics)

  4. Table Application GUI

  5. Toolbar and Pop-up Menu Functions
     • Multi-column sort
     • Show rows with Top N values
     • Show rows with Bottom N values
     • Filter rows based on criteria (e.g. NOx > 500)
     • Show / hide columns
     • Format columns (e.g. number style, color, width)
     • Create plots
     • Compute statistics
     • Edit analysis configuration
     • Reset
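The sort and Top N / Bottom N functions listed above can be illustrated with a minimal Python sketch. This is not the Analysis Engine's own Java code; the table layout, column names, values, and function names are all assumptions made up for illustration.

```python
# Illustrative rows only; column names and values are invented for this sketch.
rows = [
    {"state": "NC", "county": "Wake", "NOx": 620.0},
    {"state": "NC", "county": "Durham", "NOx": 310.0},
    {"state": "GA", "county": "Fulton", "NOx": 870.0},
    {"state": "GA", "county": "Cobb", "NOx": 450.0},
    {"state": "VA", "county": "Fairfax", "NOx": 540.0},
]


def multi_column_sort(table, keys):
    """Sort by several columns.

    keys is a list of (column, descending) pairs, most significant first.
    """
    result = list(table)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a multi-column ordering.
    for column, descending in reversed(keys):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


def top_n(table, column, n):
    """Return the n rows with the largest values in the given column."""
    return sorted(table, key=lambda row: row[column], reverse=True)[:n]


def bottom_n(table, column, n):
    """Return the n rows with the smallest values in the given column."""
    return sorted(table, key=lambda row: row[column])[:n]


# State ascending, then NOx descending within each state.
by_state_then_nox = multi_column_sort(rows, [("state", False), ("NOx", True)])
print([(r["state"], r["county"]) for r in by_state_then_nox])
print([r["county"] for r in top_n(rows, "NOx", 2)])
print([r["county"] for r in bottom_n(rows, "NOx", 2)])
```

Relying on a stable sort keeps the multi-column logic short; a single sort with a composite key would work as well when all columns sort in the same direction.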

  6. Filter Rows Dialog
     • Use Filter Rows to limit the rows shown in the table
     • Any number of criteria can be added
     • Each criterion has a column, operation, and value
     • Available operations are <, <=, >, >=, not =, starts with, contains, ends with, does not start with, does not contain, ...
     • Select between showing rows matching ALL criteria or ANY
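The criterion structure described above (column, operation, value, combined with ALL or ANY) can be sketched in Python as follows. The operation names are taken from the slide, but the mapping to functions, the sample rows, and the name `filter_rows` are assumptions for illustration, not the Analysis Engine's implementation.

```python
import operator

# Operation names follow the slide; the mapping itself is this sketch's own.
OPERATIONS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "not =": operator.ne,
    "starts with": lambda cell, value: str(cell).startswith(value),
    "contains": lambda cell, value: value in str(cell),
    "ends with": lambda cell, value: str(cell).endswith(value),
    "does not start with": lambda cell, value: not str(cell).startswith(value),
    "does not contain": lambda cell, value: value not in str(cell),
}


def filter_rows(table, criteria, match_all=True):
    """Keep rows matching ALL criteria (match_all=True) or ANY of them.

    Each criterion is a (column, operation, value) triple.
    """
    if not criteria:
        return list(table)  # no criteria means nothing is filtered out
    combine = all if match_all else any
    return [
        row for row in table
        if combine(OPERATIONS[op](row[column], value) for column, op, value in criteria)
    ]


# Illustrative rows only; values are invented.
rows = [
    {"state": "NC", "county": "Wake", "NOx": 620.0},
    {"state": "NC", "county": "Durham", "NOx": 310.0},
    {"state": "GA", "county": "Fulton", "NOx": 870.0},
    {"state": "GA", "county": "Cobb", "NOx": 450.0},
    {"state": "VA", "county": "Fairfax", "NOx": 540.0},
]
criteria = [("NOx", ">", 500), ("state", "starts with", "N")]

matches_all = [r["county"] for r in filter_rows(rows, criteria, match_all=True)]
matches_any = [r["county"] for r in filter_rows(rows, criteria, match_all=False)]
print(matches_all)
print(matches_any)
```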

  7. Plotting Options Dialog
     • Choose Plot type from Bar, Box, CDF, Discrete Category, Histogram, Rank Order, XY, Line, Time Series, and Tornado
     • Select Data Columns to plot
     • Specify Units and one to three columns to use for labels
     • Selected data is passed to the plotting engine
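For readers unfamiliar with two of the plot types named above, the following Python sketch shows one common way to derive the points for a CDF plot and a rank order plot from a data column. The slides do not say how the Analysis Engine computes these, so the conventions here (cumulative fraction of (i + 1) / n, rank 1 for the largest value) are assumptions.

```python
def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs for an empirical CDF."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def rank_order(values):
    """Return (rank, value) pairs, with rank 1 assigned to the largest value."""
    ordered = sorted(values, reverse=True)
    return [(rank, v) for rank, v in enumerate(ordered, start=1)]


column = [4.0, 1.0, 3.0, 2.0]  # illustrative data
cdf_points = empirical_cdf(column)
rank_points = rank_order(column)
print(cdf_points)
print(rank_points)
```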

  8. Plot Properties are Specified using the Analysis Engine GUI

  9. Example Discrete Category Plot
     Note: Plots are created using a custom Java interface to R

  10. Statistics Dialog
     • Provides interface to the statistics package
     • Specify statistics to compute and data columns to analyze
     • Additional details are specified on other tabs
     • Statistics outputs appear as new tabs in the table application
     • Statistics are computed using Colt and Weka

  11. Example of Histogram Statistics
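The example image for this slide did not survive in the transcript. As a stand-in, here is a minimal Python sketch of equal-width histogram binning. It is not the Colt or Weka code the Analysis Engine uses; the function name, the binning convention, and the data are assumptions.

```python
def histogram(values, num_bins):
    """Count values in num_bins equal-width bins spanning min..max.

    Returns (edges, counts), where edges has num_bins + 1 entries.
    """
    low, high = min(values), max(values)
    if high == low:
        # All values identical: a single degenerate bin holds everything.
        return [low, high], [len(values)]
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum lies on the last bin's upper edge; clamp it into that bin.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts


data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10]  # illustrative data
edges, counts = histogram(data, 5)
print(edges)
print(counts)
```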

  12. Analysis Configuration Dialog
     • The Analysis Configuration stores all the table settings and plots that you have created during your session
     • The selected plots can be viewed, edited or deleted
     • Plots can be given new names by double clicking the name
     • Some (or all) of the settings can be saved to a configuration file
     • Configuration files can be loaded in future sessions or for other data files in the current session

  13. Automation
     • An optional command line interface may be used to specify:
       • Data files to load
       • Analysis configuration file to use
       • Type of plots to create (e.g. JPG, PDF, PNG)
       • Output directory for plots and tables
     • This allows plots and tables to be created in an automated fashion
     • Standard analysis products may be created for newly available data sets
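To make the four kinds of command line input concrete, here is a Python sketch of a hypothetical batch driver that accepts them. The slides do not give the Analysis Engine's actual command line syntax, so every option name below (`--config`, `--plot-format`, `--output-dir`), the file names, and the output naming scheme are invented for illustration and should not be read as the tool's real flags.

```python
import argparse
from pathlib import PurePosixPath


def build_parser():
    """Build a parser for a hypothetical batch driver (option names invented)."""
    parser = argparse.ArgumentParser(
        description="Hypothetical batch driver; not the Analysis Engine's real CLI."
    )
    parser.add_argument("data_files", nargs="+", help="data files to load")
    parser.add_argument("--config", required=True,
                        help="analysis configuration file to apply")
    parser.add_argument("--plot-format", choices=["jpg", "pdf", "png"],
                        default="png", help="type of plot files to create")
    parser.add_argument("--output-dir", default=".",
                        help="directory for plots and tables")
    return parser


def planned_plot_paths(args):
    """List one plot path per data file (an assumed naming scheme)."""
    out = PurePosixPath(args.output_dir)
    return [
        str(out / (PurePosixPath(name).stem + "." + args.plot_format))
        for name in args.data_files
    ]


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--config", "daily_qa.cfg", "--plot-format", "pdf",
     "--output-dir", "out", "run1.csv", "run2.csv"]
)
planned = planned_plot_paths(args)
print(planned)
```

A driver of this shape could be invoked from a scheduler whenever a new data set arrives, which is the use case the slide describes.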

  14. Examples of Potential Applications
     • Model Evaluation
       • Sort to find stations at which the error was the largest
       • Plot modeled and observed values on box plots, etc.
       • Create scatter plots of one species vs. another
     • Sensitivity and Uncertainty Analysis
       • Perform linear regression and show in plots and tables
       • Compute correlation coefficients
     • Emissions Modeling Quality Assurance
       • Find states with top 10 emission values
       • Stacked bar charts to show total emissions
       • Compute histograms
     • General Data Analysis
       • Analyze data by sorting, filtering, and computing statistics
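Two of the applications above, ranking stations by model error and computing a regression with its correlation coefficient, can be sketched in a few lines of Python. The Analysis Engine uses Weka for regression and correlation; the hand-written formulas below are a stand-in for illustration, and the station names and values are invented.

```python
import math


def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Illustrative observed and modeled values at four stations (invented data).
stations = ["A", "B", "C", "D"]
observed = [10.0, 20.0, 30.0, 40.0]
modeled = [12.0, 19.0, 33.0, 44.0]

# Model evaluation: rank stations by absolute error, largest first.
errors = {s: abs(m - o) for s, o, m in zip(stations, observed, modeled)}
worst_first = sorted(errors, key=errors.get, reverse=True)

# Regression of modeled on observed values, plus their correlation.
slope, intercept = least_squares_fit(observed, modeled)
r = pearson_r(observed, modeled)
print(worst_first, slope, intercept, r)
```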

  15. Future Directions
     • Initial version will be released on SourceForge by 10/31/04 (which is the end date for the current funding for this work)
     • Many potential enhancements are listed on SourceForge, e.g.:
       • Create new rows and columns using functions (e.g. difference, sum)
       • Create plots and tables with data from multiple tabs
     • Will likely be used as part of the new emissions quality assurance tool (http://sourceforge.net/projects/emisview)
     • Mr. Tommy Cathey will continue to develop the custom Java interface to R at the EPA Scientific Visualization Laboratory in FY05

  16. References
     • MIMS SourceForge page (for downloads): http://sourceforge.net/projects/mimsfw
     • R (for plots): http://www.r-project.org
     • Colt (for basic statistics): http://www-itg.lbl.gov/~hoschek/colt
     • Weka (for regression and correlation analysis): http://www.cs.waikato.ac.nz/~ml/weka/
     • Carolina Environmental Program (for more information): http://www.cep.unc.edu
     • Primary Author: eyth@unc.edu