SIMO Python/XML Simulator: Current situation 28/10/2005 • SIMO Seminar 28.10.2005 • Antti Mäkinen, Dept. of Forest Resource Management / University of Helsinki
What can be calculated at the moment? • Development of different variables at stand level...
What can be calculated at the moment? • Diameter distributions and tree level attributes
What can be calculated at the moment? • Estimating forest variable development at both stand level and tree level is possible (300+ models implemented), but: • Forestry operations are not yet implemented in the simulator → "real world" simulations are not yet possible • Bucking models are still not ready • The optimizing module is still missing
How does the simulation process work in SIMO? • XML Files • Reporter Module: IN: XML data; OUT: transformed XML, graphs • SIMULATOR (simulation process): IN: data, simulation control, model chains, model definitions; OUT: results • MODEL LIBRARY: IN: model name, input variables; OUT: model result, warnings & errors
What is missing? • Components in the architecture diagram: XML Files, Reporter Module, Validator Module, SIMULATOR, Optimizer Module, MODEL LIBRARY (several)
XML Files • Data XML • Simulation control XML • Model Chain XML • Model XML • Result XML
Model Library • Includes all models used in the simulator • Programmed in C as a Dynamic Link Library (DLL) • Models are C functions that are called from the simulator (model definitions are also in Model.xml) • Users can add new models to the library or create additional model libraries • Reports warnings and errors to the simulator • Risk level models not yet implemented
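How a C model function might look from the Python side can be sketched with the standard `ctypes` module. The C signature assumed here (input array, count, output pointer, integer status code), the status convention, and the name `toy_growth_model` are all illustrative assumptions, not the actual SIMO model interface. A Python callback stands in for the compiled DLL function so that the sketch runs without a library file.

```python
import ctypes

# Hypothetical C signature: int model(double *inputs, int n, double *result)
# Assumed status convention: 0 = ok, 1 = warning.
MODEL_FUNC = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_double),
)


def _toy_growth_model(inputs, n, result):
    """Stand-in for a C model: writes the sum of the inputs, warns on negatives."""
    values = [inputs[i] for i in range(n)]
    result[0] = sum(values)
    return 1 if min(values) < 0 else 0


# With a real library this object would come from ctypes.CDLL(<path>) instead.
toy_growth_model = MODEL_FUNC(_toy_growth_model)


def call_model(func, values):
    """Marshal Python floats to C doubles, call the model, return (status, result)."""
    c_inputs = (ctypes.c_double * len(values))(*values)
    c_result = ctypes.c_double()
    status = func(c_inputs, len(values), ctypes.byref(c_result))
    return status, c_result.value
```

Returning the status separately from the value lets the caller log warnings and errors without losing the model result.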
SIMULATOR • The first version of the simulator was programmed in C/C++ • Later the programming language was changed to Python (http://www.python.org), because of: • Simple and concise syntax → more readable code and faster development of the simulator • Good compatibility with the C language • A number of useful ready-made open source tools for a variety of purposes • Code documentation is underway
SIMULATOR • Takes in simulation control instructions, model chains, model definitions and data in XML format • Transforms the XML data from the different files into the simulator's own data structure (more efficient than the ElementTree data structure) • Processes the user-defined model chains for each computing unit in the data • Calls the model library whenever some value needs to be calculated (Python/C interface: ctypes) • Prints the resulting values into a result XML file
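The steps on this slide can be sketched roughly as follows. The element and attribute names (`unit`, `var`, `chain`, `model`, `target`), the plain-dict data structure, and the two placeholder models are assumptions made for illustration; the slides do not show SIMO's actual XML schema or internal structures.

```python
import xml.etree.ElementTree as ET

DATA_XML = """
<data>
  <unit id="stand1"><var name="age">40</var><var name="ba">20</var></unit>
  <unit id="stand2"><var name="age">60</var><var name="ba">25</var></unit>
</data>
"""

CHAIN_XML = """
<chain>
  <model name="age_step" target="age"/>
  <model name="ba_growth" target="ba"/>
</chain>
"""

# Stand-ins for model library calls; the formulas are placeholders, not real models.
MODELS = {
    "age_step": lambda v: v["age"] + 5,
    "ba_growth": lambda v: v["ba"] * 1.1,
}


def load_units(xml_text):
    """Convert the parsed XML into plain dicts: {unit id: {variable: value}}."""
    root = ET.fromstring(xml_text)
    return {
        unit.get("id"): {v.get("name"): float(v.text) for v in unit.findall("var")}
        for unit in root.findall("unit")
    }


def load_chain(xml_text):
    """Return the chain as an ordered list of (model name, target variable)."""
    root = ET.fromstring(xml_text)
    return [(m.get("name"), m.get("target")) for m in root.findall("model")]


def run_chain(units, chain):
    """Apply every model in the chain, in order, to every computing unit."""
    for variables in units.values():
        for model_name, target in chain:
            variables[target] = MODELS[model_name](variables)
    return units


def to_result_xml(units):
    """Serialize the simulated values back to an XML string."""
    root = ET.Element("result")
    for unit_id, variables in units.items():
        unit = ET.SubElement(root, "unit", id=unit_id)
        for name, value in variables.items():
            ET.SubElement(unit, "var", name=name).text = repr(value)
    return ET.tostring(root, encoding="unicode")
```

A run would then be `to_result_xml(run_chain(load_units(DATA_XML), load_chain(CHAIN_XML)))`.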
Reporting Module • Used for visualizing data and transforming the results from XML to other formats • Takes in data and processing instructions in XML format • At the moment it can plot different kinds of graphs of given variables (matplotlib toolset) • XML transformations to be implemented later...
Missing modules Optimizer module • Finds the best alternative from the alternatives generated by the simulator • Possibly many alternative optimizing methods? Validator module • Validates the XML files with XSD (Schema) files and by external rules • Makes sure that the XML files are well-formed and contain all necessary elements
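As a rough idea of what the simplest part of a validator could do, the sketch below checks well-formedness and the presence of required elements using only the Python standard library. It does not perform XSD validation, which the standard library does not provide. The function name and the element names in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET


def validate_xml(xml_text, root_tag, required_children):
    """Return a list of problems; an empty list means all checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The parser rejects anything that is not well-formed XML.
        return [f"not well-formed: {exc}"]
    problems = []
    if root.tag != root_tag:
        problems.append(f"unexpected root element <{root.tag}>")
    for tag in required_children:
        if root.find(tag) is None:
            problems.append(f"missing required element <{tag}>")
    return problems
```

For example, `validate_xml("<simulation><data/></simulation>", "simulation", ["data", "chain"])` reports the missing `chain` element.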
Strengths of SIMO XML Simulator • Virtually any kind of model can be used in the simulations and added to the model library • The user can define the model chains freely for different kinds of simulations • The user can define correction/rectification factors for the models (e.g. different factors for different geographical areas) • Data levels are not confined to a strict predefined standard • Extensive warning and error reporting system (risk control coming later...)
Model risk management – individual variables • Minimum and maximum limits of individual variables have been defined • Documented in ModelXML • Limits have been coded into ModelLibrary -> throws warnings if the individual parameter values are out of bounds • How are the minimum and maximum limits defined? • Limits defined by the author (caused by data, model shape, …) • Limits of the modeling data • The model is tested with those limits using NFI data as test data. Does the model function properly if the individual parameter values are out of bounds? • For example: Basal area growth model (Vuokila & Väliaho) for Scots pine on mineral soils
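A bounds check of this kind might look like the sketch below. The variable names and limit values are invented for illustration; in SIMO the limits are documented in the model XML and coded into the C model library, not in Python.

```python
# Hypothetical limits: {variable name: (minimum, maximum)}.
LIMITS = {"age": (10.0, 150.0), "ba": (1.0, 50.0)}


def check_limits(params, limits):
    """Return one warning string for every parameter outside its limits."""
    warnings = []
    for name, value in params.items():
        low, high = limits[name]
        if not low <= value <= high:
            warnings.append(f"{name}={value} outside [{low}, {high}]")
    return warnings
```

Returning warnings instead of raising lets the simulation continue while still recording that a model was used outside its tested range.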
Model risk management – interaction • Interaction between variables • [Figure: region of accepted combinations of variables, axes age and ba; the example points (20, 32) and (120, 5) are not accepted] • Solution alternatives: • Logit model: probability that the estimate is in the acceptable area (linear regression, at least, was not flexible enough) • Grid: the area of variable combinations is divided into cells; every cell records whether the estimate is acceptable or not
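The grid alternative can be sketched as a lookup table of cells. The cell sizes and the set of accepted cells below are invented, and the two example points from the slide are read as (age, ba) pairs, which is an assumption about the figure.

```python
AGE_STEP = 20.0  # assumed cell width along the age axis
BA_STEP = 10.0   # assumed cell height along the basal area axis

# Hypothetical accepted cells as (age index, ba index); all other cells are rejected.
ACCEPTED_CELLS = {
    (1, 0), (1, 1),
    (2, 1), (2, 2),
    (3, 1), (3, 2), (3, 3),
    (4, 2), (4, 3),
    (5, 2), (5, 3),
    (6, 2), (6, 3),
}


def cell_of(age, ba):
    """Map a variable combination to the index of the grid cell containing it."""
    return int(age // AGE_STEP), int(ba // BA_STEP)


def is_accepted(age, ba):
    """True if the combination falls into a cell marked as acceptable."""
    return cell_of(age, ba) in ACCEPTED_CELLS
```

With this invented grid, `is_accepted(20, 32)` and `is_accepted(120, 5)` are both rejected, matching the two points marked on the slide.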
Model risk management • Two levels • Individual parameter values out of bounds • All individual parameter values acceptable, but is the specific combination of them acceptable? • Case 1: already in the simulator • Case 2: Suggestion • get the k nearest neighbours from the VMI data, • evaluate the model for the data point and the k nearest neighbours. • If the difference for the model estimate between the data point and the neighbours is too big, generate an event of ”unacceptable” model estimate
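The suggestion for case 2 could be sketched as below. The unscaled Euclidean distance, the function names, the toy model, and the handful of reference points standing in for the VMI data are all assumptions; real variables would probably need scaling before distances are meaningful.

```python
import math


def k_nearest(point, reference_points, k):
    """Return the k reference points closest to `point` (Euclidean distance)."""
    return sorted(reference_points, key=lambda p: math.dist(p, point))[:k]


def evaluate_risk(model, point, reference_points, k, max_difference):
    """Compare the model estimate at `point` with the mean estimate of its neighbours.

    Returns (acceptable, neighbour_mean).
    """
    neighbours = k_nearest(point, reference_points, k)
    neighbour_mean = sum(model(*p) for p in neighbours) / len(neighbours)
    estimate = model(*point)
    return abs(estimate - neighbour_mean) <= max_difference, neighbour_mean


def toy_model(age, ba):
    """Placeholder model, not a real growth model."""
    return 0.5 * ba + 100.0 / age


# Invented (age, ba) reference points standing in for sample plot data.
REFERENCE = [(40.0, 20.0), (50.0, 22.0), (60.0, 25.0), (70.0, 27.0)]
```

Here `evaluate_risk(toy_model, (55.0, 23.0), REFERENCE, 2, 1.0)` accepts the estimate, while the far-off point `(5.0, 23.0)` is flagged as unacceptable.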
Isn't that procedure too heavy computationally? • Probably, not yet evaluated • But what if we store the risk evaluation results and use those primarily: • Is it safe to call ModelA with parameters (5, 6, 10) when we accept risk level X? • Has the risk been evaluated with parameter values (5, 6, 10) and risk level X before? If yes, get the answer from a table of risk evaluations • If not, get the k nearest neighbours for data point (5, 6, 10), evaluate the model with (5, 6, 10) and the k neighbours • Store the risk evaluation result and the mean model result of the k neighbours for the data point (5, 6, 10) and risk level X
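The stored-evaluation idea amounts to a lookup table in front of the expensive check. In this sketch the class name, the key layout, and the `evaluate` callback signature are assumptions; the callback represents whatever neighbour-based evaluation is eventually chosen.

```python
class RiskTable:
    """Stores risk evaluations keyed by (model name, parameters, risk level)."""

    def __init__(self, evaluate):
        # evaluate(model_name, params, risk_level) -> (acceptable, neighbour_mean)
        self._evaluate = evaluate
        self._table = {}
        self.evaluations = 0  # how many times the expensive evaluation ran

    def is_safe(self, model_name, params, risk_level):
        """Answer from the table if possible, otherwise evaluate and store."""
        key = (model_name, tuple(params), risk_level)
        if key not in self._table:
            self.evaluations += 1
            self._table[key] = self._evaluate(model_name, params, risk_level)
        acceptable, _neighbour_mean = self._table[key]
        return acceptable
```

An exact-match key only helps when identical parameter values recur; with continuous variables, some rounding or binning of the key would probably be needed, which this sketch leaves out.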
Open questions: • When evaluating a model result, shall we compare it to: values derived directly from the nearest VMI permanent sample plots OR model estimates for the nearest VMI sample plots?
Software license for SIMO • Types of Open Source licenses • MIT & Co: "Do whatever you want" • LGPL: "Everything you do to the original code must be open source, anything on top of that can be closed" • GPL & Co: "Everything you do is open source, …well almost" • GPL under the hood: "derivative work" or "mere aggregation"? Derivative work must be open source, but aggregation can be closed source
The case of MySQL • Dual licensing: open source under the GPL, commercial development with a commercial license that allows closed source
General software architecture • Individual components that communicate over the network • Validator • Simulator – this is well underway • Optimiser • Reporter – converts simulation results to figures and to data formats other than XML, or to a different XML format, etc. • Implications for licensing? What if one of the components uses a subcomponent that is published under the GPL?
Architecture continued • TCP/IP based communication • Security issues? • secured traffic (SSL, SSH) • inside firewall • Scalable