This project aims to improve MLPowSim, a tool for sample size calculations in multilevel models, by streamlining code, enhancing input validation, and adding automated testing. By developing a cohesive framework and multiple user interfaces, the project seeks to enhance user experience and code maintenance, while investigating parallelization for speed improvements.
A starting point for: “Using simulation in parallel computing for faster sample size calculations in complex random effects models”
Toni Price, University of Bristol
MLPowSim
• Developed in a separate ESRC-funded project
• Generates both MLwiN macro code and R code for performing sample size calculations on multilevel models
• Works for a selection of nested and crossed multilevel designs
• Text-based interface
• Uses C code to gather user input and generate output
Initial objective: use MLPowSim as a basis and extend it to support a broader range of models
• Good starting point, but would benefit from an automated way of testing that the generated code matches the expected output (especially as new and more complex models are added)
First step
Put MLPowSim into a cohesive framework:
• Streamline duplicated code (e.g. user input, which is similar across different models)
  • Also improves code maintenance (e.g. a bug fix touches fewer lines of code)
• Improve input validation
  • Makes for a better user experience and reduces crashes
• Automate testing of generated code and results
• Add multiple user interfaces, e.g. command line / file input / web-based
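Streamlining duplicated input code and improving validation can go together: one shared routine asks a question, rejects malformed or out-of-range answers, and re-prompts, so every model reuses the same checks. The Ruby sketch below is hypothetical (`ask_number` and its keyword parameters are illustrative names, not MLPowSim's code); taking the input and output streams as parameters also lets automated tests drive it with `StringIO` instead of a keyboard.

```ruby
require "stringio"

# Hypothetical shared helper: one prompt-and-validate routine reused for every
# numeric question, instead of near-identical input code per model.
def ask_number(prompt, input: $stdin, output: $stdout, min: nil, max: nil, integer: false)
  loop do
    output.print("#{prompt}: ")
    line = input.gets
    raise EOFError, "no input left for '#{prompt}'" if line.nil?

    text = line.strip
    value =
      begin
        integer ? Integer(text, 10) : Float(text)
      rescue ArgumentError
        nil # not parseable as the requested kind of number
      end

    if value.nil?
      output.puts("  '#{text}' is not a valid #{integer ? 'integer' : 'number'}; please try again.")
    elsif (min && value < min) || (max && value > max)
      output.puts("  Value must lie between #{min || '-inf'} and #{max || 'inf'}; please try again.")
    else
      return value # valid answer: leave the loop
    end
  end
end
```

For example, `ask_number("Significance level", input: StringIO.new("abc\n-5\n0.025\n"), min: 0.0, max: 1.0)` rejects the first two answers and returns `0.025`.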
Ruby is …
• Much like Python in a number of ways
• Cross-platform
• A good choice for metaprogramming
• Excellent for text processing
… though in the end the choice boils down to personal preference
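Since MLPowSim's job is to emit R and MLwiN code, Ruby's text-processing strengths show up directly in code generation. A minimal sketch, assuming a template-based design: Ruby's standard `erb` library fills an R template with values from the input parameters. The template, `R_TEMPLATE` and `generate_r_code` are hypothetical and much simpler than MLPowSim's real output; the R code also assumes `sig_level` is the size of each tail of a two-sided z-test.

```ruby
require "erb"

# Hypothetical R template for a 1-level model with a fixed intercept only.
# Single-quoted heredoc: Ruby leaves "\n" and the <%= %> tags untouched.
R_TEMPLATE = <<~'TEMPLATE'
  set.seed(<%= rnd_num_seed %>)
  n_sims <- <%= n_sims %>
  beta_0 <- <%= beta_0 %>
  sigma_e <- sqrt(<%= sigma_sq_e %>)
  crit <- qnorm(1 - <%= sig_level %>)
  for (n in seq(<%= low %>, <%= hi %>, by = <%= step %>)) {
    rejections <- 0
    for (i in 1:n_sims) {
      y <- rnorm(n, mean = beta_0, sd = sigma_e)
      z <- mean(y) / (sd(y) / sqrt(n))
      if (abs(z) > crit) rejections <- rejections + 1
    }
    cat(n, rejections / n_sims, "\n")
  }
TEMPLATE

# Each hash key becomes a local variable visible inside the template.
def generate_r_code(params)
  ERB.new(R_TEMPLATE).result_with_hash(params)
end
```

Keeping the generated language in a template, separate from the Ruby logic, means adding an MLwiN template for the same model would not touch the input-handling code.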
… moving to Ruby
In the words of the official Ruby site (http://www.ruby-lang.org/en/), Ruby is “A dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write.” (… I agree!)
Input methods
• Command line
  • Current input method
• File input
  • Useful during development
  • Facilitates automated testing
• Web interface
  • Familiar mode of input
  • ‘Easy’ to use
File input – Example for a 1-level model
# Input params
#
# Example 1 (p. 8 in MLPowSim user manual)
# MLwiN code output
general:
  output_lang: mlwin
  rnd_num_seed: 1
  sig_level: 0.025
  n_sims: 1000
model:
  n_levels: 1
  response_type: normal
  est_method: igls
  include_fixed_intercept: yes
  n_explanatory_vars: 0
estimates:
  beta_0: -0.140
  sigma_sq_e: 1.051
sample_size:
  level_1:
    low: 20
    hi: 600
    step: 20
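Because the input file is YAML, Ruby's standard `yaml` library turns it into nested hashes, which makes file input easy to validate and to use in automated tests. The sketch below is hypothetical: `REQUIRED_KEYS`, `load_params` and `validate_params` are illustrative names, the required-key list is inferred only from the 1-level example, and `r` as the second `output_lang` value is an assumption. Note that YAML reads `yes` as the boolean `true`.

```ruby
require "yaml"

# Hypothetical required keys per section, inferred from the 1-level example.
REQUIRED_KEYS = {
  "general"     => %w[output_lang rnd_num_seed sig_level n_sims],
  "model"       => %w[n_levels response_type est_method],
  "estimates"   => %w[beta_0],
  "sample_size" => %w[level_1]
}.freeze

# safe_load only builds plain hashes, arrays and scalars (no arbitrary objects).
def load_params(path)
  YAML.safe_load(File.read(path))
end

# Returns a list of human-readable problems; empty when the input looks valid.
def validate_params(params)
  errors = []
  REQUIRED_KEYS.each do |section, keys|
    unless params[section].is_a?(Hash)
      errors << "missing section '#{section}'"
      next
    end
    keys.each do |key|
      errors << "missing '#{section}.#{key}'" unless params[section].key?(key)
    end
  end
  return errors unless errors.empty?

  general = params["general"]
  unless %w[mlwin r].include?(general["output_lang"]) # 'r' is an assumed option name
    errors << "output_lang must be 'mlwin' or 'r'"
  end
  sig = general["sig_level"]
  errors << "sig_level must be in (0, 1)" unless sig.is_a?(Numeric) && sig > 0 && sig < 1
  unless general["n_sims"].is_a?(Integer) && general["n_sims"].positive?
    errors << "n_sims must be a positive integer"
  end

  # Each level needs integer low <= hi and a positive step.
  params["sample_size"].each do |level, range|
    ok = range.is_a?(Hash) &&
         range.values_at("low", "hi", "step").all?(Integer) &&
         range["low"].positive? && range["low"] <= range["hi"] && range["step"].positive?
    errors << "invalid low/hi/step for #{level}" unless ok
  end
  errors
end
```

Returning a list of messages, rather than stopping at the first problem, lets the command-line, file and Web front ends all report every error in one pass.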
File input – Example for a 2-level model
# Input params
#
# Example 8 (p. 39 in MLPowSim user manual)
# MLwiN code output
general:
  output_lang: mlwin
  rnd_num_seed: 1
  sig_level: 0.025
  n_sims: 1000
model:
  n_levels: 2
  is_balanced: yes
  structure: nested  #=> nested | cross-classified
  response_type: normal
  est_method: igls
  include_fixed_intercept: yes
  include_random_intercept: yes
  n_explanatory_vars: 0
estimates:
  beta_0: -0.177
  sigma_sq_u: 0.151
  sigma_sq_e: 0.916
sample_size:
  level_2:
    low: 10
    hi: 50
    step: 10
  level_1:
    low: 10
    hi: 60
    step: 10
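In the 2-level file, each `sample_size` block describes a range, so the generated code has to cover every combination of level-2 and level-1 sizes. Assuming every combination is run and that `level_1` gives the number of level-1 units per level-2 unit in a balanced design (an assumption in this sketch), the example yields 5 × 6 = 30 designs, and with `n_sims: 1000` that is 30,000 simulated data sets to fit. The helpers `size_range` and `design_grid` are hypothetical names.

```ruby
# Hypothetical helper: expand a low/hi/step block into the list of sizes.
def size_range(spec)
  spec.fetch("low").step(spec.fetch("hi"), spec.fetch("step")).to_a
end

# Every (level-2 units, level-1 units per level-2 unit) combination for a
# balanced nested design; level_1 is assumed to be a per-group count.
def design_grid(sample_size)
  level2 = size_range(sample_size.fetch("level_2"))
  level1 = size_range(sample_size.fetch("level_1"))
  level2.product(level1).map do |n2, n1|
    { level_2: n2, level_1_per_group: n1, total_level_1: n2 * n1 }
  end
end
```

The grid size multiplies quickly as ranges widen, which is one reason run-time becomes a concern for more complex models.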
Advantages of adding a Web interface
• More accessible
  • No download required
  • Indexed by search engines
  • Cross-platform (Windows/Mac/Linux)
• Up-to-date version available as soon as deployed
  • Centralised bug fixes
  • New features
  • No distribution overhead
• Opportunity to collect usage information
  • E.g. model parameters
… aligned with e-Stat objectives
Disadvantages of a Web interface
• “Constrained” by browser functionality
• Need to be online to use it
• Needs hosting resources
… fine for the code-generation app as it stands, but running simulations and model fitting on the server would be too resource-intensive
[Demo of command-line and Web-based interfaces for MLPowSim]
Improving speed
• Another, parallel (so to speak ☺) objective is using parallelization to reduce the run-time of the generated power calculation code
• Have taken an initial look at using multi-core processors by executing more than one run simultaneously
• Exploratory code uses Unix (Linux) ‘forking’ to create sub-processes
  • This approach will not work on Windows, since Windows does not support fork()
  • This rules out using the approach for MLwiN, a Windows application
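The forking idea can be sketched in a few lines. The slides do not say what language the exploratory code used, so this is a hypothetical Ruby version: `parallel_map` forks one child per item with `Process.fork`, each child computes its result and sends it back through a pipe, and the parent collects results in order. `Process.fork` is not available on Windows, which matches the limitation above.

```ruby
# Hypothetical sketch: run each job in its own forked child process (Unix only)
# and return the results in input order. One child per item; a real version
# would cap concurrency at the number of cores.
def parallel_map(items)
  jobs = items.map do |item|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      result = yield(item)                 # runs in the child process
      writer.write(Marshal.dump(result))   # serialise the result for the parent
      writer.close
      exit!(0)                             # skip at_exit handlers in the child
    end
    writer.close                           # parent keeps only the read end
    [pid, reader]
  end

  jobs.map do |pid, reader|
    data = reader.read                     # blocks until the child closes its end
    reader.close
    Process.wait(pid)                      # reap the child
    Marshal.load(data)
  end
end
```

For example, `parallel_map([400, 500, 600]) { |n| run_simulations(n) }` would run one sample size per process, where `run_simulations` stands for any hypothetical per-size job. Separate processes sidestep shared state between runs, at the cost of serialising each result back to the parent.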
Improving speed … contd.
• For now, doing tests on R code in Linux
Initial results (very rough, just a starting point):
• Model: 1-level, normal response, fixed intercept, no explanatory variables
• R code with sample sizes from 400 to 600 in steps of 100 (i.e. 400, 500, 600)
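To make clear what each "run" in these tests involves, here is a hypothetical Ruby sketch of the simulation for the same kind of model (normal response, fixed intercept only). It borrows the estimates from the 1-level file-input example (beta_0 = -0.140, sigma_sq_e = 1.051) purely for illustration; these are not necessarily the settings behind the timing tests. It assumes a two-sided z-test with 1.96 as the critical value (0.025 in each tail). Power is estimated as the proportion of simulated data sets in which the intercept is significant, and each sample size is an independent job, which is what makes this workload easy to split across processes.

```ruby
# Box-Muller transform: Ruby's core library has no normal random generator.
def standard_normal(rng)
  u1 = 1.0 - rng.rand # in (0, 1], so log(u1) is finite
  u2 = rng.rand
  Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

# Hypothetical power estimate for a 1-level intercept-only normal model:
# simulate n_sims data sets of size n and count significant intercepts.
def estimate_power(n:, beta_0:, sigma_sq_e:, n_sims:, crit: 1.96, seed: 1)
  rng = Random.new(seed)
  sigma = Math.sqrt(sigma_sq_e)
  rejections = n_sims.times.count do
    y = Array.new(n) { beta_0 + sigma * standard_normal(rng) }
    mean = y.sum / n
    var = y.sum { |v| (v - mean)**2 } / (n - 1) # sample variance
    (mean / Math.sqrt(var / n)).abs > crit      # |estimate / standard error|
  end
  rejections.fdiv(n_sims)
end

[400, 500, 600].each do |n|
  power = estimate_power(n: n, beta_0: -0.140, sigma_sq_e: 1.051, n_sims: 1000)
  puts format("n = %d: estimated power %.3f", n, power)
end
```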
Improving speed … contd.
Summary
Where to from here?
… this is just a small start …
• Extend MLPowSim to support more models
• Add test cases for code generation to cope with more models
• Add automated tests for verifying actual numerical output
• Further develop the Web interface
• Continue investigating speed improvements through parallelization
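Automated tests of numerical output have to allow for simulation noise, since a power estimate from n_sims runs is itself a proportion. One option, sketched here as an assumption rather than a settled design, is to accept an estimate if it lies within k Monte Carlo standard errors of a reference value, using SE = sqrt(p(1 − p) / n_sims). The names `within_monte_carlo_error?` and `check_power_table` are hypothetical.

```ruby
# Hypothetical tolerance check: accept a simulated power estimate if it lies
# within k Monte Carlo standard errors of the reference value.
# Note: a reference of exactly 0 or 1 gives SE = 0, i.e. an exact-match test.
def within_monte_carlo_error?(estimate, reference, n_sims, k: 3.0)
  se = Math.sqrt(reference * (1.0 - reference) / n_sims)
  (estimate - reference).abs <= k * se
end

# Compares a {sample_size => power} table against reference values and
# returns the sample sizes that are missing or outside the tolerance.
def check_power_table(results, reference, n_sims, k: 3.0)
  reference.reject do |n, p_ref|
    results.key?(n) && within_monte_carlo_error?(results[n], p_ref, n_sims, k: k)
  end.keys
end
```

With a fixed random seed an exact comparison is also possible, but a tolerance-based check keeps the tests valid if the random number generator or the model-fitting code changes.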