Minnesota AD Model Builder Short Course October 22-24, 2007. Thanks to Jim Bence, Brian Linton, and Brian Irwin for providing materials used in previous courses QFC Supporting Partners – MSU, GLFC, Michigan DNR, Minnesota DNR, Ohio DNR, New York DEC, Illinois DNR, Ontario MNR.
Quantitative Fisheries Center (QFC) • Created July 2005 • Co-directors: Jim Bence and Mike Jones • Staffing: • Associate Director • Computer Programmer • Post-Docs (2) • Graduate students (3 - PhD; 3 - MS)
Quantitative Fisheries Center (QFC) • Provide research, outreach, and educational services to supporting partners • Outreach examples • Computer programming support to Michigan DNR inland creel database • SCAA consultation for Lake Erie percid assessments • River classifications in MI, WI, NY, PA • Power analysis for OhDNR Lake Erie gill net surveys
Quantitative Fisheries Center (QFC) • Education • AD Model Builder short courses taught in East Lansing (2006, 2007) and Cornell Biological Field Station (2007) • Online Maximum Likelihood Estimation course (launched October 16, 2007) • Introduction to R short course (currently being converted to an online format) • Online Resampling Approaches to Data Analysis course (planned for summer 2008)
How this course will differ from previous offerings • More emphasis on straightforward applications • More hands on programming (coding the whole program rather than only bits and pieces) • Less emphasis on coding efficiency (comes with practice)
What is AD Model Builder and why should you use it? • Automatic Differentiation Model Builder (ADMB) • Software for creating computer programs that estimate the parameters of statistical models
What are the advantages of using it? • Fast and accurate • Flexible • Designed for general maximum likelihood problems • Libraries for Bayesian and robust estimation methods • Includes many advanced programming options (estimation in phases) • Multi-dimensional arrays
How fast is it? • Evaluation by Schnute and Olsen • 100 parameter catch-at-age model from Schnute and Richards (2005)
Why is it so fast? • Automatic differentiation – a method for computing derivatives that are accurate to within numerical (machine) precision • Most other computer programs approximate the derivative with respect to every parameter numerically (finite differences) • Newton-Raphson – requires first and second order derivatives • Levenberg-Marquardt – requires first order derivatives
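The contrast between automatic differentiation and finite differences can be sketched in a few lines. This is only an illustration under stated assumptions: it uses forward-mode "dual numbers" in Python, whereas ADMB's own AUTODIFF library is written in C++ and works in reverse mode. The class `Dual`, the helper `exp`, and the test function `f` are all made up for this example.

```python
import math

class Dual:
    """A number value + deriv*eps with eps**2 == 0 (forward-mode AD)."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule carried along with the value
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def exp(u):
    """exp that works for plain floats and for Dual numbers."""
    if isinstance(u, Dual):
        e = math.exp(u.value)
        return Dual(e, e * u.deriv)  # chain rule
    return math.exp(u)

def f(x):
    # Hypothetical test function: f(x) = x*exp(2x) + 3x
    return x * exp(2 * x) + 3 * x

def derivative_ad(func, x):
    """Derivative by automatic differentiation (seed deriv = 1)."""
    return func(Dual(x, 1.0)).deriv

def derivative_fd(func, x, h=1e-6):
    """Derivative by a forward finite difference (an approximation)."""
    return (func(x + h) - func(x)) / h

x0 = 1.0
exact = (1 + 2 * x0) * math.exp(2 * x0) + 3  # analytic f'(x0)
ad = derivative_ad(f, x0)
fd = derivative_fd(f, x0)
print("AD error:", abs(ad - exact))
print("FD error:", abs(fd - exact))
```

The AD result agrees with the analytic derivative to rounding error, while the finite-difference result carries a truncation error that depends on the step size `h`.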
What are some of the most noticeable differences with other software packages? • Users must specify the objective function to be minimized (Note: ADMB only does minimization)
[Figure: objective function plotted against parameter value]
ADMB Differences with SAS
  data lenweight;
    input length weight;
    datalines;
  358 212
  360 242
  382 402
  388 285
  394 325
  . . .
  12542
  15909
  ;
  run;
ADMB Differences with SAS
  proc nlin data=lenweight;
    parameters a=0 b=3;
    model weight=a*length**b;
  run;
• Proc NLIN estimates parameters by (weighted) least squares, i.e., it minimizes the sum of squared errors
ADMB Differences with SAS
  proc nlmixed data=lenweight;
    ypred=alpha*length**beta;
    parms alpha=0.001, beta=3, sigma=1;
    model weight~normal(ypred,sigma);
  run;
• Proc NLMIXED estimates parameters by maximum likelihood
ADMB Differences with SAS
  proc nlp data=lenweight tech=newrap inest=par1 outest=opar1 maxiter=1000;
    parms a, b, sigma;
    ypred=a*Length**b;
    nlogl = log(sigma)+0.5*((weight-ypred)/sigma)**2;
    min nlogl;
  run;
• Proc NLP (NonLinear Programming) in SAS/OR is similar in spirit to ADMB in that analysts must specify their own objective function
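To make the link between the three SAS procedures concrete, here is a minimal Python sketch of the same negative log-likelihood used in the Proc NLP example. It uses only the five length-weight pairs visible on the data slide. The function names and the trial parameter values are assumptions for illustration. For a fixed sigma the objective equals n*log(sigma) + SSE/(2*sigma^2), so minimizing it over a and b gives the same answer as least squares.

```python
import math

# The five length-weight pairs shown on the data slide
lengths = [358, 360, 382, 388, 394]
weights = [212, 242, 402, 285, 325]

def sse(a, b, lengths, weights):
    """Sum of squared errors for weight = a * length**b."""
    return sum((w - a * l ** b) ** 2 for l, w in zip(lengths, weights))

def nlogl(a, b, sigma, lengths, weights):
    """Normal negative log-likelihood (constant term dropped), as in Proc NLP."""
    total = 0.0
    for l, w in zip(lengths, weights):
        ypred = a * l ** b
        total += math.log(sigma) + 0.5 * ((w - ypred) / sigma) ** 2
    return total

# Hypothetical trial values, not estimates
a, b, sigma = 1e-5, 3.0, 50.0
n = len(lengths)
direct = nlogl(a, b, sigma, lengths, weights)
via_sse = n * math.log(sigma) + sse(a, b, lengths, weights) / (2 * sigma ** 2)
print(direct, via_sse)
```

Because ADMB only minimizes, a likelihood is always coded in this negative-log form.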
What are the most striking differences with other packages? • Users specify the objective function to be minimized • Steps to running • Create an ADMB template (.tpl) • Convert the template to C++ code • Compile – convert the C++ code to machine (object) code • Link the object code with the ADMB C++ libraries (creates an executable file) • Run your executable file • Resulting executable can be run on similar datasets on any computer
What are the difficulties associated with using ADMB? • Requires a more intimate knowledge of statistical theory (probability distributions, likelihoods, Hessians) • Some knowledge of C++ is required • Code can be a little quirky (as you will soon see)
ADMB Files
Input
• .tpl – model template (defines the model)
• .dat – input data
• .pin – initial values (optional; if used, values must be specified for all parameters)
Output
• .par – parameter estimates
• .cor – correlation of parameters
• .std – parameter estimates with std. deviations
• .rep – user-defined outputs (optional)
ADMB Files
Input
• ADMB will expect the .dat and .pin files to have the same name as the .tpl, e.g., MilleLacs.tpl, MilleLacs.dat (this can be overridden)
Output
• By default, output files will have the same file name, e.g., MilleLacs.rep, MilleLacs.par (this can be overridden)
• Note: in the project folder, ignore the files with the extra ~ on the extension (e.g., Oneida.tpl~); they are backup copies created by the editor, so be sure you open the right file
.dat file
• Simply contains the data you will use when fitting your model
Simple.dat
  #Simple linear regression example
  #For ADMB Short Course 1, August 2007
  #Created by D. Fournier, modified by B. Linton
  #Any text after "#" is ignored
  # number of observations
  10
  # observed Y values
  1.4 4.7 5.1 8.3 9.0 14.5 14.0 13.4 19.2 18
  # observed x values
  -1 0 1 2 3 4 5 6 7 8
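The way such a file is consumed can be sketched as follows: comments are dropped, and the remaining numbers are read strictly in order. This Python sketch only mimics the behavior described on the slide; it is not ADMB's actual reader, and `read_dat_tokens` is a hypothetical helper name.

```python
def read_dat_tokens(text):
    """Return all numbers in a .dat-style text, ignoring '#' comments."""
    tokens = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop everything after '#'
        tokens.extend(float(tok) for tok in line.split())
    return tokens

SIMPLE_DAT = """\
#Simple linear regression example
# number of observations
10
# observed Y values
1.4 4.7 5.1 8.3 9.0 14.5 14.0 13.4 19.2 18
# observed x values
-1 0 1 2 3 4 5 6 7 8
"""

tokens = read_dat_tokens(SIMPLE_DAT)
# Values are assigned purely by position, so the order in the file matters
nobs = int(tokens[0])
Y = tokens[1:1 + nobs]
x = tokens[1 + nobs:1 + 2 * nobs]
print(nobs, Y, x)
```

Swapping the Y and x lines in the file would still read without error, which is why checking the values after reading is worthwhile.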
.tpl Sections (each section name must be written exactly as shown) • DATA_SECTION • PARAMETER_SECTION • INITIALIZATION_SECTION • PROCEDURE_SECTION • REPORT_SECTION • Other commonly used sections • PRELIMINARY_CALCS_SECTION • LOCAL_CALCS
Keep in mind • Different sections use different programming languages • Data, Parameter, and Initialization sections use ADMB code • Procedure, Report, Local Calcs, and Preliminary Calcs sections use C++ code • In the C++ sections, statements typically must end with ; • Not as absolute as in SAS (e.g., loop and conditional headers do not end with ;)
Keep in mind • Comments in the .dat file are specified with '#' • Comments in the .tpl file are specified with '//'
Keep in mind • Section heads (DATA_SECTION, PARAMETER_SECTION) must be left justified • The exception is LOCAL_CALCS, which requires one space before LOCAL_CALCS • All other lines should have two spaces before the text
.tpl Sections
• DATA_SECTION
• Identify values that will be read in from the .dat file
• Need to consider the order of numbers in your .dat file
• Can read your data in as integers, real numbers, matrices, arrays, ...
DATA_SECTION
  init_int first_year
  init_int last_year
  init_int first_age
  init_int last_age
  init_number lambda
  init_matrix obs_length(first_year,last_year,first_age,last_age)
.tpl Sections
• DATA_SECTION
• Also where you can declare your looping variables; they are valid throughout your entire code
DATA_SECTION
  init_int first_year
  init_int last_year
  init_int first_age
  init_int last_age
  init_number lambda
  init_matrix obs_length(first_year,last_year,first_age,last_age)
  int i
  int j
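As a rough illustration of why the order of numbers matters, the sketch below reads a flat stream of values in the same order as the declarations above and fills a matrix indexed by actual year and age rather than from 1. The numbers are invented for the example, and a Python dictionary stands in for ADMB's matrix type.

```python
def read_in_order(values):
    """Consume values in declaration order, as the DATA_SECTION above would."""
    it = iter(values)
    first_year = int(next(it))
    last_year = int(next(it))
    first_age = int(next(it))
    last_age = int(next(it))
    lam = float(next(it))  # 'lambda' is a reserved word in Python
    # Matrix is filled row by row: one row per year, one column per age
    obs_length = {
        (year, age): float(next(it))
        for year in range(first_year, last_year + 1)
        for age in range(first_age, last_age + 1)
    }
    return first_year, last_year, first_age, last_age, lam, obs_length

# Hypothetical file contents: 2 years (2001-2002) by 3 ages (3-5)
values = [2001, 2002, 3, 5, 0.5,
          310, 355, 390,
          318, 361, 402]
*_, lam, obs_length = read_in_order(values)
print(lam, obs_length[(2002, 5)])
```

If the dimensions in the file were listed in a different order than the declarations, every later value would be assigned to the wrong place.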
.tpl Sections
• DATA_SECTION
If the .dat file doesn't have the same name as the .tpl file
• Assume the program is MyModel.tpl
• Then the default search is for MyModel.dat
• The code below will read in a file named ControlFile.dat:
    !!ad_comm::change_datafile_name("ControlFile.dat");
• Can also go back:
    !!ad_comm::change_datafile_name("MyModel.dat");
• !! tells ADMB that what follows is C++ code
.tpl Sections
• DATA_SECTION
Always a good idea to verify that your data have been read in correctly
• In the .dat file, have -8888 as your last entry
• In the DATA_SECTION, specify init_int test as the last variable read in, then type
    !!cout << test << endl;
    !!exit(99);
.tpl Sections
• DATA_SECTION
DATA_SECTION
  //Read data in from simple.dat
  init_int nobs //number of observations
  init_vector Y(1,nobs) //observed Y values
  init_vector x(1,nobs) //observed x values
  init_int test //test variable
  !!cout << test << endl;
  !!exit(99);
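The logic behind the -8888 check can be sketched in Python: if the declared dimensions do not match the number of values in the file, the last value read is not the sentinel. The function `read_simple` and the short token lists are made up for illustration. In ADMB itself the check is done by eye from the printed value.

```python
SENTINEL = -8888

def read_simple(tokens):
    """Read nobs, Y, x, then require the sentinel as the next value."""
    nobs = int(tokens[0])
    Y = tokens[1:1 + nobs]
    x = tokens[1 + nobs:1 + 2 * nobs]
    test_index = 1 + 2 * nobs
    if len(tokens) <= test_index or tokens[test_index] != SENTINEL:
        raise ValueError("data were not read in as expected")
    return nobs, Y, x

# Correct file: nobs = 2, two Y values, two x values, then the sentinel
good = [2, 1.4, 4.7, -1, 0, SENTINEL]
print(read_simple(good))

# Faulty file: nobs says 2 but three Y and three x values are present,
# so the value read into 'test' is an x value, not -8888
bad = [2, 1.4, 4.7, 5.1, -1, 0, 1, SENTINEL]
try:
    read_simple(bad)
except ValueError as err:
    print("caught:", err)
```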
.tpl Sections • DATA_SECTION • PARAMETER_SECTION • Define Parameters – the values to be estimated (must have at least 1) • Use the log_e (natural log) scale if only positive parameter values are of interest • Identified by the prefix init_ • Intermediary Variables – quantities that will change as a result of parameter estimation • Can also declare index variables here • Also, if "containers" are needed just for output and not for calculations, then put those here too • Name your Objective Function – the quantity to be minimized
.tpl Sections
• DATA_SECTION
• PARAMETER_SECTION
PARAMETER_SECTION
  //Parameters to be estimated
  init_number a //slope parameter
  init_number b //intercept parameter
  //Quantities calculated from parameters
  vector pred_Y(1,nobs) //predicted Y values
  //Value to be minimized by ADMB
  objective_function_value rss //residual sum of squares
Keep in mind • init_ in the DATA_SECTION indicates a value that will be read in from the .dat file • init_ in the PARAMETER_SECTION specifies a variable that will be estimated
.tpl Sections
• DATA_SECTION
• PARAMETER_SECTION
• INITIALIZATION_SECTION
Set initial values for parameters – use in place of a .pin file
INITIALIZATION_SECTION
  log_F -1.0
  log_M -1.6
.tpl Sections
• DATA_SECTION
• PARAMETER_SECTION
• INITIALIZATION_SECTION
• PROCEDURE_SECTION
• Back transform parameters for use in functions (if needed), e.g., F = exp(log_F)
• Construct functions
• Specify the equation for your objective function
• Must have a PROCEDURE_SECTION for the model to compile
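The back-transformation idea can be sketched as follows: the search works on log_F, which may take any real value, while the model only ever sees F = exp(log_F), which is always positive. The objective function, the target value 0.3, and the plain gradient-descent loop are all assumptions made for this illustration; ADMB uses its own quasi-Newton minimizer.

```python
import math

def objective(F):
    """Hypothetical objective: squared distance from an assumed rate of 0.3."""
    return (F - 0.3) ** 2

def fit_on_log_scale(log_F=-1.0, step=1.0, iterations=200):
    """Minimize objective(exp(log_F)) by simple gradient descent on log_F."""
    for _ in range(iterations):
        F = math.exp(log_F)        # back-transform, as in F = exp(log_F)
        grad_F = 2.0 * (F - 0.3)   # d objective / dF
        grad_log_F = grad_F * F    # chain rule: dF / dlog_F = F
        log_F -= step * grad_log_F
    return log_F, math.exp(log_F)

log_F_hat, F_hat = fit_on_log_scale()
print(log_F_hat, F_hat)
```

The starting value of -1.0 matches the log_F entry in the INITIALIZATION_SECTION example; whatever value the search visits, the back-transformed F cannot become negative.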
.tpl Sections
• PROCEDURE_SECTION
DATA_SECTION
  init_int nobs //number of observations
  init_vector Y(1,nobs) //observed Y values
  init_vector x(1,nobs) //observed x values
PARAMETER_SECTION
  init_number a //slope parameter
  init_number b //intercept parameter
  vector pred_Y(1,nobs) //predicted Y values
  objective_function_value rss //residual sum of squares
PROCEDURE_SECTION
  //Simple linear model gives predicted Y values
  pred_Y=a*x+b;
  //Parameter estimates obtained by minimizing
  //objective function value (residual sum of squares)
  rss=norm2(Y-pred_Y); //norm2(x)=x1^2+x2^2+...+xn^2
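For readers who want to check the template's logic outside ADMB, here is a small Python sketch of the same objective function using the values from Simple.dat. The closed-form least-squares formulas are swapped in for ADMB's iterative minimizer, which is an assumption made for brevity; the function names are hypothetical.

```python
def norm2(v):
    """Sum of squared elements, like ADMB's norm2()."""
    return sum(e * e for e in v)

def rss(a, b, x, Y):
    """Residual sum of squares for the line pred_Y = a*x + b."""
    pred_Y = [a * xi + b for xi in x]
    return norm2([yi - pi for yi, pi in zip(Y, pred_Y)])

def least_squares_line(x, Y):
    """Closed-form slope and intercept that minimize rss."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(Y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, Y))
    a = sxy / sxx
    b = ybar - a * xbar
    return a, b

# Data from Simple.dat
Y = [1.4, 4.7, 5.1, 8.3, 9.0, 14.5, 14.0, 13.4, 19.2, 18.0]
x = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8]

a_hat, b_hat = least_squares_line(x, Y)
print("slope:", a_hat, "intercept:", b_hat, "rss:", rss(a_hat, b_hat, x, Y))
```

Moving either parameter away from these values increases rss, which is the property the ADMB minimizer searches for numerically.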
.tpl Sections • DATA_SECTION • PARAMETER_SECTION • INITIALIZATION_SECTION • PROCEDURE_SECTION • REPORT_SECTION Specify output to go to .rep file Be sure to end .tpl with an empty line (hard return)
.tpl Sections • The REPORT_SECTION is useful for reporting values not otherwise needed in the model • Can be organized in many ways • Can still do calculations in the REPORT_SECTION • e.g., report << "S: " << exp(-Z) << endl; • Results (.rep file) can be read into other programs
Create an Output .dat file
• Append to file: use an output file stream
    ofstream ofs("MyOutput.dat",ios::app);
    {
      ofs << "Output variable x: " << x << endl;
      ofs << "Output variable y: " << y << endl;
    }
• Can also delete a file
    system("del MyOutput.dat");
  Note: the system command is different on Linux
Other .tpl Sections
• PRELIMINARY_CALCS_SECTION
  • Uses C++ code
  • Can do some preliminary calculations and manipulations with the data before getting into the model proper, e.g., pi = 3.14;
• RUNTIME_SECTION
  • Change behavior of function minimizer
• TOP_OF_MAIN_SECTION
  • Change AUTODIFF global variables
Compare with SAS code
  DATA GROWTH;
    INPUT AGE LENGTH GENDER;
    DATALINES;
  PROC NLIN DATA=GROWTH METHOD=MARQUARDT;
    PARAMETERS LINF1=1100 K1=0.4 T01=0.0;
    YPRED=LINF1*(1-EXP(-K1*(AGE-T01)));
    MODEL LENGTH=YPRED;
    OUTPUT OUT=DATA_OUT PRED=PP RES=RR;
  RUN;
Slide callouts (ADMB sections corresponding to parts of the SAS code): Data Section, Runtime Section, Initialization Section, Prelim Calcs Section, Procedure Section, Report Section
.tpl File • General rule: make the .tpl file as general as possible (try to avoid hard coding) – this will allow you to analyze future datasets • Must be translated into C++ code and then compiled 1) tpl2cpp (makes .cpp file) 2) compile (makes object file) 3) link (connects libraries and makes .exe file) • We'll use Emacs (more later)
Compiling your .tpl • Need a C++ compiler to build your model • After it is compiled, the model will be a .exe • (so it can be run on machines without ADMB) • If you change the .tpl file, it must be recompiled • If you change and save data (values, sometimes dimensions), the existing model will still be ready to go • So there is an advantage to putting starting values, etc., into .dat or .pin files
How should I build my .tpl? Suggestions • Keep projects in separate folders • Name, describe, and date each file at the top • Start with a simple working program • Be sure data get read in correctly • Use unique names for files and parameters (don't use "catch" as a variable name; it is a reserved word in C++) • Avoid "hard coding"; make it flexible • Build it one step at a time • COMMENT, COMMENT, COMMENT
About Emacs • For this class, you will use Emacs to construct your .tpl file • A highly customizable text editor • We have modified Emacs so that an ADMB .tpl file is automatically linked to a C++ compiler • MINGW32 is a freeware C++ compiler – you don't need to buy both ADMB and Visual Studio
Using Emacs • Refer to Emacs Basics handout • Hotkeys are different • e.g., “control-v” will not paste • Highlighting text will automatically copy it • Remember to save files and recompile .tpl
Let’s Try an Example Simple linear regression model Estimation by least squares
Let’s Try an Example • Start Emacs by double clicking the Emacs icon on the desktop
Let’s Try an Example • Open the simple.tpl and simple.dat files in the MNADMB folder located on your desktop