Open sharing and maintenance of scientific code. Jordan S Read; Luke A Winslow 2013-08-20. Background. Who I am USGS-CIDA 2012 PhD in physical limnology (UW-Madison) Civil Engineer. My experience with code and model development Lake Analyzer CLM rGDP ; rGLM Numerous collaborations.
Open sharing and maintenance of scientific code Jordan S Read; Luke A Winslow 2013-08-20
Background • Who I am • USGS-CIDA • 2012 PhD in physical limnology (UW-Madison) • Civil Engineer • My experience with code and model development • Lake Analyzer • CLM • rGDP; rGLM • Numerous collaborations
Background My philosophy on science code: “Code created for the pursuit of science questions should be open, accessible, and designed to enable others to build from” • Kind of like your scientific publications, right? • That means I shouldn’t be able to build my scientific livelihood around a piece of “black-box” code
Background • My responsibility as a member of the science community: “Methods used to obtain published results should be clear, transparent and repeatable” • My responsibility as a federal employee: “Provide public access to all elements of publicly funded research”
Road map • Part I: My experiences with science code development; motivation to open up your scientific code • Part II: Maintaining and modifying code; code collaboration
Lake Analyzer • GLEON background • Hanson & Hamilton collaboration and student exchange • Physics & Climate working group • Requirements • Easy to use • Provide access to complex physical derivatives • Handle dataset irregularities • Errors, gaps, intermittent sampling frequencies, etc. • Rapid processing of large datasets
Lake Analyzer • I took on the role of primary coder • Why? GLEON had paid my travel to two meetings…including NZ! • I did the work in MATLAB, because that is what I was most familiar with • Side project during grad school • Built from feedback from GLEON physics & climate group
Lake Analyzer • Repeatable • .lke file ~ metadata • Visualizations (plotting options for outputs) • Easy to use
Lake Analyzer • Software publication • Open codebase • Platform/language independence • Useful and citable: 19 citations in ~20 months
Opening up scientific code • Publishing your code • Would a simple paper of physical derivations be cited at this rate? • Would a methods paper be as popular if the code wasn’t available/open? • Additional motivation for creation of code • Writing open code • More use • Ease of collaboration • Integrity/transparency
Opening up scientific code • Reasons many choose not to open code • Too much work • Code is too messy • Potential for criticism • Code as scientific livelihood • Has known errors… • Others?
Opening up scientific code • When to put in the effort • Collaborations • When you are doing it “right” • When you will use it in the future • When you are publishing something • When you have to • Others?
Part II: Maintaining code So…the code works, what’s next? • How do I take risks with code? • e.g., changing the way a function works • What if I make a mistake? (undo+undo+undo…?) • How do multiple people collaborate on a single set of scripts? • In serial? • Google Docs vs. Word for writing a paper
Maintaining code • Risky modifications • Metabolism_modelv28.R? • Metabolism_model_NEW.R? • Metabolism_model_NEWsecondTRY.R? • Metabolism_model_NEWEST.R?
Maintaining code • When we publish, we use track changes • Can we do the same for code? • Version management • AKA: version control, revision control, source control • How it works • Why you should know what it means • Benefits to using version management • Historical record of code evolution • Easy to “roll back” to previous working version • The code has only one home
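The "track changes" idea carries over to code as a line-by-line diff between two revisions. The following is a minimal sketch using Python's standard `difflib` module; the file name and the two revisions' contents are hypothetical, invented only for illustration, and a real version management system would produce and store such diffs for you.

```python
import difflib

# Two hypothetical revisions of the same script (contents invented for illustration).
old = [
    "k = 0.5\n",
    "flux = k * (o2_sat - o2_obs)\n",
]
new = [
    "k = 0.5\n",
    "flux = k * (o2_obs - o2_sat)\n",
]


def track_changes(before, after, name="metabolism_model.R"):
    """Return a unified diff: the code equivalent of tracked changes."""
    return list(
        difflib.unified_diff(
            before,
            after,
            fromfile=name + " (rev 1)",
            tofile=name + " (rev 2)",
        )
    )


diff = track_changes(old, new)
print("".join(diff))
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added, so a reviewer sees exactly what changed between revisions without comparing whole files by eye.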
Maintaining code How it works • Creates a “life history of code”
Maintaining code How it works • Creates a “life history of code” • “Hey, nice sweater.” “Thanks. I travel a lot. Want to start a project?” • “Sure! I have some modeling code.” “So do I! Let’s combine our efforts.”
Maintaining code • Revision 2: “Here is a new set of methods” (history: 1, 2)
Maintaining code • Revision 3: “I made some improvements” (history: 1, 2, 3)
Maintaining code • Revision 4: “Whoops! Fixed a bug” (history: 1, 2, 3, 4)
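The numbered revisions on these slides can be sketched as a toy, in-memory history in which every saved state is kept and a roll back simply re-commits an earlier state. This is an illustrative model only, not how any particular version control tool is implemented; the `History` class, the code strings, and the message for revision 1 are assumptions made for the example.

```python
class History:
    """Toy linear revision history: every commit is kept, nothing is overwritten."""

    def __init__(self, initial_code, message):
        # Each revision is a (message, code) pair; revision numbers are 1-based.
        self.revisions = [(message, initial_code)]

    def commit(self, code, message):
        """Store a new revision and return its revision number."""
        self.revisions.append((message, code))
        return len(self.revisions)

    def checkout(self, number):
        """Return the code exactly as it was at the given revision."""
        return self.revisions[number - 1][1]

    def log(self):
        """Return one 'number: message' line per revision, oldest first."""
        return [f"{i}: {msg}" for i, (msg, _) in enumerate(self.revisions, start=1)]

    def roll_back(self, number):
        """Roll back by committing an old revision's code as a new revision."""
        return self.commit(self.checkout(number), f"Roll back to revision {number}")


# Revision 1's message is assumed; messages 2-4 are taken from the slides.
history = History("model v1", "Start a shared project")
history.commit("model v1 + methods", "Here is a new set of methods")
history.commit("model v1 + methods, improved", "I made some improvements")
history.commit("model v1 + methods, improved, bug fixed", "Whoops! Fixed a bug")
print("\n".join(history.log()))
```

Because a roll back adds a revision instead of deleting one, the record of the mistake and its fix both stay in the history, which is what makes risky modifications safe to try.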
Conclusions • Code as if it will be seen and used by others • You may be that “other” in 3 years • Decide if creating publicly usable code makes sense for your research • Make your code accessible to collaborators • Consider the concepts embedded in version management
Thanks GLEON FP & TLS! Questions? Jordan S Read USGS Center for Integrated Data Analytics 608-821-3922 | jread@usgs.gov