Building Community Resources, Infrastructure, and Collaboration through a Water Science Software Institute. Stan Ahalt, Ph.D. Director, RENCI; PI, Water Science Software Institute (WSSI); UNC Professor of Computer Science; Director, Biomedical Informatics Core NC TraCS.
Building Community Resources, Infrastructure, and Collaboration through a Water Science Software Institute Stan Ahalt, Ph.D. Director, RENCI; PI, Water Science Software Institute (WSSI); UNC Professor of Computer Science; Director, Biomedical Informatics Core NC TraCS 2013 CUAHSI Conference on Hydroinformatics and Modeling July 18, 2013 Award # 1216817
This work is funded by the National Science Foundation Award #1216817 “Conceptualization of a Water Science Software Institute (WSSI).” Stan Ahalt, PI Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill Chapel Hill, NC, USA Barbara Minsker, Co-PI National Center for Supercomputing Applications (NCSA) University of Illinois at Urbana-Champaign Urbana, IL, USA Larry Band, Co-PI Institute for the Environment University of North Carolina at Chapel Hill Chapel Hill, NC, USA Margaret Palmer, Co-PI National Socio-Environmental Synthesis Center (SESYNC) University of Maryland College Park, MD, USA Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
What we know about scientists Time spent by scientists using scientific software. Hannay, J. E., MacLeod, C., Singer, J., Langtangen, H. P., Pfahl, D., & Wilson, G. (2009). How do scientists develop and use scientific software? 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering (pp. 1–8). Washington DC: IEEE.
What we know about scientists “Software is a secondary player in the world of scientific work, which is dominated by a reputation economy based on substantive scientific publications.” Howison, J., & Herbsleb, J. D. (2011). Scientific software production. Proceedings of the ACM 2011 conference on Computer supported cooperative work - CSCW ’11 (pp. 513–522). New York: ACM Press.
What we know about scientists “On average, scientists spend approximately 30% of their work time developing scientific software.” (Hannay et al., 2009)
Problem: Research Software Sustainability Maintaining Research Code is Problematic • “The major problem here is that of maintenance. If the developers use their own esoteric coding styles or do not document their code, or if only a small number of developers work on one particular aspect of the code and then all eventually leave the team, their contributions become opaque to future generations of developers.” (Pitt-Francis et al., 2008, p. 3116)
What we know about scientific software development projects (Carver et al., 2007)
What we know about scientific software development projects Staff are usually scientists, not software engineers (Carver et al., 2007)
What we know about scientific software development projects Software is often developed to solve a particular problem for a single or small set of users. Pitt-Francis, J., Bernabeu, M. O., Cooper, J., Garny, A., Momtahan, L., Osborne, J., Pathmanathan, P., et al. (2008). Chaste: using agile programming techniques to develop computational biology software. Philosophical transactions. Series A, Mathematical, physical, and engineering sciences, 366(1878), 3111–3136.
Problems Irreproducible Results • “As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software” (p. 775). Merali, Z. (2010). ...ERROR ... why scientific programming does not compute. Nature News, 467(7317), 775–777.
NSF Software Vision (SW-Vision) • NSF-wide vision • Ever increasing sophistication of science is necessitating increasingly sophisticated software • Goal is to nurture, accelerate, and sustain software as critical mode of scientific progress http://www.nsf.gov/pubs/2012/nsf12113/nsf12113.pdf
NSF Software Ecosystem SI2: Software Infrastructure for Sustained Innovation http://www.nsf.gov/si2/ • Software Elements • SSE • Small groups • Software Frameworks • SSI • Integration • Larger teams • Software Institutes • S2I2 • Long-term hubs of excellence serving community of communities • Water Science Software Institute
WSSI is YOU! Community Governance From day one it is important that it is both the perception and the reality that you all as the water science community govern the WSSI.
The Water Science Software Institute • The Water Science Software Institute (WSSI) is conceived as a national center serving the research community in advancing transformational discovery in water science. • Specifically, the institute will support processes to build and enhance working relationships between interdisciplinary scientists and software engineers to co-create interoperable, sustainable, and reusable software that will accelerate water research.
Research is “Agile” • Somewhat similar to Agile software methodology, much of research is “Agile,” with starts, stops, iterations, and interim work products. Figure: the research cycle (Wallace 1983), with stages Theory, Hypothesis, Observation, and Empirical Generalization, plus Tests; the transitions are labeled “deducing consequences; making predictions,” “drawing samples and devising measuring instruments,” “inducing generalizations,” and “forming concepts, developing and arranging propositions.”
WSSI Open Community Engagement Process (OCEP) • In the WSSI OCEP, we borrowed the term “Sprint” from Agile Scrum and defined our own “Research Sprints,” which combine research production, software production, and software engineering education and best practices. Initial results are very promising. Figure: WSSI OCEP, a new integrated process for advancing water science. Ahalt, S., Band, L., Minsker, B., Palmer, M., Tiemann, M., Idaszak, R., Lenhardt, C., Whitton, M. (2013). Water Science Software Institute: An Open Source Engagement Process. The 2013 International Workshop on Software Engineering for Computational Science and Engineering (SE-CSE13); San Francisco, California; May 18, 2013. http://waters2i2.org/documents/2013/05/water-science-software-institute-an-open-source-engagement-approach.pdf
First Prototype OCEP Research Sprint • Feb 2013 – Present • Domain scientists and software engineers working together to address barriers to an example issue set out in the WSSI conceptualization award • What are the effects of urban green infrastructure (GI) on the fate and transport of stormwater and nutrients within watersheds? Street drainage in Seattle is an example of an innovative green infrastructure retrofit
First Prototype OCEP Research Sprint • The WSSI OCEP won’t be the only activity the WSSI undertakes, but it will be one of the main activities. • So let’s walk through a real-world WSSI OCEP Research Sprint that commenced in February 2013 and is proceeding something like this:
OCEP Sprint Stakeholder Engagement • An OCEP Sprint begins when the barriers associated with the overarching questions identified by the stakeholder and domain community are broken down into smaller, tractable problems which can be addressed via software development • At the beginning of every OCEP Sprint, we partner with the NSF Socio-Environmental Synthesis Center (SESYNC) to organize, engage and support transdisciplinary socio-environmental teams in problem identification.
OCEP Sprint Spec’ing Meeting • The OCEP process continues via a “Spec’ing” meeting during which domain scientists and software engineers meet to discuss the barriers to addressing an example issue set out in the WSSI conceptualization award • For our initial Sprint, this example issue was/is: • What are the effects of urban green infrastructure (GI) on the fate and transport of stormwater and nutrients within watersheds? • The Spec’ing meeting also established a foundation for sound software engineering and software development best practices, including an explanation of why code versioning infrastructure, a collaborative development environment, test-driven development, Agile methodology, and Open Source principles are necessary for OCEP to work
Software Goals for the Prototype OCEP Sprint • Barriers were identified and prioritized, and the following key additions to the RHESSys model and related software were judged most important for advancing scientific work on GI: • Adapt the water routing algorithm (called “Create Flow 9” or CF9) to account for non-topographic roof routing • Visualize and manipulate the resulting flow table from CF9 in a web-based, graphical GIS for modeling parcel-level green infrastructure implementation (e.g., gutter rerouting) • Develop a prototype human preference model, under the direction of another PI, to assess visual preferences for different GI implementations • Integrate all of these components
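The idea of "non-topographic roof routing" in the first goal above can be illustrated with a toy flow table. This is only a sketch under stated assumptions, not the CF9 or CF10 algorithm from RHESSys: the cell names, the `roof_overrides` mapping, and the single-receiver "steepest descent" rule are all hypothetical and chosen for brevity. It shows the general pattern of routing most cells downhill while letting roof cells drain to an explicitly assigned receiver (for example, a rerouted gutter).

```python
# Hypothetical sketch: NOT the actual CF9/CF10 algorithm from RHESSys.
from typing import Dict, List, Optional


def topographic_receiver(
    cell: str,
    elevation: Dict[str, float],
    neighbors: Dict[str, List[str]],
) -> Optional[str]:
    """Return the lowest neighbor strictly below `cell`, or None for a pit."""
    lower = [n for n in neighbors[cell] if elevation[n] < elevation[cell]]
    return min(lower, key=lambda n: elevation[n]) if lower else None


def build_flow_table(
    elevation: Dict[str, float],
    neighbors: Dict[str, List[str]],
    roof_overrides: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Map each cell to the cell that receives its runoff.

    Cells listed in `roof_overrides` ignore topography and drain to the
    receiver assigned there; all other cells drain to their lowest neighbor.
    """
    table: Dict[str, Optional[str]] = {}
    for cell in elevation:
        if cell in roof_overrides:
            table[cell] = roof_overrides[cell]  # non-topographic routing
        else:
            table[cell] = topographic_receiver(cell, elevation, neighbors)
    return table


# Made-up parcel: elevations in meters, adjacency listed by hand.
elevation = {"roof": 12.0, "lawn": 10.0, "street": 9.0, "garden": 9.5}
neighbors = {
    "roof": ["lawn", "street"],
    "lawn": ["roof", "street", "garden"],
    "street": ["roof", "lawn"],
    "garden": ["lawn"],
}

# Gutter rerouting: send roof runoff to the garden instead of downhill.
flow_table = build_flow_table(elevation, neighbors, {"roof": "garden"})
print(flow_table)
```

With an empty `roof_overrides`, the roof cell in this toy parcel would drain to the street; the override is what redirects it to the garden.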
Prototype OCEP Sprint Planning Period • In the time between the Spec’ing meeting and the Hackathon, planning ensued and encompassed a variety of activities, including: • a “virtual bootcamp” to familiarize participants with RHESSys • implementation of project infrastructure necessary for collaborative software development • e.g., code versioning and email listserv • development of use cases for scoping software requirements • conference calls to discuss the Hackathon agenda, logistics, and evaluation
Hackathon: Part of a WSSI OCEP Sprint In general, a hackathon is an event in which people come together to collaborate intensely on software products, particularly the writing of software. Lapp, H., et al. (2007). The 2006 NESCent Phyloinformatics Hackathon: A Field Report. Evolutionary Bioinformatics Online 3, 287-296. Retrieved April 25, 2013 from, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2684128/; http://en.wikipedia.org/wiki/Hackathon “An intense event at which a group of software developers with different backgrounds and skills collaborate hands-on and face-to-face to develop working code that is of utility to the community as a whole.” GMOD (2010). The GMOD EvoHack Organizing Committee, “GMOD Evo Hackathon Open Call,” 2010. www.gmod.org/wiki/GMOD_Evo_Hackathon_Open_Call
Prototype OCEP Hackathon • The five-day WSSI Hackathon addressing the Spec’ing goals occurred on April 15-19, two months after the Spec’ing meeting. • Iterations to refine the Hackathon output occurred April – July 2013 • Finally, the plan is to publish the work in HydroShare when available • Publish, discover, V&V, reproducible science
How did we do? • Let’s see how well the WSSI OCEP fared across these goals of the WSSI and NSF: • Overcoming software barriers to enabling new water science • Diffusing software development and software engineering education and best practices into the research process • Elevating the status, culture, and knowledge of software among domain scientists
Prototype OCEP Sprint Results • Overcoming software barriers to enabling new water science • CF9 refactoring (now CF10) to account for non-topographic drainage of rooftops was accomplished • Performance of the code went from O(n²) to O(n·log(n)) overall and O(n) in places (This is significant!) • A web-GIS for visualizing flow tables was developed, and significant progress was made on routines for manual rerouting of stormwater flow • The ability to add and visualize trees at the parcel level is now part of the human preference model
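To give a rough sense of why the complexity change noted above matters, the sketch below compares idealized operation counts for quadratic and n·log(n) growth. It rests on assumptions not stated on the slide: n is treated as a generic problem size, constant factors are ignored, and the logarithm is taken base 2. It says nothing about measured CF10 run times.

```python
import math


def quadratic_ops(n: int) -> float:
    """Idealized operation count for an O(n^2) algorithm."""
    return float(n * n)


def linearithmic_ops(n: int) -> float:
    """Idealized operation count for an O(n log n) algorithm (log base 2)."""
    return n * math.log2(n)


def speedup(n: int) -> float:
    """Ratio of the two idealized counts; simplifies to n / log2(n)."""
    return quadratic_ops(n) / linearithmic_ops(n)


for n in (1_024, 1_048_576):
    print(f"n = {n:>9,}: n^2 / (n log2 n) = {speedup(n):,.1f}")
```

Under these assumptions the ratio is about 100 at n = 1,024 and about 52,000 at n = 1,048,576, so the benefit grows with problem size.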
Prototype OCEP Sprint Results (cont’d) • Diffusing software development and software engineering education and best practices into the research process • The establishment of a git code repository and continuous integration site allows both software engineers and domain scientists to coordinate their programming activities as code is hardened and features are added to RHESSys • However, a shortcoming of the Sprint was the failure to develop tests for test-driven development in advance of the Hackathon, as mandated during the planning period • This was/is rectified in the subsequent OCEP Sprint “Iteration” period
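For readers unfamiliar with test-driven development, the following sketch shows the general shape of tests written before the code they exercise. Everything here is hypothetical: `normalize_weights` is an invented helper, not part of RHESSys or the Sprint's code base, and plain `assert` statements stand in for whatever test framework the project used.

```python
from typing import List

# Step 1: the tests are written first and describe the intended behavior.

def test_fractions_sum_to_one() -> None:
    # Weights 2:1:1 should become fractions 0.5, 0.25, 0.25.
    assert normalize_weights([2.0, 1.0, 1.0]) == [0.5, 0.25, 0.25]


def test_zero_total_yields_no_flow() -> None:
    # With no weight anywhere, nothing is routed and nothing divides by zero.
    assert normalize_weights([0.0, 0.0]) == [0.0, 0.0]


# Step 2: the implementation is written afterwards to make the tests pass.

def normalize_weights(weights: List[float]) -> List[float]:
    """Scale non-negative weights so they sum to 1 (hypothetical helper)."""
    total = sum(weights)
    if total == 0:
        return [0.0 for _ in weights]
    return [w / total for w in weights]


# Step 3: run the tests; a continuous integration site would do this on every commit.
test_fractions_sum_to_one()
test_zero_total_yields_no_flow()
print("all tests passed")
```

Writing the tests ahead of a hackathon gives participants a shared, executable statement of what "done" means for each routine.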
Prototype OCEP Sprint Results (cont’d) • Elevating the status, culture, and knowledge of software among domain scientists • Many of the participating domain scientists learned new computational techniques (e.g., using the Google Street View API, basic Python scripting) • However, whether all participants gained additional understanding of certain software development processes (e.g., test-driven development, continuous integration) is debatable • These processes occurred primarily in the CF9 and web-GIS groups, where the participating domain scientists already had substantial knowledge of development practices
More of what the WSSI will do In addition to the Open Community Engagement Process emphasized herein, the WSSI will provide education, outreach, workforce development, infrastructure, implementation, rating, boot camps, repositories, QA / QC, synthesizing, evaluating, facilitating, planning, and more.
How Open Source Principles Amplify Properly Coordinated Inputs A finite WSSI staff with properly coordinated inputs into the community can amplify its actions by moving together and coordinating actions. “Most importantly, and this is a key idea: lots of little ideas, if permitted to freely combine, can themselves be understood to be a really great idea.” WHAT OPEN SOURCE ACCOMPLISHES FOR SOFTWARE, OCEP ACCOMPLISHES FOR RESEARCH. Michael Tiemann, Amplifying creativity and business performance with open source, Feb 16, 2010, http://opensource.com/business/10/2/amplifying-creativity-and-business-performance-open-source Michael Tiemann, VP Open Source Affairs at Red Hat and WSSI Advisory Board Member
WSSI Community Governance = You! From day one it is important that it is both the perception and the reality that you all as the water science community govern the WSSI. So tell us how the WSSI can best serve you. We’re very interested in hearing from you!
Thank You. http://WaterS2I2.org