
Secrets of Unidata Software Engineers

Secrets of Unidata Software Engineers. Russ Rew UCAR Software Engineering Assembly April 26, 2006. Unidata in a Nutshell. Mission: To provide data, tools and community leadership for enhanced Earth-system education and research The Unidata Program Center:


Presentation Transcript


  1. Secrets of Unidata Software Engineers • Russ Rew • UCAR Software Engineering Assembly • April 26, 2006

  2. Unidata in a Nutshell • Mission: • To provide data, tools and community leadership for enhanced Earth-system education and research • The Unidata Program Center: • Facilitates (real-time) data access • Provides and supports data access, analysis, and visualization tools and services • Builds and advocates for a community of geoscience educators and researchers • UPC size: 12 developers, 12 other staff

  3. Unidata Developers • Jeff McWhirter • Don Murray (.75) • Jen Oxelson • Russ Rew (.25) • Anne Wilson • Tom Yoksas (.75) • Tom Baltzer • John Caron (.75) • Steve Chiswell • Ethan Davis • Steve Emmerson • Ed Hartnett (.25) • Yuan Ho • Robb Kambic

  4. Overview: The Mystery • Premise: Unidata has been very successful in its software development • Premise: Unidata’s software engineering process appears haphazard and chaotic • Mystery: Why is Unidata’s software successful and popular when it makes little use of recognized development methodologies? • Speculations, theories, and revelations

  5. Some Software Successes • Integrated Data Viewer (IDV) • Local Data Manager (LDM) • netCDF, netCDF Java (nj22) • THREDDS and THREDDS Data Server (TDS) • Units library (udunits)

  6. IDV • Unidata’s newest scientific analysis and visualization tool • Freely available 100% Java framework and reference application • Provides 2- and 3-D displays of geoscience data • Stand-alone or networked application • Integrates data from disparate sources • End-to-end test for Unidata technologies

  7. IDV’s Success • In use at over 80 Unidata sites and use growing rapidly • Selected as the visualization tool for the Operations Center in T-REX • Bill Hibbard, developer of Vis5D and VisAD, calls the IDV “far better than any other environmental visualization system”

  8. LDM • Peer-to-peer system for reliable, event-driven data distribution using LDM-6 software • Supports subscriptions to near real-time data feeds • LDM protocols use persistent TCP connections, suitable for pushing a large number of small products, as well as large products • Highly configurable: can inject, distribute, capture, filter, and process arbitrary data products
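
The subscribe-and-push model on this slide can be illustrated with a small in-memory sketch. This is not the LDM API or protocol: the `Product`, `Subscription`, and `ProductQueue` names, the feed names, and the product identifiers are all hypothetical, and delivery is a plain function call rather than a TCP connection.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Product:
    feed: str    # hypothetical feed name, e.g. "RADAR"
    ident: str   # hypothetical product identifier
    data: bytes  # arbitrary payload; the queue never inspects it


@dataclass
class Subscription:
    feed: str
    pattern: re.Pattern
    action: Callable[[Product], None]

    def matches(self, product: Product) -> bool:
        # A product matches when the feed is equal and the identifier fits the pattern.
        return product.feed == self.feed and self.pattern.search(product.ident) is not None


class ProductQueue:
    """In-memory stand-in for an event-driven product queue (assumption, not LDM)."""

    def __init__(self) -> None:
        self._subscriptions: List[Subscription] = []

    def subscribe(self, feed: str, pattern: str, action: Callable[[Product], None]) -> None:
        self._subscriptions.append(Subscription(feed, re.compile(pattern), action))

    def inject(self, product: Product) -> int:
        """Push the product to every matching subscriber; return the delivery count."""
        delivered = 0
        for sub in self._subscriptions:
            if sub.matches(product):
                sub.action(product)
                delivered += 1
        return delivered


received: List[Product] = []
queue = ProductQueue()
queue.subscribe("RADAR", r"^KFTG/", received.append)   # capture one radar site
queue.subscribe("TEXT", r".*", lambda p: None)         # accept and discard all text

count_match = queue.inject(Product("RADAR", "KFTG/N0R/20060426_1200", b"\x00" * 16))
count_miss = queue.inject(Product("RADAR", "KDEN/N0R/20060426_1200", b"\x00" * 16))
```

Because the queue only looks at the feed name and identifier, the same code path handles any payload, which is the sense in which distribution can stay independent of the data type.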

  9. LDM’s Success • Unidata’s Internet Data Distribution system: • Near real-time data for 175 universities and research organizations • 30 data feeds (radar, satellite, text bulletins, lightning, model forecasts, surface obs, upper air obs, ...) • Also used by USGS, NASA, ESRL, weather services in Spain and Korea, active projects on 6 continents • Data volume: 2.5 GB/hr, 120,000 products/hr; ranks fifth in weekly Internet2 traffic (Iperf, HTTP, NNTP, SSH, LDM, ... FTP)

  10. More LDM Successes • NOAA/NWS adopted for Level II radar distribution • From 134 radars to 125 weather forecast offices, 22 universities, 10 federal organizations, 12 commercial organizations • Will be used in THORPEX Interactive Grand Global Ensemble (TIGGE) • Model output collection from 10 global modeling centers • Collected at 3 archive centers (NCAR, ECMWF, Beijing) • Test from ECMWF to NCAR sustained 17 GB/hr • Candidate to replace WMO’s Global Telecommunications System (GTS)

  11. NetCDF’s Niche • Simple data model for scientific datasets • Portable, self-describing data • Supports direct access (unlike XML) • Many language interfaces: C, Fortran, C++, Java, Python, Perl, Ruby, ... • Lots of applications • Efficient subsetting of multidimensional arrays • Supports appending, sharing, archiving data
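
To make "direct access" and "efficient subsetting" concrete, here is a minimal sketch of reading a rectangular subset of a row-major array by seeking straight to each element. It is not the netCDF file format or API: the flat layout with no header, the helper names (`write_array`, `offset`, `read_slab`), and the (time, lat, lon) dimensions are assumptions for illustration only.

```python
import io
import struct
from itertools import product

ITEM = struct.Struct(">f")  # one big-endian 32-bit float per element


def write_array(values):
    """Serialize values in row-major order into an in-memory 'file'."""
    buf = io.BytesIO()
    for v in values:
        buf.write(ITEM.pack(v))
    return buf


def offset(shape, index):
    """Byte offset of the element at `index` in a row-major array of `shape`."""
    pos = 0
    for size, i in zip(shape, index):
        pos = pos * size + i
    return pos * ITEM.size


def read_slab(buf, shape, start, count):
    """Read a subset by seeking to each element instead of scanning the file."""
    out = []
    ranges = [range(s, s + c) for s, c in zip(start, count)]
    for index in product(*ranges):
        buf.seek(offset(shape, index))
        out.append(ITEM.unpack(buf.read(ITEM.size))[0])
    return out


shape = (2, 3, 4)                        # hypothetical (time, lat, lon) sizes
values = [float(n) for n in range(24)]   # each value equals its flat position
buf = write_array(values)

# Second time step, last two latitudes, last two longitudes.
slab = read_slab(buf, shape, start=(1, 1, 2), count=(1, 2, 2))
```

Since each value equals its flat position, `slab` holds the elements at positions 18, 19, 22, and 23; only those four elements are read. A text format such as XML would generally require parsing everything that precedes them.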

  12. NetCDF-Java (nj22) • 100% Java library, more advanced than C-based interfaces • Prototype implementation of Common Data Model for access to netCDF-4, OPeNDAP, HDF5 • Provides netCDF interfaces to other formats: Grids (GRIB1, GRIB2), Radar (NEXRAD, NIDS, DORADE), Satellite (DMSP, GINI), Point Observations (BUFR (soon)) • Provides uniform coordinate systems layer • Access to THREDDS catalogs • Implements access through NcML

  13. Common Data Model (layer diagram) • Applications • Scientific Datatypes: Point, Trajectory, Station, Radial, Grid, Swath • Coordinate Systems • Data Access: OPeNDAP, HDF5, GRIB, netCDF, ... • THREDDS

  14. Success of netCDF • Basis for CF Conventions for climate and forecast data • Used at LLNL/PCMDI for archiving model output for the upcoming IPCC Fourth Assessment Report: 23 models, 30 TBytes, 70,000 files • Used in various archives maintained by NOAA, NASA, USGS, DoE, NCAR, BADC, CSIRO, ... • C and Fortran netCDF Users Guides have been translated into Japanese at Kyoto University • Other uses in chromatography, mass spectrometry, neuro-imaging, biomolecule trajectory simulations, ... • Used in 15 commercial packages and over 50 open source packages for analysis, visualization, and data management

  15. THREDDS • Originally funded under NSF Digital Libraries initiative • “Discovery and use of scientific data” • Middleware between data providers and users • Dataset Inventory Catalogs (XML) • Now part of Unidata Data Collections effort • Data Serving (pull) • THREDDS Data Server (TDS) most recent development • A THREDDS catalog provides a hierarchical structure for factoring inherited metadata
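
The last bullet, a hierarchy that factors out inherited metadata, can be sketched as a tree in which each dataset sees its ancestors' metadata unless it overrides a value. This is not the THREDDS catalog schema or any THREDDS library: the `Dataset` class, the metadata keys, and the dataset names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    children: List["Dataset"] = field(default_factory=list)
    # Excluded from repr/compare to avoid parent <-> child recursion.
    parent: Optional["Dataset"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Dataset") -> "Dataset":
        child.parent = self
        self.children.append(child)
        return child

    def effective_metadata(self) -> Dict[str, str]:
        """Merge metadata from the root down; nearer values override inherited ones."""
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.metadata}


# Metadata stated once at the top applies to everything beneath it.
root = Dataset("Model output", {"dataType": "Grid", "serviceName": "opendap"})
nam = root.add(Dataset("NAM", {"authority": "example.org"}))
run = nam.add(Dataset("NAM 2006-04-26 12Z", {"serviceName": "http"}))

meta = run.effective_metadata()
```

Here `run` declares only its own service name, yet its effective metadata also carries the data type from the root and the authority from its parent.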

  16. TDS (THREDDS Data Server) • Integrates data access with THREDDS catalogs and services • Tomcat/Servlet, 100% Java, single WAR file • Data input is netCDF Java 2.2 library • Data output: • OPeNDAP (for accessing subsets) • HTTP Server (for bulk file transfer) • OGC Web Coverage Service (WCS; currently gridded only, subsetting supported) • Supports dynamic generation of catalogs

  17. Success of THREDDS • THREDDS used in NCAR Community Data Portal, many other data archives • TDS in use for serving IDD data from motherlode.ucar.edu, other data providers • From “Lessons Learned: Evaluation Studies Related to Geoscience Data in THREDDS and DLESE”, Susan Lynds et al: • “Data providers agreed that THREDDS has made data access much easier than it used to be and enables them to reach new user communities.”

  18. udunits • Library for manipulating units of physical quantities • Conversion of unit specifications between formatted and binary forms • Arithmetic manipulation of unit specifications • Conversion of values between compatible scales of measurement • C, Fortran, and Java interfaces • Required by CF conventions
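
A rough sketch of two ideas on this slide, arithmetic on units and conversion between compatible scales, follows. It is not the udunits API: the `Unit` class, the four-element dimension tuple, and the `convert` function are assumptions, and unit division simply drops offsets, which a real units library has to treat more carefully.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Unit:
    scale: float                      # multiplier to reach the base unit
    offset: float                     # additive shift to reach the base unit
    dims: Tuple[int, int, int, int]   # exponents of (metre, kilogram, second, kelvin)

    def __truediv__(self, other: "Unit") -> "Unit":
        # Simplification: derived units carry no offset.
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Unit(self.scale / other.scale, 0.0, dims)


def convert(value: float, src: Unit, dst: Unit) -> float:
    """Convert between units that share the same dimension exponents."""
    if src.dims != dst.dims:
        raise ValueError("incompatible units")
    base = value * src.scale + src.offset
    return (base - dst.offset) / dst.scale


METRE = Unit(1.0, 0.0, (1, 0, 0, 0))
KILOMETRE = Unit(1000.0, 0.0, (1, 0, 0, 0))
SECOND = Unit(1.0, 0.0, (0, 0, 1, 0))
HOUR = Unit(3600.0, 0.0, (0, 0, 1, 0))
KELVIN = Unit(1.0, 0.0, (0, 0, 0, 1))
CELSIUS = Unit(1.0, 273.15, (0, 0, 0, 1))

speed = convert(36.0, KILOMETRE / HOUR, METRE / SECOND)   # km/h to m/s
temperature = convert(25.0, CELSIUS, KELVIN)              # degrees Celsius to kelvin
```

Comparing dimension exponents is what lets the sketch reject a request such as converting metres to seconds instead of silently returning a number.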

  19. udunits Success • Almost as widely used as netCDF

  20. The Unidata Development Process • Unidata’s software engineering process appears haphazard and chaotic. • No uniform software engineering process • No regular code reviews • Specifications for software often missing or vague • No enforcement of coding standards • No measurement of programmer productivity • No effort underway to improve software engineering methodology

  21. What Accounts for Unidata’s Successes? • ... and can other organizations benefit from the answers? • Magic fairy dust? • Advanced processes? • Signing bonuses? • Working conditions? • Luck?

  22. I’ll Offer Some Theories • The identified factors are subjective • Based on almost twenty years of involvement in Unidata • Discussion question: are any of these easily transferable? • Discussion question: would we have had even better software success with application of disciplined development methodologies?

  23. Involve Developers in Software Support • Superior support for users of legacy applications: • GEMPAK • McIDAS • Support for software developed elsewhere: • OPeNDAP • VisAD • Every developer expected to answer user questions

  24. GEMPAK • Application for analysis and visualization • In use at over 200 sites, use still growing • Developer specialized expert in package, not process: maintaining, upgrading, testing, distributing, supporting, teaching user workshop, supporting user community, supporting data types

  25. McIDAS • In use at approximately 100 sites, a growing number outside the U.S. • Developer specialized expert in package, not process: maintaining, upgrading, testing, distributing, supporting, teaching user workshop, supporting user community, supporting new data types

  26. Unidata User Support • Over 30 responses to user questions/day • Searchable support archives help • Support for legacy apps still significant • Balance between visualization apps, data middleware • Keeps developers close to users

  27. Leverage User Efforts • NetCDF users have contributed language interfaces, applications, good ideas, and bug reports: www.unidata.ucar.edu/software/netcdf/credits.html Bob Albrecht, Ethan Alpert, Chris Anderson, Ayal Anis, Harald Anlauf, Phil Austin, Eric Bachalo, Jason Bacon, Sandy Ballard, Matthew Banta, Mike Berkley, Sherman Beus, Lorenzo Bigagli, Mark Borges, Nicola Botta, Dr. Kenneth P. Bowman, Bill Boyd, Mark Bradford, Bernward Bretthauer, Dr. Paul A. Bristow, Roy Britten, Glenn Carver, Tom Cavin, Morrell Chance, Susan C. Cherniss, Jason E. Christy, Gerardo Cisneros, Alain Coat, Carlie J. Coats, Jr., Jon Corbet, Alexandru Corlan, Jim Cowie, Arlindo da Silva, Rick Danielson, Alan Dawes, Donald W. Denbo, Charles R. Denham, Arnaud Desitter, Steve Diggs, Michael Dixon, Alastair Doherty, Bob Drach, Patrice Dumas, Frank Dzaak, Brian Eaton, Harry Edmon, Lee Elson, Ata Etemadi, Constantinos Evangelinos, John Evans, Joe Fahle, Gabor Fichtinger, Glenn Flierl, Connor J. Flynn, Anne Fouilloux, Jean-Francois Foccroulle, Mike Folk, David Forrest, David W. Forslund, Ben Foster, Masaki Fukuda, Dave Fulker, James Gallagher, Bear Giles, Tom Glaess, Peter Gleckler, André Gosselin, Gary Granger, Jonathan Gregory, Patrick Guio, Mark Hadfield, Magnus Hagdorn, Paul Hamer, Steve Hankin, Bill Hart, Kate Hedstrom, Charles Hemphill, Olaf Heudecker, Donn Hines, Konrad Hinsen, Leigh Holcombe, Tim Holt, Toshinobu Hondo, Takeshi Horinouchi, Chris Houck, Matt Huddleston, Matt Hughes, Doug Hunt, Alan Imerito, Jouk Jansen, Harry Jenter, Susan Jesuroga, Patrick Jöckel, Tomas Johannesson, Peter Gylling Jørgensen, Narita Kazumi, John Kemp, Jeff Kuehn, V. Lakshmanan, Bruce Langdon, Stephen Leak, Tom LeFebvre, Angel Li, Jianwei Li, Rick Light, Brian Lincoln, Keith Lindsay, Fei Liu, Jeffery W. Long, Dave Lucas, Valerio Luccio, Lifeng Luo, Steve Luzmoor, Lawrence Lyjak, Rich Lysakowski, Sergey Malyshev, Len Makin, Jim Mansbridge, Andreas Manschke, Chris Marquardt, Marinna Martini, William C. 
Mattison, Craig Mattocks, Mike McCarrick, Bill McKie, Ron Melton, Roy Mendelssohn, Pavel Michna, Barb Mihalas, Henry LeRoy Miller Jr., Philip Miller, Rakesh Mithal, Masahiro Miiyaki, Christine C. Molling, Skip Montanaro, Thomas L. Moore, Stefano Nativi, Gottfried Necker, Peter Neelin, Michael Nolta, Bill Noon, Enda O'Brien, Dave Osburn, Dan Packman, Simon Paech, Gabor Papp, Morten Pedersen, Dr. Louise Perkins, Michael D Perryman, Hartmut Peters, Ron Pfaff, David Pierce, Alexander Pletzer, Philippe Poilbarbe, Dierk Polzin, Jacob Weismann Poulsen, Ken Prada, Dave Raymond, Michael Redetzky, Rene Redler, Mark Reyes, Doug Reynolds, Mike Rilee, Mark Rivers, Randolph Roesler, Mike Romberg, Mathis Rosenhauer, Suzanne T. Rupert, Toshihiro Sakakima, Eric Salathe, Matthew H. Savoie, Marie Schall, Larry A. Schoof, Dan Schmitt, Robert B. Schmunk, Rich Schramm, William J. Schroeder, Uwe Schulzweida, Keith Searight, Guntram Seiss, Remko Scharroo, John Sheldon, Masato Shiotani, Michael Shopsin, Richard P. Signell, Steve Simpson, Joe Sirott, Greg Sjaardema, Dirk Slawinski, Cathy Smith, Neil R. Smith, Peter Paul Smolka, Nancy Soreide, Hudson Souza, Gunter Spranz, Richard Stallman, Bob Swanson, John Tanski, Karl Taylor, Jason Thaxter, Kevin W. Thomas, Philippe Tulkens, Tom Umeda, Joe VanAndel, Paul van Delst, Gerald van der Grijn, Richard van Hees, János Végh, Bernhard Wagner, Thomas Wainwright, Stephen Walker, Chris Webster, Paul Wessel, Carsten Wieczorrek, Gerry Wiener, Ralf Wildenhues, David Wilensky, Hartmut Wilhelms, Gareth Williams, David Wojtowicz, Jeff Wong, Randy Zagar, Charlie Zender, Remik Ziemlinski.

  28. Strive for Discipline-Independence • Demand is greater than supply for useful data-oriented infrastructure for science • Examples: • netCDF • LDM • THREDDS • udunits • Common Data Model • ...

  29. Emphasize Loose Coupling • Data providers and data consumers should be uncoupled • Data storage should be uncoupled from visualization and analysis applications • Data distribution should be independent of type of data • ...
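
One way to picture the first two bullets is a consumer written against a narrow interface, so that providers and storage can change without touching analysis code. The sketch below is an illustration under assumptions, not Unidata code: the `DataSource` protocol, both provider classes, and the text layout are hypothetical.

```python
from typing import Dict, List, Protocol


class DataSource(Protocol):
    """The only thing a consumer needs to know about a provider."""

    def read(self, variable: str) -> List[float]: ...


class InMemorySource:
    def __init__(self, variables: Dict[str, List[float]]) -> None:
        self._variables = variables

    def read(self, variable: str) -> List[float]:
        return list(self._variables[variable])


class DelimitedTextSource:
    """Provider that parses lines of the form 'name: v1, v2, ...'."""

    def __init__(self, text: str) -> None:
        self._text = text

    def read(self, variable: str) -> List[float]:
        for line in self._text.splitlines():
            name, _, values = line.partition(":")
            if name.strip() == variable:
                return [float(v) for v in values.split(",")]
        raise KeyError(variable)


def mean(source: DataSource, variable: str) -> float:
    """Consumer code: unaware of how or where the data is stored."""
    values = source.read(variable)
    return sum(values) / len(values)


from_memory = mean(InMemorySource({"temperature": [280.0, 290.0]}), "temperature")
from_text = mean(
    DelimitedTextSource("pressure: 1000, 1010\ntemperature: 280, 290"), "temperature"
)
```

Both calls go through the same `mean` function; swapping the storage behind `read` does not require changing the consumer.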

  30. Find Right Level for Abstractions • Data • Scientific Data • Georeferenced Data • Meteorological Data • Radar Data

  31. Improve Software Quality by Porting • Platform-independence is important • Achieving it seems to improve quality of software in unexpected ways • Aiming for reasonable tradeoffs between portability and performance requires expertise • Solving portability problems for others (e.g. providing portable data, service-oriented architectures) is a growth industry • Java developers may ignore this

  32. Work on Small Projects • Unidata projects and software packages typically require only one or two developers • Much of software engineering is about scaling to large projects with dozens of developers • May be the #1 secret for success

  33. Find and Exploit Tight Feedback Loops • Develop for an active and interested user community • Find specific users with problems important to them that your software can solve • Exploit short iterations for incremental development • Governance: establish and pay attention to an external Users Committee that meets regularly

  34. Use the Software You Develop • “Eat your own dogfood” • The Unidata Integrated Data Viewer uses netCDF Java, THREDDS, NcML, netCDF decoders, VisAD, OPeNDAP, ADDE servers • Provides end-to-end testing • Prioritizes useful enhancements • Leads to early bug identification by developers instead of users • If taken too far, leads to NIH syndrome

  35. Drive Development with Tests • Test-driven development (TDD) and unit testing give developers confidence to • refactor code • try big changes • port to new platforms • Example: netCDF “make check” runs over 150,000 tests
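
As a small illustration of how unit tests support refactoring and porting, the sketch below pins down the behaviour of a pair of packing functions with a round-trip test. It is not taken from the netCDF test suite: the `pack_scaled` and `unpack_scaled` functions, their parameters, and the test values are hypothetical.

```python
import io
import unittest


def pack_scaled(value, scale_factor, add_offset):
    """Pack a physical value into an integer: round((value - offset) / scale)."""
    return round((value - add_offset) / scale_factor)


def unpack_scaled(packed, scale_factor, add_offset):
    """Recover the physical value from its packed integer form."""
    return packed * scale_factor + add_offset


class PackingTests(unittest.TestCase):
    def test_round_trip_within_half_a_step(self):
        # Packing then unpacking may lose at most half of one scale step.
        for value in (250.0, 273.15, 310.4):
            packed = pack_scaled(value, 0.01, 200.0)
            restored = unpack_scaled(packed, 0.01, 200.0)
            self.assertAlmostEqual(restored, value, delta=0.005)

    def test_packed_value_is_integer(self):
        self.assertIsInstance(pack_scaled(273.15, 0.01, 200.0), int)


# Run the suite programmatically so the outcome is available as a value.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PackingTests)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

With tests like these in place, the packing arithmetic can be rewritten or moved to another platform, and a failing `result` signals a regression before users see it.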

  36. Value People over Process • Important tenet of the “Manifesto for Agile Software Development”, http://agilemanifesto.org/, to value: • Individuals and interactions over processes and tools • Working software over comprehensive documentation • Customer collaboration over contract negotiation • Responding to change over following a plan

  37. Arrange Long Funding Cycles • “T. T. T.” • Put up in a place / where it's easy to see / the cryptic admonishment / T. T. T. / When you feel how depressingly / slowly you climb, / it's well to remember that / Things Take Time. • --Piet Hein

  38. Summary: The “Secrets” • Involve developers in support • Leverage user efforts • Strive for discipline-independent infrastructure • Emphasize loose coupling • Choose the right level for abstractions • Improve quality by porting

  39. More “Secrets” • Work on small projects • Find good feedback loops • Use your own software • Drive development with tests • Value people over process • Arrange for long funding cycles
