Issues in managing HEP Software Development in a distributed environment
Elizabeth Buckley-Geer, Fermilab
CHEP 2000, Padova, Italy
Contents
• Characterizing the problem
• Key issues and solutions from CDF/D0 Collider Run II
• Some thoughts on the development process
• Conclusions
Characterizing the problem
• Developer community of about 150 people (both collaborations) from North and South America, Europe, Asia, India, Russia
• Widely varying quality of network connections between FNAL and remote locations
• Widely varying abilities of groups to afford to purchase commercial tools
Characterizing the problem
• One common denominator since mid-1997:
  • Everyone can buy a cheap PC and run Linux on it
  • No more $10-20K workstations. Every member of the group can have a PC
  • They don’t want to rely on connecting to a central machine at FNAL to do code development
  • They want to make use of these PCs at their own location to do their code development
• First release of CDF code for Linux was January 1998 – several years after the basic development environment was designed
The situation during Run I (CDF, but similar for D0)
• Highly centralized code development
• Could only realistically develop code on the central machine at FNAL (VMS cluster) – no distributed development was supported, even on other VMS systems
• Code was ported to run on IRIX and AIX but only frozen releases were available on these platforms
• Frozen releases were distributed to remote sites as tar files or VMS save sets
• Development version of the code was available to desktop VMS nodes at FNAL from 1993 onwards but code could not be committed to the repository from these machines
Run I development tools
• Code was mostly Fortran with some small amounts of C. About 50 packages.
• Used proprietary VMS tools for version control and package building (CMS and MMS)
• Used vendor compilers and debuggers. Only UNIX vendors who supported VMS extensions were considered. Luckily the list was sufficiently long!
• No serious use of design tools – some early attempts at D0 but they didn’t survive
• No tools to locate memory leaks due to the nature of the memory management packages in use – YBOS and ZEBRA
Goals for Run II development environment – early 1996
• Obviously needed to migrate from VMS as a primary platform
• Provide ability to do remote development – recognized as important even before the Linux revolution
• Reduce the need for proprietary tools for base system
• Handle move from Fortran to C++
• Identify useful software engineering tools
Configuration Management Joint Project
• Formed joint D0, CDF, FNAL Computing Division working group to study configuration management in early 1996 (see E248 for more on Run II joint projects)
• Charge was to find and implement a common solution for CDF and D0 for software management
  • Version control
  • Package and release organization
  • Building packages
  • Distribution
  • Validation
Configuration Management Joint Project
• Group looked at existing tools in use in HEP and elsewhere
• Chose
  • CVS for version control with customizations from Sloan Digital Sky Survey (SDSS)
  • SoftRelTools from BaBar for package organization and building
  • UPS/UPD from FNAL for product setup and distribution tools
CVS
• Run in client/server mode – adopted from SDSS
• Repository on server + cvsuser pseudo account running a restricted shell CVSH that only allows cvs commands to be executed
• Local and remote access are identical so users do not need to be on a FNAL computer to access repository – necessary condition for remote development
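The restricted-shell idea can be illustrated with a minimal sketch. The talk does not describe how CVSH decides what to allow, so the allow-list, the rejected characters and the function name `is_allowed` below are all assumptions made for illustration, not the actual CVSH rules.

```python
import shlex

# Hypothetical allow-list: the real CVSH rules are not described in the talk.
ALLOWED_PROGRAMS = {"cvs"}
# Shell metacharacters that could be used to chain or redirect commands.
FORBIDDEN_CHARS = set(";|&`$<>\n")


def is_allowed(command_line: str) -> bool:
    """Return True only for a single, plain invocation of an allowed program."""
    # Reject metacharacters so a request cannot smuggle in a second command.
    if any(ch in FORBIDDEN_CHARS for ch in command_line):
        return False
    try:
        words = shlex.split(command_line)
    except ValueError:  # for example, unbalanced quotes
        return False
    return bool(words) and words[0] in ALLOWED_PROGRAMS
```

With a filter of this kind, a request such as `cvs server` would pass, while `ls -l` or `cvs server; rm -rf /` would be refused before anything is executed.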
SoftRelTools (SRT)
• Adapted from BaBar experiment
• Uses cpp to create dependencies and gmake to build libraries & binaries
• BaBar and FNAL agreed to diverge on development
• It was becoming difficult to add new features given the original structure of the package
• Have since done a re-write (Spring 1999) of the package at FNAL to make it more maintainable
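As a rough illustration of what "creating dependencies" means, the sketch below follows quoted `#include` directives and prints a make-style rule. It is a simplified stand-in for the preprocessor's dependency output, not SRT code; the file names and the helper names (`direct_includes`, `all_dependencies`, `make_rule`) are hypothetical.

```python
import re

# Matches quoted includes only; system headers in <...> are ignored here.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def direct_includes(source_text: str) -> list[str]:
    """Return the headers named by quoted #include lines in one file."""
    return INCLUDE_RE.findall(source_text)


def all_dependencies(name: str, files: dict[str, str]) -> list[str]:
    """Follow quoted includes transitively; `files` maps file name to contents."""
    seen: list[str] = []
    stack = [name]
    while stack:
        current = stack.pop()
        for header in direct_includes(files.get(current, "")):
            if header not in seen:  # also guards against include cycles
                seen.append(header)
                stack.append(header)
    return sorted(seen)


def make_rule(source_name: str, files: dict[str, str]) -> str:
    """Format one make-style rule: object file, then everything it depends on."""
    obj = source_name.rsplit(".", 1)[0] + ".o"
    deps = [source_name] + all_dependencies(source_name, files)
    return f"{obj}: {' '.join(deps)}"


# Made-up package contents, purely for illustration.
example_files = {
    "Track.cc": '#include "Track.hh"\n#include <vector>\n',
    "Track.hh": '#include "Hit.hh"\n',
    "Hit.hh": "",
}
print(make_rule("Track.cc", example_files))
```

A rule in this form tells make to rebuild the object file whenever the source or any header it pulls in has changed.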
UPS – Unix Product Setup
• FNAL product in use since 1991
• Supports existence of multiple versions of a product. Choice is made using a ‘setup’ command.
• Re-write for Run II
  • Completed in summer 1998
  • In use by both CDF and D0
Use of these tools at CDF
• ~65 code developers
• 1.3 million lines of code
• 71% C++, 20% Fortran, 8% C, 0.6% Java + external packages
• 144 packages
• Development release built every night on IRIX, TRU64, SUN, Linux
• Daily build logs scanned for errors and reported to developers. Build logs are posted on web
• Development builds lead to timely detection and fixing of bugs
• Create frozen releases about every 2 months. Also create releases to capture code used for certain milestones.
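Scanning a nightly build log for errors might look like the sketch below. The log layout (each line prefixed with a package name and a colon) and the package names are assumptions; the talk does not show the real CDF log format or the scripts used.

```python
import re
from collections import defaultdict

# Assumed log layout: each line starts with "<package>: "; real logs may differ.
ERROR_RE = re.compile(r"^(?P<package>[\w-]+): .*\berror\b", re.IGNORECASE)


def errors_by_package(log_text: str) -> dict[str, list[str]]:
    """Group the log lines that mention an error under their package name."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        match = ERROR_RE.match(line)
        if match:
            found[match.group("package")].append(line)
    return dict(found)


# Made-up log excerpt, purely for illustration.
sample_log = (
    "TrackingMods: compiling Fit.cc\n"
    "TrackingMods: Fit.cc:42: error: 'x' undeclared\n"
    "Calorimetry: warning: unused variable\n"
    "Calorimetry: link Error in libCal.a\n"
)
for package, lines in errors_by_package(sample_log).items():
    print(package, len(lines))
```

Grouping by package makes it straightforward to send each report to the people responsible for that package rather than to the whole collaboration.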
Use of these tools at CDF
• Success of the development rebuild varies and is somewhat correlated with the number of files changed
Use of these tools at D0
• ~60 code developers have write access to repository
• Essentially 100% C++ except for external packages
• 280 packages – but big variation in size
• Test release of entire package weekly on IRIX and Linux. Goal is to have an operational reconstruction executable at the end of every release. Currently 80% success rate.
• Production releases occur at intervals determined by the management. Used to capture important milestones and provide stable working versions.
• 5 production releases to date
Code Distribution
• CDF has a set of custom scripts to distribute code to remote sites
• Both frozen releases and development are distributed
• Fairly straightforward to get a distribution
• Currently fairly manpower intensive for development release on remote nodes – ½ FTE devoted to fixing problems
• Working on switching to UPD for ease of maintenance
• No significant automatic code distribution happening in D0 yet
Code Distribution
• Majority of distribution is to Linux machines
Compilers
• We wanted to write code that adhered to the C++ ANSI standard – not get into the Fortran extensions quagmire!
• GCC and vendor compilers were not thought sufficiently compliant in summer 1997
• Chose KAI compiler from Kuck and Associates
• Compiler was available on the relevant platforms – including Linux
• Has led to issues with availability of KAI versions of external products that must be built with the CDF/D0 software – e.g. we paid for a port of Open Inventor
• We still believe it was the right choice at the time but expect to use EGCS and vendor compilers in the future
Debuggers and other tools
• Quality of the debugging tools has left a lot to be desired
• This was one of the few downsides of choosing KAI. Things have been particularly problematic on Linux
• Have purchased TotalView which is in use on IRIX and will shortly be available for Linux – seems to improve the situation
• CASE tools – used GDPro and Rational Rose
• Mostly used to document design – did not use automatic code generation features
• Purify and Insure++ used to look for memory leaks – but not currently available for Linux
Licensed products
• Has been very beneficial to negotiate license agreements that cover use of a product by all Run II developers independent of their location
• Have done this with KAI, Open Inventor
• Get better price – all licenses must be ordered through Fermilab
Thoughts on the development process
• Borrowing from the terminology and observations presented in “The Cathedral and the Bazaar” by Eric Raymond – O’Reilly Books
• Our code is clearly Open Source because (by and large) it is freely available to anyone who wants to use it from another experiment
• However, both CDF and D0 software projects are run using the traditional “cathedral” style of software development
• This is necessitated by the requirements to provide schedules, obtain manpower resources from a limited pool, meet milestones and convince review committees that you know what you are doing
• We can make some comparisons between aspects of the Open Source (aka Linux) model and what we are doing in HEP
Thoughts on the development process
• “Treat your users as co-developers”
• Two user communities in an experiment
  • Those working on the software project – programmers and physicists
  • The rest of the experiment – the physicist-user
• The first group tends to be like the Linux community – working on the project because they are interested in the problem and want to improve the product
• The second group just want to use the software to get physics results – they want to improve their physics analysis software but not the infrastructure
Thoughts on the development process
• “Release early, release often”
• CDF has shown that this leads to more timely bug fixes and shorter integration time and is very desirable for the project developers
• However, it drives the physicist-user to distraction because he/she just wants something that works!
• Have to have stable frozen releases in addition
Thoughts on the development process
• Some of the skills necessary to co-ordinate a successful Open Source project are relevant to managing an HEP computing project
• Must have good people and communication skills
• Need to be able to attract people to the project and keep them interested and happy
• These can often be more important than possessing great technical prowess
• It often feels like we are in a bazaar rather than a cathedral!
Conclusions
• CDF and D0 are successfully managing their software development projects with ~60–70 developers per experiment and 1 million lines of C++ each
• We are expected to have schedules, milestones and reviews which makes it unlikely that we can ever manage a project using the bazaar model
• However, some of the Open Source concepts are applicable to HEP projects
Use of these tools at CDF
• On days when the development release builds, we create a rawhide release. This satisfies developers who need the up-to-date code but also need the whole release to actually build