CVS II: Parallelizing Software Development Author: Brian Berliner John Tully
Motivation • Typical scenario in software development: • John is in charge of kernel’s execution threads; Geoff is in charge of kernel’s debugger thread; Brice is in charge of kernel’s file I/O thread. • All three have very different tasks, but each must work on different parts of the same files. • Deadlines must be met; each needs to make and test changes independently.
What can we do? • No revision control – a disaster • Changes just overwritten; very disorganized (no separate versions/releases) • Lock files for exclusive use until all changes are made • All three programmers need to continually update and test their software, or deadlines can’t be met
Problem Addressed • Obvious limitations with sharing files (with no revision control) and locking files for one user at a time • The specific need: a freely available tool to manage software revision and release control in an environment that is: • Multi-developer • Multi-directory • Multi-group
“Current” Solutions • SCCS (Source Code Control System) – Early 1980s; old even for this paper (Bell 1981) • OK for small projects; only one developer may have writable file at a time • RCS (Revision Control System) – Improvement upon SCCS, but nothing really significant (Tichy 1982) • Better user interface/organization; same limitations
“Current” Solutions (cont.) • Many “environment-specific” tools – examples: • Use file-sharing through links when common in specific applications • Automatically produce objects for multiple architectures • In all cases, assumptions were made about the sources being controlled (this adds complexity) • Discussion: any experience with environment-specific tools?
What is CVS? • Concurrent Versions System: basically, RCS with conflict resolution • CVS: Front-end to RCS • Conflict Resolution Algorithms • Module database • Flexible Logging Options • Release tracking by tags and/or dates • RCS: storage of source code, revision numbers • [Diagram: CVS layered on top of RCS, which sits on top of the source code]
Software Conflict Resolution • CVS commands • “checkout” – get the current “head” revision • “update” – perform an RCS merge of the head revision into the current working file • “commit” – make a change permanent, only if the checked-out file is current with the head revision • “Copy-Modify-Merge” Paradigm • No locks – development proceeds in parallel • Conflicts are not very common, so this paradigm is useful • Discussion: Does this paradigm work for real-world situations?
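The commit rule on this slide can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how CVS is implemented: the `Repository` class, its method names, and the single-file layout are all hypothetical, and the merge step of “update” is left out.

```python
class Repository:
    """Toy repository holding the revisions of a single file (hypothetical)."""

    def __init__(self, text):
        self.revisions = [text]  # index 0 is the first revision

    @property
    def head(self):
        """Index of the newest revision."""
        return len(self.revisions) - 1

    def checkout(self):
        """Return (base revision index, text) for a new working copy."""
        return self.head, self.revisions[self.head]

    def commit(self, base, text):
        """Accept the change only if the working copy is based on head."""
        if base != self.head:
            raise RuntimeError("working copy is out of date; update first")
        self.revisions.append(text)
        return self.head


repo = Repository("int main() {}\n")
john_base, john_text = repo.checkout()
geoff_base, geoff_text = repo.checkout()

# John commits first, so his working copy is still current with head.
repo.commit(john_base, john_text + "/* John */\n")

# Geoff's copy is now stale, so his commit is refused until he updates.
try:
    repo.commit(geoff_base, geoff_text + "/* Geoff */\n")
except RuntimeError as err:
    print(err)
```

Nobody holds a lock at any point: both developers edit freely, and the only check happens at commit time.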
Storage Methods • Uses RCS Principle: most of the text will remain unchanged from one revision to the next • RCS (and CVS) files are composed of deltas, or listings of: • Lines that have appeared • Lines that have disappeared • Changed Lines • In the repository, store the full text of the newest revision, plus reverse deltas that work backwards to older revisions • For branches, forward deltas are stored • Discussion: Real-world situations where branches are applicable?
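The reverse-delta idea can be illustrated with Python's standard `difflib`. This is a minimal sketch under stated assumptions, not the RCS file format: the names `make_delta`, `apply_delta`, and `ReverseDeltaFile` are hypothetical, and deltas are kept as in-memory tuples rather than RCS edit scripts.

```python
import difflib


def make_delta(source, target):
    """Describe how to turn the `source` lines into the `target` lines."""
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace source[i1:i2] with these lines taken from target.
            delta.append((i1, i2, target[j1:j2]))
    return delta


def apply_delta(source, delta):
    """Apply a delta produced by make_delta to `source`."""
    result = list(source)
    # Work from the end so that earlier indices stay valid.
    for i1, i2, lines in reversed(delta):
        result[i1:i2] = lines
    return result


class ReverseDeltaFile:
    """Full text of the newest revision plus deltas leading backwards."""

    def __init__(self, lines):
        self.head = list(lines)
        self.reverse_deltas = []  # last entry turns head into its predecessor

    def commit(self, new_lines):
        # Record how to get from the new head back to the old head.
        self.reverse_deltas.append(make_delta(list(new_lines), self.head))
        self.head = list(new_lines)

    def revision(self, steps_back):
        """Rebuild the revision `steps_back` commits before head."""
        lines = list(self.head)
        for k in range(1, steps_back + 1):
            lines = apply_delta(lines, self.reverse_deltas[-k])
        return lines
```

With this layout the newest revision, which is the one most often checked out, needs no delta processing at all; only older revisions pay the reconstruction cost.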
Merging Methods • Merging changes into working copy similar to program that generates deltas for revisions • More RCS Principles: How to merge separate revision branches back together? • Compare two revisions to closest ancestor, determine which lines are: • Identical in all three revisions • Identical in two out of three revisions • Different in all three revisions
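The three cases above can be sketched as a line-by-line decision. This is a deliberately simplified illustration, not the RCS merge algorithm: it assumes the three revisions have the same number of lines and are already aligned, which a real merge cannot assume. The function name `merge3` and the conflict-marker text are hypothetical.

```python
def merge3(base, mine, theirs):
    """Merge two line-aligned revisions against their common ancestor.

    Returns (merged_lines, number_of_conflicts).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles line-aligned revisions")

    merged, conflicts = [], 0
    for b, m, t in zip(base, mine, theirs):
        if m == t:
            # Identical in all three, or both sides made the same change.
            merged.append(m)
        elif m == b:
            # Only the other side changed this line: take their version.
            merged.append(t)
        elif t == b:
            # Only my side changed this line: keep my version.
            merged.append(m)
        else:
            # Different in all three revisions: a human must decide.
            conflicts += 1
            merged.extend(["<<<<<<< mine", m, "=======", t, ">>>>>>> theirs"])
    return merged, conflicts


base = ["x = 1", "y = 2", "z = 3"]
mine = ["x = 10", "y = 2", "z = 30"]
theirs = ["x = 1", "y = 20", "z = 33"]
merged, conflicts = merge3(base, mine, theirs)
print(conflicts)  # 1: only the z line differs in all three revisions
```

Lines that are identical in two of the three revisions merge automatically; only lines that differ in all three are reported as conflicts.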
Module Database • Efficient ndbm database • Modules used to conveniently check out smaller pieces of large distributions • Very convenient aspect: actual physical location within the distribution is hidden • Prisma: entire UNIX distribution broken down into modules • Example:
    example% cvs checkout diff
    example% cd diff; make
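The way a module name hides the physical location can be pictured as a simple lookup. This sketch uses a plain Python dictionary in place of the ndbm database; the module entries, the repository root `/src/master`, and the function `resolve_module` are all hypothetical and chosen only for illustration.

```python
import posixpath

# Hypothetical module table: module name -> directory inside the repository.
MODULES = {
    "diff": "bin/diff",
    "kernel": "sys/kernel",
}


def resolve_module(name, repository_root="/src/master"):
    """Map a module name to its directory, hiding the real layout from users."""
    try:
        relative = MODULES[name]
    except KeyError:
        raise LookupError(f"unknown module: {name}") from None
    return posixpath.join(repository_root, relative)


print(resolve_module("diff"))  # /src/master/bin/diff
```

A developer only ever types the module name; if the directory is later moved, only the table entry changes.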
Configurable Logging, Tags • When files committed, log message can be created by arbitrary program • Text Editor • News Database • Mail Program • Tags or dates can be used to get exact copies of releases
Our Example: using CVS • John, Geoff, Brice: check out the kernel module • John: modifies kernel.c and checks in changes • [Diagram, before: Repository, John, Geoff, and Brice all at Kernel 1.0; after: Repository and John at Kernel 1.1]
Our Example: using CVS • Geoff: tries to check in kernel.c a day later, but his version is no longer current with the head revision • Assuming the RCS merge will not have conflicts, what does Geoff need to do? • [Diagram labels: Repository Kernel 1.1, Geoff Kernel 1.1, Merge]
Our Example: using CVS • Brice: tries to check in kernel.c 2 days later, but his version is no longer current with the head revision • Assuming the RCS merge does have conflicts for Brice’s changes, what does Brice need to do? • [Diagram labels: Repository, Brice, Kernel 1.1, Kernel 1.2, Conflict]
Performance • Tests at Prisma – relevance? • 17,000 Files (~4 million lines of code) • 14 Software Developers • ~100 files changed/added per month • Performance results quite reasonable given the above parameters (especially given processor speeds at the time) • Checkout a bit slow (~16 minutes for 1000 files) • Update (the much more common case) about 10x faster
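To put the checkout and update figures in per-file terms, a rough back-of-the-envelope calculation follows. It assumes the “about 10x faster” figure applies to the same 1000-file workload, which the slide does not state explicitly; the variable names are hypothetical.

```python
# Figures taken from the slide.
checkout_minutes = 16
files = 1000
update_speedup = 10  # "about 10x faster", treated here as exactly 10

# Derived estimates, valid only under the assumption stated above.
checkout_seconds_per_file = checkout_minutes * 60 / files
update_minutes = checkout_minutes / update_speedup

print(f"checkout: about {checkout_seconds_per_file:.2f} s per file")
print(f"update:   about {update_minutes:.1f} min for {files} files")
```

That works out to roughly one second per file on checkout and under two minutes for an update of the same tree.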
Performance • Real “stress” test: SunOS 4.0.3 Merge • 94 conflicts in test out of 233 (about 40%) • Sounds bad… but it took only 2 days to fix all conflicts • Seen as a true justification of the “Copy-Modify-Merge” paradigm
Conclusions / Limitations • CVS very simple, efficient, source code-independent; extremely popular • A few minor enhancements: • Extra database to “remember” who has a checked out copy of a module • Easier recovery if administrative files are removed
Future Research Directions • Interesting research problem (not handled at all by CVS): modifying file names / directory structures quickly • (See Subversion principles…) • Support for producing objects for multiple architectures is another research topic • Caution here – shouldn’t harm efficiency of CVS
Discussion • Anything brought up before... • Tools other than CVS (language/environment specific) • Copy-Modify-Merge Paradigm • Experience with branches? • Experience with huge amounts of code?