ClearCase MultiSite: Supporting Geographically-Distributed Software Development
Anita Gupta
Review of the paper by Larry Allen, G. Fernandez, K. Kane, David Leblang, D. Minard, and J. Posner
September 29, 2004
Introduction: Parallel Development and ClearCase
• The size and complexity of software projects have increased greatly over the years.
• Large projects need several independent “lines of development” active at the same time.
• Parallel Development
  • The process of creating and maintaining multiple variants of a software system.
  • It is the key requirement for any version control and configuration management system targeted at large development environments.
[Figure: a single software system with many million lines of code, worked on by several hundred software engineers]
Introduction: Parallel Development and ClearCase
• Atria Software’s ClearCase
  • Is used for parallel development in a local area network
  • Provides geographically-distributed parallel development (through MultiSite)
• ClearCase Basics
  • What does ClearCase do?
    • Manages multiple variants of evolving software systems
    • Tracks which versions were used in software builds
    • Performs builds of individual programs or entire releases according to user-defined version specifications
    • Enforces site-specific development policies and processes
  • Versioned Object Base (VOB)
    • It contains data shared by all developers:
      - current and historical versions
      - derived objects (produced by compilers, linkers, and so on)
      - “accounting” data on the development process itself (who, when, why, …)
      - user-defined meta-data
Introduction: Parallel Development and ClearCase
• The whole database consists of several VOBs (private VOBs, shared VOBs).
• Virtual file system technology
  • ClearCase creates a virtual workspace called a view, which makes a VOB look like an ordinary file system source tree to users and their off-the-shelf tools.
[Figure: off-the-shelf tools and the operating system see an ordinary directory tree (/, usr, etc, home, proj, proj2, src, bin, doc) through the ClearCase virtual file system; the files parse.c, parse.h, lex.h, rlsnote, and parse.o are versions of files and other collections of objects stored in a VOB]
Introduction: Parallel Development and ClearCase
• Parallel Development with Branches
  • ClearCase allows multiple developers to modify a single source file simultaneously without contention or loss of changes.
    - Versions all file types
    - Versions directories
    - Files are read-only until checked out
    - Unlimited branching and merging
[Figure: version tree of an element, showing versions 0 to 5 on branch \main, versions 0 to 2 on branch \Rls2_bugfix, the labels Beta_01, Rls1.0, Rls2.0, and Rls3.0, and a merge between the two branches]
Introduction: Parallel Development and ClearCase
• ClearCase Meta-Data
  • Version labels are mnemonic names for particular versions.
  • Attributes are name/value pairs that represent state information about a version, used for integrating process-control mechanisms.
  • Hyperlinks enable users to define structured or ad hoc relationships between pairs of objects.
The Challenge of Geographically-Distributed Development
• Geographically-Distributed Development
  • Developers are located at several physical sites.
  • Each site develops one or more subcomponents of a large software system.
  • Sites have private sources and need to share sources, libraries, and header files with other sites.
• Difficulties of parallel development in a geographically-distributed environment
  • Time zone differences
  • Language barriers
  • Communication and coordination among team members become more complicated
  • Meta-data must be managed as well as source file data
Alternative Designs
• Global Access to a Centralized Repository
  • Simplest approach
  • Centralized shared repository
  • Problems
    • every access is remote, so the system is vulnerable to network failures (robustness)
    • frequent remote accesses degrade ClearCase performance over a low-bandwidth wide-area network
    • remote accesses limit system scalability
• Locally-Caching File Systems
  • Cache information locally at each development site by using a caching remote file system such as AFS (Andrew File System)
  • Problems
    • meta-data cannot be cached locally
    • the problems of robustness, performance, and scalability remain
Alternative Designs
• Repository Replication
  • The entire repository, including both the database and the version data storage, is replicated to each local site.
  • All the sites change their replicas independently.
  • Problems
    • the potential for conflicting changes and subsequent inconsistency of replicas
• Serially-Consistent Replicas
  • An algorithm keeps all the replicas continuously synchronized and avoids the possibility of lost or conflicting changes.
  • Constraint: either reading or writing data at any replica requires that at least a majority of all replicas be accessible.
  • Problems
    • even worse scaling characteristics
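The majority constraint on serially-consistent replicas can be illustrated with a minimal sketch. The function `has_quorum` is a hypothetical helper introduced here for illustration; it is not part of ClearCase or of the reviewed paper, and it assumes "majority" means a strict majority.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """Return True if a strict majority of all replicas is reachable.

    Hypothetical helper: under serial consistency, a read or a write may
    proceed only when this returns True.
    """
    if total <= 0 or not 0 <= reachable <= total:
        raise ValueError("invalid replica counts")
    return reachable > total // 2
```

With only two sites, losing the link between them leaves each site with one reachable replica out of two, so neither side could read or write. This is one way to see why the approach scales poorly across wide-area links.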
Alternative Designs
• Weakly-Consistent Replicas
  • Relax the requirement of serial consistency: there is no guarantee that a change made at one replica is immediately visible at the other replicas. The replicas are resynchronized periodically.
  • Resolving conflicts by manual intervention is suitable for systems in which individual files are infrequently modified at multiple sites and there are only a small number of replicas.
  • Manual intervention is not suitable for the complex database of a software repository, which is modified continuously at all active development sites.
  • An alternative approach is taken by Grapevine (a mailing-list registration database):
    • assigning a modification time to each change
    • allowing some changes to be lost without notification
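The modification-time approach can be sketched as a "latest change wins" resynchronization. This is only an illustration of the general idea attributed to Grapevine on the slide above, not Grapevine's actual algorithm; the names `Entry`, `resync`, `site_a`, and `site_b` are hypothetical.

```python
from typing import Dict, Tuple

Entry = Tuple[float, str]  # (modification time, value)


def resync(local: Dict[str, Entry], remote: Dict[str, Entry]) -> Dict[str, Entry]:
    """Merge two replicas; for each key the entry with the later time wins.

    The losing entry is dropped silently, which is exactly the
    "changes lost without notification" drawback.
    """
    merged = dict(local)
    for key, entry in remote.items():
        if key not in merged or entry[0] > merged[key][0]:
            merged[key] = entry
    return merged


# Two sites changed the same item independently.
site_a = {"parse.c": (100.0, "edit from site A")}
site_b = {"parse.c": (105.0, "edit from site B")}
merged = resync(site_a, site_b)  # site A's edit is discarded
```

Silently discarding one of two concurrent edits may be tolerable for a mailing-list database, but not for source code, which is why the slide rules this approach out for a software repository.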
The MultiSite Solution
• The strategy for geographically distributed development
  • Different development projects proceed concurrently and independently.
  • Each project uses a different branch of an element’s version tree.
• MultiSite enforces the rule that different sites work on different branches.
  • assigning mastership to individual branches
  • only one site can extend a particular branch
  • automatic resynchronization
[Figure: version trees of Register.c at the France and Australia replicas, each with the branches modula2 and pascal, before and after synchronization. Legend: existing version that is read-only at this site; new version created at this site; new (read-only) version created by synchronization update. After synchronization, France holds the versions imported from Australia and Australia holds the versions imported from France.]
The MultiSite Solution
• Services provided by MultiSite to support the above strategy
  • Maintenance of a local replica of a VOB at each site.
  • Enforcement of the rule that different sites work on different branches of an element.
  • Synchronization of the multiple VOB replicas, communicating the changes among the sites via network connections or magnetic tape.
  • ClearCase merge tools to merge independent changes to the same source file.
• VOB Replicas
  • MultiSite can maintain any number of VOB replicas, distributed at different sites.
[Figure: a VOB family. The original VOB and its replicas (VOB replica 1 … VOB replica n, and further replicas 11, 12, n1, n2, n3) diverge by local updating and converge by synchronization.]
The MultiSite Solution
• Branch Mastership
  • Assigning mastership to individual branches: new versions can be created on a branch only in the replica that masters that branch.
  • MultiSite assigns each object a mastering replica that is the only replica permitted to modify the object.
  • Mastership can be transferred between replicas if needed.
• Synchronization
  • Changes circulate among the replicas.
  • Update topology: star, multi-hop chain, etc.
  • Transport mechanism: TCP/IP, file transport, magnetic tape
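The branch mastership rule can be sketched as follows. This is a toy model under stated assumptions, not the MultiSite implementation: the class `Replica`, the exception `MastershipError`, and their methods are hypothetical names, and mastership transfer is modelled as an immediate hand-off rather than something carried by synchronization.

```python
class MastershipError(Exception):
    """Raised when a replica tries to modify a branch it does not master."""


class Replica:
    def __init__(self, name, mastered_branches):
        self.name = name
        self.mastered = set(mastered_branches)  # branches this replica may extend
        self.versions = {}                      # branch name -> list of version contents

    def checkin(self, branch, content):
        """Create a new version on `branch`; allowed only at the mastering replica."""
        if branch not in self.mastered:
            raise MastershipError(f"{self.name} does not master branch {branch!r}")
        self.versions.setdefault(branch, []).append(content)
        return len(self.versions[branch])  # count of versions created here so far

    def transfer_mastership(self, branch, other):
        """Hand mastership of `branch` to `other` (simplified: takes effect at once)."""
        if branch not in self.mastered:
            raise MastershipError(f"{self.name} does not master branch {branch!r}")
        self.mastered.remove(branch)
        other.mastered.add(branch)
```

Because only one replica can ever extend a given branch, two sites cannot create conflicting versions on it; independent work on different branches is reconciled later with the merge tools.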
Implementation of MultiSite
• Multiple-part timestamp schemes
  • Operations log: a record of each change, stored as an entry
  • Synchronization packet: all the operations log entries made in the replica since the last generated synchronization packet
  • Import changes: the changes contained in the packet are imported into the destination replica or replicas by replaying them in order
[Figure: synchronization updates among three replicas. Replica A performs operations 1A to 4A (Checkout Branch, Checkin Vers. 1, Checkout Branch Again, Checkin Vers. 2); Replica B performs operations 1B and 2B (Apply Label to Vers. 1, Move Label to Vers. 2); synchronization updates labeled S1, S2, and S4 carry the operations among Replicas A, B, and C.]
Implementation of MultiSite
• Logging and Virtual Timestamps
  • The operations log contains operations originating at the replica and operations imported from other replicas.
  • Virtual timestamp: a counter of the number of operations originating at the same replica (the “epoch number”), reflecting the order of the operation with respect to others originating at the same replica
  • The table of virtual timestamps contains one row and one column for each replica in the VOB family
    • row for the destination replica, column for the originating replica
    • each row represents a multiple-part timestamp reflecting the last known state of the corresponding replica
[Table: Table of Virtual Timestamps at Replica B after the first update generated to replica C (S2); rows are destination replicas (TO: A, B, C), columns are originating replicas (FROM: A, B, C)]
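The operations log and the table of virtual timestamps can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the MultiSite implementation: `Operation`, `ReplicaLog`, and `record_local` are hypothetical names, and epoch numbers are assumed to start at 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Operation:
    origin: str       # replica at which the operation originated
    epoch: int        # virtual timestamp: count of operations at that origin
    description: str


@dataclass
class ReplicaLog:
    name: str
    family: List[str]                                   # all replicas in the VOB family
    log: List[Operation] = field(default_factory=list)
    table: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def __post_init__(self):
        # table[destination][origin] = highest epoch from `origin` that
        # `destination` is known to hold; the row for self.name is our own state.
        self.table = {dest: {orig: 0 for orig in self.family} for dest in self.family}

    def record_local(self, description: str) -> Operation:
        """Log an operation originating here and advance our own epoch number."""
        epoch = self.table[self.name][self.name] + 1
        op = Operation(self.name, epoch, description)
        self.log.append(op)
        self.table[self.name][self.name] = epoch
        return op
```

Each row of `table` is one multiple-part timestamp: reading across the row for replica C, for example, gives how many operations from A, from B, and from C itself this replica believes C already holds.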
Implementation of MultiSite
• Generating updates
  • MultiSite scans the log of the sending replica, looking for operations that are not known to have already been imported by the destination replica.
    • It scans for entries with timestamps higher than those reflected in the table row for the destination replica.
  • Form the update packet
    • The update packet contains each operation found, the identity of the replica at which the operation originated, and the operation’s virtual timestamp.
  • After generating an update packet, the sending replica advances the entries in the table row for the destination replica to reflect the operations sent in the packet.
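The update-generation steps above can be sketched as follows. This is an illustrative sketch only: `generate_packet`, the tuple layout of `Op`, and the dictionary-based packet format are assumptions made for this example, not the actual MultiSite packet format.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, int, str]   # (originating replica, epoch number, description)
Row = Dict[str, int]        # originating replica -> highest epoch known


def generate_packet(log: List[Op], table: Dict[str, Row], dest: str) -> dict:
    """Collect operations the destination is not known to have, then update its row."""
    known = table[dest]
    start_state = dict(known)  # what the sender assumes the receiver already holds
    ops = [op for op in log if op[1] > known.get(op[0], 0)]
    for origin, epoch, _ in ops:
        known[origin] = max(known.get(origin, 0), epoch)
    return {"start_state": start_state, "operations": ops}


# Log at a sending replica B, holding operations from A and from B itself.
log_b: List[Op] = [
    ("A", 1, "checkout branch"),
    ("A", 2, "checkin vers. 1"),
    ("B", 1, "apply label to vers. 1"),
]
table_b = {"C": {"A": 0, "B": 0, "C": 0}}
first = generate_packet(log_b, table_b, "C")

log_b.append(("B", 2, "move label to vers. 2"))
second = generate_packet(log_b, table_b, "C")  # carries only the new operation
```

Recording the starting state in the packet is what lets the receiver later detect that an earlier packet was lost or has not yet arrived.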
Implementation of MultiSite
• Importing updates
  • Before importing, check for the possibility of inconsistency
    • by comparing the “starting state” virtual timestamp row contained in the packet to the importing replica’s own table row for itself.
    • If any entry in the packet row is larger than the corresponding entry in the replica’s table row, then the importing replica is missing operations that the sending replica expected the receiver to have already imported.
  • All operations are imported in the order of their appearance in the packet.
  • The importing replica’s own table row is updated.
  • An operation with a higher virtual timestamp potentially depends on operations with lower virtual timestamps, which is why the order must be preserved.
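The import check can be sketched in the same style. Again this is a sketch under assumptions: `import_packet` and `MissingOperationsError` are hypothetical names, the packet is assumed to be a dictionary with `start_state` and `operations` keys, and skipping operations that were already imported is an assumption of this example rather than something stated on the slide.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, int, str]   # (originating replica, epoch number, description)
Row = Dict[str, int]        # originating replica -> highest epoch imported


class MissingOperationsError(Exception):
    """The packet assumes operations that this replica has not imported yet."""


def import_packet(log: List[Op], own_row: Row, packet: dict) -> int:
    """Replay a packet in order after checking its starting state; return count imported."""
    missing = {origin: epoch
               for origin, epoch in packet["start_state"].items()
               if epoch > own_row.get(origin, 0)}
    if missing:
        raise MissingOperationsError(f"missing operations up to {missing}")
    imported = 0
    for origin, epoch, description in packet["operations"]:
        if epoch <= own_row.get(origin, 0):
            continue  # assumption: an operation already held is skipped
        log.append((origin, epoch, description))
        own_row[origin] = epoch  # advance our own multiple-part timestamp
        imported += 1
    return imported
```

A receiver that raises the error simply waits for the earlier packet; once that packet has been imported, the later one passes the check and is replayed in order.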
Implementation of MultiSite
• Purging Operations Logs
  • Purging prevents operations logs from growing without bound.
  • An age-based mechanism is used (by default, 180 days).
  • Timestamps are used to prevent potential inconsistency.
• Recovery from Backup
  • Roll-forward approach: the restored replica must re-import any operations that it originated and that other replicas possess before users are permitted to perform new operations on the replica.
  • Recovery algorithm: all replicas reset their virtual timestamp table row for the restored replica to reflect the restored state of that replica.
  • Recovery updates must be imported from all other replicas, because MultiSite cannot determine which replica was last updated before the failure.
Implementation of MultiSite
• Recovery Detail
  • Lock the restored replica immediately.
  • Add a special entry to the operations log.
  • Include the restored replica’s virtual timestamp table row in that entry.
  • Other importing replicas reset their virtual timestamp table row for the restored replica to the value indicated by the entry.
• Re-recoveries
  • A real timestamp (clock time) is added to the special entry to handle the case in which the restored replica is re-restored from backup tape before completing its recovery.
  • Other importing replicas reset their virtual timestamp tables only in response to a special log entry with a real timestamp later than any previously seen.
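The reset rule for recoveries and re-recoveries can be sketched as follows. This is an illustrative sketch, not the MultiSite implementation: `apply_recovery_entry` and its parameters are hypothetical, and real timestamps are modelled as plain floats.

```python
from typing import Dict

Row = Dict[str, int]  # originating replica -> highest epoch number known


def apply_recovery_entry(table: Dict[str, Row], last_recovery: Dict[str, float],
                         restored: str, restored_row: Row,
                         real_timestamp: float) -> bool:
    """Reset our row for `restored` if this special entry is the newest one seen.

    Returns True if the row was reset, False if the entry was stale
    (its real timestamp is not later than one already processed).
    """
    if real_timestamp <= last_recovery.get(restored, float("-inf")):
        return False
    table[restored] = dict(restored_row)        # believe only the restored state
    last_recovery[restored] = real_timestamp    # remember the newest recovery seen
    return True
```

After the reset, the next update generated for the restored replica starts from its restored state, so the operations it lost are sent again, including the ones it originally created.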