
LCGAA nightlies infrastructure






  1. LCGAA nightlies infrastructure Alex Hodgkins

  2. Our nightlies • Build and test various software projects each night • Provide a nightlies summary page that displays all the results from the previous night's build (number of build/test errors) and historical data • Place all of the files onto AFS and make the log files accessible from the nightlies page

  3. Infrastructure Tasks • We use CMT to manage the checkout, build, install and test of each software project, as well as any dependencies. • The infrastructure is responsible for: • Automating the usage of CMT to perform all build steps • Putting results into our database • Moving builds to AFS (for Mac and Windows) • Deleting old build data
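Automating the CMT steps amounts to running each command in sequence and stopping at the first failure. A minimal sketch in Python is below; the step names follow the slide, but the actual CMT command lines are placeholders (`echo` stand-ins so the sketch runs anywhere), not the production configuration:

```python
import subprocess

# Hypothetical step list: the real infrastructure invokes CMT here;
# `echo` placeholders keep the sketch self-contained and runnable.
BUILD_STEPS = [
    ("checkout", ["echo", "cmt checkout"]),
    ("build",    ["echo", "cmt make"]),
    ("install",  ["echo", "cmt make install"]),
    ("test",     ["echo", "cmt run-tests"]),
]

def run_build(steps=BUILD_STEPS):
    """Run each build step in order; stop at the first failure.

    Returns a dict mapping step name -> exit code for every step
    that was attempted.
    """
    results = {}
    for name, cmd in steps:
        rc = subprocess.call(cmd, stdout=subprocess.DEVNULL)
        results[name] = rc
        if rc != 0:  # a non-zero exit code marks the step as failed
            break
    return results
```

Each step's exit code is what would later be pushed into the database and used to mark a project as failed.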

  4. Problems with previous infrastructure • Queue • No way of viewing the queue • No way to add or remove a platform from the running queue – the entire queue must be reset • Client • No automated way to kill a started build • If a platform is manually re-built on the same day, it completely overwrites the previous build • Builds occasionally hang, and stay hung silently until they are manually killed • Scheduling • Clients must be launched manually, or the crontab edited to schedule a client to run later • The moving of builds to AFS is scheduled by crontab, rather than triggered when a build finishes • Reporting • Once a build has been requested from the server it is marked as completed – the server doesn't know who is building it or what stage it is at • The infrastructure sends up to 20 e-mails per project per night

  5. New Infrastructure

  6. General client-server improvements • The client-server interaction has been re-designed to be much more flexible: • The server now distributes builds to any idle clients • Clients update the server with their progress • All client calls to the server run in a separate thread, so the build won't be halted • A job can be assigned to a specific client • Once a project has finished, the synchronisation job is called on the server to copy Mac/Windows builds to AFS
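The "calls run in a separate thread" point can be sketched as a fire-and-forget wrapper: the build thread hands the server call to a daemon thread and carries on immediately. In production the callable would be a method on an `xmlrpc.client.ServerProxy`; here it is any callable, so the sketch stays self-contained:

```python
import threading

def report_to_server(call, *args):
    """Run a server call in a background daemon thread so a slow or
    unreachable server never blocks the build.

    `call` stands in for an XMLRPC proxy method (an assumption for
    this sketch); any callable works.
    """
    t = threading.Thread(target=call, args=args, daemon=True)
    t.start()
    return t

# Illustrative use: record a progress update without blocking
received = []
t = report_to_server(received.append, ("LCGCMT", "build"))
t.join()  # joined here only so the example is deterministic
```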

  7. Client-server registration • The protocol we use for client-server interaction (XMLRPC) is stateless, so extra steps were required to keep track of connected clients: • The server maintains a list of clients that have connected • Each client can be in one of three states: connected, unknown or disconnected • The server must make sure any 'connected' clients are still reachable, by frequently checking that they respond to requests
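The three-state tracking above can be sketched as a registry that demotes a client the longer it goes without answering a ping. Class, method, and host names here are illustrative, not the production API; the thresholds are assumptions:

```python
import time

CONNECTED, UNKNOWN, DISCONNECTED = "connected", "unknown", "disconnected"

class ClientRegistry:
    """Server-side tracking of clients over a stateless protocol."""

    def __init__(self, timeout=60):
        self.timeout = timeout
        self.clients = {}  # hostname -> (state, last successful contact)

    def register(self, host):
        """Called when a client first connects."""
        self.clients[host] = (CONNECTED, time.time())

    def heartbeat(self, host):
        """Called whenever a client responds to a ping."""
        self.clients[host] = (CONNECTED, time.time())

    def check(self, host, now=None):
        """Demote silent clients: connected -> unknown -> disconnected."""
        now = time.time() if now is None else now
        state, last_seen = self.clients[host]
        if now - last_seen > 2 * self.timeout:
            state = DISCONNECTED
        elif now - last_seen > self.timeout:
            state = UNKNOWN
        self.clients[host] = (state, last_seen)
        return state
```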

  8. Server design • The server is split into three separate threads: • The pinger – responsible for pinging all the connected clients to check they are still responding • The listener – responsible for managing all XMLRPC requests, but can only process one request at a time; for requests that will not return instantly (e.g. a request to copy a build to AFS) a new thread is created • The dispatcher – responsible for continually checking the database for new build requests and distributing them to idle clients
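A structural sketch of the pinger and dispatcher threads is below. The listener (an XMLRPC server in production) is reduced to a comment, and the database queue is stood in for by an in-process `queue.Queue`; all names are illustrative:

```python
import queue
import threading
import time

class NightliesServer:
    """Sketch of the server's thread structure (not the real API)."""

    def __init__(self):
        self.requests = queue.Queue()      # stands in for the DB job queue
        self.idle_clients = queue.Queue()  # clients reporting themselves idle
        self.assigned = []                 # (job, client) pairs handed out
        self.running = True
        # The listener thread (an XMLRPC server handling one request at
        # a time, spawning threads for slow calls) is omitted here.

    def pinger(self):
        """Periodically ping connected clients (no-op in this sketch)."""
        while self.running:
            time.sleep(0.01)

    def dispatcher(self):
        """Poll for build requests and hand each to an idle client."""
        while self.running:
            try:
                job = self.requests.get(timeout=0.05)
            except queue.Empty:
                continue
            client = self.idle_clients.get()  # block until a client is free
            self.assigned.append((job, client))

    def start(self):
        for target in (self.pinger, self.dispatcher):
            threading.Thread(target=target, daemon=True).start()
```

Keeping the dispatcher on a polling loop mirrors the real design, where the queue lives in the database rather than in server memory.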

  9. Client design • The client also has an XMLRPC server that is primarily used to query the client's capabilities and start a build on it • On start-up the client decides how many projects it can build at once (based on the number of cores), and allocates the necessary number of builder slots • When a build request is received the client starts a new build (the SlotBuilder class) in a separate thread, and stores the thread instance • During the build the client tells the server which project and which build step it is currently on • If the client decides a step has failed (e.g. checkout returned non-zero) it notifies the server, and both the specific project and the job are marked as failed
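The slot allocation and per-build thread can be sketched as follows. `SlotBuilder` is named in the slide; everything else (method names, the step list, the project name in the usage line) is an assumption for illustration:

```python
import multiprocessing
import threading

class SlotBuilder(threading.Thread):
    """Illustrative stand-in for the client's per-slot build thread.

    `report_step` represents the progress calls made back to the
    server during the build.
    """
    def __init__(self, project, report_step):
        super().__init__()
        self.project = project
        self.report_step = report_step

    def run(self):
        for step in ("checkout", "build", "install", "test"):
            self.report_step(self.project, step)

class Client:
    def __init__(self, slots=None):
        # One builder slot per core, decided at start-up
        self.slots = slots or multiprocessing.cpu_count()
        self.builders = []   # stored thread instances
        self.progress = []   # (project, step) updates sent to the server

    def start_build(self, project):
        """Start a build in a new thread if a slot is free."""
        if len([b for b in self.builders if b.is_alive()]) >= self.slots:
            return False  # all slots busy
        builder = SlotBuilder(project,
                              lambda p, s: self.progress.append((p, s)))
        self.builders.append(builder)  # keep the thread instance
        builder.start()
        return True
```

Storing the thread instance is what would later make it possible to kill or inspect a running build (one of the remaining tasks).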

  10. Database changes • The LCGSOFT and nightlies databases have been merged to allow releases to be managed through the same interface • All configuration is now stored in the database, removing the redundancy of the configuration files • The job queue is now kept in the database, giving a decentralised queue that can be viewed from anywhere and does not rely on our server instance • Cancelled slot configurations are kept permanently and remain linked to specific jobs, so we can easily see what was built on any given date, as well as the machine it was built on • The server and request interface both now use Django for database interaction, which gives much more compact, flexible and readable code (47 lines vs. 275 for sending results)
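The database-backed queue means any server instance (or none at all) can claim the next job. The sketch below uses stdlib `sqlite3` so it is self-contained; the production code goes through the Django ORM instead, and the table and column names here are illustrative:

```python
import sqlite3

# In-memory stand-in for the central nightlies database
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE job_queue (
                  id INTEGER PRIMARY KEY,
                  slot TEXT,
                  platform TEXT,
                  status TEXT DEFAULT 'queued')""")

def enqueue(slot, platform):
    """Add a build request to the shared queue."""
    db.execute("INSERT INTO job_queue (slot, platform) VALUES (?, ?)",
               (slot, platform))

def claim_next():
    """Claim the oldest queued job, or return None if the queue is empty."""
    row = db.execute(
        "SELECT id, slot, platform FROM job_queue "
        "WHERE status = 'queued' ORDER BY id LIMIT 1").fetchone()
    if row:
        db.execute("UPDATE job_queue SET status = 'building' WHERE id = ?",
                   (row[0],))
    return row

enqueue("dev", "x86_64-slc6-gcc48-opt")
```

Because the queue is just rows with a status column, it can be viewed or edited from any interface with database access, independent of the server process.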

  11. Code cleanup • The original nightlies scripts have been edited by many people since they were first written, leaving a lot of redundant code and many hacked-in changes, which made it difficult to integrate any major features: • Redundant code has been removed, and the majority of the remaining code re-written; comments have also been added • Documentation has been created to accompany the nightlies scripts, explaining how they work • Clearer logging has been added throughout • A unit test suite and integration tests have been added

  12. Remaining tasks • Everything I have implemented so far has been fully tested and is ready for production • For now the new scripts place all build data into both databases, so they can be slowly phased in without disrupting any end users • Three remaining tasks are essential before the new scripts can be placed fully into production: • Incremental builds and non-nightly builds must be added first, or there would be no use for a request interface • A request interface to allow jobs to be added and job configurations to be edited • A new summary page (re-written using Django) so that all the results can be seen quickly and easily

  13. Remaining jobs (contd.) • We also hope to eventually have the following: • Ability to kill a running build • An automated release process – releases should be done automatically by marking an existing build in the request interface to be released • Allow the building of externals through the request interface • Ability to shut a client down from the server
