Learn how to build and modify Condor, a high-throughput computing system. This talk covers space and UNIX requirements, GNU tools, downloading, and the building process.
Nick LeRoy Computer Sciences Department University of Wisconsin - Madison nleroy@cs.wisc.edu http://www.cs.wisc.edu/condor Building and Modifying Condor
Before I start … • If you have any questions, stop me along the way • There should hopefully be time for discussions after the talk • Feel free to talk to me, or any of the Condor developers, any time during the conference • Todd will give the last part of the talk • Windows specifics
Space Requirements • 5G is probably enough • Actual amount depends on the actual features built • Bare minimum 2G • Temporary space is required for building externals, automatically cleaned up
UNIX Requirements • Most tools are standard on Linux development systems • In other cases, they can be downloaded as binaries • Or, downloaded as source and built by hand
UNIX Requirements List • GNU tools: • GNU make • GNU autoconf and autoheader (2.59 or greater) • GNU tar (1.13 or higher) • GNU Compiler Collection (gcc >= 2.95.3) • gzip • Other tools: • perl (5.005_03 or greater) • patch (must support unified diffs, GNU patch is preferred) • strip (can be either GNU or the vendor's version) • lex • yacc (or GNU bison) • some other typically-found utilities (for example, cut, awk, etc.)
Getting it • Download from the same place that you download the rest of Condor • In the form of a gzip-ed “tarball” • Unpack the tarball • If you don’t know how to do this, try: rm condor_src-7.1.0-all-all.tar.gz
First Glance • BUILD-ID • NMI build ID, you can ignore this • config and imake • Yes, we still use imake • The rest of the world wisely abandoned it years ago … • You can probably ignore these • Adds requirement: GNU cpp <= 4.1.3 • LICENSE-2.0.txt • Copy of the Apache License, Version 2.0 • The license under which we’ve released Condor
Interesting Pieces • README.building • Document describing building Condor • NTconfig • Files required for building under Windows • externals • Externally maintained packages • Some are “hard” requirements, others “soft” • src • The Condor source code
Simple Build • The basic Condor build is simple: $ cd src $ ./build_init $ ./configure $ make
Didn’t work? • Most common problem is that you’re trying to build on a system that we haven’t ported the Standard Universe to • Solution: Disable the standard universe and try again $ ./configure --disable-full-port \ --disable-gcc-version-check $ make
Externals • Always have your bags packed • Bags are getting pretty big these days • Globus, ClassAds, PCRE, zlib, Kerberos • Externals and their versions are selected by configure • To use system packages: $ ./configure --enable-proper • “All or nothing” • Some features (in particular Condor-G) will be disabled • We’re working on making this selective • Externals tree selected by: $ ./configure --with-externals=/path/tree
First look at src • CODING_GUIDELINES • condor_* • Directories with most of the source code • In the future, we’ll rename them and get rid of the condor_ prefix • Also: h • We’ll look at more of these later
Configuring the build • Uses GNU configure • Some options, such as --prefix, don’t work • Make sure that the cpp you use isn’t >= 4.2 $ export CXXCPP=/usr/bin/cpp-4.1 $ ./configure • Default: $ ./configure
Minimal configuration • To save disk space and time, use the --without-xxx or --disable-xxx options for features you don’t care about • Use ./configure --help to get a list of them • Packages listed as “hard requirement” can’t be turned off • There are some interdependencies $ ./configure --without-globus --without-nordugridgahp --without-unicoregahp --without-gt4gahp --without-srb --without-oci --without-gcb --without-gsoap --without-drmaa --without-gahp --without-blahp --disable-full-port
Some Problems & Solutions: Unknown GCC version configure: error: Condor will not compile with gcc version 4.2.1 • Try: $ ./configure --disable-gcc-version-check • The build itself may fail due to compiler incompatibilities
Some Problems & Solutions: Unknown glibc version checking glibc... ERROR configure: error: Condor does NOT know what glibc external to use with glibc-2.6.1 • Edit (yeah, with vi or emacs) configure.ac • Around line 2500, add a block for your glibc version (cut & paste from nearby): "2.6.1" ) # OpenSUSE 10.3 uses glibc 2.6.1 including_glibc_ext=NO ;; • Rerun ./build_init for this to take effect
Build it • From the src directory: $ make • Will build the externals as required • Go get a beverage – this could take quite a while
Build Problems & Solutions: Error in ClassAds external classads-1.0rc5: FAILED! (see /home/condor-7.1.0/externals/build/log.classads-1.0rc5) • Disable ClassAds in configure: $ ./configure --without-classads • condor_q -better-analyze will be broken
Build Problems & Solutions: Error building other externals xxxx-1.2.3: FAILED! (see /home/condor-7.1.0/externals/build/log.xxxx-1.2.3) • Disable xxxx in configure: $ ./configure --without-xxxx • If this is a “hard requirement” or you rely on this feature: • Look in the above log and correct the problem
Build Problems & Solutions: Standard Universe /tmp/IIf.0twp5X:114:6: error: #error Checkpoint library not compatible with compiler! ../../imake/imake: Exit code 1. Stop. • Standard Universe features haven’t been ported to this compiler / platform yet. $ ./configure --disable-full-port
It built! make[1]: Nothing to be done for `all'. make[1]: Leaving directory `/home/build/condor-7.1.0/src/condor_examples' $ make release …
Build targets • Testing release • $ make release • Suitable for testing • Creates release_dir • Public release • What we actually release to the public • $ make public • Packaged tarballs wind up in ../public
Test It • We’ll create a test installation of our Condor build • We built condor in /home/condor-7.1.0 • We’ll make our test directory a subdirectory of that • /home/condor-7.1.0/install • Do a basic Condor install of the Condor from release_dir, just like you would any other Condor install • Or …
Test Installation (Step by step) $ CONDOR=/home/condor-7.1.0/install $ mkdir $CONDOR $ cd $CONDOR $ mkdir checkpoints cred_dir execute spool log test $ ln -s ../release_dir/* . $ cp etc/examples/condor_config.generic etc/condor_config $ export CONDOR_CONFIG=$CONDOR/etc/condor_config $ vi $CONDOR_CONFIG $ export PATH=$CONDOR/bin:$CONDOR/sbin:$PATH $ rehash $ condor_master
Simple checks • Run ‘ps’, verify that the Condor processes are running • Run condor_status -any • Run condor_status to verify that the Startd’s machine is correct • Make sure that you wait a bit for the Startd to publish its ad(s) • Look through the logs • Submit a simple “hello world” test job, verify that it runs as expected
More tests • We have a whole suite of tests $ cd condor_tests $ make $ ./batch_test.pl -b IsThisNightly passed <…/src/condor_tests> Workspace testing … submitting . tests lib_chirpio_van.run succeeded lib_procapi_pidtracking-snapshot.run succeeded … • Wait patiently (very patiently)
Use the source, Luke • Libraries • Daemon Core • Client (command line) Tools • Daemons • Standard Universe • Other
Source Directories • Most of the directory names are pretty clear • We’re in the process of cleaning up, moving things around, and renaming, so be prepared for changes over time • GIT is finally giving us this freedom • Quite a few have version numbers in the name that make little or no sense to the outside world (condor_startd.V6, …) • This will get cleaned up, too
Layering • Master, Quill, Startd, Shadow, Starter, Collector • Submit, Q, tools, etc. • ClassAds, I/O, Daemon Client, Daemon Core, ProcAPI, SysAPI • C++ Utilities, C Utilities • “h”, includes
Condor Libraries • The layering is not perfect, there are interdependencies • General purpose: • condor_util_lib • condor_c++_util • I/O & Networking: • condor_io • condor_daemon_client • Process Tracking: • condor_procapi • System Information: • condor_sysapi • ClassAds: • condor_classad • Daemon Core • condor_daemon_core.V6
C / C++ Utilities • In general, there’s a utility for everything • POSIX and stdio library wrappers • C++ Standard library replacements • Condor templates (CTL) • We don’t use STL for hysterical reasons • Designed to be portable • Look here before reinventing the wheel
C: dprintf() • Works like printf() • Conditionally writes to the log dprintf(D_ALWAYS, "Two + two is %d\n", 2+2); • OR together for multiple levels, so dprintf(D_COMMAND|D_SECURITY, <…>); • Useful debug levels • D_ALWAYS • D_FULLDEBUG • Everything else is probably too esoteric (see condor_debug.h)
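The level masking described above can be sketched as follows. This is a minimal illustration, not Condor's implementation: `sketch_dprintf`, `sketch_enabled`, `sketch_last_line`, and the `SK_*` bit values are all hypothetical names chosen for this example, and the rule that a message is written when any one of its OR-ed levels is enabled is an assumption.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical level bits for this sketch; Condor's real values live in condor_debug.h.
enum SketchLevel : unsigned {
    SK_ALWAYS    = 1u << 0,
    SK_FULLDEBUG = 1u << 1,
    SK_COMMAND   = 1u << 2,
    SK_SECURITY  = 1u << 3,
};

// Levels currently enabled; SK_ALWAYS can never be switched off in this sketch.
static unsigned sketch_enabled = SK_ALWAYS;

// Most recent line written, kept so callers can inspect it.
static std::string sketch_last_line;

// printf-style logger: writes only if at least one requested level is enabled.
// Returns true if the message was written.
bool sketch_dprintf(unsigned levels, const char *fmt, ...) {
    if ((levels & (sketch_enabled | SK_ALWAYS)) == 0) {
        return false;  // none of the requested levels is turned on
    }
    char buf[1024];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    sketch_last_line = buf;
    std::fputs(buf, stderr);
    return true;
}
```

With only `SK_ALWAYS` enabled, a call tagged `SK_COMMAND | SK_SECURITY` is dropped; after `sketch_enabled |= SK_SECURITY` the same call is written.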
C++: MyString.h • Similar to STL’s string • Prefer MyString buffer to char buffer[1024] • automatically allocates and resizes memory • Notable methods / operators: • sprintf() and sprintf_cat() • Value() and GetCStr() – read-only access • += is overloaded to append a lot of types to the string • perl-like chomp() and trim() to get rid of whitespace • readLine() that can slurp in data from a FILE* and ostreams • replacement for strtok() • Other tricks • search for substrings • escape characters
C++: Configuration • Lookup values from the configuration • NOT a ClassAd! • Basic: param(const char *name) • Returns a char * that you must decode manually • You MUST free() this buffer! • Others: param_<type>(<name>) • Decodes to the specified type, and free()’s the buffer • Does NOT handle expressions! • Integer: param_integer(<name>) • Double: param_double(<name>) • Boolean: param_boolean(<name>)
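The ownership rule above (the raw lookup hands you a buffer you must free(), the typed helpers free it for you) can be sketched like this. Everything here is a stand-in: `sketch_config`, `sketch_param`, and `sketch_param_integer` are hypothetical names, and the default-value argument is an assumption made for the example rather than a statement about Condor's real signatures.

```cpp
#include <cstdlib>
#include <cstring>
#include <map>
#include <string>

// Stand-in configuration table for this sketch (not Condor's real config store).
static std::map<std::string, std::string> sketch_config = {
    {"MAX_JOBS", "25"},
    {"LOG_DIR", "/tmp/log"},
};

// Returns a malloc()'d copy of the raw value, or nullptr if the name is unset.
// The caller owns the buffer and must free() it.
char *sketch_param(const char *name) {
    auto it = sketch_config.find(name);
    if (it == sketch_config.end()) {
        return nullptr;
    }
    char *copy = static_cast<char *>(std::malloc(it->second.size() + 1));
    if (copy != nullptr) {
        std::memcpy(copy, it->second.c_str(), it->second.size() + 1);
    }
    return copy;
}

// Typed helper: decodes the value as an integer and frees the buffer itself.
// Falls back to default_value when the name is unset or not a plain integer.
int sketch_param_integer(const char *name, int default_value) {
    char *raw = sketch_param(name);
    if (raw == nullptr) {
        return default_value;
    }
    char *end = nullptr;
    long value = std::strtol(raw, &end, 10);
    bool ok = (end != raw) && (*end == '\0');
    std::free(raw);  // the helper, not the caller, releases the buffer
    return ok ? static_cast<int>(value) : default_value;
}
```

The typed helper exists so that the common case never touches the raw buffer at all, which removes the most likely source of leaks.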
C++: Boolean Configuration Expressions • Boolean Expression: param_boolean_expr(<name>) • This one does handle expressions • Configuration: WIZBANG = ( FUBAR > 10 || SUPERCALIFRAGILISIC ) • Source Code: bool wizbang = param_boolean_expr( "WIZBANG" );
More C & C++ • Wrappers and similar: • safe_open_wrapper(), my_popen() • “CTL” • ExtArray, string_list, Queue, tree, stringSpace, counted_ptr • A lot of other classes & functions • File / Directory access classes: Directory, StatInfo • exponential_backoff • my_hostname(), my_username()
Condor I/O & Networking • All Condor daemons have a “Command Socket” • Data is encoded with CEDAR • Condor External DAta Representation • CEDAR is all-singing, all-dancing • Data representation • socket abstraction • Security • bandwidth limiting • port ranges
Stream, Sock, et al. • The layering of the Condor socket objects is not obvious • Stream (base class, in stream.{h,C} ) • CEDAR streaming • Integers, chars, strings, etc. • Sock (derived from Stream, in sock.{h,C} ) • Adds connection / session management • ReliSock (derived from Sock, in reli_sock.{h,C} ) • TCP-specific “Sock” • SafeSock (derived from Sock, in safe_sock.{h,C} ) • UDP-specific “Sock”
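The three layers above can be pictured with a small class hierarchy. This is only a sketch of the shape of the inheritance: the `Sketch*` class names and their methods are hypothetical, and the big-endian byte order is an arbitrary choice for the example, not a description of CEDAR's wire format.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Base class: knows how to encode and decode values, nothing about transport.
class SketchStream {
public:
    virtual ~SketchStream() = default;

    // Append a 32-bit integer to the buffer, most significant byte first.
    void put(std::uint32_t value) {
        for (int shift = 24; shift >= 0; shift -= 8) {
            buffer_.push_back(static_cast<unsigned char>((value >> shift) & 0xFF));
        }
    }

    // Read the next 32-bit integer back; returns false if fewer than 4 bytes remain.
    bool get(std::uint32_t &value) {
        if (buffer_.size() - read_pos_ < 4) return false;
        value = 0;
        for (int i = 0; i < 4; ++i) {
            value = (value << 8) | buffer_[read_pos_++];
        }
        return true;
    }

private:
    std::vector<unsigned char> buffer_;
    std::size_t read_pos_ = 0;
};

// Middle layer: adds connection state on top of the encoding layer.
class SketchSock : public SketchStream {
public:
    void mark_connected() { connected_ = true; }
    bool is_connected() const { return connected_; }
    // Each concrete socket type names its transport.
    virtual const char *transport() const = 0;

private:
    bool connected_ = false;
};

// TCP-flavoured socket.
class SketchReliSock : public SketchSock {
public:
    const char *transport() const override { return "TCP"; }
};

// UDP-flavoured socket.
class SketchSafeSock : public SketchSock {
public:
    const char *transport() const override { return "UDP"; }
};
```

Code that only needs to serialize values can take a `SketchStream &`, while code that cares about connections takes a `SketchSock &`; that split is the point of the layering.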
Daemon Client • Series of classes with knowledge of how to communicate with specific daemons • Master, Collector, Startd, etc. • All derived from a common base
ClassAds • C++ API to access the ClassAds that Condor uses internally • “Old” ClassAds • Subclassed from AttrList, so look there • Lookup() versus Eval() • Lookup() will return “7 + 2” • Eval() will return 9 • ClassAds are parsed to ExprTree(s) • Can generally avoid this and use Eval<Type> • Insert() and Assign() to update the ad • sPrint(), fPrint(), and dPrint() to serialize
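The Lookup() versus Eval() distinction above can be illustrated with a toy ad. This is a sketch only: `SketchAd` and its methods are hypothetical, and the evaluator understands nothing beyond integers joined by `+`, which is far less than a real ClassAd expression.

```cpp
#include <map>
#include <sstream>
#include <string>

// Toy ad for this sketch: attribute name -> unevaluated expression text.
class SketchAd {
public:
    // Store (or overwrite) an attribute's expression text.
    void insert(const std::string &attr, const std::string &expr) {
        exprs_[attr] = expr;
    }

    // Lookup: hand back the stored expression text, unevaluated.
    bool lookup(const std::string &attr, std::string &expr_out) const {
        auto it = exprs_.find(attr);
        if (it == exprs_.end()) return false;
        expr_out = it->second;
        return true;
    }

    // Eval: compute the expression's value.
    // This toy only understands integers joined by '+'.
    bool eval_integer(const std::string &attr, long &value_out) const {
        std::string expr;
        if (!lookup(attr, expr)) return false;
        std::istringstream in(expr);
        long total = 0;
        long term = 0;
        if (!(in >> total)) return false;
        char op = 0;
        while (in >> op) {
            if (op != '+' || !(in >> term)) return false;
            total += term;
        }
        value_out = total;
        return true;
    }

private:
    std::map<std::string, std::string> exprs_;
};
```

For an attribute stored as `7 + 2`, `lookup` yields the text `7 + 2` and `eval_integer` yields 9, mirroring the slide's example.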
Condor Daemons • The code for most Condor daemons is in directories named after the daemon: • Startd is in condor_startd.V6 … • Note: 2 sets of starters / shadows • condor_starter.V5 and condor_shadow.V6 • Standard Universe • condor_{starter,shadow}.V6.1 • All others
Daemon Core • Heart and body of a Condor daemon • Usually a singleton object • Event-driven loop around select() • Single threaded! • Your code registers events for select() and callbacks • Timers, Pipes, Signals, Reaper, Socket, CEDAR “Commands”
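A single-threaded, event-driven loop around select() with registered callbacks looks roughly like the sketch below. It is not Daemon Core's API: `SketchReactor`, `register_fd`, and `run_once` are hypothetical names, only readable file descriptors are handled (no timers, signals, or reapers), and handlers are assumed not to register new descriptors while they run.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <functional>
#include <map>
#include <utility>

// Minimal single-threaded reactor for this sketch (not Condor's DaemonCore API).
class SketchReactor {
public:
    using Handler = std::function<void(int fd)>;

    // Register a callback to run when fd becomes readable.
    void register_fd(int fd, Handler handler) {
        handlers_[fd] = std::move(handler);
    }

    // Run one pass of the loop: wait up to timeout_ms for readable fds,
    // then call each ready handler in turn. Returns the number of handlers
    // run, or -1 if select() fails.
    int run_once(int timeout_ms) {
        fd_set readable;
        FD_ZERO(&readable);
        int max_fd = -1;
        for (const auto &entry : handlers_) {
            FD_SET(entry.first, &readable);
            if (entry.first > max_fd) max_fd = entry.first;
        }
        timeval tv;
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        int ready = select(max_fd + 1, &readable, nullptr, nullptr, &tv);
        if (ready < 0) return -1;
        int called = 0;
        for (const auto &entry : handlers_) {
            if (FD_ISSET(entry.first, &readable)) {
                entry.second(entry.first);  // handlers run one at a time
                ++called;
            }
        }
        return called;
    }

private:
    std::map<int, Handler> handlers_;
};
```

Because every handler runs on the same thread, a handler that blocks stalls the whole daemon; that is the practical consequence of "Single threaded!" on the slide.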
Registering a Callback • Use Daemon Core’s Register_Command() method: daemonCore->Register_Command(128, "SAY_HELLO", (CommandHandler)&say_hello, "say_hello", NULL, READ, D_FULLDEBUG ); • Parameters: • The command number (usually defined in condor_commands.h and condor_commands.C) • Text description of the command • "CommandHandler", which is really a function pointer • Text description of the handler • The service class to use -- since this is a C handler, we don't need one. • The permission level required to call this function (e.g. HOSTALLOW_READ, HOSTALLOW_ADMINISTRATOR, etc.) • What dprintf() level to use
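Conceptually, registering a command amounts to adding a row to a dispatch table keyed by command number. The sketch below shows that idea under stated assumptions: `SketchCommandTable`, `SketchCommand`, `SketchPerm`, and `sketch_say_hello` are hypothetical, the service-class and dprintf-level parameters are omitted, and treating permissions as a simple ordered scale is a simplification made for the example.

```cpp
#include <map>
#include <string>

// Hypothetical permission levels for this sketch, ordered weakest to strongest.
enum SketchPerm { SK_READ = 0, SK_WRITE = 1, SK_ADMINISTRATOR = 2 };

// A plain C-style handler: receives the command number, returns an int status.
typedef int (*SketchHandler)(int command);

struct SketchCommand {
    std::string name;          // text description of the command
    SketchHandler handler;     // function pointer to call
    std::string handler_name;  // text description of the handler
    SketchPerm required;       // permission the caller must hold
};

class SketchCommandTable {
public:
    // Returns false if the command number is already taken.
    bool register_command(int number, const SketchCommand &cmd) {
        return table_.insert({number, cmd}).second;
    }

    // Returns the handler's result, or -1 if the command is unknown
    // or the caller's permission is too low.
    int dispatch(int number, SketchPerm caller) const {
        auto it = table_.find(number);
        if (it == table_.end() || caller < it->second.required) return -1;
        return it->second.handler(number);
    }

private:
    std::map<int, SketchCommand> table_;
};

// Example handler for a command numbered 128, as in the SAY_HELLO registration.
int sketch_say_hello(int command) {
    return command == 128 ? 1 : 0;
}
```

Keeping the text descriptions next to the function pointer is what lets a daemon log which command and handler ran without any reflection support from the language.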
Some guidelines • You must not • Throw an exception • Call printf() or exit() or assert() • You can: • call ASSERT() • call dprintf()
Dependency Hell • Dependencies work on Windows • Our build system has no knowledge of dependencies • If you modify an include file, make sure that everything that depends on it gets rebuilt • $ make clean && make
More on Dependencies • Objects from some directories need to get “repackaged” with the C++ library • condor_classads • condor_daemon_client • Thus, to rebuild these: • $ make && make -C ../condor_c++_util
(Even) More on Dependencies • If you’re working on a daemon and make a library change • Example daemon: Startd in the condor_startd.V6 directory • Example library: condor_daemon_client $ make -C ../condor_daemon_client && make -C ../condor_c++_util && make release • If you modified dc_startd.h and want to be paranoid: $ (cd ../condor_daemon_client && make clean && make) $ (cd ../condor_c++_util && make clean && make) $ make clean && make release
Adding a Source File • Add the file to the appropriate section of the Imakefile • No, I’m not going to explain our Imakefile syntax here $ ../condor_imake $ make
Testing & Debugging • OK, you’ve built a modified Startd; how do you test / debug it? • Remove STARTD from DAEMON_LIST • Start the master • Run the startd by hand $ ./condor_startd -t -f • -t to log to stdout • -f to run it in the foreground • CTRL-C to kill it
More debugging • Segfaults can sometimes be caused by object version mismatches • You added a field to a class in C++ Util, but didn’t rebuild the Startd that uses the class • With the -t and -f flags, you can debug like any other program • Adding dprintf()’s • With gdb • Using strace