Quill Tutorial, Condor Week 2006
What is Quill?
A non-invasive method of storing a read-only version of the job queue and historical job data in a relational database.
Why Do We Need It?
• Presents the job queue information as a set of tables in a relational database (Big Win!)
• Fault tolerance
• Provides performance enhancements in very large and busy pools
Job Queue Management: Without Quill vs. With Quill
[Diagram, two panels. Without Quill: schedd and its job queue. With Quill: schedd, job queue, quilld, and database.]
Deployment
• One Quill daemon per schedd
• Quill daemons must be uniquely named
• Each Quill daemon uses a unique DB name
• Multiple Quill daemons may use one database server
• Currently uses PostgreSQL
• PostgreSQL 8.1 or later is recommended for automatic vacuuming of tables
Condor’s Interface to Quill
• Two tools were modified to use the database:
  • condor_q
  • condor_history
• Very minor modifications to the schedd
• With multiple sources for job queue and history data, each tool has to discover which source to query
Discovery Sequence (Local Query)
[Diagram: condor_q, schedd, quilld, database, and job queue, with steps numbered 1 to 3.]
Discovery Sequence (Remote Query)
[Diagram: condor_q, collector, schedd, quilld, database, and job queue, with steps numbered 0 to 3.]
A User Perspective: condor_q
• condor_q changes
  • -name takes a ScheddName or a QuillName
  • -avgqueuetime reports the average time in queue for uncompleted jobs
A User Perspective: condor_q
Example: condor_q -name

Linux merlin > condor_q -name psilord_quilld@merlin.cs

-- DB: psilord_quilld@merlin.cs : <merlin.cs.wisc.edu:42999> : psilord_db
 ID      OWNER      SUBMITTED     RUN_TIME ST PRI SIZE CMD
 92.0    psilord    4/21 09:21  0+00:00:00 I  0   9.8  foo

1 jobs; 1 idle, 0 running, 0 held
A User Perspective: condor_q
Example: condor_q -avgqueuetime

Linux merlin > condor_q -avgqueuetime

-- DB: psilord_quilld@merlin.cs : <merlin.cs.wisc.edu:42999> : psilord_db
Average time in queue for uncompleted jobs (in hh:mm:ss)
00:40:47.011993
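The -avgqueuetime value is printed as text, so a script that wants to compare or plot it has to convert it first. The following is a minimal sketch, assuming the value always has the plain hh:mm:ss[.ffffff] shape shown in the example; the function name parse_queue_time is hypothetical and not part of Condor.

```python
from datetime import timedelta


def parse_queue_time(text):
    """Convert an 'hh:mm:ss[.ffffff]' string into a timedelta.

    Assumption: the value has exactly three colon-separated fields,
    as in the condor_q -avgqueuetime example output.
    """
    hours, minutes, seconds = text.strip().split(":")
    whole, _, frac = seconds.partition(".")
    # Pad or truncate the fractional part to microsecond precision.
    micro = int((frac + "000000")[:6]) if frac else 0
    return timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(whole),
        microseconds=micro,
    )


avg = parse_queue_time("00:40:47.011993")
print(avg)
```

The fractional seconds are parsed as an integer number of microseconds rather than as a float, which avoids rounding surprises.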
Job History Discovery Sequence (Local Query)
The quilld is never queried directly!
[Diagram: condor_history, quilld, database, job queue, and history file, with steps numbered 1 and 2.]
Job History Discovery (Remote Query) NEW!
The quilld is never queried directly!
[Diagram: condor_history, collector, quilld, database, job queue, and history file, with steps numbered 0 and 1.]
A User Perspective: condor_history
• condor_history changes
  • -name takes a QuillName and retrieves job histories from that remote Quill's database
  • -completedsince returns all jobs completed since the given date, written in PostgreSQL timestamp format (for example "2006-01-01 00:00:01")
A User Perspective: condor_history
Example: condor_history -name

Linux merlin > condor_history -name psilord_quilld@merlin.cs

-- DB: psilord_quilld@merlin.cs : <merlin.cs.wisc.edu:42999> : psilord_db
 ID      OWNER      SUBMITTED     RUN_TIME ST COMPLETED   CMD
 91.0    psilord    4/20 14:23  0+00:00:00 X  ???         /scratch/psilor
 92.0    psilord    4/21 09:21  0+00:00:00 X  ???         /scratch/psilor
 93.0    psilord    4/21 10:12  0+00:00:01 C  4/21 10:12  /scratch/psilor
A User Perspective: condor_history
Example: condor_history -completedsince

Linux merlin > condor_history -completedsince "2006-01-01 00:00:01"

-- DB: psilord_quilld@merlin.cs : <merlin.cs.wisc.edu:42999> : psilord_db
 ID      OWNER      SUBMITTED     RUN_TIME ST COMPLETED   CMD
 93.0    psilord    4/21 10:12  0+00:00:01 C  4/21 10:12  /scratch/psilor
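When condor_history -completedsince is called from a script, the date has to be rendered in the same form as the example above. This is a minimal sketch, assuming the "YYYY-MM-DD HH:MM:SS" form shown on the slide is what the option expects; the helper name completed_since_args is hypothetical.

```python
from datetime import datetime


def completed_since_args(since):
    """Build an argument list for condor_history -completedsince.

    Assumption: the option accepts a timestamp formatted like the
    slide's example, "2006-01-01 00:00:01".
    """
    stamp = since.strftime("%Y-%m-%d %H:%M:%S")
    return ["condor_history", "-completedsince", stamp]


args = completed_since_args(datetime(2006, 1, 1, 0, 0, 1))
print(args)
# To run it on a machine with Condor installed, one could pass the list
# to subprocess.run(args, capture_output=True, text=True).
```

Passing the arguments as a list keeps the embedded space in the timestamp intact without any shell quoting.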
Short Circuiting the Discovery Sequence
• Use the -direct option!
• Examples
  • condor_q -direct rdbms
  • condor_q -direct quilld
  • condor_q -direct schedd
• "rdbms", "quilld", and "schedd" are the literal argument values.
• Invaluable for debugging!
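Because -direct takes one of exactly three literal values, a debugging script can reject typos before it ever calls condor_q. The following is a small sketch; the function name condor_q_direct is hypothetical, and only the option and values shown on the slide are used.

```python
# The three literal values accepted by -direct, as listed on the slide.
DIRECT_TARGETS = ("rdbms", "quilld", "schedd")


def condor_q_direct(target):
    """Return the argument list for 'condor_q -direct <target>'."""
    if target not in DIRECT_TARGETS:
        raise ValueError(
            "unknown -direct target %r; expected one of %s"
            % (target, ", ".join(DIRECT_TARGETS))
        )
    return ["condor_q", "-direct", target]


for target in DIRECT_TARGETS:
    print(" ".join(condor_q_direct(target)))
```

Running the three variants side by side is one way to see which data source gives a different answer when the normal discovery sequence behaves unexpectedly.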
PostgreSQL 8.1 Installation
• ./configure
• gmake && gmake install
• mkdir /path/to/pgsql/data
• initdb -D /path/to/pgsql/data
• postmaster -D /path/to/pgsql/data
• Note: the default port binding is 5432.
PostgreSQL Configuration
• Add two special user accounts: quillreader and quillwriter
  • createuser quillreader --no-createdb --no-adduser --pwprompt
  • createuser quillwriter --createdb --no-adduser --pwprompt
PostgreSQL Configuration (cont)
• Allow TCP/IP connections
  • Edit file postgresql.conf
  • Add listen_addresses = '*'
• Allow connections from specific hosts
  • Edit file pg_hba.conf
  • host all quillreader 128.105.0.0 255.255.0.0 password
  • host all quillwriter 128.105.0.0 255.255.0.0 password
• Note: only use 'password' authentication at this time.
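The two pg_hba.conf entries follow the same pattern, so they can be generated for any network rather than typed by hand. This is a sketch under the assumption that the entries use the address-plus-netmask form shown on the slide; the helper name hba_lines is hypothetical.

```python
import ipaddress


def hba_lines(cidr, users=("quillreader", "quillwriter"), method="password"):
    """Return pg_hba.conf 'host' lines for the Quill accounts.

    cidr is a network in CIDR notation, e.g. "128.105.0.0/16".
    ip_network() raises ValueError if host bits are set, which
    catches a mistyped network address early.
    """
    net = ipaddress.ip_network(cidr)
    return [
        "host all %s %s %s %s" % (user, net.network_address, net.netmask, method)
        for user in users
    ]


for line in hba_lines("128.105.0.0/16"):
    print(line)
```

For the network used on the slide, the generated lines match the two entries listed there.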
Quill Configuration
• User quillwriter needs a write password.
• Store it in a file called .quillwritepassword in the $(SPOOL) directory.
• If Condor is running as root, ensure that only the condor uid can read the file.
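One way to create the password file with restrictive permissions from the start is sketched below. It assumes a Unix system, that spool_dir (a hypothetical parameter) is the directory $(SPOOL) expands to, and that the file holds just the password with no trailing newline, which the slides do not specify. Changing the file's owner to the condor uid is not handled here.

```python
import os
import stat


def write_quill_password(spool_dir, password):
    """Write .quillwritepassword readable and writable by its owner only."""
    path = os.path.join(spool_dir, ".quillwritepassword")
    # Create the file with mode 0600 so it is never world-readable,
    # not even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Enforce the mode even if the file already existed with
        # looser permissions.
        os.fchmod(fd, 0o600)
        os.write(fd, password.encode("utf-8"))
    finally:
        os.close(fd)
    return path


def is_owner_only(path):
    """Return True if neither group nor others have any access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

A call such as write_quill_password("/path/to/spool", "xxx") followed by is_owner_only(...) gives a quick check that the permissions are as intended.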
Quill Configuration (cont)
• Condor system specific attributes in file condor_config.local
  • QUILL = $(SBIN)/condor_quill
  • QUILL_LOG = $(LOG)/QuillLog
  • QUILL_ADDRESS_FILE = $(LOG)/.quill_address
  • DAEMON_LIST = …, QUILL
  • VALID_SPOOL_FILES = …, .quillwritepassword
  • DC_DAEMON_LIST = …, QUILL
Quill Configuration (cont)
• Quill specific attributes
  • QUILL_ENABLED = TRUE
  • # The quill name must be unique across all
  • # quill daemons AND schedds
  • QUILL_NAME = psilord_quilld@merlin.cs
  • QUILL_DB_NAME = psilord_db
  • QUILL_DB_IP_ADDR = merlin.cs.wisc.edu:42999
  • QUILL_POLLING_PERIOD = 10 (seconds)
Quill Configuration (cont)
  • QUILL_HISTORY_CLEANING_INTERVAL = 24 (hours)
  • QUILL_HISTORY_DURATION = 30 (days)
  • QUILL_MANAGE_VACUUM = FALSE
  • QUILL_IS_REMOTELY_QUERYABLE = TRUE
  • QUILL_DB_QUERY_PASSWD = xxx
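The uniqueness rules from the Deployment and Quill Configuration slides (Quill names unique across all Quill daemons and schedds, one database name per Quill daemon) are easy to violate in a pool with many machines. The following is a sketch of a sanity check, assuming the settings have already been collected into plain dictionaries keyed by attribute name; the function names and the data layout are hypothetical and not part of Condor.

```python
from collections import Counter


def find_duplicates(values):
    """Return the sorted values that appear more than once."""
    return sorted(v for v, n in Counter(values).items() if n > 1)


def check_quill_deployment(quill_configs, schedd_names):
    """Report violations of the naming rules for Quill daemons.

    quill_configs: list of dicts with QUILL_NAME and QUILL_DB_NAME keys.
    schedd_names:  names of the schedds in the pool.
    """
    problems = []
    # Quill names must not collide with each other or with schedd names.
    names = [c["QUILL_NAME"] for c in quill_configs] + list(schedd_names)
    for dup in find_duplicates(names):
        problems.append("name used more than once: " + dup)
    # Each Quill daemon needs its own database name.
    db_names = [c["QUILL_DB_NAME"] for c in quill_configs]
    for dup in find_duplicates(db_names):
        problems.append("database name used more than once: " + dup)
    return problems


configs = [
    {"QUILL_NAME": "psilord_quilld@merlin.cs", "QUILL_DB_NAME": "psilord_db"},
    {"QUILL_NAME": "other_quilld@merlin.cs", "QUILL_DB_NAME": "psilord_db"},
]
for problem in check_quill_deployment(configs, ["merlin.cs"]):
    print(problem)
```

In this example the second, made-up daemon reuses the first daemon's database name, so the check reports one problem.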
DB Storage Method
• Schema designed to store and query classads
  • 4 tables to represent the job queue classads
  • 2 for history data
  • 1 for metadata
• Some queries are easier than others
• Ask more questions at the BOF!
Thank you!
• Want more information?
  • BOF “Databases in Condor: Now and in the Future”