The Condor DB Group Report Jiansheng Huang, Ameet Kini, Shrinivas Lakshmikant, Erik Paulson, Christine Reilly, Eric Robinson, Srinath Shankar, David DeWitt, Jeff Naughton
Overview • General overview of group projects (Naughton). • Quill (Paulson).
Condor DB Group • Overall task: • Focus on data management aspects of Condor • Deliver prototypes of useful technology • Explore, develop and evaluate technology that may be useful to Condor down the road.
Projects other than Quill • Provenance in a Condor System. • Statistical mining of log data to evaluate system health. • Interaction of user data placement, caching, and workflow job scheduling. • Job-machine matching in DB context. • Condor functionality based on App-Server technology. • Recency and consistency in captured data.
Provenance and Condor • Christine Reilly (chrisr@cs.wisc.edu). • Provenance: information on how data was produced. • Observation: for each user job, Condor can record: • Which version of program(s) was used; • Which version of data was used; • When it was produced; • What system it ran on (hardware, software). • Questions: • How much information should we gather? • How much burden should we place on the system designer, application programmer, or both?
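As a rough illustration of the four items Condor could record per job, here is a minimal sketch of a provenance record. The `ProvenanceRecord` type, its field names, and the `capture` helper are hypothetical; the sketch fills in "when" and "where" automatically and leaves program and data versions to the caller, which is one possible answer to the burden question above.

```python
import json
import platform
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical per-job provenance entry covering the four items on the slide."""
    job_id: str
    program_version: str                                 # which version of the program
    input_versions: dict = field(default_factory=dict)   # which version of each input file
    produced_at: str = ""                                # when the output was produced (UTC, ISO-8601)
    host: dict = field(default_factory=dict)             # what system it ran on


def capture(job_id, program_version, input_versions):
    """Fill in the 'when' and 'where' fields from the local runtime."""
    return ProvenanceRecord(
        job_id=job_id,
        program_version=program_version,
        input_versions=dict(input_versions),
        produced_at=datetime.now(timezone.utc).isoformat(),
        host={
            "machine": platform.machine(),  # hardware architecture
            "system": platform.system(),    # operating system name
            "release": platform.release(),  # operating system release
        },
    )


rec = capture("17.0", "sim-2.3", {"input.dat": "sha1:0f3a"})  # identifiers are made up
print(json.dumps(asdict(rec), indent=2))
```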
Debugging through log mining • Shrinivas Lakshmikant (pachu@cs.wisc.edu) • Idea: • Record “events,” each logically associated with an entity. • E.g., job entities start, get scheduled, run, terminate. • Find which entities have infrequent events. • Find which entities lack frequent events. • Can you use this to detect problems? • Early results suggest yes: it finds and pinpoints problems that might not be found otherwise. • How can you increase the accuracy and efficiency over naïve approaches?
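The two rules above ("infrequent events" and "missing frequent events") can be sketched as a naïve baseline of the kind the slide asks to improve on. This is an assumption-laden illustration, not the project's method: the support measure (fraction of entities showing an event type) and the `rare_frac`/`common_frac` cutoffs are hypothetical choices.

```python
from collections import Counter, defaultdict


def find_anomalies(events, rare_frac=0.2, common_frac=0.8):
    """Naive anomaly finder over (entity_id, event_type) pairs.

    Returns (rare, missing):
      rare    -> entity: event types seen in fewer than rare_frac of all entities
      missing -> entity: event types seen in at least common_frac of all entities
                 but absent from this entity
    """
    by_entity = defaultdict(set)
    for entity, event_type in events:
        by_entity[entity].add(event_type)

    n = len(by_entity)
    # Support = number of distinct entities that exhibit each event type.
    support = Counter(t for types in by_entity.values() for t in types)
    rare_types = {t for t, c in support.items() if c / n < rare_frac}
    common_types = {t for t, c in support.items() if c / n >= common_frac}

    rare = {e: ts & rare_types for e, ts in by_entity.items() if ts & rare_types}
    missing = {e: common_types - ts for e, ts in by_entity.items() if common_types - ts}
    return rare, missing


# Ten jobs follow the normal lifecycle, except job 7 never terminates
# and job 3 logs an unusual event.
lifecycle = ["submit", "schedule", "run", "terminate"]
events = [(j, t) for j in range(10) for t in lifecycle if not (j == 7 and t == "terminate")]
events.append((3, "shadow_exception"))

rare, missing = find_anomalies(events)
print(rare)     # {3: {'shadow_exception'}}
print(missing)  # {7: {'terminate'}}
```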
Caching, Scheduling, Workflow • Srinath Shankar (srinath@cs.wisc.edu) • Idea: • Cache input files and intermediate files on disks of pool machines; • Record where these files are cached; • Schedule tasks in a workflow to minimize data fetches/moves • Result: potentially much greater throughput.
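A minimal sketch of the three steps, assuming a greedy policy: place each workflow task on the machine that already caches the most of its input bytes, then record that the task's inputs and output now live on that machine. The cache index, file sizes, and greedy rule are illustrative assumptions, not the group's scheduler.

```python
def place_task(task_inputs, cache_index, file_sizes, machines):
    """Return (machine, bytes_to_fetch) minimizing data that must be moved."""
    total = sum(file_sizes[f] for f in task_inputs)
    best, best_fetch = None, None
    for m in machines:
        cached = cache_index.get(m, set())
        fetch = total - sum(file_sizes[f] for f in task_inputs if f in cached)
        if best_fetch is None or fetch < best_fetch:  # first machine wins ties
            best, best_fetch = m, fetch
    return best, best_fetch


def schedule_workflow(tasks, cache_index, file_sizes, machines):
    """tasks: (name, inputs, output, output_size) tuples in dependency order."""
    plan = []
    for name, inputs, output, out_size in tasks:
        m, fetch = place_task(inputs, cache_index, file_sizes, machines)
        plan.append((name, m, fetch))
        # Record where files are now cached: fetched inputs plus the new output.
        cache_index.setdefault(m, set()).update(inputs)
        cache_index[m].add(output)
        file_sizes[output] = out_size
    return plan


machines = ["a", "b"]
file_sizes = {"raw": 100}
cache_index = {"b": {"raw"}}  # "raw" is already cached on machine b
tasks = [("stage1", ["raw"], "mid", 50), ("stage2", ["mid"], "out", 10)]
plan = schedule_workflow(tasks, cache_index, file_sizes, machines)
print(plan)  # [('stage1', 'b', 0), ('stage2', 'b', 0)]
```

Because stage1's intermediate output is cached where it was produced, stage2 follows it there and no bytes move.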
Job Matching in a DBMS • Ameet Kini (akini@cs.wisc.edu) • Idea: matching looks a lot like a DBMS join. • If machine and job data are already stored in a DBMS, can we or should we use the DBMS to do the matching? • Answer: early results are promising but this is a non-trivial problem.
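To make the "matching looks like a join" idea concrete, here is a sketch using SQLite. In Condor proper, matchmaking evaluates ClassAd `Requirements` and `Rank` expressions on both the job and the machine side; this sketch flattens job requirements into fixed columns (hypothetical table and column names), which hides much of what makes the problem non-trivial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE machines (name TEXT PRIMARY KEY, arch TEXT, opsys TEXT, memory_mb INTEGER);
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, req_arch TEXT, req_opsys TEXT, req_memory_mb INTEGER);
""")
conn.executemany("INSERT INTO machines VALUES (?, ?, ?, ?)", [
    ("m1", "X86_64", "LINUX", 2048),
    ("m2", "X86_64", "LINUX", 512),
    ("m3", "INTEL", "WINDOWS", 4096),
])
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", [
    ("1.0", "X86_64", "LINUX", 1024),
    ("2.0", "INTEL", "WINDOWS", 256),
])

# Matching as a join: every (job, machine) pair whose attributes satisfy
# the job's (simplified, one-sided) requirements.
matches = conn.execute("""
    SELECT j.job_id, m.name
    FROM jobs AS j JOIN machines AS m
      ON m.arch = j.req_arch
     AND m.opsys = j.req_opsys
     AND m.memory_mb >= j.req_memory_mb
    ORDER BY j.job_id, m.name
""").fetchall()
print(matches)  # [('1.0', 'm1'), ('2.0', 'm3')]
```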
Recency of Quill Data • Jiansheng Huang (jhuang@cs.wisc.edu) • Problem: daemons report in at uncontrollable and unpredictable times. • Result: an out-of-date and inconsistent data set. • Can we provide the user with a concise characterization of the recency of the sources relevant to a user query? • Note: surprisingly non-trivial to define what we mean by “relevant” in this setting.
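As a baseline for what a "concise characterization" could look like, the sketch below sidesteps the hard part by assuming the set of relevant sources is already known (for example, the schedds a query names explicitly) and reports the stalest and freshest of them. The source names and the summary shape are hypothetical.

```python
from datetime import datetime, timedelta


def recency_summary(last_report, relevant, now):
    """Summarize how stale the sources behind a query are.

    last_report: source name -> datetime of that daemon's most recent report
    relevant:    names of the sources the query depends on (assumed known in advance)
    Returns (stalest_source, oldest_age, newest_age).
    """
    ages = {src: now - last_report[src] for src in relevant}
    stalest = max(ages, key=ages.get)
    return stalest, ages[stalest], min(ages.values())


now = datetime(2007, 5, 1, 12, 0, 0)
last_report = {
    "schedd_a": now - timedelta(seconds=30),
    "schedd_b": now - timedelta(minutes=10),
    "schedd_c": now - timedelta(hours=2),  # stale, but not touched by this query
}
stalest, oldest, newest = recency_summary(last_report, {"schedd_a", "schedd_b"}, now)
print(f"{stalest} is {oldest} behind; freshest relevant source is {newest} behind")
```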
App. Servers and Condor • Eric Robinson (erobinso@cs.wisc.edu) • Idea: application servers provide a lot of technology that appears useful in a Condor setting. • Approach: build a prototype of some Condor functionality using these tools, then evaluate the approach.
Moving on… • Further questions on these projects? Your best bet is to contact the student listed on each slide. • On to the Quill portion of the talk.
The Condor Quill The Quill Developers “Give me a condor's quill! Give me Vesuvius' crater for an inkstand! Friends, hold my arms! For in the mere act of penning my thoughts of this Leviathan, they weary me. . . To produce a mighty book you must choose a mighty theme.” -Melville, Moby-Dick
What is Quill? A non-invasive method of storing a read-only version of the Condor operational data in a relational database.
Quill: In pictures • [Figure, two panels. Without Quill: the SchedD writes its job queue transaction log (job_queue.log) to disk. With Quill: the same SchedD and log, plus a QuillD that reads job_queue.log from disk and loads it into a DBMS.]
Quill: Where we’ve been • First shipped in 6.7.11 (Sept 05) • Now “over the fence” – Condor Team is driving the 6.8 version • Response from users very helpful! • Lessons learned • Passive collection good • DBMSes are full of surprises
Quill: Where we’d like to be • Shared databases • Better job data • Data from non-job sources • More than just PostgreSQL DBMS • Examples of usage
Quill in Condor 6.9.3 • Development effort mostly complete • Goals from the previous slide addressed • Migration path for historical job data • Out-of-the-box changes for Quill users: • Horizontal and vertical schema for active jobs • Jobs from multiple schedds in one database • By default, no new historical data stored
Example tables • [Figures: Horizontal Job Table; Vertical Job Table]
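To show why Quill keeps both a horizontal and a vertical form of job data, here is a small SQLite sketch. The table names, columns, and the `MyCustomAttr` attribute are hypothetical, not Quill's actual schema. The horizontal table has one typed column per common attribute and so cannot hold arbitrary ClassAd attributes; the vertical table stores one row per (job, attribute) pair so that every attribute fits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Horizontal: one row per job, one column per frequently used attribute.
CREATE TABLE jobs_horizontal (
    cluster_id INTEGER, proc_id INTEGER, owner TEXT, job_status INTEGER, cmd TEXT,
    PRIMARY KEY (cluster_id, proc_id));
-- Vertical: one row per (job, attribute) pair, so any attribute fits.
CREATE TABLE jobs_vertical (
    cluster_id INTEGER, proc_id INTEGER, attr TEXT, val TEXT,
    PRIMARY KEY (cluster_id, proc_id, attr));
""")

# JobStatus 2 means "running" in Condor; MyCustomAttr stands in for a user-defined attribute.
job_ad = {"Owner": "alice", "JobStatus": "2", "Cmd": "/bin/sim", "MyCustomAttr": "42"}
cluster, proc = 17, 0

conn.execute("INSERT INTO jobs_horizontal VALUES (?, ?, ?, ?, ?)",
             (cluster, proc, job_ad["Owner"], int(job_ad["JobStatus"]), job_ad["Cmd"]))
conn.executemany("INSERT INTO jobs_vertical VALUES (?, ?, ?, ?)",
                 [(cluster, proc, k, v) for k, v in job_ad.items()])

# Horizontal: a simple predicate on a typed column.
running = conn.execute(
    "SELECT cluster_id, proc_id FROM jobs_horizontal WHERE job_status = 2").fetchall()
# Vertical: arbitrary attributes remain reachable, at the cost of one row per attribute.
custom = conn.execute(
    "SELECT val FROM jobs_vertical WHERE cluster_id = ? AND proc_id = ? AND attr = 'MyCustomAttr'",
    (cluster, proc)).fetchone()
print(running, custom)  # [(17, 0)] ('42',)
```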
More job information • The lifecycle of the job would be nice to have • Events like those in the “user log” • But we need more information than what is in the job queue • Passive data collection works
Quill 6.9.3 diagram • [Figure: the SchedD writes job_queue.log and the new event log to disk; QuillD reads both and loads them into the DBMS.] • The schedd writes events to the new “event” log; the Quill daemon passively picks up the events and inserts them into the database. • For the schedd, the event log contains user log events and job history events.
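Passive collection means the Quill daemon only reads what the schedd has already appended to disk. The sketch below illustrates that idea under stated assumptions: a hypothetical tab-separated line format (not Condor's real event log format), SQLite instead of the production DBMS, and a stored byte offset so each pass picks up only new, complete lines. Events and the new offset are committed in one transaction so a crash cannot insert an event twice.

```python
import os
import sqlite3
import tempfile


def ingest_new_events(log_path, conn):
    """Insert events appended since the last call; return how many were inserted.
    Assumed line format (hypothetical): timestamp<TAB>job<TAB>event_type."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (ts TEXT, job TEXT, event_type TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS ingest_state (path TEXT PRIMARY KEY, byte_offset INTEGER)")
    row = conn.execute("SELECT byte_offset FROM ingest_state WHERE path = ?", (log_path,)).fetchone()
    offset = row[0] if row else 0

    with open(log_path, "rb") as f:  # read-only: the writer is never disturbed
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partially written last line waits for the next pass.
    end = data.rfind(b"\n") + 1
    rows = [tuple(line.split("\t", 2)) for line in data[:end].decode("utf-8").splitlines() if line]

    with conn:  # events and the new offset commit together
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        conn.execute("INSERT OR REPLACE INTO ingest_state VALUES (?, ?)", (log_path, offset + end))
    return len(rows)


conn = sqlite3.connect(":memory:")
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "event.log")
    with open(path, "w") as f:  # stands in for the schedd appending events
        f.write("2007-05-01T12:00:00\t17.0\tsubmit\n")
        f.write("2007-05-01T12:00:05\t17.0\texecute\n2007-05-01T12:0")  # last line incomplete
    first = ingest_new_events(path, conn)
    with open(path, "a") as f:
        f.write("0:09\t17.0\tterminated\n")  # the writer finishes the line
    second = ingest_new_events(path, conn)
print(first, second)  # 2 1
```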
Examples • “Show me all the jobs that exited with a segfault that at some point ran on this machine” • “When my jobs get preempted, how long until they get matched again?” • “What is the average runtime for jobs for each different type of input file?” • SQL “GROUP BY”
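Two of the example questions translate directly into SQL once job data sits in a relational database. The sketch below uses SQLite and a hypothetical two-table schema (`jobs` with an input type, runtime, and exit signal; `runs` with one row per machine a job ran on); Quill's real tables differ. It assumes Linux, where a segfault is signal 11 (SIGSEGV).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, input_type TEXT, runtime_s REAL, exit_signal INTEGER);
CREATE TABLE runs (job_id TEXT, machine TEXT);  -- one row per machine a job ran on
""")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", [
    ("1.0", "fasta", 120.0, None),
    ("1.1", "fasta", 180.0, 11),
    ("2.0", "bam", 600.0, None),
])
conn.executemany("INSERT INTO runs VALUES (?, ?)", [
    ("1.0", "node7"), ("1.1", "node3"), ("1.1", "node7"), ("2.0", "node3"),
])

# "What is the average runtime for jobs for each different type of input file?"
avg_by_type = conn.execute("""
    SELECT input_type, AVG(runtime_s) FROM jobs
    GROUP BY input_type ORDER BY input_type""").fetchall()

# "Show me all the jobs that exited with a segfault that at some point ran on this machine"
segfaults = conn.execute("""
    SELECT DISTINCT j.job_id FROM jobs AS j JOIN runs AS r ON r.job_id = j.job_id
    WHERE j.exit_signal = 11 AND r.machine = ?""", ("node7",)).fetchall()

print(avg_by_type)  # [('bam', 600.0), ('fasta', 150.0)]
print(segfaults)    # [('1.1',)]
```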
Collecting non-job information • [Figure: the SchedD, StartD, and Negotiator write to the new event log on disk; QuillD loads it into the DBMS.]
New information stored • StartD: Machine status • Negotiator: Matches made • Starter/Shadow: Files transferred • Collector: “Submitter” ads • All daemons: Generic Events, daemon ads
The DBMSD • New daemon responsible for database housekeeping • Only one needed per DBMS • Purges old data • Three classes, independent thresholds • Resource: machine ClassAds • Run: matches, job log events • Job: condor_history information • Estimates size of database • “Soft quota”: warn when exceeded
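The housekeeping policy on this slide (three data classes with independent purge thresholds, plus a soft quota that only warns) can be sketched as follows. The retention periods, table names, and the use of SQLite page counts as the size estimate are all assumptions for illustration; the actual DBMSD configuration is not shown here.

```python
import sqlite3
import time
import warnings

# Hypothetical retention thresholds in seconds, one per data class on the slide.
RETENTION = {
    "resource": 7 * 86400,   # machine ClassAds
    "run": 30 * 86400,       # matches, job log events
    "job": 365 * 86400,      # condor_history information
}


def housekeeping(conn, tables, now, soft_quota_bytes):
    """tables: class -> (table, timestamp column); names come from trusted config, not users.
    Purges each class by its own threshold, then warns (without blocking) if over quota."""
    deleted = {}
    with conn:
        for cls, (table, ts_col) in tables.items():
            cutoff = now - RETENTION[cls]
            cur = conn.execute(f"DELETE FROM {table} WHERE {ts_col} < ?", (cutoff,))
            deleted[cls] = cur.rowcount
    # Size estimate: pages allocated times page size.
    size = (conn.execute("PRAGMA page_count").fetchone()[0]
            * conn.execute("PRAGMA page_size").fetchone()[0])
    if size > soft_quota_bytes:
        warnings.warn(f"database is {size} bytes, over soft quota of {soft_quota_bytes}")
    return deleted, size


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE machine_ads (name TEXT, updated REAL);
CREATE TABLE events (job TEXT, ts REAL);
CREATE TABLE history (job TEXT, completed REAL);
""")
now = time.time()
ten_days_ago = now - 10 * 86400
conn.executemany("INSERT INTO machine_ads VALUES (?, ?)", [("m1", ten_days_ago), ("m2", now - 3600)])
conn.execute("INSERT INTO events VALUES (?, ?)", ("1.0", ten_days_ago))
conn.execute("INSERT INTO history VALUES (?, ?)", ("1.0", ten_days_ago))

deleted, size = housekeeping(conn, {
    "resource": ("machine_ads", "updated"),
    "run": ("events", "ts"),
    "job": ("history", "completed"),
}, now, soft_quota_bytes=10**9)
print(deleted)  # {'resource': 1, 'run': 0, 'job': 0}
```

The same ten-day-old timestamp is purged from the resource class but kept in the run and job classes, which is the point of independent thresholds.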
Multiple DBMS systems • Oracle supported • Appears to need less maintenance • A nearly unified schema • Main difference is large text fields • Same binaries, DBMS type selectable via configuration file
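One way to keep "same binaries, nearly unified schema" is to generate the schema from a single definition and vary only the large-text column type by DBMS (`TEXT` in PostgreSQL, `CLOB` in Oracle). In this sketch the configuration key `DB_TYPE`, its values, the simple `KEY = VALUE` parser, and the `job_history` table are hypothetical; consult the Condor manual for the real configuration knobs.

```python
# Large-text column type per DBMS; everything else in the schema is shared.
LARGE_TEXT_TYPE = {"pgsql": "TEXT", "oracle": "CLOB"}


def read_config(text):
    """Parse simple 'KEY = VALUE' lines, ignoring blank lines and '#' comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip().upper()] = value.strip()
    return cfg


def job_history_ddl(db_type):
    """Emit one table definition, varying only the large-text column type."""
    text_type = LARGE_TEXT_TYPE[db_type.lower()]
    return f"CREATE TABLE job_history (cluster_id INTEGER, proc_id INTEGER, env_text {text_type})"


cfg = read_config("# hypothetical Quill settings\nDB_TYPE = ORACLE\n")
print(job_history_ddl(cfg["DB_TYPE"]))
```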
Example Usage • PHP web front end • Good enough for some people • Or, use as the basis for your own system • BoF on Thursday at 11:00am • We’ll use the web front end to explain the information Quill now stores