The OxGrid Resource Broker David Wallom
Overview • OxGrid • Resource Broking • Why build our own • Job Submission and other tools • Future developments
OxGrid, a University Campus Grid • Single entry point for users to shared and dedicated resources • Seamless access to NGS and OSC for registered users
Resource Broking • The original idea of the grid relied on efficient resource broking to abstract the user away from the resources • This has been significantly neglected by grid software developers • Push or pull type of mechanism, each has significant advantages and disadvantages • Resources that have multiple job sources increase complexity manyfold
Why build our own? • OxGrid is intended to be a lightweight development • Replacement of individual components should be simple • Use of service-based interfaces is the goal • Current solutions do not allow this, as they have massive dependencies and non-trivial maintenance requirements • Condor-G is a simple off-the-shelf Grid system meta scheduler, why make it so much more complicated?
Condor Matchmaking • Matchmaking is a methodology for Distributed Resource Management • Conceptually simple: • Service providers and requesters advertise • Compatible advertisements are matched • Matched entities cooperate to perform service • Developed for opportunistic environments • Use resources as and when available Thanks to Miron and the Condor Team
Condor Matchmaking (Cont.) • Customers and Servers advertise to a Matchmaking Service • Advertisements describe advertising entities • Characteristics • Requirements and Constraints • Preferences • These descriptions are called classified advertisements (classads) Thanks to Miron and the Condor Team
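The matchmaking idea on the two slides above can be sketched in a few lines of Python. This is an illustration only, not Condor's implementation: the `Ad` class, `compatible`, and `best_match` are hypothetical names, requirements are plain Python callables rather than ClassAd expressions, and the example attribute values are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Attrs = Dict[str, Any]


@dataclass
class Ad:
    """A toy advertisement: characteristics, requirements, and a preference."""
    attrs: Attrs
    requirements: Callable[[Attrs, Attrs], bool]           # (my, target) -> acceptable?
    rank: Callable[[Attrs, Attrs], float] = lambda my, target: 0.0  # higher is preferred


def compatible(a: Ad, b: Ad) -> bool:
    # A match is symmetric: each side's requirements must accept the other.
    return a.requirements(a.attrs, b.attrs) and b.requirements(b.attrs, a.attrs)


def best_match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    # Keep only mutually compatible machines, then apply the job's preference.
    candidates = [m for m in machines if compatible(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job.rank(job.attrs, m.attrs))


def machine_req(my: Attrs, target: Attrs) -> bool:
    # Hypothetical machine policy: accept grid-universe jobs while under a match limit.
    return my["CurMatches"] < 20 and target.get("JobUniverse") == 9


machines = [
    Ad({"Name": "small", "OpSys": "LINUX", "Memory": 501, "CurMatches": 0}, machine_req),
    Ad({"Name": "big", "OpSys": "LINUX", "Memory": 2048, "CurMatches": 0}, machine_req),
    Ad({"Name": "busy", "OpSys": "LINUX", "Memory": 4096, "CurMatches": 20}, machine_req),
]

job = Ad(
    {"JobUniverse": 9},
    requirements=lambda my, target: target.get("OpSys") == "LINUX",
    rank=lambda my, target: target.get("Memory", 0),
)

chosen = best_match(job, machines)
```

Here the "busy" machine is excluded by its own requirements even though the job would prefer its memory, which is the two-sided property that distinguishes matchmaking from a one-sided queue.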
Static and Dynamic Information • Static information • e.g. processor architecture, physical memory, operating system, scheduling system, no. of nodes • Dynamic information • e.g. system availability, scheduler load, queue length, used disk or memory
OxGrid Virtual Organisation Manager Database • Final repository for authorisation information • Stores additional static information for each resource such as capability and maximum number of submitted jobs for that node
Data Harvesting cycle • Information sources can be added or removed at will • A source is either a single repository for information aggregation (e.g. ngsinfo) or an individual machine • A simple internal representation of information makes it easy to add new types of information source
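One way to picture a harvesting cycle with pluggable sources is sketched below. It assumes, purely for illustration, that every source is a callable returning per-resource dictionaries keyed by a `Name` field; the functions `static_db_source` and `dynamic_probe_source` and their values are hypothetical stand-ins, not the OxGrid code.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Source = Callable[[], Iterable[Record]]


def harvest(sources: List[Source]) -> Dict[str, Record]:
    """Run one cycle: query every source and merge records by resource name."""
    merged: Dict[str, Record] = {}
    for source in sources:
        for record in source():
            name = str(record["Name"])
            # Later sources overwrite earlier ones for the same attribute.
            merged.setdefault(name, {}).update(record)
    return merged


def static_db_source() -> Iterable[Record]:
    # Hypothetical stand-in for static information held in a database.
    return [{"Name": "bedrock", "Arch": "INTEL", "Memory": 501}]


def dynamic_probe_source() -> Iterable[Record]:
    # Hypothetical stand-in for a probe reporting dynamic state.
    return [{"Name": "bedrock", "QueueLength": 3}]


# Adding or removing a source is just editing this list.
sources: List[Source] = [static_db_source, dynamic_probe_source]
snapshot = harvest(sources)
```

Because every source produces the same simple record shape, a new kind of source only needs a function that returns such records.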
Generated classad
MyType = "Machine"
TargetType = "Job"
Name = "bedrock.oucs.ox.ac.uk-condor"
gatekeeper_url = "bedrock.oucs.ox.ac.uk/jobmanager-condor"
Requirements = (CurMatches < 20) && (TARGET.JobUniverse == 9)
WantAdRevaluate = True
UpdateSequenceNumber = 1097580300
CurMatches = 0
OpSys = "LINUX"
Arch = "INTEL"
Memory = 501
MPI = False
INTEL_COMPILER = True
GCC3 = True
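A classad like the one above could be produced from harvested attributes with a small rendering function. The sketch below is an assumption about how that might look, not the OxGrid generator: `classad_value` and `render_machine_ad` are hypothetical names, and only strings, numbers, and booleans are handled.

```python
from typing import Dict, Union

Value = Union[str, int, float, bool]


def classad_value(value: Value) -> str:
    """Format a Python value as a classad literal (strings quoted, booleans bare)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "True" if value else "False"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_machine_ad(attrs: Dict[str, Value], requirements: str) -> str:
    """Render one machine advertisement, one 'Attribute = value' per line."""
    lines = ['MyType = "Machine"', 'TargetType = "Job"']
    lines += [f"{key} = {classad_value(val)}" for key, val in attrs.items()]
    # The requirements expression is passed through verbatim, unquoted.
    lines.append(f"Requirements = {requirements}")
    return "\n".join(lines)


ad_text = render_machine_ad(
    {
        "Name": "bedrock.oucs.ox.ac.uk-condor",
        "gatekeeper_url": "bedrock.oucs.ox.ac.uk/jobmanager-condor",
        "CurMatches": 0,
        "OpSys": "LINUX",
        "Memory": 501,
        "MPI": False,
    },
    "(CurMatches < 20) && (TARGET.JobUniverse == 9)",
)
print(ad_text)
```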
Tuning Condor to act as a metascheduler • The default configuration of Condor is as a cycle scavenger • Alter this by ensuring that the Negotiator attempts to match all available tasks on each pass • Since we are a Condor-G-only system, we change the default universe of the system to grid
Changes to Condor configuration
DEFAULT_UNIVERSE = GLOBUS
CLASSAD_LIFETIME = 900
NEGOTIATE_ALL_JOBS_IN_CLUSTER = True
NEGOTIATOR_INTERVAL = 30
JOB_START_DELAY = 10
GRIDMANAGER_JOB_PROBE_INTERVAL = 60
Job Submission • Most users are comfortable with command-line applications • Condor submission scripts would be another language for our users to learn… • Provide the submission step as a scriptable application with arguments • Created job-submission
job-submission
-h <HOSTNAME>/<JOBMANAGER>
-e <EXECUTABLE>
-t Boolean: transfer the executable?
-a Arguments to the executable
-i Input files to be transferred
-o Output files to be transferred
Job classad
executable = update_file
Transfer_Executable = True
globusscheduler = $$(gatekeeper_url)
Requirements = (TARGET.gatekeeper_url == "t2ce02.physics.ox.ac.uk/jobmanager-lcgpbs" || TARGET.gatekeeper_url == "condor.oucs.ox.ac.uk/jobmanager-condor" || TARGET.gatekeeper_url == "grid-compute.oesc.ox.ac.uk/jobmanager-pbsox" || TARGET.gatekeeper_url == "bedrock.oucs.ox.ac.uk/jobmanager-sge") && TARGET.gatekeeper_url =!= UNDEFINED && TARGET.OpSys == "LINUX"
match_list_length = 1
arguments = TEST_3_2.in TEST_3_2.out
transfer_input_files = TEST_3_2.in
transfer_output_files = TEST_3_2.out
WhenToTransferOutput = ON_EXIT
universe = grid
grid_type = gt2
notification = ERROR
output = temp-1168783341-2.out
error = temp-1168783341-2.err
log = temp-1168783341-2.log
queue
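The step from simple options to a submit description like the one above might be sketched as follows. This is not the actual job-submission tool: `build_submit_description`, its parameters, and the `stem` used for output file names are hypothetical, and only a subset of the attributes shown on the slide is emitted.

```python
from typing import Sequence


def build_submit_description(
    executable: str,
    gatekeepers: Sequence[str],
    arguments: Sequence[str] = (),
    input_files: Sequence[str] = (),
    output_files: Sequence[str] = (),
    transfer_executable: bool = True,
    stem: str = "job",  # hypothetical prefix for the output/error/log files
) -> str:
    """Return submit-description text restricted to the given gatekeepers."""
    if not gatekeepers:
        raise ValueError("at least one gatekeeper is required")
    # One equality test per permitted resource, joined into a disjunction.
    allowed = " || ".join(f'TARGET.gatekeeper_url == "{g}"' for g in gatekeepers)
    lines = [
        f"executable = {executable}",
        f"Transfer_Executable = {transfer_executable}",
        # $$() is filled in from the matched machine ad at match time.
        "globusscheduler = $$(gatekeeper_url)",
        f"Requirements = ({allowed}) && TARGET.gatekeeper_url =!= UNDEFINED",
        f"arguments = {' '.join(arguments)}",
        f"transfer_input_files = {','.join(input_files)}",
        f"transfer_output_files = {','.join(output_files)}",
        "WhenToTransferOutput = ON_EXIT",
        "universe = grid",
        "grid_type = gt2",
        f"output = {stem}.out",
        f"error = {stem}.err",
        f"log = {stem}.log",
        "queue",
    ]
    return "\n".join(lines)


submit_text = build_submit_description(
    "update_file",
    ["bedrock.oucs.ox.ac.uk/jobmanager-sge", "condor.oucs.ox.ac.uk/jobmanager-condor"],
    arguments=["TEST_3_2.in", "TEST_3_2.out"],
    input_files=["TEST_3_2.in"],
    output_files=["TEST_3_2.out"],
)
print(submit_text)
```

Keeping the generation in one function means users only supply the executable, files, and permitted resources, while the broker-specific attributes stay out of their way.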
Additional User Tools • oxgrid_certificate_import • Simplifies the installation of a user digital certificate to a single command • oxgrid_q • Displays the user's current queue at the resource broker, with an option to show the full task queue • oxgrid_status • Displays the resources that are available to the user, with an option to show all resources currently registered with the resource broker • oxgrid_cleanup • Removes either a single submitted process or a range of child processes with their master
Users • Statistics • Materials science • Inorganic chemistry • Theoretical chemistry • Biochemistry • Computational biology • Astrophysics • Condensed matter physics • Zoology • Researchers and students
Future Developments • As part of GridBS project development: • Additional direct submission into MS CCS using GridSAM BLAH • Addition of new types of data sources • EGEE • Grimoires • Continue to improve packaging to ensure ease of installation and re-distribution
Conclusion • We have designed a resource broker that is orders of magnitude smaller, with minimal external dependencies • Simple tools have allowed users of OxGrid easy access to resources in many different institutions • Over 65k individual tasks have been submitted to connected resources since January