The Condor Data Access Framework • GridFTP / NeST Day • 31 July 2001 • Douglas Thain
The Condor Data Access Framework • Philosophy • Components • Organization: Communities • Resource Discovery with ClassAds • Example Applications • Ongoing Work
Philosophy • Goal: location-independent execution of jobs with large I/O needs. • Build moderately sized mechanisms that can be quickly deployed to existing problems. • With experience, explore general-purpose policies and larger systems. • Priorities: • Reliability and Correctness • Throughput (PB/year) • … • Performance (MB/sec)
Where does Globus fit in? • We expect that the Globus protocols will be the lingua franca of the grid. • Condor is committed to speaking the right language in order to participate. • Like any integration effort, there are some impedance-matching problems in both protocols and APIs. • None are insurmountable.
Components • NeST - Network Storage Appliance • ReqEx - Scheduled Data Mover • Kangaroo - Opportunistic Data Mover • Bypass - Adapts Apps to Grid • ClassAds - Express Relationships • Others?
NeST: Performs I/O as apps request and conditions permit. Schedules I/O according to declarations. • Bypass: Adapts ordinary I/O operations into grid protocols. • ClassAds: Express relationships and restrictions between participants. (Diagram labels: FTPD, NeST, MSS, ReqEx)
ReqEx: Scheduled Data Mover • Begin with a list of jobs and data needs. • Reserve space, move inputs, submit jobs, move outputs. (Diagram labels: FTPD, NeST)
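The four ReqEx steps can be sketched roughly as follows. This is an illustration only, not ReqEx code: the `Job` and `Storage` classes, the `run_batch` function, and the byte counts are hypothetical names and values introduced here, and the steps are only logged rather than performed.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    input_bytes: int  # hypothetical: total size of the job's declared inputs


@dataclass
class Storage:
    """Hypothetical stand-in for a storage appliance with fixed capacity."""
    capacity: int
    reserved: int = 0

    def reserve(self, nbytes: int) -> bool:
        # Refuse the reservation if it would exceed capacity.
        if self.reserved + nbytes > self.capacity:
            return False
        self.reserved += nbytes
        return True


# The order of operations named on the slide.
STEPS = ("reserve_space", "move_inputs", "submit_job", "move_outputs")


def run_batch(jobs, storage):
    """Walk each job through the four steps; defer jobs that do not fit."""
    log = []
    for job in jobs:
        if not storage.reserve(job.input_bytes):
            log.append((job.name, "deferred"))
            continue
        for step in STEPS:
            log.append((job.name, step))
    return log
```

With a 100-byte appliance and two 60-byte jobs, the first job runs through all four steps and the second is deferred, because its space cannot be reserved up front.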
Kangaroo: Opportunistic Data Mover • Move outputs back: during execution, as conditions permit, fine-grained, hop-by-hop. • Move inputs: on demand; should cache. (Diagram labels: FTPD, NeST, NeST)
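A minimal sketch of hop-by-hop output movement, assuming each hop keeps a spool and forwards blocks only while its outgoing link is up. The `Hop` class and its `link_up` flag are hypothetical names for illustration and are not taken from Kangaroo itself.

```python
from collections import deque


class Hop:
    """Hypothetical spool at one hop on the way back to the home site."""

    def __init__(self, name, next_hop=None):
        self.name = name
        self.next_hop = next_hop   # None marks the final destination
        self.spool = deque()       # blocks waiting to be forwarded
        self.link_up = True        # state of the link to next_hop
        self.delivered = []        # filled only at the final destination

    def accept(self, block):
        # The writer returns as soon as the block is spooled locally.
        if self.next_hop is None:
            self.delivered.append(block)
        else:
            self.spool.append(block)

    def push(self):
        """Forward spooled blocks while the link is up; return the count sent."""
        sent = 0
        while self.spool and self.link_up:
            self.next_hop.accept(self.spool.popleft())
            sent += 1
        return sent
```

In this sketch a link failure at an intermediate hop does not stop the application: blocks wait in that hop's spool and arrive in order once `push()` is called again with the link restored.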
Bypass • Creates interposition agents that re-route system calls to other code. • Pluggable File System (PFS): an agent built with Bypass. Presents grid protocols as filesystems. • Example: vi /ftp/coral.cs.wisc.edu/etc/hosts (Diagram labels: NeST, Bypass)
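The naming scheme in the vi example can be illustrated with a small path resolver. This sketch covers only the name mapping, not the system-call interposition that Bypass performs; the `resolve` function and its `protocols` parameter are hypothetical and are not part of PFS.

```python
def resolve(path, protocols=("ftp",)):
    """Split '/<protocol>/<host>/<remote path>' into its three parts.

    Paths that do not start with a known protocol are treated as local
    and returned unchanged with protocol and host set to None.
    """
    parts = path.split("/", 3)
    is_remote = (
        len(parts) >= 3
        and parts[0] == ""          # path is absolute
        and parts[1] in protocols   # first component names a protocol
        and parts[2] != ""          # second component names a host
    )
    if not is_remote:
        return None, None, path
    remote_path = "/" + (parts[3] if len(parts) > 3 else "")
    return parts[1], parts[2], remote_path


print(resolve("/ftp/coral.cs.wisc.edu/etc/hosts"))
print(resolve("/etc/hosts"))
```

An agent could use a mapping like this to decide whether an open() call stays local or is redirected to a protocol driver.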
Organizing Structure: I/O Communities • A community is simply a storage appliance shared by a number of CPUs. • Traditional community: distributed file system. • Ordinary users want to restructure communities according to application and load. • So, communities for grid computing should be easy to set up, reconfigure, and tear down. • NeST + Bypass makes this easy -- use the protocol appropriate for the situation.
I/O Communities • GridFTP: long-haul I/O • Chirp: short-haul I/O
What Discovery System? • Replica Discovery: If X is not on my disk, where can I find it? If I fetch X, where should I put it so that others can find it? • Device Discovery: Where is my disk? Where can I place my output now?
Everything Together (Diagram labels: Remote Storage, NeST, Execution Site, Long-Haul, Short-Haul, CPU Discovery, Job Agent, Device Discovery, Replica Discovery)
Resource Discovery with ClassAds • “Classic” ClassAds describe the properties and requirements of two parties looking for each other. • When expressing I/O communities, there are three parties to a match: jobs, machines, and storage. • By extending the language slightly, we allow jobs to refer to the properties of the attached storage: • Requirements = NearestStorage.HasCMSData
Classic ClassAds (Diagram: a Job Ad is matched with a Machine Ad; labels: Job, Machine)
References in ClassAds • Job Ad: refers to NearestStorage. • Machine Ad: knows where NearestStorage is. • Storage Ad. (Diagram: the Job Ad, Machine Ad, and Storage Ad are matched; labels: Job, Machine, NeST)
ClassAd Example

Job Ad:
    Type = "Job"
    Cmd = "cmsim.exe"
    Owner = "thain"
    Requirements = (OpSys==LINUX) && (NearestStorage.HasCMS)

Machine Ad:
    Type = "Machine"
    Name = "vulture"
    OpSys = "Linux"
    Requirements = (Owner=="thain")
    NearestStorage = (Type=="Storage") && (Name=="turkey")

Storage Ad:
    Type = "Storage"
    Name = "turkey"
    HasCMS = True
    CMSPath = "/cms"
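The three-party match can be sketched by modeling the three example ads as plain dictionaries. This illustrates the idea only; it is not the ClassAd language or the Condor matchmaker. The functions `nearest_storage` and `match` are hypothetical, requirements are written as Python lambdas, and comparing OpSys case-insensitively is an assumption made here so that "Linux" satisfies the job.

```python
job_ad = {
    "Type": "Job", "Cmd": "cmsim.exe", "Owner": "thain",
    # The job's requirement refers to the machine AND its nearest storage.
    "Requirements": lambda machine, storage: (
        machine["OpSys"].upper() == "LINUX"
        and storage is not None
        and storage.get("HasCMS", False)
    ),
}

machine_ad = {
    "Type": "Machine", "Name": "vulture", "OpSys": "Linux",
    "Requirements": lambda job: job["Owner"] == "thain",
    # The machine knows how to recognize its nearest storage ad.
    "NearestStorage": lambda ad: ad["Type"] == "Storage" and ad["Name"] == "turkey",
}

storage_ad = {"Type": "Storage", "Name": "turkey", "HasCMS": True, "CMSPath": "/cms"}


def nearest_storage(machine, ads):
    """Resolve the machine's NearestStorage reference against known ads."""
    return next((ad for ad in ads if machine["NearestStorage"](ad)), None)


def match(job, machine, ads):
    """Return the storage ad if job, machine, and storage all agree, else None."""
    storage = nearest_storage(machine, ads)
    ok = job["Requirements"](machine, storage) and machine["Requirements"](job)
    return storage if ok else None


matched = match(job_ad, machine_ad, [storage_ad])
print(matched["Name"] if matched else "no match")
```

The job never names "turkey" directly: it reaches the storage properties only through the machine's reference, which is what lets the same job ad match any machine whose nearest storage holds the data.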
Notes on ClassAds • Every match is a hint. • Participants must verify in the claiming phase. • Storage: If the dataset is missing, abort the process and roll back. • The reference feature is new in Condor 6.3. • A variation on ‘gang-matching’ as described by Raman et al.
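Treating a match as a hint can be sketched as a claim step that re-checks the advertised dataset and undoes earlier steps when it is missing. The `claim` function, the step names, and the use of a set of paths to stand for the storage contents are all hypothetical, not the Condor claiming protocol.

```python
def claim(storage_ad, actual_paths):
    """Verify the advertised dataset before committing to the match.

    Returns (True, completed_steps) on success, or
    (False, steps_rolled_back) if the dataset is missing.
    """
    done = []
    try:
        done.append("machine_claimed")
        # The ad said the data lives at CMSPath; check that it really does.
        if storage_ad["CMSPath"] not in actual_paths:
            raise LookupError("dataset missing at " + storage_ad["CMSPath"])
        done.append("storage_claimed")
        return True, done
    except LookupError:
        # Abort: undo completed steps in reverse order.
        return False, list(reversed(done))
```

If the storage ad is stale, the machine claim made earlier is released again, so a wrong hint costs a retry rather than a job running without its data.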
Example Applications • I/O Communities: • Applied to CMS simulation codes running at INFN and UW. Unmodified apps retrieve calibration data from nearest NeST. • Kangaroo • Applied to Gaussian codes running at NCSA. Users get progressive output when possible, but network failures don’t stop output. • Same idea applied to CMS reconstruction at INFN. (Older work called Grid Console.) • ReqEx • In testing mode on CMS reconstruction at UW.
Ongoing Work • Move jobs to data or vice versa? • We can easily build communities for a particular application. Can we build software that works reasonably well in any situation? • Select staging or remote I/O? • Depends on number of jobs, storage capacity, network capacity, etc… • Integration with replica management. • Is the App->NeST channel collection-aware?
Upcoming Publications • Thain, Basney, Chang, Livny, “The Kangaroo Approach to Data Movement on the Grid”, HPDC 10. • Thain, Bent, Livny, Arpaci-Dusseau, Arpaci-Dusseau, “Gathering at the Well: Creating Communities for Grid I/O”, Supercomputing 2001.