Scaling Up Data Intensive Science to Campus Grids • Douglas Thain • Clemson University • 25 September 2009
The Cooperative Computing Lab • We collaborate with people who have large scale computing problems. • We build new software and systems to help them achieve meaningful goals. • We run a production computing system used by people at ND and elsewhere. • We conduct computer science research, informed by real world experience, with an impact upon problems that matter.
What is a Campus Grid? • A campus grid is an aggregation of all available computing power found in an institution: • Idle cycles from desktop machines. • Unused cycles from dedicated clusters. • Examples of campus grids: • 700 CPUs at the University of Notre Dame • 9000-11,000 CPUs at Clemson University • 20,000 CPUs at Purdue University
Condor • Provides robust batch queueing on a complex distributed system. • Resource owners control consumption: "Only run jobs on this machine at night." "Prefer biology jobs over physics jobs." • End users express needs: "Only run this job where RAM > 2GB." "Prefer to run on machines ..." • http://www.cs.wisc.edu/condor
Clusters, clouds, and grids give us access to unlimited CPUs. How do we write programs that can run effectively in large systems?
Example: Biometrics Research • Goal: Design a robust face comparison function F. • (Figure: F applied to two pairs of face images, producing similarity scores of 0.97 and 0.05.)
Similarity Matrix Construction • Challenge workload: 60,000 iris images, 1 MB each; 0.02 s per F; 833 CPU-days; 600 TB of I/O.
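A quick back-of-the-envelope check of the 833 CPU-day figure, as a minimal sketch (the 600 TB I/O figure depends on how the data is staged, so it is not derived here):

# Back-of-the-envelope check of the 833 CPU-day figure.
images = 60_000        # iris images in the workload
seconds_per_f = 0.02   # runtime of one comparison F

comparisons = images * images                       # full similarity matrix
cpu_days = comparisons * seconds_per_f / 86_400     # seconds per day

print(f"{comparisons:.1e} comparisons, about {cpu_days:.0f} CPU-days")
# prints: 3.6e+09 comparisons, about 833 CPU-days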
I have 60,000 iris images acquired in my research lab. I want to reduce each one to a feature space, and then compare all of them to each other. I want to spend my time doing science, not struggling with computers. I own a few machines. I can buy time from Amazon or TeraGrid. I have a laptop. Now what?
Non-Expert User Using 500 CPUs • Try 1: Each F is a batch job. Failure: dispatch latency >> F runtime. • Try 2: Each row is a batch job. Failure: too many small ops on the file system. • Try 3: Bundle all files into one package. Failure: everyone loads 1 GB at once. • Try 4: User gives up and attempts to solve an easier or smaller problem.
Observation • In a given field of study, many people repeat the same pattern of work many times, making slight changes to the data and algorithms. • If the system knows the overall pattern in advance, then it can do a better job of executing it reliably and efficiently. • If the user knows in advance what patterns are allowed, then they have a better idea of how to construct their workloads.
Abstractions for Distributed Computing • Abstraction: a declarative specification of the computation and data of a workload. • A restricted pattern, not meant to be a general purpose programming language. • Uses data structures instead of files. • Provides users with a bright path. • Regular structure makes it tractable to model and predict performance.
Working with Abstractions • (Figure: a compact data structure holding sets A1..An and B1..Bn plus a function F is handed to AllPairs( A, B, F ); a custom workflow engine executes the work on a cloud or grid.)
All-Pairs Abstraction • AllPairs( set A, set B, function F ) returns matrix M where M[i][j] = F( A[i], B[j] ) for all i,j • Command line: allpairs A B F.exe • (Figure: the matrix of F invocations over sets A1..An and B1..Bn.)
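For reference, the semantics can be written down in a few lines. This is only a sequential sketch of what AllPairs computes, not the distributed implementation described on the following slides:

# Sequential sketch of the All-Pairs semantics: M[i][j] = F(A[i], B[j]).
def all_pairs(A, B, F):
    return [[F(a, b) for b in B] for a in A]

# Toy example with a stand-in comparison function in place of the real F.
M = all_pairs(["iris1", "iris2"], ["iris3", "iris4"],
              lambda x, y: 1.0 if x == y else 0.0)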
How Does the Abstraction Help? • The custom workflow engine: • Chooses right data transfer strategy. • Chooses the right number of resources. • Chooses blocking of functions into jobs. • Recovers from a larger number of failures. • Predicts overall runtime accurately. • All of these tasks are nearly impossible for arbitrary workloads, but are tractable (not trivial) to solve for a specific abstraction.
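To make one of these decisions concrete, consider blocking: the engine groups individual F calls into sub-blocks so that each batch job carries enough work to amortize dispatch latency. A minimal illustrative sketch follows; the block size and job representation here are assumptions, not the engine's actual format:

# Sketch of blocking an N x M grid of F calls into B x B sub-blocks,
# so each batch job performs many comparisons instead of one.
def blocks(n, m, b):
    jobs = []
    for i0 in range(0, n, b):
        for j0 in range(0, m, b):
            jobs.append((i0, min(i0 + b, n), j0, min(j0 + b, m)))
    return jobs

# 60,000 x 60,000 comparisons in 1,000 x 1,000 blocks -> 3,600 jobs,
# each roughly 1,000,000 x 0.02 s = 5.5 CPU-hours of work.
print(len(blocks(60_000, 60_000, 1_000)))   # 3600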
All-Pairs in Production • Our All-Pairs implementation has provided over 57 CPU-years of computation to the ND biometrics research group over the last year. • Largest run so far: 58,396 irises from the Face Recognition Grand Challenge, the largest experiment ever run on publicly available data. • Competing biometric research relies on samples of 100-1000 images, which can miss important population effects. • Reduced computation time from 833 days to 10 days, making it feasible to repeat multiple times for a graduate thesis. (We can go faster yet.)
Wavefront Abstraction • Wavefront( matrix M, function F(x,y,d) ) returns matrix M such that M[i,j] = F( M[i-1,j], M[i,j-1], M[i-1,j-1] ) • (Figure: Wavefront(M,F) sweeps a diagonal wave of F evaluations across the matrix, starting from the initial row and column of M.)
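As before, a purely sequential sketch of the recurrence is shown below; in the real engine, all cells on the same anti-diagonal (constant i+j) are independent and can run in parallel, which is where the "wavefront" comes from.

# Sequential sketch of the Wavefront recurrence.
# M is an n x n grid whose first row and first column are already filled in.
def wavefront(M, F):
    n = len(M)
    for i in range(1, n):
        for j in range(1, n):
            M[i][j] = F(M[i-1][j], M[i][j-1], M[i-1][j-1])
    return M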
Some-Pairs Abstraction • SomePairs( set A, list L of (i,j), function F(x,y) ) returns list of F( A[i], A[j] ) for each (i,j) in L • (Figure: SomePairs(A,L,F) with L = (1,2), (2,1), (2,3), (3,3): F is evaluated only at those cells of the A x A matrix.)
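And a one-line sketch of Some-Pairs, which evaluates F only at an explicit list of index pairs rather than the full cross product:

# Sketch of Some-Pairs: evaluate F only at the requested (i, j) pairs.
def some_pairs(A, pairs, F):
    return [F(A[i], A[j]) for (i, j) in pairs]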
Makeflow

part1 part2 part3: input.data split.py
    ./split.py input.data

out1: part1 mysim.exe
    ./mysim.exe part1 >out1

out2: part2 mysim.exe
    ./mysim.exe part2 >out2

out3: part3 mysim.exe
    ./mysim.exe part3 >out3

result: out1 out2 out3 join.py
    ./join.py out1 out2 out3 > result
Makeflow Implementation • The makeflow master queues tasks for 100s of workers dispatched to the cloud; workers pull tasks from the work queue and report tasks done. • Detail of a single worker for the rule "bfile: afile prog / prog afile >bfile": put prog; put afile; exec prog afile > bfile; get bfile. • Two optimizations: cache inputs and outputs at the workers; dispatch tasks to nodes that already hold the data.
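A hypothetical sketch of what a single worker does for that rule, written as plain Python rather than the actual Work Queue API; recv_file, send_file, and the cache object are invented names for illustration only.

# Hypothetical sketch of the per-task worker protocol shown above.
# recv_file/send_file stand in for whatever transfer mechanism the
# master and worker actually use; they are not a real API.
import subprocess

def run_task(recv_file, send_file, cache):
    # "put prog", "put afile": fetch inputs, reusing the local cache when possible
    for name in ("prog", "afile"):
        if name not in cache:
            recv_file(name)          # writes the file into the worker's sandbox
            cache.add(name)
    # "exec prog afile > bfile"
    with open("bfile", "w") as out:
        subprocess.run(["./prog", "afile"], stdout=out, check=True)
    # "get bfile": return the output to the master
    send_file("bfile")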
Experience with Makeflow • Reusing a good old idea in a new way. • Easy to test and debug on a desktop machine or a multicore server. • The workload says nothing about the distributed system. (This is good.) • Graduate students in bioinformatics were running codes at production speeds on hundreds of nodes in less than a week. • A student from Clemson got a complex biometrics workload running in a few weeks.
Putting it All Together • (Figure: a web portal and a data repository feed data into an abstraction, which executes on the campus grid.)
BXGrid Schema • Scientific metadata, plus general metadata per file: fileid = 24305, size = 300K, type = jpg, sum = abc123… • Each file has immutable replicas: replicaid=423 state=ok; replicaid=105 state=ok; replicaid=293 state=creating; replicaid=102 state=deleting.
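One way to picture that schema in code, as an illustrative sketch only; the field values are taken from the slide, but the class layout itself is an assumption.

# Illustrative sketch of the BXGrid schema from the slide.
from dataclasses import dataclass, field

@dataclass
class Replica:
    replicaid: int
    state: str          # e.g. "ok", "creating", "deleting"

@dataclass
class File:
    fileid: int
    size: str           # e.g. "300K"
    type: str           # e.g. "jpg"
    sum: str            # checksum, e.g. "abc123..."
    replicas: list[Replica] = field(default_factory=list)

example = File(fileid=24305, size="300K", type="jpg", sum="abc123...",
               replicas=[Replica(423, "ok"), Replica(105, "ok"),
                         Replica(293, "creating"), Replica(102, "deleting")])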
Abstractions as a Social Tool • Collaboration with outside groups is how we encounter the most interesting, challenging, and important problems in computer science. • However, often neither side understands which details are essential or non-essential: • Can you deal with files that have upper case letters? • Oh, by the way, we have 10TB of input, is that ok? • (A little bit of an exaggeration.) • An abstraction is an excellent chalkboard tool: • Accessible to anyone with a little bit of mathematics. • Makes it easy to see what must be plugged in. • Forces out essential details: data size, execution time.
Conclusion • Grids, clouds, and clusters provide enormous computing power, but are very challenging to use effectively. • An abstraction provides a robust, scalable solution to a narrow category of problems; each requires different kinds of optimizations. • Limiting expressive power results in systems that are usable, predictable, and reliable. • Portal + Repository + Abstraction + Grid = New Science Capabilities
Acknowledgments • Cooperative Computing Lab • http://www.cse.nd.edu/~ccl • Undergrads • Mike Kelly • Rory Carmichael • Mark Pasquier • Christopher Lyon • Jared Bulosan • Kameron Srimoungach • Rachel Witty • Ryan Jansen • Joey Rich • Faculty: • Patrick Flynn • Nitesh Chawla • Kenneth Judd • Scott Emrich • Grad Students • Chris Moretti • Hoang Bui • Li Yu • Mike Olson • Michael Albrecht • NSF Grants CCF-0621434 and CNS-0643229