
Using Small Abstractions to Program Large Distributed Systems



Presentation Transcript


  1. Using Small Abstractions to Program Large Distributed Systems. Douglas Thain, University of Notre Dame, 11 December 2008.

  2. Using Small Abstractions to Program Large Distributed Systems (And multicore computers!). Douglas Thain, University of Notre Dame, 11 December 2008.

  3. Clusters, clouds, and grids give us access to zillions of CPUs. How do we write programs that can run effectively in large systems?

  4. I have 10,000 iris images acquired in my research lab. I want to reduce each one to a feature space, and then compare all of them to each other. I want to spend my time doing science, not struggling with computers. I own a few machines. I can buy time from Amazon or TeraGrid. I have a laptop. Now what?

  5. How do I program a CPU?
  • I write the algorithm in a language that I find convenient: C, Fortran, Python, etc.
  • The compiler chooses instructions for the CPU, even if I don’t know assembly.
  • The operating system allocates memory, moves data between disk and memory, and manages the cache.
  • To move to a different CPU, recompile or use a VM, but don’t change the program.

  6. How do I program the grid/cloud?
  • Split the workload into pieces.
    • How much work to put in a single job?
  • Decide how to move data.
    • Demand paging, streaming, file transfer?
  • Express the problem in a workflow language or programming environment.
    • DAG / MPI / Pegasus / Taverna / Swift?
  • Babysit the problem as it runs.
    • Worry about disk / network / failures…

  7. How do I program on 128 cores?
  • Split the workload into pieces.
    • How much work to put in a single thread?
  • Decide how to move data.
    • Shared memory, message passing, streaming?
  • Express the problem in a workflow language or programming environment.
    • OpenMP, MPI, PThreads, Cilk, …
  • Babysit the problem as it runs.
    • Implement application-level checkpoints.

  8. Tomorrow’s distributed systems will be clouds of multicore computers. So, we have to solve both problems with a single model.

  9. Observation
  • In a given field of study, a single person may repeat the same pattern of work many times, making slight changes to the data and algorithms.
  • Examples everyone knows:
    • Parameter sweep on a simulation code.
    • BLAST search across multiple databases.
  • Are there other examples?

  10. Abstractions for Distributed Computing
  • Abstraction: a declarative specification of the computation and data of a workload.
  • A restricted pattern, not meant to be a general-purpose programming language.
  • Uses data structures instead of files.
  • Provides users with a bright path.
  • Regular structure makes it tractable to model and predict performance.

  11. All-Pairs Abstraction
  AllPairs( set A, set B, function F ) returns matrix M where M[i][j] = F( A[i], B[j] ) for all i,j.
  Invoked as: allpairs A B F.exe
  [Figure: the members of set A label the columns and the members of set B label the rows of matrix M; F is applied to every (A[i], B[j]) pair.]
  Moretti, Bulosan, Flynn, Thain, AllPairs: An Abstraction… IPDPS 2008.
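
The semantics are small enough to state as code. A minimal sequential sketch in Python (the real allpairs tool distributes this work across a cluster; all_pairs and the names in the usage comment are illustrative):

    def all_pairs(A, B, F):
        # AllPairs(A, B, F): an |A| x |B| matrix M with M[i][j] = F(A[i], B[j]).
        return [[F(a, b) for b in B] for a in A]

    # Hypothetical usage, in the spirit of the iris workload:
    # M = all_pairs(iris_codes_a, iris_codes_b, compare_iris_codes)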

  12. Example Application
  • Goal: Design a robust face comparison function.
  [Figure: F compares pairs of face images, returning similarity scores such as 0.97 for a matching pair and 0.05 for a non-matching pair.]

  13. Similarity Matrix Construction
  Current workload: 4,000 images, 256 KB each, 10 s per F (five days).
  Future workload: 60,000 images, 1 MB each, 1 s per F (three months).
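
As a back-of-the-envelope check (not on the slide), the parenthesized wall-clock times are consistent with a pool of roughly 500 CPUs:

    pairs_now = 4000 ** 2                     # 1.6e7 comparisons
    cpu_seconds_now = pairs_now * 10          # ~1.6e8 s, about 5 CPU-years
    print(cpu_seconds_now / 500 / 86400)      # ~3.7 days on 500 CPUs

    pairs_future = 60000 ** 2                 # 3.6e9 comparisons
    cpu_seconds_future = pairs_future * 1     # ~3.6e9 s, about 114 CPU-years
    print(cpu_seconds_future / 500 / 86400)   # ~83 days, about three months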

  14. http://www.cse.nd.edu/~ccl/viz

  15. Non-Expert User Using 500 CPUs
  Try 1: Each F is a batch job. Failure: dispatch latency >> F runtime.
  Try 2: Each row is a batch job. Failure: too many small operations on the file system.
  Try 3: Bundle all files into one package. Failure: everyone loads 1 GB at once.
  Try 4: User gives up and attempts to solve an easier or smaller problem.
  [Figure: each attempt drawn as a head node (HN) dispatching F tasks to a row of CPUs.]

  16. All-Pairs Abstraction
  AllPairs( set A, set B, function F ) returns matrix M where M[i][j] = F( A[i], B[j] ) for all i,j.
  [Figure: the matrix diagram from slide 11, repeated.]

  17. How Many CPUs on the Cloud?

  18. What is the right metric?

  19. How to measure in clouds?
  • Speedup?
    • Sequential runtime / parallel runtime.
  • Parallel efficiency?
    • Speedup / N CPUs.
  • Neither works, because the number of CPUs varies over time and between runs.
  • An alternative: cost efficiency.
    • Work completed / resources consumed.
    • Person-miles / gallon.
    • Results / CPU-hours.
    • Results / $$$.
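
A minimal sketch of the alternative metric (names here are illustrative): trivial to compute, but unlike speedup it stays well defined when the CPU pool grows and shrinks mid-run.

    def cost_efficiency(results_completed, cpu_hours_consumed):
        # Work completed per resource consumed; comparable across runs
        # even when each run saw a different, varying number of CPUs.
        return results_completed / cpu_hours_consumed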

  20. All-Pairs Abstraction

  21. Wavefront Abstraction
  Wavefront( R[x,0], R[0,y], F(x,y,d) )
  Given the first row R[x,0] and first column R[0,y], each remaining cell is computed by F from its already-computed neighbors: x (left), y (below), and d (diagonal).
  [Figure: a 5x5 table of cells R[0,0] through R[4,4], filled from the bottom-left corner; cells along each anti-diagonal can be computed concurrently.]
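
A minimal sequential sketch of the recurrence the figure depicts, assuming the usual wavefront dependence on the left, lower, and diagonal neighbors:

    def wavefront(R, F, n):
        # R is an (n+1) x (n+1) table with row 0 and column 0 supplied.
        # Each interior cell depends only on three already-computed
        # neighbors, so all cells on an anti-diagonal could run in parallel.
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                R[i][j] = F(R[i-1][j], R[i][j-1], R[i-1][j-1])
        return R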

  22. Implementing Wavefront
  [Figure: a two-level implementation. A distributed master hands sub-problems to multicore masters as their inputs become complete; each multicore master runs F on its local cores and returns the output cells.]

  23. The Performance Problem
  • Dispatch latency really matters: a delay in one holds up all of its children.
  • If we dispatch larger sub-problems:
    • Concurrency on each node increases.
    • Distributed concurrency decreases.
  • If we dispatch smaller sub-problems:
    • Concurrency on each node decreases.
    • We spend more time waiting for jobs to be dispatched.
  • So, model the system to choose the block size.
  • And, build a fast-dispatch execution system.
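
To make the trade-off concrete, here is a hypothetical model (a sketch, not the paper's formula) of wavefront turnaround time as a function of block size b: with enough nodes, total time is the number of anti-diagonal block steps on the critical path times the cost of one block.

    def estimated_runtime(n, b, t_f, t_dispatch, cores):
        # Hypothetical model: an n x n wavefront split into (n/b)^2
        # blocks of b x b cells, each dispatched as one sub-problem.
        block_work = (b * b * t_f) / cores      # compute time per block
        block_cost = t_dispatch + block_work    # dispatch latency per block
        critical_path = 2 * (n // b) - 1        # anti-diagonal block steps
        return critical_path * block_cost

    # Larger b amortizes t_dispatch; smaller b shortens each step.
    # Sweeping b over the divisors of n picks out the minimum.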

  24. Model of 1000x1000 Wavefront

  25. 500x500 Wavefront on ~200 CPUs

  26. Wavefront on a 200-CPU Grid

  27. Wavefront on a 32-Core CPU

  28. Classify Abstraction
  Classify( T, R, N, P, F )
  T = testing set, R = training set, N = # of partitions, F = classifier.
  [Figure: the training set R is partitioned (P) into subsets; F builds a classifier from each, each classifier votes (V) on the testing set T, and the votes are combined (C) into the final result.]
  Moretti, Steinhauser, Thain, Chawla, Scaling Up Classifiers to Cloud Computers, ICDM 2008.
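
A sequential sketch of the semantics the figure suggests, assuming F trains a callable classifier on a partition and the combiner is a majority vote (P, the partitioning strategy, is fixed to simple striding here):

    from collections import Counter

    def classify(T, R, N, F):
        # Partition training set R into N parts, train one classifier
        # per part with F, classify every test item, combine by vote.
        parts = [R[i::N] for i in range(N)]
        classifiers = [F(part) for part in parts]   # F returns a callable model
        labels = []
        for t in T:
            votes = Counter(c(t) for c in classifiers)
            labels.append(votes.most_common(1)[0][0])
        return labels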

  29. From Abstractions to a Distributed Language

  30. What Other Abstractions Might Be Useful?
  • Map( set S, F(s) )
  • Explore( F(x), x: [a…b] )
  • Minimize( F(x), delta )
  • Minimax( state s, A(s), B(s) )
  • Search( state s, F(s), IsTerminal(s) )
  • Query( properties ) -> set of objects
  • FluidFlow( V[x,y,z], F(v), delta )
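
As sequential sketches, the first two signatures might read as follows (hypothetical readings of the slide, not implementations from the talk):

    def map_abstraction(S, F):
        # Map(set S, F(s)): apply F independently to every member of S.
        return [F(s) for s in S]

    def explore(F, a, b, steps=100):
        # Explore(F(x), x: [a...b]): evaluate F across a parameter range.
        xs = [a + i * (b - a) / steps for i in range(steps + 1)]
        return [(x, F(x)) for x in xs]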

  31. How do we connect multiple abstractions together?
  • Need a meta-language, perhaps with its own atomic operations for simple tasks.
  • Need to manage (possibly large) intermediate storage between operations.
  • Need to handle data type conversions between almost-compatible components.
  • Need type reporting and error checking to avoid expensive errors.
  • If abstractions are feasible to model, then it may be feasible to model entire programs.

  32. Connecting Abstractions in BXGrid
  S = Select( color=“brown” )
  B = Transform( S, F )
  M = AllPairs( A, B, F )
  [Figure: Select pulls iris records (eye color, left/right eye) from the repository, Transform reduces each to a feature representation, and AllPairs compares them, yielding an ROC curve.]
  Bui, Thomas, Kelly, Lyon, Flynn, Thain, BXGrid: A Repository and Experimental Abstraction… poster at IEEE eScience 2008.
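
Read as a script, the three statements chain through intermediate sets. A hypothetical rendering using the earlier all_pairs sketch (select_images, compute_iris_code, and compare_iris_codes are stand-ins, not BXGrid's actual API):

    # S: records matching the query; B: each record reduced by F;
    # M: the all-pairs comparison of the transformed set with itself.
    S = select_images(color="brown")
    B = [compute_iris_code(s) for s in S]       # Transform(S, F)
    M = all_pairs(B, B, compare_iris_codes)     # AllPairs(A, B, F)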

  33. Implementing Abstractions
  S = Select( color=“brown” ) runs on the relational database (2x), via the DBMS.
  B = Transform( S, F ) runs on the active storage cluster (16x).
  M = AllPairs( A, B, F ) runs on the Condor pool (500x).

  34. What is the Most Useful ABI?
  • Functions can come in many forms:
    • Unix program.
    • C function in source form.
    • Java function in binary form.
  • Datasets come in many forms:
    • Set: list of files, delimited single file, or database query.
    • Matrix: sparse element list, or binary layout.
  • Our current implementations require a particular form. With a carefully stated ABI, abstractions could work with many different user communities.
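
For instance, one plausible calling convention (a sketch, not the project's actual ABI) wraps a Unix program so an abstraction can invoke it like a native function:

    import subprocess

    def unix_function(program):
        # Treat a Unix program as F(a, b): pass two input files as
        # arguments and take its standard output as the result.
        def F(file_a, file_b):
            result = subprocess.run([program, file_a, file_b],
                                    capture_output=True, text=True, check=True)
            return result.stdout.strip()
        return F

    # compare = unix_function("./F.exe")
    # score = compare("a1.jpg", "b1.jpg")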

  35. What is the type system?
  • Files have an obvious technical type:
    • JPG, BMP, TXT, PTS, …
  • But they also have a logical type:
    • JPG: Face, Iris, Fingerprint, etc. (This comes out of the BXGrid repository.)
  • The meta-language can easily perform automatic conversions between technical types, and between some logical types:
    • JPG/Face -> BMP/Face via ImageMagick.
    • JPG/Iris -> BIN/IrisCode via ComputeIrisCode.
    • JPG/Face -> JPG/Iris is not allowed.
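
A sketch of how a meta-language might enforce this: key conversions on (technical, logical) type pairs, so a conversion that would change the logical meaning simply has no entry. The table follows the slide's examples; the command strings are illustrative.

    # Allowed conversions, keyed by (tech, logical) -> (tech, logical).
    CONVERSIONS = {
        (("JPG", "Face"), ("BMP", "Face")): "convert {src} {dst}",          # ImageMagick
        (("JPG", "Iris"), ("BIN", "IrisCode")): "ComputeIrisCode {src} {dst}",
    }

    def conversion_command(src_type, dst_type, src, dst):
        # JPG/Face -> JPG/Iris has no table entry, so it is rejected here.
        try:
            template = CONVERSIONS[(src_type, dst_type)]
        except KeyError:
            raise TypeError(f"no conversion from {src_type} to {dst_type}")
        return template.format(src=src, dst=dst)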

  36. Abstractions Redux
  • Mapping general-purpose programs to arbitrary distributed/multicore systems is algorithmically complex and full of pitfalls.
  • But mapping a single abstraction is a tractable problem that can be optimized, modeled, and re-used.
  • Can we combine multiple abstractions to achieve both expressive power and tractable performance?

  37. Acknowledgments
  • Cooperative Computing Lab: http://www.cse.nd.edu/~ccl
  • Faculty: Patrick Flynn, Nitesh Chawla, Kenneth Judd, Scott Emrich.
  • Grad students: Chris Moretti, Hoang Bui, Karsten Steinhauser, Li Yu, Michael Albrecht.
  • Undergrads: Mike Kelly, Rory Carmichael, Mark Pasquier, Christopher Lyon, Jared Bulosan.
  • NSF Grants CCF-0621434, CNS-0643229.
