
Programming Distributed Systems with High Level Abstractions

Programming Distributed Systems with High Level Abstractions. Douglas Thain, University of Notre Dame, 23 October 2008.


Presentation Transcript


  1. Programming Distributed Systems with High Level Abstractions Douglas Thain University of Notre Dame 23 October 2008

  2. Distributed Systems • Scale: 2 – 100s – 1000s – millions • Domains: Single or Multi • Users: 1 – 10 – 100 – 1000 – 10000 • Naming: Direct, Virtual • Scheduling: Timesharing / Space Sharing • Interface: Allocate CPU / Execute Job • Security: None / IP / PKI / KRB … • Storage: Embedded / External

  3. Cloud Computing? • Scale: 2 – 100s – 1000s – 10000s • Domains: Single or Multi • Users: 1 – 10 – 100 – 1000 – 10000 • Naming: Direct, Virtual • Scheduling: Timesharing / Space Sharing • Interface: Allocate CPU / Execute Job • Security: None / IP / PKI / KRB … • Storage: Embedded / External

  4. Grid Computing? • Scale: 2 – 100s – 1000s – 10000s • Domains: Single or Multi • Users: 1 – 10 – 100 – 1000 – 10000 • Naming: Direct, Virtual • Scheduling: Timesharing / Space Sharing • Interface: Allocate CPU / Execute Job • Security: None / IP / PKI / KRB … • Storage: Embedded / External

  5. An Assembly Language of Distributed Computing • Fundamental Operations • TransferFile( source, destination ) • ExecuteJob( host, exe, input, output ) • AllocateVM( cpu, mem, disk, opsys ) • Semantics of Assembly are Subtle: • When do instructions commit? • Delay slots before control transfers? • What exceptions are valid for each opcode? • Precise or imprecise exceptions? • What is the cost of each instruction?
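As an illustration only, the three fundamental operations above can be modeled against a toy in-memory "grid". The class and its semantics are invented for this sketch and do not correspond to any real middleware API; it exists just to make the subtle questions (which exceptions are valid? when does an operation commit?) concrete.

```python
# Toy in-memory model of the three "assembly" operations on the slide.
# Names and semantics are illustrative; real middleware differs in
# commit semantics, valid exceptions, and per-instruction cost.

class MiniGrid:
    def __init__(self):
        self.files = {}   # path -> bytes, a stand-in for remote storage
        self.vms = []     # allocated virtual machines

    def transfer_file(self, source, destination):
        # TransferFile( source, destination )
        if source not in self.files:
            raise FileNotFoundError(source)   # which exceptions are valid?
        self.files[destination] = self.files[source]

    def execute_job(self, host, exe, input, output):
        # ExecuteJob( host, exe, input, output ) -- here exe is a callable
        # and the result "commits" only when the output file is written.
        data = self.files[input]
        self.files[output] = exe(data)

    def allocate_vm(self, cpu, mem, disk, opsys):
        # AllocateVM( cpu, mem, disk, opsys )
        vm = {"cpu": cpu, "mem": mem, "disk": disk, "os": opsys}
        self.vms.append(vm)
        return vm

g = MiniGrid()
g.files["in.txt"] = b"hello"
g.transfer_file("in.txt", "node1/in.txt")
g.execute_job("node1", lambda d: d.upper(), "node1/in.txt", "node1/out.txt")
print(g.files["node1/out.txt"])  # b'HELLO'
```

Even in this toy, the questions from the slide surface immediately: `transfer_file` must decide what to raise on a missing source, and `execute_job` has no defined behavior if the node fails mid-write.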

  6. Programming in Assembly Stinks • You know the problems: • Stack management. • Garbage collection. • Type checking. • Co-location of data and computation. • Query optimizations. • Function shipping or data shipping? • How many nodes should I harness?

  7. Abstractions for Distributed Computing • Abstraction: a declarative specification of the computation and data of a workload. • A restricted pattern, not meant to be a general purpose programming language. • Avoid the really terrible cases. • Provide users with a bright path. • Data structures instead of file systems.

  8. All-Pairs Abstraction AllPairs( set A, set B, function F ) returns matrix M where M[i][j] = F( A[i], B[j] ) for all i,j [Figure: the sets A and B laid along the axes of a matrix, with one invocation of F per cell.] Moretti, Bulosan, Flynn, Thain, AllPairs: An Abstraction… IPDPS 2008
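The declarative shape of All-Pairs fits in a few lines of Python. This is a local, serial illustration of the semantics only, not the distributed implementation from the paper; the comparison function is an invented stand-in for F.

```python
# Minimal serial sketch of the All-Pairs abstraction:
# AllPairs(A, B, F) returns M where M[i][j] = F(A[i], B[j]).
# A real implementation distributes the F invocations across a cluster.

def all_pairs(A, B, F):
    return [[F(a, b) for b in B] for a in A]

# Example: absolute difference as a stand-in comparison function F.
M = all_pairs([1, 2, 3], [10, 20], lambda a, b: abs(a - b))
print(M)  # [[9, 19], [8, 18], [7, 17]]
```

The point of the abstraction is exactly that the user writes only this declaration; the engine decides how many nodes to use and how to move data.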

  9. Example Application • Goal: Design robust face comparison function. [Figure: F compares two face images and returns a similarity score, e.g. 0.97 for a likely match and 0.05 for a mismatch.]

  10. Similarity Matrix Construction • Current Workload: 4000 images, 256 KB each, 10s per F (five days) • Future Workload: 60000 images, 1 MB each, 1s per F (three months)

  11. http://www.cse.nd.edu/~ccl/viz

  12. Non-Expert User Using 500 CPUs • Try 1: Each F is a batch job. Failure: Dispatch latency >> F runtime. • Try 2: Each row is a batch job. Failure: Too many small ops on FS. • Try 3: Bundle all files into one package. Failure: Everyone loads 1 GB at once. • Try 4: User gives up and attempts to solve an easier or smaller problem. [Figure: each attempt shown as F jobs dispatched from a head node (HN) to the 500 CPUs.]

  13. All-Pairs Abstraction AllPairs( set A, set B, function F ) returns matrix M where M[i][j] = F( A[i], B[j] ) for all i,j [Figure: the sets A and B laid along the axes of a matrix, with one invocation of F per cell.]

  14. What is the right metric? • Speedup? Seq Runtime / Parallel Runtime • Parallel Efficiency? Speedup / N CPUs • Neither works, because the number of CPUs varies over time and between runs. • Cost Efficiency: Work Completed / Resources Consumed • Person-Miles / Gallon • Results / CPU-hours • Results / $$$
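The distinction can be made concrete with a toy calculation (the workload numbers below are invented): speedup needs a single fixed N that an elastic pool does not have, while results per CPU-hour is well defined for any run.

```python
# Cost efficiency as defined on the slide:
# Work Completed / Resources Consumed, here results per CPU-hour.

def cost_efficiency(results_completed, cpu_hours_consumed):
    return results_completed / cpu_hours_consumed

# Run A: a fixed pool of 100 CPUs for 2 hours.
# Run B: an elastic pool that varied over time, totalling 150 CPU-hours.
# "Speedup" is ill-defined for Run B (what is N?), but cost efficiency
# compares the two runs directly.
run_a = cost_efficiency(results_completed=10_000, cpu_hours_consumed=100 * 2)
run_b = cost_efficiency(results_completed=10_000, cpu_hours_consumed=150)
print(run_a, run_b)  # 50.0 vs roughly 66.7 results per CPU-hour
```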

  15. All-Pairs Abstraction

  16. Classify Abstraction Classify( T, R, N, P, F ) • T = testing set • R = training set • N = # of partitions • F = classifier [Figure: the testing set T is split into partitions T1…TN; each partition is classified by F, producing votes V1…VN that are combined into a final result V.] Moretti, Steinhauser, Thain, Chawla, Scaling up Classifiers to Cloud Computers, ICDM 2008.
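A serial sketch of the Classify semantics follows. The slide's P parameter is omitted here because its role is not defined on the slide, and an invented nearest-value "classifier" stands in for F; the distributed version would run each partition on a separate node.

```python
# Serial sketch of Classify: partition the testing set T into N pieces,
# run classifier F (with training set R) on each piece, and combine the
# per-partition votes. Combination here is simple concatenation.

def classify(T, R, N, F):
    parts = [T[i::N] for i in range(N)]           # N interleaved partitions
    votes = [F(part, R) for part in parts]        # one vote list V_i per part
    return [label for v in votes for label in v]  # combine votes into V

# Invented stand-in for F: label each test item with the nearest training value.
def nearest_label(part, R):
    return [min(R, key=lambda r: abs(r - t)) for t in part]

labels = classify(T=[1, 6, 9, 4], R=[0, 5, 10], N=2, F=nearest_label)
print(labels)  # [0, 10, 5, 5] -- interleaved partitions reorder the output
```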

  17. BXGrid Abstractions S = Select( color=“brown” ) B = Transform( S, F ) M = AllPairs( A, B, F ) [Figure: Select pulls matching records (e.g. eye color = brown) from the repository, Transform applies F to each, and AllPairs compares the results, yielding a similarity matrix and an ROC curve.] Bui, Thomas, Kelly, Lyon, Flynn, Thain, BXGrid: A Repository and Experimental Abstraction… in review 2008.
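The three operations compose into a pipeline. A toy in-memory version might look like the following; the record fields and the comparison function are invented for illustration and are not the BXGrid schema.

```python
# Toy chaining of the three BXGrid abstractions from the slide:
# Select filters a repository, Transform maps F over the selection,
# and AllPairs compares every pair of transformed items.

def select(records, **criteria):
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

def transform(S, F):
    return [F(r) for r in S]

def all_pairs(A, B, F):
    return [[F(a, b) for b in B] for a in A]

repo = [
    {"id": 1, "color": "brown", "feature": 3},
    {"id": 2, "color": "blue",  "feature": 7},
    {"id": 3, "color": "brown", "feature": 5},
]
S = select(repo, color="brown")                   # S = Select( color="brown" )
B = transform(S, lambda r: r["feature"])          # B = Transform( S, F )
M = all_pairs(B, B, lambda a, b: abs(a - b))      # M = AllPairs( B, B, F )
print(M)  # [[0, 2], [2, 0]]
```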

  18. Implementing Abstractions • S = Select( color=“brown” ) runs on a relational database (2x) via the DBMS. • B = Transform( S, F ) runs on an active storage cluster (16x). • M = AllPairs( A, B, F ) runs on a Condor pool (500x).

  19. Compatibility of Abstractions? [Figure: Map-Reduce, Classify, and All-Pairs each implemented directly on the assembly language layer.]

  20. Compatibility of Abstractions? • Mismatch: Classify partitions logically; Map-Reduce partitions physically. • Mismatch: Map-Reduce relies on data partition; All-Pairs relies on data re-use. [Figure: layering All-Pairs and Classify on top of Map-Reduce (marked ???) instead of directly on the assembly language.]

  21. Compatibility of Abstractions? [Figure: Swift and Dryad placed above Map-Reduce, Classify, and All-Pairs on the assembly language layer, suggesting a spectrum: more general, less optimized?]

  22. From Clouds to Multicore • Next Step: an All-Pairs implementation that runs well on a single CPU, multicore, cloud, or cloud of multicores. [Figure: the abstraction stack (Swift, Dryad, Map-Reduce, Classify, All-Pairs over the assembly language) mapped onto both a cloud of machines and a multicore node with shared RAM.]

  23. Acknowledgments • Cooperative Computing Lab: http://www.cse.nd.edu/~ccl • Grad Students: Chris Moretti, Hoang Bui, Michael Albrecht, Li Yu • Undergraduate Students: Mike Kelly, Rory Carmichael, Mark Pasquier, Christopher Lyon, Jared Bulosan • NSF Grants CCF-0621434, CNS-0643229
