Geo-Distributed Cloud Computing
Motivation
• Data-intensive multi-sensor computations
• Bandwidth trumps computing
• Need data reductions
• Many emerging problems
Proposal
• Cloud to distribute data geographically
• Map/Reduce paradigm as a programming model
• Exhibits properties that scale well over long distances
• Provide heterogeneous multicore processing environments
• Programming strategies for geo-distributed software
• Ad hoc membership
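To make the Map/Reduce programming model concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and not the authors' implementation; the sensor records and the function names (`map_reduce`, `sensor_mapper`, `max_reducer`) are assumptions chosen for illustration. It shows the data reduction the proposal relies on: many raw readings go in, one value per sensor comes out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # hypothetical (sensor_id, measurement) pair


def map_reduce(
    records: Iterable[Record],
    mapper: Callable[[Record], Iterator[Tuple[str, float]]],
    reducer: Callable[[str, List[float]], float],
) -> Dict[str, float]:
    """Run a map phase, group values by key, then run a reduce phase."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def sensor_mapper(record: Record) -> Iterator[Tuple[str, float]]:
    """Emit each reading keyed by its sensor."""
    sensor_id, value = record
    yield sensor_id, value


def max_reducer(key: str, values: List[float]) -> float:
    """Reduce all readings for one sensor to the peak value."""
    return max(values)


# Illustrative data only; not taken from the presentation.
readings = [("s1", 3.0), ("s2", 7.5), ("s1", 9.1), ("s2", 2.2)]
peaks = map_reduce(readings, sensor_mapper, max_reducer)
print(peaks)
```

Only the reduced result (one number per sensor) would need to leave the site that holds the raw readings, which is why the model suits bandwidth-limited, geographically distributed data.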
Strategy 1
• Single global Sun Grid Engine (SGE) installation
• Single global Hadoop Distributed File System (HDFS) installation
• Jobs submitted to SGE are distributed among nodes provided by various institutions
• SSH login
Layout [diagram: Institution A, Institution B, and Institution C, each with compute nodes, under a single SGE master]
(Dis)Advantages
• Advantages
  • Software already exists
  • Simple installation
• Disadvantages
  • Requires central authority
  • Might not maximize data locality
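As a rough illustration of Strategy 1, the sketch below builds an SGE `qsub` command line in Python. The flags `-N` (job name), `-cwd` (run in the current working directory), and `-q` (target queue) are standard `qsub` options; the job name, the script name, and the per-institution queue `ucsd.q` are hypothetical, and the slides do not say how queues would be organized. The command is only constructed, not executed.

```python
from typing import List, Optional


def build_qsub_command(
    job_name: str, script: str, queue: Optional[str] = None
) -> List[str]:
    """Build the argument list for submitting a script to the SGE master."""
    cmd = ["qsub", "-N", job_name, "-cwd"]
    if queue is not None:
        # Assumption: each institution's nodes are grouped into a named queue.
        cmd += ["-q", queue]
    cmd.append(script)
    return cmd


# Hypothetical job and queue names, for illustration only.
cmd = build_qsub_command("reduce_sensors", "reduce_sensors.sh", queue="ucsd.q")
print(" ".join(cmd))
```

Leaving out the queue lets the single global scheduler place the job on any institution's nodes, which is simple but is also where the data-locality disadvantage comes from.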
Strategy 2
• Separate installations at each institution
• Controller connects installations
• Initiates separate Map/Reduce for spatially local data
• Data transferred from one system to another as needed
Layout [diagram: Institution A, Institution B, and Institution C, each with its own compute nodes and its own cloud service]
Advantages
• Institutions in full control over own nodes
• Independent Map/Reduce operations at each institution
• Reduced bandwidth between sites
• Controller provides simpler programming
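The controller idea in Strategy 2 can be sketched as follows. This is a single-process stand-in under stated assumptions, not the authors' controller: the site names, the sample readings, and the functions `local_map_reduce` and `controller` are hypothetical. Each site reduces its own data to a small (sum, count) summary per sensor, and the controller merges only those summaries.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, float]               # hypothetical (sensor_id, measurement)
Partial = Dict[str, Tuple[float, int]]   # sensor_id -> (sum, count)

# Illustrative per-institution data; in a real deployment it stays on-site.
SITE_DATA: Dict[str, List[Record]] = {
    "institution_a": [("temp", 20.0), ("temp", 22.0)],
    "institution_b": [("temp", 24.0), ("wind", 5.0)],
    "institution_c": [("wind", 7.0)],
}


def local_map_reduce(records: List[Record]) -> Partial:
    """Stand-in for a Map/Reduce job run inside one institution."""
    partial: Partial = {}
    for sensor_id, value in records:
        total, count = partial.get(sensor_id, (0.0, 0))
        partial[sensor_id] = (total + value, count + 1)
    return partial


def controller(site_data: Dict[str, List[Record]]) -> Dict[str, float]:
    """Merge the small per-site summaries into global means."""
    merged: Partial = {}
    for site, records in site_data.items():
        # Only this reduced summary would cross the wide-area link.
        for sensor_id, (total, count) in local_map_reduce(records).items():
            prev_total, prev_count = merged.get(sensor_id, (0.0, 0))
            merged[sensor_id] = (prev_total + total, prev_count + count)
    return {s: total / count for s, (total, count) in merged.items()}


global_means = controller(SITE_DATA)
print(global_means)
```

Sums and counts are used instead of per-site means because they can be merged exactly; averaging the site means would give the wrong answer when sites hold different numbers of readings.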
Conclusions
• Isolated compute resources for local cloud machinery
  • UCSD: 8 QS20
  • UMBC: 8 JS20
• Installation of SGE and Hadoop has begun
• By mid-June, cloud with SGE and Hadoop operational on isolated resources