VO-Ganglia Grid Simulator

Catalin Dumitrescu, Mike Wilde, Ian Foster
Computer Science Department, The University of Chicago

Talk Overview
• Part I: The Grid-enabled Monitoring Tool
• Part II: From Monitoring to Simulation
• Part III: Features / Extended Model
• Shortcomings
• Future Work / Conclusions

VO-Ganglia / Grid-enabled Monitoring
• P2P Reporting
  • Implicit hierarchical infrastructures
• Interface with Other Monitoring Tools
  • Nagios, MDS2
• Grid/Globus Specific Metrics
  • Gatekeeper Information / Cluster RM Status
• Per VO Monitoring Support
  • Collected metrics are aggregated and also VO-specific
• Resource Management
  • Preference Specifications
  • Usage Policy Enforcement

Best Snapshot (1)

Best Snapshot (2)

Why Continue on This Path?
• Implemented Ideas
  • VO-based Metric Reporting
  • Usage Policy Metric Incorporation
  • Distributed Infrastructure for Usage Policy
• Time Spent on Development
  • Enhanced Monitoring: ~3 months
  • Policy: ~6 months
  • Simulator: ~3 months
• Are Other Alternatives Around?
  • MonALISA
  • Standard Ganglia

From Monitoring to Simulation
• Always Difficult to Find Acceptable Grid Testbeds
  • Deployment Takes Time
  • Computing Time Is an Issue in Production Environments
• What Do Some Well-Known Testbeds Offer Today?
  • Grid3: many clusters with similar software AND Globus
  • PlanetLab: individual machines with similar characteristics

Features / Implemented Model
• CPU Management / Task Assignment Policies
• Disk Management / Space Assignment Policies
• Network Management / Maximum Capacity (so far)
• Usage Policy Specification Interface
• Data File Management (replica selection problem)

Implementation Details
• Before:
  • Metric collection by means of specific collectors
• Now:
  • Special modules that generate metrics about different loads
  • Similar to a discrete simulator but integrated with a real tool
• “How exactly?”
  • Periodic invocations (instead of monitoring collectors)
  • State management for workloads, data file migration, CPU and disk allocations, network usage
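The "periodic invocations plus state management" idea can be sketched as follows. This is only an illustrative sketch in Python, not the VO-Ganglia code itself: the class `CpuLoadModule`, the metric names, and the fixed time step `dt` are all assumptions made for the example. The module keeps its own state (running and queued jobs) and produces one set of metrics each time it is invoked, where a monitoring collector would have measured a real machine.

```python
from dataclasses import dataclass, field


@dataclass
class CpuLoadModule:
    """Hypothetical load-generating module that stands in for a CPU collector."""
    total_cpus: int
    running: list = field(default_factory=list)  # remaining seconds per running job
    queue: list = field(default_factory=list)    # durations of jobs still waiting

    def submit(self, duration):
        """Add a job of the given duration (seconds) to the waiting queue."""
        self.queue.append(duration)

    def step(self, dt):
        """One periodic invocation: advance state by dt seconds, return metrics."""
        # Age the running jobs and drop the ones that have finished.
        self.running = [r - dt for r in self.running if r - dt > 0]
        # Start waiting jobs while CPUs are free (first come, first served).
        while self.queue and len(self.running) < self.total_cpus:
            self.running.append(self.queue.pop(0))
        return {
            "cpu_busy": len(self.running),
            "cpu_free": self.total_cpus - len(self.running),
            "jobs_queued": len(self.queue),
        }


def run(module, steps, dt):
    """Invoke the module periodically on a simulated clock; collect the metrics."""
    return [module.step(dt) for _ in range(steps)]
```

Using a simulated clock instead of real sleeps keeps a run fast and repeatable; in a setup like the one on the slide, the returned metrics would be handed to the monitoring tool in place of collector output.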

Running Examples


Distributed Simulations
• Idea: Is it possible to run several simulators on different machines and configure each instance to report to a set of specified neighbors?
• Advantages:
  • Simplicity in connecting several local simulators working on different data
  • Support for metric distribution and visualization
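A minimal sketch of the neighbor-reporting idea, assuming nothing about the actual transport: here the "machines" are plain Python objects in one process, and the names `SimNode`, `report`, and `receive` are hypothetical. Each instance pushes its latest metrics to its configured neighbors and keeps a view of whatever it has received.

```python
class SimNode:
    """Hypothetical simulator instance that reports to a fixed set of neighbors."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # other SimNode objects this instance reports to
        self.view = {}       # latest metrics known here, keyed by source name

    def report(self, metrics):
        """Record local metrics and push them to every configured neighbor."""
        self.view[self.name] = metrics
        for neighbor in self.neighbors:
            neighbor.receive(self.name, metrics)

    def receive(self, source, metrics):
        """Store the most recent metrics reported by another instance."""
        self.view[source] = metrics


# Example wiring: a reports to b, b reports to a and c, c reports to nobody.
a, b, c = SimNode("a"), SimNode("b"), SimNode("c")
a.neighbors = [b]
b.neighbors = [a, c]
a.report({"cpu_busy": 2})
b.report({"cpu_busy": 5})
```

After these two reports, `a` and `b` each hold both sets of metrics, while `c` only knows about `b`. Metrics are not forwarded beyond direct neighbors in this sketch.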

Running Examples [...]

Commitment Usage Policy
for each Gi with EPi, BPi, BEi do
    # Case 1: fill BPi + BEi
    if (sum(BAj) == 0) & (BAi < BPi) & (Qi has jobs) then
        schedule a job from some Qi to the least loaded site
    # Case 2: BAi < BPi (resources available)
    else if (sum(BAk) < TOTAL) & (BAi < BPi) & (Qi has jobs) then
        schedule a job from some Qi to the least loaded site
    # Case 3: fill EPi (resource contention)
    else if (sum(BAk) == TOTAL) & (BAi < EPi) & (Qi exists) then
        if (j exists such that BAj >= EPj) then
            stop scheduling jobs for VOj
    # Need to fill with extra jobs?
    if (BAi < EPi + BEi) then
        schedule a job from some Qi to the least loaded site
    # ??
    if (EAi < EPi) & (Qi has jobs) then
        schedule additional backfill jobs
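The three numbered cases can be sketched in Python as a single decision function. The slide does not define its symbols, so the reading used here is an assumption: EPi and BPi are taken as a group's epoch and burst shares (percent of the site), BAi as its current burst allocation, and "Qi exists" is treated the same as "Qi has jobs". The names `Group` and `decide` are hypothetical, and the two trailing steps (extra jobs and backfill) are left out because their nesting is not clear from the slide.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    ep: int          # EPi, assumed: epoch (long-term) share, in percent
    bp: int          # BPi, assumed: burst (short-term) share, in percent
    ba: int = 0      # BAi, assumed: current burst allocation, in percent
    queued: int = 0  # number of jobs waiting in Qi


def decide(i, groups, total=100):
    """Return (case, action) for group i under Cases 1-3 of the slide."""
    g = groups[i]
    used = sum(x.ba for x in groups)  # sum(BAk) over all groups
    has_jobs = g.queued > 0
    # Case 1: site is idle and the group is below its burst share.
    if used == 0 and g.ba < g.bp and has_jobs:
        return ("case 1", "schedule")
    # Case 2: site has spare capacity and the group is below its burst share.
    if used < total and g.ba < g.bp and has_jobs:
        return ("case 2", "schedule")
    # Case 3: site is full and the group is below its epoch share, so any
    # other group at or above its own epoch share stops receiving jobs.
    if used == total and g.ba < g.ep and has_jobs:
        over = [x.name for x in groups if x is not g and x.ba >= x.ep]
        if over:
            return ("case 3", "stop " + ", ".join(over))
    return ("none", "wait")
```

For example, with two groups at shares EP = 40/60 and BP = 60/80, an idle site gives `("case 1", "schedule")` for the first group; once the site is full at allocations 30/70, the same call gives `("case 3", "stop VO2")`. The share values are made up for illustration.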

Usage Policy Example
[Figure: usage policy example for VO1 and VO2, with levels marked at 99%, 90%, 80%, 60%, and 20%]

Commitment Policy in Practice

Current Issues
• RRD / Disk Access
• Perl / Interpreted Language Speed
• Result Interpretation
• Result Validation in Real Contexts

Future Work
• “What Is Next?”
  • More work on Resource Usage Policy Analysis
  • “Export” ideas from VO-Ganglia into real practice

Conclusions
• “Why Is VO-Ganglia So ‘Cool’ for Me?”
  • Some creative ideas
  • Easy to use
  • “Possibility to run on my laptop”
  • Provisioning tools for
    • Workload generation
    • Result formatting
• “Why Did I Invest More Than a Year in Developing It?”

Questions / Suggestions?
