David Oppenheimer

A case for resource discovery in shared distributed platforms, David Oppenheimer, UCB ROC Retreat, 12 January 2005

Introduction • Application performance is a function of • resources available to the application • resources needed by the application, or "application sensitivity to resource constraints" • At the summer retreat, described SWORD • at application deployment time, find the best set of nodes given • resources available on a set of distributed nodes • application sensitivity to resource constraints • Assumptions • (1) available resources vary among nodes enough to matter: spare CPU, memory, disk space; inter-node latency, available bandwidth; ... • (2) applications are sensitive enough to resource constraints to matter • Focus of this talk: verify assumption (1)

Introduction (cont.) • Questions we will address • is there enough variation among nodes at any given (deployment) time to justify service placement? • is there enough variation over time on a single node to justify periodic task migration? • are there correlations between attributes on a single node, or among nodes at the same site? • All of these questions are important in designing a system for resource discovery and service placement (like SWORD)

Outline • How much does the available amount of per-node resources vary among nodes at a fixed time? • How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time? • On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated?

Experimental environment • Per-node attributes: Ganglia, CoMon • two-week period (Oct 10-Oct 24, 2004) • each node polled every 5 minutes • free memory, free swap, free disk, load average, network bytes sent and received per second, number of active slices • Inter-node latency: all-pairs pings • one-month period ending Oct 24, 2004 • each pair of nodes measured every 15 minutes • Inter-node bandwidth: Iperf • one-month period ending Oct 24, 2004 • each pair of nodes measured 1-2 times per week • About 250 nodes in the trace each day

Resource heterogeneity: averages • How much do available resources vary over the trace?

Resource heterogeneity: coefficient of variation (CV) vs. time
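A minimal sketch of how a CV-vs.-time curve could be computed for one attribute: at each polling time, take the standard deviation of the attribute across nodes and divide by the mean across nodes. The function name `cv_across_nodes`, the data layout, and the sample readings are assumptions for illustration; the slides do not say how the plotted values were produced.

```python
from statistics import mean, pstdev

def cv_across_nodes(samples):
    """Return {timestamp: CV across nodes} for one attribute.

    samples maps a polling timestamp to the list of values reported
    by all nodes at that time (assumed layout, not from the slides).
    """
    result = {}
    for t, values in sorted(samples.items()):
        m = mean(values)
        # CV is undefined when the mean is zero.
        result[t] = pstdev(values) / m if m != 0 else float("nan")
    return result

# Hypothetical free-memory readings (MB) from four nodes,
# polled every 300 seconds.
free_mem = {
    0: [100.0, 100.0, 100.0, 100.0],   # identical nodes: CV = 0
    300: [50.0, 150.0, 50.0, 150.0],   # mean 100, std 50: CV = 0.5
    600: [10.0, 20.0, 30.0, 40.0],
}
cv = cv_across_nodes(free_mem)
for t, value in cv.items():
    print(t, round(value, 3))
```

A CV that stays high at every timestamp means nodes differ a lot at any given deployment time, which is the condition under which choosing nodes carefully pays off.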

Variability of per-node attributes over time

Variability of per-node attributes over time • Can rank the degree of variability of each attribute • disk, swap < memory, load < network bytes; number of slices moderate to significant • CDF curve shifts to the right as interval length increases • attributes vary less over short time periods than over long ones • migration interval: find the "sweet spot" in the curve of variability vs. interval length • CDF slope decreases as the median variability of the attribute increases • may be able to classify nodes as high or low variability over time for memory, load, and network bytes (they have high median variability)
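The relationship between interval length and variability can be sketched as follows. The variability metric here is an assumption, since the slides do not define it: for each node, take the CV within consecutive non-overlapping windows of a given length and use the median across windows, then build an empirical CDF across nodes. The names `window_cv` and `empirical_cdf` and the traces are hypothetical.

```python
from statistics import mean, median, pstdev

def window_cv(series, window):
    """Median CV over consecutive non-overlapping windows of `window` samples."""
    cvs = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        m = mean(chunk)
        if m != 0:
            cvs.append(pstdev(chunk) / m)
    return median(cvs) if cvs else float("nan")

def empirical_cdf(values):
    """Return sorted (value, fraction of nodes with variability <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

# Hypothetical load traces: node-a changes level once, node-b oscillates.
traces = {
    "node-a": [10.0, 10.0, 10.0, 10.0, 20.0, 20.0, 20.0, 20.0],
    "node-b": [10.0, 30.0, 10.0, 30.0, 10.0, 30.0, 10.0, 30.0],
}
short = {name: window_cv(s, 2) for name, s in traces.items()}
long_ = {name: window_cv(s, 8) for name, s in traces.items()}

print("short intervals:", empirical_cdf(short.values()))
print("long intervals: ", empirical_cdf(long_.values()))
```

In this toy data node-a is perfectly stable over short windows but not over the full trace, so its point moves right as the interval grows; that is the rightward CDF shift the slide describes.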

Inter-node latency and bandwidth variation over time • Most nodes have low latency (and bandwidth) variability even over a month-long trace • migration may not be worthwhile

Correlation among per-node attributes • No strong correlations between different attributes • though some one-hour trace segments showed some • Some correlation between nodes at the same site
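One way to check for correlated attributes on a node is to compute the Pearson coefficient for every attribute pair. This is a sketch under assumptions: the slides do not state which correlation measure was used, and the attribute names and samples below are invented.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical samples of three attributes on a single node.
node = {
    "load": [1.0, 2.0, 3.0, 4.0],
    "free_mem": [400.0, 300.0, 200.0, 100.0],
    "free_disk": [5.0, 7.0, 7.0, 5.0],
}
pairs = {(a, b): pearson(node[a], node[b]) for a, b in combinations(node, 2)}
for (a, b), r in pairs.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

The same function applies to the per-site question by passing the same attribute from two nodes at one site instead of two attributes from one node.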

Correlation between latency and available bandwidth (r = -0.59) • Moderate inverse power-law correlation • Using latency to estimate bandwidth gives 233% error • some nodes are bandwidth-capped, some in weird ways • Some node pairs showed strong latency-bandwidth correlation • 17% within 25%, 56% within 50%
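An inverse power-law relationship, bandwidth ≈ c · latency^k with k < 0, can be fitted by least squares in log-log space and then scored by relative prediction error. The sketch below assumes that fitting method and that error definition; neither is confirmed by the slides. The function names and the synthetic measurements are hypothetical.

```python
from math import exp, log

def fit_power_law(latencies, bandwidths):
    """Least-squares fit of log(bw) = log(c) + k * log(lat); returns (c, k)."""
    xs = [log(v) for v in latencies]
    ys = [log(v) for v in bandwidths]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    k = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    c = exp(my - k * mx)
    return c, k

def fraction_within(latencies, bandwidths, c, k, tolerance):
    """Fraction of pairs whose predicted bandwidth is within `tolerance`
    (relative error, e.g. 0.25 for 25%) of the measured bandwidth."""
    hits = 0
    for lat, bw in zip(latencies, bandwidths):
        predicted = c * lat ** k
        if abs(predicted - bw) / bw <= tolerance:
            hits += 1
    return hits / len(latencies)

# Synthetic node-pair measurements that follow bw = 800 / latency exactly.
lat_ms = [10.0, 20.0, 40.0, 80.0]
bw_mbps = [80.0, 40.0, 20.0, 10.0]
c, k = fit_power_law(lat_ms, bw_mbps)
print(f"c = {c:.1f}, k = {k:.2f}")
print("within 25%:", fraction_within(lat_ms, bw_mbps, c, k, 0.25))
```

With real traces the fit would be far noisier than this synthetic case; bandwidth-capped nodes in particular would sit well below the fitted curve at low latencies.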

Conclusion • How much does the available amount of per-node resources vary among nodes at a fixed time? Significantly; enough to warrant service placement • How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time? Moderate variability; may warrant migration • On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated? No strong correlation between different attributes; some correlation for the same attribute among nodes at the same site; latency can predict available bandwidth

Future work • Ask same questions but use application model to answer, rather than analysis of raw data • different apps have different resource sensitivities • different apps have different migration costs • Can we predict attribute values? • give warning before migration • or just don’t bother to deploy on “bad” nodes • How much “better” could we do if SWORD could schedule jobs?
