Information and Scheduling: What's available and how does it change

Jennifer M. Schopf, Argonne National Lab

Information and Scheduling • How a scheduler works is closely tied to the information available • Choice of algorithm depends on the accessible data

This Talk • What approaches expect from information • What data is actually available, and some open questions • How data changes • What to do about changing data

NB • I’m speaking (pessimistically) from my own background • We’ve heard some talks earlier today (for example PACE) which address some of these problems • I still think these are interesting open issues to think about

Information systems (NOTE: taken from my standard MDS2 talk) • Information is always old • Time of flight, changing system state • Need to provide quality metrics • Distributed system state is hard to obtain • Information is not contemporaneous (thanks j.g.) • Complexity of global snapshot • Components will fail • Scalability and overhead • Approaches are changed for scalability; this will affect the information available

Scheduling approaches assume • A lot of data is available • All information is accurate • Values don’t change

What some people expect • Perfect bandwidth info • Number of operations in an application • Scalar value of computer “power” • Mapping of “power” to applications • Perfect load information

Bandwidth data • Network Weather Service (Wolski, UCSB) • 64k probe bandwidth data • Latency data • Predictions • PingER (Les Cottrell, SLAC) • Create long-term baselines for expectations on means/medians and variability for response time, throughput, packet loss • Predicting TCP performance • Allen Downey • http://allendowney.com/research/tcp/ • But what do Grid applications need?

Perfect Bandwidth Data • 64k probes don’t look like large file transfers • [Figure: LBL-ANL GridFTP end-to-end bandwidth (approximately 400 transfers at irregular intervals) and NWS probe bandwidth (approximately 1,500 probes every five minutes) for the two-week August ’01 dataset]

Predicting Large File Transfers • Vazhkudai and Schopf: use GridFTP logs and some background data (NWS, ioStat) (HPDC 2002) • Error rate of ~15% • M. Faerman, A. Su, R. Wolski, and F. Berman (HPDC 99) • Similar results for SARA data • Hu and Schopf: use an AI learning technique on GridFTP log files only (not published yet) • Picks the best place to get a file from 60-80% of the time; using averages only gives you ~50% “best chosen” • This topic needs much more study!
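The "using averages only" baseline mentioned on this slide can be sketched roughly as follows. This is not the method from either cited paper; the log record layout (source, size in MB, elapsed seconds), the site names, and the `window` parameter are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def mean_bandwidth_by_source(transfers, window=5):
    """Mean observed MB/s over the last `window` transfers from each source.

    `transfers` is an iterable of (source, size_mb, seconds) tuples in
    chronological order. This record layout is hypothetical, not the
    GridFTP log format.
    """
    history = defaultdict(list)
    for source, size_mb, seconds in transfers:
        history[source].append(size_mb / seconds)
    return {src: mean(rates[-window:]) for src, rates in history.items()}

def pick_source(transfers, window=5):
    """Choose the source whose recent transfers had the highest mean rate."""
    estimates = mean_bandwidth_by_source(transfers, window)
    return max(estimates, key=estimates.get)

# Hypothetical log entries: (source, size in MB, elapsed seconds)
log = [
    ("siteA", 100, 20.0),    # 5 MB/s
    ("siteB", 100, 10.0),    # 10 MB/s
    ("siteA", 500, 50.0),    # 10 MB/s
    ("siteB", 500, 100.0),   # 5 MB/s
    ("siteB", 1000, 100.0),  # 10 MB/s
]

best = pick_source(log)  # "siteB": mean of 8.33 MB/s against 7.5 MB/s for siteA
```

A baseline like this ignores file size and time of day, which may be part of why averages alone pick the best source only about half the time.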

Data Generally Available From an Application • What some scheduling approaches want: • Number of ops in an application • Exact execution time on a platform • Perfect models of applications

Application Data Currently Available • Bad models of applications • No models of applications • Some work (Prophesy, Taylor at Texas A&M) does logging to create models • Many interesting applications have non-deterministic run times • User estimates of application run time are (historically) off by 20%+ • We need to be able to figure out ways to do predictions of application run times WITHOUT models

Scalar value of computer “power” • MDS2 gives me: • CPU vendor, model and version • CPU speed • OS name, release and version • RAM size • Node count • CPU count • Where is “compute power” in this data?

What is compute “power” • I could get benchmark data, but what’s the right benchmark(s) to use? • Computer “power” simply isn’t scalar, especially in a Grid environment • Goal is really to understand how an application will run on a machine • Given three different benchmarks, three different platforms will perform very differently: one is best on BM1, another is best on BM2

Mapping “power” to applications • Many scheduling approaches assume “power” is a scalar: just multiply it by the set application time and we’re set • Only problem: • Power isn’t a scalar • No one knows absolute application run times • Mapping will NOT be straightforward • We need a way to estimate application time on a contended system

Perfect Load Information • MDS2 gives me: • Basic queue data • Host load 5/10/15 min avg • Last value only

Load Predictions • Network Weather Service • 12+ prediction techniques • Work on any time series • Expect regularly arriving data • Only a prediction of the next value • *I* want to know what load is going to be like in 20 mins • Or the AVERAGE over the next 20 mins?
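One way to get at the "average over the next 20 mins" question is to aggregate the raw series into interval means first, then apply a next-value predictor to the aggregated series. The sketch below assumes this approach; it does not use the NWS API, and the function names, the 5-minute sampling rate, and the simple "mean of recent intervals" predictor are illustrative choices only.

```python
from statistics import mean

def interval_means(samples, per_interval):
    """Average consecutive, non-overlapping groups of `per_interval` samples.

    The oldest leftover samples are dropped so that the last group always
    ends at the most recent measurement.
    """
    start = len(samples) % per_interval
    return [
        mean(samples[i:i + per_interval])
        for i in range(start, len(samples), per_interval)
    ]

def predict_next_interval_mean(samples, per_interval, history=3):
    """Predict the mean load over the NEXT interval.

    Stand-in predictor: the mean of the last `history` interval means.
    Any next-value technique could be applied to the aggregated series.
    """
    means = interval_means(samples, per_interval)
    if not means:
        raise ValueError("need at least one full interval of samples")
    return mean(means[-history:])

# Hypothetical host load, one sample every 5 minutes (4 samples = 20 minutes)
load = [0.2, 0.4, 0.6, 0.8,
        1.0, 1.0, 1.0, 1.0,
        0.5, 0.5, 0.5, 0.5]

forecast = predict_next_interval_mean(load, per_interval=4)
```

Aggregating first means the prediction target matches what the scheduler actually cares about (load over the run), at the cost of needing a longer history.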

Information and Scheduling • What approaches expect us to have • What we actually have access to • How it changes • What to do about changing data

Dedicated SOR Experiments • Platform: 2 Sparc 2s, 1 Sparc 5, 1 Sparc 10 • 10 Mbit Ethernet connection • Quiescent machines and network • Prediction within 3% before memory spill

Non-dedicated SOR results • Available CPU on workstations varied from 0.43 to 0.53

SOR with Higher Variance in CPU Availability

Improving predictions • Available CPU has range of 0.48 +/- 0.05 • Prediction should also have a range
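A rough illustration of a prediction with a range: propagate the CPU availability range through a run-time estimate. The model here (run time equals dedicated run time divided by the available CPU fraction) is a simplifying assumption, not the structural model used in the SOR experiments, and the 100-second dedicated time is made up.

```python
def runtime_range(dedicated_seconds, cpu_mean, cpu_spread):
    """Return (best, nominal, worst) run-time estimates in seconds.

    Assumes run time scales inversely with the available CPU fraction,
    which lies in [cpu_mean - cpu_spread, cpu_mean + cpu_spread].
    """
    low_cpu = cpu_mean - cpu_spread
    high_cpu = cpu_mean + cpu_spread
    if low_cpu <= 0:
        raise ValueError("CPU availability range must stay above zero")
    best = dedicated_seconds / high_cpu      # most CPU available
    nominal = dedicated_seconds / cpu_mean
    worst = dedicated_seconds / low_cpu      # least CPU available
    return best, nominal, worst

# Available CPU of 0.48 +/- 0.05, hypothetical 100 s dedicated run time
best, nominal, worst = runtime_range(100.0, 0.48, 0.05)
```

Note that the resulting range is not symmetric around the nominal value: losing 0.05 of CPU costs more time than gaining 0.05 saves.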

Scheduling needs to consider variance • Conservative Scheduling: Using Predicted Variance to Improve Scheduling Decisions in Dynamic Environments • Lingyun Yang, Jennifer M. Schopf, Ian Foster • To appear at SC'03, November 15-21, 2003, Phoenix, Arizona, USA • www.mcs.anl.gov/~jms/Pubs/lingyun-SC-scheduling.pdf

Scheduling with Variance • Summary: Scheduling with variance can give better mean performance and less variance in overall execution time
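The general idea of scheduling with variance can be sketched as a data-partitioning rule that discounts each machine's predicted CPU availability by its predicted variability before dividing the work. This is a sketch in the spirit of the cited paper, not its algorithm; the discount rule (mean minus `caution` standard deviations), the machine names, and the numbers are assumptions.

```python
def conservative_shares(total_work, machines, caution=1.0):
    """Split `total_work` in proportion to conservatively discounted capacity.

    `machines` maps a name to (mean_cpu_available, std_cpu_available).
    Effective capacity is mean - caution * std, floored at zero, so a
    machine with highly variable availability receives less work than
    a steady machine with the same mean.
    """
    effective = {
        name: max(mean_cpu - caution * std_cpu, 0.0)
        for name, (mean_cpu, std_cpu) in machines.items()
    }
    total = sum(effective.values())
    if total == 0:
        raise ValueError("no machine has usable capacity")
    return {name: total_work * cap / total for name, cap in effective.items()}

# Two hypothetical machines with the same mean availability
machines = {"steady": (0.5, 0.0), "bursty": (0.5, 0.25)}

shares = conservative_shares(300, machines)                  # steady gets twice as much
mean_only = conservative_shares(300, machines, caution=0.0)  # equal split
```

Setting `caution=0.0` recovers a mean-only allocation, which makes it easy to compare the two policies on the same predictions.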

Lessons: • We need work predicting large file transfers – NOT bandwidth • We need to be able to figure out ways to do predictions of application run times WITHOUT models • We need predictions over time periods – not just a next value • We need a way to represent the “power” of a machine that takes variance into account • We need a way to map power to application behavior • We need better scheduling approaches that take variance into account

Contact Information • Jennifer M. Schopf • jms@mcs.anl.gov • www.mcs.anl.gov/~jms • Links to some of the publications mentioned • Links to the co-edited book “Grid Resource Management: State of the Art and Future Trends”
