1. Grid Performance Engineering
Mark Baker (mark.baker@computer.org)
Distributed Systems Group
University of Portsmouth
http://dsg.port.ac.uk/
6 December, 2002
2. Contents
Background,
Observations,
Suggested areas to investigate,
Conclusions.
3. Background (late 80s)
First started with transputers, here in Edinburgh:
Oil reservoir simulations,
Parallelised part of the simulator, via a task farm,
OCCAM + Fortran 77,
Wanted to show the benefits of parallel systems.
In the early 90s, a lot of work showed that the early parallel systems gave the “right” order of benefit to justify their use, across various applications: CFD, FEM, MD, graphics…
4. Background (early 90s)
In 1993, became a member of the group assigned to benchmark a set of parallel machines for a UK acquisition:
TMC-CM5, KSR2, Meiko, Cray & Intel,
30-ish codes - low-level, kernels, and full applications + system-level scripts,
Code in Fortran with PARMACS,
Vendors also used CMF and Cray-PVM versions,
Purpose of the exercise was to choose the best machine to act as a general purpose UK HPC platform.
5. Background (mid 90s)
Worked with Roger, Tony and others on the Genesis benchmark suite:
Set of low-level, kernels + apps
Fortran with PARMACS – later MPI.
Purpose of the codes was to provide a standardised and recognised set of codes for the evaluation of parallel systems.
6. Background (late 90s)
Looking at MPI on MS Windows:
Used a small number of low-level Fortran and C codes.
Aim of this work was to understand the potential of Windows clusters against UNIX ones.
Show performance hits and bottlenecks.
Started working on PEMCS – see dsg.port.ac.uk/journals/PEMCS/
Slight digression here!
7. PEMCS
8. Background (late 90s)
Worked with Geoffrey and colleagues on mpiJava (MPJ) - a Java MPI-like message passing interface:
Benchmarking aims were related to:
Compare and contrast performance of MPJ on different platforms.
Show that the “hit” of using MPJ was not so great.
9. Background (early 2000s)
Looking at the effects of working with widely distributed infrastructure and resources:
Stressing the Jini LUS in order to understand its capabilities and performance, wanted answers to questions like:
How many objects can be stored?
How many clients can simultaneously access the LUS?
How long does a search take?
What are the effects on system performance of the lease-renewal cycle…
Aim of recent efforts with performance testing is understanding the capabilities and configuration of components in a distributed environment.
10. Some Observations
Machine architectures have become increasingly complicated:
Interconnects, memory hierarchy, caching…
Greater inter-dependence of different systems components (h/w and s/w).
Performance metrics vary depending on the stakeholder's viewpoint:
CPU, Disk IO, out-of-core, Graphics, Comms…
No ONE benchmark (or suite of) suits everyone – in the end it depends on the stakeholder and their application needs.
Increasing recognition, as we move to a distributed infrastructure, of the need to understand the individual components that it consists of!
11. Some Observations
Few funded efforts to understand system performance; most are unfunded, voluntary efforts: Genesis, Parkbench…
Often trying to compare apples with pears, such as comparing common operations in HPF and MPI.
Lots of knowledge and expertise in the traditional benchmarking areas, where we are trying to understand single CPUs, SMPs and MPPs – not suggesting this is a solved problem area though!
The increasing popularity of wide area computing means we need to revisit what we mean by performance evaluation and modelling.
Now – a view of the Grid…
12. [Figure: a view of the Grid]
13. Grid-based systems
Assuming I want to run an application on the Grid:
I’d be fairly happy to pick up semi-standard benchmarks to look at the performance of a system at a Grid site (say a cluster or SP2).
There are a bunch of tools for looking at and analysing TCP/IP performance – mainly via ICMP.
Obviously these only show past performance!
Without real QoS, performance must be a guess!
I’ve no real idea of the performance or capabilities of the software components that make up the Grid infrastructure!
Such as agents and brokers for scheduling and caching, or communications.
14. Grid-Systems
We need metrics and measurements that help us understand Grid-based systems.
This will help reveal to us the factors that will affect the way we configure and use the Grid.
From a CS perspective some key areas for further investigation are:
Inter-Grid site communications.
Information Services,
Metadata processing,
Events and Security…
15. Inter-Grid site communications
Communications performance – simple bandwidth, latency and jitter measurements between grid sites.
Maybe GridFTP tests:
Did something similar for the EuroPort project back in mid 90s.
Speed and latency change on a minute-by-minute basis (diurnal cycle).
Perhaps explore staging and caching!
Data can be used to predict inter-site communications capabilities.
Performance of HTTP tunnelling protocols, maybe via proxy servers.
SOAP Benchmarks, performance and processing.
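The bandwidth, latency and jitter measurements named at the top of this slide could be summarised along the following lines. This is a minimal sketch, not a method from the talk: the function name `summarise_link`, its parameters and the sample figures are all hypothetical, and jitter is taken here as the mean absolute difference between consecutive round-trip times, which is only one of several definitions in use.

```python
from statistics import mean


def summarise_link(rtts_s, bytes_sent, transfer_time_s):
    """Summarise one batch of measurements between two sites.

    rtts_s          -- round-trip times in seconds, in the order measured
    bytes_sent      -- size of a bulk transfer in bytes
    transfer_time_s -- wall-clock time that transfer took, in seconds
    """
    if len(rtts_s) < 2:
        raise ValueError("need at least two RTT samples to estimate jitter")
    if transfer_time_s <= 0:
        raise ValueError("transfer time must be positive")
    # Latency: mean round-trip time over the batch.
    latency = mean(rtts_s)
    # Jitter (assumed definition): mean absolute difference between
    # consecutive round-trip samples.
    jitter = mean(abs(b - a) for a, b in zip(rtts_s, rtts_s[1:]))
    # Bandwidth: achieved throughput of the bulk transfer, in bits per second.
    bandwidth = 8 * bytes_sent / transfer_time_s
    return {"latency_s": latency, "jitter_s": jitter, "bandwidth_bps": bandwidth}


# Made-up sample data: four RTTs and a 10 MB transfer that took 4 seconds.
summary = summarise_link([0.020, 0.024, 0.022, 0.030], 10_000_000, 4.0)
```

Because speed and latency change through the day, a summary like this would need to be recorded repeatedly with a timestamp before it says anything about inter-site capabilities.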
16. Information Services
Information Service capabilities and scalability, so we can choose the best system and configuration for deployment.
Produce a range of tests that can:
Compare implementations of the same server:
Load and search small, medium and large static info sets.
Updating dynamic data!
Serving tests:
Many clients,
Max objects.
Varying access patterns,
Caching strategies,
Lots to learn from database tests here.
Compare different information servers!
UDDI, LUS, LDAP, DNS, JXTA + combinations!
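The "load and search small, medium and large static info sets" test on this slide might be shaped roughly as below. It is only a sketch under stated assumptions: `InMemoryRegistry` is a hypothetical stand-in backed by a dictionary, not a client for UDDI, a Jini LUS, LDAP or any other real server, and the set sizes are arbitrary. A real test would replace it with calls to the server under study and would add the many-client and dynamic-update cases.

```python
import time


class InMemoryRegistry:
    """Hypothetical stand-in for an information server (not a real client)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, attributes):
        self._entries[name] = attributes

    def lookup(self, name):
        return self._entries.get(name)


def time_load_and_search(registry, n_entries, n_searches):
    """Return (load seconds, mean seconds per search, hits) for one set size."""
    start = time.perf_counter()
    for i in range(n_entries):
        registry.register(f"service-{i}", {"site": f"site-{i % 10}"})
    load_s = time.perf_counter() - start

    hits = 0
    start = time.perf_counter()
    for i in range(n_searches):
        # Cycle through names that were registered, so every search should hit.
        if registry.lookup(f"service-{i % n_entries}") is not None:
            hits += 1
    search_s = (time.perf_counter() - start) / n_searches
    return load_s, search_s, hits


# Arbitrary sizes chosen for illustration only.
results = {}
for label, size in [("small", 100), ("medium", 10_000), ("large", 100_000)]:
    results[label] = time_load_and_search(InMemoryRegistry(), size, 1_000)
```

Keeping the workload generator separate from the registry object is what would let the same test compare different implementations, or different information servers, on equal terms.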
17. Metadata processing
Data is increasingly being described using metadata languages.
Plethora of schemas and markup languages.
It appears parsing and using metadata efficiently is becoming vital.
Produce a range of tests to look at the components for using/parsing metadata:
Raw bytes/sec
Marshalled/unmarshalled size
… others…
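The two metrics listed on this slide, raw bytes/sec and marshalled versus unmarshalled size, can be illustrated with Python's standard XML parser. This is a sketch, not a recommended benchmark: the record layout, the helper names `marshal`, `payload_size` and `parse_rate`, and the choice to count "unmarshalled size" as the bytes of the field values alone are all assumptions made for the example.

```python
import time
import xml.etree.ElementTree as ET


def marshal(records):
    """Marshal a list of flat dicts into an XML byte string."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def payload_size(records):
    """Bytes of the field values alone, ignoring all markup (assumed metric)."""
    return sum(len(str(v).encode("utf-8")) for r in records for v in r.values())


def parse_rate(document, repeats=50):
    """Return parse throughput in raw bytes per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        ET.fromstring(document)
    elapsed = time.perf_counter() - start
    return len(document) * repeats / elapsed


# Made-up resource descriptions, purely for illustration.
records = [{"host": f"node{i}", "cpus": 4, "load": 0.5} for i in range(1000)]
document = marshal(records)
unmarshalled = payload_size(records)
marshalled = len(document)
overhead = marshalled / unmarshalled  # how much the markup inflates the data
rate = parse_rate(document)
```

The same three numbers could be collected for other schemas or parsers to compare them on the same data.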
18. Events and Security
Event-based systems – greater use of event-based systems.
Grid Monitoring Architecture (GMA) – publisher/subscriber.
Maybe measure various aspects of the event system architecture,
How long to subscribe?
How long to send events to multiple subscribers…
Recognise the need for lightweight and efficient events services.
Security infrastructure can have an intrusive impact on overall system performance.
Effects of:
Firewall – potential bottleneck!
SSL socket creations and handshaking,
Token processing, other aspects…
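The two event-system questions on this slide, how long it takes to subscribe and how long it takes to send events to multiple subscribers, could be timed with a harness like the one below. It is a sketch only: `EventBus` is a hypothetical in-process publisher/subscriber with no network or security layer, so it is not an implementation of the Grid Monitoring Architecture, and the subscriber and event counts are arbitrary.

```python
import time


class EventBus:
    """Hypothetical in-process publisher/subscriber; no network involved."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        # Deliver the event to every subscriber in turn.
        for callback in self._subscribers:
            callback(event)


def time_event_system(n_subscribers, n_events):
    """Return (mean s per subscribe, mean s per publish, events delivered)."""
    bus = EventBus()
    received = []

    start = time.perf_counter()
    for _ in range(n_subscribers):
        bus.subscribe(received.append)
    subscribe_s = (time.perf_counter() - start) / n_subscribers

    start = time.perf_counter()
    for i in range(n_events):
        bus.publish(i)
    publish_s = (time.perf_counter() - start) / n_events
    return subscribe_s, publish_s, len(received)


subscribe_s, publish_s, delivered = time_event_system(n_subscribers=100, n_events=1_000)
```

Running the same harness against a real event service, with and without SSL or a firewall in the path, would be one way to expose the security overheads listed above.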
19. Conclusions
Need various aspects of performance evaluation and modelling for a variety of uses – depending on the stakeholders, e.g.:
Proof of concepts – algorithms, paradigms…
Compare hardware and software architectures,
Optimise applications…
In a Grid, application performance has three broad areas of concern:
System capabilities,
Network,
Software infrastructure.
First two points come under traditional benchmarking, which is fairly well established and understood (CPU and network) – maybe debatable!
20. Areas of Future Interest
To understand the performance of a Grid we could use statistical methods, gathering historical data that can be used to predict the performance of Grid applications:
In a gross sense this is OK, but fails to address the fact that we are using a dynamically changing infrastructure and need to incorporate new components!
From a CS perspective it is evident that we need to also understand other aspects of distributed environments, including:
Inter-Grid site communications.
Information Services,
Metadata processing,
Events and Security…
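As one illustration of the statistical approach mentioned at the top of this slide, historical measurements can be turned into a prediction with an exponentially weighted moving average. This is a sketch of one simple technique chosen for the example, not a method proposed in the talk. The function name `ewma_forecast`, the `alpha` parameter and the sample history are hypothetical.

```python
def ewma_forecast(history, alpha=0.3):
    """Predict the next value from past measurements.

    history -- past measurements, oldest first (e.g. MB/s between two sites)
    alpha   -- weight given to the newest sample, 0 < alpha <= 1
    """
    if not history:
        raise ValueError("history must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    estimate = history[0]
    for sample in history[1:]:
        # Move the estimate a fraction alpha towards each newer sample.
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


# Made-up throughput history, oldest first.
prediction = ewma_forecast([10.0, 12.0, 8.0, 11.0], alpha=0.5)
```

Weighting recent samples more heavily lets the estimate follow a changing infrastructure to some degree, but it still has nothing to say about a newly added component with no history, which is the limitation this slide points out.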
21. Questions?