Center for Information Services and High Performance Computing (ZIH). Random Access to Event Traces with OTF. Dagstuhl Seminar N ° 07341 August 19th - 24th 2007. OUTLINE. INTRODUCTION OTF FEATURES Openness, Flexibility, Performance OTF Architecture AUXILIARY INFORMATION
Center for Information Services and High Performance Computing (ZIH) Random Access to Event Traces with OTF Dagstuhl Seminar N°07341 August 19th - 24th 2007
OUTLINE
• INTRODUCTION
• OTF FEATURES
  • Openness, Flexibility, Performance
  • OTF Architecture
• AUXILIARY INFORMATION
• STATISTIC RECORDS
  • What do we need it for?
• SNAPSHOT RECORDS
  • Example of use
• CONCLUSION
Heike Jagode
1. INTRODUCTION
REQUIREMENTS
• Developing scalable tracing tools for HPC platforms requires:
  • A low-overhead trace measurement system to generate trace data, AND
  • Efficient trace analysis tools to process the data
• A crucial factor in trace tool development is an open specification of trace information that:
  • Provides a target for trace generation, AND
  • Enables trace analysis and visualisation tools to operate efficiently at large scale
• The Open Trace Format (OTF) is such a trace definition and representation for use with large-scale parallel platforms
2. OTF FEATURES
OPENNESS, FLEXIBILITY, PERFORMANCE
• The design of OTF is directed at 3 objectives:
  • Openness: the open format defines record types and file structure so that OTF traces can be generated and read correctly; external wishes will be considered, just talk to us!
  • Flexibility: efficient selective access is supported
  • Performance: determined by how efficiently and quickly OTF traces can be queried and manipulated
    • Parallel I/O
SELECTION OF OTF FEATURES
• Supports fast and selective access to large amounts of performance trace data
• Based on a stream model: single separate units represent segments of the overall data
• An OTF stream may contain multiple independent processes, whereas one process belongs to a single stream exclusively
• Encourages parallel I/O
• Strictly sequential reading of parallel traces is still supported
• Allows transparent ZLib compression
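The stream model described above can be illustrated with a minimal Python sketch. It does not use the OTF library; the stream layout, the event tuples, and the helper names (`stream_of`, `read_sequential`, `read_selected`) are hypothetical and chosen only for illustration. It shows the one-process-one-stream rule, strictly sequential reading across streams, and selective access to only the streams that are needed.

```python
import heapq

# Hypothetical layout: stream id -> processes stored in that stream.
# A stream may hold several processes; a process lives in exactly one stream.
streams = {
    1: [0, 1],
    2: [2, 3],
}

# Illustrative events per stream: (time stamp, process, event), sorted by time.
events = {
    1: [(10, 0, "enter"), (25, 1, "enter"), (40, 0, "leave")],
    2: [(15, 2, "enter"), (30, 3, "send")],
}


def stream_of(process, streams):
    """Return the single stream that contains the given process."""
    owners = [sid for sid, procs in streams.items() if process in procs]
    if len(owners) != 1:
        raise ValueError(f"process {process} must belong to exactly one stream")
    return owners[0]


def read_sequential(events):
    """Merge all streams into one time-ordered event sequence."""
    return list(heapq.merge(*events.values()))


def read_selected(events, streams, processes):
    """Return only the streams needed for the requested processes."""
    wanted = {stream_of(p, streams) for p in processes}
    return {sid: events[sid] for sid in sorted(wanted)}
```

Because the streams are independent units, each one could also be handed to a separate reader, which is what makes parallel I/O possible; the merge step is needed only when a strictly sequential view is requested.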
MULTIPLE STREAMS
3. AUXILIARY INFORMATION
AUXILIARY INFORMATION
• Usually, traces are read linearly from the beginning
• OTF introduces the possibility of accessing arbitrary time stamps quickly
• Some auxiliary information becomes necessary for this
• STATISTIC RECORDS
  • Provide an overview of an entire interval of time
  • Point to certain sections that are worth accessing
  • After finding a section of interest, there is no need to read everything before that time stamp
• SNAPSHOT RECORDS
  • Collect the current state of all participating processes to make it possible to start reading at a given time stamp
4. STATISTIC RECORDS
STATISTIC RECORDS 1/4
• Statistic information about a monotonically increasing property p(t): the result for an interval [a,b) can be computed as p([a,b)) = p([0,b)) - p([0,a))
• Values are accumulated from the beginning of the trace until the current time stamp (e.g. exclusive time per function)
• With n points in time t_0, ..., t_{n-1}, there are n(n-1)/2 possible interval results p([t_i,t_j)), i < j, of varying granularity
• Quick overview of the whole trace
  • Without reading all events (huge)
  • Read special statistic records only (small)
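The interval formula above can be sketched in a few lines of Python. This is an illustration only, not the OTF API: the `checkpoints` list and its values are made up, and the function names are hypothetical. Each checkpoint stands for a statistic record holding the value accumulated since the start of the trace.

```python
from bisect import bisect_left

# Hypothetical statistic records: (time stamp, value accumulated since trace
# start), e.g. the exclusive time of one function in ticks.
checkpoints = [(0, 0), (100, 40), (200, 95), (300, 180)]


def accumulated(checkpoints, t):
    """Return p([0,t)) for a time stamp t that has a statistic record."""
    times = [ts for ts, _ in checkpoints]
    i = bisect_left(times, t)
    if i == len(times) or times[i] != t:
        raise KeyError(f"no statistic record at time {t}")
    return checkpoints[i][1]


def interval(checkpoints, a, b):
    """Compute p([a,b)) = p([0,b)) - p([0,a)) from two statistic records."""
    if a > b:
        raise ValueError("interval start must not exceed interval end")
    return accumulated(checkpoints, b) - accumulated(checkpoints, a)


def interval_count(n):
    """Number of intervals [t_i, t_j) with i < j among n points in time."""
    return n * (n - 1) // 2
```

With these four checkpoints, `interval(checkpoints, 100, 300)` is 180 - 40 = 140, and `interval_count(4)` is 6, so four small records answer six interval queries without touching any event record.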
STATISTIC RECORDS 2/4
STATISTIC RECORDS 3/4
• For Function Calls, statistics involve:
  • Number of calls
  • Exclusive / inclusive time per function and function group
• For Point-to-Point and Collective Communications, statistics provide summarized information for a given message type:
  • Process where the message originated
  • Peer: process the message is sent to
  • Communicator of message summary / message type / tag
  • Number of sent and received messages
  • Number of bytes sent via messages of the given type
  • Number of bytes received through messages of the given type
STATISTIC RECORDS 4/4
• Statistics provide summarized information about File Operations:
  • File identifier (or 0)
  • Process where the file operations occurred
  • Number of open events / number of close events
  • Number of read events / number of write events
  • Number of seek events
  • Number of bytes read
  • Number of bytes written
• The same applies to File Operations in a File Group
5. SNAPSHOT RECORDS
SNAPSHOT RECORDS 1/5
• After analyzing overview information that points to a certain section of interest, Snapshot Records allow loading this section instead of the entire trace
• In order to start reading from a certain time stamp, the current state of ALL participating processes needs to be known
• The Snapshot Records explicitly store this information
• In detail, snapshots provide, at a point in time:
  • The call stack (i.e. all active function calls)
  • The list of pending messages and ongoing I/O activities
  • Current OpenMP regions, etc.
• Based on this information, reading of event records can start at that very time stamp
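The idea of resuming from a snapshot can be sketched as follows. This is a simplified Python illustration, not the OTF record layout: the `Snapshot` class, the event tuples, and the `replay` function are hypothetical, only call stacks are tracked (no pending messages, I/O, or OpenMP regions), and the snapshot is assumed to describe the state just before any event at its own time stamp.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    time: int          # time stamp the snapshot refers to
    call_stacks: dict  # process -> active functions, outermost first


def replay(snapshot, events, until):
    """Rebuild the call stacks at time `until`, starting from a snapshot.

    Events are (time stamp, process, kind, function) tuples sorted by time.
    Events before the snapshot are skipped here; a real reader would seek
    directly to the snapshot position instead of scanning past them.
    """
    stacks = {p: list(s) for p, s in snapshot.call_stacks.items()}
    for time, process, kind, function in events:
        if time < snapshot.time:
            continue
        if time >= until:
            break
        stack = stacks.setdefault(process, [])
        if kind == "enter":
            stack.append(function)
        elif kind == "leave":
            stack.pop()
    return stacks


# Illustrative trace and a snapshot taken at time 100.
events = [
    (5, 0, "enter", "main"),
    (12, 0, "enter", "solve"),
    (20, 1, "enter", "main"),
    (110, 0, "leave", "solve"),
    (120, 0, "enter", "io"),
    (130, 1, "enter", "recv"),
]
snapshot = Snapshot(time=100, call_stacks={0: ["main", "solve"], 1: ["main"]})
```

Replaying from the snapshot up to time 125 gives the same call stacks as replaying the whole trace from an empty state at time 0, but only the events from time 100 onward have to be processed.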
SNAPSHOT RECORDS 2/5
SNAPSHOT RECORDS 3/5
SNAPSHOT RECORDS 4/5
• For Function Calls, snapshots provide information about a past function call:
  • Time of the call ("original time")
  • Function which has been entered
  • Process where the action took place
  • Explicit source code location identifier > 0 (or 0)
• For messages, snapshots provide information about a past message send operation:
  • Time of the send ("original time")
  • Sender and receiver of the message
  • Process group to which sender and receiver belong (or 0)
  • Message type information > 0 (or 0) and message length
  • Explicit source code location identifier > 0 (or 0)
SNAPSHOT RECORDS 5/5
• A snapshot record is provided for each opened (and not yet closed) file:
  • Time stamp when the file was opened
  • Process identifier
  • Unique file open identifier
6. CONCLUSION
CONCLUSION
• Besides the stream model, which encourages parallel I/O:
• The Statistic Records give a quick overview of the whole trace
  • Without reading all events (huge)
  • Point to certain sections worth accessing
• The Snapshot Records collect the current state of all participating processes
  • To make it possible to start reading at a given time stamp
  • Just load sections of interest instead of the entire huge trace
CONTACT DETAILS
• It is still at an early stage
• You are very welcome to send wishes to either:
  • Andreas Knüpfer (he is your man): andreas.knuepfer@tu-dresden.de
  • Or myself: heike.jagode@tu-dresden.de