Optimizations for a Simulator Construction System Supporting Reusable Components

David A. Penry and David I. August
The Liberty Architecture Research Group
Princeton University

Architectural Exploration
[Figure labels: Architecture Options, Architectural Simulator]
• Architectural options are studied using simulators
• More iterations = better decisions
• Need fast path to simulator
• Need fast simulator

Simulator Construction Systems
[Figure labels: Architecture Description, Simulator Builder, Architectural Simulator Instance]
• Reuse simulator infrastructure
• But still must be able to reuse descriptions
  • Structural composition
  • Medium-grained components
  • Standard communication contracts
  • High parameterizability
  • Separation of concerns

The Reuse Penalty
• Reusability leads to a speed penalty:
  • more component instances
  • more signals
  • more general code
• Therefore: reusable systems are often slower
How can we mitigate the reuse penalty?

Liberty Simulation Environment
[Figure labels: Data, Enable, Ack]
• Simulator construction system for high reuse
• Two-tiered specifications
  • Leaf module templates in C
  • Netlisting language for instantiation and customization
• Three-signal standard communications contract with overrides (control functions)
• Code is generated

Contrast: SystemC
• Simulator construction libraries (C++)
• Partially supports reuse:
  + Structural composition
  + Module granularity varies
  ? Communications contracts by convention
  - Low parameterizability
  - Separation of concerns
• Description is a C++ program

Models of Computation
• SystemC uses Discrete Event (DE)
• LSE uses Heterogeneous Synchronous Reactive (HSR), Edwards (1997)
  • Unparsed code blocks (black boxes)
  • Values begin unresolved and resolve monotonically
  • Chaotic scheduling
[Figure: blocks A, B, C, D]
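The HSR bullets above (values begin unresolved, resolve monotonically, and blocks may be invoked in any order) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `UNRESOLVED` marker, the `run_until_fixed_point` helper, and the three toy blocks are hypothetical and are not taken from LSE.

```python
UNRESOLVED = None  # hypothetical marker for a signal that has no value yet


def run_until_fixed_point(blocks, signal_names, order):
    """Invoke blocks in the given order until no signal changes.

    blocks maps a block name to a function that reads the signal table and
    returns a dict of output signals. This is an illustrative sketch, not LSE code.
    """
    signals = {name: UNRESOLVED for name in signal_names}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for block_name in order:
            outputs = blocks[block_name](signals)
            for sig, value in outputs.items():
                if value is UNRESOLVED:
                    continue  # the block could not decide this output yet
                if signals[sig] is UNRESOLVED:
                    signals[sig] = value  # monotonic step: unresolved -> resolved
                    changed = True
                elif signals[sig] != value:
                    raise ValueError(f"signal {sig!r} changed after resolving")
    return signals, passes


def plus_one(src, dst):
    """Build a toy block whose output dst is src + 1 once src is resolved."""
    def block(signals):
        if signals[src] is UNRESOLVED:
            return {dst: UNRESOLVED}
        return {dst: signals[src] + 1}
    return block


blocks = {
    "A": lambda signals: {"x": 1},
    "B": plus_one("x", "y"),
    "C": plus_one("y", "z"),
}
names = ["x", "y", "z"]

good_values, good_passes = run_until_fixed_point(blocks, names, ["A", "B", "C"])
bad_values, bad_passes = run_until_fixed_point(blocks, names, ["C", "B", "A"])
```

Both invocation orders reach the same signal values, but the order that follows the dependencies needs fewer passes, which is the motivation for computing a schedule ahead of time.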

Potential HSR Benefits vs. DE
[Figure: blocks A, B, C, D]
• Static schedules possible
• Lower per-signal overhead
• Use of unresolved value to avoid redundant computation

Experimental methodology
• Three models of a 4-way out-of-order microprocessor
  • SystemC using custom speed-optimized components
  • LSE model using custom speed-optimized components
  • LSE model using standard reusable components
• 9 benchmarks (CPU 2000/MediaBench)
• See paper for compiler, etc.
Non-edge signals Model Signals Instances
Custom SystemC 4 71 32
Custom LSE 3 138 48
Reusable LSE 11 489 423

Custom LSE vs. SystemC
• Custom LSE outperforms custom SystemC
  • Reduction in overhead
  • Use of unresolved signal value
  • Static instantiation and code specialization
• Dynamic schedule for both

Reuse Penalty
• Reusable model suffers large reuse penalty (0.26)
  • Many more signals
  • Many more non-edge signals
  • More components
• All dynamic schedules

Creating Static Schedules
[Figure: blocks A, B, C, D]
• Edwards's algorithm (1997)
  • Construct a signal dependency graph
  • Break into strongly connected components (SCCs); schedule them in topological order
  • Partition each SCC into a head and a tail
  • Schedule the tail recursively, then repeat the head (any order) and the tail's schedule
• Coalesce

Creating Static Schedules: worked example
[Figure: blocks A, B, C, D; signals 1, 2, 3, 4; SCCs a, b, c; head (H) and tail (T) markers]
• Schedule: a b c
• Schedule: 1 b 4
• Schedule: 1 2 3 2 4
• Coalesced schedule: A B C B (D)
Choosing an optimal partition is exponential
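The scheduling steps on these slides can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the dependency graph, the head-selection heuristic (`choose_head`, which picks a single node), the repeat count (taken here to be the number of head nodes), and the signal-to-block mapping `producer` are all hypothetical choices made so that the sketch reproduces the example schedule from the slides.

```python
def strongly_connected_components(nodes, succ):
    """Tarjan's algorithm; returns the SCCs of the subgraph induced by nodes,
    in topological order of the condensation (sources first)."""
    index, low, stack, on_stack, result = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in sorted(succ.get(v, ())):
            if w not in nodes:
                continue  # edge leaves the subgraph being scheduled
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            result.append(comp)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    result.reverse()  # Tarjan emits sinks first; flip to get sources first
    return result


def static_schedule(nodes, succ, choose_head=lambda scc: {max(scc)}):
    """Recursive head/tail scheduling. choose_head is a hypothetical heuristic;
    self-loops on single-node SCCs are not treated specially in this sketch."""
    order = []
    for scc in strongly_connected_components(nodes, succ):
        if len(scc) == 1:
            order.extend(scc)
            continue
        head = choose_head(scc)
        tail = scc - head
        tail_order = static_schedule(tail, succ, choose_head)
        order.extend(tail_order)
        for _ in range(len(head)):  # assumed repeat count: one round per head node
            order.extend(sorted(head))
            order.extend(tail_order)
    return order


def coalesce(signal_order, producer):
    """Map each signal to the block assumed to compute it and merge adjacent
    repeats of the same block into one invocation."""
    blocks = []
    for sig in signal_order:
        block = producer[sig]
        if not blocks or blocks[-1] != block:
            blocks.append(block)
    return blocks


# Hypothetical dependency graph: u -> v means signal v depends on signal u.
succ = {1: {2}, 2: {3}, 3: {2, 4}}
signal_order = static_schedule({1, 2, 3, 4}, succ)
block_order = coalesce(signal_order, {1: "A", 2: "B", 3: "C", 4: "D"})
```

With this assumed graph, signals 2 and 3 form the only multi-node SCC, so the tail (2) is scheduled, then the head (3), then the tail again. Because `choose_head` is pluggable, a different partition heuristic can be tried without touching the recursion.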

Dynamic sub-schedule embedding
[Figure: blocks A, B, C]
• SCCs arise due to incomplete information
  • “Optimal” schedules are optimal w.r.t. information
  • “Optimal” schedule may be worse than dynamic
When an SCC is “too big”, just schedule that section dynamically
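One way to picture the embedding is a plan that mixes static entries with dynamically scheduled sections. The Python sketch below is illustrative only: the size threshold `max_static_size`, the `build_plan` and `run_plan` helpers, and the toy dependency rules are hypothetical, and the SCC list is supplied by hand rather than computed.

```python
UNRESOLVED = None  # hypothetical marker for a signal that has no value yet


def build_plan(sccs, max_static_size, schedule_scc):
    """Turn SCCs (already in topological order) into plan entries.
    SCCs larger than max_static_size become dynamically scheduled sections."""
    plan = []
    for scc in sccs:
        if len(scc) <= max_static_size:
            plan.append(("static", schedule_scc(scc)))
        else:
            plan.append(("dynamic", sorted(scc)))
    return plan


def run_plan(plan, evaluate):
    """Run static entries once; sweep dynamic sections until nothing resolves.
    evaluate(node) must return True when it resolved something new."""
    trace = []
    for kind, nodes in plan:
        if kind == "static":
            for node in nodes:
                evaluate(node)
                trace.append(node)
        else:
            progress = True
            while progress:
                progress = False
                for node in nodes:
                    trace.append(node)
                    if evaluate(node):
                        progress = True
    return trace


def after(src):
    """Toy rule: value is src + 1 once src is resolved."""
    return lambda s: UNRESOLVED if s[src] is UNRESOLVED else s[src] + 1


# The true dependencies form a chain a -> d -> c -> b -> e, but the scheduler
# is assumed to see b, c, d as one SCC because of incomplete information.
rules = {"a": lambda s: 1, "d": after("a"), "c": after("d"),
         "b": after("c"), "e": after("b")}
signals = {name: UNRESOLVED for name in rules}


def evaluate(node):
    if signals[node] is not UNRESOLVED:
        return False  # already resolved: skip redundant computation
    value = rules[node](signals)
    if value is UNRESOLVED:
        return False
    signals[node] = value
    return True


plan = build_plan([{"a"}, {"b", "c", "d"}, {"e"}], max_static_size=1,
                  schedule_scc=sorted)
trace = run_plan(plan, evaluate)
```

The dynamic section costs several sweeps over b, c, d, while the surrounding static entries run exactly once, so the cost of missing information stays confined to the section that lacks it.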

Dependency information enhancement
[Figure: blocks A, B, C]
• In practice, we see big SCCs
• Peek in the black box
  • Simple parsing of communication overrides (control functions)
  • Can ask user to tell about internal dependencies
  • Not too painful because it is reused
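To see why extra dependency information shrinks SCCs, consider a small Python sketch. It assumes, for illustration only, that a black-box block is treated conservatively (every output depends on every input) unless internal dependencies are declared; the `dependency_edges` and `has_cycle` helpers, the block names, and the signal names are hypothetical.

```python
def dependency_edges(blocks, declared=None):
    """Build signal dependency edges (src, dst).

    blocks maps a block name to (inputs, outputs). Without a declaration,
    every output is assumed to depend on every input of the block.
    declared maps a block name to {output: set of inputs it really uses}.
    """
    declared = declared or {}
    edges = set()
    for name, (inputs, outputs) in blocks.items():
        for out in outputs:
            deps = declared.get(name, {}).get(out, inputs)
            edges.update((src, out) for src in deps)
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the dependency graph."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
    state = {}  # 1 = on the current DFS path, 2 = finished

    def visit(u):
        state[u] = 1
        for v in succ.get(u, ()):
            if state.get(v) == 1 or (v not in state and visit(v)):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in list(succ))


# Two black boxes connected in a loop: the producer drives "data" and reads
# "ack"; the consumer reads "data" and drives "ack".
blocks = {
    "producer": (["ack"], ["data"]),
    "consumer": (["data"], ["ack"]),
}

conservative = dependency_edges(blocks)
# Hypothetical user declaration: the producer's data does not depend on ack.
enhanced = dependency_edges(blocks, {"producer": {"data": set()}})

cycle_before = has_cycle(conservative)
cycle_after = has_cycle(enhanced)
```

Under the conservative assumption the two signals form a cycle, and so an SCC; with the single declared dependency the cycle disappears and a purely topological schedule becomes possible.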

Evaluation of Information Enhancement
• Control function parsing more useful alone
  • Not principally through scheduling
• It is important to have both kinds of enhancement

Reuse Penalty Revisited
• Reuse penalty mitigated in part
Reusable LSE model 6% faster than custom SystemC

Conclusions
• A tradeoff exists between speed and reuse
• The simulator construction system can help
  • Higher base speed makes reuse penalty less painful
  • Optimizations are possible with HSR model
• Ability of the scheduler to adapt to the information available is powerful
  • This adaptation is not possible with DE
• You can have high reuse at reasonable speeds

Future Work
• Release of LSE
  • Fall 2003
  • http://liberty.princeton.edu
• Hybrid model of computation
  • Embed HSR in DE, DE in HSR
  • Automatic extraction of HSR portions from DE

Other optimizations
• Improved block coalescing
  • See paper
• Code specialization
  • Implementation of APIs depends upon environment
