Optimizations for a Simulator Construction System Supporting Reusable Components
David A. Penry and David I. August
The Liberty Architecture Research Group, Princeton University
Architectural Exploration
[Figure labels: Architecture Options, Architectural Simulator]
• Architectural options are studied using simulators
• More iterations = better decisions
• Need fast path to simulator
• Need fast simulator
Simulator Construction Systems
[Figure labels: Architecture Description, Simulator Builder, Architectural Simulator Instance]
• Reuse simulator infrastructure
• But still must be able to reuse descriptions
• Structural composition
• Medium-grained components
• Standard communication contracts
• High parameterizability
• Separation of concerns
The Reuse Penalty
• Reusability leads to a speed penalty:
• more component instances
• more signals
• more general code
• Therefore: reusable systems are often slower
How can we mitigate the reuse penalty?
Liberty Simulation Environment
[Figure labels: Data, Enable, Ack]
• Simulator construction system for high reuse
• Two-tiered specifications
• Leaf module templates in C
• Netlisting language for instantiation and customization
• Three-signal standard communications contract with overrides (control functions)
• Code is generated
Contrast: SystemC
• Simulator construction libraries (C++)
• Partially supports reuse:
+ Structural composition
+ Module granularity varies
? Communications contracts by convention
- Low parameterizability
- Separation of concerns
• Description is a C++ program
Models of Computation
SystemC uses Discrete Event (DE)
LSE uses Heterogeneous Synchronous Reactive (HSR)
• Edwards (1997)
• Unparsed code blocks (black boxes)
• Values begin unresolved and resolve monotonically
• Chaotic scheduling
[Figure: blocks A, B, C, D]
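The HSR points above (unresolved initial values, monotonic resolution, chaotic scheduling) can be illustrated with a minimal sketch. This is not LSE code: the names `run_timestep`, `UNRESOLVED`, and the three example blocks are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional

UNRESOLVED = None  # stand-in for the "unresolved" signal value (assumption)

Signals = Dict[str, Optional[int]]
Block = Callable[[Signals], Dict[str, int]]  # returns the signals it can resolve


def run_timestep(blocks: Dict[str, Block], signal_names: List[str]) -> Signals:
    """Chaotic scheduling: invoke blocks in any order until nothing new resolves."""
    signals: Signals = {name: UNRESOLVED for name in signal_names}
    changed = True
    while changed:
        changed = False
        for block in blocks.values():
            for name, value in block(signals).items():
                if signals[name] is UNRESOLVED:
                    signals[name] = value  # unresolved -> resolved
                    changed = True
                elif signals[name] != value:
                    # A resolved value may never change within a time step.
                    raise ValueError(f"signal {name} did not resolve monotonically")
    return signals


# Hypothetical black-box blocks: each one only reports what it can resolve so far.
def source(s: Signals) -> Dict[str, int]:
    return {"x": 3}


def double(s: Signals) -> Dict[str, int]:
    return {} if s["x"] is UNRESOLVED else {"y": 2 * s["x"]}


def add(s: Signals) -> Dict[str, int]:
    if s["x"] is UNRESOLVED or s["y"] is UNRESOLVED:
        return {}
    return {"z": s["x"] + s["y"]}


# Deliberately listed in a poor order; the fixed point is the same regardless.
blocks = {"add": add, "double": double, "source": source}
result = run_timestep(blocks, ["x", "y", "z"])
print(result)
```

Because values only move from unresolved to resolved, the invocation order affects how many passes are needed but not the final values, which is what makes static schedules possible.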
Potential HSR Benefits vs. DE
[Figure: blocks A, B, C, D]
• Static schedules possible
• Lower per-signal overhead
• Use of unresolved value to avoid redundant computation
Experimental methodology
• Three models of a 4-way out-of-order microprocessor
• SystemC using custom speed-optimized components
• LSE model using custom speed-optimized components
• LSE model using standard reusable components
• 9 benchmarks (CPU 2000/MediaBench)
• See paper for compiler, etc.
Non-edge signals Model Signals Instances
Custom SystemC 4 71 32
Custom LSE 3 138 48
Reusable LSE 11 489 423
Custom LSE vs. SystemC
• Custom LSE outperforms custom SystemC
• Reduction in overhead
• Use of unresolved signal value
• Static instantiation and code specialization
• Dynamic schedule for both
Reuse Penalty
• Reusable model suffers large reuse penalty (0.26)
• Many more signals
• Many more non-edge signals
• More components
• All dynamic schedules
Creating Static Schedules
[Figure: blocks A, B, C, D; signals 1, 2, 3, 4; SCCs a, b, c; SCC b partitioned into head H and tail T]
• Edwards's algorithm (1997)
• Construct a signal dependency graph
• Break into strongly-connected components (SCCs). Schedule in topological order
• Partition each SCC into a head and a tail
• Schedule the tail recursively, then repeat the head (in any order) and the tail's schedule
• Coalesce
Schedule: a b c
Schedule: 1 b 4
Schedule: 1 2 3 2 4
Coalesced into blocks: A B C B (D)
Choosing an optimal partition is exponential
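The steps above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it always picks a single-signal head (the lexicographically last signal, an arbitrary heuristic), skips the coalescing step, and the dependency edges in `deps` are an assumed reading of the slide's four-signal figure.

```python
from typing import Dict, List, Set

Deps = Dict[str, Set[str]]  # signal -> signals it depends on


def sccs_in_dependency_order(nodes: List[str], deps: Deps) -> List[List[str]]:
    """Tarjan's algorithm on the depends-on graph, restricted to `nodes`.

    Tarjan emits an SCC only after everything it points to, so running it on
    depends-on edges yields producers before consumers (topological order).
    """
    node_set = set(nodes)
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in sorted(deps.get(v, set())):
            if w not in node_set:
                continue  # ignore edges leaving the sub-graph being scheduled
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            result.append(sorted(scc))

    for v in nodes:
        if v not in index:
            visit(v)
    return result


def schedule(nodes: List[str], deps: Deps) -> List[str]:
    """Static schedule: SCCs in topological order, each expanded as tail, head, tail."""
    order: List[str] = []
    for scc in sccs_in_dependency_order(nodes, deps):
        if len(scc) == 1:
            order.append(scc[0])
            continue
        head, tail = scc[-1], scc[:-1]  # arbitrary one-signal head (assumption)
        tail_schedule = schedule(tail, deps)  # removing the head breaks the cycle
        order += tail_schedule + [head] + tail_schedule
    return order


# Assumed edges for the slide's example: 1 feeds 2, 2 and 3 form a cycle, 3 feeds 4.
deps: Deps = {"1": set(), "2": {"1", "3"}, "3": {"2"}, "4": {"3"}}
order = schedule(["1", "2", "3", "4"], deps)
print(" ".join(order))
```

With these assumed edges the sketch reproduces the slide's schedule 1 2 3 2 4. A different head choice would give a different but still valid schedule, and finding the shortest one is the exponential problem the slide mentions.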
Dynamic sub-schedule embedding
[Figure: blocks A, B, C]
SCCs arise due to incomplete information
• “Optimal” schedules are optimal w.r.t. information
• “Optimal” schedule may be worse than dynamic
When an SCC is “too big”, just schedule that section dynamically
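One way to picture the embedding is a plan whose entries are either static or dynamic. The sketch below is an assumption-heavy illustration: `embed_dynamic`, `max_static`, and the conservative `static_expand` (which treats every signal in an SCC as possibly depending on every other) are hypothetical names and choices, not LSE's actual scheduler.

```python
from typing import Callable, Dict, List, Optional, Tuple

Signals = Dict[str, Optional[int]]  # None means "unresolved"
Compute = Dict[str, Callable[[Signals], Optional[int]]]
Plan = List[Tuple[str, List[str]]]


def static_expand(scc: List[str]) -> List[str]:
    """Conservative tail, head, tail expansion; its length is 2**n - 1 for n signals."""
    if len(scc) == 1:
        return list(scc)
    tail = static_expand(scc[:-1])
    return tail + [scc[-1]] + tail


def embed_dynamic(sccs: List[List[str]], max_static: int) -> Plan:
    """Keep small SCCs static; mark SCCs larger than max_static as dynamic sections."""
    plan: Plan = []
    for scc in sccs:
        if len(scc) <= max_static:
            plan.append(("static", static_expand(scc)))
        else:
            plan.append(("dynamic", list(scc)))
    return plan


def resolve(name: str, compute: Compute, signals: Signals) -> bool:
    """Evaluate one signal; return True only if it newly resolved."""
    if signals[name] is not None:
        return False  # already resolved: skip the redundant computation
    value = compute[name](signals)
    if value is None:
        return False
    signals[name] = value
    return True


def run(plan: Plan, compute: Compute, signals: Signals) -> None:
    for kind, names in plan:
        if kind == "static":
            for name in names:
                resolve(name, compute, signals)
        else:
            # Dynamic section: keep sweeping until a full pass resolves nothing.
            while any([resolve(name, compute, signals) for name in names]):
                pass


# Hypothetical signals: b, c, d look like one SCC only because information is missing.
compute: Compute = {
    "a": lambda s: 1,
    "b": lambda s: None if s["a"] is None else s["a"] + 1,
    "c": lambda s: None if s["b"] is None else s["b"] * 2,
    "d": lambda s: None if s["b"] is None or s["c"] is None else s["c"] + s["b"],
}
signals: Signals = {name: None for name in "abcd"}
plan = embed_dynamic([["a"], ["d", "c", "b"]], max_static=2)
run(plan, compute, signals)
print(plan, signals)
```

The threshold trades schedule length against run-time checking: the conservative static expansion grows quickly with SCC size, while the dynamic section only pays for the sweeps it actually needs.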
Dependency information enhancement
[Figure: blocks A, B, C]
• In practice, we see big SCCs
• Peek in the black box
• Simple parsing of communication overrides (control functions)
• Can ask user to tell about internal dependencies
• Not too painful because it is reused
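A small sketch of why the extra information helps: if a module is a black box, every output must be assumed to depend on every input, which can create cycles that are not really there. The `Module` class, its `internal` annotation, and the data/ack wiring below are hypothetical illustrations, not LSE's description format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Module:
    inputs: List[str]
    outputs: List[str]
    # Hypothetical annotation: output -> inputs it really depends on.
    # None models an unparsed black box.
    internal: Optional[Dict[str, Set[str]]] = None


def signal_deps(modules: List[Module]) -> Dict[str, Set[str]]:
    """Signal dependency graph; black boxes conservatively use all inputs."""
    deps: Dict[str, Set[str]] = {}
    for m in modules:
        for out in m.outputs:
            if m.internal is None:
                deps[out] = set(m.inputs)
            else:
                deps[out] = set(m.internal.get(out, set()))
    return deps


def reachable(start: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """All signals that `start` transitively depends on."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        for d in deps.get(stack.pop(), ()):
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen


def largest_scc(deps: Dict[str, Set[str]]) -> int:
    """Size of the largest SCC, found by mutual reachability (fine for small graphs)."""
    reach = {s: reachable(s, deps) for s in deps}
    best = 0
    for s in deps:
        size = 1 + sum(1 for t in reach[s] if t != s and s in reach.get(t, set()))
        best = max(best, size)
    return best


# Two connected modules: data flows forward, ack flows back.
black_box = [
    Module(inputs=["ack"], outputs=["data"]),
    Module(inputs=["data"], outputs=["ack"]),
]
# Enhanced: the sender's data is declared not to depend on the incoming ack.
enhanced = [
    Module(inputs=["ack"], outputs=["data"], internal={"data": set()}),
    Module(inputs=["data"], outputs=["ack"]),
]
print(largest_scc(signal_deps(black_box)), largest_scc(signal_deps(enhanced)))
```

In this toy case one annotation removes the apparent cycle entirely, so the two signals can be scheduled statically in a single pass.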
Evaluation of Information Enhancement
• Control function parsing more useful alone
• Not principally through scheduling
• It is important to have both kinds of enhancement
Reuse Penalty Revisited
• Reuse penalty mitigated in part
Reusable LSE model 6% faster than custom SystemC
Conclusions
• A tradeoff exists between speed and reuse
• The simulator construction system can help
• Higher base speed makes reuse penalty less painful
• Optimizations are possible with HSR model
• Ability of the scheduler to adapt to the information available is powerful
• This adaptation is not possible with DE
• You can have high reuse at reasonable speeds
Future Work
• Release of LSE
• Fall 2003
• http://liberty.princeton.edu
• Hybrid model of computation
• Embed HSR in DE, DE in HSR
• Automatic extraction of HSR portions from DE
Other optimizations
• Improved block coalescing
• See paper
• Code specialization
• Implementation of APIs depends upon environment