This paper presents a dictionary construction technique for code compression systems with echo instructions. Code compression reduces program size and therefore memory requirements in embedded systems. The algorithm and experimental results are presented.
A Dictionary Construction Technique for Code Compression Systems with Echo Instructions
Philip Brisk, Jamie Macbeth, Ani Nahapetian, Majid Sarrafzadeh
{philip, macbeth, ani, majid}@cs.ucla.edu
Embedded and Reconfigurable Systems Lab, Computer Science Department, University of California, Los Angeles
LCTES ’05, June 16, 2005, Chicago, IL
Outline • Introduction: Code Compression • Dictionary Compression • Dictionary Construction • Overview of the Algorithm • Experimental Methodology and Results • Summary
Introduction: Code Compression For Embedded Systems • Why Reduce Program Size? • Reduces Memory Requirements • Silicon Cost of Program Storage in on-chip ROMs • As Embedded Systems Become More Complex, Ever-More Functionality Will Migrate to Software • Costs of Runtime Decompression • Performance Overhead • Area of the Decoder Circuitry
Dictionary Compression • Find Repeated Code Sequences • Place Each Sequence Into a Dictionary • Replace Each Sequence in the Program with a Codeword that Accesses the Dictionary
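The three steps above can be sketched as a toy compressor. This is only an illustration under simplifying assumptions: instructions are plain strings, only exact repeats of one fixed length are considered, and the names `compress`, `decompress`, and the `("CODEWORD", k)` encoding are hypothetical, not taken from the talk.

```python
from collections import Counter

def compress(program, length):
    """Replace repeated runs of `length` instructions with dictionary codewords."""
    # Step 1: find repeated sequences (naive count of every window).
    counts = Counter(tuple(program[i:i + length])
                     for i in range(len(program) - length + 1))
    # Step 2: place each repeated sequence into the dictionary.
    dictionary = [seq for seq, n in counts.items() if n > 1]
    index = {seq: k for k, seq in enumerate(dictionary)}
    # Step 3: replace occurrences with a codeword that indexes the dictionary.
    out, i = [], 0
    while i < len(program):
        window = tuple(program[i:i + length])
        if window in index:
            out.append(("CODEWORD", index[window]))
            i += length
        else:
            out.append(program[i])
            i += 1
    return dictionary, out

def decompress(dictionary, compressed):
    """Expand every codeword back into its dictionary sequence."""
    out = []
    for item in compressed:
        if isinstance(item, tuple) and item[0] == "CODEWORD":
            out.extend(dictionary[item[1]])
        else:
            out.append(item)
    return out

program = ["add", "mul", "sub", "ld", "add", "mul", "sub", "st"]
dictionary, compressed = compress(program, 3)
```

Here the run `add, mul, sub` occurs twice, so it becomes one dictionary entry and two codewords in the program.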
CALD and Echo Instructions • CALD Instructions • Place each sequence in a dictionary • All Codewords Point to the Dictionary • Echo Instructions • Leave one Instance of the Sequence Inline • All Codewords Point to the Sequence
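A minimal sketch of the echo side of this comparison follows. It assumes a hypothetical encoding `("ECHO", start, count)` that means "execute `count` instructions beginning at program position `start`", and it assumes the referenced region contains no echo instructions itself; the real instruction format is not given in the slides.

```python
def expand_echo(program):
    """Expand hypothetical ("ECHO", start, count) codewords into plain code."""
    out = []
    for instr in program:
        if isinstance(instr, tuple) and instr[0] == "ECHO":
            _, start, count = instr
            # The codeword points back at the inline copy of the sequence.
            out.extend(program[start:start + count])
        else:
            out.append(instr)
    return out

# One instance of A, B, C stays inline; the second is replaced by an echo.
program = ["A", "B", "C", "X", ("ECHO", 0, 3), "Y"]
expanded = expand_echo(program)
```

Unlike the CALD scheme, no separate dictionary is stored: the inline instance at positions 0 to 2 doubles as the dictionary entry.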
Compression Algorithms • The Traditional Approach: Compression Performed at Link Time • Substring Matching [Fraser et al., 1984] • + Register Renaming [Cooper and McIntosh, 1999; Debray et al., 2000] • + Instruction Rescheduling [De Sutter et al., 2002] • Our Approach is Somewhat Different… • Identify Repeated Isomorphic Patterns that Occur within the Intermediate Representation PRIOR TO Register Allocation [Brisk et al., 2004]
Dictionary Construction
• Sequence 1 (DAG 1): A: R1 ← R2 + R3; B: R4 ← R5 + R6; C: R7 ← R1 + R4
• Sequence 2 (DAG 2): A: R1 ← R2 + R3; C: R7 ← R1 + R4
• DAG 2 is isomorphic to a subgraph of DAG 1
• 2 Schedules Exist for DAG 1
• Dictionary 1 (5 instructions): A, B, C for Sequence 1, followed by A, C for Sequence 2
• Dictionary 2 (3 instructions): B, A, C, shared by both sequences
Isomorphic Pattern Generation • Edge Contraction • Add an Operation to a Pattern • Combine 2 Patterns into a Larger One • Build a Subgraph Hierarchy (SH)
An SH Grammar • The SH is also a DAG • Generate a pattern Tk from sub-patterns Ti and Tj • Contract edge (Ti, Tj) • Create a Production: Tk → TiTj • Examples: T2 → xT1; T4 → T3T2
Derivations and Scheduling
• Grammar: G1 → G2G3; G2 → G4b; G3 → G5g; G4 → ac; G5 → G6f; G5 → G7e; G6 → de; G7 → df
• [Figure: pattern G1 (operations a to g) and its two derivation trees, one expanding G5 through G6 and one through G7]
• Derived Schedules: acbdefg and acbdfeg
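The link between derivations and schedules can be checked mechanically. The sketch below encodes the slide's grammar as a Python dictionary (the `GRAMMAR` layout and the `derive` helper are illustrative assumptions) and enumerates every string derivable from a symbol by concatenating the right-hand sides left to right.

```python
from itertools import product

# Productions copied from the slide; lowercase letters are operations.
GRAMMAR = {
    "G1": [["G2", "G3"]],
    "G2": [["G4", "b"]],
    "G3": [["G5", "g"]],
    "G4": [["a", "c"]],
    "G5": [["G6", "f"], ["G7", "e"]],
    "G6": [["d", "e"]],
    "G7": [["d", "f"]],
}

def derive(symbol):
    """Return every terminal string derivable from `symbol`."""
    if symbol not in GRAMMAR:          # a terminal operation
        return [symbol]
    results = []
    for rhs in GRAMMAR[symbol]:
        parts = [derive(s) for s in rhs]
        for combo in product(*parts):  # pick one derivation per RHS symbol
            results.append("".join(combo))
    return results

schedules = derive("G1")
```

Because G5 has two productions, G1 has exactly two derivations, which correspond to the two schedules listed on the slide.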
Compatibility
• Ti, Tj: patterns
• Si, Sj: schedules for Ti, Tj
• Assume Ti is a Subgraph of Tj
• We want Ti and Tj to Share the Same Dictionary Entry
• Then Si must be a Contiguous Subsequence of Sj
• AC is a Contiguous Subsequence of BAC but not of ABC
• Schedule ABC: A: R1 ← R2 + R3; B: R4 ← R5 + R6; C: R7 ← R1 + R4
• Schedule BAC: B: R4 ← R5 + R6; A: R1 ← R2 + R3; C: R7 ← R1 + R4
• Schedule AC: A: R1 ← R2 + R3; C: R7 ← R1 + R4
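The compatibility condition reduces to a contiguous-subsequence test on schedules. A minimal sketch, with schedules modelled as strings of operation labels and a hypothetical helper name:

```python
def is_contiguous_subsequence(small, big):
    """True if schedule `small` appears as one unbroken run inside `big`."""
    n = len(small)
    return any(list(big[i:i + n]) == list(small)
               for i in range(len(big) - n + 1))

shares_entry_bac = is_contiguous_subsequence("AC", "BAC")
shares_entry_abc = is_contiguous_subsequence("AC", "ABC")
```

With schedule BAC the two-instruction pattern can reuse the tail of the three-instruction dictionary entry; with schedule ABC it cannot, because B sits between A and C.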
Convex Cuts in DAGs • Let G = (V, E) be a DAG • A Cut is a Partition of V into Two Sets • In a Convex Cut, edges may cross the boundary in one direction only, never in BOTH directions • SH Construction Ensures Convex Cuts • [Figure: a scheduling DAG shown with a convex cut and with a non-convex cut]
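The convexity condition is easy to test directly. The sketch below assumes a DAG given as a list of directed edges and a cut given as one side of the partition; `is_convex_cut` is a hypothetical helper name.

```python
def is_convex_cut(edges, part):
    """A cut (part, V - part) is convex if edges cross it in one direction only."""
    part = set(part)
    leaving = any(u in part and v not in part for u, v in edges)
    entering = any(u not in part and v in part for u, v in edges)
    return not (leaving and entering)

# Chain a -> b -> c.
edges = [("a", "b"), ("b", "c")]
convex = is_convex_cut(edges, {"a", "b"})      # only b -> c crosses
non_convex = is_convex_cut(edges, {"a", "c"})  # a -> b leaves, b -> c enters
```

A convex cut lets one side be scheduled entirely before the other, which is what makes a contiguous schedule for each side possible.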
Convex Cuts and Compatibility
• Productions: G1 → G2G3 and G1 → G4G5
• [Figure: pattern G1 (operations a to g) drawn under the cut G1→(2,3), under the cut G1→(4,5), and under both cuts together as G1→(2,3),(4,5); the figure marks a CYCLE]
Generalized Compatibility
• Given a Set of Productions with G1 on the LHS: G1 → G2G3, G1 → G4G5, …, G1 → G2kG2k+1
• How can we Tell if they are Compatible?
• Three Criteria Equivalent to Compatibility
• G1→(2,3),(4,5),…,(2k,2k+1) is Acyclic
• G2 ⊆ G4 ⊆ … ⊆ G2k
• G2k+1 ⊆ … ⊆ G5 ⊆ G3
• The Pragmatic Question: If all Productions are NOT Compatible, what is the Largest Compatible Subset?
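The nesting criterion can be sketched as a chain test. This is a simplification: each left sub-pattern is modelled only as its set of operations (an assumption; the real objects are subgraphs), and `are_compatible` is a hypothetical name. Sorting by size and checking adjacent pairs is enough, because a family of sets is a chain under ⊆ exactly when its size-sorted order is nested.

```python
def are_compatible(left_sides):
    """Check that the left sub-patterns can be ordered as G2 <= G4 <= ... <= G2k."""
    ordered = sorted((frozenset(s) for s in left_sides), key=len)
    return all(a <= b for a, b in zip(ordered, ordered[1:]))

nested = are_compatible(["ab", "abc", "abcd"])   # ab within abc within abcd
crossing = are_compatible(["abc", "ade"])        # neither contains the other
```

The sketch only answers the yes/no question; it does not select the largest compatible subset when the answer is no.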
The Subset/Subgraph View of Compatibility and Scheduling
• [Figure: Gi drawn inside Gj, with schedule Si for Gi followed by schedule Sj-i for Gj - Gi]
• Construct a Schedule Si for Gi
• Construct a Schedule Sj-i for Gj - Gi
• Construct a Schedule Sj = SiSj-i for Gj
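The three construction steps can be sketched with a topological sort for each piece. The code assumes patterns are given as vertex sets plus a shared edge list, covers only the case shown on the slide where Si comes first, and uses hypothetical helper names; it relies on `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter

def topo_order(nodes, edges):
    """Topologically sort `nodes`, using only edges that stay inside `nodes`."""
    preds = {n: set() for n in sorted(nodes)}
    for u, v in edges:
        if u in preds and v in preds:
            preds[v].add(u)            # v must come after u
    return list(TopologicalSorter(preds).static_order())

def schedule_with_prefix(g_j, edges, g_i):
    """Build Sj = Si + Sj-i so that the schedule of Gi is a prefix of Sj."""
    g_i, g_j = set(g_i), set(g_j)
    rest = g_j - g_i
    # Si can only come first if nothing in Gi depends on Gj - Gi.
    if any(u in rest and v in g_i for u, v in edges):
        raise ValueError("Gi depends on Gj - Gi, so Si cannot come first")
    return topo_order(g_i, edges) + topo_order(rest, edges)

# Hypothetical DAG: A -> B -> C and A -> D, with Gi = {A, B}.
edges = [("A", "B"), ("B", "C"), ("A", "D")]
s_j = schedule_with_prefix("ABCD", edges, "AB")
```

Because Si appears unbroken at the start of Sj, both patterns can share one dictionary entry.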
A Production Compatibility Graph • Represent the Subgraph Relation as a DAG called the Production Compatibility Graph (PCG) • Productions G1 → Gi… and G1 → Gj… create vertices Gi and Gj • Add an Edge (Gi, Gj) to the PCG if • 1. Gi ⊂ Gj • 2. There is no Gk such that Gj ⊃ Gk ⊃ Gi • Any PATH in the PCG Corresponds to a Subset of Patterns that can be Scheduled Contiguously within a Dictionary entry for G1.
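The two edge conditions describe the covering relation of the subgraph order, which can be sketched as follows. Patterns are again modelled as sets of operations (an assumption), and the pattern names and contents in the example are hypothetical rather than read off the slide's figure.

```python
def build_pcg(patterns):
    """Return PCG edges (i, j): Gi strictly inside Gj with no Gk in between."""
    edges = set()
    for i, gi in patterns.items():
        for j, gj in patterns.items():
            if gi < gj and not any(gi < gk < gj for gk in patterns.values()):
                edges.add((i, j))
    return edges

patterns = {
    "G2": frozenset("ab"),
    "G4": frozenset("abc"),
    "G6": frozenset("abcd"),
    "G8": frozenset("de"),
}
pcg = build_pcg(patterns)
```

G2 is inside G6 as well, but the edge (G2, G6) is omitted because G4 lies between them; the path G2, G4, G6 still records that all three nest.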
PCG Example • [Figure: pattern G1 (operations a to g), its sub-patterns G2 through G11, and the resulting PCG]
Algorithm Overview • Recall that the Subgraph Hierarchy is a DAG • Process SH Entries in Topological Order • All Sub-Patterns Processed Before Each Pattern • Construct a PCG for each SH Entry • Assign Vertex Weights to Each Pattern based on the Number of Sub-Patterns in the Dictionary Entry • Find Max Vertex-Weighted Path in the PCG • Determine the Maximum Gain Pattern in the SH • Remove the Max Gain Pattern – and all Sub-Patterns Selected for its Dictionary Entry • Repeat until the SH is Empty
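The "Find Max Vertex-Weighted Path in the PCG" step can be sketched as dynamic programming over a topological order. The weights and edges in the example are hypothetical, weights are assumed non-negative, the PCG is assumed non-empty, and `max_weight_path` is an illustrative name rather than the authors' implementation.

```python
from graphlib import TopologicalSorter

def max_weight_path(weights, edges):
    """Return (total weight, path) of a maximum vertex-weighted path in a DAG."""
    preds = {v: set() for v in weights}
    for u, v in edges:
        preds[v].add(u)
    best = {}  # best[v] = (weight of the best path ending at v, previous vertex)
    for v in TopologicalSorter(preds).static_order():
        prev = max(preds[v], key=lambda u: best[u][0], default=None)
        total = weights[v] + (best[prev][0] if prev is not None else 0)
        best[v] = (total, prev)
    # Walk back from the best endpoint to recover the path.
    node = max(best, key=lambda v: best[v][0])
    total, path = best[node][0], []
    while node is not None:
        path.append(node)
        node = best[node][1]
    return total, path[::-1]

weights = {"G2": 1, "G4": 2, "G6": 3, "G8": 5}
edges = [("G2", "G4"), ("G4", "G6")]
gain, chosen = max_weight_path(weights, edges)
```

Processing vertices in topological order means every predecessor's best path is known before a vertex is visited, so the search is linear in the size of the PCG.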
Experimental Framework • Algorithm Built into the Machine SUIF Compiler • Consolidate Each Application using link_suif Pass • All Unrolled Loops Manually Re-rolled • Standard Front End Compilation Script • One Round of Constant Folding/DCE • Instruction Selection for Alpha Architecture • ARM Back End Recently Released… • Detect Recurring Isomorphic Patterns in the IR • Analysis described in [Brisk et al., 2004] • Dictionary Construction as Described Here
Experimental Methodology • Cannot Compare with Substring Matching • Many Schedules Exist for Each DAG • Substring Matching Assumes Scheduled Code • How to Determine the Best Schedule for Each DAG? • Our Algorithm Determines a Schedule for the Entire Set of DAGs to Maximize Pattern Overlap • Naïve Approach – Each Pattern Gets Its Own Dictionary Entry • Our Approach - Isomorphism/Scheduling
Experimental Results Applications Taken from MediaBench [Lee et al., 1997]
Conclusion • Algorithm Given for Dictionary Construction • What Is Built is Actually an Intermediate Representation of a Dictionary • Combination of 3 Classically Hard Problems • Graph/Subgraph Isomorphism • Scheduling • Dictionary Construction/Compression • Future Work: Register Allocation and Assignment • Make a Best Effort to Assign Registers So that Isomorphic Patterns have Identical Register Usage
References • 1. Brisk, P., Nahapetian, A., and Sarrafzadeh, M. Instruction Selection for Compilers that Target Architectures with Echo Instructions, SCOPES 2004. • 2. Fraser, C. W., Myers, E., and Wendt, A. Analyzing and Compressing Assembly Code, Symposium on Compiler Construction, 1984. • 3. Cooper, K. D., and McIntosh, N. Enhanced Code Compression for Embedded RISC Processors, PLDI 1999. • 4. De Sutter, B., De Bus, B., and De Bosschere, K. Sifting out the Mud: Low-Level C++ Code Reuse, OOPSLA 2002. • 5. Debray, S., Evans, W., Muth, R., and De Sutter, B. Compiler Techniques for Code Compaction, TOPLAS 2000. • 6. Lee, C., Potkonjak, M., and Mangione-Smith, W. H. MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems, MICRO-30, 1997.