Instruction Generation for Hybrid Reconfigurable Systems
Ryan Kastner, Seda Ogrenci-Memik, Elaheh Bozorgzadeh and Majid Sarrafzadeh
{kastner,seda,elib,majid}@cs.ucla.edu
Embedded and Reconfigurable Systems Group, Computer Science Department, UCLA, Los Angeles, CA 90095
Outline
• Introduction
• Programmability
• Hybrid Reconfigurable Systems
• Strategically Programmable System
• Instruction Generation
• Uses in Hybrid Reconfigurable Systems
• Relation to Template Generation and Matching
• Algorithm for Template Generation and Matching
• Experiments
• Conclusion
Programmability
• Future systems need programmability at multiple levels of the computational hierarchy
• [Figure: the computational hierarchy, from the gate level through the microarchitecture level to the architecture level]
• Hybrid Reconfigurable Systems have programmability at one or more levels
Tradeoffs
• [Figure: configuration time versus flexibility across the three levels; configuration-time labels: thousands of cycles, hundreds of cycles]
• Gate level: programmable units are CLBs and LUTs; example platforms: Xilinx, Altera
• Micro-architecture level: programmable units are datapath units, control units and RAM; example platform: Chameleon Systems
• Architecture level: programmable units are custom instructions and register banks; example platforms: Tensilica, Improv
• Hybrid Reconfigurable Systems should find a happy medium
SPS - Strategically Programmable System
• Embed (hard or soft) computational units, called Versatile Programmable Blocks (VPBs), into an FPGA-like fabric
• Combine programmable units from the gate, microarchitecture and architecture levels
• Balance flexibility and configuration time
• Need an automated method of determining the functionality of VPBs
• [Figure: VPBs and memory blocks embedded in the fabric]
Overview of SPS
• Input: set of applications specified in high-level code (C/C++, Fortran, MOC)
• SPS Compiler: compile to low-level specification; determine VPB functionality
• SPS Architecture Generation: module placement, VPB synthesis, routing architecture
• Result: SPS architecture
VPB Instruction Generation
• Given a set of applications, what computation should be implemented on VPBs?
• Want complex, commonly occurring computation patterns
• Look for computational patterns at the instruction level
• Basic operations are add, multiply, shift, etc.
• [Figure: a set of applications mapped onto a fabric of RAM blocks and VPBs]
Problem Definition
• Determining VPB functionality requires regularity extraction
• Regularity extraction: find common sub-structures (templates) in one graph or a collection of graphs
• Each application can be specified by a collection of graphs (CDFGs)
• Templates are implemented as VPBs
• Two related sub-problems:
  • Template Matching
  • Template Generation
Template Matching – Formal Def’n
• Problem 1: Given a directed, labeled graph G(N, A) and a library of templates, each of which is a directed labeled graph Ti(V, E), find every subgraph of G that is isomorphic to any Ti
• [Figure: a directed labeled graph G with operator-labeled nodes (+, *, &, %, ||) and the templates T1 through T6]
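Problem 1 can be illustrated with a brute-force matcher. This is only a minimal sketch for small graphs, not the method used in the slides: the function name `find_matches`, the dictionary-based graph encoding, and the example graph are all assumptions made for illustration. It tries every injective assignment of template nodes to graph nodes and keeps those that preserve node labels and directed edges.

```python
from itertools import permutations

def find_matches(g_labels, g_edges, t_labels, t_edges):
    """Return every mapping template-node -> graph-node that preserves
    node labels and maps each template edge onto an edge of the graph."""
    g_edge_set = set(g_edges)
    t_nodes = list(t_labels)
    matches = []
    # Try every ordered choice of distinct graph nodes (exponential; small graphs only).
    for candidate in permutations(g_labels, len(t_nodes)):
        mapping = dict(zip(t_nodes, candidate))
        if any(t_labels[t] != g_labels[mapping[t]] for t in t_nodes):
            continue  # an operation label does not match
        if all((mapping[u], mapping[v]) in g_edge_set for u, v in t_edges):
            matches.append(mapping)
    return matches

# Hypothetical graph: two multiplies feeding an add, which feeds another add.
g_labels = {1: "*", 2: "*", 3: "+", 4: "+"}
g_edges = [(1, 3), (2, 3), (3, 4)]

# Hypothetical template: a multiply feeding an add (MUL-ADD).
t_labels = {"a": "*", "b": "+"}
t_edges = [("a", "b")]

matches = find_matches(g_labels, g_edges, t_labels, t_edges)
print(matches)
```

In this example both matches use graph node 3, so the matched subgraphs overlap; deciding which overlapping matches to keep is a separate covering question.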
Template Matching – Formal Def’n
• Problem 2: Given an unlimited supply of each template in a set {T1, …, Tk} and a set of overlapping subgraphs of the given graph G(N, E), each isomorphic to some member of the template set, minimize k as well as Σ xi, where xi is the number of templates of type Ti used, such that the number of nodes left uncovered is the minimum
• [Figure: the graph G covered by template instances]
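To make the quantities in Problem 2 concrete, the sketch below selects non-overlapping matches greedily, largest first, and reports xi per template type and the number of uncovered nodes. It is a heuristic illustration under assumed inputs, not an optimal solver and not the procedure from the slides; `greedy_cover`, the match encoding, and the example data are hypothetical.

```python
def greedy_cover(num_nodes, matches):
    """matches: list of (template_name, frozenset_of_graph_nodes).
    Greedily keep matches that do not overlap anything already chosen."""
    covered = set()
    used = {}  # template name -> x_i, the number of instances used
    # Larger matches first; ties broken by name and node list for determinism.
    ordered = sorted(matches, key=lambda m: (-len(m[1]), m[0], sorted(m[1])))
    for name, nodes in ordered:
        if covered.isdisjoint(nodes):
            covered |= nodes
            used[name] = used.get(name, 0) + 1
    return used, num_nodes - len(covered)

# Hypothetical graph with nodes 1..5 and four overlapping matches.
matches = [
    ("T1", frozenset({1, 2, 3})),
    ("T2", frozenset({1, 2})),
    ("T2", frozenset({3, 4})),
    ("T2", frozenset({4, 5})),
]
used, uncovered = greedy_cover(5, matches)
k = len(used)             # number of distinct template types used
total = sum(used.values())  # the sum of x_i
print(used, uncovered, k, total)
```

A greedy choice like this can miss the true optimum; it is shown only to clarify what k, xi and the uncovered-node count refer to.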
Template Generation
• Templates may not always be given as input
• An automatic regularity extraction algorithm must develop its own templates
• Generate a set of templates such that:
  • Number of templates is minimized
  • Covering of the graph is maximized
Related Work
• Useful in a wide variety of CAD applications
• Data path regularity [Chowdhary98], [Callahan99]
• Scheduling [Ly95]
• System partitioning [Rao93]
• Low power design [Mehra96]
• Soft macros: CPR [Cadambi99] for PipeRench architecture
An Algorithm for Simultaneous Template Generation and Matching
Formal Definition
• Given a labeled digraph G(V, E)
• # C is a set of edge types
• C ← ∅
• while (stop_conditions_not_met(G))
  • C ← profile_graph(G)
  • cluster_common_edges(G, C)
Informal Definition
• Find the most common edge type
• Contract common edges
• Repeat until stopping condition met
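The loop above can be sketched in Python as follows. This is an illustrative reading of the pseudocode under several assumptions, not the authors' implementation: the graph encoding, the helper `contract_edge`, the driver `generate_templates`, the `min_count` and `max_templates` stopping parameters, and the greedy choice of non-overlapping edges are all hypothetical. Only the name `profile_graph` comes from the slide.

```python
from collections import Counter

def profile_graph(labels, edges):
    """Count how often each edge type (source label, sink label) occurs."""
    return Counter((labels[u], labels[v]) for u, v in sorted(edges))

def contract_edge(labels, edges, u, v):
    """Merge v into u, relabel u as the combined template, and keep
    connectivity to the rest of the graph. Parallel edges collapse."""
    labels[u] = f"({labels[u]}-{labels[v]})"
    del labels[v]
    new_edges = set()
    for a, b in edges:
        a = u if a == v else a
        b = u if b == v else b
        if a != b:  # drops the contracted edge itself
            new_edges.add((a, b))
    return new_edges

def generate_templates(labels, edges, min_count=2, max_templates=10):
    labels, edges = dict(labels), set(edges)
    templates = []
    while len(templates) < max_templates:
        profile = profile_graph(labels, edges)
        if not profile:
            break
        edge_type, count = profile.most_common(1)[0]
        if count < min_count:  # no frequently occurring edge type remains
            break
        # Greedily pick edges of the common type that share no endpoint.
        used, chosen = set(), []
        for u, v in sorted(edges):
            if (labels[u], labels[v]) == edge_type and u not in used and v not in used:
                chosen.append((u, v))
                used.update((u, v))
        for u, v in chosen:
            edges = contract_edge(labels, edges, u, v)
        templates.append(f"({edge_type[0]}-{edge_type[1]})")
    return templates, labels, edges

# Hypothetical graph: three multiply-add pairs, two of which feed a final add.
labels = {1: "*", 2: "+", 3: "*", 4: "+", 5: "*", 6: "+", 7: "+"}
edges = {(1, 2), (3, 4), (5, 6), (2, 7), (4, 7)}
templates, new_labels, new_edges = generate_templates(labels, edges)
print(templates)
```

On this input the first iteration contracts the three multiply-add edges, and the second grows one of the resulting nodes by absorbing the final add. The sketch does not guard against contractions that would introduce a cycle, which a real implementation on CDFGs would need to consider.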
Explanation of Algorithm
• Profile edges: find the most common edge types
• [Figure: example graph with the most common edge type highlighted]
• Edge contraction: merge adjacent nodes and maintain connectivity
• [Figure: example of contracting an edge]
• Stopping conditions
  • Reach a certain number of templates
  • Graph sufficiently covered
  • No frequently occurring edge type
Algorithm in Action
• [Figure: worked example on a graph with >>, %, &, + and * nodes]
• Iteration 1: create the conflict graph over Edges 1 to 4, determine the maximum independent set (MIS), and contract edges 2 and 4 to form templates
• Iteration 2: contract edges again, growing the templates
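The conflict-graph step can be sketched as follows, assuming that two candidate edges conflict when they share an endpoint and so cannot both be contracted. The function names and the example chain are hypothetical and do not reproduce the graph on the slide. Finding a true maximum independent set is NP-hard in general, so this sketch swaps in a simple lowest-degree-first heuristic.

```python
def build_conflict_graph(candidate_edges):
    """Vertices are candidate edges; two of them conflict if they share a node."""
    conflicts = {e: set() for e in candidate_edges}
    for i, e in enumerate(candidate_edges):
        for f in candidate_edges[i + 1:]:
            if set(e) & set(f):
                conflicts[e].add(f)
                conflicts[f].add(e)
    return conflicts

def greedy_independent_set(conflicts):
    """Heuristic: repeatedly take the vertex with the fewest remaining
    conflicts, then discard it and everything it conflicts with."""
    remaining = {v: set(n) for v, n in conflicts.items()}
    chosen = []
    while remaining:
        v = min(remaining, key=lambda x: (len(remaining[x]), x))
        chosen.append(v)
        removed = remaining.pop(v) | {v}
        for w in removed - {v}:
            remaining.pop(w, None)
        for neighbours in remaining.values():
            neighbours -= removed
    return chosen

# Hypothetical chain of five multiplies: 1 -> 2 -> 3 -> 4 -> 5.
candidate_edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
conflicts = build_conflict_graph(candidate_edges)
selected = greedy_independent_set(conflicts)
print(selected)
```

The selected edges share no nodes, so they can all be contracted in the same iteration.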
Algorithm Summary
• Algorithm can be generalized and used in a variety of applications
• Easily extended to hypergraphs
• Input/output pin restrictions can easily be added
• Performs template generation and matching simultaneously
• We target the algorithm towards VPB generation in SPS
Experimental Setup
• [Figure: flow from a set of applications specified in C, through SUIF & Machine-SUIF (control flow graph) and a dataflow graph generation pass, to a control dataflow graph]
Experimental Setup
• MediaBench files
• Compile to CDFGs
• Perform template generation and matching
• Gather statistics: graph coverage, number of templates
Experimental Setup - Benchmarks
• Selected files from MediaBench

Benchmark   C File         Description
mpeg2       motion.c       Motion vector decoding
mpeg2       getblk.c       DCT block decoding
adpcm       adpcm.c        ADPCM to/from 16-bit PCM
epic        convolve.c     2D general image convolution
jpeg        jctrans.c      Transcoding compression
jpeg        jdmerge.c      Color conversion
rasta       fft.c          Fast Fourier Transform
rasta       noise_est.c    Noise estimation functions
gsm         gsm_decode.c   GSM decoding
gsm         gsm_encode.c   GSM encoding
Similarity Across Applications
(columns are MediaBench file names)

Operation   motion   jdmerge   getblk   gsm_dec   jctrans
ADD         50.3%    84.6%     44.5%    29.6%     84.6%
MUL         36.3%    13.8%     24.0%    22.4%     13.8%

Template Coverage
Template    motion   jdmerge   getblk   gsm_dec   jctrans
MUL-MUL     0.0%     0.0%      1.3%     0.0%      0.0%
ADD-ADD     14.5%    9.1%      3.2%     3.6%      9.1%
ADD-MUL     0.0%     0.4%      0.6%     0.0%      0.4%
MUL-ADD     36.3%    13.0%     21.5%    22.4%     13.0%
Experimental Results
• Techniques
  • Simple: restrict templates to two operations
  • No restrictions: unlimited number of operations
• Stopping condition: most common edge occurs < x% (x = 5 to 25)
Summary
• Systems need programmability at multiple levels of the computational hierarchy
• Introduced SPS as a Hybrid Reconfigurable System
• Developed an instruction generation algorithm to determine VPB functionality
• Showed that common templates can be found across a similar set of applications
• An efficient covering is possible using simple templates
• Future work: create methods to uncover more complex templates