I2CRF: Incremental Interconnect Customization for Embedded Reconfigurable Fabrics Jonghee W. Yoon, Jongeun Lee*, Jaewan Jung, Sanghyun Park, Yongjoo Kim, Yunheung Paek and Doosan Cho** Seoul National University, Korea *UNIST, Korea **Sunchon National University, Korea
Outline • CGRA & Augmentation • Overall Design Flow • Our Approach (I2CRF) • Problem Definition (Inexact Graph Matching) • Mapping with A* Search • Experiment • Conclusion
Reconfigurable Architecture • Reconfiguration is emerging • Increasing need for flexible, high-speed computing fabrics • CGRAs (Coarse-Grained Reconfigurable Architectures) • Operation-level granularity • High performance • S/W development is easy • Examples: MorphoSys, ADRES
Augmentation • General CGRA: Mapping • CGRA Arch. + Applications → Configurations • Application-specific CGRAs: Synthesis • Applications → New Arch. + Configurations • Augmentation • Base CGRA + Applications → New Arch. + Configurations • Customizable Features • The number of PEs • The set of PE operations • Heterogeneity or homogeneity • Memory subsystem architecture • Interconnection network • Energy consumption: 14% (130nm), 30% (45nm) [Interconnect Exploration for Energy Versus Performance Tradeoffs for Coarse Grained Reconfigurable Architectures, TVLSI 2009]
Overall Design Flow - I2CRF (Incremental Interconnect Customization for Reconfigurable Fabrics) • Inputs: Kernel, Base CGRA • Vertex Clustering • Mapping (A* Search for Minimum-Cost Edit Path) • Arch Extension + (Accum.) Interconnections • Evaluation (repeat if Not Satisfied) • Output: Application-Specific Reconfigurable Architecture
I2CRF • Incremental architecture change: interconnections are added to the base architecture • Strengths • Regularity is maintained through the base architecture • Specialization is still provided for the target applications • Fast specialization with no limitation on the design space • The architecture change occurs while the kernel is mapped.
The Difference Compared with General Mapping • Existing application mapping for CGRA • Find a graph X ⊆ C that is isomorphic to K • Augmentation and Mapping • Find a graph Y that is isomorphic to K and a subgraph of C′, where C′ is most similar to C • [Figure: kernel graph K (vertices 1 to 6) and base CGRA graph C (PE 1 to PE 6); panels: General Mapping, Augmentation and Mapping]
Problem Definition - Inexact Graph Matching Problem • How to find C which is most similar to C0: inexact graph matching • Similarity between two graphs can be measured by calculating the cost of a graph edit path • An edit path is the set of edit operations that transform G1 into G2 • Edit operations • Node (or edge) substitution: NS, ES (identical or non-identical) • Node (or edge) insertion: NI, EI • Node (or edge) deletion: ND, ED • All the other edit operations are induced by node substitution. • [Figure: edit path between <G1> and <G2>, annotated with identical ES, NS, non-identical ES & NI, ED, and EI]
Graph Edit Cost Model • Ce - the cost of edge deletion • Interconnection insertion cost • Cv - the cost of node insertion • Routing PE insertion cost • A routing PE can replace interconnection insertion when there are extra PEs • No augmentation is needed • Can reduce the amount of architecture extension • Cv is much cheaper than Ce
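A minimal Python sketch of how this cost model could price a complete vertex-to-PE placement. The function names, the treatment of links as undirected, the greedy choice of a free relay PE, and the values Cv = 1 and Ce = 3 (borrowed from the Example slide) are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

C_V = 1  # routing-PE insertion cost (assumed example value)
C_E = 3  # interconnection insertion cost (assumed example value)


def mesh_edges(rows, cols):
    """Undirected links of a rows x cols mesh; PEs are numbered row-major from 1."""
    edges = set()
    for r, c in product(range(rows), range(cols)):
        pe = r * cols + c + 1
        if c + 1 < cols:
            edges.add(frozenset((pe, pe + 1)))      # horizontal link
        if r + 1 < rows:
            edges.add(frozenset((pe, pe + cols)))   # vertical link
    return edges


def edit_cost(kernel_edges, placement, base_edges, num_pes):
    """Return (cost, new_links) for a complete placement of kernel vertices on PEs."""
    free = set(range(1, num_pes + 1)) - set(placement.values())
    cost, new_links = 0, []
    for u, v in kernel_edges:
        a, b = placement[u], placement[v]
        if frozenset((a, b)) in base_edges:
            continue  # edge already exists in the base CGRA: cost 0
        # Try to route through one unused PE adjacent to both endpoints.
        relay = next((p for p in sorted(free)
                      if frozenset((a, p)) in base_edges
                      and frozenset((p, b)) in base_edges), None)
        if relay is not None:
            free.discard(relay)   # a routing PE can serve only one edge here
            cost += C_V
        else:
            new_links.append((a, b))  # augment the architecture with a new link
            cost += C_E
    return cost, new_links
```

On a 2 x 3 mesh, placing kernel vertices 1, 2, 3 on PE 1, PE 2, PE 5 lets the edge (1, 3) go through PE 4 as a routing PE, whereas placing vertex 3 on PE 6 forces a new link between PE 1 and PE 6.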
A* Search for Min Cost Edit Path • Inexact graph matching problem is NP-complete • How to search the mapping space for the min cost path: A* search algorithm • Root: kernel graph • Leaf: sub-CGRA graph • s: current mapping state • g(s): the sum of the costs (Ce, Cv) of the graph edit operations from the root to the current state s • h(s): the estimated cost from the current state s to a leaf state • Assessment of the partial mapping s: g(s) + h(s)
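The search above can be sketched in Python as a best-first search over partial placements ordered by g(s) + h(s). This is a simplified sketch under stated assumptions: states are tuples of PEs assigned to vertices in a fixed order, only link insertion (cost `c_e`) is charged, routing PEs are ignored, and the default `h` is the trivial zero estimate. The name `a_star_map` and its parameters are hypothetical.

```python
import heapq
from itertools import count


def a_star_map(vertices, kernel_edges, pes, base_edges, c_e=3, h=lambda s: 0):
    """Return (cost, placement) with minimum link-insertion cost, or None."""

    def step_cost(state, pe):
        # Cost added when the next unplaced vertex is put on `pe`.
        v = vertices[len(state)]
        placed = dict(zip(vertices, state))
        extra = 0
        for a, b in kernel_edges:
            if a == v:
                other = b
            elif b == v:
                other = a
            else:
                continue
            if other in placed and frozenset((placed[other], pe)) not in base_edges:
                extra += c_e  # this kernel edge needs a new interconnection
        return extra

    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(h(()), next(tie), 0, ())]
    while frontier:
        _, _, g, state = heapq.heappop(frontier)
        if len(state) == len(vertices):
            return g, dict(zip(vertices, state))  # leaf: complete mapping
        for pe in pes:
            if pe not in state:
                g2 = g + step_cost(state, pe)
                s2 = state + (pe,)
                heapq.heappush(frontier, (g2 + h(s2), next(tie), g2, s2))
    return None
```

With the zero heuristic the search degenerates to uniform-cost search, which still returns a minimum-cost placement; a sharper `h` only changes how many states are expanded, provided it never overestimates the remaining cost.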
Vertex Scattering • Cluster the vertices and assign each cluster to a row • Strengths of vertex scattering • Search space reduction • Considers shared resource constraints • [Figure: kernel (vertices 1 to 5); clustering & row assignment (Row 1, Row 2); final mapping onto PE 1 to PE 6]
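The search space reduction can be illustrated with a small count of candidate placements. The sketch below assumes a 5-vertex kernel on a 2 x 3 array, as in the figure, and an assumed split of 2 vertices in Row 1 and 3 in Row 2; the function names are hypothetical and `math.perm` requires Python 3.8 or later.

```python
from math import perm


def placements_unconstrained(num_vertices, num_pes):
    """Ways to place distinct vertices on distinct PEs with no row restriction."""
    return perm(num_pes, num_vertices)


def placements_with_rows(cluster_sizes, pes_per_row):
    """Ways to place vertices when each cluster is confined to its own row."""
    total = 1
    for size in cluster_sizes:
        total *= perm(pes_per_row, size)
    return total


print(placements_unconstrained(5, 6))   # 720
print(placements_with_rows([2, 3], 3))  # 36
```

Under these assumptions, fixing the row of each vertex before the A* search cuts the number of complete placements from 720 to 36.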
h(s) & Vertex Scattering • Heuristic function, h(s) • Guides the fast search of the mapping space • Needs cost estimation methods • Detecting difficult-to-map edges • After vertex scattering • Forks and over-length edges cannot be mapped to a mesh without a routing PE or a custom interconnection link • h(s) is estimated from the number of forks and over-length edges (= Nr) • An unroutable difficult-to-map edge (c1) costs more than a routable one (c2) • [Figure: example graph with vertices 1 to 7]
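A rough Python sketch of detecting one kind of difficult-to-map edge after row assignment. It assumes that an over-length edge is one whose endpoints are more than one row apart, since a mesh only links adjacent rows; the slides do not define forks precisely, so fork detection is left out. The function names and the per-edge cost are hypothetical.

```python
def over_length_edges(kernel_edges, row_of):
    """Edges whose endpoints sit more than one row apart (assumed definition)."""
    return [(u, v) for u, v in kernel_edges if abs(row_of[u] - row_of[v]) > 1]


def estimate_h(kernel_edges, row_of, cost_per_edge=1):
    """Estimate of remaining cost: one charge per over-length edge (forks omitted)."""
    return cost_per_edge * len(over_length_edges(kernel_edges, row_of))


edges = [(1, 2), (1, 5), (2, 4)]
rows = {1: 0, 2: 0, 4: 1, 5: 2}
print(over_length_edges(edges, rows))  # [(1, 5)]
print(estimate_h(edges, rows))         # 1
```

Charging each such edge the cheaper of the two edit costs keeps the estimate from overshooting the true remaining cost, which is the property A* needs to return a minimum-cost path.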
Example • c1 = cv = 1, c2 = ce = 3 • Search states, each with g(s) + h(s) • s=0 { }: 0 + 1 • s=1 {(11)}: 0 + 1 • s=2 {(12)}: 0 + 1 • s=3 {(13)}: 0 + 1 • s=4 {(42)}: 0 + 1 • s=5 {(43), ($2)}: 1 + 1 • s=6 {(26)}: 0 + 1 • s=7 {(25)}: 0 + 1 • s=8 {(24)}: 0 + 1 • s=9 {(33), ($5)}: 4 + 0 • s=10 {(35), ($4)}: 1 + 0 • [Figure: kernel graph (vertices 1 to 4) and CGRA (PE 1 to PE 6) with the search tree]
Experimental Setup • We test I2CRF on a CGRA called RSPA • Mesh base interconnection • Each row has 2 shared multipliers • Each row can perform 2 loads and 1 store • A PE can be used for routing • Benchmarks from Livermore loops, MultiMedia, and DSPStone • Comparison to Mesh, 1-hop, Diagonal, and Mixed
Performance Improvement • IPC of 16 is equivalent to 100% utilization • PE utilization and the IPC are increased by more than 70% on average compared to Mesh or by 41% on average compared to Mixed
Customization Overhead • Through our interconnection increment, … • # of new interconnection links is very small • Very marginal increase in the overall Mux complexity
Optimization Time • I2CRF finds a competitive custom interconnection architecture, together with its configuration, in reasonable time.
Conclusion • We presented an interconnection customization method for CGRAs • Our method exploits the similarity between the interconnection customization problem and inexact graph matching • Non-homogeneous extensions to a base interconnection architecture may present some challenges and possibly a penalty in back-end VLSI design • We plan to find out the extent of the difficulty due to the non-homogeneity, as well as find novel ways to mitigate any impact if necessary