Technology Mapping 2
- Outline
  - Tree Pattern Matching
  - Graph Pattern Matching
  - Layout-Driven Mapping
  - FPGA Mapping
- Goal
  - Understand pattern matching approaches
  - Understand FPGA mapping
- Reading
  - Hachtel and Somenzi, Ch. 13
Tree Pattern Matching
- Partition circuit DAGs into trees
  - split at fan-out nodes
  - make only outputs roots of trees
  - perform splitting incrementally when searching that tree
  - stop at already-mapped nodes
- Algorithm
  - find optimal mapping for each output tree
  - use recursive graph isomorphism tree matching
    - match all cells at root (output)
    - find optimum mapping for each subtree (cell input)
    - cost is cell plus cost of mappings of cell inputs
  - top-down traversal to record cells
  - exponential time in worst case
[Figure: example circuit tree with inputs a, b, c; annotation: look at top input to NAND]
Tree Pattern Matching

OptimalTree(tree) {
    mincost = INF;
    for all cells {
        if (cell matches at tree.root) {
            cost = cell.cost;
            for all cell inputs i
                /* recurse on the circuit subtree that feeds cell input i */
                cost += OptimalTree(subtree at cell.input[i]);
            if (cost < mincost) {
                mincost = cost;
                keep tree mapping;
            }
        }
    }
    return(mincost);
}

for each output {
    outputcost = OptimalTree(output);
    scan top-down to get cell mapping;
}

[Figure: two mappings of the same tree: cost = 5+2+2+5+2+2 = 18 versus cost = 4+4 = 8]
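The recursion above can be sketched in Python. This is only an illustration under stated assumptions: the tuple encoding of trees, the names LIBRARY, match and optimal_tree, and the three cell costs are all hypothetical and do not come from the slides. Cell inputs are matched in a fixed order, and the memoization via lru_cache is an addition that avoids re-solving a subtree reached through several matches.

```python
from functools import lru_cache

# Hypothetical library: (name, cost, pattern). Costs are made up.
# A pattern leaf (a string) is a cell input and matches any subtree.
LIBRARY = [
    ("INV",   2, ("INV", "a")),
    ("NAND2", 4, ("NAND", "a", "b")),
    ("AND2",  5, ("INV", ("NAND", "a", "b"))),
]

def match(pattern, tree):
    """Return the subtrees bound to the cell inputs, or None if no match."""
    if isinstance(pattern, str):          # cell input: matches anything
        return [tree]
    if isinstance(tree, str):             # primary input where a gate is needed
        return None
    if pattern[0] != tree[0] or len(pattern) != len(tree):
        return None
    bound = []
    for p_child, t_child in zip(pattern[1:], tree[1:]):
        sub = match(p_child, t_child)
        if sub is None:
            return None
        bound.extend(sub)
    return bound

@lru_cache(maxsize=None)
def optimal_tree(tree):
    """Return (minimum cost, cell chosen at the root) for a circuit tree."""
    if isinstance(tree, str):             # primary input costs nothing
        return 0, None
    best = (float("inf"), None)
    for name, cost, pattern in LIBRARY:
        inputs = match(pattern, tree)
        if inputs is None:
            continue
        # cost is the cell plus the best mapping of each cell input
        total = cost + sum(optimal_tree(t)[0] for t in inputs)
        if total < best[0]:
            best = (total, name)
    return best

# AND(AND(w, x), y) written with NAND and INV gates only
circuit = ("INV", ("NAND", ("INV", ("NAND", "w", "x")), "y"))
print(optimal_tree(circuit))              # (10, 'AND2')
```

With these made-up costs, two AND2 cells (5 + 5 = 10) beat the gate-by-gate mapping (2 + 4 + 2 + 4 = 12). Recording the chosen cell at each root is what lets a later top-down scan recover the mapping.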
Tree Pattern Matching
- Inject inverter pairs at gate outputs
  - increases possible matches
- Add fake inverter pair cell to library
  - removes remaining inverter pairs from circuit
[Figure: mapping without inverters versus mapping with inverters]
Tree Matching Issues
- Tree matching is fast
  - simple DFS of circuit and cell trees
- Still many tree representations for a function
- Might miss common subexpressions
  - if cell matching does not line up with fan-out nodes
  - stop by treating fan-outs as cell outputs
[Figure: same function over inputs a, b, c, d mapped two ways: 2-input ANDs cost 15 transistors, 3-input ANDs cost 16 transistors]
Graph Pattern Matching
- Match subgraphs, not trees
  - avoid breaking graph into forest of trees
  - match more gate types
    - XOR, MUX
    - multiple-output gates
- Algorithm
  - find all matchings of circuit subgraphs to cell graphs
    - generate DFS traversal of each cell graph from outputs
    - O(C*N) for C circuit nodes, N total cell library nodes
  - cover graph with minimum-cost cell graphs
    - matrix with row for each circuit node, column for each cell, 1 if cell matches circuit, 0 otherwise
    - find least-cost maximum independent set of rows
  - branch-and-bound search algorithm
    - bound is least-cost rows found so far
    - exponential in worst case
Graph Matching Algorithm

boolean CellMatch(x, y) {            /* x: circuit node, y: cell graph node */
    if (y.gatetype != x.gatetype) return(0);
    i = y.firstchild;
    j = x.firstchild;
    while (i != NULL && j != NULL) {
        if (!CellMatch(j, i)) return(0);
        i = i.nextsibling;           /* advance to the next child of y */
        j = j.nextsibling;           /* advance to the next child of x */
    }
    /* both child lists must end together */
    if (i == NULL && j == NULL) return(1);
    else return(0);
}

for each node i in circuit graph
    for each cell j in library graphs
        if (CellMatch(i, j)) match[i][j] = 1;
find least-cost maximum set of independent rows in match array
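A runnable sketch of the matching step, written for DAGs rather than trees so that a shared node inside a cell (as in an XOR built from four NANDs) must map to a single circuit node. Everything named here is an assumption made for illustration: the dict encoding, the node names, the XOR2 decomposition, and the helper names cell_match and match_matrix. Inputs are compared in a fixed order, so commutative input swaps are not tried.

```python
# Circuit and cells as DAGs: node name -> (gate type, fan-in names).
# A name that is not a key is an input (primary input or cell pin).
CIRCUIT = {                      # hypothetical XOR of x and y from four NANDs
    "b": ("NAND", ("c", "e")),
    "c": ("NAND", ("x", "f")),
    "e": ("NAND", ("f", "y")),
    "f": ("NAND", ("x", "y")),
}
XOR2 = {                         # assumed cell graph; n4 has fan-out 2
    "n1": ("NAND", ("n2", "n3")),
    "n2": ("NAND", ("p", "n4")),
    "n3": ("NAND", ("n4", "q")),
    "n4": ("NAND", ("p", "q")),
}
NAND2 = {"n1": ("NAND", ("p", "q"))}
INV = {"n1": ("INV", ("p",))}
LIBRARY = {"XOR2": XOR2, "INV": INV, "NAND2": NAND2}

def cell_match(circuit, c, cell, p, binding):
    """Match cell node p against circuit node c, extending binding in place."""
    if p in binding:             # shared cell node or pin seen before
        return binding[p] == c
    binding[p] = c
    if p not in cell:            # cell input pin: matches anything
        return True
    if c not in circuit:         # primary input where a gate is needed
        return False
    (ptype, pins), (ctype, cins) = cell[p], circuit[c]
    if ptype != ctype or len(pins) != len(cins):
        return False
    return all(cell_match(circuit, ci, cell, pi, binding)
               for ci, pi in zip(cins, pins))

def match_matrix(circuit, library):
    """match[node][cell] = 1 if the cell matches with its output at node."""
    return {c: {name: int(cell_match(circuit, c, cell, "n1", {}))
                for name, cell in library.items()}
            for c in circuit}

for node, row in match_matrix(CIRCUIT, LIBRARY).items():
    print(node, row)
```

Only node b gets a 1 in the XOR2 column, while every NAND node matches NAND2. The binding dictionary is what distinguishes this from tree matching: when n4 is reached a second time it must resolve to the same circuit node f.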
Minimum-Cost Graph Cover

        XOR2  INV  NOR2  NAND2
Cost      14    2     4      4
a          0    0     1      0
b          1    0     0      1
c          0    0     0      1
d          0    1     0      0
e          0    0     0      1
f          0    0     0      1

- Circuit DFS
  - a - fails at 1
  - b - match at 1
  - c - fails at 2
  - d - fails at 1
  - e - fails at 2
  - f - fails at 4
- first solution: a, b, c, d, e, f
  - bound = 4 + 4 + 4 + 2 + 4 + 4 = 22
- least-cost solution: a, b, d
  - cost = 4 + 14 + 2 = 20
  - NOR2 at a, XOR2 at b, INV at d
- Maximum number of rows - cover most nodes in fewest cells
- Independent - cells do not overlap
[Figure: XOR cell DFS with nodes numbered 1-6; circuit graph with nodes a-f]
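The branch-and-bound search on this example can be sketched as follows. The cell costs are the ones in the table; the set of circuit nodes each match covers (in particular that XOR2 at b absorbs b, c, e and f) is an assumption inferred from the two solutions on the slide, and the names MATCHES and min_cost_cover are hypothetical.

```python
# (match name, cost, circuit nodes covered). Coverage sets are assumed.
MATCHES = [
    ("NOR2@a",  4,  {"a"}),
    ("NAND2@b", 4,  {"b"}),
    ("XOR2@b",  14, {"b", "c", "e", "f"}),
    ("NAND2@c", 4,  {"c"}),
    ("INV@d",   2,  {"d"}),
    ("NAND2@e", 4,  {"e"}),
    ("NAND2@f", 4,  {"f"}),
]

def min_cost_cover(nodes, matches):
    """Cover every node exactly once with minimum total cell cost."""
    best = {"cost": float("inf"), "cells": None}

    def search(uncovered, chosen, cost):
        if cost >= best["cost"]:          # bound: cannot beat best so far
            return
        if not uncovered:                 # complete cover: new best
            best["cost"], best["cells"] = cost, list(chosen)
            return
        target = min(uncovered)           # branch on one uncovered node
        for name, c, covered in matches:
            # independent: a match may only use still-uncovered nodes
            if target in covered and covered <= uncovered:
                chosen.append(name)
                search(uncovered - covered, chosen, cost + c)
                chosen.pop()

    search(frozenset(nodes), [], 0)
    return best["cost"], best["cells"]

print(min_cost_cover("abcdef", MATCHES))
# (20, ['NOR2@a', 'XOR2@b', 'INV@d'])
```

With this ordering of MATCHES the search first completes the all-single-gate cover at cost 22, which becomes the bound, and then finds the XOR2 cover at cost 20. Any partial cover whose cost reaches the current bound is pruned.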
Layout-Driven Mapping
- Goal
  - minimize chip area
  - previous approaches focus on cell area and ignore inter-cell routing
  - example: high fan-in and fan-out minimizes cell area but takes a lot of routing
- Solution
  - estimate placement and routing during mapping
  - simple, fast estimates
  - incrementally update during mapping
  - only mapping a few gates at a time
FPGA Technology Mapping
- Programmable logic blocks
  - multiplexor-based (Actel)
  - lookup table (Xilinx)
- Problem
  - lookup table of K inputs implements 2^(2^K) possible functions
  - K = 5 typically
  - impractical to use library cell matching approach
    - requires 4 billion variations for each cell pattern/tree/graph
  - similar problem for mux-based FPGAs
- Solutions
  - clique partitioning
  - bin packing
  - OBDD matching
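The "4 billion" figure follows from counting truth tables: a K-input lookup table stores one bit per input combination, so it has 2^K bits and 2^(2^K) possible contents. A small check (the function name is just for illustration):

```python
def lut_function_count(k):
    """Count the distinct Boolean functions of k inputs."""
    truth_table_bits = 2 ** k        # one stored bit per input combination
    return 2 ** truth_table_bits     # every bit pattern is a different function

for k in (2, 3, 4, 5):
    print(k, lut_function_count(k))
```

For K = 5 this gives 4,294,967,296, which is why enumerating one library pattern per function is impractical.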
FPGAs
- Xilinx
  - RAM configurable logic blocks (CLB)
  - RAM programmable wiring
  - 2 functions of 4 variables
  - 1 function of 5 variables
  - implemented via table lookup RAM
- Actel
  - fuse configurable logic elements
  - fuse programmable wiring
  - all 2 and 3-variable functions
  - some 4-variable functions
[Figure: CLB]
Bin Packing
- View each logic block as a bin
  - consider each equation as sum-of-products
  - pack product terms into as few bins as possible
  - put OR term in CLB where it fits
- Algorithm
  - problem is NP-complete
  - first fit decreasing
    - sort product terms by decreasing size (size = # of literals)
    - put each term in first CLB it fits in
    - “fit” means CLB can include new term
    - at most 22% worse than optimal
    - O(N log N) time
  - N-step lookahead
    - avoids local minima
    - polynomial time
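A minimal sketch of first fit decreasing for product terms. The fit rule used here is an assumption: a term fits if the block would still need at most k distinct input variables, so variables shared between terms count once. The slide does not spell the rule out, the function name is hypothetical, and the placement of the OR term is left out.

```python
def first_fit_decreasing(terms, k):
    """Pack product terms (sets of variable names) into k-input blocks.

    Assumes every term has at most k variables.
    """
    bins = []                              # variables used by each block
    packing = []                           # terms assigned to each block
    # sort by decreasing size, where size = number of literals
    for term in sorted(terms, key=len, reverse=True):
        for i, used in enumerate(bins):
            if len(used | term) <= k:      # assumed "fit" rule
                bins[i] = used | term
                packing[i].append(term)
                break
        else:                              # no existing block fits
            bins.append(set(term))
            packing.append([term])
    return packing

terms = [frozenset("abc"), frozenset("ab"), frozenset("de"),
         frozenset("cdf"), frozenset("e")]
for block in first_fit_decreasing(terms, k=4):
    print([sorted(t) for t in block])
```

For these five terms and k = 4 the sketch uses two blocks: abc, ab and e share one, and cdf and de share the other. Because shared variables change how much room a term takes, the classical worst-case bound for first fit decreasing on fixed-size items should not be assumed to carry over to this variant.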