FPGA Hardware Implementation for Page Generation with Area and IO Constraints

CS137: Electronic Design Automation • Day 13: May 20, 2002 • Page Generation (Area and IO Constraints) • [working problem with Eylon Caspi]

Today • Cover/clustering • Minimize Weight • W/ area and IO constraints • Motivation: SCORE Page generation • Also energy minimization • Techniques • Current Results • FPGA/hardware implementation?

Abstract Problem • Given: Graph (V,E) with a single weight (area) on each node and two weights (IO, cost) on the edges. • Cluster nodes into subsets $V_i$, such that • $\sum_i \mathrm{Cost}(V_i)$ minimized • $\mathrm{IO}(V_i) <$ IO limit • $A(V_i) <$ Area limit • $\mathrm{Cost}(V_i) = \sum \{\mathrm{cost}(e) \mid e = (e_1, e_2) \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i\}$
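
To make the objective and constraints concrete, here is a minimal sketch that scores one candidate clustering. It assumes directed edges given as `(source, sink, io, cost)` tuples, counts an edge's IO against both clusters it touches, and charges its cost to the source cluster only (matching Cost(Vi) above). The function name `evaluate` and the toy graph are illustrative, not from the lecture.

```python
def evaluate(clusters, area, edges, area_limit, io_limit):
    """Return (total cut cost, feasible?) for a candidate clustering.

    clusters: list of lists of node names (a partition of the nodes)
    area:     dict node -> area
    edges:    list of (source, sink, io, cost)   # assumed directed
    """
    owner = {n: i for i, members in enumerate(clusters) for n in members}
    areas = [sum(area[n] for n in members) for members in clusters]
    ios = [0] * len(clusters)
    costs = [0] * len(clusters)
    for u, v, e_io, e_cost in edges:
        if owner[u] != owner[v]:          # edge is cut by the clustering
            ios[owner[u]] += e_io         # output of the source cluster
            ios[owner[v]] += e_io         # input of the sink cluster
            costs[owner[u]] += e_cost     # e1 in Vi, e2 not in Vi
    feasible = all(a < area_limit and io < io_limit
                   for a, io in zip(areas, ios))
    return sum(costs), feasible


# Toy 4-node ring; costs are arbitrary integer weights.
area = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 1, 8), ("b", "c", 1, 1), ("c", "d", 1, 8), ("d", "a", 1, 2)]
total, ok = evaluate([["a", "b"], ["c", "d"]], area, edges, area_limit=3, io_limit=3)
```

With these toy numbers the two cut edges are b→c and d→a, so only their costs are exposed.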

SCORE Compilation • Programming Model: Graph of TDF FSMD operators - unlimited size, # IOs - no timing constraints • Execution Model: Graph of page configs - fixed size, # IOs - timed, single-cycle firing • [Figure: Compile maps TDF operators, streams, and memory segments onto compute pages, streams, and memory segments]

How Big is an Operator? • [Charts of operator sizes by application; labels: JPEG Encode, JPEG Decode, MPEG (I), MPEG (P), Wavelet Encode, IIR, Wavelet Decode, MPEG Encode]

Clustering is Critical • Inter-page comm. latency may be long • Inter-page feedback loops are slow • Cluster to: • Fit feedback loops within page • Fit feedback loops on device

Pipeline Extraction • Hoist uncontrolled FF data-flow out of FSMD • Benefits: • Shrink FSM cyclic core • Extracted pipeline has more freedom for scheduling and partitioning • Example, before: state foo(i): acc=acc+2*i • After extraction: pipeline computes two_i = i*2 (DF) and feeds state foo(two_i): acc=acc+two_i (CF)
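
The slide's example can be mimicked in software to check that extraction preserves behavior. This is only an illustrative sketch: the function names are made up, and a Python generator stands in for the extracted pipeline stage and its stream.

```python
def fsmd_original(inputs):
    """State foo(i): acc = acc + 2*i (multiply sits inside the FSMD)."""
    acc = 0
    for i in inputs:
        acc = acc + 2 * i
    return acc


def doubling_pipeline(inputs):
    """Extracted feed-forward data-flow: two_i = i * 2."""
    for i in inputs:
        yield i * 2


def fsmd_extracted(two_i_stream):
    """State foo(two_i): acc = acc + two_i (only the cyclic core remains)."""
    acc = 0
    for two_i in two_i_stream:
        acc = acc + two_i
    return acc


before = fsmd_original([1, 2, 3])
after = fsmd_extracted(doubling_pipeline([1, 2, 3]))
```

Only the accumulation carries state from one firing to the next, so the multiply can move upstream without changing the result.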

Pipeline Extraction: Extractable Area • [Chart of extractable area by application; labels: JPEG Encode, JPEG Decode, MPEG (I), MPEG (P), Wavelet Encode, IIR]

Page Generation • Pipeline extraction • removes dataflow that can be freely extracted from FSMD control • Still have to partition potentially large FSMs • Approach: turn into a clustering problem

State Clustering • Start: consider each state to be a unit • Cluster states into page-size sub-FSMDs • Inter-page transitions become streams • Possible clustering goals: • Minimize delay (inter-page latency) • Minimize IO (inter-page BW) • Minimize area (fragmentation) • [Figure labels: IA, IB, OA, OB]

State Clustering to Minimize Inter-Page State Transfer • Inter-page state transfer is slow • Cluster to: • Contain feedback loops • Minimize frequency of inter-page state transfer • Previously used in: • VLIW trace scheduling [Fisher '81] • FSM decomposition for low power [Benini/DeMicheli ISCAS '98] • VM/cache code placement • GarpCC code selection [Callahan '00]

Clustering Problem • SCORE Page • Fixed area (# of LUTs) • Fixed IO • Cost on an edge is the probability of taking that state transition • Clustering goal is to minimize page-to-page transitions • Maximize expected transitions within the same page • Find page-count/page-transition tradeoff curve

Abstract Problem • [Slide callouts: clusters are pages; cost is inter-page communication frequency] • Given: Graph (V,E) with a single weight (area) on each node and two weights (IO, cost) on the edges. • Cluster nodes into subsets $V_i$, such that • $\sum_i \mathrm{Cost}(V_i)$ minimized • $\mathrm{IO}(V_i) <$ IO limit • $A(V_i) <$ Area limit • $\mathrm{Cost}(V_i) = \sum \{\mathrm{cost}(e) \mid e = (e_1, e_2) \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i\}$

DSM • Possibly relevant for minimizing delay in DSM • Previously discussed: • Larger area → longer wires, slower • Want to cluster logic locally • Maybe: • Cluster common computations together • Make distant computation transfer uncommon

Island Packing for Energy • Modern FPGAs pack clusters of LUTs into an endpoint • e.g. Altera LAB • Local wiring costs less energy than long wiring • Covering for energy: • Minimize exposed activity factor • Same covering problem

Abstract Problem • [Slide callouts: clusters are clusters/islands; cost is switching activity] • Given: Graph (V,E) with a single weight (area) on each node and two weights (IO, cost) on the edges. • Cluster nodes into subsets $V_i$, such that • $\sum_i \mathrm{Cost}(V_i)$ minimized • $\mathrm{IO}(V_i) <$ IO limit • $A(V_i) <$ Area limit • $\mathrm{Cost}(V_i) = \sum \{\mathrm{cost}(e) \mid e = (e_1, e_2) \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i\}$

First Try • Use FBB (flow cut) [Wong/cs137a:day7] • Pick seed element • Compute mincut • On mix of IO, cost edge weights? • If too small, • Cluster in node and repeat • Else • Cluster out node and repeat

Mincut lessons • Couldn’t consistently control IO • Non-monotonic results adjusting weight • Not clear what to cluster in

Idea #2 • If we had an ordering of nodes • (wishful thinking) • Then easy to know how to include more • Just pick the next node • Order: 1D list of nodes • Cluster: a contiguous sequence of nodes in list • Specify start, finish

From Sequence to Clusters • Easy to know if a contiguous subsequence • Meets area constraints • Meets IO constraints • Cover • Set of (non-overlapping) subsequences • Includes all nodes
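
Given a linear order, checking every contiguous subsequence is a double loop. The sketch below builds a sparse table of the feasible `(start, end)` ranges; the half-open range convention, the `(source, sink, io, cost)` edge tuples, and the function name are assumptions made for illustration.

```python
def feasible_subsequences(order, area, edges, area_limit, io_limit):
    """Map (start, end) -> (area, io) for each feasible slice order[start:end]."""
    pos = {n: i for i, n in enumerate(order)}
    table = {}
    for start in range(len(order)):
        for end in range(start + 1, len(order) + 1):
            a = sum(area[n] for n in order[start:end])
            if not a < area_limit:
                break  # areas are non-negative, so longer slices fail too
            # IO counts every edge with exactly one endpoint inside the slice.
            io = sum(e_io for u, v, e_io, _ in edges
                     if (start <= pos[u] < end) != (start <= pos[v] < end))
            if io < io_limit:
                table[(start, end)] = (a, io)
    return table


order = ["a", "b", "c", "d"]
area = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 1, 8), ("b", "c", 1, 1), ("c", "d", 1, 8), ("d", "a", 1, 2)]
table = feasible_subsequences(order, area, edges, area_limit=3, io_limit=3)
```

Area grows monotonically as a slice extends, which allows the early `break`; IO does not, so it is tested per slice.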

Feasible Clusters (mult16a)

Covering • Not clear when to put more or less stuff in a cluster versus leave it for the next cluster • Can't build clusters greedily • Like the associative/parenthesization problem seen earlier [day 5]

Day 5 Parenthesis Matching • Similar • But compute from all breaks across a diagonal • Not just nearest neighbor • Hence extra O(N)

Dynamic Programming • For each subsequence start,end • Either the area and IO meet the constraints • OR want to find a breakpoint between cluster sets • Cluster sets start→midpoint, midpoint→end may each be either a single cluster or multiple clusters • Different splits may • Minimize number of clusters • Minimize cost • Keep dominator set [day11]

Algorithm • Compute Linear Order • Compute IO, Area on each subsequence • Think NxN table (but sparse) • Use Dynamic Programming to cover
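
The covering step can be sketched as an interval dynamic program over the linear order. For each subsequence it either accepts the slice as one cluster or tries every breakpoint, keeping only non-dominated `(cluster count, cost)` pairs. This is a minimal illustration, not the lecture's implementation: the helper names, the directed `(source, sink, io, cost)` edges, and the toy data are all assumptions.

```python
from functools import lru_cache


def pareto(points):
    """Keep (clusters, cost) pairs not dominated by another pair."""
    kept = []
    for k, c in sorted(points):
        if not kept or c < kept[-1][1]:
            kept.append((k, c))
    return kept


def cover(order, area, edges, area_limit, io_limit):
    """Non-dominated (cluster count, total cut cost) covers of the whole order."""
    pos = {n: i for i, n in enumerate(order)}

    def stats(s, e):
        a = sum(area[n] for n in order[s:e])
        io = cost = 0
        for u, v, e_io, e_cost in edges:
            u_in, v_in = s <= pos[u] < e, s <= pos[v] < e
            if u_in != v_in:
                io += e_io
                if u_in:              # cost charged to the source cluster
                    cost += e_cost
        return a, io, cost

    @lru_cache(maxsize=None)
    def best(s, e):
        a, io, cost = stats(s, e)
        if a < area_limit and io < io_limit:
            return ((1, cost),)       # one cluster never costs more than a split
        options = [(k1 + k2, c1 + c2)
                   for m in range(s + 1, e)
                   for k1, c1 in best(s, m)
                   for k2, c2 in best(m, e)]
        return tuple(pareto(options))

    return list(best(0, len(order)))


order = ["a", "b", "c", "d"]
area = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 1, 8), ("b", "c", 1, 1), ("c", "d", 1, 8), ("d", "a", 1, 2)]
result = cover(order, area, edges, area_limit=3, io_limit=10)
```

On this toy ring the only non-dominated cover is two clusters, {a, b} and {c, d}, which hides both cost-8 edges inside a cluster. An empty result means no feasible cover exists for the given order.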

Compute Order? • Could experiment with various techniques • Considering: Spectral Ordering • [Hall/cs137a:day7] • How to weight edges? • IO, cost, mix? • Try linear mix…vary mix weighting
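
One way to experiment with a linear mix is to build a weighted graph Laplacian with edge weight `alpha*io + (1-alpha)*cost` and sort nodes by the eigenvector of its second-smallest eigenvalue. The sketch below is a rough stand-in, not the lecture's code: it uses plain power iteration on a shifted Laplacian instead of a real eigensolver, assumes a connected graph with nodes numbered `0..n-1`, and `alpha`, `iterations`, and the function name are hypothetical.

```python
def spectral_order(n, edges, alpha, iterations=500):
    """Order nodes 0..n-1 by an approximate second Laplacian eigenvector.

    edges: (u, v, io, cost); mixed weight = alpha*io + (1 - alpha)*cost
    """
    w = [[0.0] * n for _ in range(n)]
    for u, v, io, cost in edges:
        weight = alpha * io + (1.0 - alpha) * cost
        w[u][v] += weight
        w[v][u] += weight             # treat the graph as undirected
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)         # upper bound on the largest eigenvalue
    x = [i - (n - 1) / 2.0 for i in range(n)]   # start orthogonal to all-ones
    for _ in range(iterations):
        # y = (shift*I - L) x with L = D - W
        y = [(shift - degree[i]) * x[i] + sum(w[i][j] * x[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [val - mean for val in y]  # project out the constant eigenvector
        norm = sum(val * val for val in y) ** 0.5
        if norm == 0.0:
            break
        x = [val / norm for val in y]
    return sorted(range(n), key=lambda i: x[i])


# Path 0 - 2 - 1 - 3 with scrambled labels; the order should follow the path.
path_edges = [(0, 2, 1, 1.0), (2, 1, 1, 1.0), (1, 3, 1, 1.0)]
order = spectral_order(4, path_edges, alpha=0.5)
```

Sweeping `alpha` from 0 to 1 moves the ordering from pure cost weighting to pure IO weighting. The direction of the resulting list is arbitrary, since an eigenvector and its negation are equally valid.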

Weight Mix • Why unclear? • IO weight → good for clustering connectivity • If IOs are limited, allows use of fewer clusters • Pack more stuff into a page → fewer cases need to transition • Cost weight → what we're minimizing • Cluster high-cost edges together • Hide in page • But, cost ordering may get less stuff in a page if poorly IO clustered…

spp results • [see HTML]

Versus Weighting (w by 0.01)

Discussion • Promising Results • New capability, so not clear what to compare to • Maybe LUT clustering to validate algorithm • Absolutes look promising • Weighting • Not clear how to search for best • Maybe should try other ways of weighting? • [Michael suggests trying log(trans)]

Spatial/Hdw Implementation? • Compute Linear Order • Use 1D FDSA? • Compute IO, Area on each subsequence • Parallel prefix sum scan • One for each start point? • Use Dynamic Programming to cover • Like parenthesis • Maybe 1D and combine with area/io scan?
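
As a software illustration of the prefix-sum idea for subsequence areas, the sketch below simulates a Hillis-Steele style scan: each pass of the `while` loop corresponds to one parallel step, so N values need about log2(N) steps. The choice of scan and the helper names are assumptions; the slide only says "parallel prefix sum scan".

```python
def parallel_prefix_sum(values):
    """Inclusive scan; each loop pass models one fully parallel step."""
    x = list(values)
    step = 1
    while step < len(x):
        # Every element i >= step adds the element `step` places to its left.
        x = [x[i] + x[i - step] if i >= step else x[i] for i in range(len(x))]
        step *= 2
    return x


def range_area(prefix, start, end):
    """Area of the half-open slice [start, end) from the inclusive prefix sums."""
    if end <= start:
        return 0
    return prefix[end - 1] - (prefix[start - 1] if start > 0 else 0)


areas_in_order = [2, 3, 1, 4]
prefix = parallel_prefix_sum(areas_in_order)
middle = range_area(prefix, 1, 3)
```

After one scan, the area of any subsequence is a single subtraction, which is what makes the N×N table cheap to fill on the area side.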

Promising Ideas • Compute good ordering • Easy to vary inclusion when know what’s next to include/exclude • Mix weights • Cluster to minimize exposed (cut) costs
