Estimating Influence of Data Layout Optimizations on SDRAM Energy Consumption. †H.S. Kim, †V. Narayanan, †M. Kandemir, ‡E. Brockmeyer, ‡F. Catthoor, †M.J. Irwin. Aug. 2003. †Dept. of Computer Science and Engineering, The Pennsylvania State University; ‡IMEC, Belgium.
Estimating influence of data layout optimizations on SDRAM energy • Applications demand much larger memory bandwidth (e.g. video applications) • There has been much work on reducing off-chip memory access frequency by improving local (intermediate) memory locality • Locality in the SDRAM itself also makes a significant difference to energy (a page open operation is 6 times more expensive than a data read operation) • Estimating the number of page open operations (page breaks) can serve as an energy estimate for various optimizations • Data layout optimization • Conventional layout vs. blocked layout
Preliminaries (SDRAMs) • Banked architecture [Figure: SDRAM block diagram with control logic (control commands, mode register), address and data buffers, and banked memory arrays, each with a row decoder, sense amps, and a column decoder]
Preliminaries (SDRAM operations) • One operation • Two consecutive operations to two different rows of one bank [Figure: command and DQ timing diagrams (data words D0 to D3) for bank 0, page x and page y, showing precharge bank 0, activate bank 0, read data, the tRP, tRCD, tRRD and CAS latency intervals, and the lost cycles]
SDRAM energy consumption • D: number of words, B: burst size, P_miss: miss rate • e_act = x*e_d, e_stat_act = y*e_d, where e_act is the energy per activation, e_d the energy per data transfer of one word, and e_stat_act the static energy per activation • Example: Micron's 8MB SDRAM, e_act = 13nJ, e_stat_act = 7nJ, e_d = 3.6nJ, x+y ~ 6
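The ratios x and y follow directly from the quoted Micron figures, and the short sketch below computes them. The slide's actual energy formula did not survive extraction, so `total_energy_nj` is only an assumed additive form (each word costs e_d, each missing burst pays one activation), not the authors' equation; both function names are hypothetical.

```python
# Values quoted on the slide for Micron's 8MB SDRAM, in nanojoules.
E_D = 3.6         # energy per data transfer of one word
E_ACT = 13.0      # energy per activation
E_STAT_ACT = 7.0  # static energy per activation


def activation_ratios(e_d=E_D, e_act=E_ACT, e_stat_act=E_STAT_ACT):
    """Return (x, y) such that e_act = x * e_d and e_stat_act = y * e_d."""
    return e_act / e_d, e_stat_act / e_d


def total_energy_nj(d_words, burst, p_miss,
                    e_d=E_D, e_act=E_ACT, e_stat_act=E_STAT_ACT):
    """ASSUMED model, not taken from the slides: every word costs e_d, and
    each burst that misses the open page pays one activation (dynamic plus
    static part)."""
    bursts = d_words / burst
    return d_words * e_d + bursts * p_miss * (e_act + e_stat_act)


if __name__ == "__main__":
    x, y = activation_ratios()
    print(f"x = {x:.2f}, y = {y:.2f}, x + y = {x + y:.2f}")
    print(f"1024 words, burst 8, 10% misses: {total_energy_nj(1024, 8, 0.1):.1f} nJ")
```

With the quoted numbers x + y comes out at about 5.6, which the slide rounds to roughly 6.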
Page break estimation of data layouts • Page break estimation can be used to estimate the energy and performance of various optimization techniques • Estimation should take little time • In blocked layout, different tile/block sizes and shapes result in different numbers of page breaks [Figure: an array divided into blocks with tile size = page size, marking intra page breaks and inter page breaks]
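To make the layout effect concrete, here is a minimal brute-force sketch that counts page breaks for one block fetch under a row-major and a blocked layout. It is not the polyhedral estimator from the slides. It assumes one-word elements, a single open page, a row-by-row fetch order, and an array width that is a multiple of the tile width; all function names are hypothetical.

```python
def row_major_addr(r, c, width):
    """Word address of element (r, c) in a conventional row-major layout."""
    return r * width + c


def blocked_addr(r, c, width, tile_h, tile_w):
    """Word address of element (r, c) when the array is stored tile by tile.

    Assumes width is a multiple of tile_w.
    """
    tiles_per_row = width // tile_w
    tile_index = (r // tile_h) * tiles_per_row + (c // tile_w)
    offset_in_tile = (r % tile_h) * tile_w + (c % tile_w)
    return tile_index * tile_h * tile_w + offset_in_tile


def count_page_breaks(addresses, page_words):
    """Count how often consecutive accesses land on a different page.

    The very first access also opens a page, so it counts as one break.
    """
    breaks = 0
    open_page = None
    for addr in addresses:
        page = addr // page_words
        if page != open_page:
            breaks += 1
            open_page = page
    return breaks


def block_fetch(r0, c0, height, width):
    """Element coordinates of a fetched block, visited row by row."""
    return [(r, c) for r in range(r0, r0 + height)
            for c in range(c0, c0 + width)]


if __name__ == "__main__":
    WIDTH = 512        # array width in words (illustrative)
    PAGE_WORDS = 256   # 1KB page holding 32-bit words
    TILE = 16          # 16 x 16 tile = 256 words = one page
    coords = block_fetch(0, 0, TILE, TILE)
    rm = count_page_breaks([row_major_addr(r, c, WIDTH) for r, c in coords],
                           PAGE_WORDS)
    bl = count_page_breaks([blocked_addr(r, c, WIDTH, TILE, TILE)
                            for r, c in coords], PAGE_WORDS)
    print("row-major:", rm, "blocked:", bl)
```

A tile-aligned fetch is the best case for the blocked layout. A fetch that straddles tile boundaries touches several pages, and the count then depends on the traversal order.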
Estimation Modeling • Polyhedral modeling of page breaks, implemented using Presburger formulas • Valid iteration points • Lexicographical ordering • Data layouts in memory • Mapping memory locations to memory banks • Page break estimation model for blocked layout • Implementation • Omega Calculator to simplify the models (existential operators are allowed, which is not possible in Polylib) • Polylib to count the numbers
Intra/Inter page break models for blocked data layout • Intra page breaks • Inter page breaks
Experiments • E_ACT = (IDD0 - IDD3) * Trc * Vdd * Tcycle * #page_breaks • E_STAT = IDD3 * Vdd * Tcycle * total_cycles • Benchmarks • qsdpcm (quadtree-structured motion estimation) • phods (parallel hierarchical motion estimation) • an edge_detect code from the UTDSP benchmark suite • Various fetch tile/block shapes (set_1, set_2, set_3) • Architectural assumptions • A block of data is fetched from SDRAM into local data memory via Direct Memory Access (i.e. software-controlled intermediate memory) • SDRAM (Micron's 8MB, 4 banks, 32b bus, 1KB pages)
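The two energy formulas translate directly into code. The sketch below assumes currents in amperes, Vdd in volts, Tcycle in seconds, and Trc expressed as a number of clock cycles (which is what multiplying it by Tcycle suggests). The numeric inputs are illustrative placeholders, not datasheet values, and the function names are hypothetical.

```python
def activation_energy(idd0, idd3, t_rc_cycles, vdd, t_cycle, page_breaks):
    """E_ACT = (IDD0 - IDD3) * Trc * Vdd * Tcycle * #page_breaks, in joules."""
    return (idd0 - idd3) * t_rc_cycles * vdd * t_cycle * page_breaks


def static_energy(idd3, vdd, t_cycle, total_cycles):
    """E_STAT = IDD3 * Vdd * Tcycle * total_cycles, in joules."""
    return idd3 * vdd * t_cycle * total_cycles


if __name__ == "__main__":
    # Placeholder operating point, chosen only to exercise the formulas.
    params = dict(idd0=0.100, idd3=0.040, vdd=3.3, t_cycle=10e-9)
    e_act = activation_energy(params["idd0"], params["idd3"], 7,
                              params["vdd"], params["t_cycle"],
                              page_breaks=1000)
    e_stat = static_energy(params["idd3"], params["vdd"], params["t_cycle"],
                           total_cycles=100_000)
    print(f"E_ACT = {e_act:.3e} J, E_STAT = {e_stat:.3e} J")
```

Because E_ACT is linear in the number of page breaks, a layout that halves the page breaks halves the activation energy under this model.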
Experiments • SDRAM power (& cycle) simulator to compare the estimates with [Figure: tool flow from C code through ATOMIUM (memory instrumentation tool), the memory reference log (addr., size, time), the SDRAM cycle simulator and Micron's SDRAM Power Calculator to the number of page activations and the total activation energy]
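The page-activation counting step of such a flow can be sketched as a small trace-driven model. This is a simplified stand-in for the cycle simulator, not the authors' tool: it assumes an open-page policy with one open row per bank, consecutive pages interleaved across banks, and a log of (addr, size, time) tuples with byte addresses. The page size and bank count come from the slides; the bank mapping and function names are assumptions.

```python
PAGE_BYTES = 1024  # 1KB pages (from the slides)
NUM_BANKS = 4      # 4 banks (from the slides)


def decode(addr, page_bytes=PAGE_BYTES, num_banks=NUM_BANKS):
    """Map a byte address to (bank, row).

    ASSUMED mapping: consecutive pages are interleaved across the banks.
    """
    page = addr // page_bytes
    return page % num_banks, page // num_banks


def count_activations(trace, page_bytes=PAGE_BYTES, num_banks=NUM_BANKS):
    """Count row activations for a memory reference log.

    trace is an iterable of (addr, size, time) tuples; time is carried along
    but not used by this simplified model.
    """
    open_rows = {}  # bank -> currently open row
    activations = 0
    for addr, size, _time in trace:
        # An access may straddle a page boundary, so visit every page touched.
        first_page = addr // page_bytes
        last_page = (addr + size - 1) // page_bytes
        for page in range(first_page, last_page + 1):
            bank, row = decode(page * page_bytes, page_bytes, num_banks)
            if open_rows.get(bank) != row:
                activations += 1
                open_rows[bank] = row
    return activations


if __name__ == "__main__":
    log = [(0, 4, 0), (4, 4, 1), (1024, 4, 2), (0, 4, 3), (4096, 4, 4)]
    print("page activations:", count_activations(log))
```

The resulting count is the quantity that the activation-energy formula multiplies by the per-activation cost.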
Results (qsdpcm, simulation) • Conventional layout shows energy numbers that vary with the array size (800x640 vs. 176x144) • Blocked layout shows no variation with the array size
Results (row-major vs. blocked, phods) • Estimated numbers match the corresponding simulated numbers reasonably well for both row-major and blocked layouts
Results (blocked layout, estimation vs. simulation) • Arrays with manifest indexes can be estimated without error (edge_detect) • Arrays with dynamic elements (e.g. motion vectors) can be estimated reasonably well (phods, qsdpcm) • Energy numbers vary depending on block/tile shapes (set_1 ~ set_3) [Figure: panels for qsdpcm, phods, and edge_detect]
Conclusions and Future Work • Estimation framework tracks page breaks well • Blocked layout reduces the number of page breaks significantly • Tile/block shapes should be chosen carefully • Ongoing work • Refinement of estimation formulas for conventional/blocked layout of higher-dimensional arrays • Automation • Automatic incorporation of the Omega library and Polylib • Automatic code transformation into a main-memory-efficient data layout for each array • Exploration techniques to find the optimal data layout