Memories and the Memory Subsystem; The Memory Hierarchy; Caching; ROM

Memory:
• Some embedded systems require large amounts of memory; others have small memory requirements
• Often must use a hierarchy of memory devices
• Memory allocation may be static or dynamic
• Main concerns in embedded systems: make sure allocation is safe; minimize overhead

Memory types:
RAM:
• DRAM: asynchronous; needs refreshing
• SRAM: asynchronous; no refreshing
• Semistatic RAM
• SDRAM: synchronous DRAM
ROM (read-only memory):
• PROM: programmable one time
• EPROM: reprogrammable (erased with UV light)
• EEPROM: electrically reprogrammable
• FLASH: reprogrammable without removing from the circuit

Standard memory configuration (fig_04_01): memory is a "virtual array"; address decoder; signals: address, data, control

ROM (fig_04_02): usually read-only (some are programmable: "firmware"); the transistor at each cell position encodes a 0 or 1

SRAM (fig_04_04): similar to ROM. In this example, 6 transistors per cell (compare to a flip-flop?)

SRAM read (fig_04_05): precharge the bit lines bi and not-bi to a value halfway between 0 and 1; asserting the word line then lets the cell drive the bit lines to the stored value. SRAM write: the R/W line is driven low

Dynamic RAM (fig_04_06): only 1 transistor per cell. A READ discharges the cell's storage capacitor, so the value must be restored each time. The refresh cycle time is determined by the part specification

DRAM read and write timing (fig_04_08)

Comparison: SRAM / DRAM (fig_04_08)

Memory: typical organization, SRAM or DRAM (fig_04_09)

Two important time intervals: access time and cycle time (fig_04_11)

Terminology (fig_04_12):
• Block: logical unit of transfer
• Block size
• Page: logical unit; a collection of blocks
• Bandwidth: word transfer rate on the I/O bus (memory can be organized in bits, bytes, words)
• Latency: time to access the first word in a sequence
• Block access time: time to access an entire block
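
The relationship between latency, bandwidth, and block access time can be sketched numerically. The timing model below is an assumption, not something stated on the slide: the first word costs the full latency and every later word in the block costs one bus transfer time. The function names and parameters are hypothetical.

```python
def block_access_time_ns(latency_ns, word_time_ns, block_size_words):
    """Time to fetch a whole block under the assumed model:
    latency for the first word, then one transfer time per extra word."""
    if block_size_words < 1:
        raise ValueError("a block must contain at least one word")
    return latency_ns + (block_size_words - 1) * word_time_ns


def bandwidth_words_per_s(word_time_ns):
    """Peak word transfer rate on the I/O bus for a given per-word time."""
    return 1e9 / word_time_ns


# Illustrative numbers only (not from the slides): 60 ns latency,
# 10 ns per subsequent word, 8-word block.
print(block_access_time_ns(60, 10, 8))
print(bandwidth_words_per_s(10))
```

Under this model a larger block amortizes the latency over more words, which is one reason block transfers are attractive when accesses are sequential.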

Memory interface: restrictions which must be dealt with:
• Size of RAM or ROM
• Width of address and data I/O lines

Memory example: 4K x 16 SRAM (fig_04_14)
• Uses 2 8-bit SRAMs in parallel (to achieve the desired word size)
• Uses 4 1K blocks (to achieve the desired number of words)
• Address: 10 bits within a block, 2 bits to specify the block via CS (chip select)
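
The address split in this example can be sketched as a small decoder. The slide gives the bit counts (2 block bits, 10 offset bits) but not which bits select the block; the sketch assumes the upper 2 bits drive chip select, and the names are hypothetical.

```python
BLOCK_BITS = 2    # selects one of four 1K blocks (drives chip select)
OFFSET_BITS = 10  # selects one word inside a 1K block


def decode_address(addr):
    """Split a 12-bit address into (block, offset, chip_select lines).

    Assumes the upper 2 bits choose the block; the slide does not say
    which bits are used.
    """
    if not 0 <= addr < (1 << (BLOCK_BITS + OFFSET_BITS)):
        raise ValueError("address out of range for a 4K memory")
    block = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    # Exactly one chip-select line is active at a time.
    chip_select = [i == block for i in range(1 << BLOCK_BITS)]
    return block, offset, chip_select


print(decode_address(0xC05))
```

Each active chip select enables both 8-bit SRAMs of that block together, so one access delivers a full 16-bit word.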

Write: 8-bit bus, two cycles per word (fig_04_16)

Read: choose upper or lower bits to put on the bus (fig_04_17)
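
A behavioural sketch of moving a 16-bit word over an 8-bit bus follows. The byte order (upper byte first) is an assumption made for illustration; the slides do not specify it, and the function names are hypothetical.

```python
def write_cycles(word):
    """Split a 16-bit word into two 8-bit bus transfers.

    Upper byte first is an assumed order, not taken from the slides.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    return [(word >> 8) & 0xFF, word & 0xFF]


def read_byte(word, upper):
    """Select which half of the stored word is placed on the 8-bit bus."""
    return (word >> 8) & 0xFF if upper else word & 0xFF


def assemble(upper_byte, lower_byte):
    """Rebuild the 16-bit word from the two bus transfers."""
    return (upper_byte << 8) | lower_byte


print([hex(b) for b in write_cycles(0xABCD)])
```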

If there are insufficient I/O lines: signals must be multiplexed and stored in registers until the data is accumulated (common in embedded system applications). This typically requires a MAR / MDR (memory address register / memory data register) configuration

DRAM: variations available: EDO, SDRAM, FPM. These are basically DRAMs trying to accommodate ever faster processors
Techniques:
• Synchronize DRAM to the system clock
• Improve block accessing
• Allow pipelining
As with SRAM, typically there are insufficient I/O pins and multiplexing must be used

Terminology (fig_04_19):
• RAS: row address strobe
• CAS: column address strobe
• Note: either the leading or the trailing edge can capture the address
• RAS cycle time
• RAS to CAS delay
• Refresh period

Example: EDO (extended data output): one row, multiple columns (fig_04_20)

Refreshing the DRAM: overhead; must refresh all rows, not just those accessed. Will this be controlled internally or externally?
Example: refresh one row at a time, external refresh (fig_04_21)
• 4M memory: 4K rows, 1K columns, so 22 address bits
• 12 I/O pins, 10 shared between row and column
• Refresh each row every 64 ms
• 2-phase clock (for greater flexibility), 50 MHz source
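
The row/column multiplexing in this example can be sketched as follows. The slide gives 12 row bits and 10 column bits; treating the row as the upper 12 bits of the 22-bit address is an assumption, and the names are hypothetical.

```python
ROW_BITS = 12  # 4K rows
COL_BITS = 10  # 1K columns


def split_address(addr):
    """Split a 22-bit address into (row, column).

    Assumes the row is the upper 12 bits; the slide does not fix this.
    """
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 4M memory")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


def pin_sequence(addr):
    """Values placed on the shared address pins: the row is captured
    with RAS, then the column is captured with CAS."""
    row, col = split_address(addr)
    return [("RAS", row), ("CAS", col)]


print(pin_sequence((5 << COL_BITS) | 7))
```

Because the column needs only 10 bits, it reuses 10 of the 12 pins that carried the row.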

Refresh timing (fig_04_22): one row must be refreshed roughly every 16 µs (64 ms / 4096 rows = 15.625 µs). Use a 9-bit counter incremented from either phase of the 25 MHz clock and refresh at count 384 (15.36 µs); this provides some timing margin
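
The refresh arithmetic can be checked with a few lines of code. All constants come from the example (4K rows, 64 ms, 25 MHz phase clock, count 384); the function and key names are hypothetical.

```python
ROWS = 4096                      # 4K rows
REFRESH_PERIOD_NS = 64_000_000   # every row must be refreshed within 64 ms
PHASE_CLOCK_HZ = 25_000_000      # each phase of the 2-phase clock
REFRESH_COUNT = 384              # counter value that triggers a row refresh


def refresh_budget(rows=ROWS, period_ns=REFRESH_PERIOD_NS,
                   clock_hz=PHASE_CLOCK_HZ, count=REFRESH_COUNT):
    """Compare the required per-row refresh interval with the one the
    counter actually produces."""
    tick_ns = 1_000_000_000 // clock_hz      # 40 ns per tick at 25 MHz
    required_ns = period_ns / rows           # longest allowed gap between row refreshes
    actual_ns = count * tick_ns              # gap produced by counting to `count`
    return {
        "required_ns": required_ns,
        "actual_ns": actual_ns,
        "margin_ns": required_ns - actual_ns,
        "full_sweep_ms": actual_ns * rows / 1_000_000,
        "counter_bits": count.bit_length(),  # bits needed to hold `count`
    }


print(refresh_budget())
```

With these numbers the counter fires every 15.36 µs against a 15.625 µs requirement, so a full sweep of all rows finishes in just under 63 ms.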

Refresh address (fig_04_23): 12-bit binary counter, incremented following the completion of each row refresh

Address selection (fig_04_24): read / write; refresh

Refresh arbitration (fig_04_25): avoid R/W conflicts
• If a normal R/W operation has started, allow it to complete
• If a refresh operation has started, remember the R/W operation
• For a tie, the normal R/W operation has priority
• Required signals:
  • R/W: has been initiated
  • Refresh interval: has elapsed
  • Normal request: by arbitration logic
  • Refresh request: by arbitration logic
  • Refresh grant: by arbitration logic
  • Normal active: R/W has started
  • Refresh active: refresh has started
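
The three arbitration rules can be expressed as a small decision function. This is a behavioural sketch in software, not the circuit from the figures; the signal names are adapted from the list above, and it assumes the caller keeps a request asserted until it is granted (which is how a deferred R/W is "remembered").

```python
def arbitrate(rw_request, refresh_request, normal_active, refresh_active):
    """Return (normal_grant, refresh_grant) for one decision step."""
    if normal_active or refresh_active:
        # An operation already in progress runs to completion;
        # any pending request simply waits for the next decision.
        return (False, False)
    if rw_request:
        # Covers both an R/W request alone and a tie with refresh:
        # the normal R/W operation has priority.
        return (True, False)
    if refresh_request:
        return (False, True)
    return (False, False)


print(arbitrate(True, True, False, False))   # tie
print(arbitrate(True, False, False, True))   # R/W arrives during a refresh
```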

Row and column addresses (column only 10 bits): generate row and column addresses on phase 1; RAS and CAS on phase 2 (fig_04_25)

Arbitration circuit: request portion followed by grant portion (fig_04_27)

Complete system (fig_04_28)

R/W cycles (fig_04_29)

Memory organization: typical "memory map"; for power loss (fig_04_30)

Memory hierarchy (fig_04_31)

Paging / caching (fig_04_32): why it typically works: locality of reference (spatial / temporal); "working set". Note: in real-time embedded systems, behavior may be atypical, but caching may still be a useful technique

Typical memory system with cache (fig_04_33): hit rate (miss rate) is important
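
One way to see why the hit rate matters is to compute an average access time. The model below is a common textbook simplification and an assumption here, not taken from the slides: a hit costs the cache time, and a miss costs the cache time plus the main-memory time. The names and example numbers are illustrative.

```python
def average_access_time(hit_rate, cache_ns, memory_ns):
    """Average access time under the assumed model:
    hit -> cache_ns, miss -> cache_ns + memory_ns."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_rate * cache_ns + miss_rate * (cache_ns + memory_ns)


# Illustrative numbers: 10 ns cache, 100 ns main memory.
for h in (0.5, 0.9, 0.99):
    print(h, average_access_time(h, 10, 100))
```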

Basic caching strategies (fig_04_33):
• Direct-mapped
• Associative
• Block-set associative
Questions: what is "associative memory"? What is the overhead? What is the efficiency (hit rate)? Is a bigger cache better?
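
To make the first strategy concrete, here is a minimal sketch of a direct-mapped cache that only tracks tags and counts hits. It is a toy model for experimenting with the questions above, with hypothetical names and parameters, and it ignores writes and timing.

```python
def direct_mapped_hit_rate(addresses, num_lines, block_size):
    """Simulate a direct-mapped cache and return the hit rate.

    Each memory block can live in exactly one cache line:
    line index = block number mod num_lines.
    """
    tags = [None] * num_lines  # tag stored in each line, None = empty
    hits = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_lines
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag  # miss: load the block, evicting the old one
    return hits / len(addresses) if addresses else 0.0


# Sequential accesses benefit from spatial locality.
print(direct_mapped_hit_rate(list(range(16)), num_lines=4, block_size=4))
# Two addresses that map to the same line evict each other every time.
print(direct_mapped_hit_rate([0, 16, 0, 16], num_lines=4, block_size=4))
```

The second call shows the main weakness of direct mapping: conflicting addresses thrash a single line even when the rest of the cache is empty, which is the problem associative and set-associative designs address at the cost of extra comparison hardware.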
