Introduction to Embedded Systems
Rabie A. Ramadan
rabieramadan@gmail.com
http://www.rabieramadan.org/classes/2014/embedded/

3. Memory Component Models: Cache Memory Mapping

Memory Component Models
• Larger memory structures can be built from memory blocks.
• Memory mapping is required.
• Multiport memories.
Register Files
• The size of the register file is fixed when the CPU is designed.
• Register file size is a key parameter in CPU design: it affects code performance and energy consumption as well as the area of the CPU.
• If the register file is too small, the program must spill values to main memory: a value is written to main memory and later read back.
• Spills cost both time and energy because main memory accesses are slower and more energy-intensive than register file accesses.
Register Files (Cont.)
• If the register file is too large, it consumes static energy and takes extra chip area that could be used for other purposes.
Caches
• When designing an embedded system, we need to pay extra attention to the relationship between the cache configuration and the programs that use it.
• Too-small caches result in excessive main memory accesses.
• Too-large caches consume excess static power.
• Longer cache lines provide more prefetching bandwidth, which is useful in some algorithms but not others.
Caches (Cont.)
• Line size affects prefetching behavior: programs that access successive memory locations can benefit from the prefetching induced by long cache lines.
• Long lines can also, in some cases, provide reuse for very small sets of locations.
• Cache memory mapping is another issue.
Caches (Cont.)
• Several groups have proposed configurable caches whose configuration can be changed at runtime.
• Additional multiplexers and other logic allow a pool of memory cells to be used in several different cache configurations.
Scratch Pad Memories
• A cache is designed to move a relatively small amount of memory close to the processor.
• Caches use hardwired algorithms to manage the cache contents: hardware determines when values are added to or removed from the cache.
• A scratch pad memory is located in parallel with the cache, but the scratch pad does not include hardware to manage its contents.
Scratch Pad Memories (Cont.)
• Part of the memory address space of the processor.
• The scratch pad is managed by software, not hardware.
• Provides predictable access time.
• Requires values to be allocated by the program.
• Accessed with standard read/write instructions.
Memory Maps
• A memory map for a processor defines how addresses get mapped to hardware.
• The total size of the address space is constrained by the address width of the processor.
• A 32-bit processor, for example, can address 2^32 locations, or 4 gigabytes (GB), assuming each address refers to one byte.
An ARM Cortex-M3 Architecture
• Separates addresses used for program memory (labeled A) from those used for data memory (B and D).
• The memories are accessed via separate buses, permitting instructions and data to be fetched simultaneously.
• This effectively doubles the memory bandwidth.
• Such a separation of program memory from data memory is known as a Harvard architecture.
An ARM Cortex-M3 Architecture (Cont.)
• Includes a number of on-chip peripherals (C): devices that are accessed by the processor using some of the memory addresses.
• Examples: timers, ADCs, UARTs, and other I/O devices.
• Each of these devices occupies a few of the memory addresses by providing memory-mapped registers.
Memory Hierarchy • The idea • Hide the slower memory behind the fast memory • Cost and performance play major roles in selecting the memory.
Hit Vs. Miss • Hit • The requested data resides in a given level of memory. • Miss • The requested data is not found in the given level of memory. • Hit rate • The percentage of memory accesses found in a given level of memory. • Miss rate • The percentage of memory accesses not found in a given level of memory.
Hit Vs. Miss (Cont.) • Hit time • The time required to access the requested information in a given level of memory. • Miss penalty • The time required to process a miss: replacing a block in an upper level of memory, plus the additional time to deliver the requested data to the processor.
Miss Scenario • The processor sends a request to the cache for location X. • If found: cache hit. • If not: try the next level. • When the location is found, load the whole block into the cache, hoping that the processor will access one of the neighboring locations next. • One miss may thus lead to multiple hits: locality. • Can we compute the average access time based on this memory hierarchy?
Average Access Time
Assume a memory hierarchy with three levels (L1, L2, and L3). What is the average memory access time?
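One common way to answer this is the average memory access time (AMAT) recursion: the access time of a level plus its miss rate times the access time of the level below it. The hit times and miss rates below are illustrative numbers, not values from the slides:

```python
# Average memory access time for a multi-level hierarchy:
# AMAT_i = t_i + m_i * AMAT_{i+1}, where t_i is the access (hit) time
# of level i and m_i its miss rate.

def amat(levels):
    """levels: list of (access_time, miss_rate) pairs from L1 down;
    the last level must have miss_rate 0 (e.g. main memory)."""
    total = 0.0
    reach = 1.0  # probability that an access reaches this level
    for t, m in levels:
        total += reach * t
        reach *= m
    return total

# Illustrative: L1 = 1 cycle / 10% miss, L2 = 10 cycles / 5% miss,
# main memory = 100 cycles.
print(amat([(1, 0.10), (10, 0.05), (100, 0.0)]))  # 2.5 cycles
```

Note how the hierarchy hides the 100-cycle memory: with a 90% L1 hit rate the average access costs only 2.5 cycles.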
Cache Mapping Schemes • Cache memory is smaller than the main memory. • Only a few blocks can be loaded into the cache at a time. • The cache does not use the same addresses as main memory. • Which block in the cache corresponds to which block in memory? • A mapping function, implemented in the cache hardware, converts the requested memory address into a cache location.
Direct Mapping
• Assigns cache mappings using a modular approach: j = i mod n
• j: cache block number
• i: memory block number
• n: number of cache blocks
Example
Given M memory blocks to be mapped to 10 cache blocks, show the direct mapping scheme. How do you know which block is currently in the cache?
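A quick sketch of the j = i mod n rule for the 10-block cache above (the memory block numbers are chosen arbitrarily for illustration):

```python
# Direct mapping: memory block i always lands in cache block i mod n.
n = 10  # number of cache blocks

for i in [0, 7, 10, 17, 123]:  # arbitrary memory block numbers
    print(f"memory block {i:3d} -> cache block {i % n}")
```

Blocks 7 and 17 both map to cache block 7: many memory blocks compete for the same cache block, which is the direct-mapping disadvantage discussed below.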
Direct Mapping (Cont.)
• Bits in the main memory address are divided into three fields:
• Word: identifies a specific word in the block.
• Block: identifies a unique block in the cache.
• Tag: identifies which block from main memory is currently in the cache.
Example
Consider, for example, the case of a main memory consisting of 4K blocks, a cache memory consisting of 128 blocks, and a block size of 16 words. Show the direct mapping and the main memory address format.
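The field widths for this example can be checked with a few lines of bit arithmetic (assuming word-addressable memory, as the slides do):

```python
# Direct-mapped address format: | tag | block | word |
from math import log2

mem_blocks   = 4 * 1024   # 4K main-memory blocks
cache_blocks = 128
block_size   = 16         # words per block

word_bits  = int(log2(block_size))    # selects a word inside a block
block_bits = int(log2(cache_blocks))  # selects a cache block
tag_bits   = int(log2(mem_blocks)) - block_bits  # disambiguates the
                                     # memory blocks sharing that cache block

print(word_bits, block_bits, tag_bits)  # 4 7 5
```

So the address has 4 word bits, 7 block bits, and 5 tag bits: 16 bits in total, matching the 4K x 16 = 2^16 word memory.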
Direct Mapping • Advantages • Easy. • Does not require any search technique to find a block in the cache. • Replacement is straightforward. • Disadvantages • Many blocks in main memory are mapped to the same cache block, while other cache blocks may remain empty. • Poor cache utilization.
Group Activity 1
Consider the case of a main memory consisting of 4K blocks, a cache memory consisting of 8 blocks, and a block size of 4 words. Show the direct mapping and the main memory address format.
Group Activity 2 Given the following direct mapping chart, what is the cache and memory location required by the following addresses:
Fully Associative Mapping
• Allows any memory block to be placed anywhere in the cache.
• A search technique is required to find the block number in the tag field.
Example
We have a main memory with 2^14 words, a cache with 16 blocks, and a block size of 8 words. How many tag and word field bits?
• The word field requires 3 bits.
• The tag field requires 11 bits (2^14 / 8 = 2^11 = 2048 memory blocks).
Fully Associative Mapping • Advantages • Flexibility. • Good cache utilization. • Disadvantages • Requires a tag search. • An associative (parallel) search might require an extra hardware unit. • Requires a replacement strategy if the cache is full. • Expensive.
N-way Set Associative Mapping • Combines direct and fully associative mapping. • The cache is divided into sets of blocks. • All sets are the same size. • Main memory blocks are mapped to a specific set based on: s = i mod S • s: the set to which block i is mapped. • S: total number of sets. • An incoming block may be placed in any cache block inside its set.
N-way Set Associative Mapping (Cont.)
• Tag field: uniquely identifies the targeted block within the determined set.
• Word field: identifies the element (word) within the block that is requested by the processor.
• Set field: identifies the set.
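The three fields follow the same bit arithmetic as in direct mapping, with the set index taking the place of the block index. A sketch, using made-up parameters (deliberately different from the group activity below):

```python
# Set-associative address format: | tag | set | word |
from math import log2

def set_assoc_fields(mem_blocks, cache_blocks, block_size, ways):
    """Return (word_bits, set_bits, tag_bits) for an N-way
    set-associative cache; all sizes are powers of two."""
    sets = cache_blocks // ways          # S = number of sets
    word_bits = int(log2(block_size))    # word within the block
    set_bits  = int(log2(sets))          # set index: s = i mod S
    tag_bits  = int(log2(mem_blocks)) - set_bits
    return word_bits, set_bits, tag_bits

# Illustrative: 1K memory blocks, 64-block cache, 4-word blocks, 2-way.
print(set_assoc_fields(1024, 64, 4, 2))  # (2, 5, 5)
```

Note that going from direct mapping to N-way associativity shrinks the index field by log2(N) bits and grows the tag field by the same amount.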
Group Activity • Compute the three parameters (Word, Set, and Tag) for a memory system having the following specification: • Size of the main memory is 4K blocks, • Size of the cache is 128 blocks, • The block size is 16 words. • Assume that the system uses 4-way set-associative mapping.
N-way Set Associative Mapping • Advantages • Moderate cache utilization. • Disadvantages • Still needs a tag search inside the set.
If the cache is full and a block must be replaced, which one should be replaced?
Cache Replacement Policies • Random • Simple. • Requires a random generator. • First In First Out (FIFO) • Replace the block that has been in the cache the longest. • Requires keeping track of block lifetimes. • Least Recently Used (LRU) • Replace the block that was used least recently. • Requires keeping track of block history.
Cache Replacement Policies (Cont.) • Most Recently Used (MRU) • Replace the block that was used most recently. • Requires keeping track of block history. • Optimal • Hypothetical: replace the block that will not be needed for the longest time. • Must know the future.
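A minimal LRU sketch for one fully associative set, using an ordered list as the recency stack (illustrative, not from the slides):

```python
# LRU replacement for a small fully associative cache.
# The list is kept ordered from least to most recently used.

def access(cache, block, capacity):
    """Access `block`; return True on a hit. On a miss with a full
    cache, evict the least recently used block (front of the list)."""
    if block in cache:
        cache.remove(block)      # hit: move to most-recently-used end
        cache.append(block)
        return True
    if len(cache) == capacity:
        cache.pop(0)             # miss + full: evict the LRU block
    cache.append(block)
    return False

cache = []
hits = [access(cache, b, 3) for b in [1, 2, 3, 1, 4, 2]]
print(hits)   # [False, False, False, True, False, False]
print(cache)  # [1, 4, 2]
```

When block 4 arrives, block 2 is evicted because 1 and 3 were touched more recently; real hardware approximates this bookkeeping with per-line age or reference bits rather than a list.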
Example
Consider the case of a 4x8 two-dimensional array of numbers, A. Assume that each number in the array occupies one word and that the array elements are stored in column-major order in main memory, from location 1000 to location 1031. The cache consists of eight blocks, each consisting of just two words. Assume also that, whenever needed, the LRU replacement policy is used. We would like to examine the changes in the cache if direct mapping is used as the following sequence of requests for array elements is made by the processor:
Conclusion
• 16 cache misses.
• No hits.
• 12 replacements.
• Only 4 cache blocks are used.
Group Activity
Do the same in the case of fully associative and 4-way set associative mappings.
Memory Models: Stacks
• A stack is a region of memory that is dynamically allocated to the program in a last-in, first-out (LIFO) pattern.
• A stack pointer (typically a register) contains the memory address of the top of the stack.
• Stacks are typically used to implement procedure calls.
Memory Models: Stacks (Cont.)
• In C, the compiler produces code that pushes onto the stack:
• the location of the instruction to execute upon returning from the procedure,
• the current value of some or all of the machine registers,
• the arguments to the procedure,
and then sets the program counter to the location of the procedure code.
• Stack frame: the data for a procedure that is pushed onto the stack.
• When a procedure returns, the generated code pops its stack frame, retrieving the program location at which to resume execution.
Memory Models: Stacks (Cont.)
• It can be disastrous if the stack pointer grows beyond the memory allocated for the stack: a stack overflow.
• A stack overflow results in overwriting memory that is being used for other purposes.
• This becomes particularly difficult to bound with recursive programs, where a procedure calls itself.
• Embedded software designers often avoid using recursion to circumvent this difficulty.
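On a desktop runtime the overflow is usually caught: Python, for instance, turns unbounded recursion into an exception rather than silently corrupting memory. On a bare-metal embedded target there is typically no such guard, which is one reason recursion is avoided. A small illustration:

```python
# Unbounded recursion exhausts the call stack. Python guards the
# interpreter stack with a depth limit and raises RecursionError;
# bare-metal C has no such guard, and the overflowing stack would
# silently overwrite adjacent memory instead.
import sys

def depth(n=0):
    return depth(n + 1)   # no base case: recurses until the limit

sys.setrecursionlimit(1000)
try:
    depth()
except RecursionError:
    print("stack limit reached")  # prints "stack limit reached"
```

Each call pushes another stack frame; the depth limit plays the role of the fixed stack region allocated to an embedded task.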