
Write Cache and Write Main Memory Performance Improvement

This presentation covers ways to improve the performance of a write-through cache: adding a write buffer between the processor and main memory, using multiword cache blocks, and organizing main memory to reduce the miss penalty.

mdeleon


Presentation Transcript


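  1. WRITE: write the cache and write main memory. [Figure: direct-mapped cache. The 32-bit address is split into Tag (16 bits), Index (14 bits), and Byte Offset (2 bits). The cache has 16K entries, each holding Valid, Tag (16 bits), and Data (32 bits); a tag comparator (=) produces Hit.]

  2. Write-Through Performance Improvement. Every write: write the cache and write main memory. Writes can be 10% to 15% of instructions.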

  3. Write-Through Performance Improvement. Consider a write buffer. [Figure: Processor, Cache, Write Buffer, Main Memory.]

  4. Write-Through Performance Improvement. Consider a write buffer. [Figure: Processor, Cache, Write Buffer, Main Memory.] Each write buffer entry holds Address, Data, and Valid.

  5. Write-Through Performance Improvement. Consider a write buffer. [Figure: Processor, Cache, Write Buffer, Main Memory.] The memory controller writes data from the buffer to main memory and releases the buffer.

  6. Write-Through Performance Improvement. Consider a write buffer. [Figure: Processor, Cache, Write Buffer, Main Memory.] The processor writes the cache and the buffer. Continue until? The memory controller writes data from the buffer to main memory and releases the buffer.

  7. Write-Through Performance Improvement. Consider a write buffer. [Figure: Processor, Cache, Write Buffer, Main Memory.] The processor writes the cache and the buffer. Continue until? 1. Write buffer full. (Write miss: HOLD)

  8. Write-Through Performance Improvement. Consider a write buffer. [Figure: Processor, Cache, Write Buffer, Main Memory.] The processor writes the cache and the buffer. Continue until? 1. Write buffer full. (Write miss: HOLD) 2. Read miss.

  9. Write-Through Performance Improvement. Consider a write buffer. [Figure: Processor, Cache, Write Buffer, Main Memory.] The processor writes the cache and the buffer. Continue until? 1. Write buffer full. (Write miss: HOLD) 2. Read miss: wait until the write buffer is empty.
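
The write-buffer behavior described in slides 3 to 9 can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design from the slides: the class name `WriteBuffer`, the FIFO ordering, the buffer depth, and the dictionary standing in for main memory are all assumptions made for the example.

```python
from collections import deque


class WriteBuffer:
    """Toy FIFO write buffer between a write-through cache and main memory."""

    def __init__(self, capacity=4):
        self.capacity = capacity   # assumed depth; the slides do not give one
        self.entries = deque()     # each entry is an (address, data) pair
        self.memory = {}           # hypothetical stand-in for main memory

    def is_full(self):
        return len(self.entries) >= self.capacity

    def write(self, address, data):
        """Processor side: returns False (processor must stall) if the buffer is full."""
        if self.is_full():
            return False
        self.entries.append((address, data))
        return True

    def drain_one(self):
        """Memory controller side: write the oldest entry to memory and release it."""
        if self.entries:
            address, data = self.entries.popleft()
            self.memory[address] = data

    def drain_all(self):
        """On a read miss: wait until the write buffer is empty."""
        while self.entries:
            self.drain_one()


buf = WriteBuffer(capacity=2)
accepted = [buf.write(0x100, 1), buf.write(0x104, 2), buf.write(0x108, 3)]
buf.drain_all()  # models the wait that precedes servicing a read miss
```

With a two-entry buffer, the third write is refused, which corresponds to the "write buffer full" stall condition; draining before a read miss keeps the read from seeing stale data in main memory.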

  10. Consider a cache with a block of several adjacent words. Read miss: fetch a block of multiple adjacent words, which replaces a block

  11. Consider a cache with a block of several adjacent words. Read miss: fetch a block of multiple adjacent words, which replaces a block in the cache. This predicts that if a location is accessed, the other locations in its block will be used soon (increased use of spatial locality).

  12. Consider a cache with a block of several adjacent words. Read miss: fetch a block of multiple adjacent words, which replaces a block in the cache. This predicts that if a location is accessed, the other locations in its block will be used soon (increased use of spatial locality). Cache entry with a 4-word block: Index | Valid | Tag | Word 3 | Word 2 | Word 1 | Word 0.

  13. Consider a cache with a block of several adjacent words. Read miss: fetch a block of multiple adjacent words, which replaces a block in the cache. This predicts that if a location is accessed, the other locations in its block will be used soon (increased use of spatial locality). Cache entry with a 4-word block: Index | Valid | Tag | Word 3 | Word 2 | Word 1 | Word 0. Sharing Valid and Tag across the block makes more efficient use of memory.

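  14. [Figure: address fields. Bits 31..16: Tag (16 bits); bits 15..4: Index (12 bits); bits 3..2: Block Offset; bits 1..0: Byte Offset.]

  15. [Figure: address fields. Bits 31..16: Tag (16 bits); bits 15..4: Index (12 bits); bits 3..2: Block Offset; bits 1..0: Byte Offset. Cache: 4K entries (blocks), each holding V, Tag, Word3, Word2, Word1, Word0.]

  16. [Figure: address fields. Bits 31..16: Tag (16 bits); bits 15..4: Index (12 bits); bits 3..2: Block Offset; bits 1..0: Byte Offset. Cache: 4K entries, each holding V, Tag, Word3, Word2, Word1, Word0. The 16-bit stored tag is compared (=) with the address tag to produce Hit.]

  17. [Figure: address fields. Bits 31..16: Tag (16 bits); bits 15..4: Index (12 bits); bits 3..2: Block Offset (2 bits); bits 1..0: Byte Offset. Cache: 4K entries, each holding V, Tag, and four 32-bit words (Word3 to Word0). The 16-bit stored tag is compared (=) with the address tag to produce Hit; the block offset drives a mux that selects the 32-bit Data word.]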

  18. Consider this 4K (4096) entry cache with a block of 4 words, or 16 bytes. For address 131408, what is the block number? Block number = address in the cache = index.

  19. Consider this 4K (4096) entry cache with a block of 4 words, or 16 bytes. For address 131408, what is the block number? Block number = address in the cache = index. Address = 131408 (byte). Block address =

  20. Consider this 4K (4096) entry cache with a block of 4 words, or 16 bytes. For address 131408, what is the block number? Block number = address in the cache. Address = 131408 (byte). Block address = 131408 / 16 bytes per block = 8213 (the upper 28 bits of the address).

  21. Consider this 4K (4096) entry cache with a block of 4 words, or 16 bytes. For address 131408, what is the block number? Block number = address in the cache. Address = 131408 (byte). Block address = 131408 / 16 bytes per block = 8213 (the upper 28 bits of the address). Block number = (block address) modulo (number of cache blocks).

  22. Consider this 4K (4096) entry cache with a block of 4 words, or 16 bytes. For address 131408, what is the block number? Block number = address in the cache. Address = 131408 (byte). Block address = 131408 / 16 bytes per block = 8213 (the upper 28 bits of the address). Block number = (block address) modulo (number of cache blocks) = 8213 modulo 4096: 8213 - 4096 = 4117; 4117 - 4096 = 21.
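
The arithmetic in this example can be checked with a short Python sketch. The function name `decompose` and the constant names are chosen for illustration only; the field widths follow the slides (16-byte blocks, 4096 blocks, 2-bit block offset, 2-bit byte offset).

```python
BLOCK_BYTES = 16    # 4 words x 4 bytes per word
NUM_BLOCKS = 4096   # 4K cache entries


def decompose(address):
    """Split a 32-bit byte address into (tag, index, block_offset, byte_offset)."""
    block_address = address // BLOCK_BYTES   # upper 28 bits of the address
    index = block_address % NUM_BLOCKS       # bits 15..4: the cache block number
    tag = block_address // NUM_BLOCKS        # bits 31..16
    block_offset = (address >> 2) & 0x3      # bits 3..2: which word in the block
    byte_offset = address & 0x3              # bits 1..0: which byte in the word
    return tag, index, block_offset, byte_offset


tag, index, block_offset, byte_offset = decompose(131408)
print(tag, index, block_offset, byte_offset)  # prints: 2 21 0 0
```

Because both the block size and the number of blocks are powers of two, the division and modulo are equivalent to selecting bit fields, so `(131408 >> 4) & 0xFFF` gives the same index of 21.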

  23. [Figure: cache datapath for byte address 131408 (block address 8213). Address fields: Tag (bits 31..16), Index (bits 15..4), Block Offset (bits 3..2), Byte Offset (bits 1..0). The index selects entry 21 of the 4K entries; the tag comparator (=) produces Hit and the mux selects the 32-bit Data word.]

  24. READ. [Figure: 4K-entry cache with 4-word blocks. Address fields: Tag (bits 31..16), Index (bits 15..4), Block Offset (bits 3..2), Byte Offset (bits 1..0); tag comparator (=) produces Hit; mux selects the 32-bit Data word.]

  25. READ MISS: load the cache with 4 words, Tag, and Valid. [Figure: 4K-entry cache with 4-word blocks. Address fields: Tag (bits 31..16), Index (bits 15..4), Block Offset (bits 3..2), Byte Offset (bits 1..0); tag comparator (=) produces Hit; mux selects the 32-bit Data word.]

  26. WRITE WORD. [Figure: 4K-entry cache with 4-word blocks. Address fields: Tag (bits 31..16), Index (bits 15..4), Block Offset (bits 3..2), Byte Offset (bits 1..0); tag comparator (=) produces Hit; mux selects the 32-bit Data word.]

  27. Write Word for Multiword Cache Block (Write-Through). Procedure: 1. Write the data word to the cache and compare tags. 2. If hit, done; go to 4.

  28. Write Word for Multiword Cache Block (Write-Through). Procedure: 1. Write the data word to the cache and compare tags. 2. If hit, done; go to 4. 3. If not hit (write miss): load the block from main memory into the cache, then write the data word to the cache.

  29. Write Word for Multiword Cache Block (Write-Through). Procedure: 1. Write the data word to the cache and compare tags. 2. If hit, done; go to 4. 3. If not hit (write miss): load the block from main memory into the cache, then write the data word to the cache. 4. Write the data word to main memory.
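
The four-step procedure above can be modeled roughly as follows. This is a software sketch under stated assumptions, not the hardware from the slides: the class `WriteThroughCache`, the word-addressed dictionary used as main memory, and the default value 0 for unwritten memory words are all hypothetical.

```python
WORDS_PER_BLOCK = 4
NUM_BLOCKS = 4096


class WriteThroughCache:
    """Toy direct-mapped, write-through cache with 4-word blocks."""

    def __init__(self, memory):
        self.memory = memory   # dict: word address -> value (assumed memory model)
        self.valid = [False] * NUM_BLOCKS
        self.tags = [0] * NUM_BLOCKS
        self.data = [[0] * WORDS_PER_BLOCK for _ in range(NUM_BLOCKS)]

    def write_word(self, byte_address, value):
        word_address = byte_address // 4
        block_address = word_address // WORDS_PER_BLOCK
        offset = word_address % WORDS_PER_BLOCK
        index = block_address % NUM_BLOCKS
        tag = block_address // NUM_BLOCKS

        # Step 1: write the word to the cache and compare tags.
        hit = self.valid[index] and self.tags[index] == tag
        self.data[index][offset] = value

        # Steps 2 and 3: on a hit nothing more is needed in the cache; on a
        # write miss, load the block from memory and write the word again.
        if not hit:
            base = block_address * WORDS_PER_BLOCK
            self.data[index] = [self.memory.get(base + i, 0)
                                for i in range(WORDS_PER_BLOCK)]
            self.tags[index] = tag
            self.valid[index] = True
            self.data[index][offset] = value

        # Step 4: write the word through to main memory.
        self.memory[word_address] = value
        return hit


memory = {32853: 7}                          # word 1 of block 8213 already holds 7
cache = WriteThroughCache(memory)
first = cache.write_word(131408, 42)         # write miss: block is loaded first
second = cache.write_word(131412, 5)         # hit in the same block
```

The speculative write in step 1 is harmless on a miss because the block load in step 3 overwrites the whole entry before the word is written again.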

  30. Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty. [Plots for a constant-size cache: Miss Penalty vs. Block Size; Miss Rate vs. Block Size.]

  31. Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty. [Plots for a constant-size cache: Miss Penalty vs. Block Size (annotated: Access Time, Transfer Time); Miss Rate vs. Block Size.]

  32. Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty. [Plots for a constant-size cache: Miss Penalty vs. Block Size (annotated: Access Time, Transfer Time); Miss Rate vs. Block Size.]

  33. Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty. [Plots for a constant-size cache: Miss Penalty vs. Block Size (annotated: Access Time, Transfer Time); Miss Rate vs. Block Size (annotated: Fewer Blocks).]

  34. Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty. [Plot: Average Access Time vs. Block Size.]

  35. DECStation 3100 miss rates.

      Program  Block size  Instruction  Data       Effective
               (words)     miss rate    miss rate  miss rate
      gcc      1           6.1%         2.1%       5.4%
      gcc      4           2.0%         1.7%       1.9%
      spice    1           1.2%         1.3%       1.2%
      spice    4           0.3%         0.6%       0.4%

      Write misses are included for the 4-word block, but not for the 1-word block.

  36. DECStation 3100 miss rates.

      Program  Block size  Instruction  Data       Effective
               (words)     miss rate    miss rate  miss rate
      gcc      1           6.1%         2.1%       5.4%
      gcc      4           2.0%         1.7%       1.9%
      spice    1           1.2%         1.3%       1.2%
      spice    4           0.3%         0.6%       0.4%

      Write misses are included for the 4-word block, but not for the 1-word block. Remember: the miss penalty goes UP!
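
To see why the rising miss penalty matters, the formula Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty can be evaluated with the effective miss rates from the table. The hit time (1 cycle) and the miss penalties (12 cycles for a 1-word block, 45 cycles for a 4-word block fetched one word at a time) are assumptions chosen for this sketch, not measurements from the DECStation 3100.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in clock cycles."""
    return hit_time + miss_rate * miss_penalty


HIT_TIME = 1                   # assumed, in clock cycles
PENALTY = {1: 12, 4: 45}       # assumed miss penalty per block size (words)

# Effective miss rates taken from the table above.
effective_miss_rate = {
    ("gcc", 1): 0.054, ("gcc", 4): 0.019,
    ("spice", 1): 0.012, ("spice", 4): 0.004,
}

results = {
    key: amat(HIT_TIME, rate, PENALTY[key[1]])
    for key, rate in effective_miss_rate.items()
}

for (program, block_words), cycles in results.items():
    print(f"{program:5s} block={block_words} words  AMAT={cycles:.3f} cycles")
```

Under these assumed penalties the 4-word block has the lower miss rate but not the lower average access time, which is the motivation for reducing the miss penalty.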

  37. Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty. [Plots for a constant-size cache: Miss Penalty vs. Block Size (annotated: Access Time, Transfer Time); Miss Rate vs. Block Size (annotated: Fewer Blocks).]

  38. Reducing the Miss Penalty. Reduce the time to read the multiple words from main memory into the cache block.

  39. Reducing the Miss Penalty. Reduce the time to read the multiple words from main memory into the cache block. Don't wait for the complete block to be transferred: "Early Restart".

  40. Reducing the Miss Penalty. Reduce the time to read the multiple words from main memory into the cache block. Don't wait for the complete block to be transferred: "Early Restart". Access and transfer each word sequentially. As soon as the requested word is in the cache, restart the processor to access the cache, and finish the block transfer while the cache is available.

  41. Reducing the Miss Penalty. Reduce the time to read the multiple words from main memory into the cache block. Don't wait for the complete block to be transferred: "Early Restart". Access and transfer each word sequentially. As soon as the requested word is in the cache, restart the processor to access the cache, and finish the block transfer while the cache is available. Variation: "Requested Word First".

  42. Reducing the Miss Penalty. Reduce the time to read the multiple words from main memory into the cache block. Don't wait for the complete block to be transferred: "Early Restart". Access and transfer each word sequentially. As soon as the requested word is in the cache, restart the processor to access the cache, and finish the block transfer while the cache is available. Variation: "Requested Word First". Disadvantages: complex control; the processor is likely to access the cache block before the transfer is complete.
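
A rough timing comparison shows what early restart and requested word first can save. The model below is an assumption made for illustration: 1 cycle to send the address, then 10 cycles of DRAM access plus 1 cycle of transfer for each word, fetched one after another. The function name `restart_cycle` and the policy strings are hypothetical.

```python
def restart_cycle(requested_word, policy, words_per_block=4,
                  address_cycles=1, access_cycles=10, transfer_cycles=1):
    """Cycle at which the processor can resume after a read miss.

    requested_word is the position (0-based) of the needed word in the block.
    """
    per_word = access_cycles + transfer_cycles
    if policy == "wait_for_block":
        words_needed = words_per_block       # whole block must arrive
    elif policy == "early_restart":
        words_needed = requested_word + 1    # words arrive in order 0, 1, 2, ...
    elif policy == "requested_word_first":
        words_needed = 1                     # the needed word is fetched first
    else:
        raise ValueError(f"unknown policy: {policy}")
    return address_cycles + words_needed * per_word


for policy in ("wait_for_block", "early_restart", "requested_word_first"):
    print(policy, restart_cycle(requested_word=2, policy=policy))
```

The sketch ignores the control complexity noted on the slide, and it does not model the stall that occurs if the processor touches another word of the block before the transfer finishes.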

  43. Reducing the Miss Penalty. Reduce the time to read the multiple words from main memory into the cache block. Assume these memory access times: 1 clock cycle to send the address; 10 clock cycles to access DRAM; 1 clock cycle to send a word of data.

  44. Reducing the Miss Penalty. Reduce the time to read the multiple words from main memory into the cache block. Assume these memory access times: 1 clock cycle to send the address; 10 clock cycles to access DRAM; 1 clock cycle to send a word of data. For sequential transfer of 4 data words: Miss Penalty = 1 + 4 * (10 + 1) = 45 clock cycles.

  45. What if we could read a block of words simultaneously from main memory? [Figure: cache entry (Valid, Tag, Word3, Word2, Word1, Word0) connected to main memory by four 32-bit paths.]

  46. What if we could read a block of words simultaneously from main memory? [Figure: cache entry (Valid, Tag, Word3, Word2, Word1, Word0) connected to main memory by four 32-bit paths.] Miss Penalty = 1 + 10 + 1 = 12 clock cycles. Miss Penalty for sequential = 45 clock cycles.

  47. What about 4 banks of memory? "Interleaved Memory". Banks are accessed in parallel; words are transferred serially. [Figure: Cache and Address connected to Bank 3, Bank 2, Bank 1, Bank 0.]

  48. What about 4 banks of memory? "Interleaved Memory". Banks are accessed in parallel; words are transferred serially. [Figure: Cache and Address connected to Bank 3, Bank 2, Bank 1, Bank 0.] Miss Penalty = 1 + 10 + 4 * 1 = 15 clock cycles. Miss Penalty for parallel = 12 clock cycles. Miss Penalty for sequential = 45 clock cycles.
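
The three miss penalties can be reproduced from the stated timing assumptions (1 cycle to send the address, 10 cycles to access DRAM, 1 cycle to send a word). The function names below are made up for this sketch.

```python
ADDRESS = 1    # clock cycles to send the address
ACCESS = 10    # clock cycles to access DRAM
TRANSFER = 1   # clock cycles to send one word of data


def sequential_penalty(words):
    """One-word-wide memory: every word pays the access and transfer time."""
    return ADDRESS + words * (ACCESS + TRANSFER)


def wide_penalty():
    """Block-wide memory and bus: one access, one transfer for the whole block."""
    return ADDRESS + ACCESS + TRANSFER


def interleaved_penalty(words):
    """One bank per word: accesses overlap, transfers are serialized."""
    return ADDRESS + ACCESS + words * TRANSFER


print(sequential_penalty(4), wide_penalty(), interleaved_penalty(4))  # prints: 45 12 15
```

Interleaving recovers most of the benefit of the wide organization (15 versus 12 cycles) while keeping a one-word-wide path between memory and the cache.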

  49. Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty. Increase cache size. Increase block size. Main memory organization. [Plot: Average Access Time vs. Block Size.]
