
Using Uncacheable Memory to Improve Unity Linux Performance

Ning Qu, Xiaogang Gou, Xu Cheng. Microprocessor Research and Development Center, Peking University.



Presentation Transcript


  1. Using Uncacheable Memory to Improve Unity Linux Performance
     Ning Qu, Xiaogang Gou, Xu Cheng
     Microprocessor Research and Development Center, Peking University

  2. Issues
     [Figure: Unity SoC architecture]
     • Hardware table walking in main memory
     • No snooping
     • Cache coherency problem everywhere!

  3. Issues cont.
     [Figure: I/O data path. Labels: User Process (process I/O buffer), Linux Kernel (kernel I/O buffer), DMA, I/O Device (I/O device buffer). Annotation: "poor temporal locality!"]

  4. Motivation
     • Heavy cost of cache coherency operations
       • Many high-end embedded processors have a cache, but many of them offer very limited support for guaranteeing cache coherency
     • Poor locality leads to more data cache pollution
       • A cache relies on the property of locality
       • Some programs have poor locality, for example TCP/IP processing
     How to avoid these disadvantages? Uncacheable memory may be a solution!
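
The pollution argument on this slide can be illustrated with a toy model. The sketch below is not from the presentation: it assumes a 2 KiB direct-mapped cache with 32-byte lines, a 1 KiB hot working set, and a 2 KiB streaming buffer per round (a stand-in for packet data). All sizes, addresses, and the function name `run` are illustrative assumptions, not Unity parameters.

```python
LINE_SIZE = 32   # bytes per cache line (assumed)
NUM_LINES = 64   # direct-mapped cache, 64 * 32 B = 2 KiB (assumed)


def run(rounds, stream_cacheable):
    """Return (hits, misses) for the hot working set only."""
    tags = [None] * NUM_LINES
    hot_hits = hot_misses = 0

    def access(addr):
        # Look up one address; fill the line on a miss.
        line = addr // LINE_SIZE
        index, tag = line % NUM_LINES, line // NUM_LINES
        hit = tags[index] == tag
        tags[index] = tag
        return hit

    for r in range(rounds):
        # Hot data: the same 1 KiB is touched every round (good locality).
        for addr in range(0, 1024, LINE_SIZE):
            if access(addr):
                hot_hits += 1
            else:
                hot_misses += 1
        # Streaming data: a fresh 2 KiB buffer each round, touched once.
        # If it is mapped uncacheable, it bypasses the cache entirely.
        if stream_cacheable:
            base = 0x10000 + r * 2048
            for addr in range(base, base + 2048, LINE_SIZE):
                access(addr)
    return hot_hits, hot_misses


if __name__ == "__main__":
    print("stream cacheable:  ", run(10, True))
    print("stream uncacheable:", run(10, False))
```

In this model the cacheable stream evicts the hot lines every round, so the hot data never hits; with the stream kept out of the cache, only the first (cold) pass over the hot data misses. The model ignores the slower access time of uncacheable memory, which the Conclusions slide lists as the cost of this approach.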

  5. Contributions
     • Analyze the scenarios in which a cache does not perform well, and propose that uncacheable memory has two advantages
       • Eliminates most cache coherency operations
       • Avoids cache pollution
     • Apply uncacheable memory in Unity Linux to improve I/O performance
       • Some important aspects improve by 5% to 29%

  6. Outline
     • Issues
     • Motivation
     • Contribution
     • Uncacheable Memory
     • Evaluation
     • Related Work
     • Conclusions

  7. Recv Packet Flow
     [Figure: receive path in four steps (step 1 to step 4). Labels: I/O Device, DMA copy, Kernel Space buffers, CPU copy, flush cache, simple data processing, User Space / User Buffer. Annotation: "using uncacheable memory"]

  8. Send Packet Flow
     [Figure: send path in four steps (step 1 to step 4). Labels: User Space / User Buffer, CPU copy, simple data processing, Kernel Space buffers, clean cache, DMA copy, I/O Device. Annotation: "using uncacheable memory"]

  9. Cacheable vs. Uncacheable
     DMA send and receive cost analysis

  10. Cacheable vs. Uncacheable cont.
     [Figure: per-step cost comparison. DMA Send: cache clean cost. DMA Recv: cache flush cost. Step labels: "load U to Cache", "load K to Cache", "load U into Cache", "store to K", "load U into Cache and store", "load K"]

  11. Cacheable vs. Uncacheable cont.
     [Figure: Recv and Send Performance, CH vs NC]

  12. Using Uncacheable Memory
     • Implemented in Unity Linux, ported from Linux 2.4.17
     • Uncacheable page table
       • eliminates cache coherency operations when modifying the page tables
     • Uncacheable socket buffer for sending
       • eliminates cache coherency operations
       • avoids data cache pollution
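
Why an uncacheable send buffer removes the coherency operation can be sketched with a toy model of a write-back cache that is not snooped by DMA. This is an illustration under simplifying assumptions, not the Unity Linux implementation: the class `NoSnoopSystem`, the function `send`, and the one-byte "cache line" granularity are all hypothetical.

```python
class NoSnoopSystem:
    """Toy model: write-back CPU cache plus a DMA engine that reads
    main memory directly and never looks in the cache (no snooping)."""

    def __init__(self):
        self.memory = {}     # addr -> value in main memory
        self.cache = {}      # addr -> value held in the CPU cache
        self.dirty = set()   # cached addrs not yet written back
        self.clean_ops = 0   # number of lines written back by clean()

    def cpu_write(self, addr, value, cacheable=True):
        if cacheable:
            # Write-back: the new value stays in the cache for now.
            self.cache[addr] = value
            self.dirty.add(addr)
        else:
            # Uncacheable mapping: the store goes straight to memory.
            self.memory[addr] = value

    def clean(self, addrs):
        # Cache clean: write dirty lines back, keep them valid.
        for a in addrs:
            if a in self.dirty:
                self.memory[a] = self.cache[a]
                self.dirty.discard(a)
                self.clean_ops += 1

    def dma_read(self, addrs):
        # The device sees only main memory (0 stands for stale data).
        return [self.memory.get(a, 0) for a in addrs]


def send(system, payload, cacheable, clean_before_dma):
    """Copy payload into a buffer, then let the device DMA it out."""
    addrs = range(len(payload))
    for a, b in zip(addrs, payload):
        system.cpu_write(a, b, cacheable)
    if clean_before_dma:
        system.clean(addrs)
    return bytes(system.dma_read(addrs))


if __name__ == "__main__":
    print(send(NoSnoopSystem(), b"ping", True, False))   # stale data
    print(send(NoSnoopSystem(), b"ping", True, True))    # correct, needs clean
    print(send(NoSnoopSystem(), b"ping", False, False))  # correct, no clean
```

With a cacheable buffer, skipping the clean sends stale data, so the clean is mandatory on every send. With an uncacheable buffer the device sees the data without any coherency operation, at the price of every CPU store going to memory.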

  13. Outline
     • Motivation
     • Issues
     • Contribution
     • Uncacheable Memory?
     • Evaluation
     • Related Work
     • Conclusions

  14. Methodology
     • Benchmarks: Netperf, Lmbench, and the Modified Andrew Benchmark
     • Experimental environment
       • 160 MHz Unity network computer with 256 MB DRAM and an SoC built-in 10M/100M Ethernet card
       • Dell 4600 server, two Intel Xeon PIII 700 MHz processors with 4 GB DRAM and a 1000M/100M Ethernet card
     • All benchmarks are executed in single-user mode on NFS

  15. Netperf Benchmark Results
     [Figure: Netperf TCP_STREAM Send Performance]

  16. Netperf Benchmark Results cont.
     [Figure: Netperf TCP_RR Performance]

  17. Lmbench Benchmark Results
     [Figure: Lmbench Performance]

  18. Modified Andrew Benchmark Results
     [Figure: Modified Andrew Benchmark]

  19. Related Work
     • Related work: accelerate uncacheable memory performance
       • New memory type
         • Intel write-combining
         • MIPS R10000: uncached-accelerated page
       • New instructions
         • SPARC V9, ARM, Unity II: block move instructions
     • Future work: new memory type support
       • Read like common cache with low pollution
       • Write like Write-Combining without write-allocate
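
The idea behind write-combining can be sketched by counting bus transactions for a run of uncached stores. The model below is a deliberate simplification and an assumption of this sketch: a single combining buffer of one 32-byte line that is written out whenever a store targets a different line. Real write-combining hardware has several buffers and more flush conditions, and the function name `bus_transactions` is hypothetical.

```python
def bus_transactions(store_addrs, line_size=32, write_combining=False):
    """Count memory bus transactions for a sequence of store addresses."""
    if not write_combining:
        # Plain uncached memory: every store is its own transaction.
        return len(store_addrs)

    transactions = 0
    open_line = None   # line currently held in the combining buffer
    for addr in store_addrs:
        line = addr // line_size
        if line != open_line:
            if open_line is not None:
                transactions += 1   # write out the previous line as a burst
            open_line = line
    if open_line is not None:
        transactions += 1           # write out the last open line
    return transactions


if __name__ == "__main__":
    stores = list(range(0, 256, 4))   # 64 sequential 4-byte stores
    print(bus_transactions(stores))
    print(bus_transactions(stores, write_combining=True))
```

For sequential stores, such as filling a send buffer, the combining buffer merges the eight 4-byte stores of each line into one burst, which is how this kind of memory type narrows the gap to cached writes without allocating cache lines.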

  20. Conclusions
     • This paper focuses on the use of uncacheable memory
       • Pros: eliminates coherency operations and avoids data cache pollution
       • Cons: slow access time
     • Uncacheable memory can perform well with careful design that takes the specific characteristics of the system into account

  21. Thank You! Questions?
