Improving Instruction Locality with Just-In-Time Code Layout

A study on implementing dynamic code layout using activation order to improve instruction reference locality, eliminating the need for profile information required by current solutions.

Presentation Transcript


  1. Improving Instruction Locality with Just-In-Time Code Layout. J. Bradley Chen and Bradley D. D. Leupen, Division of Engineering and Applied Sciences, Harvard University

  2. Goals: Improve instruction reference locality, a big problem for commodity applications. Eliminate the need for the profile information required by current compiler-based solutions.

  3. How? Implement layout dynamically using Activation Order, a new heuristic for code layout: locate procedures in order of use.

  4. Requirements: No special hardware support. Minimal changes to the operating system. Minimal system overhead.

  5. Optimizing Procedure Layout. [Figure: a bad layout compared with a better layout.]

  6. Current Practice: Pettis and Hansen. Nodes are procedures, edges are caller/callee pairs, and weights are call frequencies. [Figure: example weighted call graph over WinMain(), Initialize(), EventLoop(), GetEvent(), React(), CheckForInputError(), HandleRareCase(), HandleCommonCase(), and HandleInputError(), with edge weights ranging from 1 to 129394 calls.]
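The weighted call graph described on this slide can be sketched as a small C data structure. This is a minimal illustration, not code from the JITCL work: the names `CallGraph`, `cg_record_call`, and `cg_undirected_weight`, the adjacency-matrix representation, and the `MAX_PROCS` limit are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch of a weighted call graph.
   Nodes are procedures (small integer IDs below MAX_PROCS), an edge is a
   caller/callee pair, and its weight counts how often that call occurred. */
#define MAX_PROCS 16

typedef struct {
    unsigned long weight[MAX_PROCS][MAX_PROCS]; /* weight[caller][callee] */
} CallGraph;

void cg_init(CallGraph *g) {
    for (size_t i = 0; i < MAX_PROCS; i++)
        for (size_t j = 0; j < MAX_PROCS; j++)
            g->weight[i][j] = 0;
}

/* Record one observed call from `caller` to `callee` (e.g. during a profiling run). */
void cg_record_call(CallGraph *g, int caller, int callee) {
    g->weight[caller][callee]++;
}

/* For placement, this sketch treats the graph as undirected:
   A->B and B->A calls both argue for putting A and B close together. */
unsigned long cg_undirected_weight(const CallGraph *g, int a, int b) {
    return g->weight[a][b] + (a != b ? g->weight[b][a] : 0);
}
```

For example, counting every EventLoop() to GetEvent() call during a profiling run yields the edge weight a profile-based layout pass would consume; gathering these counts is exactly the step JITCL aims to avoid.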

  7. Pettis and Hansen Layout. [Figure: the call graph around EventLoop() is collapsed step by step into Node-1 through Node-4.] The layout grows as follows: [] → [GetEvent, CheckForInputError] → [EventLoop, GetEvent, CheckForInputError] → [React, EventLoop, GetEvent, CheckForInputError] → [HandleCommonCase, React, EventLoop, GetEvent, CheckForInputError].
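The step-by-step merging on this slide can be approximated by a greedy chain-merging pass. The sketch below is a simplified variant of Pettis and Hansen's procedure positioning, assumed here for illustration: it visits edges from heaviest to lightest and always appends the second chain to the first, instead of choosing the best of the possible join orientations, so it will not necessarily reproduce the exact order listed on the slide. `Edge`, `ph_layout`, and `MAX_PROCS` are hypothetical names.

```c
#include <stdlib.h>

#define MAX_PROCS 16

/* Undirected, weighted call-graph edge between procedures a and b. */
typedef struct { int a, b; unsigned long weight; } Edge;

/* qsort comparator: heaviest edge first. */
static int cmp_edge_desc(const void *x, const void *y) {
    const Edge *ex = x, *ey = y;
    if (ex->weight < ey->weight) return 1;
    if (ex->weight > ey->weight) return -1;
    return 0;
}

/* Simplified greedy layout. Writes procedure IDs into `order` and returns
   how many were written. Assumes nprocs <= MAX_PROCS. Note: sorts `edges`
   in place. */
int ph_layout(int nprocs, Edge *edges, int nedges, int *order) {
    int chain_of[MAX_PROCS]; /* chain currently containing procedure p  */
    int next[MAX_PROCS];     /* successor of p within its chain, or -1  */
    int head[MAX_PROCS];     /* first procedure of chain c, -1 if dead  */
    int tail[MAX_PROCS];     /* last procedure of chain c               */
    int placed = 0;

    for (int p = 0; p < nprocs; p++) {
        chain_of[p] = p; next[p] = -1; head[p] = p; tail[p] = p;
    }
    qsort(edges, (size_t)nedges, sizeof *edges, cmp_edge_desc);

    for (int i = 0; i < nedges; i++) {
        int ca = chain_of[edges[i].a], cb = chain_of[edges[i].b];
        if (ca == cb)
            continue;                      /* already in the same chain */
        next[tail[ca]] = head[cb];         /* append chain cb after chain ca */
        tail[ca] = tail[cb];
        for (int p = head[cb]; p != -1; p = next[p])
            chain_of[p] = ca;              /* relabel merged procedures */
        head[cb] = -1;                     /* chain cb no longer exists */
    }

    /* Emit surviving chains one after another. */
    for (int c = 0; c < nprocs; c++)
        if (head[c] != -1)
            for (int p = head[c]; p != -1; p = next[p])
                order[placed++] = p;
    return placed;
}
```

With edges (0,3) of weight 100 and (1,2) of weight 50 over four procedures, the heavy pair becomes the chain [0, 3], the lighter pair becomes [1, 2], and the emitted layout is 0, 3, 1, 2.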

  8. A New Heuristic. Activation Order: co-locate procedures that are activated sequentially. [Figure: an activation-order example.]
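Placing procedures in order of use reduces to a first-occurrence scan over the sequence of activations. The sketch below is a hedged illustration, not the authors' code; `ao_layout` and `MAX_PROCS` are hypothetical, and procedure IDs are assumed to be small non-negative integers below `MAX_PROCS`. Unlike Pettis and Hansen layout, it needs no call-frequency weights, only the order of first use.

```c
#include <stdbool.h>

#define MAX_PROCS 64

/* Activation Order layout (sketch): each procedure is placed once, at the
   position of its first activation in `trace`. Procedures that are never
   activated are never placed. Returns the number of IDs written to `order`. */
int ao_layout(const int *trace, int ntrace, int *order) {
    bool placed[MAX_PROCS] = { false };
    int n = 0;
    for (int i = 0; i < ntrace; i++) {
        int p = trace[i];
        if (!placed[p]) {          /* first activation: append to the layout */
            placed[p] = true;
            order[n++] = p;
        }
    }
    return n;
}
```

For an activation sequence 2, 0, 2, 5, 0, 1 the layout is 2, 0, 5, 1. Because the decision is made at each procedure's first activation, it can be taken on the fly at run time.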

  9. Implementing JITCL. Thunk routines implement code layout on the fly.
      __start:    perform initializations; call thunk_main
      thunk_main: . . .
      thunk_foo:  . . .
      __InstructionMemory:

  10. Thunk routines
      // Global variables:
      //   ProcPointers[] - one element per procedure
      //   INDEX_proc and LENGTH_proc for each procedure
      thunk_main:
          // First call: main still lives in the original code segment, so copy it
          if (InCodeSegment(ProcPointers[INDEX_main]))
              ProcPointers[INDEX_main] = CopyToTextSegment(ProcPointers[INDEX_main], LENGTH_main);
          // Redirect the caller's call instruction to the copy, bypassing the thunk next time
          PatchCallSite(ProcPointers[INDEX_main], ComputeCallSiteFromReturnAddress(RA));
          jmp ProcPointers[INDEX_main];
      The thunk routines copy procedures into the text segment and update call sites at run time.
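The thunk mechanism can be simulated in portable C to show the control flow. This is a hypothetical simulation, not JITCL itself: real thunks copy machine code and patch call instructions, whereas here "copying" only assigns the next offset in a simulated text segment, and "patching the call site" rewrites a function-pointer slot in `ProcPointers`. The procedures, their sizes, and the names `thunk_common`, `call_proc`, and `jitcl_init` are made up for the example.

```c
#include <stddef.h>

typedef int (*Proc)(int);

#define NPROCS 3

static int proc_square(int x) { return x * x; }
static int proc_negate(int x) { return -x; }
static int proc_twice(int x)  { return 2 * x; }

/* "Original code segment": each procedure before JITCL relocates it. */
static const Proc   Originals[NPROCS] = { proc_square, proc_negate, proc_twice };
static const size_t Lengths[NPROCS]   = { 48, 16, 32 };  /* made-up sizes in bytes */

static Proc   ProcPointers[NPROCS]; /* slot every caller calls through        */
static long   TextOffset[NPROCS];   /* offset in simulated text segment, -1 = not copied */
static size_t TextUsed;             /* bytes allocated in the simulated text segment */

static int thunk_common(int index, int arg) {
    if (TextOffset[index] < 0) {              /* InCodeSegment(...)              */
        TextOffset[index] = (long)TextUsed;   /* CopyToTextSegment: next free spot,
                                                 so layout follows activation order */
        TextUsed += Lengths[index];
    }
    ProcPointers[index] = Originals[index];   /* PatchCallSite: later calls skip the thunk */
    return ProcPointers[index](arg);          /* jmp ProcPointers[index]         */
}

static int thunk_square(int x) { return thunk_common(0, x); }
static int thunk_negate(int x) { return thunk_common(1, x); }
static int thunk_twice(int x)  { return thunk_common(2, x); }

static const Proc Thunks[NPROCS] = { thunk_square, thunk_negate, thunk_twice };

/* Like __start: point every call slot at its thunk before the program runs. */
void jitcl_init(void) {
    for (int i = 0; i < NPROCS; i++) {
        ProcPointers[i] = Thunks[i];
        TextOffset[i] = -1;
    }
    TextUsed = 0;
}

/* A "call site": always goes through the (possibly patched) slot. */
int call_proc(int index, int arg) {
    return ProcPointers[index](arg);
}
```

After `jitcl_init()`, calling procedure 1 and then procedure 0 places them at offsets 0 and 16 respectively; a second call to procedure 1 goes straight to `proc_negate`, and procedure 2, never called, is never copied.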

  11. Simulation Methodology
                        UNIX/RISC         Win32/x86
      Cache size        8K                8K
      Associativity     Direct-mapped     2-way
      Simulation        ATOM              Etch
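Why procedure placement matters for an 8K direct-mapped cache like the UNIX/RISC configuration above can be shown with a tiny simulator. This is an illustrative sketch: the 32-byte line size, the function names, and the synthetic access pattern are assumptions, not part of the study's ATOM or Etch setup.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_BYTES 8192u                    /* 8K, as in the table above    */
#define LINE_BYTES  32u                      /* assumption: not given above  */
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)

typedef struct {
    bool          valid[NUM_LINES];
    uint32_t      tag[NUM_LINES];
    unsigned long misses;
} DMCache;

static void dm_init(DMCache *c) {
    for (uint32_t i = 0; i < NUM_LINES; i++)
        c->valid[i] = false;
    c->misses = 0;
}

/* Direct-mapped lookup: each address maps to exactly one cache line. */
static void dm_access(DMCache *c, uint32_t addr) {
    uint32_t line = addr / LINE_BYTES;
    uint32_t set  = line % NUM_LINES;
    uint32_t tag  = line / NUM_LINES;
    if (!c->valid[set] || c->tag[set] != tag) {
        c->misses++;
        c->valid[set] = true;
        c->tag[set]   = tag;
    }
}

/* Alternately execute two procedures of `size` bytes starting at addresses
   a and b, `iters` times, fetching one address per cache line. */
unsigned long run_pair(uint32_t a, uint32_t b, uint32_t size, int iters) {
    DMCache c;
    dm_init(&c);
    for (int i = 0; i < iters; i++) {
        for (uint32_t off = 0; off < size; off += LINE_BYTES) dm_access(&c, a + off);
        for (uint32_t off = 0; off < size; off += LINE_BYTES) dm_access(&c, b + off);
    }
    return c.misses;
}
```

Two 64-byte procedures called alternately 100 times miss on every fetch (400 misses) when placed exactly 8K apart, because they map to the same cache lines, but take only 4 cold misses when placed next to each other.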

  12. Workloads

  13. Results: The AO heuristic is effective. The overhead of JITCL is negligible. JITCL improves procedure layout without requiring profile information. JITCL reduces program memory requirements.

  14. Results: The AO Heuristic. [Figure: improvement in I-cache miss rate.] Conclusion: The effectiveness of the heuristic is comparable to P&H.

  15. Overhead of JITCL: copy overhead (instruction overhead and cache overhead); cache consistency; disk overhead (comparable to demand-loaded text; not evaluated).
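The instruction component of copy overhead can be estimated by charging each copied procedure for its copy loop plus one thunk invocation, expressed as a percentage of all executed instructions. The cost model below is an assumption made for illustration: `jitcl_overhead_percent`, the per-word copy cost, the per-thunk cost, and the 4-byte word size are free parameters, not values from the study, and the model ignores cache overhead and cache consistency.

```c
#include <stddef.h>

/* Hypothetical cost model for JITCL instruction overhead (percent).
   copied_bytes[i]: size of the i-th procedure the thunks copied
   insts_per_word:  assumed instructions to copy one 4-byte word
   thunk_insts:     assumed fixed cost of one thunk run plus call-site patch
   total_insts:     instructions executed by the whole program run */
double jitcl_overhead_percent(const size_t *copied_bytes, size_t ncopied,
                              double insts_per_word, double thunk_insts,
                              double total_insts) {
    double overhead = 0.0;
    for (size_t i = 0; i < ncopied; i++) {
        double words = (double)copied_bytes[i] / 4.0;   /* assumes 4-byte words */
        overhead += words * insts_per_word + thunk_insts;
    }
    return 100.0 * overhead / total_insts;
}
```

With illustrative inputs only (two copied procedures of 400 and 800 bytes, 2 instructions per word, 50 instructions per thunk), the copies add 700 instructions, which is 0.07% of a 1,000,000-instruction run. Because each procedure is copied at most once, this cost stays fixed while the run length grows.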

  16. Results: Overhead. [Figure: overhead instructions (%).] Conclusion: JITCL overhead is less than 0.1% in all cases.

  17. Results: Performance. [Figure: saved cycles per instruction.] Conclusion: Overall performance is comparable to P&H.

  18. JITCL for Win32 Applications: Windows applications are composed of multiple executable modules. When transitions between modules are frequent, intra-module code layout is less effective. With JITCL, inter-module code layout is possible and beneficial.

  19. Win32 Cache Miss Rates. [Figure: cache miss rates for Win32 applications.] Conclusion: Careful layout did not help Win32 applications.

  20. Text Segment Size. [Figure: text size in megabytes.] Conclusion: JITCL typically reduces text size by 50%.
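Because a thunk copies a procedure only when it is first called, procedures that never run never reach the new text segment. Under that assumption, JITCL's text size is the total length of the activated procedures, as in this hypothetical sketch (`jitcl_text_bytes` and `text_reduction` are illustrative names, and the activation flags would come from an actual run).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: bytes in the JITCL text segment, assuming only
   procedures activated at least once are ever copied into it. */
size_t jitcl_text_bytes(const size_t *lengths, const bool *activated, size_t nprocs) {
    size_t used = 0;
    for (size_t i = 0; i < nprocs; i++)
        if (activated[i])
            used += lengths[i];
    return used;
}

/* Fraction of the static text that JITCL never copies (0.0 to 1.0). */
double text_reduction(const size_t *lengths, const bool *activated, size_t nprocs) {
    size_t total = 0;
    for (size_t i = 0; i < nprocs; i++)
        total += lengths[i];
    if (total == 0)
        return 0.0;
    return 1.0 - (double)jitcl_text_bytes(lengths, activated, nprocs) / (double)total;
}
```

With procedure lengths of 100, 200, 300, and 400 bytes and only the first and third activated, the text segment holds 400 of 1000 bytes, a 60% reduction.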

  21. JITCL vs. PBO: JITCL provides an alternative to feedback-based procedure layout. Many important optimizations still require profile information: instruction scheduling, register allocation, and other intra-procedural optimizations. Don't expect profile-based optimization (PBO) to go away!

  22. Conclusions: Just-In-Time code layout achieves a benefit comparable to profile-based code layout without the need for profiles. The AO heuristic is effective. The overhead of procedure copying is low. The I-cache benefit is comparable to Pettis and Hansen layout. JITCL can reduce working set size.

  23. The Morph Project. For more information: http://www.eecs.harvard.edu/morph/
