EFetch: Optimizing Instruction Fetch for Event-Driven Web Applications Gaurav Chadha, Scott Mahlke, Satish Narayanasamy University of Michigan, Electrical Engineering and Computer Science August, 2014
Evolution of the Web Web 1.0: server delivers published content to the client • Static web pages • Users passively view content Web 2.0: users generate content as well as consume it • Dynamic web pages • Users collaborate and generate content
Evolution of the Web Web 1.0: compute happens on the server Web 2.0: compute moves to the client as well • Rich user experience
Evolution of the Web yahoo.com in 1996 vs. yahoo.com in 2014: 30x more instructions executed • A rich user experience requires good client-side performance (browser responsiveness)
Core Specialization Multi-core processor: Core 1, Core 2, Core 3, Core 4, each with private caches
Web Core Multi-core processor: one core is specialized into a Web Core with WebBoost; the remaining cores stay general-purpose, each with private caches
WebBoost (Chart: breakdown of web browser computational components, Web 1.0 vs. Web 2.0.) • Web client-side script performance limits browser responsiveness • Script performance suffers from high L1-I cache misses • Goal: a specialized instruction prefetcher for web client-side script
Poor I-Cache Performance • Web pages tend to support numerous functionalities: graphics effects, image editing, online forms, document editing, web personalization, games, audio & video • Large instruction footprint • Lack of hot code • Web client-side script inefficiencies cause code bloat: • JIT compiled by the JS engine (V8, IonMonkey, Nitro, Chakra) • Dynamic typing
Lack of Hot Code (Chart: 860 vs. 20,400 functions; 95%.)
Poor I-Cache Performance • Compared to conventional programs, JS code incurs many more L1-I misses • A perfect I-Cache yields a 53% speedup
Problem Statement • Problem: Poor web client-side script I-Cache performance • Opportunity: Web client-side scripts are executed in an event-driven model • Solution: • A specialized prefetcher that is customized for the event-driven execution model • Identifies distinct events in the instruction stream
Web Browser Events • Mouse Click: external input event • On Load: internal browser event
Event-driven Web Applications • Events are inserted into an event queue on the renderer thread • External input events: mouse click, keyboard key press, GPS events • Internal events: timer event, DOMContentLoaded • Events are popped from the head of the queue and executed on the JS engine • Events can generate other events • When the event queue is empty, the program waits • Poor I-Cache performance: • Different events tend to execute different code • Events typically execute for a very short duration
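The queue behavior described above can be sketched as a minimal event loop. This is an illustrative model, not browser code: the class and event names are hypothetical, and real browsers dispatch events to JS handlers on the renderer thread.

```python
from collections import deque

class EventLoop:
    """Toy model of the browser's event queue (illustrative only)."""

    def __init__(self):
        self.queue = deque()

    def post(self, name, handler):
        # Events are inserted at the tail of the queue.
        self.queue.append((name, handler))

    def run(self):
        trace = []
        # Pop events from the head and run each handler to completion.
        while self.queue:
            name, handler = self.queue.popleft()
            trace.append(name)
            handler(self)  # a handler may post further (internal) events
        return trace       # queue empty: the program waits for new input

loop = EventLoop()
# An external click whose handler posts an internal timer event:
loop.post("click", lambda lp: lp.post("timer", lambda lp2: None))
loop.post("load", lambda lp: None)
print(loop.run())  # ['click', 'load', 'timer']
```

Note that the timer event runs last even though the click handler posted it first: generated events go to the tail of the queue, behind events that were already waiting.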
EFetch • Event Fetch: an instruction prefetcher for event-driven web applications • Technique: • Uses an event ID to identify distinct events in the instruction stream • The event ID is augmented to create an event signature that predicts control flow well
Event Signature • Event ID: (event type, event handler) • Formed by the browser • Uniquely identifies an event • Event signature: event ID + function call context • Function call context: the ancestor functions on the call stack, up to a context depth of 3 • Formed in the hardware • Correlates well with the program control flow
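A rough sketch of how an event signature might be formed, assuming the event ID is an (event type, handler) pair and the call context is the top three frames of the call stack. The encoding here is illustrative; in the real design the signature is formed in hardware.

```python
CONTEXT_DEPTH = 3  # context depth from the slides

def event_id(event_type, handler_addr):
    # Formed by the browser: event type plus event handler.
    return (event_type, handler_addr)

def event_signature(eid, call_stack, depth=CONTEXT_DEPTH):
    # Augment the event ID with the `depth` most recent
    # ancestor functions on the call stack.
    context = tuple(call_stack[-depth:])
    return eid + context

# Same event, two different call contexts, two distinct signatures:
eid = event_id("click", 0x4000)
sig = event_signature(eid, [0xA, 0xB, 0xC, 0xD])
# sig now identifies both the event and where in its handler we are
```

Because the signature changes as the call stack evolves, it tracks control flow within a single event, not just which event is running.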
Instruction Prefetcher: Facets • What to prefetch? • When to prefetch?
What to Prefetch? • Naïve solution: on a function call, prefetch the function body • But this is too late • Our approach: on a function call, predict its callees and prefetch their function bodies • The event signature indexes a table of predicted callees (c1, c2, c3, ...), each with its I-cache addresses
Duplication of Addresses • A function can appear in two distinct event signatures • Its body addresses might be duplicated • Example: callee h is recorded with addresses <A, B, C> under one event and <A, C, D> under another
Compacting I-Cache Addresses • Store one merged address list per function: h maps to <A, B, C, D> • Each (event signature, callee) entry keeps only a bit vector over that list: • <A, B, C> becomes (1, 1, 1, 0) • <A, C, D> becomes (1, 0, 1, 1)
Recording Callees and Function Bodies • Context Table: indexed by event signature; records the callees (c1, c2, ...), each with a bit vector • Function Table: records each callee's function body addresses, e.g. <A, B, C, D>
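The last three slides can be sketched in software as follows. This is a simplification of the hardware tables: it assumes the Function Table holds one merged, ordered address list per callee and the Context Table holds a per-signature bit vector over that list, as in the h / <A, B, C> / <A, C, D> example above.

```python
class Prefetcher:
    """Toy model of EFetch's Context Table and Function Table."""

    def __init__(self):
        self.function_table = {}  # callee -> merged list of block addresses
        self.context_table = {}   # (event signature, callee) -> bit vector

    def record(self, signature, callee, addrs):
        # Merge the callee's addresses into one shared, ordered list...
        merged = self.function_table.setdefault(callee, [])
        for a in addrs:
            if a not in merged:
                merged.append(a)
        # ...and keep only a bit vector per signature, not the addresses.
        bv = [1 if a in addrs else 0 for a in merged]
        self.context_table[(signature, callee)] = bv

    def predict(self, signature, callee):
        merged = self.function_table.get(callee, [])
        bv = self.context_table.get((signature, callee), [])
        bv = bv + [0] * (len(merged) - len(bv))  # pad if the list grew later
        return [a for a, bit in zip(merged, bv) if bit]

p = Prefetcher()
p.record("sig1", "h", ["A", "B", "C"])
p.record("sig2", "h", ["A", "C", "D"])
print(p.predict("sig1", "h"))  # ['A', 'B', 'C']
print(p.predict("sig2", "h"))  # ['A', 'C', 'D']
```

After both recordings the Function Table holds <A, B, C, D> once, and the two signatures store the bit vectors (1, 1, 1, 0) and (1, 0, 1, 1) from the slide instead of duplicating the addresses.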
Instruction Prefetcher: Facets • What to prefetch? • When to prefetch?
When to Prefetch? • It is important to prefetch sufficiently in advance, but not too early • Goal: prefetch the next predicted function • Able to hide the LLC hit latency • Typically sufficient due to the low instruction miss rate in the LLC • Our design: keep track of a speculative call stack, the Predictor Stack
Predictor Stack • Maintains the call stack as predicted by the prefetcher • Helps prefetch the next function predicted to be called • Example: as f calls h, returns, then calls i, the Predictor Stack mirrors the Call Stack and each function is prefetched one call ahead
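A toy model of the Predictor Stack idea (hypothetical interface; in the real design the predicted callees come from the Context Table): on every call and return it advances the speculative stack and issues a prefetch for the next function expected to run, one call ahead of execution.

```python
class PredictorStack:
    """Toy speculative call stack driving one-call-ahead prefetch."""

    def __init__(self, callee_map):
        self.callee_map = callee_map  # func -> predicted callees, in order
        self.stack = []               # frames: [func, next-callee index]
        self.prefetched = []          # prefetches issued, in order

    def _prefetch_next(self):
        # Prefetch the next predicted callee of the function on top.
        if not self.stack:
            return
        func, idx = self.stack[-1]
        callees = self.callee_map.get(func, [])
        if idx < len(callees):
            self.prefetched.append(callees[idx])

    def on_call(self, func):
        if self.stack:
            self.stack[-1][1] += 1    # caller advances past this callee
        self.stack.append([func, 0])
        self._prefetch_next()         # func's first predicted callee

    def on_return(self):
        self.stack.pop()
        self._prefetch_next()         # caller's next predicted callee

# The slide's sequence: f calls h, h returns, f calls i, i returns.
ps = PredictorStack({"f": ["h", "i"]})
ps.on_call("f")    # prefetch h, one call before f actually calls it
ps.on_call("h")
ps.on_return()     # back in f: prefetch i next
ps.on_call("i")
ps.on_return()
print(ps.prefetched)  # ['h', 'i']
```

Each prefetch is issued while the previous function is still running, which is roughly the lead time needed to hide an LLC hit.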
Architecture (Block diagram: the Event ID and the function call context form the event signature, which indexes the Context Table; the predicted callees' bit vectors select addresses from the Function Table; predicted addresses go to the Prefetch Queue; the Call Stack and Predictor Stack track execution.)
Methodology • Instrumented an open source browser, Chromium • It uses the V8 JS engine shared with Google Chrome • Browsing sessions of popular websites were studied • Their instruction traces were simulated with Sniper Sim • Simulation focused on JS code execution
Architectural Details • Modeled after Samsung Exynos 5250 • Core: 4-wide OoO, 1.66 GHz • L1-(I,D) Cache: 32 KB, 2-way • L2 Cache: 2 MB, 16-way • Energy modeling: Vdd = 1.2 V, 45 nm
Related Work • We compare EFetch with the following designs: • L1I-64KB: hardware overhead of EFetch provisioned towards extra L1-I cache capacity (64 KB) • N2L: next-2-line prefetcher • CGP: Call Graph Prefetching (Annavaram et al., HPCA '01) • PIF: Proactive Instruction Fetch (Ferdman et al., MICRO '11) • RDIP: Return-address-stack Directed Instruction Prefetching (Kolli et al., MICRO '13)
Energy Consumption • Prefetching hardware structures consume little energy • Ranging from 0.01% of the total energy consumed for EFetch to 1.06% for PIF • Erroneous prefetches consume a significant fraction of energy
Energy, Performance, Area (Chart: energy vs. performance for N2L, CGP, PIF, RDIP, and EFetch.)
Conclusion • Web 2.0 places greater demands on client-side computing • I-Cache performance is poor for web client-side script execution • EFetch exploits the event-driven nature of web client-side script execution • It achieves 29% performance improvement over no prefetching
Performance Potential • Perfect I-Cache: 53% speedup