Dynamic Feedback: An Effective Technique for Adaptive Computing

Pedro Diniz and Martin Rinard
Department of Computer Science
University of California, Santa Barbara
http://www.cs.ucsb.edu/~{pedro,martin}

Basic Issue: Efficient Implementation of Atomic Operations in Object-Based Languages
Approach: Reduce Lock Overhead by Coarsening Lock Granularity
Problem: Coarsening Lock Granularity May Reduce Available Concurrency

Solution: Dynamic Feedback
• Multiple Lock Coarsening Policies
• Dynamic Feedback
  • Generate Multiple Versions of Code
  • Measure Dynamic Overhead of Each Policy
  • Dynamically Select Best Version
• Context
  • Parallelizing Compiler
  • Irregular Object-Based Programs
  • Pointer-Based Data Structures
  • Commutativity Analysis

Talk Outline
• Lock Coarsening
• Dynamic Feedback
• Experimental Results
• Related Work
• Conclusions

Model of Computation
• Parallel Programs
  • Serial Phases
  • Parallel Phases
• Atomic Operations on Shared Objects
  • Mutual Exclusion Locks
  • Acquire Constructs
  • Release Constructs
[Figure: Serial Phase, Parallel Phase, Serial Phase; Atomic Operations; a Mutual Exclusion Region delimited by L.acquire() and L.release()]

Problem: Lock Overhead
[Figure: code executing consecutive L.acquire() / L.release() pairs]

Solution: Lock Coarsening
[Figure: L.acquire() / L.release() pairs in the Original code and After Lock Coarsening]
Reference: Diniz and Rinard, “Synchronization Transformations for Parallel Computing”, POPL97
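The transformation on this slide can be illustrated with a small sketch. It is not taken from the talk: the `CountingLock` class and the `update_original`, `update_coarsened`, and `run` helpers are hypothetical names introduced here, and Python's `threading.Lock` stands in for the mutual exclusion lock L. The sketch only shows that coarsening leaves the result unchanged while reducing the number of executed acquire/release pairs.

```python
import threading


class CountingLock:
    """Mutual exclusion lock that counts how often it is acquired (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquires = 0

    def acquire(self):
        self._lock.acquire()
        self.acquires += 1  # updated while the lock is held

    def release(self):
        self._lock.release()


def update_original(L, totals, values):
    # Original: one acquire/release pair per atomic operation.
    for v in values:
        L.acquire()
        totals[0] += v
        L.release()


def update_coarsened(L, totals, values):
    # After lock coarsening: a single pair encloses the whole sequence.
    L.acquire()
    for v in values:
        totals[0] += v
    L.release()


def run(update, values):
    # Returns (final total, number of executed acquires).
    L, totals = CountingLock(), [0]
    update(L, totals, values)
    return totals[0], L.acquires
```

With 100 updates, the original version executes 100 acquires and the coarsened version executes one; the price is that the lock is held for the entire loop.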

Lock Coarsening Trade-Off
• Advantage:
  • Reduces Number of Executed Acquires and Releases
  • Reduces Acquire and Release Overhead
• Disadvantage: May Introduce False Exclusion
  • Multiple Processors Attempt to Acquire Same Lock
  • Processor Holding the Lock is Executing Code That Was Originally Outside Any Mutual Exclusion Region

[Figure: False Exclusion. L.acquire() / L.release() regions in the Original code and After Lock Coarsening, with the False Exclusion intervals marked]

Lock Coarsening Policy
Goal: Limit Potential Severity of False Exclusion
Mechanism: Multiple Lock Coarsening Policies
• Original: Never Coarsen Granularity
• Bounded: Coarsen Granularity Only Within Cycle-Free Subgraphs of the Interprocedural Control Flow Graph (ICFG)
• Aggressive: Always Coarsen Granularity

Choosing Best Policy
• Best Lock Coarsening Policy May Depend On
  • Topology of Data Structures
  • Dynamic Schedule Of Computation
• Information Required to Choose Best Policy Unavailable at Compile Time
• Complications
  • Different Phases May Have Different Best Policy
  • In Same Phase, Best Policy May Change Over Time

Solution: Dynamic Feedback
• Generated Code Executes
  • Sampling Phases: Measure Performance of Different Policies
  • Production Phases: Use Best Policy From Sampling Phase
• Periodically Resample to Discover Best Policy Changes
[Figure: Code Version (Original, Bounded, Aggressive) and Overhead versus Time across a Sampling Phase, a Production Phase, and the next Sampling Phase]
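The alternation of sampling and production phases can be sketched as a simple control loop. This is a minimal illustration, not the compiler-generated code: `dynamic_feedback`, `measure_overhead`, and `run_production` are hypothetical names, and the overhead numbers in the usage example are synthetic.

```python
def dynamic_feedback(policies, measure_overhead, run_production, rounds):
    """Alternate sampling and production phases (illustrative sketch).

    measure_overhead(policy, r): hypothetical hook that runs one sampling
        interval with `policy` in round r and returns its measured overhead.
    run_production(policy, r): hypothetical hook that runs one production
        interval with `policy`.
    Returns the policy chosen in each round.
    """
    chosen = []
    for r in range(rounds):
        # Sampling phase: run every version briefly and record its overhead.
        sampled = {p: measure_overhead(p, r) for p in policies}
        # Production phase: use the version with the lowest sampled overhead.
        best = min(policies, key=lambda p: sampled[p])
        run_production(best, r)
        chosen.append(best)
    return chosen


# Synthetic overheads per round; the best policy changes over time.
OVERHEADS = {
    "Original":   [0.30, 0.30, 0.30],
    "Bounded":    [0.20, 0.25, 0.40],
    "Aggressive": [0.10, 0.50, 0.60],
}

selected = dynamic_feedback(
    ["Original", "Bounded", "Aggressive"],
    measure_overhead=lambda p, r: OVERHEADS[p][r],
    run_production=lambda p, r: None,  # no real work in this sketch
    rounds=3,
)
```

Because the loop resamples in every round, it follows the best policy as the synthetic overheads drift from favoring Aggressive to favoring Original.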

Guaranteed Performance Bounds
• Assumptions:
  • Overhead Changes Bounded by Exponential Decay Functions
• Worst Case Scenario:
  • No Useful Work During Sampling Phase
  • Sampled Overheads Are Same For All Versions
  • Overhead of Selected Version Increases at Maximum Rate
  • Overhead of Other Versions Decreases at Maximum Rate
[Figure: Overhead versus Time for version V0 across sampling intervals (S) and a production interval (P)]

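Guaranteed Performance Bound
Definition 1. Policy $p_i$ is at Most ε Worse Than Policy $p_j$ over a Time Interval T if
$$\mathrm{Work}_j^T - \mathrm{Work}_i^T \le \epsilon T$$
where
$$\mathrm{Work}_i^T = \int_0^T (1 - o_i(t))\,dt$$
Definition 2. Dynamic Feedback is at Most ε Worse Than the Optimal if
$$\mathrm{Work}_{opt}^{P+SN} - \mathrm{Work}^{P+SN} \le \epsilon (P+SN)$$
where the work terms are measured over one period of N sampling intervals of length S followed by a production interval of length P, each as the integral of one minus the corresponding overhead.
Result 1. To Guarantee this Bound
$$(1 - \epsilon) P + (1/\lambda) e^{-\lambda P} \le (\epsilon - 1) SN + (1/\lambda)$$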
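One way to see where Result 1 comes from is the following sketch, under assumptions that are not spelled out on the slide: all versions are sampled at the same overhead $v$, the selected version's overhead afterwards is at most $1 - (1 - v)e^{-\lambda t}$, the competing version's overhead is at least $v e^{-\lambda t}$, dynamic feedback does no useful work during the $SN$ units of sampling, and the optimal does at most $SN$ units of work in that time. These specific forms of the exponential decay bounds are assumptions made for illustration.

```latex
\begin{align*}
\mathrm{Work}^{P+SN}
  &\ge \int_0^P (1 - v)\,e^{-\lambda t}\,dt
   = \frac{1 - v}{\lambda}\left(1 - e^{-\lambda P}\right) \\
\mathrm{Work}_{opt}^{P+SN}
  &\le SN + \int_0^P \left(1 - v\,e^{-\lambda t}\right) dt
   = SN + P - \frac{v}{\lambda}\left(1 - e^{-\lambda P}\right) \\
\mathrm{Work}_{opt}^{P+SN} - \mathrm{Work}^{P+SN}
  &\le SN + P - \frac{1}{\lambda}\left(1 - e^{-\lambda P}\right)
\end{align*}
```

Requiring the last right-hand side to be at most $\epsilon (P + SN)$ and rearranging gives $(1 - \epsilon) P + (1/\lambda) e^{-\lambda P} \le (\epsilon - 1) SN + (1/\lambda)$. The sampled overhead $v$ cancels, so under these assumptions the condition depends only on $\epsilon$, $\lambda$, $P$, $S$, and $N$.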

Guaranteed Performance Bounds
[Figure: Constraint Values versus Production Interval P, plotting $(1 - \epsilon) P + (1/\lambda) e^{-\lambda P}$ and $(\epsilon - 1) SN + (1/\lambda)$, with the Feasible Region marked]
• Production Interval Too Short: Unable to Amortize Sampling Overhead
• Production Interval Too Long: May Execute Suboptimal Policy for Long Time
• Basic Constraint: Decay Rate (λ) Must be Small Enough
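The feasible region can be explored numerically. The sketch below evaluates the constraint $(1 - \epsilon) P + (1/\lambda) e^{-\lambda P} \le (\epsilon - 1) SN + (1/\lambda)$ for candidate production intervals. The function names and all parameter values are illustrative assumptions, not figures from the talk.

```python
import math


def bound_holds(P, S, N, eps, lam):
    """True if production interval P satisfies the constraint."""
    lhs = (1 - eps) * P + math.exp(-lam * P) / lam
    rhs = (eps - 1) * S * N + 1 / lam
    return lhs <= rhs


def feasible_range(S, N, eps, lam, candidates):
    """Smallest and largest feasible P among the candidates, or None."""
    ok = [P for P in candidates if bound_holds(P, S, N, eps, lam)]
    return (min(ok), max(ok)) if ok else None


# Illustrative values only: S = 0.01, N = 3 versions, integer P from 0 to 100.
slow_decay = feasible_range(0.01, 3, eps=0.5, lam=0.1, candidates=range(0, 101))
fast_decay = feasible_range(0.01, 3, eps=0.1, lam=1.0, candidates=range(0, 101))
```

With the slow decay rate the feasible production intervals form a bounded range, excluding both very short and very long intervals. With the fast decay rate and the tighter ε no candidate satisfies the constraint, which matches the requirement that λ be small enough.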

Dynamic Feedback: Implementation
• Code Generation
• Measuring Policy Overhead
• Interval Selection
• Interval Expiration
• Policy Switch

Code Generation
• Statically Generate Different Code Versions for Each Policy
• Alternative: Dynamic Code Generation
• Advantages of Static Code Generation:
  • Simplicity of Implementation
  • Fast Policy Switching
• Potential Drawback of Static Code Generation
  • Code Size (In Practice Not a Problem)

Measuring Policy Overhead
• Sources of Overhead
  • Locking Overhead
  • Waiting Overhead
• Compute Locking Overhead
  • Count Number of Executed Acquire/Release Constructs
• Estimate Waiting Overhead
  • Count Number of Spins on Locks Waiting to be Released
$$\text{Sampled Overhead} = \frac{(\text{Number of Acquire/Release} \times \text{Acquire/Release Execution Time}) + (\text{Number of Spins} \times \text{Spin Time})}{\text{Sampling Time}}$$
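The overhead formula on this slide translates directly into code. In this sketch the function and parameter names are hypothetical, all times are assumed to be in seconds, and the counts in the example are made up for illustration.

```python
def sampled_overhead(num_acquire_release, acquire_release_time,
                     num_spins, spin_time, sampling_time):
    """Fraction of the sampling interval spent on locking and waiting."""
    locking = num_acquire_release * acquire_release_time  # locking overhead
    waiting = num_spins * spin_time                       # estimated waiting overhead
    return (locking + waiting) / sampling_time


# Made-up counts for a 10 millisecond sampling interval.
overhead = sampled_overhead(
    num_acquire_release=1000, acquire_release_time=2e-6,
    num_spins=500, spin_time=1e-6,
    sampling_time=0.01,
)
```

Here locking accounts for 2 ms and waiting for 0.5 ms of the 10 ms interval, so the sampled overhead is about 0.25.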

Interval Selection and Expiration
• Fixed Interval Values
  • Sampling Interval: 10 milliseconds
  • Production Interval: 10 seconds
  • Good Results for Wide Range of Interval Values
• Polling Code for Expiration Detection
  • Location: Back Edges of Parallel Loop
  • Advantage: Low Overhead
  • Disadvantage: Potential Interaction with Iteration Size
[Figure: Atomic Operations and Polling Points]

Policy Switch
• Synchronous
  • Processors Poll Timer to Detect Interval Expiration
  • Barrier At End of Each Interval
• Advantages:
  • Consistent Transitions
  • Clean Overhead Measurements
• Disadvantages:
  • Need to Synchronize All Processors
  • Potential Idle Time At Barrier
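A synchronous switch of this kind can be sketched with threads standing in for processors. This is an illustration under assumptions, not the implementation from the talk: `run_workers` is a hypothetical name, the loop body does no real work, and Python's `threading.Barrier` plays the role of the barrier at the end of each interval. Each worker polls the timer at the back edge of its loop and waits at the barrier once the interval has expired; the barrier action then advances the policy for everyone.

```python
import threading
import time


def run_workers(num_workers, policies, interval_seconds):
    """Run each policy for one interval; return the policies each worker saw."""
    state = {"index": 0, "deadline": time.monotonic() + interval_seconds}
    seen = [[] for _ in range(num_workers)]

    def switch_policy():
        # Executed by exactly one thread once all workers reach the barrier.
        state["index"] += 1
        state["deadline"] = time.monotonic() + interval_seconds

    barrier = threading.Barrier(num_workers, action=switch_policy)

    def worker(wid):
        while state["index"] < len(policies):
            policy = policies[state["index"]]
            if not seen[wid] or seen[wid][-1] != policy:
                seen[wid].append(policy)
            # ... one loop iteration under `policy` would run here ...
            if time.monotonic() >= state["deadline"]:  # poll at the back edge
                barrier.wait()                         # synchronous switch

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


history = run_workers(4, ["Original", "Bounded", "Aggressive"], 0.01)
```

Every worker observes the same sequence of policies, which is the consistent transition the slide lists as an advantage. The time a fast worker spends blocked in `barrier.wait()` corresponds to the potential idle time listed as a disadvantage.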

Experimental Results
• Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]
• Set of Complete Scientific Applications
  • Barnes-Hut N-Body Solver (1500 lines of C++)
  • Liquid Water Simulation Code (1850 lines of C++)
  • Seismic Modeling String Code (2050 lines of C++)
• Different Lock Coarsening Policies
• Dynamic Feedback
• Performance on Stanford DASH Multiprocessor

Code Sizes
[Figure: Size of Text Segment (Kbytes, 0 to 60) for the Serial, Original, and Dynamic versions of Barnes-Hut, Water, and String]

Lock Overhead
Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks
[Figure: Percentage Lock Overhead (0 to 60) per lock coarsening policy (Original, Bounded, Aggressive) for Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model)]

Contention Overhead
Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors
[Figure: Contention Percentage (0 to 100) versus Processors (0 to 16) for the Aggressive, Bounded, and Original policies; panels: Barnes-Hut (16K Particles), Water (512 Molecules), String (Big Well Model)]

Performance Results: Barnes-Hut
[Figure: Speedup versus Number of Processors (0 to 16) for Barnes-Hut on DASH (16K Particles); curves: Ideal, Aggressive, Dynamic Feedback, Bounded, Original]

Performance Results: Water
[Figure: Speedup versus Number of Processors (0 to 16) for Water on DASH (512 Molecules); curves: Ideal, Bounded, Dynamic Feedback, Original, Aggressive]

Performance Results: String
[Figure: Speedup versus Number of Processors (0 to 16) for String on DASH (Big Well Model); curves: Ideal, Original, Dynamic Feedback, Aggressive]

Summary
• Code Size Is Not An Issue
• Lock Coarsening Has Significant Performance Impact
• Best Lock Coarsening Policy Varies With Application
• Dynamic Feedback Delivers Code With Performance Comparable to The Best Static Lock Coarsening Policy

Related Work
• Adaptive Execution Techniques (Saavedra Park:PACT96)
• Dynamic Dispatch Optimizations (Hölzle Ungar:PLDI94)
• Dynamic Code Generation (Engler:PLDI96)
• Profiling (Brewer:PPoPP95)
• Synchronization Optimizations (Plevyak et al:POPL95)

Conclusions
• Dynamic Feedback
  • Generated Code Adapts to Different Execution Environments
• Integration with Parallelizing Compiler
  • Irregular Object-Based Programs
  • Pointer-Based Linked Data Structures
  • Commutativity Analysis
• Evaluation with Three Complete Applications
  • Performance Comparable to Best Hand-Tuned Optimization

BACKUP SLIDES

Performance Results: Barnes-Hut
[Figure: Speedup versus Number of Processors (0 to 16) for Barnes-Hut (16K Particles); curves: Ideal, Aggressive, Bounded, Original]

Performance Results: Water
[Figure: Speedup versus Number of Processors (0 to 16) for Water (512 Molecules); curves: Ideal, Bounded, Original, Aggressive]

Performance Results: String
[Figure: Speedup versus Number of Processors (0 to 16) for String (Big Well Model); curves: Ideal, Original, Aggressive]

Policy Switch
[Figure: Policy 1 runs until Timer Expires, then Policy 2 runs until Timer Expires]

Motivation
Challenges:
• Match Best Implementation to Environment
• Heterogeneous and Mobile Systems
Goal:
• Develop Mechanisms to Support Code that Adapts to Environment Characteristics
Technique:
• Dynamic Feedback

Overhead for Barnes-Hut
[Figure: Sampled Overhead (0 to 0.5) versus Execution Time (0 to 25 Seconds) for Barnes-Hut on DASH (8 Processors), FORCES Loop, Data Set: 16K Particles; curves: Original, Bounded, Aggressive]

Overhead for Water
[Figure: Sampled Overhead (0 to 0.5) versus Execution Time (0 to 60 Seconds) for Water on DASH (8 Processors), INTERF Loop, Data Set: 512 Molecules; curves: Original, Bounded]

Overhead for Water
[Figure: Sampled Overhead (0 to 1) versus Execution Time (0 to 60 Seconds) for Water on DASH (8 Processors), POTENG Loop, Data Set: 512 Molecules; curves: Aggressive, Original]

Overhead for String
[Figure: Sampled Overhead (0 to 1) versus Execution Time (0 to 500 Seconds) for String on DASH (8 Processors), PROJFWD Loop, Data Set: Big Well; curves: Aggressive, Original]

Dynamic Feedback
[Figure: Code Version (Aggressive, Bounded, Original) and Overhead versus Time across a Sampling Phase, a Production Phase, and the next Sampling Phase]
