Scalably verifiable dynamic power management. Opeoluwa Matthews, Meng Zhang, and Daniel J. Sorin. 20th International Symposium on High Performance Computer Architecture (HPCA), Orlando, Florida, February 17-19, 2014. - Krishnaprasad K and Yashas Krishna
Some Background • One of today's biggest problems • Power management • Managing how much power each component gets • When power is supplied • How the system gets power when needed • Etc. • Power management approaches • Static power management • Pre-allocate power to each component • Dynamic power management (DPM) • Allocate power when needed • E.g., dynamic voltage/frequency scaling
Problems with DPM • Designing DPM is difficult • Because of the increasing scale of computer systems • Cores per processor are increasing • Processors per system are increasing • Challenges to efficient DPM: • Scalability • Must scale to large systems • Verifiability • Must verify correctness in all situations • Scalability affects verifiability • But no automated methods can verify DPM at large scale
Important Factors in DPM • Scalability factor • Power consumption grows with scale • Large scale = high power requirement • Small scale = low power requirement • Verification of DPM and its benefits • Finds bugs in DPM • Proves correctness of DPM • If verification is not done: • Components may overheat • System failure and damage • So a scalably verifiable DPM is needed
Contents • Existing system model and its issues • Introducing a new DPM system: Fractal DPM • How is verification possible in the new system? • Fractal DPM vs. performance: tradeoffs • Evaluation of the new system • Implementation strategy • Comparison to prior work • Conclusion
Initial System Model • DPM model • Dynamically allocate power to each component Ci • Power allotted is proportional to current performance Xi • Xi = function of (current power allocation Pi, current unconstrained performance Xmaxi) • Initial setting: • Set a power budget • Allot power to components so that the budget is satisfied • Maximize Xi • Subject to Sum(Pi) < Budget • Power/performance model • 5 possible power settings for each Ci • Low (L) • Medium_Low (ML) • Medium (M) • Medium_High (MH) • High (H)
Initial Model: Issues • Design using existing tools • Fully automated formal verification methodologies • Tool: MurΦ model checker • Exhaustive state-space search • Checks whether the invariants are satisfied • Issue: state-space explosion • As the number of components Ci increases, the number of states increases • Infeasible to traverse all states • E.g., 5 components with 5 settings each means 5^5 states • Typical solution: • Verify at small scale and, if the invariants hold, assume the large-scale system also satisfies them • This need not always be true
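To make the state-space growth concrete, here is a minimal sketch that counts joint power-setting assignments for a given number of components. It assumes only the five settings listed earlier; the function names are hypothetical, and it ignores any extra protocol states a real model checker would also have to explore, so actual state counts would be larger.

```python
from itertools import product

# The five power settings from the power/performance model.
SETTINGS = ("L", "ML", "M", "MH", "H")


def num_power_states(num_components: int) -> int:
    """Count joint power-setting assignments: 5 ** num_components."""
    return len(SETTINGS) ** num_components


def enumerate_power_states(num_components: int):
    """Lazily yield every joint assignment (only practical for small systems)."""
    return product(SETTINGS, repeat=num_components)


if __name__ == "__main__":
    # The count grows exponentially with the number of components.
    for c in (2, 5, 16, 64):
        print(f"{c} components -> {num_power_states(c)} assignments")
```

With 5 components the count is 5^5 = 3125, which is still easy to enumerate; the exponential growth is what makes exhaustive search infeasible for large systems.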
Fractal DPM Design • Fractal design • A design in which the system behaves the same at every scale • This makes inductive verification possible • Base case: verify that the minimum system satisfies its power constraints • Inductive step: verify that larger systems are equivalent to smaller systems • Both steps are done using MurΦ
Fractal System Organization • Hierarchical structure: binary tree model • Leaves: computing resources (CRs) • Intermediate nodes: DPM controllers • Record the power states of their child nodes • Handle power requests from CRs • Power requests • A CR can request more power • By sending a request to its DPM controller (parent) • The DPM controller responds • Either directly • Or by passing the request to its own parent controller • A DPM controller and its two children are considered a single “node”, like a single CR • Each such node has a combined power setting • The average of its child nodes' settings L:R
Fractal System Organization • E.g., if the children are H and L, then the average is MH • The L:R format denotes the power setting of the left child : the power setting of the right child
Fractal Power Invariant • The invariant must be fractal • Applicable at all scales of the system • Advantage of Fractal DPM: this makes it unique among DPM schemes • Fractal invariant • It is impossible for both children of a DPM controller to be at the High power setting at the same time • Why? • Useful in cases where Sum(Pi) > Budget • Limits system-wide power consumption • Limitation • Other invariants are not considered or compared: future work
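As an illustration of the invariant at the scale of a single DPM controller, the sketch below enumerates the L:R child-state pairs and filters out the one that violates it. The function names are hypothetical, and the check covers only one controller with two children; it does not model how a controller's combined setting is derived.

```python
from itertools import product

# The five power settings, lowest to highest.
SETTINGS = ("L", "ML", "M", "MH", "H")


def satisfies_fractal_invariant(left: str, right: str) -> bool:
    """Return True unless both children of one DPM controller are at High."""
    return not (left == "H" and right == "H")


def allowed_child_states() -> list[str]:
    """List every L:R pair that satisfies the invariant."""
    return [
        f"{left}:{right}"
        for left, right in product(SETTINGS, repeat=2)
        if satisfies_fractal_invariant(left, right)
    ]


if __name__ == "__main__":
    states = allowed_child_states()
    print(len(states), "of", len(SETTINGS) ** 2, "pairs are allowed")
```

Of the 25 possible pairs, only H:H is excluded; because the same rule applies at every controller in the tree, the check does not change as the system grows.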
Fractal DPM: Specification • Table-based specification method • Each entry in the table corresponds to a state/event combination, and the entry specifies what happens in that situation.
Specification Continued • Special states: • Pend-* • Family of pending states in which the computing resource has requested a new power state and is waiting for a response • Block-* • Family that includes states such as Block-L:ML, in which the DPM controller has granted or denied a request to a child, is blocked waiting on the Ack from the child, and will then go to state L:ML • Specification of the root DPM controller • Same as a non-root DPM controller, except that the root has no parent DPM controller from which to request power • No pending states, only block states • A non-root DPM controller passes a request to its parent only if it cannot handle it by itself • It handles a request by itself when the node's combined state is unchanged • 4 exceptions: cases where the invariant would not be satisfied
Fractal DPM: Scalability Issues • At large scale • Tree height increases • Requests from the leaves to the root take more time • Latency issues • More hops • Possible solution • Higher-degree tree: reduces the height of the tree • Problem: MurΦ does not support this, so it could not be verified • Scalability issues are not a big concern • The latency of DPM itself is not critical • Many requests can be satisfied without traveling far up the tree • Experimental results on a real, modestly sized system (16 computing resources) • Latencies are reasonable
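The relation between system size, tree degree, and worst-case hop count can be sketched as follows. This assumes a balanced tree with computing resources at the leaves; `tree_height` is a hypothetical helper, and it counts only the hops from a leaf up to the root, not the response path or any processing delay.

```python
def tree_height(num_leaves: int, degree: int = 2) -> int:
    """Smallest height h such that degree ** h >= num_leaves.

    In a balanced tree this is the number of hops a request needs
    to travel from a leaf to the root in the worst case.
    """
    if num_leaves < 1 or degree < 2:
        raise ValueError("need at least one leaf and a degree of at least 2")
    height, capacity = 0, 1
    while capacity < num_leaves:
        capacity *= degree
        height += 1
    return height


if __name__ == "__main__":
    for leaves in (16, 256, 1024):
        print(leaves, "CRs:",
              "binary height", tree_height(leaves),
              "| degree-4 height", tree_height(leaves, degree=4))
```

For 16 computing resources, a binary tree has height 4 while a degree-4 tree has height 2, which is the reduction the higher-degree alternative would offer.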
Verification of Fractal DPM • Scalable verification • Verification effort is independent of the number of CRs • Steps • Base case verification • Inductive step verification • Base case: minimum system verification • The base system must be complete • It must include all basic components • Incomplete base system • Arises when some elements are not considered • Gives incomplete verification: spurious actions • MurΦ verifies whether the invariants are satisfied
Verification of Fractal DPM • Inductive step: equivalence verification • Observational equivalence is the chosen notion of equivalence • Only the externally visible behavior of systems of different scales is considered • No internal actions are considered • Considers only how the system reacts to inputs • Two perspectives • Looking down • When the system is scaled downwards • Looking up • When the system is scaled upwards • In both cases, verify that the larger system behaves the same as the subsystem • Tool: MurΦ is used • Using the same tool for both steps reduces translation errors between tools • On-the-fly mode: no extra state space
Power Management Efficiency • System-wide power consumption is upper bounded • Max power consumed: (C-1) MH + H • As C approaches infinity • Max average power per CR = MH • F-DPM allows all CRs to be at MH • It does not permit certain allocations • This causes inefficiency, a tradeoff against the fractal invariant • But such cases are rare and the inefficiency caused is small • Another inefficiency: F-DPM can force a CR from H down to MH
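The limit on average power follows directly from the stated bound. As a short derivation, write P_MH and P_H for the power drawn at the MH and H settings (these symbols are introduced here for illustration) and divide the system-wide maximum by the number of computing resources C:

```latex
\[
P_{\max}(C) = (C-1)\,P_{MH} + P_{H},
\qquad
\frac{P_{\max}(C)}{C} = P_{MH} + \frac{P_{H} - P_{MH}}{C}
\;\xrightarrow{\;C \to \infty\;}\; P_{MH}.
\]
```

The single extra H allocation is amortized over all C resources, so its contribution to the average vanishes as the system grows.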
Evaluation of System • Goal • Does Fractal DPM actually do its job well? • That is, allocate power to CRs dynamically and efficiently • Simulation methodology • Dynamically set Xmaxi for all CRs • Keep it changing across time steps • Assign weights to the power settings • Model the behavior of CRs and DPMCs • Using the specification tables • Compute the performance of each CR • As a function of the power it is granted by DPM at each time step
Performance Modeling • How to determine the performance of a given CR at a given power setting? • Each CR can use power in a different way • CRs may achieve different performance at the same setting • Abstract approach: model performance as a function of Pi and Xmaxi • Two functions: • Perf1: • Decreasing marginal performance benefit • E.g., using more power to enable a faster core clock frequency helps performance, but eventually performance becomes memory-bound • Perf2: • Linear performance benefit • E.g., ideal voltage/frequency scaling
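The two shapes can be illustrated with a small sketch. The slides do not give the actual formulas, so everything concrete here is an assumption: the numeric weights 1 to 5 for the settings, the square-root curve standing in for "decreasing marginal benefit", and the function names. Both functions are normalized so that a CR at H reaches its unconstrained performance Xmaxi.

```python
import math

# Assumed weights for the five power settings (illustrative values only).
POWER_WEIGHT = {"L": 1, "ML": 2, "M": 3, "MH": 4, "H": 5}
MAX_WEIGHT = POWER_WEIGHT["H"]


def perf1_diminishing(setting: str, x_max: float) -> float:
    """Concave curve: each step up in power helps less than the previous one.

    The square root is an assumed stand-in for a memory-bound workload.
    """
    return x_max * math.sqrt(POWER_WEIGHT[setting] / MAX_WEIGHT)


def perf2_linear(setting: str, x_max: float) -> float:
    """Linear curve: performance scales in proportion to the power weight."""
    return x_max * POWER_WEIGHT[setting] / MAX_WEIGHT


if __name__ == "__main__":
    for s in POWER_WEIGHT:
        print(s, round(perf1_diminishing(s, 100.0), 1), round(perf2_linear(s, 100.0), 1))
```

Under these assumed curves, dropping from H to MH costs a larger fraction of performance in the linear model than in the diminishing-returns model, which matches the intuition that restricting high power states is more costly when performance keeps scaling with power.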
Performance Comparison and Results • Compare against an unimplementable oracle (ideal DPM) • Gives the best possible allocations, even H:H allocations • Results (given #CRs = 8): • In the majority of time steps (>72%): performance(FDPM) = performance(Oracle) • The performance gap is never more than 37% for Perf1 and 46% for Perf2 • The performance difference is greater for Perf2 • Perf2 models greater performance at higher power states, and thus being at a lower power state (to maintain the fractal invariant) is somewhat more costly • Thus: the amount of performance sacrificed is small
Implementation Strategy • Dynamic voltage/frequency scaling as the power adjustment mechanism • V/F adjusted at core-pair granularity • Possible because of the fractal structure • CRs and DPMCs implemented as Linux daemons • Communication through sockets • Optimization: OptiFDPM • A CR re-requests the next lower power setting if its current request is rejected • The optimized version retains the scalable verifiability of FDPM
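The OptiFDPM retry behavior can be sketched as a simple fallback loop. This is an illustrative sketch, not the daemon code: `request_with_fallback` and the `try_request` callback are hypothetical names, and the callback stands in for whatever socket exchange the CR daemon performs with its parent DPM controller.

```python
from typing import Callable

# The five power settings, ordered lowest to highest.
SETTINGS = ("L", "ML", "M", "MH", "H")


def request_with_fallback(
    current: str,
    desired: str,
    try_request: Callable[[str], bool],
) -> str:
    """Request `desired`; on rejection, retry with the next lower setting.

    `try_request` is a hypothetical callback that returns True when the
    parent controller grants the setting. Returns the setting the CR ends
    up with, which is `current` if every request is rejected.
    """
    cur_idx = SETTINGS.index(current)
    idx = SETTINGS.index(desired)
    if idx <= cur_idx:
        # Not an increase: a single request, no fallback needed.
        return desired if try_request(desired) else current
    while idx > cur_idx:
        if try_request(SETTINGS[idx]):
            return SETTINGS[idx]
        idx -= 1  # step down to the next lower setting and ask again
    return current


if __name__ == "__main__":
    # Example policy (assumed): the parent grants anything up to MH.
    grants_up_to_mh = lambda s: SETTINGS.index(s) <= SETTINGS.index("MH")
    print(request_with_fallback("L", "H", grants_up_to_mh))
```

In this example a CR at L that asks for H is rejected once and then granted MH, rather than staying at L after the first rejection.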
Evaluation of Implementation • Compare the power and performance of fractal DPM against an un-implementable oracle DPM scheme that always assigns the optimal power levels to core pairs. • Compare the power and performance of fractal DPM against a provably correct power management scheme that statically sets all cores to a given power level. • Determine the latency to service requests for new power levels
Evaluation of Implementation • Comparison to Oracle Power Management
Evaluation of Implementation • Comparison to Static Power Management
Evaluation of Implementation • Latency
Comparison: Previous Work • Lungu et al.’s research on verifiable DPM for multicore processors [9] • Observed that DPM schemes cannot be verified at large scale • Showed state-space explosion • Zhang et al.’s work on Fractal Coherence [14] • Source of the fractal design idea • Used here for DPM for the first time • Other work on DPM [10][8][6] • Did not use verification
Conclusion • Design of a scalably verifiable DPM • Uses a fractal design for verifiability • Only a small performance inefficiency • On par with the oracle model in most cases
Reference • [1] D. Bergamini, N. Descoubes, C. Joubert, and R. Mateescu, “BISIMULATOR: A Modular Tool for On-the-Fly Equivalence Checking,” in Proceedings of TACAS’05, volume 3440 of LNCS, 2005, pp. 581–585. • [2] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC Benchmark Suite: Characterization and Architectural Implications,” in Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2008. • [3] C.-T. Chou, P. Mannava, and S. Park, “A Simple Method for Parameterized Verification of Cache Coherence Protocols,” in Formal Methods in Computer-Aided Design, 2004, pp. 382–398. • [4] G. Dhiman, K. K. Pusukuri, and T. Rosing, “Analysis of Dynamic Voltage Scaling for System Level Energy Management,” in Proceedings of the 2008 Conference on Power Aware Computing and Systems, 2008. • [5] D. L. Dill, A. J. Drexler, A. J. Hu, and C. H. Yang, “Protocol Verification as a Hardware Design Aid,” in IEEE International Conference on Computer Design: VLSI in Computers and Processors, 1992, pp. 522–525.
Reference • [6] A. Efthymiou and J. D. Garside, “Adaptive Pipeline Depth Control for Processor Power-Management,” in Proceedings of the IEEE International Conference on Computer Design, 2002. • [7] J.-C. Fernandez, H. Garavel, A. Kerbrat, L. Mounier, R. Mateescu, and M. Sighireanu, “CADP - A Protocol Validation and Verification Toolbox,” in Proceedings of the 8th International Conference on Computer Aided Verification, 1996, pp. 437–440. • [8] C. Isci, A. Buyuktosunoglu, C.-Y. Cher, P. Bose, and M. Martonosi, “An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget,” in Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006. • [9] A. Lungu, P. Bose, D. J. Sorin, S. German, and G. Janssen, “Multicore Power Management: Ensuring Robustness via Early-Stage Formal Verification,” in Proceedings of the Seventh ACM-IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE), 2009. • [10] R. Maro, Y. Bai, and R. I. Bahar, “Dynamically Reconfiguring Processor Resources to Reduce Power Consumption in High-Performance Processors,” in Proceedings of the Workshop on Power-Aware Computer Systems, pp. 97–111, Nov. 2000.
Reference • [11] S. Park, S. Das, and D. L. Dill, “Automatic Checking of Aggregation Abstractions Through State Enumeration,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 10, pp. 1202–1210, Nov. 2006. • [12] S. Park and D. L. Dill, “Verification of FLASH Cache Coherence Protocol by Aggregation of Distributed Transactions,” in Proceedings of the Eighth ACM Symposium on Parallel Algorithms and Architectures, 1996, pp. 288–296. • [13] D. J. Sorin, M. Plakal, M. D. Hill, A. E. Condon, M. M. K. Martin, and D. A. Wood, “Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol,” IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 6, pp. 556–578, Jun. 2002. • [14] M. Zhang, A. R. Lebeck, and D. J. Sorin, “Fractal Coherence: Scalably Verifiable Cache Coherence,” in Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010.