This research presents a neuroevolution method for dynamic resource allocation on chip multiprocessors (CMPs). As multicore chips become the norm and the latency between main memory and the fastest cache grows, dynamic management of on-chip resources such as cache memory becomes crucial for performance. The proposed solution is a recurrent neural network controller, evolved with the Enforced Subpopulations (ESP) algorithm, that periodically reassigns L2 cache banks to cores based on CMP state information. Repeated evaluation and recombination cycles refine the controller, and trace-based simulations built from SPEC2000 benchmarks show a 16% average improvement over the baseline. In generalization tests, the evolved networks retain a 13% average advantage and outperform the baseline on every trial. The simulation ignores the overhead of cache bank reassignment, which in a real CMP would make the affected caches unavailable for a significant time.
A Neuroevolution Method for Dynamic Resource Allocation on a Chip Multiprocessor Faustino J. Gomez, Doug Burger, and Risto Miikkulainen Presented by: Harman Patial Dept of Computer & Information Sciences University of Delaware
Background • Multiple cores are becoming the norm. • The latency between main memory and the fastest cache is growing. • For best performance, the chip therefore needs: • Dynamic management of on-chip resources such as cache memory. • Dynamic management of off-chip resources such as bandwidth.
Solution • A controller that uses CMP state information to periodically reassign cache banks to the cores. • CMP resource management using a neuroevolution algorithm called Enforced Subpopulations (ESP). • ESP extends the Symbiotic, Adaptive Neuroevolution (SANE) algorithm.
Evolving CMP Controller • The problem is restricted to evolving a recurrent neural network that manages the L2 cache of a CMP with C cores. • Evaluation – A set of u neurons is selected randomly, one from each of the u subpopulations, and combined to form a neural network; the network is evaluated, and its fitness is added to the cumulative fitness of each participating neuron. • Recombination – The average fitness of each neuron is calculated by dividing its cumulative fitness by the number of trials in which it participated, and the best-performing neurons in each subpopulation are recombined. • The evaluation-recombination cycle is repeated until a network that performs sufficiently well is found.
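The evaluation-recombination cycle can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `esp_evolve`, the Gaussian initialization, and the recombination operators (keep the top quarter, refill with one-point crossover and sparse Gaussian mutation) are hypothetical choices, since the slides only specify how neurons are sampled and how average fitness is computed. Each neuron is a flat list of weights; in the CMP setting it would hold one hidden unit's 3C + u input weights followed by its C output weights.

```python
import random
from typing import Callable, List, Optional, Tuple

Neuron = List[float]  # one hidden unit's connection weights (its genome)


def esp_evolve(
    num_subpops: int,            # u: one subpopulation per hidden unit
    subpop_size: int,            # candidate neurons per subpopulation (>= 2)
    genome_len: int,             # weights per neuron
    evaluate: Callable[[List[Neuron]], float],
    generations: int,
    trials_per_neuron: int = 10,
    target: Optional[float] = None,
    seed: int = 0,
) -> Tuple[List[Neuron], float]:
    rng = random.Random(seed)
    subpops = [
        [[rng.gauss(0.0, 1.0) for _ in range(genome_len)] for _ in range(subpop_size)]
        for _ in range(num_subpops)
    ]
    best_net: List[Neuron] = []
    best_fit = float("-inf")

    for _ in range(generations):
        cum_fit = [[0.0] * subpop_size for _ in range(num_subpops)]
        trials = [[0] * subpop_size for _ in range(num_subpops)]

        # Evaluation: pick one neuron per subpopulation at random, form a network,
        # evaluate it, and credit its fitness to every participating neuron.
        for _ in range(trials_per_neuron * subpop_size):
            picks = [rng.randrange(subpop_size) for _ in range(num_subpops)]
            net = [subpops[s][i] for s, i in enumerate(picks)]
            fit = evaluate(net)
            if fit > best_fit:
                best_fit, best_net = fit, [list(n) for n in net]
            for s, i in enumerate(picks):
                cum_fit[s][i] += fit
                trials[s][i] += 1

        # Stop once a network performs "sufficiently well" (optional threshold).
        if target is not None and best_fit >= target:
            break

        # Recombination: average fitness = cumulative fitness / trials. The operators
        # below (keep top quarter, one-point crossover, sparse Gaussian mutation) are
        # assumed for illustration and applied within each subpopulation.
        for s in range(num_subpops):
            avg = [
                cum_fit[s][i] / trials[s][i] if trials[s][i] else float("-inf")
                for i in range(subpop_size)
            ]
            order = sorted(range(subpop_size), key=lambda i: avg[i], reverse=True)
            ranked = [subpops[s][i] for i in order]
            n_elite = max(2, subpop_size // 4)
            elite = ranked[:n_elite]
            children = []
            while len(children) < subpop_size - n_elite:
                a, b = rng.sample(elite, 2)
                cut = rng.randrange(1, genome_len) if genome_len > 1 else 0
                child = [
                    w + rng.gauss(0.0, 0.1) if rng.random() < 0.1 else w
                    for w in a[:cut] + b[cut:]
                ]
                children.append(child)
            subpops[s] = elite + children

    return best_net, best_fit
```

Because each subpopulation evolves one hidden-unit slot, neurons are rewarded for cooperating well with the others, which is the core idea ESP inherits from SANE.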
Evolving CMP Controller • The input layer receives the instructions per cycle (IPC), L1 misses, and L2 misses of each core, 3C values in all. • Because the networks are recurrent, the input also includes the previous hidden-layer activations, for a total of 3C + u inputs. • The output layer has one unit per core, whose activation value is the amount of cache desired by that core.
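A forward pass through such a network might look like the sketch below. The class and function names, the sigmoid activations, and the rule for turning output activations into whole cache banks (proportional shares with largest-remainder rounding) are all hypothetical; the slides state only the input and output layout. Inputs are assumed to be normalized before they reach the network.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RecurrentController:
    """Hypothetical controller: u hidden units, C outputs, 3C + u inputs."""

    def __init__(self, neurons: List[List[float]], num_cores: int):
        self.C = num_cores
        self.u = len(neurons)
        self.n_in = 3 * num_cores + self.u
        # Each neuron genome = its input weights followed by its output weights.
        if any(len(n) != self.n_in + num_cores for n in neurons):
            raise ValueError("each neuron needs 3C + u input and C output weights")
        self.neurons = neurons
        self.hidden = [0.0] * self.u  # previous hidden activations (recurrent context)

    def step(self, ipc: List[float], l1_miss: List[float], l2_miss: List[float]) -> List[float]:
        x = list(ipc) + list(l1_miss) + list(l2_miss) + self.hidden  # 3C + u inputs
        self.hidden = [
            sigmoid(sum(w * xi for w, xi in zip(n[: self.n_in], x))) for n in self.neurons
        ]
        # One output per core: the amount of cache that core "wants".
        return [
            sigmoid(sum(n[self.n_in + c] * h for n, h in zip(self.neurons, self.hidden)))
            for c in range(self.C)
        ]


def allocate_banks(desired: List[float], total_banks: int) -> List[int]:
    """Assumed mapping: split banks in proportion to the outputs, rounding by largest remainder."""
    total = sum(desired)
    if total <= 0:
        shares = [total_banks / len(desired)] * len(desired)
    else:
        shares = [d / total * total_banks for d in desired]
    banks = [int(s) for s in shares]
    leftover = total_banks - sum(banks)
    by_remainder = sorted(range(len(shares)), key=lambda i: shares[i] - banks[i], reverse=True)
    for i in by_remainder[:leftover]:
        banks[i] += 1
    return banks
```

The hidden activations saved in `self.hidden` are what make the controller recurrent: each decision can depend on the chip's recent history, not only on the current interval.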
Simulation Environment • Controllers were evolved in an approximation of the CMP environment built from traces collected with the SimpleScalar processor simulator. • A set of traces was developed for the following SPEC2000 benchmarks: art, equake, gcc, gzip, parser, perlbmk, vpr. • Each benchmark trace set consists of one trace for each possible L2 cache size.
Simulation Environment • All traces recorded the IPC, L1 misses (L1m), and L2 misses (L2m) of the simulated processor every 100,000 instructions, using the DEC Alpha 21264 configuration. • By combining n traces, we can approximate a CMP with n processing cores.
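One way to represent a trace and to combine n of them into a CMP state is sketched below. The record layout, the function names, the lock-step advance of all cores, and the definition of chip IPC as the sum of per-core IPCs are assumptions made for illustration; the slides do not say how the traces are aligned in time.

```python
from dataclasses import dataclass
from typing import List, Tuple

SAMPLE_INTERVAL = 100_000  # instructions between trace samples (from the slides)


@dataclass(frozen=True)
class TraceSample:
    ipc: float       # instructions per cycle over the interval
    l1_misses: int   # L1 misses over the interval
    l2_misses: int   # L2 misses over the interval


def combine_traces(
    traces: List[List[TraceSample]], step: int
) -> Tuple[List[float], List[int], List[int]]:
    """Approximate an n-core CMP at control step `step` by reading one sample
    from each core's trace (assumes all cores advance in lock step)."""
    samples = [trace[step] for trace in traces]
    return (
        [s.ipc for s in samples],
        [s.l1_misses for s in samples],
        [s.l2_misses for s in samples],
    )


def chip_ipc(per_core_ipc: List[float]) -> float:
    """Assumed chip-level IPC: the sum of the per-core IPCs."""
    return sum(per_core_ipc)
```

The three lists returned by `combine_traces` are exactly the 3C per-core values the controller's input layer expects.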
Network Evaluation • Networks are evaluated by having them interact with the trace-based environment for a fixed number of control decisions. • At the start of a trial, the environment is initialized by selecting a set of C benchmarks and allocating an equal amount of L2 cache to each core (total L2 / C).
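Trial initialization can be sketched as follows. The benchmark names come from the slides; the function name, the dictionary layout, drawing distinct benchmarks, starting every trace at its first sample, and the `total_l2_kb` parameter are hypothetical. The equal share must also match one of the L2 sizes for which traces exist.

```python
import random
from typing import Dict, List

# Benchmark names from the slides; cache sizes passed in below are hypothetical.
BENCHMARKS = ["art", "equake", "gcc", "gzip", "parser", "perlbmk", "vpr"]


def init_environment(num_cores: int, total_l2_kb: int, rng: random.Random) -> Dict[str, List]:
    """Start a trial: pick C distinct benchmarks and give every core total L2 / C."""
    if not 1 <= num_cores <= len(BENCHMARKS):
        raise ValueError("need between 1 and 7 cores to pick distinct benchmarks")
    if total_l2_kb % num_cores:
        raise ValueError("total L2 must divide evenly among the cores")
    return {
        "benchmarks": rng.sample(BENCHMARKS, num_cores),
        "l2_kb": [total_l2_kb // num_cores] * num_cores,
        "positions": [0] * num_cores,  # assumed: each core starts at its trace's first sample
    }
```

The equal split is also a natural static baseline to compare an evolved controller against.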
Network Evaluation • When a trace runs out, the environment switches to the trace of a different benchmark at the same cache size. • With 7 possible cache sizes and 7 benchmarks, a total of 49 traces were used to implement the environment. • The fitness of the network was the IPC of the chip averaged over the duration of the trial.
Network Evaluation • In a real CMP, reassigning a cache bank from core A to core B would make the entire caches of A and B unavailable for a significant time. • In our simulation, this overhead is ignored, and the chip is simply reconfigured by switching to the trace corresponding to the new cache size. • The new trace starts at the same point where the previous one left off.
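The evaluation loop described in the Network Evaluation slides can be sketched as below. The trace table keyed by (benchmark, L2 size), the `run_trial` and controller signatures, the choice of replacement benchmark, restarting a replacement trace at its first sample, and chip IPC as the sum of per-core IPCs are hypothetical simplifications; each trace here holds only per-interval IPC values.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical trace table: (benchmark, L2 size) -> per-interval IPC of one core.
Traces = Dict[Tuple[str, int], List[float]]
# Hypothetical controller: (per-core IPCs, current sizes) -> new per-core sizes.
Controller = Callable[[List[float], List[int]], List[int]]


def run_trial(
    traces: Traces,
    benchmarks: List[str],
    sizes: List[int],
    controller: Controller,
    n_decisions: int,
    rng: random.Random,
) -> float:
    """Return the fitness: chip IPC averaged over a fixed number of control decisions."""
    benchmarks, sizes = list(benchmarks), list(sizes)
    pos = [0] * len(benchmarks)
    total = 0.0
    for _ in range(n_decisions):
        ipcs = []
        for c in range(len(benchmarks)):
            if pos[c] >= len(traces[(benchmarks[c], sizes[c])]):
                # Trace exhausted: switch to a different benchmark at the same cache size.
                others = sorted({b for (b, s) in traces if s == sizes[c] and b != benchmarks[c]})
                benchmarks[c], pos[c] = rng.choice(others), 0
            ipcs.append(traces[(benchmarks[c], sizes[c])][pos[c]])
            pos[c] += 1
        total += sum(ipcs)  # assumed chip IPC = sum of per-core IPC
        # Reassignment overhead is ignored: the next step reads the trace for the new
        # size at the same position where the old trace left off.
        sizes = list(controller(ipcs, sizes))
    return total / n_decisions
```

Because a size change only swaps which trace is read, a reallocation takes effect on the very next interval, which is precisely the idealization the slide acknowledges.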
Results • Five simulations were run on a 14-processor Sun Ultra Enterprise for approximately 1000 generations each. • At the end of each evolution run, the fitness of the best network was compared with the baseline performance. • The networks showed an average improvement of 16% over the baseline.
Results • The best network from each of the 5 simulations was submitted to a generalization test. • The test consists of 1000 trials in which the network controls the chip for 1 billion instructions under random initial conditions. • The networks still retained a 13% average performance advantage over the baseline and, more importantly, performed better than the baseline on every trial.
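The two numbers reported for the generalization test, an average advantage and a win on every trial, can be computed from per-trial results as sketched below. The function name is hypothetical, and averaging per-trial relative gains is an assumption: the slides do not say how the 13% figure was aggregated (a ratio of mean IPCs would be another reasonable choice).

```python
from statistics import mean
from typing import List, Tuple


def generalization_summary(
    network_ipc: List[float], baseline_ipc: List[float]
) -> Tuple[float, bool]:
    """Return (mean per-trial advantage in percent, whether the network won every trial).

    Assumes one (network, baseline) fitness pair per trial, with a positive baseline.
    """
    if not network_ipc or len(network_ipc) != len(baseline_ipc):
        raise ValueError("need exactly one baseline result per trial")
    gains = [(n - b) / b for n, b in zip(network_ipc, baseline_ipc)]
    wins_every_trial = all(n > b for n, b in zip(network_ipc, baseline_ipc))
    return 100.0 * mean(gains), wins_every_trial
```

Reporting both values matters: a positive average can hide individual trials where the controller loses, so "better on every trial" is the stronger claim.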
Questions?