MultiCore Hardware Experiments in Software Producibility: Kickoff Meeting
Jonathan Sprinkle (University of Arizona) and Brandon Eames (Utah State University)
Overview
• Motivations
• Goals, Assumptions, and Constraints
• Approach
  • Candidate algorithms/systems
  • Experiment Setup Example
  • Hardware choices
  • Metrics and measuring
• Plan
  • Timeline
  • Division of Labor
• Questions
MultiCore Hardware Experiments in Software Producibility--Kickoff Meeting
Why are multi-core real-time systems different?
[Figure: execution timelines for Processes 1-4 along a time axis, with three annotated points: "Process 2 reads output from Process 1 here", "Process 2 reads output from Process 3 here", and "Process 4 reads output from Process 1 here"]
With interleaved threads, the processes behaved well on a single core, but on multiple cores the threads must be synchronized.
Processes designed for distributed processing will work fine, but there may be some real-time tasks which “just work” on single-core systems due to the prevalence of “weak testing” (if it works, don’t try to fix it!).
Motivations
• Real-time systems are often subject to subtleties, even on single-core machines
  • Cache size, HDD access times, interrupt timings, etc., can all affect the stability of the system if depended upon unwisely
• A few of the possible problems:
  • An application uses only a single thread and leaves a second core unused, so other applications are now free to run there and create shared-resource conflicts
  • Synchronized threads whose timing is okay on one processor, but
    • when using multiple cores, the processes execute too fast
    • when accessing shared resources, conflicts occur
  • Non-synchronized, multi-threaded processes, where interleaving of commands involving a third process slows one process down enough for stability, but without the third process the system is unstable
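The timing hazards listed above can be illustrated with a minimal Python sketch. It is not taken from the vehicle software; the names `run_pipeline`, `producer`, `consumer`, and `result_ready` are hypothetical. The consumer waits on an explicit event instead of assuming the scheduler has already run the producer, so the result does not depend on core count or on how fast either thread runs.

```python
import threading
import time

def run_pipeline(producer_delay=0.0):
    """Consumer reads the producer's output only after it is signalled ready."""
    shared = {}                        # stands in for a shared resource
    result_ready = threading.Event()   # explicit synchronization point
    seen = []

    def producer():
        # Optionally simulate a producer that is scheduled late or runs slowly.
        time.sleep(producer_delay)
        shared["output"] = 42
        result_ready.set()             # publish: the output is now valid

    def consumer():
        result_ready.wait()            # block until the producer has published
        seen.append(shared["output"])

    # Start the consumer first to show that start order does not matter.
    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen[0]

print(run_pipeline())       # 42
print(run_pipeline(0.05))   # 42, even when the producer is delayed
```

Without the event, the consumer would raise a `KeyError` whenever it happened to run before the producer, which is the kind of failure that "weak testing" on a single core can hide.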
Goals
• Demonstrate potential performance gains of legacy software using multicore processors
  • with high confidence of safe execution
  • with the ability to know whether stability is at risk
  • with side-by-side comparisons of executions using single-core processors
• Produce exemplar experiments, where
  • measurements for the system are taken,
  • the system is composed using off-the-shelf components, and
  • documentation of how the experiment was performed is created, allowing someone else to duplicate the experiment
• Give specific examples for testing
  • Data-in-the-loop (DIL)
  • Simulator/Software-in-the-loop (SWIL)
  • Hardware-in-the-loop (HWIL)
[Figure: Hokuyo laser sensor]
Assumptions and Constraints
• New infrastructure will not be created for the project
• New research in metrics and measurements is not expected
• Only lightweight software will be written (configuration files, glue code, perhaps some variants of execution; ~1000 lines, not ~100k lines)
• Existing off-the-shelf software and middleware can be used for an engineering (and preferably DoD-related) application
  • Emphasis will be placed on open-source tools
  • Metrics and other measurements are possible with such tools
• Existing simulators and data for the *-in-the-loop tests can be utilized
• Hardware developed in related research programs may be used for HWIL, especially for testing highly parallel algorithms to simulate future multi-core (cores >> 2) processors
Real data from autonomous runs is available for use.
http://playerstage.sourceforge.net/
Approach: Candidate Algorithms/Systems
• Multi-component autonomous vehicle simulation
  • Simulation of vehicle dynamics
  • Simulation of environment (obstacles, other vehicles, etc.)
  • Real-time path planning with obstacle avoidance (key algorithm, with multi-core extensions)
  • Currently encoded as a distributed system, but capable of simulation on a Core 2 Duo laptop in VMware (so not exorbitantly slow!)
• Advantages:
  • Existing data: from actual vehicle and trajectory experiments
  • Existing simulators: 3D simulators featuring hardware acceleration, which can be turned off to simulate older processors or to ‘hit’ the cache
  • Existing software: all components needed to run this demonstration already exist as open source, so anyone wishing to run the experiments in the future may use them freely
  • Familiarity: Sprinkle was Team Leader of the research group that put together the software
  • Potential for multi-core acceleration (real-time path planner is key)
  • Potential for follow-on algorithm development (computer vision, etc.)
Autonomous Vehicle Simulation and Experiments: basicsim
For this simulation, all vehicle components use simulated information that comes from dedicated simulators. The components gridmap, faithlocaliser3d, dgclocalnav, and highlevelplanner each run independently of the data source. The components laser{3,2,1}, Car, and imu retrieve data from simulators. In addition to gathering simulated data, these data-source components can replay data that has been logged.
Autonomous Vehicle HWIL: vehiclecheck
For this configuration, all vehicle components use information that comes from hardware devices. The components gridmap, lanedetector, dgclocalnav, and highlevelplanner each run independently of the data source. The components laser{3,2,1}, Car, and insgps retrieve data from hardware (and store it in a log repository). Note that the previous component faithlocaliser3d is not needed, since insgps provides localization information. In addition to gathering data from hardware, these data-source components can replay data that has been logged.
Why is this component-based design relevant?
Simulators and data logs provide both nondeterministic and deterministic data sources, so the same high-level algorithms can be compared across varying implementation platforms. Thus, we can gather the performance requirements of each with controlled data. Finally, this data is, in many cases, gathered from the actual vehicle!
[Figure: boxes labeled Simulator 1, Data Log 1, and Simulator 2, alongside Algorithm (on single-core), Algorithm (on multi-core), and Algorithm (on hardware)]
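A minimal sketch of this idea, assuming a Linux host: the same logged data is replayed through the same algorithm, once restricted to one core and once with all allowed cores, and the outputs and timings are compared. Everything here is hypothetical (`replay`, `compare`, `nearest_obstacle`, and the made-up `LOG`); it is not the Player/Stage replay mechanism. `os.sched_setaffinity` exists only on Linux, so the sketch falls back to running unpinned elsewhere.

```python
import os
import time

def replay(log, algorithm, cpus=None):
    """Feed each logged sample to `algorithm`; return (outputs, seconds).

    If `cpus` is given and the platform supports it, the caller is
    restricted to that CPU set for the duration of the replay.
    """
    previous = None
    if cpus is not None and hasattr(os, "sched_setaffinity"):  # Linux-only API
        try:
            previous = os.sched_getaffinity(0)
            os.sched_setaffinity(0, cpus)
        except OSError:
            previous = None  # pinning not permitted here; run unpinned
    try:
        start = time.perf_counter()
        outputs = [algorithm(sample) for sample in log]
        elapsed = time.perf_counter() - start
    finally:
        if previous is not None:
            os.sched_setaffinity(0, previous)  # restore the original CPU set
    return outputs, elapsed

def compare(log, algorithm):
    """Replay the same log on one core and on all allowed cores."""
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
        one_core = {min(allowed)}
    else:
        allowed = one_core = None
    single, t_single = replay(log, algorithm, cpus=one_core)
    multi, t_multi = replay(log, algorithm, cpus=allowed)
    return {"outputs": single, "same_outputs": single == multi,
            "single_s": t_single, "multi_s": t_multi}

def nearest_obstacle(ranges):
    """Hypothetical stand-in for a planner step: the closest laser return."""
    return min(ranges)

# Made-up laser range scans in metres; a real run would replay logged data.
LOG = [[4.0, 2.5, 3.1], [3.8, 2.2, 3.0], [3.5, 1.9, 2.8]]

result = compare(LOG, nearest_obstacle)
print(result["outputs"], result["same_outputs"])
```

Because the stand-in algorithm is single-threaded, the two timings will be close; the comparison becomes informative once the algorithm under test starts its own threads or processes, which inherit the CPU restriction.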
Approach: Hardware Choices
• Multi-core processors: integrated processing power, shared + distributed memory architectures
  • Commodity processors widely available on the market
  • Performance impact not well understood
• Commercial products:
  • Intel Core 2 Duo, Quad series
  • AMD Athlon X2, Phenom series
• Basic idea: multiple processors on a single die
• Major differences in memory subsystems
  • Intel: symmetric dual-core processors, 2 shared 2 MB L2 caches
  • AMD: 4 individual cores, 4 individual 512 KB L2 caches, 1 shared 2 MB on-die L3 cache, on-die DDR controller
Approach: Metrics and Measuring
• Goal: Understand the performance impacts of multicore processing through profiling and measurement
• Relevant metrics:
  • Wall-clock execution time
  • Memory hierarchy profiles (L2 cache misses, page faults)
  • Fine-grained timing (execution time per function)
• Measurement approach
  • Profile-based analysis using available profiling tools
  • Standard hardware benchmark platforms
    • Single-core multi-processor platform
    • Dual-core uni-processor
    • Quad-core uni-processor
  • Execution of selected software on each benchmark platform
  • Profiling tools to capture, per process
    • Level-2 cache misses
    • Number of page faults
    • Histograms of function call frequency, execution time
  • OS calls to measure wall-clock time
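Two of these metrics, wall-clock time and page faults per process, can be collected with OS calls alone. The sketch below is one possible way to do that on a Unix-like host using Python's standard `resource` module; the `measure` helper is hypothetical and is not one of the profiling tools named in this deck. L2 cache misses are not covered, since they need hardware counters or a cache simulator.

```python
import resource
import subprocess
import sys
import time

def measure(cmd):
    """Run `cmd` to completion; return wall-clock time and page-fault counts."""
    # RUSAGE_CHILDREN accumulates usage of child processes that have been
    # waited for, so the difference before/after isolates this one run.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_s": wall,
        "minor_faults": after.ru_minflt - before.ru_minflt,  # served from memory
        "major_faults": after.ru_majflt - before.ru_majflt,  # required disk I/O
    }

# Example: measure a trivial child process (a do-nothing Python interpreter).
stats = measure([sys.executable, "-c", "pass"])
print(stats)
```

Running the same command on each benchmark platform and tabulating the returned dictionaries gives the side-by-side comparison the measurement approach calls for.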
Measurement Tools
• Multiple profiling tools are either commercially or freely available
  • Many target only single-threaded programs
  • None implement all types of measurements simultaneously
• GNU profiling tools
  • gcov, gprof
  • Function call histogram and timing, via sampling
  • Single-thread only (but workarounds exist for multithread profiling)
  • Function coverage, branch execution frequencies
  • http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
  • http://www.gnu.org/software/binutils/manual/gprof-2.9.1/gprof.html
• Valgrind
  • Memcheck: detects erroneous memory usage, memory leaks
  • Cachegrind: detailed simulation of cache behavior (on both L1 and L2 caches)
  • Callgrind: call-graph analysis
  • Freely available, Linux based
  • Automatic (heavy) instrumentation of code
  • http://valgrind.org
Measurement Tools
• VTune
  • Intel-developed commercial performance measurement tool
  • Support for threads and Intel multicore processors
  • http://www.intel.com/cd/software/products/asmo-na/eng/239144.htm
• Tau
  • Multiprocess profiling tool
  • Requires manual instrumentation of software
  • Freely available
  • http://www.cs.uoregon.edu/research/tau/
Example profile: gcov + gprof for H.264 video encoder
• H.264 encoder, UMHex motion estimation algorithm

    int UMHEXIntegerPelBlockMotionSearch (…)
    …
    iXMinNow = best_x;
    iYMinNow = best_y;
    for (m = 0; m < 4; m++)
    {
      cand_x = iXMinNow + Diamond_x[m];
      cand_y = iYMinNow + Diamond_y[m];
      SEARCH_ONE_PIXEL
    }

    function UMHEXIntegerPelBlockMotionSearch called 291600 returned 100% blocks executed 95%
    …
     201488:  352:/*EOF*/
     201488:  353:/*EOF*/
    1007440:  354:/*EOF*/
     201488:  354-block  0
     805952:  354-block  1
     201488:  354-block  2
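The counts in this profile can be cross-checked with a little arithmetic. The reading below is an assumption: it takes lines 352-353 to be the two assignments before the loop and line 354 to be the `for` statement, which the listing cannot confirm because the source text shows as `/*EOF*/`. Under that reading the numbers are self-consistent.

```python
calls = 291600          # times the function was called (gcov summary line)
loop_entries = 201488   # count on lines 352-353, just before the loop
iterations = 4          # the loop runs for m = 0..3

# Each loop entry executes the body four times.
body_runs = loop_entries * iterations
# Each loop entry evaluates the condition five times: 4 passes plus 1 exit.
cond_checks = loop_entries * (iterations + 1)
# Fraction of function calls that reach this loop at all.
share_of_calls = loop_entries / calls

print(body_runs)                 # 805952, matches "354-block 1"
print(cond_checks)               # 1007440, matches the total for line 354
print(round(share_of_calls, 3))  # 0.691
```

So roughly 69% of the calls reach this four-point search, and the per-line totals agree with a fixed trip count of four.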
Plan
Division of Labor
• Sprinkle
  • Experiment setup, including software choices
  • Sample experiments, including component choices
  • Write-ups for experiments, intended as a resource for future users
• Eames
  • Profiling and measurements
  • Hardware choices, including optimization choices
  • Experiment performance and write-up, based on the “future user” write-ups by Sprinkle
  • Quantitative comparisons of single-, dual-, and multi-core experiments
Questions