Explore how eMIPS processors dynamically extend their functionality on an FPGA, enabling application speedups and real-time software debugging. Get insights into eMIPS, P2V verification, and M2V code optimization.
eMIPS Project Overview, i.e., what if your next computer was an FPGA? Alessandro Forin, Microsoft Research, Redmond. April 3, 2008
Outline • How FPGAs took over the world (architecture) • Easy pickings and novelty items (eMIPS) • Simulation, debugging, profiling, … (tools)
Field Programmable Gate Arrays are essentially two superimposed memory planes: • The routing plane: which signal goes where • The logic plane: what logic function a node computes Just like RAM, they are • slower (1/10 roughly) • power hungry Nonetheless, they took over the world.
One way is.. • UltraSparc CPU • 2.0 GHz, 4-issue, 15 stages • Memory Hierarchy • 16KB/1MB L1/L2 • 32 Byte cacheline • 1/15 cycle L1/L2 latency • 160 cycle memory latency • Reconfigurable HW • Mapped to Xilinx Virtex-4 to obtain clock frequency Courtesy: K. Compton, U. Wisc.
The FPGA advantage • No code fetches • Fine-grained, spatially parallel • Data access: static prediction, reordering, variable width
Dynamically Extensible Processors • Using an FPGA, we have realized a (MIPS) processor that extends itself at runtime, using Extensions that are safe for multi-user operating systems • Applications: • Speedup execution using an Application-Specific CPU (M2V) • Unobtrusively monitor (real-time) software (P2V) • Loadable software debugging support (eBug) • Load/Unload peripherals at runtime, minimizing chip area • Load/Unload processor cores on demand • First release now available for non-commercial use
The eMIPS “Workstation” • Motherboard: Xilinx ML401 evaluation board for the Virtex-4 FPGA • eMIPS is on the FPGA; just add keyboard, mouse and disk!
Application speedups (worst case) [chart categories: Video Games, Spec2000, Real-Time] Binaries with Hardware Acceleration • Extended instructions are inserted into the binaries. • If the HW Extension is loaded, the instruction executes and skips the basic block. • Otherwise eMIPS interprets the instruction as a NOP and executes the block. • Example: Other code… ; Op78 sp,ra,10 (new instruction) ; Lw ra,10(sp) ; Jr ra ; Addiu sp,sp,18 (the last three are the original basic block) ; Other code…
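The fallback rule on this slide can be sketched as a tiny model in C. This is only an illustration under stated assumptions, not the eMIPS decoder: the `insn` type, the `block_len` field, and `next_index` are hypothetical names, and the model tracks only which instruction the base pipeline runs next. On real hardware the loaded Extension would also perform the block's effects, including the `Jr ra` transfer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction model; names are illustrative only. */
typedef enum { INSN_NORMAL, INSN_EXTENDED } insn_kind;

typedef struct {
    insn_kind kind;
    size_t    block_len; /* INSN_EXTENDED: length of the basic block it covers */
} insn;

/* Return the index of the next instruction the base pipeline executes.
 * Extension loaded:     the extended instruction does the block's work,
 *                       so the block itself is skipped.
 * Extension not loaded: the extended instruction acts as a NOP and the
 *                       original block runs as usual. */
size_t next_index(const insn *code, size_t pc, bool extension_loaded)
{
    if (code[pc].kind == INSN_EXTENDED && extension_loaded)
        return pc + 1 + code[pc].block_len;
    return pc + 1;
}
```

With the slide's example laid out as `other, Op78, Lw, Jr, Addiu, other`, the extended instruction at index 1 covers a block of length 3, so the same binary runs correctly whether or not the Extension is present.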
Assertion Based Verification with P2V • Use the IEEE-standard hardware Property Specification Language (PSL) to verify C (real-time) programs • Implement it using a simulator, or in reconfigurable hardware • PSL-to-Verilog compiler: creates Extensions from PSL code • zero instrumentation code and zero overhead!
Roles of eMIPS, P2V and GCC • C source: int foo(void){ REQ: device->CONTROL = 1; while(1) { ACK = device->STATUS; .... } } • GCC: C → ELF image (plus debug info) → Core Datapath • P2V: PSL (plus debug info) → Bitfile → Monitor Unit (MU) • PSL property: always(REQ -> eventually(ACK==1))
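To make the property on this slide concrete, here is a minimal software sketch of a checker for "every REQ is eventually followed by ACK==1" over a recorded trace. It is not the P2V output (P2V generates a hardware Monitor Unit); the `sample` type and `check_req_eventually_ack` are hypothetical names, and treating a request still pending at the end of the trace as a failure is an assumption about how a finite trace should be judged.

```c
#include <stdbool.h>
#include <stddef.h>

/* One observation of the program: was the REQ label reached,
 * and what value did ACK hold? (Hypothetical trace format.) */
typedef struct {
    bool req;
    int  ack;
} sample;

/* Check always(REQ -> eventually(ACK == 1)) over a finite trace.
 * An ACK in the same sample as the REQ satisfies the obligation. */
bool check_req_eventually_ack(const sample *trace, size_t n)
{
    bool pending = false;            /* an unanswered REQ exists */
    for (size_t i = 0; i < n; i++) {
        if (trace[i].req)
            pending = true;
        if (trace[i].ack == 1)
            pending = false;
    }
    return !pending;                 /* assumption: pending at end = violation */
}
```

A single flag is enough because one later ACK discharges every earlier request; a property with a deadline would need a counter instead.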
Extensible Peripherals • Use the eMIPS extension slot for I/O peripherals • Safely load/unload peripherals on demand • Saves area, forward-compatible, bug fixes, … • Flexible interface solves perf. and atomicity issues • Peripheral Configuration State Machine (diagram) states: Absent, Not Configured, Suspended (Power Mgt.), Run
eBug: the extensible debugger • Safe, in-process, JTAG-style software debugger • Extensible in hw (watchpoints) • Extensible in sw (communication protocols) • Use P2V as a trigger
Debugging & Verification • Giano: Simulated Board • ModelSim: eMIPS CPU • Giano: Oracle Processor
Optimizing the ISA with M2V • Roles: Hardware Designers, Software Developers • Profiling the compiled code identifies the top one or two basic blocks • M2V: basic blocks implemented as hardware Extensions • Original binaries modified to utilize hardware acceleration • Same speed, half the area of hand-generated Verilog code
The BBTools • BBFIND finds the basic blocks in MIPS, PPC and ARM images (ELF and PE++) • A simulator (Giano) uses the BB info to generate profiling information. BBSORT+BBDUMP print it • BBMATCH applies the new instructions to the original executables • The simulator generates the new profile data • Figure: Execution Counts of Individual Basic Blocks in XQuake, on the Xbox360
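As a rough illustration of the profiling step, the sketch below orders per-block execution counts so that the hottest blocks come first, which is the kind of ranking needed to pick candidates for hardware Extensions. It does not reproduce BBSORT or its file formats; the `bb_profile` record and `bb_sort_hottest_first` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical profile record for one basic block. */
typedef struct {
    uint32_t start_addr;  /* address of the block's first instruction */
    uint32_t length;      /* number of instructions in the block */
    uint64_t exec_count;  /* how many times the block was executed */
} bb_profile;

/* qsort comparator: larger execution counts sort first. */
static int by_count_desc(const void *a, const void *b)
{
    const bb_profile *x = a;
    const bb_profile *y = b;
    return (x->exec_count < y->exec_count) - (x->exec_count > y->exec_count);
}

/* Sort the profile in place, hottest block first. */
void bb_sort_hottest_first(bb_profile *blocks, size_t n)
{
    qsort(blocks, n, sizeof blocks[0], by_count_desc);
}
```

After sorting, `blocks[0]` and `blocks[1]` are the most frequently executed blocks. A real selection would probably also weigh block length and how well the block maps to hardware.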
Real-Time Simulation: Giano • Definition of “Real-Time Simulation”: Realize a software system that matches the temporal behavior of the hardware+software system being simulated, using the same time-ordered sequence of inputs • Applicable to hybrid hw+sw simulators too • Requires: • Clock adaptation • I/O adaptation
Giano architecture (diagram labels) • ARM: At91m63200 • CPU, MEM, I/O • C-models • FPGA: Xilinx Spartan3 • V-models in ModelSim • Optional: Icarus Verilog Interpreter (vvp.dll) • PLI plug-in (VPI), NamedPipe client of the NamedPipe “GIANO” • Optional: external devices (LabView) • Verilog: module test; always @(posedge clock) counter = counter + 1; • C: Start = TheCounter->Value; ...compute... End = TheCounter->Value;
User Interface: Visio Graphs Atmel EB63 Evaluation Board
Clock Adaptation Problem: Output a character every second • Timer too slow/fast? Incorrect • Host load changes? Erratic Solution: Rate-limit the clock using introspection and adaptation
Rate-limiting the Clock • Every M (10**3) clock ticks spin idle for D microseconds • Every N (10**6) clock ticks check the actual frequency against the target frequency, adjust the delay D
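The two-level scheme above can be sketched in C as follows. This is a minimal model, not Giano's code: the `rate_limiter` type and function names are hypothetical, the caller supplies the current host time and performs the actual idle spin, and the correction rule (spread the window's timing error evenly over the N/M spins of the next window) is an assumed proportional adjustment.

```c
#include <stdint.h>

/* Hypothetical rate-limiter state; all names are illustrative. */
typedef struct {
    uint64_t target_hz;        /* desired simulated clock frequency */
    uint64_t m_ticks;          /* spin every M ticks (10**3 on the slide) */
    uint64_t n_ticks;          /* check frequency every N ticks (10**6) */
    int64_t  delay_us;         /* D: current idle delay, microseconds */
    uint64_t ticks;            /* ticks counted in the current window */
    uint64_t window_start_us;  /* host time when the window began */
} rate_limiter;

void rl_init(rate_limiter *rl, uint64_t target_hz, uint64_t now_us)
{
    rl->target_hz = target_hz;
    rl->m_ticks = 1000;
    rl->n_ticks = 1000000;
    rl->delay_us = 0;
    rl->ticks = 0;
    rl->window_start_us = now_us;
}

/* Call once per simulated clock tick with the current host time.
 * Returns how many microseconds the caller should spin idle now. */
int64_t rl_tick(rate_limiter *rl, uint64_t now_us)
{
    int64_t spin = 0;

    rl->ticks++;
    if (rl->ticks % rl->m_ticks == 0)
        spin = rl->delay_us;

    if (rl->ticks == rl->n_ticks) {
        /* Time the window took, including the spin about to happen. */
        int64_t elapsed = (int64_t)(now_us - rl->window_start_us) + spin;
        /* Time the window should take at the target frequency. */
        int64_t ideal = (int64_t)(rl->n_ticks * 1000000ULL / rl->target_hz);
        int64_t spins = (int64_t)(rl->n_ticks / rl->m_ticks);

        /* Assumed rule: spread the error over the next window's spins. */
        rl->delay_us += (ideal - elapsed) / spins;
        if (rl->delay_us < 0)
            rl->delay_us = 0;  /* host too slow: cannot speed up by idling */

        rl->ticks = 0;
        rl->window_start_us = now_us + (uint64_t)spin;
    }
    return spin;
}
```

Passing the time in as a parameter keeps the adaptation logic independent of any particular host timer and easy to exercise with a simulated clock. If the host cannot reach the target frequency even with D at zero, the clamp leaves the simulation running slow rather than trying to compensate.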
I/O adaptation Problem: 9600 baud serial line • From disk trace? too fast • From user? too slow • From real serial line? depends Solution: • Link to rate-limited clock • Adapt using events and notifications
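One way to picture "link to rate-limited clock" is to pace character delivery in simulated clock ticks rather than host time, so the serial line sees the same rate whether input comes from a disk trace, a user, or a real line. The sketch below is an illustration only: `serial_pacer` and its functions are hypothetical, and 10 bits per character (8 data bits plus start and stop) is an assumption the slide does not state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pacer that releases one character per character-time. */
typedef struct {
    uint64_t ticks_per_char;  /* simulated clock ticks per character */
    uint64_t next_due_tick;   /* earliest tick for the next character */
} serial_pacer;

/* Clock ticks needed to transmit one character at the given baud rate. */
uint64_t ticks_per_char(uint64_t clock_hz, uint64_t baud, uint64_t bits_per_char)
{
    return clock_hz * bits_per_char / baud;
}

void pacer_init(serial_pacer *p, uint64_t clock_hz, uint64_t baud)
{
    p->ticks_per_char = ticks_per_char(clock_hz, baud, 10); /* assumed 8N1 */
    p->next_due_tick = 0;
}

/* True if a character may be delivered at simulated tick now_tick;
 * delivering one schedules the next slot a character-time later. */
bool pacer_may_deliver(serial_pacer *p, uint64_t now_tick)
{
    if (now_tick < p->next_due_tick)
        return false;
    p->next_due_tick = now_tick + p->ticks_per_char;
    return true;
}
```

Input that arrives early (a disk trace) waits for its slot, while input that arrives late (a user) is delivered immediately.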
Giano: Key points • Giano is the first Real-Time Simulation Framework for hardware-software co-development • Uses Microsoft Visio as the graphing and execution UI • Configurations are Platform XML files • Nodes in the graph are separate, user-defined DLLs • Lots of functionality pre-built into the base framework • 60+ working modules, 20+ systems, 4 years internal use • Release V2 available, free for academic use
Credits • Full-timers: Neil Pittman, Alessandro Forin • Interns: Nathaniel Lynch, Behnam Neekzad, Ping Hang Cheung, Bharat Sukhwani, Lu Hong, Karl Meier, Giovanni Busonera • http://research.microsoft.com/research/EmbeddedSystems