Back to the Moon

Presentation Transcript


  1. Back to the Moon: The Verification of a Small Microprocessor's Logic Design. Hugh Blair-Smith, NASA Office of Logic Design

  2. A Small Microprocessor for What? • Lunar Reconnaissance Orbiter (LRO), scheduled launch 2008, with multiple scientific instruments • One of these is a Laser Altimeter, hence the name “LOLA” (Lunar Orbiter Laser Altimeter) • Laser altimetry produces very detailed and precise geodetic maps to aid establishment of a permanent base (why not “selenodetic”??) • Each instrument has at least one embedded control microprocessor

  3. Microprocessor design criteria • Radiation hardening for high endurance • High performance to stay a step ahead of a rapid-cycling instrument • Simple well-understood architecture • Appropriate to embedded controller paradigm • No hardware multiply or divide required • Simple programs are reliable programs • Straightforward assembly-language or C programming, with no operating system!

  5. Technology & Architecture • Gate arrays fulfill the criteria and support any desired architecture • We created the “80k85”. What’s that? • Based quite closely on the old Intel 8085! • Simple instructions assure quick interrupt response • Not RISC, but uses limited real estate of gates • An instruction set of known “completeness” • Availability of established tools • Assemblers, simulators, C compiler • Exploits skill set of embedded-controller artisans • Unimplemented op codes cause a special trap

  7. Processor Design Verification • Every processor design needs verification • Even the best have stumbled on this point! • IBM System 4π AP-101 (Space Shuttle GPC) • 4πs had been used in earlier aircraft & spacecraft • AP-101 was a special variant for the Shuttle • Not quite as off-the-shelf as everyone wanted to believe • Intel Pentium FPU (P5 core), 1994 • A determined effort to speed up long floating divide • Verifying high-precision arithmetic is a challenge! • http://www.maa.org/mathland/mathland_5_12.html • 1802 Microprocessor (1986) register interaction

  8. IBM AP-101 Long Divide • Floating point arithmetic was specified to match results obtained by System 360 • Remove doubts about fidelity of GPC results to those of 360s in JSC control center • Original divide design too slow for Shuttle use • Designed, in conservative TTL circuit technology, to be interruptible after development of each quotient bit • Last-minute redesign (AP-101B) solved the problem • Much later “improved” AP-101S divide not well verified! • http://klabs.org/richcontent/software_content/hal_s/hal-s_compiler_system_specification.pdf • 3-16: “DED and DEDR instructions are broken on the AP-101S” • 6-12: “I2DEDR was substituted for DEDR in DMOD in order to avoid incorrect results caused by some inputs. See CR11164 and DR106660.”

  9. IBM AP-101 Long Divide Cont’d • DED: Double Exponential (floating) Divide • DEDR: same but gets divisor from register • Both work for most inputs, but … • “Difficult to define” which inputs don’t work(!!) • However, OK if low word of divisor = 0 (D’oh!) • DMOD (remainder “modulo” function): • Is only user of these instructions, per audit • All uses OK, per the above “however” rule • I2DEDR substituted anyway, just in case

  10. IBM AP-101 Long Divide Cont’d • More general remedies: • Modify compiler to avoid DED and DEDR • Document problem in Principles of Operation • Conclusion with Nasty Suspicion: • Did “Process” fail to operate at proper time? • Developers may have found and worked around the problem without generating a Discrepancy Report (DR) • How can all that vector/matrix code never divide?? • DR, audit, etc. may have been “after the fact”

  11. Intel Pentium Long Divide • Moore’s Law works better for component density than for processing speed • Complex special-casing, with table look-ups, for certain ranges of input values • Intel failed to proof-read a table in a PLA! • Verification of combinations of high-precision numbers cannot be exhaustive • Even if Intel could have tested one combination of input values every microsecond, the exhaustive test would take O(10^30) years (cf. age of universe = O(10^10) years)

  12. 1802 Microprocessor (1986) • Not a logic design problem • High byte of a register sometimes writes over high byte of Program Counter • Dependent on electrical design factors • Voltage and temperature toleration • Length of polysilicon lines • Presence of many ones in other registers • A program like Smalley3 could have exposed it (looping through voltage & temp ranges)

  13. The 80k85 Verification Challenge • Two words: “Rigorous” and “Thorough” • Exhaustive inputs test almost possible • 8085/80k85 word length is only 8 bits, but: • The 16-bit precision inputs to instruction DAD (Double-precision register add) would take days to execute an exhaustive test • That’s 2^32 combinations, O(10^10) • So why not “suck it up” and spend the days? • A third word: “Looping” (for margin testing)
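
The "days" figure for an exhaustive DAD test can be checked with rough arithmetic. The sketch below assumes the 4 MIPS rate quoted in the Summary slide; the cost of 200 executed instructions per test case (set up, predict, execute, verify) is a hypothetical number chosen only to illustrate the order of magnitude.

```python
# Rough time estimate for an exhaustive test of DAD (16-bit + 16-bit add).
# ASSUMPTIONS: 4 MIPS comes from the talk's summary; 200 instructions per
# test case is a hypothetical figure, not a measured one.

MIPS = 4_000_000
INSTRUCTIONS_PER_CASE = 200  # hypothetical


def exhaustive_dad_days(instr_per_case=INSTRUCTIONS_PER_CASE, rate=MIPS):
    """Return (number of input combinations, estimated run time in days)."""
    combos = 2 ** 32                      # every pair of 16-bit operands
    seconds = combos * instr_per_case / rate
    return combos, seconds / 86_400       # 86,400 seconds per day


combos, days = exhaustive_dad_days()
print(f"{combos} combinations, about {days:.1f} days per pass")
```

Whatever the real per-case cost, a single pass of this size leaves little room for the repeated "looping" runs that margin testing needs.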

  14. A Historical Parallel from Apollo • The Block I and Block II Apollo Guidance Computers (AGC) each needed one self-test program for two purposes • Enhancement of manual design verification • Assurance that all features are still working • Ed Smalley of MIT Instrumentation Lab wrote those two programs • Some feedback to design: inclusion of an instruction to perform interrupt (EDRUPT)

  16. Exploiting the Parallel Further • Like the AGC models when Ed Smalley began his two tasks, the 80k85 was not quite a “newborn” when I began mine • Both machines had a considerable track record of executing a few programs correctly • All we needed was “rigorous” and “thorough” • In Ed’s honor, I named my 80k85 self-check program Smalley3

  17. Overview of 80k85 Architecture • Addressing by byte (65,536 bytes of RAM) • Central registers and register pairs: • Accumulator A • 4 general registers B,C,D,E, sometimes as 2 pairs • 2 indirect addressing (or general) registers H,L • Program Counter PC and Stack Pointer SP (pairs) • Special: condition flags; interrupt mask • Accumulator and flags sometimes function as a register pair called Program Status Word (PSW) • 256 one-byte I/O ports: 128 input and 128 output

  18. Overview of 80k85 Instructions • First (often only) byte divided by Huffman coding into as many as 3 fields • Extreme case: MOV with 2-bit op, 3-bit destination tag, and 3-bit source tag: 56 nontrivial functionalities • Ignoring those subdivisions, 245 valid ops • All the valid 8085 ops except DAA (Decimal Adjust) • Interrupt masking feature of 8085 omitted • 70 distinct instructions functionally • Four interrupts • All but one of the 8085 interrupts are implemented • All interrupts have the same priority (unlike 8085)

  19. Phased Development Plan • Objective: capability to test some ops before complete test is ready • Generally, early releases of Smalley3 tested simpler instructions • Later ones: more complex or involving parts of 80k85 design still subject to change, especially I/O ports • However, no rule that each instruction has to be tested using only simpler ones • Couldn’t achieve that rigorously anyway • Final phase: general RAM corruption detector

  20. Functional Groups of Instructions • NOP and single-byte transfers • Double-byte transfers • Single-byte arithmetic binary operations • Double-byte arithmetic binary operations • Single-byte Boolean binary operations • Assorted unary operations • Transfers of control (except HLT) • Stack operations • Data input & output operations • Interrupt management and illegal op codes

  21. Top-Level Design of Smalley3 • Perform entire test “cycle” just once, or: • Stated number of times • Indefinitely (until failure or manually stopped) • A Test Cycle is any subset of the 10 functional groups • In each functional group, test any subset of its distinct ops • For each distinct op, test any subset of its “parametric variations” (defined by all 8 bits) • For each variation, test against 16 systematic data value sets and from 1 to 239 pseudo-random data sets • Run RAM corruption check at any of the above levels • … or just once, at end of run whether good or bad • Any failure stops the test and supports manual analysis • Random data is not the same for successive test cycles

  22. Top-Level Design of Smalley3 (cont’d) • 128 bytes of input data placed in input ports by external test equipment at (fairly) regular intervals • But not predictable from “inside” the 80k85 • Each input port can be read in a (gulp) partially updated state • The four types of interrupt are commanded in turn by external test equipment at (fairly) regular intervals • But at truly random times as seen by Smalley3

  23. Systematic or Random Data Environment for Instructions • Current machine-state data set placed in all central & special registers (except PC), by pairs • Value for Stack Pointer restricted so as not to step on Smalley3’s code or scratch registers • Machine-state data also used for an address of a pair of bytes of RAM, and for contents thereof • Address value restricted to not step on stack, or on Smalley3’s code or scratch registers • Insofar as instructions refer to 1 or 2 bytes of RAM, they use these bytes and contents • Similarly for address of an I/O port and contents

  24. Systematic Data Sets • 16 zeros to fill register pairs with all zeros • 16 ones to fill register pairs with all ones • 8 zeros and 8 ones for each register pair • 4 zeros, 4 ones, 4 zeros, 4 ones similarly • Alternating pairs of zeros and ones ditto • Alternating zero and one bits similarly • All these are mixed and matched to make 16 systematic data sets (distinct “interesting & edgy” combinations of 16 bits each) • Each systematic data set is placed in all register pairs

  25. Pseudo-Random Data Sets • The pseudo-random number generator (PRNG) is an implementation in 80k85 code of an 8-bit linear feedback shift register (LFSR) • Special-case logic added to “avoid the lockup state” so that the PRNG cycles indefinitely through all 256 states of a byte in a non-trivial sequence • The same “Content Engine” routine that deals out the systematic data sets has a mode that uses the PRNG twice to deal out 16 bits of pseudo-random data • Unlike the systematic mode, each register pair set up gets a different data set of pseudo-random data
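
One way an 8-bit LFSR can be made to visit all 256 byte values is to splice the all-zero lockup state into the normal 255-state cycle. The sketch below is an illustration in Python, not Smalley3's 80k85 code: the feedback polynomial (x^8 + x^4 + x^3 + x^2 + 1), the splice point, and the byte order in `random16` are all assumptions, since the talk does not give them.

```python
# 8-bit Galois LFSR with the all-zero lockup state spliced into the cycle.
# ASSUMPTIONS: polynomial 0x11D (feedback byte 0x1D) and the splice point
# are illustrative choices; the talk does not state Smalley3's.

FEEDBACK = 0x1D


def lfsr_step(state):
    """Plain LFSR step: shift left, fold in feedback if bit 7 fell out."""
    carry = state & 0x80
    state = (state << 1) & 0xFF
    return state ^ FEEDBACK if carry else state


def prng_next(state):
    """LFSR step plus two special cases so that 0x00 joins the cycle."""
    if state == 0x80:       # the plain LFSR would go 0x80 -> 0x1D
        return 0x00         # detour through zero instead
    if state == 0x00:       # the plain LFSR would stay at zero forever
        return 0x1D         # rejoin the normal sequence
    return lfsr_step(state)


def cycle_from(seed):
    """Return the 256 states visited from seed, and the state that follows."""
    states, s = [], seed
    for _ in range(256):
        states.append(s)
        s = prng_next(s)
    return states, s


def random16(state):
    """Two PRNG draws make one 16-bit value (high byte first, by assumption)."""
    hi = prng_next(state)
    lo = prng_next(hi)
    return (hi << 8) | lo, lo   # value, and the new PRNG state
```

Calling `cycle_from` with any seed should return every byte value exactly once before the sequence repeats.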

  26. Instruction Test Pass/Fail Criteria • What does each instruction affect? • A small subset of registers and RAM • Mustn’t have any side effects • How to test both of these simultaneously? • Initialize, predict, and verify “machine state” • All central and special registers • Two bytes of RAM selected by data set • Same as top 2 bytes of stack wherever appropriate • One byte in I/O port selected by data set • Admittedly not the complete machine state

  27. Initialize, Predict, and Verify • Scoping/initializing limited “machine state” • Use current data set as required to establish “PRE” and default (i.e., matching) “POST” state values • Predicting changes in machine state • Every parametric variation of every instruction has its own predictor routine to establish whatever different “POST” state values are required • Verifying both changes and non-changes • All “FOUND” state values seen after execution must match corresponding predicted “POST” state values
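
The PRE/POST/FOUND cycle can be sketched on a toy register model. This is a minimal illustration assuming a dict-based state with seven registers and no flags; the function names and the simulated fault are hypothetical and stand in for the real 80k85 and its predictor routines.

```python
# PRE / POST / FOUND for one instruction, on a toy register model.
# ASSUMPTIONS: the dict-based state, the register subset, and all function
# names are hypothetical; flags, RAM, and I/O ports are omitted for brevity.

REGS = ("A", "B", "C", "D", "E", "H", "L")


def predict_mov(dst, src):
    """Build the predictor for MOV dst,src: only dst should change."""
    def predictor(pre, post):
        post[dst] = pre[src]
    return predictor


def run_case(pre, predictor, execute):
    """Return {register: (POST, FOUND)} for every mismatch; empty if pass."""
    post = dict(pre)               # default POST matches PRE
    predictor(pre, post)           # overwrite only what should change
    found = execute(dict(pre))     # run the instruction under test
    # Every variable is compared, so unexpected side effects are caught too.
    return {r: (post[r], found[r]) for r in REGS if post[r] != found[r]}


def good_mov_b_c(state):
    state["B"] = state["C"]
    return state


def faulty_mov_b_c(state):         # simulated fault: also clobbers E
    state["B"] = state["C"]
    state["E"] = 0
    return state


pre = {r: 0x10 + i for i, r in enumerate(REGS)}
print(run_case(pre, predict_mov("B", "C"), good_mov_b_c))
print(run_case(pre, predict_mov("B", "C"), faulty_mov_b_c))
```

Starting POST as a copy of PRE is what makes "non-changes" testable: anything the predictor does not touch must come back unchanged.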

  28. Principles of Prediction • As far as possible, make predictions for each functionally distinct instruction without using that instruction type • Table look-ups contribute to this solution • Addition/subtraction prediction (“Blackadder”) uses 256-byte tables aligned with addresses so that entering tables is done by setting L register only, with no address addition involved • Boolean ops prediction uses loop to test each bit position in turn, with no Boolean ops involved
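
The idea of predicting a result without using the operation under test can be illustrated in Python. Both routines below are stand-ins: the talk does not describe the contents of the Blackadder tables, so the single successor table is a guess at one workable scheme, and the truth-table form of the Boolean predictor is likewise an assumption.

```python
# Predicting results without using the operation under test.
# ASSUMPTIONS: both routines are illustrative; the successor table is a
# guess, since the talk does not say what the Blackadder tables contain.

# One 256-byte table. If it sat on a 256-byte boundary, the index byte would
# be the low address byte, much like loading only the L register.
SUCC = bytes((i + 1) % 256 for i in range(256))


def predict_add(a, b):
    """Add two bytes by table walking; returns (sum mod 256, carry)."""
    carry = 0
    for _ in range(b):
        a = SUCC[a]
        if a == 0:
            carry = 1              # wrapped past 0xFF
    return a, carry


def predict_boolean(a, b, truth):
    """Combine a and b one bit at a time; truth maps (bit_a, bit_b) to a bit."""
    result, weight = 0, 1
    for _ in range(8):
        a, bit_a = divmod(a, 2)    # peel off the low bit arithmetically
        b, bit_b = divmod(b, 2)
        if truth[(bit_a, bit_b)]:
            result += weight
        weight *= 2
    return result


AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

The point of the indirection is independence: a fault in the adder or the logic unit is unlikely to corrupt the prediction in the same way it corrupts the result.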

  29. Verification and Analysis Support • Objective is to save in RAM everything needed to identify the exact fault detected • PRE, POST, and FOUND values reside in organized patterns of RAM locations • “BADS” value for each state variable is XOR of FOUND and POST values, to highlight wrong bits • Tree of BADS values provides overall Go/No-Go state in one byte; some bits are tree branches “pointing” to other BADS • Root FBADS has a bit “Regs” meaning examine RBADS, each of whose bits shows which register’s BADS to examine • Next slide is a map of analysis support locations
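
A small sketch of how per-variable BADS values might roll up into RBADS and a root FBADS byte. The talk names only the "Regs" bit; the bit assignments, the register order, and the "RAM" and "PORT" entries used here are hypothetical.

```python
# BADS values: XOR of FOUND and POST, rolled up into summary bytes.
# ASSUMPTIONS: bit assignments, register order, and the "RAM"/"PORT" entries
# are hypothetical; the talk names only the "Regs" bit of FBADS.

REG_ORDER = ("A", "B", "C", "D", "E", "H", "L", "FLAGS")  # RBADS bits 0..7
FBADS_REGS, FBADS_RAM, FBADS_PORT = 0x01, 0x02, 0x04


def bads_tree(post, found):
    """Return (fbads, rbads, bads) for matching POST and FOUND dicts."""
    bads = {name: post[name] ^ found[name] for name in post}
    rbads = 0
    for bit, reg in enumerate(REG_ORDER):
        if bads[reg]:
            rbads |= 1 << bit      # this register's BADS is non-zero
    fbads = 0
    if rbads:
        fbads |= FBADS_REGS        # "examine RBADS"
    if bads["RAM"]:
        fbads |= FBADS_RAM
    if bads["PORT"]:
        fbads |= FBADS_PORT
    return fbads, rbads, bads


post = dict.fromkeys(REG_ORDER + ("RAM", "PORT"), 0x3C)
found = dict(post, C=0x34)         # simulated fault: bit 3 of C is wrong
fbads, rbads, bads = bads_tree(post, found)
print(hex(fbads), hex(rbads), hex(bads["C"]))
```

A zero root byte means Go; a non-zero one leads the analyst down the tree to the register and then to the exact wrong bits.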

  30. [Figure: map of analysis support locations; not captured in this transcript]

  31. Special Note on “Testing” HLT • Regular testing of HLT impossible in a self-check program that runs indefinitely • But some of the operational modes do end • Random machine state, PRE and POST, is set up for these necessary HLTs • Test engineer can manually obtain actual final machine state and compare against POSTs • By varying initial random seed or other run parameters, test engineer can exercise different machine states for HLT

  32. “Rigorous & Thorough” for I/O • Instruction testing doesn’t do much for I/O • Input ports can be written only by test equipment, read only by Smalley3 • Output ports can be written and read by Smalley3, and can be read by test equipment • But test equipment doesn’t read it critically • An approach used in some projects is for test equipment to wrap output ports back to input ports for self-check code to inspect • LOLA test equipment can’t do that directly

  33. “Rigorous & Thorough” for Input • LOLA test equipment generates systematic and pseudo-random input data sets for input ports • When limited number of test cycles are run, only the pseudo-random input data sets are used • When multiple complete test cycles are run, systematic data is used, then pseudo-random • Systematic data is all-zeros and all-ones at present, but that leaves a coverage gap • 3 more data sets with mixtures of zeros and ones could detect any case of 2 bit positions in the data being wired into reversed bit positions of an input port • Whether to add these is a work in progress

  34. Rigorous Input Testing Cont’d • Checking systematic input data is simple • Smalley3 knows the correct values a priori • Checking random input data is trickier • We impose a parity rule on each byte, and require a longitudinal XOR checksum • Honeywell 800/1800 (1960’s) did this with tape, achieving SEC-DED (“Orthotronic Control®”) • Parity rule is reversed between random data sets • Smalley3 identifies the bad port (if only one port is bad), and the bad bit within it (if only one bit is bad)
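
Per-byte parity combined with a longitudinal XOR checksum locates a single flipped bit: the parity failure names the port and the checksum mismatch names the bit. The sketch below assumes odd parity in every byte, bit 7 as the parity bit, and the checksum in the last of the 128 ports; none of that layout is given in the talk.

```python
import random

# Locating a single-bit error with byte parity plus a longitudinal checksum.
# ASSUMPTIONS: odd parity, parity carried in bit 7, and the checksum in the
# last port are illustrative; the talk gives the scheme but not the layout.

N_PORTS = 128


def has_odd_parity(byte):
    return bin(byte).count("1") % 2 == 1


def make_data_set(rng):
    """Build 127 odd-parity data bytes followed by their XOR checksum."""
    data = []
    for _ in range(N_PORTS - 1):
        low7 = rng.randrange(128)
        data.append(low7 if has_odd_parity(low7) else low7 | 0x80)
    checksum = 0
    for byte in data:
        checksum ^= byte
    return data + [checksum]


def locate_single_bit_error(ports):
    """Return None if clean, else (port, bit) for a single flipped bit."""
    syndrome = 0
    for byte in ports:
        syndrome ^= byte           # zero when data and checksum agree
    bad_ports = [i for i, byte in enumerate(ports) if not has_odd_parity(byte)]
    if syndrome == 0 and not bad_ports:
        return None
    if len(bad_ports) == 1 and bin(syndrome).count("1") == 1:
        return bad_ports[0], syndrome.bit_length() - 1
    raise ValueError("more than a single-bit error")
```

Under these assumptions the XOR of 127 odd-parity bytes itself has odd parity, so the checksum byte obeys the same parity rule and can never be zero.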

  35. Rigorous Input Testing Cont’d • How does Smalley3 know when an input data set is completely resident in the input ports? • Checksum logic arbitrarily rules out checksum = 0 • Zero in the checksum port is a signal that the 128 input ports, as a class, are not in a stable state • Transition from zero to non-zero in the checksum port is a signal that the new data set is stable in all ports • (except in checksum port, but that settles long before use) • How does Smalley3 deal with possibility that a port can be read while partly updated? • A “bad” checksum is read again to see if it settles out to zero and should therefore be ignored
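
The zero-checksum convention amounts to a simple polling protocol, sketched below. `read_port` is a hypothetical callable standing in for the 80k85 IN instruction, and the checksum port number and polling limit are assumptions.

```python
# Waiting for a complete input data set, keyed on the checksum port.
# ASSUMPTIONS: read_port is a hypothetical callable; using port 127 as the
# checksum port and limiting the number of polls are illustrative choices.

CHECKSUM_PORT = 127


def wait_for_new_data_set(read_port, max_polls=1000):
    """Return True once the checksum port goes from zero to non-zero."""
    seen_zero = False
    for _ in range(max_polls):
        value = read_port(CHECKSUM_PORT)
        if value == 0:
            seen_zero = True       # ports are being rewritten
        elif seen_zero:
            return True            # zero -> non-zero: new set is resident
    return False


def checksum_failure_is_real(read_port):
    """After a mismatch, re-read: zero means an update was in progress."""
    return read_port(CHECKSUM_PORT) != 0
```

A mismatch that re-reads as zero is discarded because the test equipment had already begun rewriting the ports when the data was sampled.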

  36. “Rigorous & Thorough” for Output • Smalley3 can verify that what it reads from an output port is what it just wrote there… • But can’t tell if wiring from the output port to the outside world is correct • Another work in progress: my proposal to do the I/O wrapback backwards(!) • Smalley3 would copy its verified input data to the corresponding output ports • Test equipment can remember what it put in the input ports, and could compare that to what it later reads from the output ports • On that point, the test equipment decides pass/fail

  37. Testing of External Interrupts • Test equipment commands interrupts in a regular pattern, but their arrival looks truly random to Smalley3 • Primary objective: verify that progression of machine states commanded by Smalley3 is not corrupted by interrupt • Secondary objective: verify that each interrupt used its correct target location and saved PC correctly • In this architecture, interrupt is functionally the same as CALL: resume address is in the stack • Can’t use PRE/POST/FOUND paradigm for interrupts

  38. RAM Corruption Detector • Principle is X-Y arrays of XOR checksums • 65k is conceived as 256 columns of 256 rows • 256 column sums of rows 0-253 form row 254 • 256 row sums form row 255 • Coverage of these checksums is all of RAM! • System and Smalley3 scratch registers • Smalley3 executable code • Leftover general-purpose RAM • Identifies any single bad byte, and the bad bit if there is only one • Also identifies bad row or column in some multi-byte errors

  39. RAM Corruption Detector Design • Routine to check checksums uses only central registers, no RAM, for itself • Separate indirect-address registers H & L are crucial on this point • Time consumption restricts construction of checksums to just once (beginning of run) • All Smalley3’s scratch locations are quadruply allocated, 1 prime and 3 shadow locations • 1 shadow in same row, 1 shadow in same column, and 1 at intersection of those 2 shadows • When shadows are up to date, values don’t affect checksums
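
The X-Y checksum scheme and the four-corner shadow trick can be modeled on a 64 KB byte array. In this sketch the exact use of row 255 (one XOR per row for rows 0-254, last byte unused), the function names, and the example addresses are assumptions; the talk gives only the overall layout.

```python
# X-Y XOR checksums over a 256 x 256 byte RAM image, plus shadowed variables.
# ASSUMPTIONS: row 255 holds one XOR per row for rows 0-254 (its last byte is
# unused here); function names and example addresses are hypothetical.

ROWS = COLS = 256


def xor_all(values):
    acc = 0
    for v in values:
        acc ^= v
    return acc


def column_sums(ram):
    """XOR of rows 0-253 in each column (the values kept in row 254)."""
    return [xor_all(ram[r * COLS + c] for r in range(254)) for c in range(COLS)]


def row_sums(ram):
    """XOR across each of rows 0-254 (the values kept in row 255)."""
    return [xor_all(ram[r * COLS:(r + 1) * COLS]) for r in range(255)]


def build_checksums(ram):
    ram[254 * COLS:255 * COLS] = bytes(column_sums(ram))
    ram[255 * COLS:255 * COLS + 255] = bytes(row_sums(ram))


def find_bad_byte(ram):
    """Return None if clean, else (row, col, xor_mask) for one bad byte."""
    cols = [c for c, s in enumerate(column_sums(ram)) if s != ram[254 * COLS + c]]
    rows = [r for r, s in enumerate(row_sums(ram)) if s != ram[255 * COLS + r]]
    if not cols and not rows:
        return None
    if len(cols) == 1 and len(rows) == 1:
        r, c = rows[0], cols[0]
        mask = xor_all(ram[r * COLS:(r + 1) * COLS]) ^ ram[255 * COLS + r]
        return r, c, mask          # mask has a 1 in each corrupted bit
    raise ValueError("not a single-byte error")


def write_shadowed(ram, r1, c1, r2, c2, value):
    """Write a variable to its prime location and its three shadows."""
    for r in (r1, r2):
        for c in (c1, c2):
            ram[r * COLS + c] = value


ram = bytearray((7 * i + (i >> 8)) & 0xFF for i in range(ROWS * COLS))
write_shadowed(ram, 10, 20, 30, 40, 0x00)
build_checksums(ram)
write_shadowed(ram, 10, 20, 30, 40, 0xA5)   # checksums stay valid
```

Because each affected row and column holds the value exactly twice, the two copies cancel under XOR, which is why a variable can change after the checksums are built without invalidating them.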

  40. A True Confession • The considerable effort to include all those shadowed variables may not have been worthwhile • Since the shadowing prevents those variables from affecting the checksum, coverage is only marginally better than if checksums excluded Smalley3 variables • Still, the prologue to checksum checking does get some coverage from verifying that all 3 shadows are equal before updating them

  41. Smalley3’s Achievements • Caught a bug in implementation of one op • CMP B (compare Accumulator vs. B register) • Identified a weak point elsewhere in chip • Fan-out excesses caused low-voltage tests to affect one particular 80k85 instruction • Smalley3’s alarm induced design engineer to scan non-80k85 parts of chip for the problem • Solution to that greatly increased undervolt toleration • Motivated upgrade of CPU-memory slew rate (speed) for additional margin during high temperature operation

  42. Summary • Smalley3 occupies ~ 9.3 kbytes, of which: • 5.8 kbytes are executable code • 2.7 kbytes are tables • 1.1 kbytes are variables • Some executable code gets overlaid with variables • Smalley3 Test Cycle takes 14 sec if no RAM corruption checking (assuming 4 MIPS) • If RAM checking done at max frequency: 1.8 hours • Remember, the policy is to run many test cycles

  43. Conclusions • Rigorous & thorough testing of a small 8-bit microprocessor with no complicated instructions wasn’t that all-fired simple • See later slide on Scalability to 16, 32, or 64 bits • Testing of Smalley3 itself was sort of easy • Simulator allowed it to be developed on a PC • Smalley3 bugs looked like 80k85 faults: convenient! • But fidelity of an 8085 simulator to the 80k85 was less than complete • Also, a full-bore run overwhelms simulator capacity

  44. Conclusions Continued • Design of 80k85 and its test equipment could have reduced some complexities • Allow normal output-to-input wrapback • Eliminate possibility of reading half-baked port • Give self-check program more control over when inputs occur • But … would that be less realistic? • Greater fidelity of 80k85 instructions to 8085 where it doesn’t hurt 80k85 function

  45. Smalley3 is Valuable • Smalley3 will occupy one of four pages of EEPROM • Enables in-flight testing of processor • Flight version will include more systematic RAM testing • Rapid bit reversals may detect “fatigue” modes • Pattern-Sensitive Fault testing “by the book” • Smalley3 is going to the Moon!

  46. Scalability Considerations • How would 16-bit, 32-bit, and 64-bit processors be more of a challenge? • More, and more complex, instructions • Multiply, divide, floating point, trig, decimal, etc. • Base and index registers add much complexity • Vastly greater amounts of RAM to check • Even so, table look-up methods get impractical • OK, how might the challenge be eased? • Larger processors in hi-rel applications might have some fancier hardware Built-In Self-Test • Around 1970, we at MIT designed a SIRU (strapdown inertial reference unit) controller whose instructions calculated & compared direct and complement results

  47. A Minority Opinion (Just My Own) on One Design Point • Allowing interrupts to do everything they do in large multi-tasking systems adds complexity to both mission and test code • Embedded control processors don’t need that much flexibility • They can sample inputs periodically to obtain the same information that interrupts provide • Ideally, that would use a program-readable clock • It’s not as if any of the code is raw product from a raw and loop-vulnerable beginner • Space Shuttle GPC synchronization took out most of the flexibility even at that level • Response to interrupts restricted to programmed sync points

  48. QUESTIONS? • That’s why we reserved the room ’til 1 PM • Of course, there is the trade-off vs. lunch!
