Back to the Moon: The Verification of a Small Microprocessor's Logic Design. Hugh Blair-Smith, NASA Office of Logic Design
A Small Microprocessor for What? • Lunar Reconnaissance Orbiter (LRO), scheduled launch 2008, with multiple scientific instruments • One of these is the Lunar Orbiter Laser Altimeter, hence the name “LOLA” • Laser altimetry produces very detailed and precise geodetic maps to aid establishment of a permanent base (why not “selenodetic”??) • Each instrument has at least one embedded control microprocessor
Microprocessor design criteria • Radiation hardening for high endurance • High performance to stay a step ahead of a rapid-cycling instrument • Simple well-understood architecture • Appropriate to embedded controller paradigm • No hardware multiply or divide required • Simple programs are reliable programs • Straightforward assembly-language or C programming, with no operating system!
Technology & Architecture • Gate arrays fulfill the criteria and support any desired architecture • We created the “80k85” (what’s that?) • Based quite closely on the old Intel 8085! • Simple instructions assure quick interrupt response • Not RISC, but uses limited real estate of gates • An instruction set of known “completeness” • Availability of established tools • Assemblers, simulators, C compiler • Exploits skill set of embedded-controller artisans • Unimplemented op codes cause a special trap
Processor Design Verification • Every processor design needs verification • Even the best have stumbled on this point! • IBM System 4π AP-101 (Space Shuttle GPC) • 4πs had been used in earlier aircraft & spacecraft • AP-101 was a special variant for the Shuttle • Not quite as off-the-shelf as everyone wanted to believe • Intel Pentium FPU (P5 core), 1994 • A determined effort to speed up long floating divide • Verifying high-precision arithmetic is a challenge! • http://www.maa.org/mathland/mathland_5_12.html • 1802 Microprocessor (1986) register interaction
IBM AP-101 Long Divide • Floating point arithmetic was specified to match results obtained by System 360 • Remove doubts about fidelity of GPC results to those of 360s in JSC control center • Original divide design too slow for Shuttle use • Designed, in conservative TTL circuit technology, to be interruptible after development of each quotient bit • Last-minute redesign (AP-101B) solved the problem • Much later “improved” AP-101S divide not well verified! • http://klabs.org/richcontent/software_content/hal_s/hal-s_compiler_system_specification.pdf • 3-16: “DED and DEDR instructions are broken on the AP-101S” • 6-12: “I2DEDR was substituted for DEDR in DMOD in order to avoid incorrect results caused by some inputs. See CR11164 and DR106660.”
IBM AP-101 Long Divide Cont’d • DED: Double Exponential (floating) Divide • DEDR: same but gets divisor from register • Both work for most inputs, but … • “Difficult to define” which inputs don’t work(!!) • However, OK if low word of divisor = 0 (D’oh!) • DMOD (remainder “modulo” function): • Is only user of these instructions, per audit • All uses OK, per the above “however” rule • I2DEDR substituted anyway, just in case
IBM AP-101 Long Divide Cont’d • More general remedies: • Modify compiler to avoid DED and DEDR • Document problem in Principles of Operation • Conclusion with Nasty Suspicion: • Did “Process” fail to operate at proper time? • Developers may have found and worked around the problem without generating DR • How can all that vector/matrix code never divide?? • DR, audit, etc. may have been “after the fact”
Intel Pentium Long Divide • Moore’s Law works better for component density than for processing speed • Complex special-casing, with table look-ups, for certain ranges of input values • Intel failed to proof-read a table in a PLA! • Verification of combinations of high-precision numbers cannot be exhaustive • Even if Intel could have tested one combination of input values every microsecond, the exhaustive test would take O(10^30) years (cf. age of universe = O(10^10) years)
1802 Microprocessor (1986) • Not a logic design problem • High byte of a register sometimes writes over high byte of Program Counter • Dependent on electrical design factors • Voltage and temperature toleration • Length of polysilicon lines • Presence of many ones in other registers • A program like Smalley3 could have exposed it (looping through voltage & temp ranges)
The 80k85 Verification Challenge • Two words: “Rigorous” and “Thorough” • Exhaustive inputs test almost possible • 8085/80k85 word length is only 8 bits, but: • The 16-bit precision inputs to instruction DAD (Double-precision register add) would take days to execute an exhaustive test • That’s 2^32 combinations, about 4.3 × 10^9 • So why not “suck it up” and spend the days? • A third word: “Looping” (for margin testing)
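To put the DAD numbers in perspective, here is a back-of-envelope sketch that counts the operand pairs and converts an assumed per-case cost into wall-clock time. The 4 MIPS rate is the one quoted on the Summary slide. The figure of 400 instructions per test case (set-up, prediction, execution, comparison) is purely an illustrative assumption, not a measured value.

```python
# Back-of-envelope estimate for an exhaustive DAD test.
# ASSUMPTION: 400 instructions per test case is illustrative only.
OPERAND_BITS = 16
INSTRUCTIONS_PER_SECOND = 4_000_000   # 4 MIPS, as quoted on the Summary slide
INSTRUCTIONS_PER_CASE = 400           # hypothetical cost of one test case


def exhaustive_dad_days(instr_per_case=INSTRUCTIONS_PER_CASE):
    """Return (number of operand pairs, run time in days)."""
    cases = 2 ** (2 * OPERAND_BITS)   # every pair of 16-bit operands
    seconds = cases * instr_per_case / INSTRUCTIONS_PER_SECOND
    return cases, seconds / 86_400


cases, days = exhaustive_dad_days()
print(f"{cases:,} operand pairs, about {days:.1f} days")
```

Under these assumptions one exhaustive pass costs several days, which is tolerable once but not inside a loop that repeats across voltage and temperature ranges.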
A Historical Parallel from Apollo • The Block I and Block II Apollo Guidance Computers (AGC) each needed one self-test program for two purposes • Enhancement of manual design verification • Assurance that all features are still working • Ed Smalley of MIT Instrumentation Lab wrote those two programs • Some feedback to design: inclusion of an instruction to perform interrupt (EDRUPT)
Exploiting the Parallel Further • Like the AGC models when Ed Smalley began his two tasks, the 80k85 was not quite a “newborn” when I began mine • Both machines had a considerable track record of executing a few programs correctly • All we needed was “rigorous” and “thorough” • In Ed’s honor, I named my 80k85 self-check program Smalley3
Overview of 80k85 Architecture • Addressing by byte (65,536 bytes of RAM) • Central registers and register pairs: • Accumulator A • 4 general registers B,C,D,E, sometimes as 2 pairs • 2 indirect addressing (or general) registers H,L • Program Counter PC and Stack Pointer SP (pairs) • Special: condition flags; interrupt mask • Accumulator and flags sometimes function as a register pair called Program Status Word (PSW) • 256 one-byte I/O ports: 128 input and 128 output
Overview of 80k85 Instructions • First (often only) byte divided by Huffman coding into as many as 3 fields • Extreme case: MOV with 2-bit op, 3-bit destination tag, and 3-bit source tag, giving 56 nontrivial functionalities • Ignoring those subdivisions, 245 valid ops • All the valid 8085 ops except DAA (Decimal Adjust) • Interrupt masking feature of 8085 omitted • 70 distinct instructions functionally • Four interrupts • All but one of the 8085 interrupts are implemented • All interrupts have the same priority (unlike 8085)
Phased Development Plan • Objective: capability to test some ops before complete test is ready • Generally, early releases of Smalley3 tested simpler instructions • Later ones: more complex or involving parts of 80k85 design still subject to change, especially I/O ports • However, no rule that each instruction has to be tested using only simpler ones • Couldn’t achieve that rigorously anyway • Final phase: general RAM corruption detector
Functional Groups of Instructions • NOP and single-byte transfers • Double-byte transfers • Single-byte arithmetic binary operations • Double-byte arithmetic binary operations • Single-byte Boolean binary operations • Assorted unary operations • Transfers of control (except HLT) • Stack operations • Data input & output operations • Interrupt management and illegal op codes
Top-Level Design of Smalley3 • Perform entire test “cycle” just once, or: • Stated number of times • Indefinitely (until failure or manually stopped) • A Test Cycle is any subset of the 10 functional groups • In each functional group, test any subset of its distinct ops • For each distinct op, test any subset of its “parametric variations” (defined by all 8 bits) • For each variation, test against 16 systematic data value sets and from 1 to 239 pseudo-random data sets • Run RAM corruption check at any of the above levels • … or just once, at end of run whether good or bad • Any failure stops the test and supports manual analysis • Random data is not the same for successive test cycles
Top-Level Design of Smalley3 (cont’d) • 128 bytes of input data placed in input ports by external test equipment at (fairly) regular intervals • But not predictable from “inside” the 80k85 • Each input port can be read in a (gulp) partially updated state • The four types of interrupt are commanded in turn by external test equipment at (fairly) regular intervals • But at truly random times as seen by Smalley3
Systematic or Random Data Environment for Instructions • Current machine-state data set placed in all central & special registers (except PC), by pairs • Value for Stack Pointer restricted so as not to step on Smalley3’s code or scratch registers • Machine-state data also used for an address of a pair of bytes of RAM, and for contents thereof • Address value restricted to not step on stack, or on Smalley3’s code or scratch registers • Insofar as instructions refer to 1 or 2 bytes of RAM, they use these bytes and contents • Similarly for address of an I/O port and contents
Systematic Data Sets • 16 zeros to fill register pairs with all zeros • 16 ones to fill register pairs with all ones • 8 zeros and 8 ones for each register pair • 4 zeros, 4 ones, 4 zeros, 4 ones similarly • Alternating pairs of zeros and ones ditto • Alternating zero and one bits similarly • All these are mixed and matched to make 16 systematic data sets (distinct “interesting & edgy” combinations of 16 bits each) • Each systematic data set is placed in all register pairs
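The six base patterns listed on this slide can be generated mechanically, as in the sketch below. The helper name `block_pattern` and the zeros-first bit ordering are assumptions made for illustration. The slide does not say how the base patterns are mixed and matched into the 16 final data sets, so that step is deliberately left out.

```python
def block_pattern(run_length, width=16):
    """Return a width-bit value of alternating runs of zeros and ones.

    Reading from the most significant bit: run_length zeros, then
    run_length ones, and so on (zeros-first ordering is an assumption).
    """
    value = 0
    for position in range(width):            # position 0 is the MSB
        if (position // run_length) % 2 == 1:
            value |= 1 << (width - 1 - position)
    return value


# All zeros, all ones, then runs of 8, 4, 2 and 1 bits.
BASE_PATTERNS = [0x0000, 0xFFFF] + [block_pattern(n) for n in (8, 4, 2, 1)]

print([f"{p:04X}" for p in BASE_PATTERNS])
```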
Pseudo-Random Data Sets • The pseudo-random number generator (PRNG) is an implementation in 80k85 code of an 8-bit linear feedback shift register (LFSR) • Special-case logic added to “avoid the lockup state” so that the PRNG cycles indefinitely through all 256 states of a byte in a non-trivial sequence • The same “Content Engine” routine that deals out the systematic data sets has a mode that uses the PRNG twice to deal out 16 bits of pseudo-random data • Unlike the systematic mode, each register pair set up gets a different data set of pseudo-random data
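A minimal sketch of such a generator follows. The talk does not give the feedback taps or the exact special-case logic, so both are assumptions here: the mask 0xB8 is one well-known maximal-length choice for 8 bits, and the zero state is spliced into the cycle just before the value that would otherwise follow 0x01.

```python
TAPS = 0xB8  # Galois mask for x^8 + x^6 + x^5 + x^4 + 1 (assumed; talk gives no taps)


def next_byte(state):
    """Advance one step; the sequence visits all 256 byte values."""
    if state == 0x00:      # escape the lock-up state
        return TAPS
    if state == 0x01:      # detour through zero before TAPS would appear
        return 0x00
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def next_word(state):
    """Use the generator twice to deal out 16 bits; returns (word, new_state)."""
    high = next_byte(state)
    low = next_byte(high)
    return (high << 8) | low, low


# Walk one full cycle to confirm that every byte value appears exactly once.
state, seen = 0x5A, []
for _ in range(256):
    seen.append(state)
    state = next_byte(state)
print(len(set(seen)), state == 0x5A)
```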
Instruction Test Pass/Fail Criteria • What does each instruction affect? • A small subset of registers and RAM • Mustn’t have any side effects • How to test both of these simultaneously? • Initialize, predict, and verify “machine state” • All central and special registers • Two bytes of RAM selected by data set • Same as top 2 bytes of stack wherever appropriate • One byte in I/O port selected by data set • Admittedly not the complete machine state
Initialize, Predict, and Verify • Scoping/initializing limited “machine state” • Use current data set as required to establish “PRE” and default (i.e., matching) “POST” state values • Predicting changes in machine state • Every parametric variation of every instruction has its own predictor routine to establish whatever different “POST” state values are required • Verifying both changes and non-changes • All “FOUND” state values seen after execution must match corresponding predicted “POST” state values
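The PRE/POST/FOUND scheme can be illustrated with a toy model. Everything in this sketch is hypothetical: the machine state is a plain dictionary of seven registers, condition flags are ignored, and the two "instructions" are Python functions standing in for a correct and a faulty increment of B. The point is only that defaulting POST to PRE lets one comparison catch both wrong results and unwanted side effects.

```python
REGS = ("A", "B", "C", "D", "E", "H", "L")   # toy machine state, flags omitted


def run_case(execute, predict, data_set):
    """Return {register: (POST, FOUND)} for every mismatch; empty means pass."""
    pre = dict(data_set)        # PRE: state before execution
    post = dict(pre)            # default POST: nothing changes
    predict(pre, post)          # predictor overrides only what should change
    found = dict(pre)
    execute(found)              # FOUND: state after execution
    return {r: (post[r], found[r]) for r in REGS if post[r] != found[r]}


def predict_inr_b(pre, post):
    post["B"] = (pre["B"] + 1) % 256


def good_inr_b(state):
    state["B"] = (state["B"] + 1) & 0xFF


def faulty_inr_b(state):
    state["B"] = (state["B"] + 1) & 0xFF
    state["C"] = 0x00           # hypothetical side effect on register C


data_set = {r: 0x55 for r in REGS}
print(run_case(good_inr_b, predict_inr_b, data_set))
print(run_case(faulty_inr_b, predict_inr_b, data_set))
```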
Principles of Prediction • As far as possible, make predictions for each functionally distinct instruction without using that instruction type • Table look-ups contribute to this solution • Addition/subtraction prediction (“Blackadder”) uses 256-byte tables aligned with addresses so that entering tables is done by setting L register only, with no address addition involved • Boolean ops prediction uses loop to test each bit position in turn, with no Boolean ops involved
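The bit-by-bit idea behind the Boolean predictor can be sketched as follows. This is not the 80k85 routine, whose details the talk does not give. It is a Python illustration that restricts itself to remainder, halving, addition, and comparison, so that no AND, OR, or XOR operator takes part in predicting the result of one.

```python
def predict_bitwise(op, a, b):
    """Predict an 8-bit AND, OR or XOR without using any bitwise operator."""
    result = 0
    weight = 1
    for _ in range(8):
        ones = a % 2 + b % 2        # how many of the two low bits are set: 0, 1 or 2
        if op == "AND":
            bit = 1 if ones == 2 else 0
        elif op == "OR":
            bit = 1 if ones >= 1 else 0
        elif op == "XOR":
            bit = 1 if ones == 1 else 0
        else:
            raise ValueError(op)
        result += bit * weight
        weight += weight            # next bit position
        a //= 2
        b //= 2
    return result


print(hex(predict_bitwise("AND", 0xF0, 0x3C)),
      hex(predict_bitwise("OR", 0xF0, 0x3C)),
      hex(predict_bitwise("XOR", 0xF0, 0x3C)))
```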
Verification and Analysis Support • Objective is to save in RAM everything needed to identify the exact fault detected • PRE, POST, and FOUND values reside in organized patterns of RAM locations • “BADS” value for each state variable is XOR of FOUND and POST values, to highlight wrong bits • Tree of BADS values provides overall Go/No-Go state in one byte; some bits are tree branches “pointing” to other BADS • Root FBADS has a bit “Regs” meaning examine RBADS, each of whose bits shows which register’s BADS to examine • Next slide is a map of analysis support locations
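A small sketch of how such a tree of BADS values might be assembled is shown below. The bit assignments (`FBADS_REGS`, `FBADS_RAM`, and the order of registers within RBADS) are invented for illustration, since the actual layout is on a map slide that is not reproduced in this text.

```python
REG_ORDER = ("A", "B", "C", "D", "E", "H", "L", "FLAGS")  # bit i of RBADS (assumed)
FBADS_REGS = 0x01   # hypothetical root bit: "examine RBADS"
FBADS_RAM = 0x02    # hypothetical root bit: "examine the RAM BADS"


def build_bads(post, found):
    """Return (FBADS, RBADS, per-variable BADS) for one test case."""
    bads = {name: post[name] ^ found[name] for name in post}  # wrong bits only
    rbads = 0
    for i, name in enumerate(REG_ORDER):
        if bads[name]:
            rbads |= 1 << i
    fbads = 0
    if rbads:
        fbads |= FBADS_REGS
    if bads["RAM"]:
        fbads |= FBADS_RAM
    return fbads, rbads, bads


post = {name: 0x00 for name in REG_ORDER}
post["RAM"] = 0x00
found = dict(post)
found["E"] = 0x81           # two wrong bits in register E

fbads, rbads, bads = build_bads(post, found)
print(f"FBADS={fbads:02X} RBADS={rbads:02X} E_BADS={bads['E']:02X}")
```

A zero root byte means Go. A non-zero root leads the analyst down the tree: the Regs bit points to RBADS, whose set bit points to the register whose BADS value shows the wrong bit positions.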
Special Note on “Testing” HLT • Regular testing of HLT impossible in a self-check program that runs indefinitely • But some of the operational modes do end • Random machine state, PRE and POST, is set up for these necessary HLTs • Test engineer can manually obtain actual final machine state and compare against POSTs • By varying initial random seed or other run parameters, test engineer can exercise different machine states for HLT
“Rigorous & Thorough” for I/O • Instruction testing doesn’t do much for I/O • Input ports can be written only by test equipment, read only by Smalley3 • Output ports can be written and read by Smalley3, and can be read by test equipment • But test equipment doesn’t read it critically • An approach used in some projects is for test equipment to wrap output ports back to input ports for self-check code to inspect • LOLA test equipment can’t do that directly
“Rigorous & Thorough” for Input • LOLA test equipment generates systematic and pseudo-random input data sets for input ports • When limited number of test cycles are run, only the pseudo-random input data sets are used • When multiple complete test cycles are run, systematic data is used, then pseudo-random • Systematic data is all-zeros and all-ones at present, but that leaves a coverage gap • 3 more data sets with mixtures of zeros and ones could detect any case of 2 bit positions in the data being wired into reversed bit positions of an input port • Whether to add these is a work in progress
Rigorous Input Testing Cont’d • Checking systematic input data is simple • Smalley3 knows the correct values a priori • Checking random input data is trickier • We impose a parity rule on each byte, and require a longitudinal XOR checksum • Honeywell 800/1800 (1960’s) did this with tape, achieving SEC-DED (“Orthotronic Control®”) • Parity rule is reversed between random data sets • Smalley3 identifies the bad port (if only one port is bad), and the bad bit within it (if only one bit is bad)
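The combination of per-byte parity and a longitudinal XOR checksum locates a single-bit error in two dimensions: parity names the port, the checksum names the bit. The sketch below shows the principle on a five-byte data set. The layout (7 payload bits with bit 7 as the parity bit) and the function names are assumptions, and the rule that the checksum must not be zero is ignored here.

```python
def parity(byte):
    """Return 1 if the byte has an odd number of one bits, else 0."""
    return bin(byte).count("1") % 2


def make_data_set(payloads, odd):
    """Build parity-conforming bytes plus their longitudinal XOR checksum."""
    want = 1 if odd else 0
    data = []
    for p in payloads:
        b = p & 0x7F
        if parity(b) != want:
            b |= 0x80               # bit 7 used as the parity bit (assumed layout)
        data.append(b)
    checksum = 0
    for b in data:
        checksum ^= b
    return data, checksum


def diagnose(data, checksum, odd):
    """Return None if clean, else (bad_port, bad_bit); None marks 'not unique'."""
    want = 1 if odd else 0
    bad_ports = [i for i, b in enumerate(data) if parity(b) != want]
    syndrome = checksum
    for b in data:
        syndrome ^= b               # non-zero bits mark the damaged columns
    if not bad_ports and syndrome == 0:
        return None
    port = bad_ports[0] if len(bad_ports) == 1 else None
    bit = syndrome.bit_length() - 1 if bin(syndrome).count("1") == 1 else None
    return port, bit


data, checksum = make_data_set([0x01, 0x02, 0x03, 0x7F, 0x40], odd=True)
clean = diagnose(data, checksum, odd=True)
data[3] ^= 0x04                     # flip bit 2 of port 3
print(clean, diagnose(data, checksum, odd=True))
```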
Rigorous Input Testing Cont’d • How does Smalley3 know when an input data set is completely resident in the input ports? • Checksum logic arbitrarily rules out checksum = 0 • Zero in the checksum port is a signal that the 128 input ports, as a class, are not in a stable state • Transition from zero to non-zero in the checksum port is a signal that the new data set is stable in all ports • (except in checksum port, but that settles long before use) • How does Smalley3 deal with possibility that a port can be read while partly updated? • A “bad” checksum is read again to see if it settles out to zero and should therefore be ignored
“Rigorous & Thorough” for Output • Smalley3 can verify that what it reads from an output port is what it just wrote there… • But can’t tell if wiring from the output port to the outside world is correct • Another work in progress: my proposal to do the I/O wrapback backwards(!) • Smalley3 would copy its verified input data to the corresponding output ports • Test equipment can remember what it put in the input ports, and could compare that to what it later reads from the output ports • On that point, the test equipment decides pass/fail
Testing of External Interrupts • Test equipment commands interrupts in a regular pattern, but their arrival looks truly random to Smalley3 • Primary objective: verify that progression of machine states commanded by Smalley3 is not corrupted by interrupt • Secondary objective: verify that each interrupt used its correct target location and saved PC correctly • In this architecture, interrupt is functionally the same as CALL: resume address is in the stack • Can’t use PRE/POST/FOUND paradigm for interrupts
RAM Corruption Detector • Principle is X-Y arrays of XOR checksums • 65k is conceived as 256 columns of 256 rows • 256 column sums of rows 0-253 form row 254 • 256 row sums form row 255 • Coverage of these checksums is all of RAM! • System and Smalley3 scratch registers • Smalley3 executable code • Leftover general-purpose RAM • Identifies any single bad byte, and the bad bit if there is only one • Also identifies bad row or column in some multi-byte errors
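The X-Y checksum principle can be sketched on a 65,536-byte array as below. Two points are assumptions rather than facts from the talk: addresses are taken to be row × 256 + column, and only the row sums for rows 0 to 253 are stored in row 255, with its last two entries left at zero, because the slide does not say how the checksum rows themselves are summed.

```python
SIZE = 256                      # 256 columns x 256 rows = 65,536 bytes
DATA_ROWS = SIZE - 2            # rows 0..253 are summed
COL_SUM_ROW, ROW_SUM_ROW = 254, 255


def xor_all(values):
    acc = 0
    for v in values:
        acc ^= v
    return acc


def column_sums(ram):
    return [xor_all(ram[r * SIZE + c] for r in range(DATA_ROWS)) for c in range(SIZE)]


def row_sums(ram):
    return [xor_all(ram[r * SIZE:(r + 1) * SIZE]) for r in range(DATA_ROWS)]


def store_checksums(ram):
    """Build the checksums once and store them in rows 254 and 255."""
    base = COL_SUM_ROW * SIZE
    ram[base:base + SIZE] = bytes(column_sums(ram))
    base = ROW_SUM_ROW * SIZE
    ram[base:base + DATA_ROWS] = bytes(row_sums(ram))


def locate_fault(ram):
    """Return None, (row, col, wrong_bits), or ('unresolved', rows, cols)."""
    bad_cols = {c: s ^ ram[COL_SUM_ROW * SIZE + c]
                for c, s in enumerate(column_sums(ram))
                if s != ram[COL_SUM_ROW * SIZE + c]}
    bad_rows = {r: s ^ ram[ROW_SUM_ROW * SIZE + r]
                for r, s in enumerate(row_sums(ram))
                if s != ram[ROW_SUM_ROW * SIZE + r]}
    if not bad_cols and not bad_rows:
        return None
    if len(bad_cols) == 1 and len(bad_rows) == 1:
        (r, row_diff), = bad_rows.items()
        (c, col_diff), = bad_cols.items()
        if row_diff == col_diff:
            return r, c, row_diff
    return "unresolved", sorted(bad_rows), sorted(bad_cols)


ram = bytearray((i * 7 + 3) % 256 for i in range(SIZE * DATA_ROWS)) + bytearray(2 * SIZE)
store_checksums(ram)
before = locate_fault(ram)
ram[100 * SIZE + 17] ^= 0x20    # corrupt one bit of one byte
print(before, locate_fault(ram))
```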
RAM Corruption Detector Design • Routine to check checksums uses only central registers, no RAM, for itself • Separate indirect-address registers H & L are crucial on this point • Time consumption restricts construction of checksums to just once (beginning of run) • All Smalley3’s scratch locations are quadruply allocated, 1 prime and 3 shadow locations • 1 shadow in same row, 1 shadow in same column, and 1 at intersection of those 2 shadows • When shadows are up to date, values don’t affect checksums
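Why the quadruple allocation keeps the checksums fixed can be seen on a small grid. In the sketch below (an 8 × 8 grid and the helper names are illustrative assumptions), the prime and its three shadows sit at the corners of a rectangle, so every affected row and column contains the value exactly twice and the XOR contributions cancel.

```python
SIZE = 8    # small grid for illustration; the RAM in the talk is 256 x 256


def checksums(grid):
    """Return (row XOR sums, column XOR sums) for the whole grid."""
    rows, cols = [0] * SIZE, [0] * SIZE
    for r in range(SIZE):
        for c in range(SIZE):
            rows[r] ^= grid[r][c]
            cols[c] ^= grid[r][c]
    return rows, cols


def write_shadowed(grid, prime, shadow_row, shadow_col, value):
    """Write value to the prime cell and its three shadows (rectangle corners)."""
    r, c = prime
    for rr in (r, shadow_row):
        for cc in (c, shadow_col):
            grid[rr][cc] = value


def shadows_agree(grid, prime, shadow_row, shadow_col):
    r, c = prime
    cells = {grid[rr][cc] for rr in (r, shadow_row) for cc in (c, shadow_col)}
    return len(cells) == 1


grid = [[(3 * r + 5 * c) % 256 for c in range(SIZE)] for r in range(SIZE)]
write_shadowed(grid, (1, 2), 5, 6, 0x00)     # bring all four copies into step
baseline = checksums(grid)                   # checksums built once, at start of run

write_shadowed(grid, (1, 2), 5, 6, 0xA7)     # ordinary variable update
unchanged = checksums(grid) == baseline

grid[1][2] ^= 0x01                           # corrupt the prime copy only
print(unchanged, shadows_agree(grid, (1, 2), 5, 6), checksums(grid) == baseline)
```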
A True Confession • The considerable effort to include all those shadowed variables may not have been worthwhile • Since the shadowing prevents those variables from affecting the checksum, coverage is only marginally better than if checksums excluded Smalley3 variables • Still, the prologue to checksum checking does get some coverage from verifying that all 3 shadows are equal before updating them
Smalley3’s Achievements • Caught a bug in implementation of one op • CMP B (compare Accumulator vs. B register) • Identified a weak point elsewhere in chip • Fan-out excesses caused low-voltage tests to affect one particular 80k85 instruction • Smalley3’s alarm induced design engineer to scan non-80k85 parts of chip for the problem • Solution to that greatly increased undervolt toleration • Motivated upgrade of CPU-memory slew rate (speed) for additional margin during high temperature operation
Summary • Smalley3 occupies ~ 9.3 kbytes, of which: • 5.8 kbytes are executable code • 2.7 kbytes are tables • 1.1 kbytes are variables • Some executable code gets overlaid with variables • Smalley3 Test Cycle takes 14 sec if no RAM corruption checking (assuming 4 MIPS) • If RAM checking done at max frequency: 1.8 hours • Remember, the policy is to run many test cycles
Conclusions • Rigorous & thorough testing of a small 8-bit microprocessor with no complicated instructions wasn’t that all-fired simple • See later slide on Scalability to 16, 32, or 64 bits • Testing of Smalley3 itself was sort of easy • Simulator allowed it to be developed on a PC • Smalley3 bugs looked like 80k85 faults: convenient! • But fidelity of an 8085 simulator to the 80k85 was less than complete • Also, a full-bore run overwhelms simulator capacity
Conclusions Continued • Design of 80k85 and its test equipment could have reduced some complexities • Allow normal output-to-input wrapback • Eliminate possibility of reading half-baked port • Give self-check program more control over when inputs occur • But … would that be less realistic? • Greater fidelity of 80k85 instructions to 8085 where it doesn’t hurt 80k85 function
Smalley3 is Valuable • Smalley3 will occupy one of four pages of EEPROM • Enables in-flight testing of processor • Flight version will include more systematic RAM testing • Rapid bit reversals may detect “fatigue” modes • Pattern-Sensitive Fault testing “by the book” • Smalley3 is going to the Moon!
Scalability Considerations • How would 16-bit, 32-bit, and 64-bit processors be more of a challenge? • More, and more complex, instructions • Multiply, divide, floating point, trig, decimal, etc. • Base and index registers add much complexity • Vastly greater amounts of RAM to check • Even so, table look-up methods get impractical • OK, how might the challenge be eased? • Larger processors in hi-rel applications might have some fancier hardware Built-In Self-Test • Around 1970, we at MIT designed a SIRU (strapdown inertial reference unit) controller whose instructions calculated & compared direct and complement results
A Minority Opinion (Just My Own) on One Design Point • Allowing interrupts to do everything they do in large multi-tasking systems adds complexity to both mission and test code • Embedded control processors don’t need that much flexibility • They can sample inputs periodically to obtain the same information that interrupts provide • Ideally, that would use a program-readable clock • It’s not as if any of the code is raw product from a raw and loop-vulnerable beginner • Space Shuttle GPC synchronization took out most of the flexibility even at that level • Response to interrupts restricted to programmed sync points
QUESTIONS? • That’s why we reserved the room ’til 1 PM • Of course, there is the trade-off vs. lunch!