Reliability
Threads for Fault Tolerance
• Multiprocessors:
  • Transient fault detection
Transient Faults
• Faults that persist for a “short” duration
• Cause: cosmic rays, energetic particles originating from outer space
• Effect: knock off electrons, discharge capacitor
• Solution
  • no practical absorbent for cosmic rays
  • 1 fault per 1000 computers per year (estimated fault rate)
• Future is worse
  • smaller feature size, higher transistor count, reduced noise margin
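To put the estimated rate in perspective, here is a minimal arithmetic sketch. It takes the "1 fault per 1000 computers per year" figure from the slide at face value; the Poisson model of independent fault arrivals and the function names are assumptions made for illustration only.

```python
import math

# Estimated rate quoted on the slide: 1 fault per 1000 computers per year.
FAULTS_PER_MACHINE_YEAR = 1 / 1000


def expected_faults(machines: int, years: float = 1.0) -> float:
    """Expected number of transient faults across a fleet."""
    return machines * years * FAULTS_PER_MACHINE_YEAR


def prob_at_least_one(machines: int, years: float = 1.0) -> float:
    """Chance of seeing one or more faults.

    Assumes faults are independent and arrive as a Poisson process,
    which is an illustrative assumption, not a claim from the slides.
    """
    return 1.0 - math.exp(-expected_faults(machines, years))
```

Under these assumptions a fleet of 10,000 machines expects about 10 faults per year, so at scale a "rare" per-machine event becomes a routine one.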
Background
• Fault tolerant systems use redundancy to improve reliability:
  • Time redundancy: separate executions
  • Space redundancy: separate physical copies of resources
    • DMR/TMR (dual/triple modular redundancy)
  • Data redundancy
    • ECC: Automatic repeat request (ARQ), Forward error correction (FEC)
    • Parity: odd/even
• Examples:
  • IBM: duplicated pipelines, spare processors, ECC in memories...
  • HP: DMR/TMR processors, Parity/ECC in buses, memories...
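Two of the redundancy schemes listed above are small enough to sketch in software. The following is a minimal illustration, not how any IBM or HP system implements them; all function names are hypothetical.

```python
from collections import Counter


def even_parity_bit(word: int) -> int:
    """Return the bit that makes the total number of 1s even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, parity: int) -> bool:
    """Detects (but cannot locate) an odd number of flipped bits."""
    return even_parity_bit(word) == parity


def dmr_check(a: int, b: int) -> bool:
    """DMR: two copies can reveal a mismatch, not which copy is wrong."""
    return a == b


def tmr_vote(a: int, b: int, c: int) -> int:
    """TMR: majority vote over three copies masks a single faulty copy."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three copies disagree")
    return value
```

For example, `tmr_vote(7, 7, 5)` returns `7`, while `dmr_check(7, 5)` only reports that something went wrong. That difference is why DMR is a detection scheme and TMR can also correct.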
Multiprocessors: Fault Detection
• Chip-level Redundantly Threaded (CRT) processor
  • Replicates register values but not memory values
  • The leading thread commits stores only after checking
    • Memory is guaranteed to be correct
  • Other instructions commit without checking
• The leading thread sends committed values to the trailing thread for:
  • branch outcomes
  • load/store values
  • store addresses
Sphere of Replication (SoR)
• Logical boundary of redundant execution within a system
• Components inside the sphere are protected via redundant execution
• Components outside must be protected via other means
• Its size matters:
  • Error detection latency
  • Stored-state size
Example Spheres of Replication
• ORH-Dual: On-Chip Replicated Hardware (similar to IBM G5)
• Compaq Himalaya
Fault Detection in Compaq Himalaya System
• Replicated Microprocessors + Cycle-by-Cycle Lockstepping
Fault Detection via Simultaneous Multithreading (SMT)
• Replicated Microprocessors + Cycle-by-Cycle Lockstepping
Concept
• SMT improves the performance of a processor by:
  • allowing independent threads to execute simultaneously
  • doing so in different functional units
• Redundant Multithreading (RMT):
  • leverages SMT’s properties to allow fault detection for microprocessors
  • runs two copies of the same program as independent threads
  • compares their outputs and initiates recovery in case of mismatch
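The RMT idea of running two copies and comparing their outputs can be mimicked in software. The sketch below is only an analogy: real RMT compares values in hardware between two SMT thread contexts, and `run_redundant`, `MismatchDetected`, and the `corrupt_trailing` fault-injection hook are hypothetical names introduced here.

```python
from typing import Callable, List, Optional, Sequence


class MismatchDetected(Exception):
    """Raised when the two redundant copies disagree on an output."""


def run_redundant(
    program: Callable[[Sequence[int]], List[int]],
    inputs: Sequence[int],
    corrupt_trailing: Optional[Callable[[List[int]], List[int]]] = None,
) -> List[int]:
    """Run `program` twice on the same inputs and compare every output."""
    leading = program(inputs)
    trailing = program(inputs)
    if corrupt_trailing is not None:
        # Simulated transient fault in the trailing copy (illustration only).
        trailing = corrupt_trailing(trailing)
    if len(leading) != len(trailing):
        raise MismatchDetected("copies produced different numbers of outputs")
    for i, (a, b) in enumerate(zip(leading, trailing)):
        if a != b:
            raise MismatchDetected(f"output {i}: {a} != {b}")
    return leading  # outputs agree, so they may leave the sphere of replication


def prefix_sums(xs: Sequence[int]) -> List[int]:
    """Toy workload: running totals of the input."""
    total, out = 0, []
    for x in xs:
        total += x
        out.append(total)
    return out
```

In the fault-free case the call simply returns the agreed outputs. When the hook flips a bit in one copy, the mismatch is raised before anything is released, which is the point where a real design would initiate recovery.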
Input Replication
• Load Value Queue (LVQ)
  • Keeps threads on the same path despite I/O or multiprocessor (MP) writes
  • Out-of-order load issue possible
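A rough software model of the LVQ shows why it keeps the two threads on the same path. This sketch assumes a simple in-order FIFO, so it does not model the out-of-order load issue mentioned above; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class LoadValueQueue:
    """Toy LVQ: only the leading thread reads memory; the trailing thread
    replays the same (address, value) pairs from the queue."""

    def __init__(self) -> None:
        self._q: Deque[Tuple[int, int]] = deque()

    def leading_load(self, memory: Dict[int, int], addr: int) -> int:
        value = memory[addr]
        self._q.append((addr, value))  # remember what the leader saw
        return value

    def trailing_load(self, addr: int) -> int:
        lead_addr, value = self._q.popleft()
        if lead_addr != addr:
            # The threads computed different addresses: treat as a fault.
            raise RuntimeError("load address mismatch between threads")
        return value
```

If another processor or an I/O device overwrites the location between the two loads, the trailing thread still receives the value the leading thread saw, so a legitimate external write is not mistaken for a fault.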
Output Comparison
• Compare & validate output before sending it outside the SoR
Store Queue Comparator (STQ)
• Compares leading- and trailing-thread stores before they are written to the data cache
• Catches faults before they propagate to the rest of the system
Store Queue Comparator (cont’d)
• Extends residence time of leading-thread stores
  • Size constrained by cycle time goal
  • Base CPU statically partitions single queue among threads
  • Potential solution: per-thread store queues
• Deadlock if matching trailing store cannot commit
  • Several small but crucial changes to avoid this
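The store-comparison step can be sketched as a queue of unverified leading-thread stores that reach the cache only after the trailing thread produces a matching store. This is a simplified model under assumed names (`StoreComparator`, `leading_store`, `trailing_store`). It assumes stores from both threads arrive in program order and does not model queue capacity or the deadlock case.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class StoreComparator:
    """Toy model: leading stores wait in the queue until verified."""

    def __init__(self, cache: Dict[int, int]) -> None:
        self.cache = cache
        self._pending: Deque[Tuple[int, int]] = deque()

    def leading_store(self, addr: int, data: int) -> None:
        # Held back: this is the extended residence time of leading stores.
        self._pending.append((addr, data))

    def trailing_store(self, addr: int, data: int) -> None:
        lead_addr, lead_data = self._pending[0]
        if (lead_addr, lead_data) != (addr, data):
            # Mismatch: nothing is written, so the fault stays contained.
            raise RuntimeError("store mismatch between threads")
        self._pending.popleft()
        self.cache[addr] = data  # verified, safe to leave the sphere

    def pending(self) -> int:
        return len(self._pending)
```

Because every leading store occupies a queue entry until its partner arrives, the queue fills faster than in an ordinary SMT core, which is the pressure the size and partitioning bullets above refer to.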
Branch Outcome Queue (BOQ)
• Forwards leading-thread branch targets to trailing-thread fetch
• 100% prediction accuracy in absence of faults
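The following toy model illustrates why forwarded outcomes act as perfect predictions when no fault occurs. The loop branch, the program-counter value, and all names are invented for the example; a real BOQ is a hardware structure between commit and fetch.

```python
from collections import deque
from typing import Deque, Tuple


class BranchOutcomeQueue:
    """Toy BOQ: the leading thread pushes resolved branch targets and the
    trailing thread consumes them in place of a branch predictor."""

    def __init__(self) -> None:
        self._q: Deque[Tuple[int, int]] = deque()

    def push(self, pc: int, target: int) -> None:
        self._q.append((pc, target))

    def predict(self, pc: int) -> int:
        lead_pc, target = self._q.popleft()
        if lead_pc != pc:
            raise RuntimeError("threads diverged: branch PC mismatch")
        return target


def branch_target(pc: int, counter: int) -> int:
    """Hypothetical loop branch: jump back to 0 while counter > 0."""
    return 0 if counter > 0 else pc + 1


def count_mispredictions(iterations: int, branch_pc: int = 4) -> int:
    boq = BranchOutcomeQueue()
    # Leading thread resolves each branch and forwards the outcome.
    for counter in range(iterations, -1, -1):
        boq.push(branch_pc, branch_target(branch_pc, counter))
    # Trailing thread fetches along the forwarded path and re-checks it.
    wrong = 0
    for counter in range(iterations, -1, -1):
        predicted = boq.predict(branch_pc)
        if predicted != branch_target(branch_pc, counter):
            wrong += 1
    return wrong
```

With no injected fault, `count_mispredictions` returns 0 for any loop length, matching the 100% accuracy claim. A non-zero count in this model would indicate that one thread's branch computation was corrupted.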
Simultaneous & Redundantly Threaded Processor (SRT)
• SRT = SMT + Fault Detection
• Less hardware compared to replicated microprocessors
  • SMT needs ~5% more hardware over uniprocessor
  • SRT adds very little hardware overhead to existing SMT
• Better performance than complete replication
  • better use of resources
• Lower cost
Issues
• Cycle-by-cycle output comparison and input replication:
  • Equivalent instructions from different threads may execute in different cycles
  • Equivalent instructions from different threads might execute in a different order
• Precise scheduling of the threads is crucial for optimal performance
  • Branch misprediction
  • Cache miss