# Radiation Effects in Digital Data Processors

Prof. Bruce Jacob
Electrical & Computer Engineering
University of Maryland

# Main Points

- By Design, Digital Microprocessors Have Many Points of Failure
- Existing Error-Detection/Correction Techniques Target Implementation Errors
- Goals of the Research
  - Characterize Failure Behavior
  - Investigate Solutions (Heterogeneous Designs?)

# A Typical Microprocessor



- Clock Nets
- Power Nets
- Busses
- Datapath
- Control
- Memories

# **CPU Components**

- Clock & Power Nets
  - Clock: Typically H-trees
  - Power/Ground: Less Regular
  - Make fantastic antennas
  - Extremely sensitive to variations
  - All other subsystems depend on them
- Busses
  - Groupings of parallel wires
  - Encoding schemes cover errors in transmission but not errors at the endpoints
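
A minimal sketch of the bus point above, assuming a single even-parity bit as the encoding (the slides do not name a specific code). The names `parity_bit`, `send`, and `check` are hypothetical. A flip on the wires is caught; the same flip in the sender's register, before encoding, passes the check.

```python
def parity_bit(word):
    """Even-parity bit: 1 if the word has an odd number of 1 bits."""
    return bin(word).count("1") % 2

def send(word):
    """Sender endpoint (hypothetical): attach parity to whatever value it holds."""
    return word, parity_bit(word)

def check(word, parity):
    """Receiver endpoint (hypothetical): True if word and parity agree."""
    return parity_bit(word) == parity

original = 0b10110010

# Case 1: one bit flips on the wires, after the parity bit was computed.
word, p = send(original)
flipped_in_flight = word ^ (1 << 3)
print("in-flight flip passes check:", check(flipped_in_flight, p))

# Case 2: the same bit flips in the source register, before encoding.
# The sender encodes the wrong value, so the check sees nothing wrong.
word, p = send(original ^ (1 << 3))
print("endpoint flip passes check:", check(word, p))
```

The second case is the gap named in the bullet: the code protects the wires, not the value handed to the encoder.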



## Datapath

- Where the data is processed (adders, multipliers, muxes)
- Regular device arrays
- Self-correcting logic schemes assume checker is correct

## Control

- Represents if-then logic
- Less regular than others
- Self-correcting schemes, as above, assume the checker is correct
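
To illustrate why the checker assumption matters, here is a hedged sketch of duplicate-and-compare, one common self-checking arrangement (an assumption; the slides do not specify the scheme). The 8-bit `adder`, the `comparator`, the `fault_mask` parameter, and the `stuck_ok` flag are all hypothetical modeling choices.

```python
def adder(a, b, fault_mask=0):
    """8-bit adder; fault_mask XORs upset bits into the result."""
    return ((a + b) & 0xFF) ^ fault_mask

def comparator(x, y, stuck_ok=False):
    """Checker: True when both copies agree.

    stuck_ok models an upset in the checker itself that forces 'agree'.
    """
    return True if stuck_ok else x == y

def checked_add(a, b, fault_mask=0, checker_stuck=False):
    """Run a primary and a shadow adder and compare their results."""
    primary = adder(a, b, fault_mask)   # copy that may be upset
    shadow = adder(a, b)                # fault-free copy
    return primary, comparator(primary, shadow, checker_stuck)

print(checked_add(100, 27))                                      # healthy
print(checked_add(100, 27, fault_mask=0x10))                     # fault flagged
print(checked_add(100, 27, fault_mask=0x10, checker_stuck=True)) # fault missed
```

In the last call the result is wrong but reported as good, because the fault model now includes the comparator.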




## Memory

- Includes SRAM caches, DRAMs, register files, pipeline registers, etc.
- Very regular in structure
- Error-detecting/correcting codes target limited bit-flips in storage and/or transmission but assume correctness of checker
- Large storage off-chip, small storage on-chip
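
The "limited bit-flips" point can be made concrete with a Hamming(7,4) single-error-correcting code, used here only as a small stand-in (an assumption; the slides do not say which codes are meant). The function names `encode` and `decode` are hypothetical. One flip is corrected; two flips are silently miscorrected into wrong data.

```python
def encode(d):
    """4 data bits -> 7 code bits laid out at positions 1..7."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def decode(c):
    """Return (data bits, syndrome); flips the bit the syndrome points at."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4     # 0 means "no error seen"
    if syndrome:
        c[syndrome - 1] ^= 1            # assumes exactly one bit flipped
    return [c[2], c[4], c[5], c[6]], syndrome

data = [1, 0, 1, 1]
code = encode(data)

one_flip = list(code)
one_flip[4] ^= 1
print("one flip recovered:", decode(one_flip)[0] == data)

two_flips = list(code)
two_flips[0] ^= 1
two_flips[1] ^= 1
print("two flips recovered:", decode(two_flips)[0] == data)
```

The decoder here is also assumed fault-free, which is the second caveat in the bullet above.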



# Simple Test CPU

Observations made and measurements taken at critical points




Block diagram (figure not reproduced): PC Update, Program Counter, Instruction Memory, MUX, Register File, ALU, Data Memory, Control, Clock Net, and ALU Result Bus. Signals labeled in the figure: ADDR, INSTRUCTION, OPCODE, TGT, SRC1, SRC2, RDATA, WDATA, CONTROL.

Measurement points called out in the figure:

- Memory: Quantify susceptibility to bit-flipping and write failure
- Data Path: Quantify reliability of arithmetic/logic unit, output vs. input
- Clock/Power: H-trees and similar designs make very good antennas
- Control Logic: Behavior of irregular combinational logic designs
- Data Busses: Effectiveness of error-detection and error-correction
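
One way the memory measurement (susceptibility to bit-flipping) might be tallied in simulation is sketched below. This is an illustrative software model, not the test setup from the slides: the independent per-bit flip probability, the 8-bit word width, and the names `expose` and `count_upsets` are all assumptions.

```python
import random

def expose(words, flip_prob, rng, width=8):
    """Return a copy of words where each bit flips independently with flip_prob."""
    out = []
    for w in words:
        mask = 0
        for bit in range(width):
            if rng.random() < flip_prob:
                mask |= 1 << bit
        out.append(w ^ mask)
    return out

def count_upsets(before, after):
    """Count flipped bits and corrupted words between two memory snapshots."""
    bits = sum(bin(a ^ b).count("1") for a, b in zip(before, after))
    words = sum(a != b for a, b in zip(before, after))
    return bits, words

rng = random.Random(0)            # fixed seed so a run is repeatable
memory = [0x55] * 1024            # known pattern written before exposure
after = expose(memory, 0.001, rng)  # 0.001 is an arbitrary illustrative rate
bits, words = count_upsets(memory, after)
print(f"{bits} bit flips in {words} of {len(memory)} words")
```

Writing a known pattern and reading it back after exposure separates storage upsets from write failures only if the write itself is verified first; the sketch above models storage upsets alone.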