Clustered MCUs

The Project14 theme for September '18 is Clustered MCUs. The theme page lists several multi-CPU configurations. One that isn't mentioned is lockstep CPUs, a design in which multiple physical CPUs are used to check the integrity of the processing.

Lockstep is a technique to validate the process integrity in hardware, with minimal performance and energy consumption impact. It will detect inconsistencies generated by external and internal events. There's no practical software solution that can offer the same level of verification.

I'm using a TI Hercules Safety Microcontroller to show how this works.

 

 

Lockstep

 

In a lockstep configuration, two identical cores are placed on the same silicon.

One is the main processor. That's the one we use the results from. The other one is used to verify the main one's correctness.

Both of them execute the same instructions, but not at the same time. The second processor (the checker) executes everything with a delay of a few clock cycles.

Upon each clock tick, the result of the previous activity is compared between processors. If no external events have made the controller glitch, the results have to be the same.

If not, an alert is raised.
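
The compare-with-delay idea can be sketched in software. This is only a toy model under stated assumptions, not how the Hercules hardware is built: the `alu` function, the two-instruction "instruction set", and the `fault` tuple are all hypothetical, and the real compare logic works on the cores' output signals rather than on an accumulator value.

```python
from collections import deque

def alu(instr, acc):
    """Hypothetical stand-in for a CPU core: apply one instruction to an accumulator."""
    op, operand = instr
    if op == "add":
        return (acc + operand) & 0xFFFFFFFF
    if op == "xor":
        return acc ^ operand
    raise ValueError(f"unknown op {op!r}")

def run_lockstep(program, delay=2, fault=None):
    """Run `program` on a main core and on a checker core that lags by `delay` cycles.

    `fault` is an optional (cycle, core, bitmask) tuple that flips bits in the
    named core ("main" or "checker") on that cycle.
    Returns the cycle at which the compare step sees a mismatch, or None.
    """
    main_acc = checker_acc = 0
    pending = deque()  # main-core results waiting for the checker to catch up
    for cycle in range(len(program) + delay):
        if cycle < len(program):
            main_acc = alu(program[cycle], main_acc)
            if fault and fault[:2] == (cycle, "main"):
                main_acc ^= fault[2]
            pending.append(main_acc)
        idx = cycle - delay  # instruction the checker executes on this cycle
        if idx >= 0:
            checker_acc = alu(program[idx], checker_acc)
            if fault and fault[:2] == (cycle, "checker"):
                checker_acc ^= fault[2]
            if checker_acc != pending.popleft():
                return cycle  # "an alert is raised"
    return None

program = [("add", 5), ("xor", 0xFF), ("add", 7), ("xor", 0x0F)]
clean_run = run_lockstep(program)                             # no fault injected
faulty_run = run_lockstep(program, fault=(1, "main", 0x04))   # bit flip in the main core
```

In this model a clean run returns `None`, while a bit flip in the main core on cycle 1 is caught on cycle 3, when the checker reaches the same instruction two cycles later.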

 

Because external events (a power spike, magnetic or electric field interference, what have you) can affect both cores at once, there are some physical measures to reduce the chance that both cores fail in the same way (common mode errors).

  • Running 1.5 to 2 clock cycles apart ensures that the same event doesn't hit both cores while they are executing exactly the same instruction.
  • Flipping one upside-down in relation to the other ensures that the same impulse doesn't hit the same spot if it travels from bottom to top or vice versa.
  • Rotating one 90° in relation to the other ensures that an external event doesn't pass through the cores in the same direction when the impact comes from the side.
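
Why the time offset matters can be shown with a small model. It is a sketch under assumptions: `step` is a made-up instruction, the glitch is modelled as the same bit flip hitting both cores on the same clock cycle, and only whole-cycle delays are modelled (the real cores run 1.5 or 2 cycles apart).

```python
def step(acc, operand):
    """Hypothetical stand-in for one instruction: a 32-bit accumulate update."""
    return (acc * 3 + operand) & 0xFFFFFFFF

def common_mode_detected(operands, delay, glitch_cycle, bitmask=0x10):
    """Return True if the compare step notices a glitch that hits BOTH cores at once."""
    main_acc = checker_acc = 0
    main_results = []  # main-core result per instruction, kept for the checker
    for cycle in range(len(operands) + delay):
        # The same disturbance reaches both cores on the same clock cycle.
        glitch = bitmask if cycle == glitch_cycle else 0
        if cycle < len(operands):
            main_acc = step(main_acc, operands[cycle]) ^ glitch
            main_results.append(main_acc)
        idx = cycle - delay  # instruction the checker executes on this cycle
        if idx >= 0:
            checker_acc = step(checker_acc, operands[idx]) ^ glitch
            if checker_acc != main_results[idx]:
                return True
    return False

operands = [1, 2, 3, 4, 5, 6]
no_offset = common_mode_detected(operands, delay=0, glitch_cycle=3)
with_offset = common_mode_detected(operands, delay=2, glitch_cycle=3)
```

With no offset, both cores are corrupted in exactly the same way and the comparison still passes, so the glitch goes unnoticed. With a two-cycle offset, the glitch lands on different instructions in each core and the results no longer match.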

 

source: Texas Instruments paper: Introduction to Hercules™ ARM® Cortex™-R4F MCUs

 

From the whitepaper Hercules™ Microcontrollers: Real-time MCUs for safety-critical products:

 

The lockstep CPU scheme implements a checker CPU, which is hardwired to be fed the same input as the functional CPU.
Two blocks of the same logic, fed the same input, should in theory produce the same output.
A core compare module monitors the outputs of the two CPU cores on a cycle-by-cycle basis and signals any errors to the system.
This near instant fault detection in the Hercules MCU’s comes with little penalty in power consumption and no impact to CPU performance.

Also, in comparison to other elements on the MCU, the size overhead of the lockstep mechanism is minimal.

 

Whenever logic is duplicated, there is always a concern of common mode failure.
To combat common mode failure, TI has implemented multiple best practices on the lockstep CPU subsystem.
Temporal diversity of the two CPU cores is implemented, such that the CPU cores operate 1.5 or 2 cycles out of phase in order to mitigate common mode failure in clocking.
A voltage guard ring is implemented around the CPU cores.
Physical design diversity is implemented by flipping and rotating the checker CPU with respect to the functional CPU.

 

LockStep Test Project Source and Details

 

The project is described in detail, with full source attached, in the element14 blog post Hercules Safety MCU Demo with Educational BoosterPack.

It exercises the lockstep cores in two ways.

  • It performs a full self-test at the start of the program.
  • It injects a deliberate error in one controller to validate the detection and alerting system.

 

Self-test example

 

The self-test (LBIST in Hercules lingo: CPU Logic Built-In Self-Test) runs a set of tests on both CPUs when the device starts up.

These tests are comparable to the tests that are run on each Hercules MCU at production time.

source: Texas Instruments paper: Introduction to Hercules™ ARM® Cortex™-R4F MCUs

 

 

Error Injection example

 

This test deliberately forces a core compare error situation (testing error scenarios is an important part of safety validation).

It's not easy to create a core error in one of the CPUs. Lucky for us, the manufacturer has provided on-silicon functionality to do that.

When you set one of the MCU's registers to a particular value, the core compare error is triggered. The detection trap should activate and we can test our handling logic.
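
The flow of such a test can be sketched as a software model. Everything here is hypothetical: `CompareUnitModel`, its mode names, and `validate_error_path` are illustration only and are not the Hercules registers or the code from the linked project. The point is the sequence: confirm the healthy case, force the error, check that the handler fired, and return to normal operation.

```python
class CompareUnitModel:
    """Hypothetical software stand-in for a core compare module."""

    NORMAL = "normal"            # hypothetical mode name
    FORCE_ERROR = "force_error"  # hypothetical mode name

    def __init__(self, on_error):
        self.mode = self.NORMAL
        self.on_error = on_error  # callback that plays the role of the error trap

    def compare(self, main_out, checker_out):
        """Return True when the outputs are accepted, False when an error is signalled."""
        # In forced-error mode a mismatch is reported even if the cores agree.
        mismatch = (main_out != checker_out) or self.mode == self.FORCE_ERROR
        if mismatch:
            self.on_error()
        return not mismatch

def validate_error_path():
    """Inject a compare error and return the events the handler recorded."""
    events = []
    unit = CompareUnitModel(on_error=lambda: events.append("compare_error"))

    unit.compare(0x1234, 0x1234)   # healthy cores: no alert expected
    unit.mode = unit.FORCE_ERROR   # models the "set a register" step
    unit.compare(0x1234, 0x1234)   # alert expected despite equal outputs
    unit.mode = unit.NORMAL        # back to normal operation
    unit.compare(0x1234, 0x1234)   # healthy again: no further alert
    return events

recorded = validate_error_path()
```

In this model exactly one `compare_error` event is recorded, which is what a passing injection test would look like: the handler fires once for the forced error and stays quiet otherwise.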

 

 

As mentioned above, the source code for this project is available in the blog post Hercules Safety MCU Demo with Educational BoosterPack.

The full Code Composer Studio project can also be downloaded from there.

 

Action Video

 

The video shows the functional safety features in action. The lockstep CPU integrity test is shown at the beginning of the demo.

 

 

This is not typical Project14 content, but I hope this little explanation and demo make the concept of this niche Clustered MCU architecture clear.