Safety plays an increasing role in applications ranging from avionics and automotive to medical and industrial. By designing in redundancy, diversity, self-monitoring, and diagnostic capabilities, engineers can develop system-level safety features that, for example, keep an infusion pump from shocking the user.
On the factory floor, safety has become increasingly sophisticated, implementing not just Safe Torque Off but subtler modes like Safe Indexing or Safe Limited Speed that allow machine operators to clear jams without risking injury. A safety implementation is only as good as its foundation, though. If the microcontroller unit (MCU) inside the safety drive or safety programmable logic controller isn’t trustworthy, neither is the rest of the system. An ideal MCU provides both redundancy and the ability to monitor its own health.
The process essentially involves building up safety from the inside out. Dev Pradhan, product line manager at Texas Instruments in Dallas, calls it the safe island approach: develop a robust part of the controller module that contains the most essential capabilities and diagnostics, and then use that safe island to ensure the rest of the device, and then the system, operates correctly.
To guard against common-mode failure, the two processors are oriented at 90° to one another, and a two-cycle delay is introduced in the signals before comparison. (Image courtesy of Texas Instruments.)
That’s the thinking behind TI’s Hercules line of safety MCUs. Based on a homogeneous dual-core architecture, the Hercules platform features comprehensive diagnostic and self-test capabilities built into the hardware, increasing speed and robustness while simplifying the software implementation.
The latest addition to the Hercules family, the RM4X, is aimed at industrial and medical designers targeting IEC 61508 SIL-3 and ISO 26262 ASIL-D certification. With clock speeds of up to 220 MHz, the device features dual ARM Cortex-R4F floating-point cores operating in lockstep for cycle-by-cycle error checking. The implementation is designed to minimize common-mode failure by orienting the cores at 90° to each other and by introducing a delay before their outputs are compared. (See the figure above.) Both flash memory and RAM are protected by error correction code (ECC) with single-bit error correction and double-bit error detection. Because the ECC logic resides in the CPU, it can monitor data for corruption as it transfers between memory and CPU without introducing latency.
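The single-bit-correct, double-bit-detect behavior can be illustrated with a toy code. The sketch below is an assumption-laden illustration, not TI's implementation: it protects only 4 data bits with an extended Hamming code, whereas a real memory ECC covers much wider words in hardware. The names secded_encode, secded_decode, and ecc_status_t are hypothetical.

```c
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status_t;

/* Returns 1 if an odd number of bits are set in v, else 0. */
static unsigned parity8(uint8_t v)
{
    v ^= (uint8_t)(v >> 4);
    v ^= (uint8_t)(v >> 2);
    v ^= (uint8_t)(v >> 1);
    return v & 1u;
}

/* Encode a 4-bit value into an 8-bit codeword (toy example).
 * Codeword bit i (1..7) is Hamming position i; bit 0 is an
 * overall parity bit that makes the total parity even. */
uint8_t secded_encode(uint8_t nibble)
{
    unsigned d1 = nibble & 1u,        d2 = (nibble >> 1) & 1u;
    unsigned d3 = (nibble >> 2) & 1u, d4 = (nibble >> 3) & 1u;
    unsigned cw = (d1 << 3) | (d2 << 5) | (d3 << 6) | (d4 << 7);

    cw |= (d1 ^ d2 ^ d4) << 1;  /* p1 covers positions 3, 5, 7 */
    cw |= (d1 ^ d3 ^ d4) << 2;  /* p2 covers positions 3, 6, 7 */
    cw |= (d2 ^ d3 ^ d4) << 4;  /* p4 covers positions 5, 6, 7 */
    cw |= parity8((uint8_t)cw); /* overall parity in bit 0 */
    return (uint8_t)cw;
}

/* Decode a codeword: fix one flipped bit, flag two flipped bits. */
ecc_status_t secded_decode(uint8_t cw, uint8_t *nibble)
{
    unsigned syndrome = 0;
    for (unsigned pos = 1; pos <= 7; ++pos)
        if ((cw >> pos) & 1u)
            syndrome ^= pos;    /* XOR of set positions is 0 when valid */

    unsigned odd = parity8(cw); /* 1 means overall parity is violated */
    ecc_status_t status = ECC_OK;

    if (syndrome != 0 && !odd)
        return ECC_UNCORRECTABLE;        /* two bits flipped */
    if (odd) {
        /* One bit flipped; syndrome 0 means the parity bit itself. */
        cw ^= (uint8_t)(1u << syndrome);
        status = ECC_CORRECTED;
    }

    *nibble = (uint8_t)(((cw >> 3) & 1u)
                      | (((cw >> 5) & 1u) << 1)
                      | (((cw >> 6) & 1u) << 2)
                      | (((cw >> 7) & 1u) << 3));
    return status;
}
```

Flipping any one bit of an encoded value yields ECC_CORRECTED with the original data restored, while flipping any two bits yields ECC_UNCORRECTABLE. Three or more flipped bits can be miscorrected, which is a known limit of this class of code.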
The biggest challenge in reliability is planning for the unexpected. The system needs to be able to detect both systematic faults and random events like data corruption or electrostatic discharge, and then notify the rest of the system. Dual ADCs with shared channels, for example, guard against data corruption by allowing the same input to be converted twice and the results compared. A memory-protection unit assigns each bus master in the device its own allocated memory, so that one peripheral or bus master cannot corrupt the data of another.
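The memory-protection idea can be sketched in software as a table lookup. This is only an illustration under stated assumptions: on a real device the check is done by hardware on every bus access, and the region_t layout, the master IDs, and the access_allowed function are hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical protection table. */
typedef struct {
    uint8_t  master_id; /* bus master that owns this window */
    uint32_t base;      /* first address of the window */
    uint32_t size;      /* window length in bytes */
    bool     writable;  /* false means read-only for this master */
} region_t;

/* Returns true only if the whole access [addr, addr + len) lies inside
 * a region assigned to master_id with sufficient permission. */
bool access_allowed(const region_t *regions, size_t count,
                    uint8_t master_id, uint32_t addr, uint32_t len,
                    bool is_write)
{
    for (size_t i = 0; i < count; ++i) {
        const region_t *r = &regions[i];
        if (r->master_id != master_id)
            continue;
        if (is_write && !r->writable)
            continue;
        if (len == 0 || len > r->size)
            continue;
        /* Written this way to avoid overflow in addr + len. */
        if (addr >= r->base && addr - r->base <= r->size - len)
            return true;
    }
    return false; /* default deny: unassigned memory is off limits */
}
```

The default-deny return is the point of the design: a bus master that strays outside its allocation is refused rather than silently overwriting another master's data.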
In addition to basic diagnostics, the Hercules chips run a series of self-tests for CPU and SRAM, based on built-in logic. One advantage of conducting self-tests in hardware is that the process can be interleaved with the control loop to perform tests in real time. A full diagnostic sequence can last up to several hundred microseconds, which can easily exceed the duration of a control loop in a high-speed packaging line, for example. Distributing the self-test process allows periodic diagnostic routines to run even within such control loops.
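The time-slicing idea can be shown with a software analogue. The sketch below is not the hardware self-test the Hercules parts use; it is a minimal illustration, assuming a simple pattern test over a RAM buffer, of how a long diagnostic can be cut into slices that each fit inside one control-loop iteration. The ram_test_t structure and ram_test_step function are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* State carried between control-loop iterations. */
typedef struct {
    volatile uint32_t *base; /* memory under test */
    size_t   words;          /* total words to test */
    size_t   slice;          /* words tested per call (must be > 0) */
    size_t   next;           /* index where the next call resumes */
    bool     failed;         /* latched if any word misbehaves */
    uint32_t passes;         /* completed full passes */
} ram_test_t;

/* Non-destructive check of one word: save, write patterns, restore. */
static bool test_word(volatile uint32_t *w)
{
    uint32_t saved = *w;
    bool ok = true;

    *w = 0xAAAAAAAAu;
    if (*w != 0xAAAAAAAAu) ok = false;
    *w = 0x55555555u;
    if (*w != 0x55555555u) ok = false;

    *w = saved;
    return ok;
}

/* Test the next slice. Returns true when this call finished a full pass. */
bool ram_test_step(ram_test_t *t)
{
    size_t end = t->next + t->slice;
    if (end > t->words)
        end = t->words;

    for (size_t i = t->next; i < end; ++i)
        if (!test_word(&t->base[i]))
            t->failed = true;

    t->next = end;
    if (t->next == t->words) {
        t->next = 0;  /* wrap around and start the next pass */
        t->passes++;
        return true;
    }
    return false;
}
```

A control loop would call ram_test_step once per iteration after its time-critical work, choosing slice so the added time stays within the loop budget. In a real system the save-and-restore step would also need protection against interrupts that touch the same memory.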
The MCUs boast system-monitoring capabilities as well. Protection circuitry triggers when the supply voltage moves outside a set range, for example. Internal and external clocks are continuously compared to make sure the skew between them does not exceed the maximum allowable level.
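Both monitors reduce to simple comparisons, sketched below in software. On the device these checks are performed by dedicated circuitry; the function names, the millivolt units, and the parts-per-million tolerance are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* True while the measured supply stays inside the allowed window. */
bool voltage_in_window(uint32_t mv, uint32_t min_mv, uint32_t max_mv)
{
    return mv >= min_mv && mv <= max_mv;
}

/* Compare two clocks by counting ticks of each over the same interval.
 * ref_ticks and mon_ticks are the observed counts; ref_hz and mon_hz
 * are the nominal frequencies; tol_ppm is the allowed deviation.
 * Assumes values small enough that the 64-bit products do not overflow. */
bool clock_skew_ok(uint32_t ref_ticks, uint32_t mon_ticks,
                   uint32_t ref_hz, uint32_t mon_hz, uint32_t tol_ppm)
{
    if (ref_hz == 0)
        return false; /* no valid reference to compare against */

    /* Ticks the monitored clock should have produced. */
    uint64_t expected = (uint64_t)ref_ticks * mon_hz / ref_hz;
    uint64_t diff = mon_ticks > expected ? mon_ticks - expected
                                         : expected - mon_ticks;

    /* diff / expected <= tol_ppm / 1e6, rearranged to avoid division. */
    return diff * 1000000u <= expected * tol_ppm;
}
```

For instance, with a 16 MHz reference and a 220 MHz monitored clock, 16,000 reference ticks should correspond to 220,000 monitored ticks; at a tolerance of 100 ppm, a count of 220,010 passes and a count of 220,100 fails.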