Mention machine safety and most people think of shatterproof goggles and other protective gear. They might also think about physical safeguards and emergency switches on machines, but are unlikely to consider the safety aspects of the software that controls those machines.
When engineers think about the safety of software for industrial devices, automotive electronics, medical equipment and consumer devices, they should understand that their application code gets "blended" with the real-time operating system (RTOS) to create a single executable file. That combination occurs when any real-time operating system, not just ThreadX, becomes part of an embedded system. (Some RTOSs also support separate, dynamically loaded programs that get brought into memory on demand.)
In an embedded device, the RTOS might occupy between 10 and a few hundred kilobytes of code. Application code, on the other hand, might consume over a megabyte. So simply using a rock-solid operating system isn't sufficient to guarantee safe operation of the software within a device. To properly address software safety, developers need a set of tools that lets them investigate, debug, and test the entire system, including the RTOS.
In an RTOS, Experience Counts
Engineers have used operating systems such as ThreadX in hundreds of thousands of varied products, so these RTOSs have experienced nearly every imaginable circumstance: the kinds of things you can predict and test for, as well as those you can't. Designers should therefore choose a fully tested, field-proven operating system that has run in many real-world applications. This is the best way to avoid the unanticipated surprises that can lead to system failure at just the wrong time.
Developers often ask, "What happens inside an RTOS to make it rock solid?" One example of the internal checks a good RTOS performs is consistency checking of function parameters. When application code calls a ThreadX service through the application programming interface, or API, we check the parameters passed to the RTOS to ensure they are consistent with the rest of the software. This step detects when programmers have referenced a nonexistent thread or specified a CPU time longer than that available. That type of API checking helps programmers catch errors before they get too far into their application code.
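To make the idea concrete, here is a minimal sketch of a checked API entry point, assuming a hypothetical RTOS whose thread control blocks carry an ID word stamped at creation. The error codes, the `rtos_` function names, and the time-slice limit are all invented for illustration; they are not the ThreadX API. The wrapper rejects a null pointer, a pointer to a thread that was never created, and an out-of-range time value before the unchecked core service ever runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes for this sketch; not the ThreadX API. */
typedef enum {
    RTOS_SUCCESS = 0,
    RTOS_PTR_ERROR,      /* NULL pointer passed in */
    RTOS_THREAD_ERROR,   /* pointer does not refer to a created thread */
    RTOS_TICK_ERROR      /* requested time slice out of range */
} rtos_status_t;

#define RTOS_THREAD_ID  0x54485244UL  /* "THRD": stamped by thread creation */
#define RTOS_MAX_SLICE  1000u         /* hypothetical upper bound, in ticks */

typedef struct {
    uint32_t id;          /* equals RTOS_THREAD_ID only for a live thread */
    uint32_t time_slice;  /* ticks the thread may run before preemption */
} rtos_thread_t;

/* Core service: assumes its arguments were already validated. */
static rtos_status_t rtos_time_slice_change_core(rtos_thread_t *t, uint32_t slice)
{
    assert(t != NULL);   /* debug-build reminder of the core's precondition */
    t->time_slice = slice;
    return RTOS_SUCCESS;
}

/* Public entry point: validates every parameter, then calls the core. */
rtos_status_t rtos_time_slice_change(rtos_thread_t *t, uint32_t slice)
{
    if (t == NULL)
        return RTOS_PTR_ERROR;
    if (t->id != RTOS_THREAD_ID)         /* never created, or already deleted */
        return RTOS_THREAD_ERROR;
    if (slice == 0u || slice > RTOS_MAX_SLICE)
        return RTOS_TICK_ERROR;
    return rtos_time_slice_change_core(t, slice);
}
```

Returning an error code, rather than halting, lets the application decide how to recover, which matters in a deployed safety-related device.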
Developers have the flexibility to configure the RTOS to perform many such internal checks, and to produce a lot of information that's helpful in checking and debugging. Just before their code goes into "production," they might remove the debug code or they might decide to leave it in. Although the debug code in the RTOS requires some memory space, many developers leave it in their final code to facilitate field debugging or testing in the event of a problem with an application.
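One common way to make such checks optional is a build-time switch that maps the public API either onto a checking wrapper or directly onto the core service. The sketch below assumes a hypothetical macro named `RTOS_DISABLE_ERROR_CHECKING` and hypothetical semaphore functions; a real RTOS will use its own names. Leaving the macro undefined keeps the checks (and their small memory cost) in the shipped image.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch. */
#define RTOS_SUCCESS    0
#define RTOS_PTR_ERROR  1

typedef struct { int count; } rtos_semaphore_t;

/* Core service: no parameter checks; smallest and fastest path. */
static int rtos_semaphore_put_core(rtos_semaphore_t *s)
{
    assert(s != NULL);   /* debug-build invariant only */
    s->count++;
    return RTOS_SUCCESS;
}

/* Checking wrapper: costs a few instructions and a little code space. */
static int rtos_semaphore_put_checked(rtos_semaphore_t *s)
{
    if (s == NULL)
        return RTOS_PTR_ERROR;
    return rtos_semaphore_put_core(s);
}

/* Build-time choice. Defining RTOS_DISABLE_ERROR_CHECKING (a hypothetical
   macro name) routes the public API straight to the unchecked core. */
#ifdef RTOS_DISABLE_ERROR_CHECKING
#define rtos_semaphore_put(s) rtos_semaphore_put_core(s)
#else
#define rtos_semaphore_put(s) rtos_semaphore_put_checked(s)
#endif
```

Because the choice is made in one header, application code calls `rtos_semaphore_put` the same way in both configurations.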
Don't Go with the Overflow
Developers also must guard against stack overflows, which can corrupt memory and cause problems when the processor later uses the data at those corrupted addresses. When the problem finally surfaces, its symptoms might look nothing like a stack overflow. Our StackX tool mathematically calculates the stack use of the executable code before it runs, which is a significant benefit in safety-related applications. Calculating a stack size works better than making a rough estimate, adding a bit more memory, testing the application, and seeing whether it crashes. StackX can analyze any executable in a standard format such as the Executable and Linkable Format (ELF).
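The article does not describe how StackX works internally, but the textbook form of static stack analysis is a longest-path computation over the call graph: a function's worst-case depth is its own frame plus the deepest of its callees. The sketch below assumes the call graph and per-function frame sizes have already been extracted from the executable, that there is no recursion, and that there are no indirect calls; `func_node_t` and `worst_case_stack` are hypothetical names. A real tool must also add interrupt and context-save overhead.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS   8
#define MAX_CALLEES 4

/* One call-graph node, as a static analyzer might recover it. */
typedef struct {
    const char *name;
    unsigned    frame_bytes;           /* stack used by this function alone */
    int         n_callees;
    int         callees[MAX_CALLEES];  /* indices of directly called functions */
} func_node_t;

/* Worst-case stack depth from function f: its frame plus the deepest callee
   chain. Assumes an acyclic graph (no recursion, no function pointers).
   memo[] caches results; 0 means "not computed yet". */
static unsigned worst_case_stack(const func_node_t *g, int f, unsigned *memo)
{
    assert(f >= 0 && f < MAX_FUNCS);
    if (memo[f] != 0)
        return memo[f];
    unsigned deepest = 0;
    for (int i = 0; i < g[f].n_callees; i++) {
        unsigned d = worst_case_stack(g, g[f].callees[i], memo);
        if (d > deepest)
            deepest = d;
    }
    memo[f] = g[f].frame_bytes + deepest;
    return memo[f];
}
```

For example, if `main` (32 bytes) calls `parse` (48 bytes) and `log` (16 bytes), and `parse` calls `log` and `crc` (24 bytes), the worst case from `main` is 32 + 48 + 24 = 104 bytes, reached through `main`, `parse`, `crc`.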
Many operating systems can detect a stack overflow and report an error, but detection costs memory and processor time, and it happens only after the damage has been done. Without StackX, the developer must choose between enabling overflow detection and accepting a small performance penalty, or trusting that testing has uncovered every demand for stack space and that the specified stack size is therefore safe.
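Runtime detection is often done by "painting" a thread's stack with a known pattern at creation and later checking how much of the pattern survives. The sketch below assumes a downward-growing stack, a fill byte of `0xEF`, and a 16-byte guard zone at the stack limit; all of these values and function names are hypothetical. It shows both the high-water measurement and why such detection is after the fact: the guard check can only report that something was already overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL  0xEFu  /* hypothetical pattern painted into unused stack */
#define GUARD_BYTES 16u    /* hypothetical guard zone at the stack limit */

/* Paint the whole stack region when the thread is created. */
static void stack_paint(uint8_t *stack, size_t size)
{
    assert(size > GUARD_BYTES);
    memset(stack, STACK_FILL, size);
}

/* Stack grows downward: stack[0] is the limit. Scan up from the limit for the
   first byte that lost the pattern; everything above it has been used.
   May under-report if real stack data happens to equal STACK_FILL. */
static size_t stack_high_water(const uint8_t *stack, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == STACK_FILL)
        untouched++;
    return size - untouched;
}

/* Nonzero if anything has written into the guard zone. By the time this
   returns true, adjacent memory may already be corrupted. */
static int stack_overflowed(const uint8_t *stack)
{
    for (size_t i = 0; i < GUARD_BYTES; i++)
        if (stack[i] != STACK_FILL)
            return 1;
    return 0;
}
```

The check itself costs processor time each time it runs, which is the tradeoff described above.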
Dump and Analyze Trace Data
When developers test and debug their application code, they can use a tool such as TraceX that acts like a logic analyzer for the software. TraceX displays a horizontal time axis, while the vertical axis shows all the system threads or tasks and indicates what each is doing at each time division. So you can examine the events that lead up to a malfunction as well as what happens after it. When a system crashes, or at any time during execution, you can dump the target system's "trace buffer" memory to the host and analyze it. You might find, for example, that Thread 3 never gave up the CPU, so Thread 1 never reset a watchdog timer, and that caused the system to fail. You see a graphical flow of program execution, all application thread activity, the operating-system services used, the percentage of CPU time each thread used, and so on. But TraceX doesn't look inside the operating system. It simply tells you when your application called on the operating system and what it asked the RTOS to do. The events it records include message passing, synchronization, context switches, preemptions, suspensions, terminations and system interrupts.
A typical hardware logic analyzer has various trigger capabilities, and you can use similar "trigger" conditions with TraceX to trace only events of a certain type or events for a specified thread. Thus, if you know what to look for, you can eliminate a lot of extraneous information.
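A trigger of this sort can be as simple as a bit mask of event kinds plus an optional thread to watch, tested before each event is written to the buffer. The sketch below is a hypothetical filter, not TraceX's configuration interface; the event numbering and the `ANY_THREAD` value are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event codes, used as bit positions in the filter mask. */
enum { EV_CONTEXT_SWITCH = 0, EV_QUEUE_SEND = 1, EV_SEMAPHORE_GET = 2, EV_INTERRUPT = 3 };

#define ANY_THREAD 0xFFFFu  /* hypothetical "no thread restriction" value */

/* Which event kinds to keep, and optionally a single thread to watch. */
typedef struct {
    uint32_t event_mask;  /* bit n set => record events of kind n */
    uint16_t thread_id;   /* ANY_THREAD => record events from every thread */
} trace_filter_t;

/* Called before an event is logged; events that fail are dropped,
   leaving buffer space for the events that matter. */
static bool trace_filter_pass(const trace_filter_t *f, uint16_t thread_id, unsigned event)
{
    assert(event < 32u);  /* event kind must fit in the 32-bit mask */
    if ((f->event_mask & (1u << event)) == 0u)
        return false;
    return f->thread_id == ANY_THREAD || f->thread_id == thread_id;
}
```

For instance, a filter with `event_mask = (1u << EV_QUEUE_SEND) | (1u << EV_SEMAPHORE_GET)` and `thread_id = 3` keeps only queue sends and semaphore gets made by thread 3.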
In addition, you can log user events that don't relate to operating system services but occur at critical points in your code. When the CPU reaches such a point, it puts a log entry into the trace buffer. Then, when analyzing the uploaded trace buffer on the host with TraceX, you can search for user events along the time scale, and TraceX will take you right to "Event 5," for example.
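The sketch below shows both halves of that workflow under stated assumptions: application code logs a numbered user event with an accompanying value, and host-side code searches the uploaded buffer for the first occurrence of a given event number. The `EV_USER_BASE` offset, entry layout, and function names are hypothetical, and the buffer simply stops logging when full to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_ENTRIES 16u
#define EV_USER_BASE  0x1000u  /* hypothetical: codes at or above this are user events */

typedef struct {
    uint32_t timestamp;
    uint16_t event;  /* RTOS event code, or EV_USER_BASE + user event number */
    uint32_t info;   /* user-supplied value, e.g. a sensor reading */
} trace_entry_t;

static trace_entry_t trace_buf[TRACE_ENTRIES];
static size_t        trace_count;

/* Target side: called at a critical point in application code.
   This sketch stops logging when the buffer is full. */
static void trace_user_event(uint32_t now, uint16_t user_event, uint32_t info)
{
    if (trace_count < TRACE_ENTRIES) {
        trace_buf[trace_count].timestamp = now;
        trace_buf[trace_count].event     = (uint16_t)(EV_USER_BASE + user_event);
        trace_buf[trace_count].info      = info;
        trace_count++;
    }
}

/* Host side: index of the first occurrence of a user event, or -1. */
static long trace_find_user_event(const trace_entry_t *buf, size_t n, uint16_t user_event)
{
    assert(buf != NULL || n == 0u);
    for (size_t i = 0; i < n; i++)
        if (buf[i].event == EV_USER_BASE + user_event)
            return (long)i;
    return -1;
}
```

Offsetting user event numbers past the RTOS's own codes keeps the two kinds of entries distinguishable in the same buffer.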
Many of the types of tools and operations mentioned here are not unique to Express Logic. They represent a collection of software tools developers should know about when they test application code to ensure its safety. If similar tools don't come with the RTOS they use, they should try to find a way to implement them on their own.
Use Industry Standards and Guides
Businesses that produce software for medical, military or aerospace devices must comply with standards that specify measures of safety and how to validate the safety of system and application code. The US Food and Drug Administration, for example, publishes the document "General Principles of Software Validation; Final Guidance for Industry and FDA Staff" (January 11, 2002). Because the RTOS forms such a small part of the final application code, companies that produce the end product, not the RTOS vendors, generally perform the validation tests, using the full RTOS source code provided by the vendor.
The commercial-avionics world relies on the RTCA/DO-178B document, "Software Considerations in Airborne Systems and Equipment Certification." (RTCA stands for the Radio Technical Commission for Aeronautics.) Safety certification covers four levels, A, the highest, through D, the lowest; a fifth level, E, applies to software whose failure has no effect on safety. Again, certification and validation apply not just to the operating system or the application code, but to the final product that contains both.
John A. Carbone, vice president of marketing for Express Logic, has 35 years of experience in real-time computer systems and software, including work as an embedded-system developer and field application engineer. Prior to joining Express Logic, Mr. Carbone was vice president of marketing for Green Hills Software. He has a B.S. degree in mathematics from Boston College.