or How Hardware Helps Us to Build Safer Solutions
Safety design and automotive applications are my pet subjects. And I'm a fanboy of the Texas Instruments Hercules microcontroller family.
You can't expect an unbiased, neutral review here. I'll get carried away.
Why Safety Designs?
There are situations where safety is key. Whenever a human life is in danger (think: medical, transport, dangerous industrial processes), we have to be careful with our design.
As device creators, we're expected to always develop things that work correctly. In some areas it's also important to recover when we can, and fail gracefully when we can't.
It's hard to develop programs that handle specified behavior correctly. It's umpteen times harder to write code that handles unexpected situations.
People who develop Secure applications know how hard it is to predict how hackers will attack their application.
When developing Safety applications, it's the real world that will mess with your design.
And you can't afford that. It's bad when crooks run away with your credit card info. It's worse when the brakes stop working in your car.
Software can't handle everything
You can do a lot during firmware development.
Professionals that design for safety have good tools to write safe software.
The first tool is their brains and common sense.
When you are aware that your design is going to be used in a situation where lives are at risk, you, as a responsible engineer, do all you can to design a sound, sturdy and safe application.
Few mishaps have happened because the designers were lazy or dumb. But the brain is sensitive to stress and pressure. And to distraction. And we can't predict all the things our code will be exposed to.
If that wasn't the case, there wouldn't be so many security breaches and system related failures. Most of them were exposed when the system was used in an unplanned way.
An honest designer knows that unpredicted things can happen. None of us have ever delivered a significant design that handles all predictable situations correctly, have we?
Our brains also mess with us. See reviews, certification and audits below.
Our next toolkit is static and dynamic code analysis and test tools.
There are initiatives like MISRA.
These are in essence libraries of guidelines and rules. They help you to avoid errors that have been made in the past.
The automotive industry has analyzed mishaps that were caused by program bugs, and distilled patterns out of that analysis.
These patterns have been turned into verifiable software rules.
MISRA lists common practices that have resulted in mishaps in the past. MISRA tools analyze your code and flag constructs where those rules are violated.
Skilled developers think they know better than that - and they often know exactly how those constructs work. Self-esteem about technical skills is an attribute of many an engineer.
Still, several fatal accidents can be traced back to bugs related to those practices.
In C, you have dynamic memory allocation and UNIONs. We all know how they work (and if we don't know, we use Google, don't we?).
I'm a fan of both constructs. On the other hand, these basic language concepts are the cause of several flukes.
If you look at both mechanisms, you can see how they can turn bad very easily.
Many of us have made mistakes when allocating and releasing memory - or walked past the end of allocated memory. Especially when unexpected failures and exceptions happen.
Cue the Blue Screen of Death.
The same goes for UNIONs. They are great. Especially when memory is scarce, they help us to limit the space used by structures in a structured (sic) way.
But if we are true to ourselves, we can also see how easy it is to go astray with them.
We are drilling down to the bit level here, and on Monday mornings some of us are a bit flaky on that part. It's so easy to read or write values at the wrong location. All of us (maybe you excluded) have screwed up at least once.
MISRA has rules for these constructs that have led to mishaps before. And MISRA code analysis tools can flag them for you. In Safety applications that is a Good Thing.
You have the choice to deactivate rules in MISRA, if you have a process to deal with those deactivated rules. You can adapt the rules to the maturity of your design team.
There are also LINT and other code analysis tools that add great value.
Loads of known code issues and suspicious design constructs can be found during static (source analysis) and dynamic (runtime analysis) examination of your code. There's always a learning ramp and an acceptance bridge to cross. But these tools are not that hard to grasp.
And they have the added benefit that they force you to automate your build cycle. Yet another cause of misery solved.
Automated testing, at both the hardware and software level, is a third layer of defense.
Isn't it great when you can run a test case after you've made a change to the firmware? And isn't it great that you can plug your device into a test fixture to see if it still performs as expected?
Who wants to stand there red-cheeked in front of an audience because they have re-introduced an issue that was solved three iterations ago?
A hardware and software testbed that validates the defined functionality - and verifies any issue that has been fixed in the past - is a great safety net.
Reviews, certification and audits
We are blind to our own errors. Many bugs are introduced because of a quirk in our thought process. And that is difficult to detect, because the quirk lingers on inside our brains while we debug.
As exhibit A, I will bring up a post I've created on EEVBlog when validating the µCurrent.
Any engineer can see where I went wrong. I'm formally trained in electronics. Still, I got a fundamental subject (yes, Ohm's law) wrong: http://www.eevblog.com/forum/testgear/current-in-na-range/
Reading back on the subject, I can't imagine how this happened to me. But it did. We have blind spots at times, and we don't get out of circular wrong thinking without someone else putting us on the spot.
In this case, I forgot the existence of the whole milliamp range. Embarrassing, yes. But it happened. Feel free to humiliate me. I'm using my real name here.
That's a humbling experience that warns you that you can go wrong anytime. Even when you've known the subject since you were 14.
Reviews and external eyes are key. By default, we get defensive when someone reviews or criticizes our work.
We shouldn't. We should embrace peer reviews and design/code/process audits.
That is the moment when we learn things that can't be learned from courses or tutorials. That's when we learn from fellow experts. (I also have career advice on how being wrong, and dealing with it, can help you. But that's out of scope.)
In the end, we are humble people that can - and will - fail.
When confronted with IEC 61508 or ISO 26262, don't fear it. Embrace it. Your product will benefit from it, and you'll become a better engineer.
No educational experience can top that. Check the job market.
Enough about Safety and Software for the moment. Let's move over to Safety and Hardware.
Next: How Hardware Can Help Us