Managing Faults: OneSpin’s Dave Kelf talks ISO 26262
August 22nd, 2017 by Peggy Aycinena
In a recent conversation, OneSpin’s Dave Kelf laughed when I asked him to characterize the complexities of meeting functional safety standards when developing automotive electronics. “It’s a whole rat’s nest of certification,” he said, “and as an industry we’re not there yet.
“However, at OneSpin we have a good handle now on what you need to do to make these cars safe. We’ve been working for quite a while with Bosch, Infineon, and other companies that really have a good idea of what needs to happen with the chips in cars to make them safe.
“In fact, a large part of the regulations come from these guys because they’re the experts, along with some level of government oversight, in trying to make sense of it all.”
I asked if the auto industry has taken any guidance from the decades of safety work coming out of aeronautics and defense.
“Yes,” Kelf said, “the auto suppliers are definitely taking pieces of the solution from aero and defense. Although, it’s interesting because the auto market is changing so much faster than the aero industry, which has always been conservative in moving forward.
“And automotive is very competitive, as you know. Electronics has become the way they compete in their market, rather than the mechanics as was the case in the past. The automotive companies are dying to introduce clever new, differentiating electronic features.
“But they can’t afford to take the multi-year process that the aero industry always needs. Nonetheless, the auto systems must be safe, which is why they created the ISO 26262 standard.”
“But,” Kelf cautioned, “ISO 26262 is complex. Yet the designers and tool makers must find their way through it to provide the systems people need, and to help make these regulations a reality.”
“Happily,” he added, “the industry is arriving at the point where we’ve got a handle on this.
“OneSpin and others in this space have figured out which regulations make sense. The big automotive semiconductor companies agree that the EDA vendors are doing the right thing with regard to the standard.”
“Is there any hope of sorting all of this out?” I asked.
“For sure we will get it sorted out, and pretty much have,” Kelf replied, “but we have many forces working at cross purposes – for example, new semiconductor technology, system and software complexity, and increasingly diverse requirements that add more constraints to the design and verification process.”
Distinguishing between design flows
“Look,” Kelf continued, “ISO 26262 is concerned about both systematic design errors and random operational errors.
“The systematic flow is the basic design of the chip, the process of applying the verification techniques that we already know and love and work with successfully, but in a much more rigorous fashion.
“The requirements spec must be mapped to the verification plan, which stipulates that individual requirements must have their own verification tests and regulation metrics. That way you can check that every requirement is being met and verified to work.
“This is all part of the verification process we’re familiar with, but at a much higher quality level. And, as you can imagine, formal fits into this scenario perfectly. Ever since OneSpin was launched out of Infineon, we have known these things.”
“And the random design flow?” I asked.
“The random side of the puzzle,” Kelf replied, “is what most people think safety-critical design implications are all about.
“Every semiconductor device will be interrupted by a random event at some point – radiation from the sun, EM interference, and so on – that might cause a bit to flip in some part of the design, including the memories, and this can cause a random error.
“But for an automotive company to use your chip, you’ve got to prove that the device is completely fail-safe, that no random fault can successfully interrupt correct operation.”
“To do that,” I noted, “the aero industry has depended on redundancy.”
“Yeah,” Kelf responded, “in an aircraft you have physical space and available power to operate redundant systems that can switch in, if the main system shows an error. If one system fails, the overall control center sends an alarm, causing a switch.
“But that [strategy] is very expensive in terms of area and performance at the chip level, making it hard to implement for automotive systems. However, the liability around these chips is so great, those expenses may be necessary for critical components.
“To be honest, the ISO 26262 spec is about having a standard that you, as a chip provider, can point to and say: ‘Look, this is the state of the art in safety for chips, and we’ve met the spec and done everything we can, so we are not liable for a problem.’
“Of course, there’s more to it than just meeting the spec. The real question for the chip designer is how to create the most efficient chip, which is particularly important in the control systems for cars that will depend on these devices.”
“For instance,” Kelf continued, “take a memory. Now add a few extra cells into that memory, such that if a bit is flipped, there is enough information to decode the original value upon a data read. This strategy allows you to do a full correction, at least for single bit flips, without having to implement full redundancy.”
“How does that work?” I asked.
“The common mechanism is to use Hamming codes,” Kelf said. “An 8-bit state value, for instance, may be encoded into a 10-bit number that gets stored. When the system decodes the value, it takes the 10-bit number and converts it back into the original 8-bit state.
“This way, if a fault does occur, it will be spotted and the control system generates an alarm signal, even for multiple-bit flips. For a single bit flip, the Hamming code may be decoded back to the original, correct number.
“If, however, the controller detects a 2-bit error, it can still generate an alarm signal but it cannot recover the number and the system has to operate with the error. Happily, it’s an extremely remote occurrence to get a 2-bit error.”
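To make the mechanism Kelf describes concrete, here is a minimal Python sketch of a single-error-correcting, double-error-detecting (SECDED) Hamming code. It is an illustration only, not how any particular automotive controller is built: it uses the textbook extended Hamming(8,4) code, which protects 4 data bits with 4 check bits, rather than the word widths mentioned in the conversation, and the function names `encode` and `decode` are invented for this example.

```python
def encode(nibble):
    """Encode a 4-bit value as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds an overall parity bit; indices 1..7 form a Hamming(7,4)
    codeword with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the whole word even parity
    return code


def decode(code):
    """Return (status, value); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)                        # work on a copy
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                  # XOR of the positions of all set bits
    parity_error = sum(code) % 2 == 1        # odd overall parity: odd number of flips

    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        # Treated as a single flip: the syndrome is the flipped position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        # The error is detected (raise the alarm) but cannot be repaired.
        return "uncorrectable", None

    value = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, value


word = encode(0b1011)
word[5] ^= 1                 # one random bit flip
print(decode(word))          # ('corrected', 11)
word[2] ^= 1                 # a second flip in the same stored word
print(decode(word))          # ('uncorrectable', None)
```

The single flip is repaired silently, while the double flip can only be flagged, which matches the behavior described above. Real memories use wider codes of the same family, so the check-bit overhead per data bit is lower than in this toy example.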
Kelf was reassuring: “You and your car wouldn’t even know if a 1-bit error occurred.
“If you drive, for instance, from Munich to Hamburg in a typical manner, the system might receive anywhere from 1 to 10 alarms, all of which will be self-corrected. The companies who make these systems have been at this for so long, they really understand how to handle these things.
“It’s true, with the advent of finFET devices and other semiconductor technology, the system is more likely to propagate errors, and they are that much more difficult to deal with. Nonetheless, the technology is getting better and better, and is becoming far less error prone.”
Adding to his message of reassurance, Kelf said, “With the formal tools from OneSpin, designers can access 7 or 8 different [verification] mechanisms, everything from systematic flows, to tools for fault injection and diagnostic coverage.”
Our conversation ended with a discussion of errors that are introduced in manufacturing.
Per Kelf, “In manufacturing, of course, fault analysis is very important, and a lot of these techniques came from Design for Test [DfT] solutions.
“With formal verification, we can actually eliminate the need to create a lot of stimulus, and can speed up the fault-simulation process by eliminating, or pruning out, those faults that don’t propagate.
“In adding fault detection and analysis, we are combining features to provide a fault elimination flow. It’s quite a dramatic development, and one that our customers see as speeding up the verification process and improving the outcome.”
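The idea of pruning faults that never propagate can be shown with a toy Python sketch. Everything in it is an assumption made for illustration: the three-input circuit, the net names `n1`, `n2` and `y`, and the helper `propagates` are hypothetical, and a brute-force sweep over all inputs stands in for the formal analysis a real tool would perform. It is not a description of OneSpin's implementation.

```python
from itertools import product


def circuit(a, b, c, fault=None):
    """Tiny combinational block; `fault` is (net_name, stuck_value) or None."""
    def net(name, value):
        # Override the net's value if a stuck-at fault is injected on it.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    n1 = net("n1", a & b)
    n2 = net("n2", n1 & c)       # redundant term: n1 | (n1 & c) equals n1
    return net("y", n1 | n2)


def propagates(fault):
    """True if some input makes the faulty output differ from the good one."""
    return any(
        circuit(a, b, c) != circuit(a, b, c, fault)
        for a, b, c in product((0, 1), repeat=3)
    )


# Stuck-at-0 and stuck-at-1 faults on every net.
faults = [(name, value) for name in ("n1", "n2", "y") for value in (0, 1)]

to_simulate = [f for f in faults if propagates(f)]
pruned = [f for f in faults if not propagates(f)]

print(pruned)             # [('n2', 0)]
print(len(to_simulate))   # 5
```

Here the stuck-at-0 fault on `n2` can never reach the output, so it can be dropped from the fault list before any fault simulation is run. On a real design the input space is far too large to enumerate, which is where a formal proof of non-propagation would take the place of the exhaustive loop.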
“We all want to drive cars that are safe and reliable,” he concluded. “I say that not just as someone associated with OneSpin, but as someone who also wants my car to be safe and reliable.
“We’ve all got a lot invested in making ISO 26262 a success, no matter how complicated it is. Our work is important – both for you and for me.”