Great point, William - error handling can make a huge difference in system operation. Sometimes it takes a while for a specific error to show up, and then error-handling code is introduced after the fact... It can be hard to anticipate all of the possible failure modes and to have code written up front to handle every scenario. Windows OSs are classic examples of this concept!
Thanks for elaborating, Jim. As a test engineer, I have often run into what some people would call obvious failures, only to find that the issues were much more subtle - the obvious failure was merely a symptom of a much more complex issue that could be related to either hardware OR software. That is the challenge of electronics - the obvious answer is not always the correct one.
Most systems that fail to allow for an exception will perform adequately, or even quite well, until that exception occurs. Then there is a failure. If the system is robust enough there may be an automatic recovery; otherwise a wander-off, or a crash. The crash is what your system did, although it sounds like it was a "wander off then crash" mode. The challenge is, and has been, to handle the exceptions correctly.
The system design spec was good, and in this respect, if it had been met there would not have been a problem. The spec required that the digital data receiver inhibit the data input during the interrupt interval. The hardware implementation somehow missed doing what was specified, although I believe the designer thought he or she had met the requirement.
It was a hardware function that was not implemented correctly. I suspect the person who designed the circuit did the test verification that showed it worked correctly (:|), repeating a conceptual error. The system had been well tested in CA without many problems.
Nancy, you made me think more about the problem. What's not said is that the data transmission often had errors caused by the lightning, and CRC testing would catch them. Also, I would guesstimate there could be over a thousand hits a day (a "single" bolt of lightning probably created multiple data hits). At 1 ms per data packet there were almost 100 million packets/day, so 1,000 packets a day being thrown out was not a flag of concern but an indication the system was working correctly.
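The packet-rate figure above is easy to sanity-check. This is just a back-of-envelope sketch of that arithmetic (the 1 ms packet time and 1,000 discarded packets/day are the figures stated in the comment, not independent data):

```python
# Sanity check: packets per day at 1 ms per packet.
MS_PER_DAY = 24 * 60 * 60 * 1000        # 86,400,000 ms in a day
packet_time_ms = 1                      # 1 ms per data packet (as stated above)
packets_per_day = MS_PER_DAY // packet_time_ms
print(packets_per_day)                  # 86,400,000 -- "almost 100 million"

discarded_per_day = 1_000               # packets thrown out by the CRC check
fraction = discarded_per_day / packets_per_day
print(fraction)                         # ~1.2e-5 -- a tiny fraction, no cause for alarm
```

So roughly one packet in 100,000 was rejected, which is consistent with the CRC working as intended rather than anything being wrong.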
With a 100 ns window of opportunity in a 1 ms packet, that suggests probably only 1 out of 10,000 hits could corrupt the CRC protection (note the lightning had to hit only in the last 100 ns, not before; if it hit before, it would be detected and thrown out by the CRC check). That in turn suggests there would be a crash only once every 10 to 100 days. As I recall, a three-week interval between crashes was once spoken of. Also, Florida was considered the lightning capital of the world (the Congo beats it out), with Tampa recording 21,000 cloud-to-ground strikes (Ju 93); cloud-to-cloud probably affected our system too. For perspective, a bolt of lightning can exceed 50 kA and have rates of change of 40 kA/s. The source voltage behind this gets very high.
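The crash-interval estimate above follows directly from those numbers. A minimal sketch of the calculation, assuming the ~1,000 hits/day guesstimate from the earlier comment:

```python
# Rough crash-interval estimate from the figures above.
window_ns = 100                  # vulnerable window at the tail of each packet
packet_ns = 1_000_000            # 1 ms packet, expressed in ns
p_corrupt = window_ns / packet_ns            # 1 in 10,000 hits slips past the CRC
hits_per_day = 1_000                         # guesstimated lightning-induced hits/day

crashes_per_day = hits_per_day * p_corrupt   # ~0.1 crashes per day
days_between_crashes = 1 / crashes_per_day
print(days_between_crashes)                  # ~10 days between crashes
```

A once-every-10-days estimate sits at the frequent end of the "10 to 100 days" range, so the recalled three-week interval between actual crashes is entirely plausible.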
I read somewhere that Florida is the most lightning-active area in the USA. I suppose one can get used to anything... And apparently the computer crash didn't happen with every thunder crash, so it's understandable why the software guys didn't catch it as being a hardware problem.
As amazing as it sounds that they couldn't see the connection between a thunderstorm and a system crash...those software guys simply could not think outside of the box. They were so busy defending the performance of the system that they couldn't see the obvious. Sometimes people get tunnel vision and need someone from the outside to point things out.