Great sleuthing for sure. And I applaud the solution of using reference channels and checking software. That sounds a bit similar to one of the "data qualifying" steps that we once put into a system that could not tolerate being wrong, ever. It is a completely valid solution.
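The reference-channel check praised above can be illustrated with a minimal Python sketch. It is not the original author's code, which is not described here. It assumes a multiplexer with one or more positions wired to known reference voltages, a `read(channel)` function, and a `reset_mux()` routine. All names, tolerances, and the retry count are hypothetical. A scan is accepted only if the reference channels read correctly both before and after it. Otherwise the mux is reset and the scan is retried.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical sketch: channel numbers, voltages, and tolerances are assumptions.

@dataclass
class ReferenceChannel:
    channel: int       # mux position assumed to be wired to a known reference
    expected: float    # expected reading, in volts
    tolerance: float   # allowed deviation, in volts

def reference_ok(read: Callable[[int], float], ref: ReferenceChannel) -> bool:
    """Return True if the reference channel reads within tolerance."""
    return abs(read(ref.channel) - ref.expected) <= ref.tolerance

def qualified_scan(
    read: Callable[[int], float],
    reset_mux: Callable[[], None],
    channels: List[int],
    refs: List[ReferenceChannel],
    max_retries: int = 3,
) -> Optional[Dict[int, float]]:
    """Scan channels, bracketing the scan with reference checks.

    If any reference is out of tolerance before or after the scan, the data
    are discarded, the mux is reset, and the scan is retried.
    Returns None if every attempt fails.
    """
    for _ in range(max_retries):
        if not all(reference_ok(read, r) for r in refs):
            reset_mux()
            continue
        data = {ch: read(ch) for ch in channels}
        if all(reference_ok(read, r) for r in refs):
            return data
        reset_mux()
    return None

class FakeMux:
    """Simulated mux whose selection is off by one position until reset."""
    def __init__(self, values: List[float]) -> None:
        self.values = values
        self.stuck = True

    def read(self, ch: int) -> float:
        # While stuck, the mux returns the neighbouring channel's value.
        if self.stuck:
            return self.values[(ch + 1) % len(self.values)]
        return self.values[ch]

    def reset(self) -> None:
        self.stuck = False

# Usage with hypothetical values: channels 0 and 3 are the references.
mux = FakeMux([0.0, 2.5, 1.1, 3.3])
refs = [ReferenceChannel(channel=0, expected=0.0, tolerance=0.05),
        ReferenceChannel(channel=3, expected=3.3, tolerance=0.05)]
print(qualified_scan(mux.read, mux.reset, [1, 2], refs))  # {1: 2.5, 2: 1.1}
```

Bracketing the scan with reference reads, rather than checking only once, is a design choice in this sketch: it also catches a selection fault that occurs partway through a scan.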
The problem sounds like a missing shield ground connection, most likely in the area of the multiplexer selection control lines, or perhaps in the selection signal logic area. Of course, it could also have been a "slightly bad" connection in the channel selection reporting area, which was probably in a less protected section of the board. That type of problem may also have been on the system backplane, if there was one. It would be interesting to look at that retired hardware now and see what the problem was.
I'd like to use this story ("Extremely likely ESD was the source of the problem") for an upcoming Sherlock Ohms blog. You game? We would need your name and a short bio (two or three sentences). Also, it would help to give a bit of background on the setting: what type of company, your role, etc.
If you're willing, please let me know at rob.spiegel@ubm.com
I appreciate everyone's comments. We, too, were not happy with abandoning the hunt before the root cause was found, but this was just one of many issues with these systems in that environment. Considering all of the constraints, we were forced to apply a bit of Engineering Triage and forge ahead.
I am kind of bummed the root cause wasn't discovered as well. I know most everybody is looking at ESD as the likely cause, but with so many other options it would be interesting to know. What I find most interesting is how it appears to be just the one position that was affected by ESD. Any idea why just that spot, or did I miss it in the article?
Extremely likely ESD was the source of the problem.
Had a similar problem with ESD on a system with just two boards mounted to an isolated, common, unpainted metal chassis (connected directly together). ESD to the chassis would corrupt the processor's operation and/or reset the system. We tried extensive shielding, grounding experiments, routing all signals on inner PCB layers with ground pours on the outer layers, etc. Nothing improved the system's ability to handle ESD events. The system just couldn't handle anything above 4 kV discharges (we needed a minimum of 8-10 kV, human body model, for CE certification).
Final solution: a small amount of resistance (33-75 ohms) in series with every signal running between the two boards (except power and ground). It didn't affect the intended signals (there was minimal capacitance on these lines). So a bunch of very small SMT resistor packs (four resistors per 1206-size package) were installed.
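Why a few tens of ohms leaves the intended signals essentially untouched can be checked with a first-order estimate. This Python sketch treats each series resistor plus the receiving line's capacitance as a single RC low-pass and computes its 10%-90% rise time, t_r = ln(9)·RC ≈ 2.2·RC. The 20 pF load capacitance is an assumed, illustrative value (the comment only says the capacitance was minimal), and the model ignores driver output impedance and trace inductance. It does not model the ESD coupling itself.

```python
import math

# First-order estimate only: series R driving an assumed lumped load C.
# Ignores driver output impedance, trace inductance, and transmission-line effects.

def rc_rise_time_ns(r_ohms: float, c_farads: float) -> float:
    """10%-90% rise time of a single-pole RC low-pass, in nanoseconds.

    For v(t) = 1 - exp(-t/RC), the 10% and 90% crossings occur at
    RC*ln(10/9) and RC*ln(10), so t_r = RC*ln(9), about 2.2*RC.
    """
    return math.log(9) * r_ohms * c_farads * 1e9

ASSUMED_LOAD_C = 20e-12  # hypothetical 20 pF of line plus input capacitance

for r in (33, 75):  # series resistor range mentioned in the comment
    print(f"{r} ohm: tau = {r * ASSUMED_LOAD_C * 1e9:.2f} ns, "
          f"rise time = {rc_rise_time_ns(r, ASSUMED_LOAD_C):.2f} ns")
# 33 ohm: tau = 0.66 ns, rise time = 1.45 ns
# 75 ohm: tau = 1.50 ns, rise time = 3.30 ns
```

Under these assumptions, even the 75-ohm case adds only a few nanoseconds of edge time, which is small next to, say, the 100 ns period of a hypothetical 10 MHz signal. The real margin depends on the actual capacitance and signal rates, which the comment does not give.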
At the time, there were no ESD protection devices small enough, and with low enough stray capacitance, to fit in the space we had.
Apparently, the slight decoupling (detuning/termination?) of these signal traces kept the energy being discharged through the isolated chassis from being picked up by the processor circuitry. It is very hard to keep all grounds in a multiboard system at the same potential when extreme rise times are involved.
After the change, we could handle 15 kV!
In places with very low humidity (the Arctic or Arizona), I have seen ESD in excess of 100 kV. So even a well-engineered system can see extremes beyond expected levels of abuse.
I would bet you'd find that the computer case wasn't really grounded to the motherboard (paint on the screw bosses), or that the I/O board shield wasn't tied to the computer shield, or something like that.
I agree that ESD is the likely cause, and it MAY have been cheaper to just build a mudroom outside the entrance so the parkas could be left there... but with improper grounding and shielding, you would still run the risk of unexplainable transient events.
Static discharge certainly appears to be the initiator here but I don't agree that it is the cause. Static electricity is a normal and expected part of our environment and, while we could mitigate it somewhat by adding humidification, banning parkas on the premises, requiring the use of static dissipative devices, etc., none of these really addresses the true problem, which is that for whatever reason the multiplexer circuit was not designed to tolerate the higher level of static found in the arctic environment. What the author did by programming in error checking and automatic reset made the system fault tolerant, which is what the circuit designer should have done in the first place.
Incidentally, I would like to thank the author for the spelling lesson. The use of multiplexor (with an 'o') raised an eyebrow and caused me to look it up. Turns out multiplexor and multiplexer are both technically correct spellings.