3. Jumping to conclusions. In failure analysis, comparing parts or assemblies that failed with ones that didn't can be very valuable. However, it is tempting to latch onto any difference and immediately assume that it caused the failure. You may even have a reasonable-sounding theory that explains how this difference could have caused the failure.
Unfortunately, the fact that something sounds reasonable is no guarantee that it is true. You can't determine the root cause of a failure by simply thinking or talking about it. You need to test your theories against reality. Without data to back it up, a reasonable-sounding theory is just a reasonable-sounding theory.
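One quick way to check whether a difference between failed and good parts deserves attention at all is a contingency-table test. The sketch below is an illustration added here, not a method from this article: it assumes you have simply counted how many failed and how many good parts show the difference, and it applies a one-sided Fisher exact test using only the Python standard library. The function name `fisher_one_sided` and the part counts are hypothetical.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test on a 2x2 table of part counts.

    a = failed parts WITH the difference,  b = failed parts WITHOUT it
    c = good parts WITH the difference,    d = good parts WITHOUT it

    Returns P(X >= a), where X is the number of failed parts showing the
    difference, assuming the difference is unrelated to failure and all
    row and column totals are fixed (a hypergeometric distribution).
    """
    n_failed = a + b          # total failed parts examined
    n_with = a + c            # total parts showing the difference
    total = a + b + c + d     # all parts examined
    upper = min(n_failed, n_with)
    # Sum the hypergeometric probabilities for outcomes at least as extreme as a.
    hits = sum(
        comb(n_with, x) * comb(total - n_with, n_failed - x)
        for x in range(a, upper + 1)
    )
    return hits / comb(total, n_failed)


# Hypothetical comparison: 2 of 3 failed parts show the difference, 1 of 3 good parts does.
print(fisher_one_sided(2, 1, 1, 2))  # 0.5
# Even a perfect split on this few parts only reaches 0.05.
print(fisher_one_sided(3, 0, 0, 3))  # 0.05
```

With only a handful of parts, a difference that looks suggestive can easily arise by chance. And a small p-value only shows an association: it still doesn't prove the mechanism, which has to be confirmed by testing the theory directly.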
4. Collecting data instead of thinking. Instead of jumping to conclusions, some engineers go to the opposite extreme: They try to perform every test and collect every piece of data possible, regardless of how relevant it may be to the problem at hand. Of course, this provides them with a ready response when managers ask why the problem hasn't been solved yet: They're waiting for test results.
Clearly, it's important to have data, and too much is better than not enough. But before performing a test, ask yourself: What information do I expect to get from this test? What questions will this information allow me to answer? What are the limitations of this test? Thinking clearly about your test plan before you start testing will help you to stay focused on the root cause.
5. Throwing the kitchen sink at the problem. Often, there are competing theories as to why a part failed. Based on these theories, different engineers may suggest three or four possible changes to the design or manufacturing of the part, which might (or might not) fix the problem. Pressure from management to "just get it fixed" might mean forgoing testing to determine which theory is correct, implementing all of the changes at the same time, and hoping one of them works.
Sometimes, this results in the problem going away, and the engineers look like heroes to management for solving the problem in such a timely manner. But really, nothing has been learned. Worse, in the tribal knowledge of the engineering organization, one of the changes, which actually had no effect, may undeservedly get the credit for fixing the problem. Once this becomes part of the conventional wisdom, it may prove very difficult to dispel. Learning the wrong lesson from a failure can be worse than learning nothing at all.
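One way to avoid the kitchen-sink trap when several candidate changes are on the table is to build a small designed experiment instead of applying everything at once. Applying all changes together compares only the "all on" and "all off" builds, so every change gets the same share of the credit. The sketch below is an illustration added here, not the author's method: it assumes three hypothetical changes labeled A, B, and C, one build for each of the 2^3 = 8 on/off combinations, and a measured failure count per 100 parts for each build. It estimates each change's main effect as the mean result with the change minus the mean without it. The function name and the data are invented for illustration.

```python
from itertools import product


def main_effects(factors, results):
    """Estimate main effects from a full two-level factorial experiment.

    factors: list of change names, e.g. ["A", "B", "C"].
    results: dict mapping a tuple of 0/1 settings (one entry per factor,
             1 = change applied) to the measured failures per 100 parts.
    Returns a dict: factor -> (mean with change) - (mean without change).
    """
    effects = {}
    for i, name in enumerate(factors):
        with_change = [r for setting, r in results.items() if setting[i] == 1]
        without = [r for setting, r in results.items() if setting[i] == 0]
        effects[name] = sum(with_change) / len(with_change) - sum(without) / len(without)
    return effects


# Hypothetical data: only change B matters (20 failures per 100 without it, 2 with it).
factors = ["A", "B", "C"]
results = {s: (2 if s[1] == 1 else 20) for s in product([0, 1], repeat=3)}
print(main_effects(factors, results))  # {'A': 0.0, 'B': -18.0, 'C': 0.0}
```

Real failure data are noisy, so in practice each build would need enough parts (or replicate runs) for the differences to stand out. When full factorials get too large, fractional factorial designs can cut the number of builds at the cost of some confounding between effects.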
I always enjoy your posts, Dave. They take on a real problem and put it into real-world context without any of the hype or confusion that can cloud the issues. This is a perfect example. There may not be any "rocket science" takeaway here, but there is some really sound advice for engineers on the common traps most fall into when trying to find the root cause of a part failure, or better yet, when trying to avoid failure in the first place.
Although I haven't been involved in formal failure analysis, I have often been called to troubleshoot problems. Often the most senior person involved 'declared' what the root cause was. After I finished my troubleshooting, I had often proven that the 'expert' was wrong. Seniority doesn't automatically mean that you know all of the intricacies.
I agree with Beth. Dave, thanks for such a clear overview. The principles you discuss here seem simple and obvious in hindsight, yet they can easily be forgotten even by well-educated and well-trained pros. They parallel some of the basic electrical system troubleshooting principles I learned from one of my engineer buddies years ago, which I apply mostly to my multi-component stereo system.
@Beth: Thanks for your kind comments. You're right that none of this is rocket science; it's just rational thinking. However, when parts break, people understandably get upset. Emotions can run high, and there may be a tremendous amount of pressure. As Ann points out, under these conditions, even intelligent and highly educated individuals may start to behave irrationally. The most important thing is to stay calm and focused -- especially when others aren't.
@GlennA: Experts, in particular, are susceptible to the temptation to jump to conclusions. The more experience you have, the more likely it is that a given problem resembles one you have encountered before. But that doesn't necessarily mean it's the same problem! Sometimes experience can be just as blinding as ignorance.
@TJ McDermott: You're right that sometimes time constraints can force you into a "kitchen sink" response. However, in these cases, it may be a good idea to continue investigating even after the "kitchen sink" solution has been implemented in order to determine the real root cause. Who knows? Maybe you can make yourself look like a hero for a second time by coming up with a cost savings when you realize that 2/3 of the kitchen sink solution was unnecessary.
These principles also hold true for failure analysis in electronics. I worked for a semiconductor company for years as a product and test engineer, and I recognize most of these scenarios as having happened at one time or another. One of the most interesting places in a semiconductor plant is the F.A. lab, which is usually where customer returns are evaluated. And of course, when parts start failing on the production line, the first thing everyone tries to blame is the test set; it never occurs to them that their process might have shifted...
I am so glad I read this article. I am an engineer and have faced this problem many times, and this article really helps me correct my mistakes. Everyone should read this blog.
Dave: No doubt this could be a case of finger-pointing at its finest. I think the points you made are critical: engineering teams need to sit back, take a deep breath, and dive into the problem rather than attack it without a plan.
And yes, I have been in all of the above situations. My favorite is getting a part in a box and being asked why it broke, only to find that the part is fully functional...
While all of this is sound advice, I personally still keep an open mind for problems that can be fixed quickly by one of these sinful actions. Countless times I have attached 100 probes and just measured data... and voilà, ten minutes later I know the solution.
It did bite me once, when the issue was not the design or a problematic part but rather EMI. You see, probes can make the EMI issue go away...
I liked the article as well. Currently I find myself trying to get to the bottom of a lot of failures. I find your article interesting because some of the things you suggest not to do are exactly what we are doing. Our focus tends to start with understanding how big the problem is. That's not because we don't want to fix everything, but because we don't have unlimited resources and want to get the most bang for the buck. We try to get data and group the failures by root cause, and then focus on whether the part is to print. Quite often the failures are caused by the part not being to print. Once the part is to print and the variability is taken out of the system, the root cause of the design failure can be attacked and improved. However, if the process isn't capable of making parts to print, it doesn't matter how good the design gets, because you will still have problems.