3. Jumping to conclusions. In failure analysis, comparing parts or assemblies that failed with ones that didn't can be a very valuable exercise. However, there is a temptation to latch onto any difference and immediately assume that it was responsible for the failure. You may even have a reasonable-sounding theory that explains how this difference could have caused the failure.
Unfortunately, the fact that something sounds reasonable is no guarantee that it is true. You can't determine the root cause of a failure by simply thinking or talking about it. You need to test your theories against reality. Without data to back it up, a reasonable-sounding theory is just a reasonable-sounding theory.
4. Collecting data instead of thinking. Instead of jumping to conclusions, some engineers go to the opposite extreme: They try to perform every test and collect every piece of data possible, regardless of how relevant it may be to the problem at hand. Of course, this provides them with a ready response when managers ask why the problem hasn't been solved yet: They're waiting for test results.
Clearly, it's important to have data, and too much is better than not enough. But before performing a test, ask yourself: What information do I expect to get from this test? What questions will this information allow me to answer? What are the limitations of this test? Thinking clearly about your test plan before you start testing will help you to stay focused on the root cause.
5. Throwing the kitchen sink at the problem. Often, there are competing theories as to why a part failed. Based on these theories, different engineers may suggest three or four possible changes to the design or manufacturing of the part, which might (or might not) fix the problem. Pressure from management to "just get it fixed" might mean forgoing testing to determine which theory is correct, implementing all of the changes at the same time, and hoping one of them works.
Sometimes, this results in the problem going away, and the engineers look like heroes to management for solving the problem in such a timely manner. But really, nothing has been learned. Worse, in the tribal knowledge of the engineering organization, one of the changes, which actually had no effect, may undeservedly get the credit for fixing the problem. Once this becomes part of the conventional wisdom, it may prove very difficult to dispel. Learning the wrong lesson from a failure can be worse than learning nothing at all.
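One way to avoid learning the wrong lesson is to test the proposed changes in every on/off combination rather than all at once. The following is only a minimal sketch of that idea (a two-level full factorial plan), not a procedure prescribed here; the change names and the helper functions `full_factorial` and `main_effects` are hypothetical and chosen purely for illustration.

```python
from itertools import product


def full_factorial(changes):
    """Return every on/off combination of the candidate changes."""
    return [dict(zip(changes, levels))
            for levels in product([False, True], repeat=len(changes))]


def main_effects(changes, runs):
    """Estimate each change's main effect on the measured failure rate.

    `runs` is a list of (settings, failure_rate) pairs, one per build.
    The effect of a change is the mean failure rate of the builds that
    include it minus the mean failure rate of the builds that do not.
    """
    effects = {}
    for change in changes:
        with_change = [rate for settings, rate in runs if settings[change]]
        without = [rate for settings, rate in runs if not settings[change]]
        effects[change] = (sum(with_change) / len(with_change)
                           - sum(without) / len(without))
    return effects


# Hypothetical candidate changes; none of these come from a real case.
CHANGES = ["thicker_wall", "new_heat_treat", "larger_fillet"]
plan = full_factorial(CHANGES)  # 2**3 = 8 builds to make and test
```

Each entry in `plan` describes one build. Once real failure rates have been measured for all eight, passing them to `main_effects` gives a rough indication of which change mattered; a change with an effect near zero should not get the credit for the fix.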
I worked with an engineer who said, "If I can break it, I can fix it," meaning that if he could reproduce the exact failure, he had a pretty good handle on what failed and how to keep it fixed.
@Tigertom: That's a very good point. For those of us who are materials engineers, there's a temptation not only to take apart assemblies, but to cut parts up so that we can look at the microstructure. We end up with beautiful micrographs, but the original part falls victim to the chop saw.
As you point out, it's very important to get all of the information you can before taking apart an assembly. Once you get to the component level, it's also important to get all of the information you can from non-destructive testing before proceeding to destructive testing.
More than once, I've been in the position of realizing that I wanted to check something on a part after I had already performed a destructive test on it. As Homer Simpson says: D'oh!
Can I suggest one more big mistake to add to your list?
6. Quickly dismantle a failed assembly. If you have an assembly that doesn't work, it's very tempting to take it apart to see what's broken. You probably have one or two theories as to what might be broken inside. But if you dismantle it and nothing is broken, then you're in real trouble. When you re-assemble, the chances are it will work perfectly, and you've destroyed the bug you've been commissioned to identify.
Instead, before you dismantle, get every relevant bit of information you can from the failed assembly. What are the resistances and capacitances at the terminals? What is the frictional torque needed to move it? How much does it weigh? Does it rattle when shaken? If possible, x-ray it. Develop a list of failure modes that could produce the observed symptoms, and see if you can prove or disprove any before dismantling. As you dismantle, measure the torque on bolts, and look for dirt, misassembled components, and parts that have moved to unexpected positions. Once the disassembly is complete, all these clues will have been lost.
Dave, I couldn't agree more. Thanks for a dose of sanity. I too have been part of similar investigative teams. As noted, one of the biggest issues that pops up is getting management (or the customer) to be patient while the investigation proceeds. There are no shortcuts to a good analysis.
Great article, Dave. After reading each of your points, 1 through 5, I found myself thinking back to many different scenarios over the years. One that loudly resonates is touched upon in both your #2 and your #5: jumping to conclusions, and management pressure to fix it quickly. Many times, I have dealt with a manager who forced his suggestion to be the fix, without going through the necessary trials to prove it. I preach again and again, "a sample of one doth not constitute a statistical lot".
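To put a rough number on the "sample of one" point: if n trials of a proposed fix all pass, the exact one-sided upper confidence bound on the true failure probability is 1 - (1 - confidence)^(1/n). The sketch below assumes independent pass/fail trials with zero observed failures; the function name `upper_failure_bound` is made up for this illustration.

```python
def upper_failure_bound(n_trials, confidence=0.95):
    """Upper confidence bound on failure probability after n_trials
    independent trials with zero failures (exact binomial bound)."""
    if n_trials < 1:
        raise ValueError("need at least one trial")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


# One passing trial only rules out failure rates above about 95%.
one_trial = upper_failure_bound(1)
# Thirty passing trials bring the bound down to roughly 10%.
thirty_trials = upper_failure_bound(30)
```

In other words, a single successful trial of the manager's fix is consistent with a fix that still fails most of the time.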
Product failure analysis covers two different types of products: those that have been working properly for a long time, and those that have no history of ever having worked. The failure analysis of the two types would be a bit different, at least after the start. The first question would be "Did it ever work correctly?", since if it did not, then the design may be suspect. But it is also possible that the design is good but the part was not made to the design. Amazingly, not every design is produced faithfully the first time.
The conclusion, then, is that in order to correctly understand why some part failed, it is mandatory to understand just how the system including that part was supposed to work. Having an adequate understanding of a system is seldom a trivial task, but it is important. In the majority of instances, a part fails because it was subjected to forces beyond its strength. At that point the question becomes: Was the part made to the design specification, and was the specification adequate? Again, in order to answer correctly there must be an adequate understanding of the system.
Interestingly enough, sometimes the problem is caused by there not being an adequate understanding of the system from the very beginning. And I am not sure how to solve that problem.
One of the other key points, I think, when solving problems is not to fixate on one area while ignoring another. If you are in design, don't automatically focus on whether the part is to print and then point the finger at quality. If you are in quality, don't ignore whether the part is to print and focus only on the design.
True problem solving is a skill that takes a lot of patience and discipline. You must let the data lead you but still be open to engineering judgment and insight. It also helps to remember that the problem is that the part is breaking. We are all in this together, trying to solve that problem, not pointing fingers at who caused it.