I worked for a contract manufacturer building extremely sensitive sensors for a large customer on a build-to-print order. These sensors had a finite life, and they were replaceable in the field. We had received several production orders and were well into delivering them when we started to see field returns. The reported defect was "erratic behavior in service."
In service, the sensors were mostly in stable environments with no inputs. The output, therefore, was supposed to be constant. When a sensor’s output was not constant, it was flagged as "erratic" in the field, removed from the system, and returned to us for evaluation and repair. A spare was then installed.
When we received a field return, it was visually inspected for damage and then put into a storage oven. Sensors were always stored at operating temperature so they could be tested more quickly; a sensor stored at room temperature needed a long stabilization period before it could be tested. Engineering would review the reason for the return, look at the field history, and then prepare a test plan to confirm the reported condition (or not).
As time went by, we received more field returns. A pattern emerged -- we were able to confirm the failure in very few cases. Mostly, the sensors worked as specified during the evaluation testing, so they were then routed to receive a full final acceptance test and were reshipped. There were a few confirmed failures that did not have an obvious cause, so we did the simplest repair, replacing the external electronics, and retested to confirm that the sensor was fixed.
Since we kept receiving sensors back, we looked more carefully at their field history. We discovered that in every case the failures occurred in sensors that were installed in "Gen 2" systems. By and large, they became erratic weeks or months after being put into service, but they never failed in "original design" systems.
The Gen 2 change was meant to be both a cost reduction and a reliability improvement -- point-to-point wiring was replaced with cable harnesses, which eliminated many hours of hand wiring and soldering, as well as human errors in wire routing. But how would that cause sensors to fail? A good number of engineers -- both at the customer and at our plant -- were trying to figure this out.