I make this decision on almost every project my team works on. Our policy is to do a full test pass on the final FW and HW revisions before we release a product. The reality is that we are often pressed for time and must make a risk assessment on the "small change" that the FW team made at the last minute. We often choose to do a focused regression pass rather than the full test sweep, but we do it knowing we are taking a risk, and we make sure that the project team agrees. We build industrial products, not medical or military, so the risk is more contained.
I agree with you: the best advice I've ever gotten has been people telling me their horror stories. Many a time when stuck on a problem, I'd reflect on some similar problem I had been told about, and it would lead to a solution.
One time, just before I left for a customer's site across the country to install some software I had written, my manager told me about an instance where he had found that the hardware engineer had set all the interface cards to the same address.
Later, at the job site, my program was working, but every once in a while the system would mysteriously crash. I had developed the software without the benefit of a computer that ran the OS, and without access to the hardware itself. Since I had only the spec sheets, I wrote the program very conservatively. Because it was interrupt driven, and I didn't know how long after an interrupt arrived the background process would get control to deal with the data, I placed everything in ring buffers.
After the program crashed again, I checked the buffers and found status values that should not have been there. Remembering the problem my manager had, I chased down the hardware engineer and had him check the interface board addresses. They had several modems in addition to the serial interface I was talking to. It turned out one of the modems had the same address. When I'd query the hardware, more often than not, a modem would respond instead of the serial interface.
What was crashing the system was that eventually an interrupt would be missed and the system would hang waiting for a return that would never come.
I doubt that it would have occurred to me to have the addresses checked if my manager hadn't just happened to mention a similar problem he had solved.
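The ring-buffer approach from the interface-card story can be illustrated with a minimal single-producer, single-consumer queue in C: the interrupt handler calls `ring_put` and the background task calls `ring_get`, so the two never have to run in lockstep. This is a sketch under stated assumptions, not the original program. The names `ring_t`, `ring_put`, `ring_get`, `ring_count` and the 16-byte size are hypothetical, and it assumes a single-core microcontroller where 32-bit loads and stores are atomic; on a multicore target, C11 atomics with acquire/release ordering would be needed instead of `volatile`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size; a power of two lets the index wrap with a simple mask. */
#define RING_SIZE 16u
static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0u, "RING_SIZE must be a power of two");

typedef struct {
    volatile uint8_t data[RING_SIZE];
    volatile uint32_t head; /* free-running; written only by the producer (ISR) */
    volatile uint32_t tail; /* free-running; written only by the consumer (background task) */
} ring_t;

/* Called from the interrupt handler. Returns false and drops the byte if full. */
static bool ring_put(ring_t *r, uint8_t byte)
{
    uint32_t head = r->head;
    if (head - r->tail == RING_SIZE) {
        return false; /* full: a real driver might count this as an overrun */
    }
    r->data[head & (RING_SIZE - 1u)] = byte; /* store the data before publishing it */
    r->head = head + 1u;
    return true;
}

/* Called from the background loop. Returns false if nothing is waiting. */
static bool ring_get(ring_t *r, uint8_t *out)
{
    uint32_t tail = r->tail;
    if (r->head == tail) {
        return false; /* empty */
    }
    *out = r->data[tail & (RING_SIZE - 1u)];
    r->tail = tail + 1u;
    return true;
}

/* Number of bytes waiting; unsigned subtraction stays correct across index wraparound. */
static uint32_t ring_count(const ring_t *r)
{
    return r->head - r->tail;
}
```

Because each index is written by only one side, no lock is needed between the ISR and the background task. A side benefit, as the story shows, is that whatever the hardware actually delivered stays in the buffer, so unexpected status values can be inspected after a crash.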
A friend of mine worked on missile development somewhere in the world. When they did a small navigational software update, they re-ran all the tests with the missile attached to the aircraft wing, on the ground. When a customer fired one of these missiles in a war zone (can't go into details), it went ballistic, hit something it shouldn't have, and caused a huge diplomatic fuss. Investigation showed that in a strong crosswind the navigation loop didn't complete before it was interrupted by the next sensor update, and the missile effectively got lost. An air test would probably have shown up the fault. The moral is, as NASA always says, test as you fly, and fly as you test.
I don't know why I didn't send the technician back to test the entire circuit card. I guess I just thought that the simple timer circuit could not have affected other circuits. I got really chewed out for that mistake. Needless to say, I never made that mistake again. That was fairly early on in my career and the first time I had ever redesigned a circuit in a piece of flight hardware.
I'm sure that there are plenty of engineers who have made mistakes that they will never tell anybody about. Telling this story can't affect my career as I'm retired, but it might prevent some young engineer from making the same mistake.
Regression testing is necessary for everything. Rockwell Automation routinely delays firmware releases because of regression test failures. One can grumble about the delay (and I do), but I am also very appreciative that Rockwell DOES do exhaustive regression testing.
When I was with a semiconductor company, we used to see similar issues when components went through a die revision. The new part would meet all the specifications of the old part, but somewhere down the line there was a circuit relying on an unspecified feature of the old part, and suddenly that circuit wouldn't work any longer.
The worst-case example is a customer I had who ran the part above its absolute maximum supply rating. He tested and burned in the parts and was happy to accept a small failure rate, because it saved him from buying a regulator for his board. A new die revision no longer allowed him to be so frugal.
Andrew, I am really surprised at the testing done on your change. I worked on spacecraft and other military systems, and in my experience any change would require a complete regression test, especially in a weapon system or a man-rated system.