Back in the '70s, I was working for Digital Equipment Corporation (DEC). I was sent to the field to troubleshoot a PDP-8/L that had intermittent problems. When I arrived, I was told that the local service people had shown up, found a non-working fan, replaced it, and then found that the computer also failed diagnostics. But they couldn't figure out what was wrong.
This was back in the days of loading a bootstrap program from front-panel switches, and then reading the final program from paper tape. One of my discoveries was that the core memory was getting corrupted. One of the simple diagnostics failed regularly at a certain point, and I could look back and find that corrupted memory had caused the problem.
I loaded the program again, and memory was corrupted even before the program started running. I loaded the program a second time, but stopped the loading midway. This time the memory was correct. When I let the load resume, the memory became corrupted. So the act of reading in the tail end of the diagnostic corrupted a memory location that wasn't even being referenced.
I repeated the standard swapping that had been done by the locals. I swapped everything -- the power supply, the core stack, every board in the computer. But the symptoms stayed the same. What was going on?
Later that evening, I was looking at a list of some of the memory locations that were getting corrupted. There was no pattern that I could discern: the failures hit various data bits, and the addresses were all over the place, with no address lines in common. But then I had the aha moment (have you guessed yet?).
The victim bits were likely physically near each other. And that immediately led to the answer: The new fan (which was very close to the core stack) had a magnetic field that was strong enough to disrupt the core memory. This field would occasionally combine with the half-select current in a particular core, and flip that bit to the opposite state.
The next day, I unplugged the fan, and all the problems went away. This same problem had puzzled engineers during the development of the computer, and a particular fan had been specified to solve the problem. Unfortunately, the locals didn't use that fan for the replacement on this project.
Tell us your experience in solving a knotty engineering problem. Send stories to Jennifer Campbell for Sherlock Ohms.