The Case of the Ridiculous Fatigue Calculations
September 23, 2011
In the early 1990s, our firm was hired to design and install several piping-fatigue monitoring systems on large-diameter gas and oil gathering piping in an arctic environment. The data acquisition system had to continuously acquire signals from 40 to 45 strain gauges at up to 500 samples per second, determine when a significant event occurred, then apply rainflow cycle counting to each event to estimate fatigue damage. An indicator was generated whenever the fatigue usage rate implied a time to failure shorter than 25 years.
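The 25-year indicator can be illustrated with a short sketch. The article does not describe how the usage rate was computed, so this assumes the simplest reading: damage accumulates linearly, and the projected life is the elapsed time divided by the fraction of fatigue life consumed. The function names and the linear-accumulation assumption are illustrative, not the original software.

```python
def projected_life_years(damage_fraction, elapsed_years):
    """Project time to failure, ASSUMING damage keeps accumulating
    at the average rate observed so far (linear accumulation)."""
    if elapsed_years <= 0:
        raise ValueError("elapsed_years must be positive")
    if damage_fraction <= 0:
        return float("inf")  # no damage observed yet
    return elapsed_years / damage_fraction


def usage_alarm(damage_fraction, elapsed_years, design_life_years=25.0):
    """True when the projected life is shorter than the design life."""
    return projected_life_years(damage_fraction, elapsed_years) < design_life_years
```

For example, consuming 1% of the fatigue life in a tenth of a year projects to roughly a 10-year life, which would raise the indicator under this assumption.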
The computational hardware available at the time was an 80386-based PC (12 MHz!) running MS-DOS and various tutti-frutti memory and disk optimizers. To get the required channel count at a reasonable cost, an ISA card 12-bit A/D was used with an external 64-channel multiplexor board. This board had a cascade of 4-to-1 multiplexor chips that, when properly addressed, would select a channel, feed it to the A/D board, request a conversion, then switch to the next channel until all channels were scanned. A short ribbon cable was used to supply the necessary timing and control signals from the A/D board in the PC to the multiplexor board. Since we only needed data at a few hundred hertz, the 20 MHz scan rate of the multiplexor provided essentially a simultaneous scan.
Strain gauges were attached to piping, wiring was run, amplifiers were installed and calibrated, and the computer hardware was installed in various control rooms. We had the systems operating by late November, but by January, one of the monitors began to throw unrealistic fatigue estimates. A review of the data streams showed that at seemingly random intervals, several of the channels would exchange places. The vagaries of the fatigue calculation resulted in ridiculously high numbers for some locations, and near zero at other locations.
Our initial investigations focused on the various bits of software that unwrapped the multiplexed channels. To break the 640 KB memory "barrier" in DOS (we needed a whopping 1.5 MB of code and data space), there were several layers of memory extenders and virtual memory hacks, and their memory-addressing schemes were among the usual suspects for this kind of fault. Not this time.
The next most likely culprit was the multiplexor board. This suspicion was intensified when a further data review indicated that the channels were rotating in groups of four channels -- which was the number of inputs for each multiplexor chip. A logic probe finally showed that occasionally, a spurious control signal would occur that caused the multiplexors to advance without the A/D board's knowledge.
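A toy model shows why this kind of slip was hard to spot in the raw data. It assumes one reading of the symptom: a spurious advance makes the four inputs of a single multiplexor chip appear rotated in the recorded scan. The function `slip_group` and that interpretation are illustrative only; the article does not document the exact failure pattern.

```python
GROUP_SIZE = 4  # inputs per multiplexor chip, per the article


def slip_group(scan, group, steps=1):
    """Return a copy of `scan` (a list of readings) in which one group
    of four channels is rotated by `steps` positions, mimicking a
    multiplexor that advanced without the A/D board's knowledge."""
    start = group * GROUP_SIZE
    block = scan[start:start + GROUP_SIZE]
    if len(block) != GROUP_SIZE:
        raise IndexError("group out of range")
    steps %= GROUP_SIZE
    rotated = block[steps:] + block[:steps]
    return scan[:start] + rotated + scan[start + GROUP_SIZE:]


# Usage: channel labels stand in for readings.
scan = list(range(8))
slipped = slip_group(scan, group=1)  # [0, 1, 2, 3, 5, 6, 7, 4]
```

Every value in the slipped scan is still a plausible strain reading; it is simply credited to the wrong gauge. That is consistent with fatigue estimates coming out far too high at some locations and near zero at others.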
But what was causing the glitch, and why on just this machine? The board vendor claimed it had never heard of such an issue, so that was no help. We started thinking that something in the hardware was bad, so we exchanged monitoring hardware, but the problem stayed with the location rather than following the hardware. We suspected there might be some local electrical noise, so we tried various shielding arrangements within the PC and on the ribbon cable, but none of them eliminated the problem. Alternative grounding schemes for data and power supply seemed to help at first, but the problem ultimately returned.
In desperation, we decided on a workaround. We tied two spare adjacent channels in one group of four to +5 V and to ground. We wrote a subroutine (we called it MuxFux) so that for every scan, the voltage of these channels could be checked to ensure that the multiplexor hadn't slipped a cog. If we discovered an error, that scan was discarded and the board was reset.
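The idea behind the check can be sketched in a few lines. This is not the original MuxFux routine, which ran under DOS and is not shown in the article. The channel indices, the tolerance, and the `reset_board` callback are all assumptions made for illustration.

```python
HIGH_CH = 0      # hypothetical index of the spare channel tied to +5 V
LOW_CH = 1       # hypothetical index of the adjacent channel tied to ground
TOLERANCE = 0.5  # volts; assumed margin for noise and amplifier offset


def scan_is_aligned(scan_volts, high_ch=HIGH_CH, low_ch=LOW_CH, tol=TOLERANCE):
    """True if the two sentinel channels read about +5 V and 0 V,
    meaning the multiplexor has not slipped for this scan."""
    return (abs(scan_volts[high_ch] - 5.0) <= tol
            and abs(scan_volts[low_ch]) <= tol)


def filter_scans(scans, reset_board):
    """Keep aligned scans; discard misaligned ones and call
    `reset_board()` (a caller-supplied stand-in for the hardware reset)."""
    good = []
    for scan in scans:
        if scan_is_aligned(scan):
            good.append(scan)
        else:
            reset_board()
    return good


# Usage: one clean scan and one whose sentinel group has rotated by one.
resets = []
clean = [5.0, 0.0, 1.2, 1.3]
slipped = [0.0, 1.2, 1.3, 5.0]
kept = filter_scans([clean, slipped], lambda: resets.append(1))
```

Using two adjacent sentinels at opposite voltage extremes means a rotation moves at least one of them onto a channel carrying a different level. A single sentinel could be fooled by a neighbor that happens to sit near the same voltage.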
While implementing this solution, we noticed that the multiplexor error occurred only when someone walked into the control room from the arctic outside and took off their parka! We subsequently proved that we could initiate the error at will by vigorously shaking a nearby parka. At least we had a cause, but short of banning parkas, we could not eliminate the issue.
Although not ideal, the software workaround proved satisfactory, and the monitoring systems functioned correctly until they were decommissioned years later.
This entry was submitted by Stephen Price, a senior staff engineer at Engineering Dynamic, and edited by Rob Spiegel.
Tell us your experience in solving a knotty engineering problem. Send to Rob Spiegel for Sherlock Ohms.