Taking the Temp of Data Corruption

DN Staff

July 16, 2015


After working as an electronics technician for a year at a major defense contractor, I joined a classified program and found the team was dealing with an ongoing, perplexing problem.

Every member who joined the team was asked by the program manager to look at the problem and try to solve it, but only after the "appropriate" briefing by the design staff. Their presentation covered the gamut of design philosophy, specification and requirements, management and environmental concerns.

The circuitry delivering data to the product was based on a bus design incorporating TTL trapezoidal receivers. These receivers required very precise rise/fall times; otherwise they would disregard the data as noise. The data was driven over single-ended lines from a card cage backplane to the product through shielded ribbon cables. The open-collector outputs required a very specific resistor-pair configuration, with one resistor tied to +5V and the other to GND.

One terminator was applied to each end of the cable to match the cable impedance. The team had isolated the problem to the product and cabling, since all direct un-switched data transfers around the product worked properly. Many attempts had been made to solve the problem using shotgun methods. Neither grounding changes nor noise bypassing offered any relief.
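A resistor pair tied to +5V and GND looks, to the line, like a single resistor connected to an intermediate voltage (its Thevenin equivalent). The sketch below shows that relationship. The article does not give the actual resistor values, so the 150-ohm and 270-ohm figures and the function name are assumptions chosen only for illustration.

```python
def thevenin_equivalent(r_up_ohms, r_down_ohms, v_supply=5.0):
    """Return (resistance, voltage) of a pull-up/pull-down resistor pair."""
    # Seen from the line, the two resistors appear in parallel.
    r_th = r_up_ohms * r_down_ohms / (r_up_ohms + r_down_ohms)
    # With no driver pulling low, the pair acts as a voltage divider.
    v_th = v_supply * r_down_ohms / (r_up_ohms + r_down_ohms)
    return r_th, v_th


# Hypothetical values; the real terminator values are not stated in the article.
r_th, v_th = thevenin_equivalent(150.0, 270.0)
print(f"Terminator looks like {r_th:.1f} ohms to {v_th:.2f} V")
```

The parallel resistance is the figure that would be compared against the cable impedance when choosing a terminator.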

The product would pass data successfully only when the transfer file contained fewer than 20 large blocks of data (each block being megabytes). The data failure always occurred somewhere between 20 and 30 blocks into the transfer and continued until the total transfer was complete. Data corruption was seen across all data bits. Resetting the transfer and starting over would immediately cause the error to occur slightly sooner in the next transfer. Errors were random, so we couldn't trigger on any event that would highlight the cause.

My only inquiry was to ask if they had detected a temperature problem with parts heating up after the transfer began. They assured me that was first on their list and they had long ago removed that from consideration as all parts were within spec, and the temperature inside the product was nominal with good supporting air flow.

If at first you don't succeed, read the manual. That seemed like the next obvious step, so I proceeded to check the specs of the parts involved in the design.

According to the manufacturer's spec, the outputs were to be operated at a 50% duty cycle. This design differed from that convention, as the data enable was active low during the whole file transfer (a 100% duty cycle). I then calculated the power dissipated in the resistor to +5V when the line was low and found it to be over 1/8W! That put the dissipation at more than 100% of the terminating resistor's rated power. My instructors in school had always harped on the philosophy that a proper design never allows any device to dissipate more than half its rated power.
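The calculation described above can be sketched roughly as follows. The article does not state the resistor values or the low-level output voltage, so the 150-ohm pull-up, 270-ohm pull-down, and 0.4V low level are assumptions picked only so the result lands above 1/8W as the author reports; the function name is likewise hypothetical.

```python
def pullup_power_watts(r_up_ohms, r_down_ohms, duty_low, v_supply=5.0, v_low=0.4):
    """Average power dissipated in the pull-up resistor of a resistor-pair terminator.

    duty_low is the fraction of time (0.0 to 1.0) the driver holds the line low.
    """
    # With the driver off, the line sits at the divider voltage.
    v_high = v_supply * r_down_ohms / (r_up_ohms + r_down_ohms)
    # P = V^2 / R, using the voltage across the pull-up in each state.
    p_low = (v_supply - v_low) ** 2 / r_up_ohms
    p_high = (v_supply - v_high) ** 2 / r_up_ohms
    return duty_low * p_low + (1.0 - duty_low) * p_high


# Hypothetical values; the article gives only the 1/8W rating.
R_UP_OHMS = 150.0
R_DOWN_OHMS = 270.0
RATED_WATTS = 0.125

for duty in (0.5, 1.0):
    watts = pullup_power_watts(R_UP_OHMS, R_DOWN_OHMS, duty)
    print(f"duty {duty:.0%}: {watts:.3f} W, {watts / RATED_WATTS:.0%} of rating")
```

Under these assumed values, the pull-up stays below its 1/8W rating at a 50% duty cycle but exceeds it when the line is held low for the whole transfer, which is the pattern the author describes.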

Next, I checked the temperature of the terminating resistor during operation. I opened the product, ran a transfer until it failed, and then touched the top of the resistor with my thumb to see if I could feel any heat. OUCH! It was hot. Even with the fans blowing across the part, it was very hot! I decided to replace it with one rated at 1/4W to see if that helped. I soon discovered that no one made a 1/4W package in that configuration, so I proceeded to find a supplier that could custom design and produce one.

Using manufacturers' catalogs -- we didn't have the Internet yet -- I discovered a manufacturer that could weld quarter watt resistors into any network and package needed and epoxy dip them for use in place of the 1/8W ones we were using. I called and they provided the needed quantity overnight at no charge. The next day, during lunch break, I took the parts and soldered them into the motherboard and the card cage backplane. I powered up and ran a test and found the problem had disappeared! No more errors. Further testing confirmed the solution.

Two things were learned:

1. The program manager had determined that diversity really works. Each person comes to the table with different outlooks and solutions to problems. Eventually someone will solve the problem.

2. Don't overlook the obvious. It may appear obvious that the cooling was sufficient and the part was operating in spec, but measure and check anyway. The devil is in the details.

Tell us your experience in solving a knotty engineering problem. Send stories to Chris Wiltz for Sherlock Ohms.

Eric Windmueller works for a major defense contractor as a quality assurance systems engineer with 30 years of experience in electronics engineering and systems design. His experience includes working on two major systems for the US government from cradle (R&D) to grave (decommissioning) over a 20-year period. He currently supports programs fielded for the FAA and air traffic control.
