Evaluation boards are a special subset of product, and should ALWAYS work, for ALL potential cases, ALWAYS. At least, IMHO.
The perception is that if the chip manufacturer cannot manage to get their own product (a simple demo board) to work, how could I possibly get it to work in my own product? Eval boards are seen as "golden" designs and are often copied verbatim into OEM designs, so a flaky eval board is an excellent way to lose design wins (and possibly not just for that one chip, but for a whole manufacturer's portfolio).
Kudos for a good sleuth story. Sounds like the issue was handled well and AMD did the right thing passing the errata along. As a hardware engineer, I'm used to debugging SW, so it was nice to read a story from the other (SW) perspective.
Thanks, Rob. To answer your question, yes, in 99% of all applications, I'm sure this AMD product worked fine. It was, after all, an evaluation board for a MAC-PHY chipset and an embedded x86 processor, so it never went into anything (at least of which I'm aware) that didn't have another engineer as the targeted end user. And eventually, once we'd found this little bug, AMD published an errata sheet for it, and word went out pretty quickly among their FAEs to drop that 10K resistor down to more like 1K, at which point everything worked fine in all circumstances.
For that one customer (and he happened also to be a customer of ours) who found the problem first, though, it was an insidious bug to chase. It landed in our lap mainly because we were the designer of an in-circuit emulator for the ES186 processor. And because the emulator added just enough capacitance to the line that wasn't pulled hard enough, it looked for all the world like a bug with the emulator, since it was the addition of the emulator that caused DMA transfers to stop working properly. Then, once the behavior had also been seen on boards without an emulator, the fickle finger got pointed at the driver routines for the MAC/PHY chip, since it looked every bit like a software bug. After all, the bad behavior only arose on particular combinations of reads and writes to the bus and/or particular combinations of 8-bit versus 16-bit activities. And these failures were all so consistent that the customer's immediate reaction was to rule out "marginal hardware." Either way, since we wrote some of the driver routines, made the RTOS he was running, and designed the ICE, we were on the hook for it until proven otherwise.
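To put rough numbers on that mechanism: a resistively pulled line charges its load capacitance like a single-pole RC circuit, so its 10%-to-90% rise time is RC*ln(9), or about 2.2RC. The sketch below is only an illustration; the thread gives no capacitance figures, so the board and emulator values (BOARD_PF, EMULATOR_PF) are hypothetical, and the helper name is made up for this example.

```python
import math

def rise_time_10_90(r_ohms: float, c_farads: float) -> float:
    """10%-to-90% rise time of a single-pole RC charge: RC * ln(9), about 2.2 RC."""
    return r_ohms * c_farads * math.log(9)

# HYPOTHETICAL loads: the thread gives no capacitance figures for the board
# or the emulator; these values only illustrate the trend.
BOARD_PF = 50.0      # assumed line capacitance without the emulator
EMULATOR_PF = 30.0   # assumed extra capacitance added by the emulator

for r_ohms in (10_000, 1_000):
    for label, pf in (("no emulator", BOARD_PF),
                      ("with emulator", BOARD_PF + EMULATOR_PF)):
        t_ns = rise_time_10_90(r_ohms, pf * 1e-12) * 1e9
        print(f"{r_ohms:>6} ohm, {label:<13}: {t_ns:7.1f} ns")
```

Whatever the real capacitances were, rise time scales linearly with both R and C: extra emulator loading stretches the edge in proportion, and dropping the resistor from 10K to 1K shortens it tenfold.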
Fortunately for us, this wasn't the first time we'd encountered a situation where a marginal board design failed on emulator attachment. (If memory serves, I described something similar years ago in another Sherlock Ohms piece.) So we knew where to start looking for trouble, and we had an excellent FAE supporting us on the inside at the time as well. (Chip Freitag is among AMD's sharpest applications people, although these days he's mostly writing firmware for them and doing a lot less directly with customers.)
In any case, this was one of those "1%" sorts of problems, and probably most of the people who used DMA transfers on these boards never knew this bug was down there.
It's an interesting case study, however, in how people use component values that "don't matter much." If you look at a design I've done, it's readily identifiable by the number of 4.7K resistors in it for all of those "doesn't make much difference" components.
Partly that's because I cut my teeth in electronics as a precocious Cub Scout in the early 1970s who got so turned on by the Cub Scout crystal radio project that for the next half dozen years every penny of his 50-cent-a-week allowance, plus whatever money he could make mowing lawns (when he got a little older), went straight to the Radio Shack store that had finally opened within the bicycling radius of a nine- or ten-year-old. And not having much money, I always bought the "Surprise Pack" offering, which was all of Radio Shack's surplus and tended to be well stocked in 4.7K resistors.
Add to that the fact that the only real "coach" I had in electronics in the early days was Forrest Mims III, who wrote all of the $1.25 Radio Shack books on how to build op-amp or digital circuits, and *he* used a lot of 4.7K resistors.
4.7K was also a good value for the kid with bad astigmatism who found 1K resistors (with brown and black color bands next to each other) a lot harder to read than 4.7K (where it's hard to miss the combination of yellow, violet, and red).
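For reference, the standard four-band code reads two significant digits followed by a power-of-ten multiplier. A minimal decoder sketch (the function and table names are illustrative only); it ignores the tolerance band and the gold/silver fractional multipliers.

```python
# Standard resistor color-code digits (black = 0 ... white = 9).
DIGITS = {
    "black": 0, "brown": 1, "red": 2, "orange": 3, "yellow": 4,
    "green": 5, "blue": 6, "violet": 7, "grey": 8, "white": 9,
}

def decode_bands(first: str, second: str, multiplier: str) -> int:
    """Ohms from the first three bands of a 4-band resistor.

    Two significant digits, then a power-of-ten multiplier. The tolerance
    band and the gold/silver (fractional) multipliers are not handled.
    """
    return (10 * DIGITS[first] + DIGITS[second]) * 10 ** DIGITS[multiplier]

print(decode_bands("yellow", "violet", "red"))   # 4.7K
print(decode_bands("brown", "black", "red"))     # 1K
print(decode_bands("brown", "black", "orange"))  # 10K
```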
It was only when I got to college and started to learn electronics in any structured setting that I realized that 2*pi*4.7 is very nearly 30, and 30*3.3 is very nearly 100, so with common capacitor values of the 3.3x10^x F variety, a 4.7K resistor gives 2*pi*RC products (and hence 1/(2*pi*RC) corner frequencies) that land very close to round powers of ten.
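A quick check of that arithmetic, as a sketch: the -3 dB corner of a first-order RC filter is f_c = 1/(2*pi*R*C). With R = 4.7K and C = 3.3x10^x F, 2*pi*R*C is about 0.975x10^(x+5) s, so f_c comes out within a few percent of 10^-(x+5) Hz. The helper name below is illustrative.

```python
import math

R_OHMS = 4_700.0

def corner_frequency_hz(r_ohms: float, c_farads: float) -> float:
    """-3 dB corner frequency of a first-order RC low-pass or high-pass."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

print(f"2*pi*4.7 = {2 * math.pi * 4.7:.2f}, times 3.3 = {2 * math.pi * 4.7 * 3.3:.1f}")

# 3.3 nF, 33 nF, 330 nF, 3.3 uF
for exponent in (-9, -8, -7, -6):
    c = 3.3 * 10.0 ** exponent
    print(f"C = 3.3e{exponent} F -> f_c = {corner_frequency_hz(R_OHMS, c):,.1f} Hz")
```

Each decade of capacitance moves the corner by exactly one decade, landing near 10 kHz, 1 kHz, 100 Hz, and 10 Hz.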
So for all I know, somewhere out there in the world, there may be somebody looking at one of my designs today, seeing 4.7K ohms someplace, and thinking, "Did the guy who designed this run the numbers, or did he just eyeball it and say, '4.7K sounds about right?' "
I'm betting the guy at AMD who designed the Net186 board *didn't* buy a whole lot of Radio Shack "Surprise Packs" or build a whole lot of circuits from Forrest Mims III's "Engineer's Notebook" books as a Cub Scout, either.
And to come back around and answer the question you asked: I'd have designed this board with a 1K resistor on that line from the beginning. There were other 1K parts on the board, so it's not like anybody would have had to load another reel on a pick-and-place robot to get them down there. And at 5V, the difference in current consumption for the board between 10K and 1K would have been 4.5mA, which is less than half the current that ran through the smallest LED on the board (and a whole lot less than an embedded x86 processor draws).
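The 4.5mA figure is just Ohm's law on the two resistor values. A minimal sketch, assuming the worst case, where the line is held at the opposite rail so the full 5V appears across the resistor; while the line sits at the rail the resistor pulls toward, it draws essentially nothing. The function name is illustrative.

```python
def pull_resistor_current_ma(supply_v: float, r_ohms: float) -> float:
    """Current (mA) through a pull resistor with the full supply across it,
    i.e. while the line is driven to the opposite rail (worst case)."""
    return supply_v * 1e3 / r_ohms

SUPPLY_V = 5.0
i_10k = pull_resistor_current_ma(SUPPLY_V, 10_000)   # 0.5 mA
i_1k = pull_resistor_current_ma(SUPPLY_V, 1_000)     # 5.0 mA
delta_ma = i_1k - i_10k
print(f"10K: {i_10k} mA, 1K: {i_1k} mA, difference: {delta_ma} mA")
print(f"extra worst-case dissipation: {SUPPLY_V * delta_ma:.1f} mW")
```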
Philosophically, on an eval board, I would tend to design for the guy in the 1% category, simply because most times it doesn't cost that much extra, it's a design that's not going to mass manufacture, and when that 1% guy does come along, getting him back on track can take 100% of your applications engineering time. The 20 hours or so we spent chasing this problem down (and it was insidious, since it wasn't failing consistently, and it changed behavior depending on whether the first instruction after DMA release caused a read or a write from/to the processor bus) would have bought a whole lot of 0603 resistors.
Nice Sherlock, Eric. It brings up an interesting point. If this AMD product worked for most of its intended use, is that enough? Or, should the product be designed to hold up to a seldom-used difficult application?