It was at this exact address, where the processor made the final jump from upper memory to lower memory (just after switching processing modes), that the target board with the emulator attached lost its mind and jumped to a totally random address to begin executing garbage instructions. Because the problem occurred so consistently on that one jump instruction (and didn’t occur with an actual processor chip installed), the customer was convinced the emulator was at fault.
We had debugged plenty of code that made the real-to-protected mode transition with no trouble, so we doubted the emulator was at fault. Yet the customer’s early setup code for the transition looked correct, so we were hard pressed to tell him the problem was somehow in that portion of his code -- particularly since it did run on the actual processor chip itself.
Finally, in desperation, we asked the customer to send us his target system so we could replicate the problem -- which we did quite easily. But we noticed one thing the customer didn’t: The switching power supply on his board (which drove the whole processor and memory subsystem, as well as several peripheral devices) seemed awfully small for the number of chips on the board. It was so small, in fact, that we went back and redid a rough calculation of his power budget and found the switching supply to be nominally about 40 percent undersized.
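A rough power budget of the kind described here can be sketched in a few lines of Python. The load names, the currents, and the supply rating below are made-up placeholders, not the customer's actual figures, and "undersized" is assumed to mean the shortfall as a fraction of total demand. Only the method is the point.

```python
# Rough power-budget check.
# All load currents and the supply rating are hypothetical placeholders.

def supply_shortfall_percent(load_currents_a, supply_rating_a):
    """Return how far the supply rating falls short of the total demand,
    as a percentage of the demand. A negative result means headroom."""
    total_demand_a = sum(load_currents_a.values())
    return (total_demand_a - supply_rating_a) / total_demand_a * 100.0

# Hypothetical worst-case current draw per subsystem, in amps.
loads_a = {
    "processor": 1.0,
    "memory": 0.5,
    "peripherals": 0.5,
}
supply_rating_a = 1.2  # hypothetical continuous rating of the switcher, in amps

shortfall = supply_shortfall_percent(loads_a, supply_rating_a)
print(f"Total demand: {sum(loads_a.values()):.2f} A")
print(f"Supply shortfall: {shortfall:.0f}% of demand")
```

A real budget would use worst-case datasheet figures and add margin for transients, which is exactly where an undersized supply gets caught out.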
We then took a hard look at the power rails on the board and found that, with the emulator installed, they sagged momentarily by just enough to take the processor below its minimum spec for Vcc. Putting the processor in where the emulator was, we saw a similar sag, only not quite as pronounced. Ground also tended to “bounce” noticeably in both cases, but less so with the actual processor than with the emulator.
Working back, we cross-triggered a logic analyzer and an oscilloscope and discovered the point of “power sag” was on the second major jump instruction -- exactly where the code went wild when running under emulation.
The conclusion? On that second jump instruction, the processor changed the value on just about every single one of its address lines from “high” to “low” all at once, and the switching supply, which was already huffing and puffing, had to cope with the resulting transient. That’s a lot of signals simultaneously going from Vcc to ground.
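To get a feel for how many address lines fall at once on a jump from upper to lower memory, here is a minimal Python sketch. The two addresses and the 32-bit bus width are hypothetical values chosen for illustration, not the ones from the customer's board.

```python
# Count address lines that switch from high to low between two bus cycles.
# The addresses and bus width used below are hypothetical.

def falling_lines(old_addr, new_addr, width=32):
    """Number of address bits that are 1 in old_addr and 0 in new_addr."""
    mask = (1 << width) - 1
    falling = old_addr & ~new_addr & mask
    return bin(falling).count("1")

old_addr = 0xFFFFFFF0  # hypothetical location near the top of memory
new_addr = 0x00000100  # hypothetical jump target in low memory

print(falling_lines(old_addr, new_addr))  # prints 27
```

With these example values, 27 of 32 lines drop from Vcc to ground in the same cycle, which is the kind of simultaneous switching that stresses a marginal supply.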
I do agree, but if the SOP was followed for development, there is no excuse for this to happen. ISO certification works when the employees follow the development paper trail. Cut and paste should be caught by the ISO plan. There is something to be said for QC controls. It sounds like a total QC failure to me, and a slap on the hand for the engineers for not following the QC plan.
This was an excellent explanation of the problem and the fix. And I can tell you exactly why it happened. The root cause is most probably that the designers were not given enough time to test the design. The design completion date was not set based on proper design and testing, but more likely set due to a customer PO.
That makes perfect sense, Didymus7. With the ever-increasing time-to-market pressures on design, I would imagine this is a common problem. Each year (maybe six months), you take your product, add new competitive features, and shove it back out the door.
I worked with aerospace software design engineers who expected their software to execute without power. In fact, they specifically designed software for this condition. When there is a loss of electrical power in an airplane, which can happen at any time, for a number of reasons, the electronic box needs to sense this and the software needs to execute shutdown code for the few seconds left on the caps. Probably all data should be in nonvolatile memory all the time, not registers (but it usually is not done that way).
When the electronic box starts up, it needs to determine a stable status as soon as possible. Sometimes knowing the last state is at least somewhat helpful. But again, the box will not know the state that it is being powered up in. There are some boxes on the airplane that get enough of the right kind of information to determine the mode of the airplane: ground power, engine power, taxi, climb, cruise, descent and landing.
I had a running debate about the importance of immediately determining the present non-intermittent state of the box inputs so that a valid condition may be reported. The project manager insisted on following series flight mode "states". They are still having trouble because of this false series progression.
Yes, I think you're right, Chuck. The design engineering staff turns over and the new person simply adjusts the existing design to accommodate (though not really accommodating) the new features. Then time-to-market comes in as a pressure on design. The features keep getting added on to the old design until the design breaks down.
From my experience, Rob - I think you are correct that it happens all of the time. I look back at some of the test equipment that I worked on that was 15 years old and still expected to meet current test needs. The engineer was expected to add to the design to meet new needs rather than redesign the test set - all in an effort to save money. Every once in a while we could shake our heads and go "no way, this just won't work" and get to do a total redesign - but that didn't happen very often. If it actually compromised the integrity of the product then we would insist on a redesign, but of course with the economy and layoffs being a fact of life back then - we didn't complain too loudly if we could make it work and it was "good enough." We did the best we could "within the parameters we were given."
Interesting story, Nancy. I would imagine that in certain critical areas -- like power supply -- you still had to test to make sure the existing design would work with the new features. That's the odd part about this story.
Yours was a problem I saw all the time when I was in the development tools business. The only companies I've encountered where this issue isn't endemic are those where somebody takes the time, on a module by module basis, to write up a spec sheet for what the module in question will and won't do. Then there's a chance that somebody can produce the document that says, "This won't do that" when his boss says, "Just reuse this old design." That, I think, is more a comment on human psychology. But it seems critical that the entire conversation be reframed in the context of, "What changes do we have to make to the existing design in order to make it appropriate for the new design?" Asking that question assures that the matter is at least addressed, and if the answer is, "So many changes it'd be easier to start over with a clean sheet of paper," that message can be taken to management with some chance of success.
Nowadays I'm out of the development tools business, but it's not as though I don't regularly see our potential client companies doing exactly what these heavy equipment people did -- largely because management gets it in its head that everything is infinitely reusable unless there's some internal procedure for triggering a review of "reusability" and some kind of document that says, "Ah... Come to think of it... No, not at all."
Note that above I say, "potential client companies," and not, "actual clients." Because where this kind of cavalier behavior is standard practice, it's usually because the company in question is penny wise and pound foolish. Running, as I do, a high-end design shop where we have a total *value* proposition instead of a message that "we compete strictly on price," I like for the self-selection algorithms for our customers to run in such a way that the guy who regularly reuses too much to "save money" is somebody else's client -- not ours.
It wasn't *that* long ago that we turned away a client who refused to spend the money to buy an extra week of engineering time in order to figure out how to make his system run better with about half as many components in it. Had he only been building a handful of units a year, not spending the NRE dollars would have made sense. But he was building 15,000 units a year, and 40 hours of engineering time would have saved $8 of BOM cost on every one of them. When you do the math, in one year, he'd have saved $120,000 (and had a more reliable end product, since it'd have fewer things in it to break). Or dividing it through another way, for NRE costs to have outweighed BOM cost savings, I'd have had to have been charging him more than $3,000 an hour for engineering time. And as much as I think Focus Embedded is a really good shop with the best product in the business, that rate is a tad high even for us... ;-)
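The arithmetic in this comment can be checked with a short Python sketch. The figures are the ones quoted above (15,000 units a year, $8 of BOM savings per unit, 40 hours of engineering). The variable names are just illustrative, and the one-year payback window is an assumption carried over from the comment.

```python
# NRE versus BOM savings, using the figures quoted in the comment.
units_per_year = 15_000
bom_saving_per_unit = 8.00   # dollars saved on each unit built
engineering_hours = 40       # one extra week of engineering time

# Total BOM savings over one year of production.
annual_saving = units_per_year * bom_saving_per_unit

# Hourly rate at which the engineering cost would equal one year of savings.
break_even_rate = annual_saving / engineering_hours

print(f"Annual BOM saving: ${annual_saving:,.0f}")
print(f"Break-even engineering rate: ${break_even_rate:,.0f} per hour")
```

Any hourly rate below the break-even figure means the extra engineering week pays for itself within the first year of production.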
You hit it right on the head, Rob - but you would be surprised at how much one can "scavenge" in a company that has been operating for years and has had a lot of test equipment designed and built. I remember on more than one occasion swapping power supplies around for that very reason...or IEEE cards, or memory...or video cards...sometimes we played "ring around the test set" to get the hardware specs we needed.
I feel your pain, Eric. Sometimes it just doesn't make sense that people can't see the obvious...but it happens all the time. Effective cost reduction does not equal inferior quality and sometimes you have to pay more up front to save in the long run, but sometimes you just can't convince people of that. I am with you - I would prefer that they are someone else's customers!