Part of the fun of working for a development tool company is that you get to peek into other people’s designs when they go wrong. Often we would get a call when the customer thought the source of the problem was the tool itself.
Such a case occurred when a company that makes an embedded controller for a piece of heavy equipment approached us and complained, “I have a problem in my code, and when I went to install your in-circuit emulator, things got worse instead of better.” This was before processors commonly had JTAG ports for emulation.
The target processor was an embedded variant of the Intel 80386. The user had a bug that he knew was well into his own code's execution, but he couldn't reach it with our emulator to track it down. To him, the emulator itself seemed to cause the code to run off into the weeds. The failure showed up at the same spot every time he attached the emulator.
The spot in question where the emulator consistently “failed” was the point at which the x86 processor made the jump from “real” mode to “protected” mode execution. The reset vector of the x86 was always “high” in memory, usually up near “all Fs” in address space. The processor begins execution in “high memory” in a backwards-compatible mode (“real” mode) for addressing memory and executing code. Because “real” mode is limited in its ability to page in and out of memory and implement multi-tasking, it was superseded with the introduction of the 80286 processor by what’s known as “protected” mode.
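As a rough illustration of why the reset vector sits "up near all Fs," here is a minimal Python sketch of the address arithmetic. It assumes the commonly documented reset values (an 8086 starting at segment 0xFFFF, offset 0x0000, and a 386 starting with a code-segment base of 0xFFFF0000 and an instruction pointer of 0xFFF0). The function names are made up for this example and are not from any tool mentioned in the article.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086-style real-mode address: (segment * 16) + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def reset_fetch_address_386(cs_base: int = 0xFFFF0000, ip: int = 0xFFF0) -> int:
    """First fetch on a 386 after reset: hidden CS base plus IP (assumed reset values)."""
    return (cs_base + ip) & 0xFFFFFFFF


if __name__ == "__main__":
    # 8086 reset: CS=0xFFFF, IP=0x0000, i.e. 16 bytes below the top of 1 MB
    print(hex(real_mode_address(0xFFFF, 0x0000)))
    # 386 reset: 16 bytes below the top of the 4 GB address space
    print(hex(reset_fetch_address_386()))
```

In both cases the first instruction is fetched 16 bytes below the top of the address space, which is why only "a few instructions" fit there before the code has to jump somewhere roomier.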
“Protected” mode allows for all of these additional safety features related to protected memory space once the user’s code has awakened in “real” mode and set up a few key variables in a “descriptor table” to allow it to shift gears. It then sets the bit that transitions between the two modes. But the ancient appendages of the x86 architecture seem always to be with us, so “real” mode (a vestige of the 8086 processor) follows us around to this day, even in advanced chips such as the Pentium.
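To make the "few key variables in a descriptor table" a little more concrete, the sketch below packs one 8-byte segment descriptor in the standard x86 layout and shows the bit that "shifts gears" (the PE bit, bit 0 of control register CR0). This is only a model of the bit fields in Python, not the customer's startup code. The helper names and the flat 4 GB code segment are assumptions chosen for illustration.

```python
CR0_PE = 0x1  # Protection Enable: bit 0 of CR0


def encode_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack an x86 segment descriptor into a 64-bit integer."""
    d = limit & 0xFFFF                    # limit bits 15..0
    d |= (base & 0xFFFFFF) << 16          # base bits 23..0
    d |= (access & 0xFF) << 40            # access byte (present, ring, type)
    d |= ((limit >> 16) & 0xF) << 48      # limit bits 19..16
    d |= (flags & 0xF) << 52              # granularity and default-size flags
    d |= ((base >> 24) & 0xFF) << 56      # base bits 31..24
    return d


def set_pe_bit(cr0: int) -> int:
    """Return the CR0 value with protected mode enabled."""
    return cr0 | CR0_PE


if __name__ == "__main__":
    # Hypothetical flat code segment: base 0, limit 0xFFFFF in 4 KB units,
    # access 0x9A (present, ring 0, readable code), flags 0xC (4 KB granular, 32-bit)
    code_seg = encode_descriptor(0x00000000, 0xFFFFF, 0x9A, 0xC)
    print(hex(code_seg))
    print(code_seg.to_bytes(8, "little").hex())  # byte order as stored in memory
```

On real hardware the table is built in memory, its location is loaded with the LGDT instruction, and the PE bit is set with a write to CR0, followed by a far jump to reload the code segment.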
Usually the transition process, seen from the point of view of where code is executed, involves starting at an address very near the top of memory, running a few instructions from code space very high in the memory map, and dropping down slightly in memory to finish setting up the descriptor table. It then makes a large jump down to low memory when the processor itself shifts from “real” to “protected” modes.
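One way to picture the difference between the small drop and the large jump is to count how many address lines change state between consecutive fetches. The sketch below does that in Python. The three addresses are invented for illustration (the real memory map of the customer's board is not given here), and a 32-bit address bus is assumed.

```python
def address_lines_toggled(prev_addr: int, next_addr: int, width: int = 32) -> int:
    """Count address bits that differ between two consecutive bus addresses."""
    mask = (1 << width) - 1
    return bin((prev_addr ^ next_addr) & mask).count("1")


if __name__ == "__main__":
    reset_fetch = 0xFFFFFFF0   # first fetch after reset
    setup_code = 0xFFFFFF00    # hypothetical: table setup slightly lower in memory
    low_entry = 0x00001000     # hypothetical: protected-mode entry in low memory

    print(address_lines_toggled(reset_fetch, setup_code))  # small drop
    print(address_lines_toggled(setup_code, low_entry))    # large jump
```

With these made-up addresses the small drop changes only a handful of lines, while the jump to low memory flips most of the bus in a single cycle.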
That makes perfect sense, Didymus7. With the ever-increasing time-to-market pressures on design, I would imagine this is a common problem. Each year (maybe six months), you take your product, add new competitive features, and shove it back out the door.
This was an excellent explanation of the problem and the fix. And I can tell you exactly why it happened. The root cause is most probably because the designers were not given enough time to test the design. The design completion date was not set based on proper design and testing, but more likely set due to a customer PO.
I do agree, but if the SOP was followed for development, there is no excuse for this to happen. ISO certification works when the employees follow the development paper trail. Cut and paste should be caught by the ISO plan. There is something to be said for QC controls. It sounds like a total QC failure to me, and a slap on the hand for the engineers for not following the QC plan.
As several commenters here have pointed out, this could easily be traced to a human (i.e., personnel) problem. I suspect that these kinds of software/hardware glitches are commonplace, largely because companies too often can't manage to hang on to the design engineers who know why a product was designed that way in the first place.
A big chunk of what happens -- particularly in the modern world where "software guys" and "hardware guys" are rarely one and the same -- is that when somebody goofs up on one side (in this case a hardware guy undersizing a power supply), it's the guy on the other side of the wall who sees it first and diagnoses it through the lens of his own world view. And to a software guy, a piece of code that crashes repeatably on the same instruction every time smells like a bad tool, not a bad power supply.
At the same time, there seems to be this dividing line between the "analog circuits guy" (who in this case got the design responsibility for the switchmode power supply) and the "digital circuits guy" (who got the job of designing the processor core and attaching its peripherals). It was those two not talking with each other that produced the problem of "undersized supply" in the first place.
Where it's good to have an "old timer" around is where you encounter those problems that cross boundaries between "hardware" and "software" as well as between "analog" and "digital." Engineers like that were being minted 30+ years ago. Now they're not, and the only way they come into being is with the accumulated on-the-job experience of a few decades behind them. In theory, project managers should have enough visibility into their project's activities that they should also be able to spot cross-functional problems like this. But in the last few years there's been a trend for engineers who don't like the profession after a few years to go get the MBA and come back to the organization with a "management" hat on. And when that happens, you're about guaranteed to have projects run by people with neither the breadth nor the depth to solve thorny problems like this.
The good news in the United States is that at least there are a few engineers who put in the 25 years it takes to get good at the business before taking on project management responsibilities. There are countries in this world where societal pressure is such that you're perceived a failure if you're still "just an engineer" after 25 years and you haven't moved into management. And the net result there is that *nobody* in that society puts in the time to get really good.
I logged 25 years as a "pure engineer" myself before founding Focus Embedded. And the average age of employees here is 54. But we're something of an anomaly in that regard. That said, we also get all of the really hard problems that nobody else can solve because nobody has the breadth of experience to see the entire picture when a beast such as "all F's change to all 0's" raises its head.
Good points, TJ. I believe that inexperienced engineers may be part of the problem. Another is the rapid pace of new product development. So many products get updated on a schedule (think Apple). Engineers are tasked with adding new features with each release. It makes perfect sense that they would keep adding new features to the existing design without revamping the entire product design.
This is the risk companies take when they force out senior engineers for less expensive junior ones. The loss of corporate knowledge can be critical. This is not just a case of experienced vs. inexperienced, but that of knowing a product line well and remembering the reasons for design expansions.
Companies end up answering the question "Why is the design this way?" with the answer "Because that's the way we've always done it" and have no basis to support the answer.
This error is so understandable. The design works, so it gets passed down again and again. Why not? Why redesign? At a certain point developments in what the design is supposed to do outgrow the basic dependable design. This has got to be happening all the time.