Jason, configuration control and documentation are critical to ensuring that a design is correct and can pass from prototype to production. When "patches" go undocumented, as you discovered with the evaluation board, it becomes impossible to configure the system correctly. Software has dealt with this situation for some time. Frankly, I always thought hardware did as well. Just goes to show...
Actually, I have seen this phenomenon in other development situations as well, except that you could replace "firmware team" and "hardware team" with "hardware module 1 team" and "hardware module 2 team". I think it's human nature (at least for the engineer/scientist humans) to decide what the problem is and then proceed as though that notion is the truth, until proven otherwise. That is exactly what the hardware engineers did in this case.
It's often very difficult to determine whether a problem is caused by hardware or software. I had a problem with an NXP processor that would crash whenever the I2C interface was turned on. It sounds like software, but when all other tasks were halted and only the I2C ran, everything was fine. OK, maybe I'm running out of execution time? No, no problem there either. NXP eventually issued an erratum stating that the Vdd bond wire in the chip had too large a voltage drop, so the chip would crash when the processor was pulling a great deal of current.
That's a good story! (And by a fellow Evans, too.) I like how simply noticing that it took slightly longer to fail from a cold start gave you the idea for narrowing down the problem. I've never worked for a super-large company with all of those divisions, but I can see how there could be communication issues.
At least all of your hardware came from inside the company. I recall dealing with a hardware problem (not as severe as yours) when writing firmware, but the hardware came from an outside supplier who refused to accept that they had a problem. We eventually had to write a workaround in software. The old "can't you guys fix that in the software" solution.
In the projects I manage I have a rule that both software and hardware are guilty until proven innocent. Both groups are expected to work together to solve the problem. The SW guys can dismiss the HW team or vice versa if they feel the ball is in their own court, but one group can't claim the problem is on the other side and walk away on their own.
During my first week on this job I overheard some software guys discussing a problem. I went over and offered to help (it sounded like possibly a couple of shorted address lines). It turned out to actually be a software problem, but I gained a lot of respect from that incident. They were used to both sides pointing fingers at each other.
Thank you. There were many communication issues within the company. My group found out later that another division was using the exact same processor and was aware of the memory timing issue from the start. I don't know whether that team was told about the problem or had discovered the modifications on the evaluation board before spinning their hardware. Ironically, the company prided itself on its flat management hierarchy leading to efficient communication.
I think many of the problems I encountered were much more than configuration control issues.
The company produced (and still produces) immensely complicated processor chips and other supporting silicon. It also produced (and still produces) motherboards, server blades, and other subassemblies for retail label manufacturers. I worked for this company during the "Internet Boom" of the late nineties and early aughts, when it was making acquisitions left and right in an attempt to become a major player in the internet hardware space. While the company had excellent configuration control tools and systems in place, the adoption and use of these tools and systems by the newly acquired entities proved uneven at best. The company also had a difficult time fostering a culture of cooperation among many of its new acquisitions.
The division I worked for had been a medium-sized company in a shrinking niche of internet technology. (I joined after the company had been acquired.) There was a lot of bad blood between the hardware and software groups, which sadly was not properly addressed after the acquisition. The division that produced the problematic processor had been part of a once-great technology company in steep decline. Morale issues, as well as a culture clash with the new owners, led, I believe, to communication failures.
This is as much a story of a business trying to rapidly enter a market by buying as many components of that market as possible and failing to integrate them as it is a story about insufficient or missing configuration control.
Years ago, when computers came in all kinds of flavors, we worked with a rather enlightened software house. Of course, failures to operate correctly did arise. Our agreement with the software folks was that as soon as we could adequately describe the problem to them, they would start looking while we were looking, and whoever found the problem would immediately phone the other to halt their search. Thus the time wasted was minimal, and no fingers were pointed until after the project was successful. Then we did point out the faults in detail so that we could all learn from them, and all of us got better at what we did.
Shortly after I started at the company, the project manager asked me to sit in on a hardware design review. When I walked into the conference room, I was met with expressions of suspicion and contempt from the members of the hardware team. Apparently, software folks were not supposed to attend hardware design reviews. This was the complete opposite of my previous employer where hardware and software worked together from product inception to release. This tight working relationship, I believe, helped us avoid many integration problems. It also allowed us to build some extra "margin" into the design to deal with unexpected hardware or software problems.
In 1995 I was leading the design of a PCMCIA modem based on a reference design and reference firmware. We made some modifications for safety and EMC, and to improve the performance of the analog front end; none of these were related to the problem we encountered.
When we got prototype boards delivered, we installed the latest firmware from the chip manufacturer. Once in a while the cards would boot completely, but mostly they would hang when installed in a particular type of computer. The chip manufacturer would admit to no problems. I puzzled over this for a couple of weeks. One day, while I was out of the lab catching up on other tasks, my manager dropped by and asked, "Why aren't you in the lab? I want you in the lab full time until this problem is solved. What is the problem, anyway?"
We went to the lab and I was going to demonstrate the problem, and the card booted successfully. I immediately powered it off and restarted it, and it hung, but with a failure mode I had not previously seen. I immediately restarted it again and it failed in the usual manner, several times in succession.
This was my clue! I asked the manager to stay right there, walked to the next room and returned with a can of freeze spray. I heavily sprayed the modem chipset and the card booted successfully several times. At last I had identified a way to make the problem come and go. Definitely a race condition of some sort, but it wasn't even clear whether the cause (or the cure) was hardware or software.
Since the chipset manufacturer declined to release source code, I had to get him to admit to the problem and address it. The first step was to send him a computer on which the problem occurred. Then I had to get him to actually look at it. Ultimately this took the threat: "If we don't hear from you by Friday, we'll be on your doorstep Monday morning to help you with it." Late that Friday night I got a call at home: "Don't come! Don't come! We've fixed the problem and are sending you a software fix." It turned out that one of the initialization steps was to read an unused bi-directional I/O register. Unfortunately, the default power-on state for this register was high-impedance, and the internal pull-up resistor was not enabled until the instruction after the read, so the read sampled a floating input. The fix was to reverse the sequence of these two instructions.
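For readers who want to see why swapping two instructions mattered, here is a minimal Python simulation (not firmware) under stated assumptions. `IoPin`, `init_original`, and `init_fixed` are hypothetical names, and a seeded random generator stands in for the leakage and noise that decide what an undriven, high-impedance input reads. It also assumes the value read is used later during initialization, which the story implies but does not state. Reading before enabling the pull-up gives an indeterminate result; enabling the pull-up first makes the read deterministic.

```python
import random


class IoPin:
    """Hypothetical model of one bit of an unused bi-directional I/O register.

    Assumption: at power-on the pin is a high-impedance input with its
    internal pull-up disabled, and nothing external drives it.
    """

    def __init__(self, rng):
        self.pullup_enabled = False
        self._rng = rng  # stands in for leakage/noise on a floating input

    def enable_pullup(self):
        self.pullup_enabled = True

    def read(self):
        if self.pullup_enabled:
            return 1  # the pull-up holds the undriven pin high
        # Floating input: the value is not defined by the design.
        return self._rng.randint(0, 1)


def init_original(pin):
    """Original order: read first, enable the pull-up on the next instruction."""
    value = pin.read()
    pin.enable_pullup()
    return value


def init_fixed(pin):
    """Fixed order: enable the pull-up, then read."""
    pin.enable_pullup()
    return pin.read()


if __name__ == "__main__":
    rng = random.Random(0)
    original = {init_original(IoPin(rng)) for _ in range(100)}
    fixed = {init_fixed(IoPin(rng)) for _ in range(100)}
    print("original order reads:", sorted(original))
    print("fixed order reads:", sorted(fixed))
```

In this model only the fixed order always returns 1; the original order can return either value from one power-up to the next.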