In the projects I manage, I have a rule that both software and hardware are guilty until proven innocent. Both groups are expected to work together to solve the problem. The SW guys can dismiss the HW team, or vice versa, if they feel the ball is in their own court, but neither group can claim the problem is on the other side and simply walk away.
In my first week on this job, I overheard some software guys discussing a problem. I went over and offered to help (it sounded like it might be a couple of shorted address lines). It turned out to be a software problem after all, but I gained a lot of respect from that incident. They were used to both sides pointing fingers at each other.
That's a good story! (and by a fellow Evans too...) I like how just noticing that it took a slightly longer time to fail from a cold start gave you the idea of how to narrow down the problem. I've never worked for a super-large company with all of those divisions, but I see how there could be communication issues.
At least all of your hardware came internally. I recall dealing with a hardware problem (not as severe as yours) when writing firmware, but the hardware was from an outside supplier who refused to accept that they had a problem. We eventually had to write a workaround in software. The old "can't you guys fix that in the software" solution.
It's often difficult to determine whether a problem is caused by hardware or software. I had a problem with an NXP processor that would crash whenever the I2C interface was turned on. That sounds like software, but when all other tasks were halted and only the I2C ran, everything was fine. OK, maybe I was running out of execution time? No, no problem there either. NXP eventually sent out an erratum: the Vdd bond wire in the chip had too large a voltage drop, and the chip would crash when the processor was drawing a great deal of current.
Actually, I have also seen this phenomenon in various other development situations, except that you could replace "firmware team" and "hardware team" with "hardware module 1 team" and "hardware module 2 team". I think it's human nature (at least for the engineer/scientist humans) to make up your mind what the problem is and then proceed as though your notion is the truth until proven otherwise. Which is exactly what the hardware engineers did in this case.
Jason, configuration control and documentation are critical to ensuring that a design is correct and can pass from prototype to production. When "patches" go undocumented, as you discovered with the evaluation board, then it becomes impossible to correctly configure the system. Software has dealt with this situation for some time. Frankly, I always thought hardware did as well. Just goes to show...