Still concerned about the printer display reading ink @ 50%, misleading the casual user ... questioning whether there was a sensor problem, a code problem, or just what ... sometimes those problems can be aggravating to the novice user (even lawyers, perhaps especially lawyers, get very frustrated at such things).
I once had a Microsoft representative relay a story where the help desk received a frantic call from a distraught new user ... looking for the "ANY" key: "Where is the ANY key??" ... (where the instruction was "press any key") ... Hmmm :)
@lansrock said, "ya ponte a trabajar moran!" (roughly, "get to work already, moran!"). It appears there are some here who speak Spanish. I can speak Spanish, and I can also speak just a little German. What other foreign languages are represented among the attendees?
Several of you offered suggestions, alternative ways of debugging my daughter's printer problem. All good. All would have eventually solved the problem.
@danlafleur made a good point distinguishing between troubleshooting systems that used to work (simply find and replace the bad part) versus systems that are being developed and have never worked yet (meaning we still need to change/improve the design so that it will work.)
@cmeadows6959: Ideally, when an error is detected, the system should either fix it or halt. Don't let it die a slow death. Those are very hard to diagnose. "We hung because there was a bad byte yesterday morning." Error recovery files are good - it's another name/version for data collection that I'm a proponent of.
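A minimal Python sketch of the "fix it or halt" idea, for illustration only. The names `accept_byte`, `SystemHalt`, and `error_log` are hypothetical, and the "fix" is assumed to be a simple re-read; a real system would choose its own recovery step. Every failed attempt is recorded, so there is data to look at afterwards.

```python
class SystemHalt(Exception):
    """Raised when a detected error could not be fixed (hypothetical name)."""


def accept_byte(read_byte, error_log, retries=1):
    """Return a valid byte from read_byte(), or halt.

    Assumption: "fixing" the error means re-reading up to `retries` times.
    Every bad value is appended to error_log (the data collection).
    """
    for attempt in range(retries + 1):
        value = read_byte()
        if 0 <= value <= 0xFF:
            return value  # good (or recovered) data
        error_log.append({"attempt": attempt, "bad_value": value})
    # Could not fix it: stop now rather than carry the bad byte forward.
    raise SystemHalt(f"bad byte after {retries + 1} attempts")


if __name__ == "__main__":
    log = []
    source = iter([300, 7])  # first read is corrupt, the re-read is fine
    print(accept_byte(lambda: next(source), log), log)
```

The point of the sketch is that the bad value never leaves `accept_byte`: the caller either gets good data or gets an exception at the moment of detection, not a hang a day later.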
@JoeWojcicki: Yes, there is a need and a market for experienced troubleshooters. In my consulting business, I've done that for a few clients. But it's not PLCs that are being replaced. It may be in hw or sw. For one client it was faulty logic in a block in an FPGA that was causing data corruption. For another client, it was suspected that there was a problem in the assembly code. And for another one the hw timing changed causing the fw delay loop to change timing which caused problems.
@snandu13: I agree that hw and fw engineers need to be able to work together. Not seeing eye to eye can be good if it is due to a different perspective, which is sometimes needed. But not seeing eye to eye is not good if it means not being willing to work with each other.
@gpapich: The Five Whys would be good to drill down to the root cause but be careful that you don't start drilling in the wrong place. Sometimes a defect is not where we first think it is. The Ishikawa diagram is primarily used in the manufacturing or service industries once a process is in place. I'm not saying you can't use some of these techniques to debug a design but they are not formally used in that situation.
@raghu: Is it easier to debug digital systems or analog? For me, I have a definite digital skill set. So digital problems are easier for me. Analog systems would be harder for me but easier for someone skilled in that area.
@luizcosta: You are correct. It was not a good design to have firmware set then clear the bit, thus creating the opportunity for a race condition. A better design is to have firmware set the bit and then hardware clear it when ready.
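One way such a race could play out, as a toy Python timeline rather than real firmware. The details of the actual design are not given here, so this assumes the hardware samples the bit at some later tick; the function names and tick values are made up for illustration.

```python
def firmware_clears(hw_sample_time, fw_clear_time):
    """Firmware sets the bit at t=0 and clears it at fw_clear_time.

    Returns True if the hardware, sampling at hw_sample_time, saw the bit set.
    """
    bit = 1
    seen = False
    for t in range(1, max(hw_sample_time, fw_clear_time) + 1):
        if t == fw_clear_time:
            bit = 0  # firmware clears without knowing if hardware looked
        if t == hw_sample_time and bit:
            seen = True
    return seen


def hardware_clears(hw_sample_time):
    """Firmware only sets the bit; hardware clears it after acting on it."""
    bit = 1
    seen = False
    for t in range(1, hw_sample_time + 1):
        if t == hw_sample_time and bit:
            seen = True
            bit = 0  # hardware acknowledges by clearing
    return seen


if __name__ == "__main__":
    print(firmware_clears(hw_sample_time=5, fw_clear_time=2))  # request lost
    print(firmware_clears(hw_sample_time=2, fw_clear_time=5))  # request seen
    print(hardware_clears(hw_sample_time=5))                   # always seen
```

In the first model the outcome depends on which side gets there first. In the second the bit stays set until the hardware has seen it, so timing no longer matters.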
I am a huge fan of explaining your problem to others to reach a solution; I can't tell you how many times I have figured something out while I was explaining it to someone. It didn't even matter how much they understood about it. Sometimes, if they have only a cursory knowledge of the subject, it actually helps because you have to think more about every aspect of it in its most basic terms.
I get the point too. Plan to get the design right the first time instead of using project time from something else because the work needs to be done over again. Courage to say "this needs time to get right".
Yeah, you are never going to be able to foresee all possible problems, but I thought that his point was that you have to make some time for troubleshooting, and the thought that you will just always have time to do it better later does not make for a good/successful business model.
We've been hearing about troubleshooting systems that worked before the failure(s). Debugging is tough because the system may not have ever worked and changing an endless number of parts doesn't improve the situation.
I came from a third world country where replacements are hard to find...so troubleshooting skill is a must-have...I do understand your point. You can't have the luxury of parts on the front line, and troubleshooting is a must.
@THasham, I tend to think that throw-away maintenance is a wasteful and shortsighted idea. Yes, techs and engineers may get out of the practice of understanding what is broken before changing the part with confidence. I also wonder what you do if the replacement isn't available.
@danlanfleur...I can't put it better than you did. Nowadays people do not troubleshoot anymore. They just know how to replace, and things are getting cheaper and cheaper every day. Which makes sense for replacing..but our brains get idle though. haha.
I agree with Ran on the ink cartridge. But I really like how Gary breaks it down into a process and narrows down the possibilities. We do this all the time and are used to doing it without analysing the process, as second nature.
Troubleshooting seems to be a weak skill these days. Fault isolation is more than the process of replacing parts to make the problem go away. Skill, knowledge, experience and intuition are needed, and being persistent helps.
Gary's experience with simulating the printer engine is similar to my experience in testing Burner Management Systems (BMS). It's difficult to have a real operating natural gas industrial burner in the shop, so we built a simulator to act like one to the BMS. All the control outputs and input signals looked like a burner.
@GStringham – About the ink cartridge problem... I'd like to offer you another step in your analysis. The symptom turned out to be a wrong reading of the ink level. Before undergoing the expense of changing the cartridge, it seems to me that the printer should have been rebooted. Printers are no more immune to an occasional bit-flip than any other piece of digital electronics. Cosmic-ray neutrons are flying around all the time causing such havoc. Since this is really beside the point of your presentation, there's no need to reply to this one. Just an observation...
@raghu: How do we know failure modes in the beginning? Good question. We can't see into the future. But we can make good guesses based on past experiences. If we think of and prepare for as many possible failure modes as possible, then we will be prepared for the few modes that do occur.
@apdobaj: Six Sigma is used for manufacturing defects, where they're building 1000 widgets and one of them wasn't made right for some reason. That does not apply to design defects. I am not aware of "formalized" debugging methodologies. One needs to know many techniques so that appropriate techniques can be applied at appropriate times.
@JM Ashcraft: I didn't replace the ink cartridge first because that is an expensive test if that was not the problem. Once you open an ink cartridge, the clock starts ticking for how long it will last because of trying to keep the ink from drying in the jets. So I only want one cartridge open at a time.
The whole system of 'defensive design' is to always have access, one way or another, to the key signals in your design. One of our products was designed by an outside consultant and, the way he structured the op amps, we could not see the demod signal. It has caused us no end of grief because we can now only see the signal after the gain and offset stages, not before, so we have limited ability to determine whether the bridge/sensor is working correctly.
@jl: Why did I not print the internal page first? I did look at the printer and saw that there was enough ink (which I found out later was not true.) Probably the main reason I started on the computer/software side was that it was the easiest test to run. "Print it again." Each progressive step required a little more effort so that is probably the main reason I went that way.
Slide 22 = KISS principle. German automotive engineering needs to pay attention here. Mechanics can't work on vehicles without very specific tools, procedures, excessive complexity. Lowest-level replaceable parts not very low level...2000 MB S430 Dash display fails? Replace the entire panel assembly ~$1400.00 later. Happened to my boss. I drive a Chevy Astro. Enjoyed gloating at him. We're friends, we still laugh...
Slide 21: Post-mortems very effective. We used them extensively in the Navy T&E world for test scenario evaluations, any deviations from test plan, etc. At $100k/day for missile range time, we had to make sure we executed according to plan. Engineers and the project's value depended on it.
@pshackett: Thanks! My perspective was as a newbie tech-writer doing a service guide for a piece of robotics liquid handling equipment. I was fortunate that I was working with a veteran machinist, gearhead and company veteran. I used the 5-whys very effectively in that case to drive to the essence of why we did certain procedures in troubleshooting the controller to motor connections.
Test points, software hooks; all good things. But inevitably, after you've placed trap doors wherever you think a problem has potential, there will emerge an unforeseen problem. That's what we're dealing with here.
@slk -- I have found that if you work with the HW guys, the points that they bring up on the waveform viewers are the ones you really need. They just don't think to put them in because they have waveform viewers.
@jsh -- agreed, I was just commenting that I think that was the other person's intent with replacing the cartridge. I think the difference is 1) Always find the root cause of the problem regardless of how long it takes or 2) Fix the problem as fast as possible without the requirement of understanding root cause. It depends on the personality of the company you work for. I have worked at both kinds of companies.
I think the linear process worked fine in the example. My point is that the binary search you suggest is not binary if you just start at the printer end instead of the computer end. It will still be linear. With a binary search, you pick the middle.
I am debugging a bad register read problem right now. The first step is to poke at the design to see if it is the testbench or the design emitting the bad value. Then, if it is the design, check the read data mux, then the register value. If you debug it linearly (i.e. search backwards from the pin through the SPI through the top level, etc.), you are slow at finding the problem.
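To make the linear-versus-binary point concrete, here is a small Python sketch of "pick the middle" fault isolation. It is only an illustration: the stage names and their order are assumed from the register read example, `first_bad_stage` and `output_is_good` are made-up names, and it assumes a single fault whose bad data shows up at every stage after it.

```python
def first_bad_stage(stages, output_is_good):
    """Return (index of first faulty stage, number of probes used).

    Assumptions: exactly one fault, the final output is already known to be
    bad, and every stage after the faulty one also shows bad data.
    output_is_good(i) probes the signal at the output of stages[i].
    """
    lo, hi = 0, len(stages) - 1  # the fault is somewhere in stages[lo..hi]
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if output_is_good(mid):
            lo = mid + 1  # fault is downstream of the probe point
        else:
            hi = mid      # fault is at or upstream of the probe point
    return lo, probes


if __name__ == "__main__":
    chain = ["register", "read data mux", "top level", "spi", "pin"]
    fault = chain.index("read data mux")  # pretend the mux is broken
    index, probes = first_bad_stage(chain, lambda i: i < fault)
    print(chain[index], "found in", probes, "probes")
```

With a five-stage chain the savings are small. The gap grows with the length of the chain, since the number of probes grows with the logarithm of the number of stages instead of the number of stages itself.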