Thanks for the study, Brent. Was this a hypothesis-driven study or a data-driven one? I'm a scientist, not an engineer, but I do use spreadsheet calculations for quick statistics on very small collections of preliminary data. To analyze experimental results I go a bit deeper and use either MathCAD or Mathematica, depending on the preference of my collaborators. For measurements and signal processing I follow test-driven software development guidelines to produce code in National Instruments LabVIEW. I use a spreadsheet to document quick calculations that I used to perform with a handheld calculator.
I would hypothesize the use of spreadsheets to analyze mission-critical data would be widespread in business and financial applications. Is it that widespread in science and engineering?
You have to keep in mind that creating a spreadsheet is much like creating a software program. You wouldn't trust a program without running test data through it, and you really have to do the same with a spreadsheet.
Thank you very much for the very informative article. I've actually had some bad experiences with spreadsheets, but I didn't want to believe the spreadsheet was at fault; I assumed the fault was always on my side. Now I know what went wrong.
Tekochip, yes, I'm with you, but my problem is that when we changed the data, the SUM functions sometimes changed without my knowing it. Is there any method of locking the formulas after checking them with test data?
Ninety-one percent of spreadsheets contain errors? The nice thing about handheld calculators (and slide rules before that) was that the engineer was more connected to the calculation. There's a subtle disconnect when doing the calculation in software on a spreadsheet, which makes it easier for a resulting number to gain a couple of zeroes and still not be noticed.
Good point, TJ. To use your analogy, a nail that's bent in some insignificant way could still be a functional nail. But if we are unable to recognize the difference between a significant problem and a small one, that's where it becomes problematic. The question I obviously don't have the answer to is: how many of those 91% would be significant problems?
However, Lifecycle Insights does not seem to host the e-book on its own website. The book is instead hosted at the website of this blog's author, PTC. The author's company does not host its own work?
One has to question the source for the conclusions of the blog (and by inference the book as well).
I don't necessarily disagree that spreadsheets can cause errors to propagate, because they tend to hide the equations. That does not negate their incredible usefulness, though. Other commenters have already noted the need for testing and checking; ANY tool needs that. The software being pushed by the blog author needs it just as much. It may make the process simpler and lend a helping hand, but it also needs to be checked.
I would question the source if I did not know it to be true. However, if we limit the discussion to the primary spreadsheet program, MS Excel, there are known, documented issues with the functions built into Excel. RAND is not random, STDEV is not the true standard deviation, linear regression is not a least-squares fit... The list of statistical errors in Excel is legion. The frustrating thing is that MS has known about these errors and has not fixed them. Even when they implement a fix in a new version, it turns out to be broken in a new way.
Microsoft has clearly abandoned a great market in engineering and science and only seems to care whether the money adds up correctly. I suspect the monetary errors listed in the article were user generated; the statistical-function errors in Excel are built in.
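To make the class of problem concrete, here is a minimal Python sketch (data and function names invented for illustration). It demonstrates catastrophic cancellation in the textbook one-pass standard-deviation formula, the kind of numerically naive algorithm that older Excel statistical functions were documented to use, without claiming to reproduce any specific Excel version's code:

```python
import math

def stdev_one_pass(xs):
    # Shortcut formula: sqrt((n*sum(x^2) - (sum x)^2) / (n*(n-1))).
    # Subtracting two huge, nearly equal numbers destroys the precision.
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return math.sqrt(max(0.0, (n * ss - s * s) / (n * (n - 1))))

def stdev_two_pass(xs):
    # Numerically stable: subtract the mean first, then sum squared deviations.
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

# A small data set riding on a large mean; true sample stdev is about 1.2910.
data = [1e8 + d for d in (1.0, 2.0, 3.0, 4.0)]
print(stdev_one_pass(data))   # noticeably off: the subtraction loses the signal
print(stdev_two_pass(data))   # approximately 1.2910, the correct answer
```

The one-pass version is exactly the kind of formula that looks fine on paper and in a spreadsheet cell, which is why running test cases with known answers matters.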
As far as my background goes, you can see it here: http://www.lifecycleinsights.com/about-author/. I blog and write about the technologies that engineering organizations use to design and develop products. My company provides services for engineering organizations (http://www.lifecycleinsights.com/services-eng/) as well as technology providers (http://www.lifecycleinsights.com/services-soft/).
I blog at a number of places, including ENGINEERING.com (http://www.engineering.com/DesignSoftware/DesignSoftwareArticles/articleType/AuthorView/authorID/134769/Chad-Jackson.aspx), where I provide some frank insights on the direction of these types of technologies. I also blog at the new NAFEMS blog (http://www.nafems.org/blog/), and I co-host a web show called Tech4PD (http://www.engineering.com/Videos/Tech4PDShow.aspx).
I've included this detail because you seem interested in what I do professionally.
I've written a blog series on the engineering notebook and how calculations fit in (http://www.engineering.com/DesignSoftware/DesignSoftwareArticles/ArticleID/4682/Calculations-and-the-Engineering-Notebook.aspx). That led me down the path to writing the e-book referenced in this post. I developed it independently; PTC licensed the publication and is hosting it on their website for access.
As for my site, lifecycleinsights.com, you'll find that none of my publications, presentations, web shows and whatnot are hosted there. By no means am I trying to start a media company like Design News. Instead, companies such as PTC or ENGINEERING.com license my content and host it elsewhere for their own purposes.
I hope that answers some of your questions. Let me know if you have others.
Mr. Jackson, I have absolutely no problems at all with your credentials.
I was not clear enough when I questioned "the source".
My problem is with a guest blog that is in actuality an advertisement for the author's company.
I pointed out the host location of your e-book because it strongly implies to me that PTC paid for the research and book (thus owning it). YOU note the licensing of it, but Mr. Edmonds of PTC did not. The way he presented you as author showed absolutely no linkage at all.
Mr. McDermott, thank you for bringing this to our attention. Per Chad's post, he developed his thoughts and e-book independently. He approached us about licensing the content, which PTC agreed to. This relationship is disclosed in the e-book itself. I didn't mean to mislead anyone by not fully disclosing the relationship directly in my original post.
Years ago, when slide rules were still found in desks and calculators roamed the earth, my mentors gave me this advice:
"If you find that you can't do your initial estimates with a basic scientific calculator, STOP. If you find yourself digging out your textbooks and writing equations, STOP. --You are probably about to make a huge mistake.
"Someone before you has probably seen this problem and has an acceptable plant equation or rule of thumb that you can use. If you reinvent that wheel, the chance that you will make a mistake is very high. You should use those textbooks to validate or invalidate an approach, not to create new ones, unless the old approaches are clearly inadequate somehow and you are really treading on new territory."
And in the years since, I have used those textbooks to create new approaches, but only in a handful of cases.
The point about spreadsheets is much the same: If someone has written and validated software, USE IT. The chances of making a mistake when modeling with a generic tool such as a spreadsheet or writing your own modeling software are VERY high.
As engineers we have an obligation to our employers, clients, and to the public to get the correct answers. If there is prior art that you can use to help you in this process, USE IT.
There is a skeptical side of most engineers that demands to verify everything and assume very little. Last time I checked, formulas in a spreadsheet are easily visible (unless the sheet is locked) and we can see how values are being calculated. I prefer that scenario over feeding values into a software "black box", shaking it a bit, and then accepting the answer.
I've been bitten by flawed spreadsheets more often than by other analytical software tools, but what makes me particularly suspicious of spreadsheet-based calculations is that they don't do unit checking and that, because they don't make calculations visible by default, they can easily be 'broken'. Additionally, MS doesn't always follow the standards for handling extremely large or small numbers and NaNs.
The analytical process I've developed is a three-tiered approach:
For basic calculations that involve developing a simple correlation I default to using spreadsheets. I limit these analyses to simple plotting, low order curve fitting and basic statistics. I make a point of limiting the amount of complexity in any given calculation, show every intermediate result and make liberal use of named fields so the equations look closer to what I'd write on paper.
More sophisticated analyses that can be well approximated by closed-form solutions are conducted with tools such as MathCAD.
The balance of my calculations I generally conduct in Matlab or Octave. These tools don't provide unit checking and may not be as readable as something like MathCAD. To address this I've adopted a lot of techniques from software coding standards:
Variable names are constructed to communicate units, whether the value is a constant, its scope in the context of the analysis, etc.
Calculations are formatted to maximize readability.
Every step in the process is either made obvious by the preceding bullets or is liberally commented.
My error rate has dropped precipitously since adopting this process.
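The first convention above can be sketched in a few lines of Python (every name and value here is invented for illustration, not taken from the commenter's actual work). Carrying units and roles in the names lets a reader audit the dimensions line by line even though the language itself does no unit checking:

```python
import math

# Constants in CAPS with units in the name; inputs flagged by a prefix.
GRAVITY_M_PER_S2 = 9.80665            # standard gravity, m/s^2

input_mass_kg = 12.5                  # analysis input: dropped mass
input_drop_height_m = 3.0             # analysis input: drop height

# Potential energy E = m*g*h; units read off the names: kg * m/s^2 * m = J.
potential_energy_J = input_mass_kg * GRAVITY_M_PER_S2 * input_drop_height_m

# Impact speed v = sqrt(2*g*h); units: sqrt(m/s^2 * m) = m/s.
impact_speed_m_per_s = math.sqrt(2.0 * GRAVITY_M_PER_S2 * input_drop_height_m)

print(f"E = {potential_energy_J:.1f} J, v = {impact_speed_m_per_s:.2f} m/s")
```

A mismatched unit (say, a height entered in millimeters into a `_m` variable) then stands out on inspection rather than hiding inside an opaque cell reference.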
I fully agree that a lot of calculations entered into spreadsheets contain mistakes. Many of the points raised by the previous comments don't clarify things much, though. From my view:
1) Twenty-two spreadsheets is a small sample. Only one of the studies cited at the link given in the article involved more than 30 samples (two "interviews" are cited that mention approximately 36 audits per year, but those references are anecdotal). The point is that only one study had a large number of samples, and in that study the reported error rate was much lower (11%).
2) The sampling and measurement methodologies are considerably different across the studies, so the aggregate statistics are useless.
3) I agree with the comment that spreadsheets are like software. Internal software is frequently not rigorously validated either, whether it is for engineering or business purposes.
4) The article does not consider errors in the input to "validated" code. In my engineering experience, errors are frequently made in setting up complex problems for commercial code. Just because the software is "validated" does not mean it can validate all inputs. Garbage in, garbage out still applies.
5) The comments on errors in the Excel software itself refer to a completely different issue; although serious, it is out of place relative to this article.
6) Again, agreeing in principle that errors in spreadsheets are commonplace, and sometimes serious, I still must point out that the type of calculations carried out by many codes sold by PTC are far beyond what is done in spreadsheets. Thus, this is not a realistic comparison.
7) Spreadsheet programs are as much a tool as pencil and paper, a calculator, a slide rule, an abacus, or anything else. They are very useful in skilled hands. As I was told once, "a poor craftsman blames his tools".
8) The comment about checking test cases is right on. Most problems put into spreadsheets have easily defined limiting cases (certain values go to zero or one, etc.), and test cases should be run. The beauty of spreadsheets is that you can store the test-case data and outputs with the code for posterity, even in a separate sheet in the workbook; this is a very convenient way to maintain validation. For comparison (and fun), ask a software vendor to show you their validation plan, their test cases, and their results, all tied to a code version, then go grab some coffee, as it is going to be a while. The point is that if you don't get clear test cases from a vendor, and verify you are running the same code, then you should run test cases through commercial engineering software to ensure it does what you think it does, or you are no better off than you were with spreadsheets.
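The limiting-case idea in point 8 can be sketched as follows (a hypothetical example; the formula and test values are invented for illustration): restate the spreadsheet cell's formula in code and check the easy limits before trusting it on real data.

```python
def beam_deflection(load_N, length_m, stiffness_N_m2):
    # Same formula as the spreadsheet cell: midspan deflection of a simply
    # supported beam under a center point load, delta = P * L^3 / (48 * E*I).
    return load_N * length_m ** 3 / (48.0 * stiffness_N_m2)

# Limiting cases: zero load gives zero deflection; doubling the load doubles
# the deflection (linearity in P); deflection scales with the cube of length.
assert beam_deflection(0.0, 2.0, 1e6) == 0.0
assert beam_deflection(2000.0, 2.0, 1e6) == 2 * beam_deflection(1000.0, 2.0, 1e6)
assert beam_deflection(1000.0, 4.0, 1e6) == 8 * beam_deflection(1000.0, 2.0, 1e6)
print("all limiting cases pass")
```

The same three checks can live in a "test" sheet of the workbook, with the expected ratios computed next to the formula's outputs, so the validation travels with the spreadsheet.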
I've used Mathcad since version 4.0. I've also used MS Excel spreadsheets for many years for most if not all of my engineering calculations.
Garbage in, garbage out. Errors, errors, everywhere. Good engineering is being cognizant and meticulous in everything you do. What else can be said? It comes down to how easy it is to set up your calculation structures and how easy it is to evaluate them later. Then there's the issue of what kinds of calculations you want to do.
One of the great advantages of Mathcad (now belonging to PTC) is that it is a very visual means of setting up your calculation. The use of standard mathematical notation and structures makes it the easiest to understand of any means I know. I use Mathcad as a test calculation method if some result is coming back wild and fishy; sometimes you just need a hand calc. But this is where the advantages of Mathcad stop. Early versions made it difficult to arrange complicated setups; newer versions are somewhat better (I use Mathcad 14 now). It is not easy to learn how to make the structures look like they do in mathematics, no matter what the sales people say. I've got years with the software, and I understand its limits.
Spreadsheets have one inherent advantage over Mathcad: it is FAR EASIER to make hundreds of parallel calculations. That is the main use of Excel: lots of calculations over a wide array, with results easy to see, though the guts of the operations are not so easy to ferret out. Excel has added a lot of functionality in recent years, including more math functions and Goal Seek, which added a missing dimension to engineering calculations with spreadsheets. I miss having the Visual Basic editor to get my own solver routines working in the background; recent changes mistakenly called improvements have honked this up.
If Mathcad is offering its cut-down trial software for free in perpetuity, take it. Free tools are free tools. Mathcad is neat, but they need to work on the interface (clearer directions, tutorials) to have a chance against the raw simplicity of a spreadsheet. Mathcad has a host of weaknesses, in that it's not easy to find out why something is not working when you try to set up a formula. Mathcad has a long road ahead if it wants to challenge spreadsheets.
A few things that help cut down on errors in spreadsheets:
(1) Clearly show which values are inputs. I use bold text in yellow boxes.
(2) Don't try to jam your whole expression into one cell. Just because you can doesn't mean you should. It is just as easy to break a complicated expression down into parts, or even do it in separate steps. By taking on your calculations in steps, using separate cells, you have a better chance of seeing where things get honked up along the way.
(3) Run test cases before you process your whole array. Some of the techniques used in earlier code writing are still relevant in spreadsheets.
Just some thoughts; there are better ways of using spreadsheets. Mathcad is a better way to visualize. Use whatever you think is best.
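Tip (2) applies in any language, not just in spreadsheets. As a hypothetical sketch (the formula, laminar pressure drop in a pipe, is chosen purely for illustration), compare the same calculation jammed into one "cell" with the stepwise version, where every intermediate result is visible and can be sanity-checked on its own:

```python
import math

# Everything in one "cell": correct, but nearly impossible to audit by eye.
def pipe_dp_onecell(q_m3_s, d_m, rho, mu, l_m):
    return (64.0 / (rho * (q_m3_s / (math.pi * d_m ** 2 / 4)) * d_m / mu)) \
        * (l_m / d_m) * rho * (q_m3_s / (math.pi * d_m ** 2 / 4)) ** 2 / 2

# The same calculation, one step per "cell": each intermediate has a name.
def pipe_dp_stepwise(q_m3_s, d_m, rho, mu, l_m):
    area_m2 = math.pi * d_m ** 2 / 4          # flow area
    v_m_s = q_m3_s / area_m2                  # mean velocity
    reynolds = rho * v_m_s * d_m / mu         # Reynolds number (laminar here)
    friction = 64.0 / reynolds                # laminar friction factor
    return friction * (l_m / d_m) * rho * v_m_s ** 2 / 2
```

The stepwise version is where you would spot, for example, a Reynolds number of 500,000 telling you the laminar friction-factor formula no longer applies; the one-cell version hides that entirely.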
I have avoided using Excel for exactly this reason: it is such a huge pain to enter equations, and it does seem to be quite user hostile. What would be useful is an actual manual on how to make things work, not some useless help file that never thinks the same way I do.
I have to agree that for other than the most straightforward calculations, Excel is not the way to go. I have had two or three bad experiences with MS Excel and use other software programs where needed. For statistical work, I use Minitab exclusively; I feel it's the very best application software for Six Sigma determinations. Excellent article, Brent.