Is a scenario like this all too familiar?
A design engineer is struggling to place three probes on the appropriate pins of a high-density application-specific integrated circuit (ASIC) package during development test. Finally, he succeeds, carefully holding the probes in two hands, afraid even to breathe for fear of one pin moving against another and causing a short. Now the engineer discovers the oscilloscope connected to these probes isn't on the right timebase setting to view what is needed. What to do?
The need to control test equipment to obtain measurement results without using one's hands or having to look at instruments is very real. Technologies are now being developed that will imbue test equipment with the ability to listen and speak to human beings. Driving this change are standards in operating systems, software, and hardware used by the PC industry. These standards allow the instrument designer to shop from a variety of components, adding value in areas where competitive differences can really be made, such as data acquisition hardware for an oscilloscope. A designer can take an off-the-shelf PC motherboard, add an operating system such as Microsoft Windows or Linux, use a development environment that runs either on the target system or a PC located nearby, and design a sophisticated application that controls the custom acquisition hardware for making the required measurement.
This COM communications scheme for speech recognition applications is one way of separating recognition functions from the main application program.
Piggybacking on the PC. What makes this situation so wonderful is that the computer industry is spending enormous sums to create this base technology, while instrument designers stand to benefit from capitalizing on the inexpensive technologies generated by the demand for more powerful PCs and personal information managers (PIMs). For example, new PIMs and smart appliances are making available operating systems and hardware that boot up and turn on instantly. Such instantaneous "turn-on" is needed in the instrument world as well -- engineers don't want to wait a minute or two for their digital voltmeters to come up.
Ease-of-use in electronic test equipment has received much more than lip service over the years. Instrument manufacturers have tried myriad design tactics, ranging from fewer knobs, on-screen menus, and touch screens to more knobs and graphical user interfaces, all hoping to discover the magic ingredient that will make their particular product so much easier to use than the competition's. While many of these strategies have done well, they haven't necessarily addressed the context in which the equipment will be used. As the opening scenario illustrates, scope users with both hands occupied holding probes have for decades resorted to such creative alternatives as pushing a scope's buttons with their noses, chins, and foreheads. Touch screens and dedicated knobs haven't yet fit the bill for this use model.
What may prove most viable to counter such frustrations is speech recognition -- touted as the next great leap in ease-of-use and user interface technology. However, while current speech recognition technology is advanced enough to do an effective job, many designers and users may still regard it as a foreign concept. Speech recognition generally connotes the dictation products on the market today, such as Dragon Dictate or IBM ViaVoice, which will generate acceptable results if time is taken to "train" them, and users are sufficiently disciplined to speak properly when using them.
However, such dictation products are much too complex and time-consuming for today's instrument users. The form of natural-language speech recognition technology that's more appropriate is "command-and-control," which uses limited vocabularies and minimizes the number of utterances to be distinguished. Applications already exist. For instance, mobile telephone systems enable users to command "call home" or "dial 555-1234."
Command-and-control recognition engines that run under Microsoft Windows and its variants are available from numerous vendors. Their software development kits come with the basic tools to build a speech-enabled application, including documentation and code examples, as well as tools for building the grammar to be recognized by the application.
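To make the grammar idea concrete, here is a minimal sketch in Python. It is not any vendor's SDK format; the phrases and action names are invented for illustration. The point is how small a command-and-control vocabulary can be: a fixed set of utterances mapped to instrument actions, with everything else rejected.

```python
# Hypothetical command-and-control grammar: a small, fixed vocabulary
# mapped to instrument actions. Real recognition engines define grammars
# in their own formats; this only illustrates the principle.
GRAMMAR = {
    ("run",): "RUN",
    ("stop",): "STOP",
    ("autoscale",): "AUTOSCALE",
    ("set", "timebase"): "SET_TIMEBASE",
}

def match_command(utterance):
    """Return the instrument action for an utterance, or None if the
    utterance is outside the grammar (i.e., not recognized)."""
    words = tuple(utterance.lower().split())
    return GRAMMAR.get(words)
```

Because the vocabulary is tiny, the engine only has to distinguish a handful of utterances, which is what makes command-and-control recognition robust without per-user training.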
There are two basic architectural approaches to building a speech-enabled instrument: Speech recognition can be "built in" as part of the main application, or it can be a separate program that communicates with the main application. Each approach has its merits, but the latter divide-and-conquer approach seems to make the most sense, especially if a multitasking operating system with built-in inter-process communication schemes is available.
One such scheme is the component object model (COM). With a COM interface linked to the main application, the speech recognition application runs on its own and gives instructions to the main application via the COM interface. The advantage: There is no need to tax the main application with the burden of speech recognition unless it's absolutely necessary (see diagram). And this COM interface might have other uses in the future, because who knows what new user interfaces might come along.
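This decoupling can be sketched in a few lines of Python (not actual COM code; the class and method names are hypothetical). The narrow interface between the two programs is modeled as a thread-safe queue: the recognizer posts commands through it, and the main application polls for them when it is ready, so the main application carries no speech-recognition burden of its own.

```python
import queue

class ScopeCommandInterface:
    """Stands in for the COM interface: the only channel between the
    speech-recognition process and the main application."""

    def __init__(self):
        self._commands = queue.Queue()

    def post(self, command):
        # Called by the recognizer whenever an utterance is matched.
        self._commands.put(command)

    def next_command(self):
        # Polled by the main application; returns None if nothing is pending.
        try:
            return self._commands.get_nowait()
        except queue.Empty:
            return None
```

As the article notes, the same narrow interface could later serve other user-interface front ends, since nothing in it is specific to speech.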
Concerns. No matter which architectural approach is taken, a few factors must be considered: People tend to speak quickly and may give more commands than can be executed in one step. What happens if the main application is still dealing with the previous command when a second is given? What if a command is given that can't be executed because the instrument is in the wrong state? Worse still, what happens to a command that isn't recognized at all?
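One possible set of policies for these cases is sketched below in Python, with hypothetical states and commands: defer commands that arrive while the instrument is still busy, and reject commands that are illegal in the current state. This is one design choice among several, not a prescription.

```python
class CommandHandler:
    """Sketch of command-handling policy for a speech-enabled instrument.
    States, commands, and transitions are invented for illustration."""

    LEGAL = {"STOPPED": {"RUN"}, "RUNNING": {"STOP"}}   # legal commands per state
    NEXT = {"RUN": "RUNNING", "STOP": "STOPPED"}        # resulting state

    def __init__(self):
        self.state = "STOPPED"
        self.pending = []   # commands deferred while busy

    def handle(self, command, busy=False):
        if busy:
            # Still executing the previous command: queue this one for later.
            self.pending.append(command)
            return "queued"
        if command not in self.LEGAL[self.state]:
            # Wrong state for this command: refuse rather than misbehave.
            return "rejected"
        self.state = self.NEXT[command]
        return "done"
```

An unrecognized utterance never reaches this handler at all if the grammar matcher returns nothing for it; the instrument can simply ignore it or prompt the user to repeat.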
Another consideration is the design of the grammar itself. Command-and-control engines work best with a relatively small vocabulary. Care should be taken to avoid phrases that may be easily misconstrued, such as "move trace" being mistaken for "blue trace."
Localization is another issue. By their nature, command-and-control engines don't rely on "learning" the user's voice. Instead, they are initially tuned to a particular language or dialect -- a plus for users since the instruments will recognize their commands instantly. But, if an application is tuned for American English, it may have difficulty recognizing commands from a user with a strong Asian accent.
Oscilloscopes are prime candidates for speech recognition, as the opening scenario attests. The need in field-test equipment, on the other hand, may be for the instrument to vocalize measurement results for use in tight or dark confines, which is much easier than making it understand speech.
For this, some vendors supply speech-synthesis technology known as "text-to-speech." These systems take an ASCII string of text and convert it to audible speech. The same code in an instrument that formats a string such as "The measured voltage is 5.6V dc" for display on an LCD could also send it to a text-to-speech engine and have the instrument speak the same message. Text-to-speech technology is improving to the point where computer-generated voices, like HAL's in the movie "2001: A Space Odyssey" or the humanistic computers on "Star Trek," are almost here.
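The point that a single formatted string can feed both the LCD and a text-to-speech engine can be sketched as follows. The `display` and `speak` callbacks are hypothetical stand-ins for the LCD driver and the TTS engine, which would be supplied by the instrument's platform.

```python
def format_measurement(value_volts):
    """Format a reading once; the same string serves every output."""
    return f"The measured voltage is {value_volts:.1f}V dc"

def announce(value_volts, display, speak):
    """Route one formatted string to both output channels.
    `display` and `speak` are hypothetical callbacks: in a real
    instrument, the LCD driver and a text-to-speech engine."""
    message = format_measurement(value_volts)
    display(message)
    speak(message)
```

Because the formatting code is shared, adding a spoken output to an existing instrument is largely a matter of wiring the existing display string into the synthesis engine.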
With natural-language speech recognition and text-to-speech, instrument designers have new tools that allow them to add an extra user interface to better match the context in which their instruments will be used. And test equipment users will welcome such interfaces whenever they find themselves needing a "third" hand or a pair of eyes in the backs of their heads.
Mike Karin is an R&D project manager at Agilent Technologies (Colorado Springs, CO). He is responsible for the Infiniium oscilloscope line and focuses primarily on Agilent's high-performance line. Karin holds a BS in Electrical Engineering from the Georgia Institute of Technology.