If you're a hard-core, dyed-in-the-wool gadget freak, Popcorn, Indiana LLC may have a design task for you.
The maker and seller of snack foods recently introduced the Popinator, which it calls a "fully automated, voice activated popcorn shooter." In videos, the device reacts to the spoken word "pop" by launching a kernel of popcorn into someone's mouth.
It's supposed to be a novelty -- a modern-day, mechanized version of the Pet Rock. And public reaction to it has been virtually off the charts. A YouTube video from the company (at the bottom of this post) has drawn 1.7 million hits. National and local television news shows have featured it, and newspapers around the country have written about it.
Popcorn, Indiana calls its Popinator a "fully automated, voice activated popcorn shooter." (Source: Popcorn, Indiana)
The problem with the machine is it doesn't work -- at least not in the way described in the company's video. "It does shoot popcorn, but it is remote-operated, not voice," Amanda Tiberi, marketing coordinator for Popcorn, Indiana, told us in an email. "It does exist, but only one, and it is a prototype in our office."
The video claims otherwise. "The Popinator uses a binaural microphone system, which is similar to the way the human hearing system works," says Ted, an electrical engineer seen in the video. "Basically, it is able to calculate, using the small differences in the arrival time of sound waves and their reflections, where a sound originated from."
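To make Ted's description concrete, here is a minimal sketch of arrival-time localization with two microphones. It is not the Popinator's actual code. The function names, the 0.2 m microphone spacing, and the 48 kHz sample rate used below are assumptions chosen for illustration, and the sketch ignores the reflections Ted mentions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature (assumed)


def estimate_delay(left, right, max_lag):
    """Return the lag, in samples, at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    This is a brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_degrees(delay_samples, sample_rate, mic_spacing):
    """Convert an inter-microphone delay into a bearing (far-field assumption).

    0 degrees is straight ahead; positive angles point toward the left
    microphone, because the sound arrived there first.
    """
    delay_s = delay_samples / sample_rate
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))


# Hypothetical example: a click that hits the right microphone 5 samples late.
left = [0.0] * 100
right = [0.0] * 100
left[40] = 1.0
right[45] = 1.0
lag = estimate_delay(left, right, max_lag=20)
angle = bearing_degrees(lag, sample_rate=48000, mic_spacing=0.2)
```

In a real room, echoes and noise would make the correlation peak much less clean than in this toy signal.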
Technically, Ted's description sounds good. But when a CNN reporter tried to use the device, it launched popcorn all over the floor. And it didn't react to the word "pop." It wasn't clear if the binaural microphones were even installed.
Of course, Popcorn, Indiana's idea -- post an embellished video on the Web, wait for it to go viral, and reap the name-recognition benefits -- isn't new. In truth, it's no different from what Volkswagen did this year with the Hover Car we discussed in June. In terms of plausibility, however, the Popinator and the Hover Car are in different leagues.
Randy Frank, an expert in sensing technology, told us it would be feasible for the Popinator to have a binaural MEMS audio sensor, which is often used in smartphones. Moreover, there's no reason the device couldn't find the source of the sound and launch a kernel in the appropriate direction. "Using a reasonably sophisticated software algorithm and some sensor fusion, you can make that projectile travel accurately to the direction of the sound."
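Once the direction and distance of the target are known, the aiming step can be sketched with textbook projectile motion. This is only an illustration under a strong assumption: it ignores air drag, which matters a great deal for something as light as popcorn. The function name and the numbers in the example are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def launch_angle_degrees(distance, height, speed):
    """Return the low-arc launch angle for a drag-free projectile.

    distance: horizontal distance to the target, in meters
    height:   target height relative to the launcher, in meters
    speed:    launch speed, in m/s
    Returns None if the target is out of reach at that speed.
    """
    if distance <= 0 or speed <= 0:
        raise ValueError("distance and speed must be positive")
    disc = speed**4 - G * (G * distance**2 + 2 * height * speed**2)
    if disc < 0:
        return None  # no ballistic solution at this speed
    tan_theta = (speed**2 - math.sqrt(disc)) / (G * distance)
    return math.degrees(math.atan(tan_theta))


# Hypothetical example: a mouth 2 m away and 0.3 m above the launcher,
# with a 5 m/s launch speed.
angle = launch_angle_degrees(2.0, 0.3, 5.0)
```

A working device would need to correct this angle empirically, since kernels vary in mass and shape.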
The project could be expensive and could have its challenges. "Any time you do voice activation, you have to worry about false triggers," he said. "You don't want this thing shooting popcorn at your dog every time it barks, and that takes some engineering expertise. It's not trivial."
Can it be done? Popcorn, Indiana (which, by the way, is located in New Jersey) isn't asking for our help. But we encourage readers to weigh in with a comment. After all, who hasn't dreamed of owning a voice-activated machine that shoots popcorn into your mouth?
OK, I'll bite (pun intended). So, you want to launch a projectile, that has a bad habit of sticking in one's throat under normal conditions, at high velocity, into one's mouth, not an eye where the cornea could get scratched, while breathing in after issuing a command that opens the trachea exposing one's lungs to this projectile, and make it available to the youth of the household and your pets?
Maybe the better way is a video processing system with acoustic distance-determining subroutines?
Next, how about one that launches pins for those who sew a lot? Or nails for a carpenter who is putting up framing? Or syringes for nurses giving injections? The possibilities are endless!
One word: Kinect
And then some more words (I am an engineer, after all)
I think the Kinect sensor has what you need. It has a Windows SDK now, too. It will locate separate targets, judge distance, and even identify the mouth location. The trick will be figuring out how far above or below the shooter is, using a calibration that would probably need a screen and an interface.
I believe you're right when you suggest that the audio method won't be easy, MrDon. When I asked sensors expert Randy Frank, he said this: "It's not just a matter of the sound transducer. There's a lot that has to be done in the sound horn to make the sound come out accurately. It's a matter of mechanical shaping -- you need something that reflects the sound consistently." I would add that the understanding of that sound needs to be incorporated into the software algorithms, as well.
Redding, I agree. A digital camera would be the best approach in locating one's mouth instead of sound. There's a lot of sophisticated face recognition and gesturing software on the market that can initially fine tune the Popinator's location detection function instead of using a microphone or binaural sensors.
TommyH, I would guess part of the difficulty would involve the variances in voices. Some of our voices are loud, some are quiet. So, detecting distance would require an evaluation of an individual's voice to determine whether the person is close or far when that person says, "Pop."
I've been paying very close attention recently to a sensor technology which as far as anyone can tell does NOT use a binocular "triangulation" method to yield EXTREMELY precise distance measurements -- instead it uses an infrared illuminator, reflecting back thru a pseudo-random hole pattern in a shadow mask over a CCD imager chip, where the breakthru intellectual properties are the algorithms which allow deconvolving "empty field of view" speckle patterns with "object in field of view" speckle patterns -- and where the ratios of average hole separation, flying height of the mask over the CCD imager surface, and pixel separation ON the imager surface all play into yielding that incredible precision.
And what I find myself WONDERING is if there is any crossover "play" possible, with some type of capacitive-membrane (or piezoelectric??) acoustic-sensor "surface", with an "aural mask" perhaps formed of acoustic foam with randomly spaced holes in it, with some comparable/approximately-optimum ratio for hole- and sensor-spacing, and flying-height for the foam. Leaving the question of whether a verbalization would qualify in place of using some type of bat-like acoustic radar "chirp" to yield the same type of highly-informative acoustically-sourced position data -- the "chirp" may be absolutely required, in which case I would pursue THIS approach for sonar systems and skip the popcorn...
But if an ACOUSTIC "shadow mask" works then binaural microphones might be unnecessary and redundant -- and I would expect precision/resolution on the order of one or two wavelengths of the dominant/median audio frequency involved. Definitely worth investigating. Although gaining access to the algorithms required for analysis is going to be ...tricky... you might be able to at least do a "please attempt to analyze this data" handoff, to determine if there MAY be a possible signal embedded in the, what, "acousticospatial" sensor data?
*** BUT I would suggest a hugely simpler (less research involved, almost COTS) solution: use a visual sensor with sufficient range to bridge the typical "launch" distance (Kinect comes to mind, others exist), and simply search for a "gaping mouth" on hearing a distinguishable single-burst sound. Voice-detect circuits for "call-progress-monitoring" on voice response systems (I have a LOT of experience there) are trivial -- simply detect a characteristic signature "silence-burst-silence" and look for the immediately-following "open mouth" target on an in-view "face"; if you don't find it, pass on the suspected event. If you DO find it, leverage the "frame of reference" and known windage, calculated projectile mass (MEMS sounded good, although you could probably do an "air jet suspension" mass-calc with calibrated flapper, or a solenoid-bounced spring-constant deflection test, etc; you may need to do "compressed popcorn pellets"; and I'd be torn between a compressed-air vs spring-deflection (solenoid) vs "elastic" slingshot "shooter"), etc and "Fire at Will!" (or Grace, or George, or whomever)... You WILL want to "advise caution" for anyone tempted to "load the hopper" with peanuts, gummy bears or (gods forbid) jaw breakers ("you could put an EYE out with one of those, sonny!")...
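The "silence-burst-silence" trigger described above can be sketched in a few lines. This is only a rough illustration, not code from any real call-progress detector. It assumes the audio has already been reduced to one energy value per short frame, and the function name, threshold, and frame counts are hypothetical.

```python
def find_bursts(frame_energy, threshold, min_silence=3, max_burst=5):
    """Return start indices of short loud bursts bracketed by silence.

    frame_energy: one energy value per audio frame
    threshold:    energy at or above this counts as "loud"
    min_silence:  quiet frames required before and after the burst
    max_burst:    longest run of loud frames still treated as one burst
    """
    bursts = []
    n = len(frame_energy)
    i = 0
    quiet_run = 0  # quiet frames seen since the last loud frame
    while i < n:
        if frame_energy[i] < threshold:
            quiet_run += 1
            i += 1
            continue
        # Found a loud frame: measure how long the loud run lasts.
        start = i
        while i < n and frame_energy[i] >= threshold:
            i += 1
        burst_len = i - start
        # Look ahead to count the silence that follows the burst.
        trailing = 0
        k = i
        while k < n and frame_energy[k] < threshold and trailing < min_silence:
            trailing += 1
            k += 1
        if (quiet_run >= min_silence and burst_len <= max_burst
                and trailing >= min_silence):
            bursts.append(start)
        quiet_run = 0
    return bursts


# Hypothetical energies: a short "pop", a long stretch of talking, and a
# final burst that is cut off before enough silence follows it.
energies = [0] * 5 + [9, 9] + [0] * 5 + [9] * 10 + [0] * 5 + [9] + [0] * 2
triggers = find_bursts(energies, threshold=1)
```

Only the first burst qualifies here. Each trigger would then be handed to the vision stage, which either finds an open mouth or passes on the event.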
Feedback would be appreciated. PARTICULARLY if you take a look into the acoustico-spatial sensor approach (I'll split the royalties if it turns out to be as feasible as "gut feel" says it should be), but also if you consider attempting the "search for a gaping mouth" visual sensor approach. Note: there are optimizations lurking -- things like "constantly look for a NEW open mouth, while you WAIT for an 'impulse' audio trigger" since precalculated targeting yields an IMMEDIATE "launch" response, no calc-gap. If there is any visible projector "swivel" it would probably be pretty cool to watch it targeting yawns. Even if the audible trigger IS a dog barking, or a hand-clap or a finger-snap, it would STILL qualify as a 'Score!!' as far as most people would be concerned...
So if I give one of these to my daughter, will it have a meltdown when I go for a visit? She has 5 kids... An average scenario has me entering the living room to a barrage of 'Hi Pop-Pop', 'Pop-Pop's here', and just plain ole 'Pop-Pop! Pop-Pop!' (from the 2- and 4-year-olds).
And on that thought about the dog barking... just watch how fast dogs evolve the ability to say whatever words deliver food from our robots. Special algorithms will need to be developed to do 'hacker doggy' detection.
I am also wondering if sound reflection inside a room will be much of a problem for a word-based tech such as this one. Echoes could muddy the sound to the point that detecting multiple instances of a word, and then using those detections to locate, aim, and fire, becomes unreliable...
Plus as is the case with many words, Pop can be pronounced with nuance by different people... Pop (muted second 'p') and also PoP (lip bounce on the second 'P').
The Clapper merely listens for sharp spikes I guess... which is why it is not always recommended for 'loud' households or with dogs, but is also why it works in general. 2 loud claps (or 3)... ta da.
And you are right Larry... Much of the hard work on aiming has been done already on much tougher targets.