OK, I'll bite (pun intended). So you want to launch a projectile, one that has a bad habit of sticking in people's throats under normal conditions, at high velocity into someone's mouth (not an eye, where the cornea could get scratched) while they are breathing in after issuing a command that opens the trachea and exposes their lungs to this projectile, and then make it available to the youth of the household and to your pets?
Maybe the better way is a video processing system with acoustic distance-determining subroutines?
Next, how about one that launches pins for those who sew a lot? Or nails for a carpenter who is putting up framing? Or syringes for nurses giving injections? The possibilities are endless!
One word: Kinect
And then some more words (I am an engineer, after all):
I think the Kinect sensor has what you need. It has a Windows SDK now, too. It will locate separate targets, judge distance, and even identify the mouth location. The trick will be figuring out how far above or below the shooter the target is, using a calibration that would probably need a screen and an interface.
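Once a sensor like the Kinect reports how far away the mouth is and how far above or below the launcher it sits, the aiming itself is textbook ballistics. The sketch below assumes a drag-free trajectory and a known muzzle speed; a real popcorn kernel is light and draggy, so treat this as a starting point for calibration rather than a firing solution. `launch_angles` is a hypothetical helper, not part of any Kinect SDK.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def launch_angles(distance_m, height_m, speed_mps, g=G):
    """Elevation angles (degrees) that hit a target, ignoring air drag.

    distance_m: horizontal distance from launcher to mouth (must be > 0)
    height_m:   mouth height minus launcher height (negative if below)
    speed_mps:  muzzle speed
    Returns (low, high) angles, or None if the target is out of range.
    Uses tan(theta) = (v^2 +/- sqrt(v^4 - g*(g*x^2 + 2*y*v^2))) / (g*x).
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    v2 = speed_mps ** 2
    disc = v2 * v2 - g * (g * distance_m ** 2 + 2 * height_m * v2)
    if disc < 0:
        return None  # not enough speed to reach this point
    root = math.sqrt(disc)
    low = math.atan2(v2 - root, g * distance_m)   # flatter, quicker trajectory
    high = math.atan2(v2 + root, g * distance_m)  # lobbed shot
    return math.degrees(low), math.degrees(high)


# Example: mouth 2.5 m away and 0.4 m above the launcher, 9 m/s muzzle speed.
print(launch_angles(2.5, 0.4, 9.0))
```

The flatter of the two angles is usually the one you want, since it spends less time in the air (and less time being pushed around by drag).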
I believe you're right when you suggest that the audio method won't be easy, MrDon. When I asked sensors expert Randy Frank, he said this: "It's not just a matter of the sound transducer. There's a lot that has to be done in the sound horn to make the sound come out accurately. It's a matter of mechanical shaping -- you need something that reflects the sound consistently." I would add that an understanding of that sound needs to be built into the software algorithms as well.
Redding, I agree. A digital camera would be a better approach than sound for locating someone's mouth. There's a lot of sophisticated face-recognition and gesture-recognition software on the market that could help fine-tune the Popinator's location-detection function, instead of relying on a microphone or binaural sensors.
TommyH, I would guess part of the difficulty would involve the variances in voices. Some of our voices are loud, some are quiet. So, detecting distance would require an evaluation of an individual's voice to determine whether the person is close or far when that person says, "Pop."
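To make the point concrete: under a free-field, inverse-square assumption, sound level drops by 20*log10 of the distance ratio (about 6 dB per doubling of distance), so a single loudness reading only yields a distance if you already know how loud that particular person says "Pop" at a reference distance. This minimal sketch assumes such a per-speaker calibration value exists; `estimate_distance` and its parameters are illustrative names, and real rooms add reflections that break the free-field assumption.

```python
def estimate_distance(measured_db, ref_db, ref_distance_m=1.0):
    """Distance implied by a sound level under a free-field inverse-square model.

    ref_db is how loud THIS speaker's "Pop" measured at ref_distance_m
    (a per-speaker calibration value, assumed to be known).
    Level falls by 20*log10(r / r_ref) dB, about 6 dB per doubling of distance.
    """
    return ref_distance_m * 10 ** ((ref_db - measured_db) / 20.0)


# The same 60 dB reading means very different distances for different voices:
quiet_voice = estimate_distance(60.0, ref_db=60.0)  # 1.0 m
loud_voice = estimate_distance(60.0, ref_db=70.0)   # about 3.16 m
print(quiet_voice, loud_voice)
```

Without the per-speaker `ref_db`, those two cases are indistinguishable, which is exactly the loud-voice/quiet-voice ambiguity.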
I've been paying very close attention recently to a sensor technology that, as far as anyone can tell, does NOT use a binocular "triangulation" method, yet yields EXTREMELY precise distance measurements. Instead, it uses an infrared illuminator reflecting back through a pseudo-random hole pattern in a shadow mask over a CCD imager chip. The breakthrough intellectual property is in the algorithms that deconvolve "empty field of view" speckle patterns against "object in field of view" speckle patterns, and the ratios of average hole separation, flying height of the mask above the CCD imager surface, and pixel separation ON the imager surface all play into yielding that incredible precision.
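The actual speckle-processing algorithms are proprietary and not described here, so the following is only a simplified stand-in: a 1-D cross-correlation that compares a pseudo-random "empty field of view" reference pattern with an "object in field of view" pattern and finds how far the pattern has locally shifted. Real systems presumably work in 2-D on local windows and then convert shift into depth, which is not shown; `make_pattern` and `best_shift` are hypothetical names.

```python
import random


def make_pattern(n, seed):
    """Pseudo-random +/-1 pattern standing in for lit/dark speckle points."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n)]


def best_shift(reference, observed, max_shift):
    """Shift s (in samples) that best aligns observed[i + s] with reference[i].

    Scores each candidate shift by the mean product over the overlapping
    samples (a simple normalized cross-correlation for zero-mean +/-1 data).
    """
    best_s, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for i, r in enumerate(reference):
            j = i + s
            if 0 <= j < len(observed):
                total += r * observed[j]
                count += 1
        if count and total / count > best_score:
            best_s, best_score = s, total / count
    return best_s


# "Empty field" reference vs. an "object in view" pattern shifted by 7 samples.
ref = make_pattern(200, seed=1)
obs = [0] * 7 + ref[:-7]
print(best_shift(ref, obs, max_shift=20))  # expected: 7
```

A pseudo-random pattern matters here: because it never repeats, only the true shift produces a strong correlation peak.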
And what I find myself WONDERING is whether any crossover "play" is possible with some type of capacitive-membrane (or piezoelectric??) acoustic-sensor "surface" and an "aural mask", perhaps formed of acoustic foam with randomly spaced holes in it, with some comparable, approximately optimum ratio of hole spacing, sensor spacing, and flying height for the foam. That leaves open the question of whether a verbalization would work, or whether you'd need some type of bat-like acoustic radar "chirp" to yield the same kind of highly informative, acoustically sourced position data. The "chirp" may be absolutely required, in which case I would pursue THIS approach for sonar systems and skip the popcorn...
But if an ACOUSTIC "shadow mask" works, then binaural microphones might be unnecessary and redundant, and I would expect precision/resolution on the order of one or two wavelengths of the dominant/median audio frequency involved. Definitely worth investigating. Gaining access to the required analysis algorithms is going to be ...tricky..., but you might at least be able to do a "please attempt to analyze this data" handoff, to determine whether there MAY be a usable signal embedded in the, what, "acousticospatial" sensor data?
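For a sense of scale, the wavelength is simply the speed of sound divided by frequency. This snippet assumes about 343 m/s (dry air near 20 degrees C) and a handful of illustrative frequencies; which frequency actually dominates a spoken "Pop" is an assumption you would want to measure.

```python
SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 degrees C (assumed)


def wavelength_m(freq_hz, c=SPEED_OF_SOUND):
    """Acoustic wavelength in metres: lambda = c / f."""
    return c / freq_hz


# Illustrative frequencies; the dominant frequency of a spoken "Pop" is an assumption.
for f in (250, 500, 1000, 2000, 4000):
    lam_cm = wavelength_m(f) * 100
    print(f"{f:>5} Hz: 1 wavelength = {lam_cm:.1f} cm, 2 wavelengths = {2 * lam_cm:.1f} cm")
```

At 1 kHz a single wavelength is about 34 cm, so a resolution of one or two wavelengths would be coarser than a mouth opening unless the useful energy sits well up in the kHz range.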
*** BUT I would suggest a hugely simpler (less research involved, almost COTS) solution: use a visual sensor with enough range to bridge the typical "launch" distance (Kinect comes to mind; others exist), and simply search for a "gaping mouth" upon hearing a distinct single-burst sound. Voice-detect circuits for "call-progress monitoring" on voice response systems (I have a LOT of experience there) are trivial: simply detect a characteristic "silence-burst-silence" signature and look for an "open mouth" target on an in-view "face" immediately afterward. If you don't find one, pass on the suspected event.
If you DO find one, leverage the "frame of reference", the known windage, and the calculated projectile mass, and "Fire at Will!" (or Grace, or George, or whomever). For the mass, MEMS sounded good, although you could probably do an "air jet suspension" mass calculation with a calibrated flapper, or a solenoid-bounced spring-constant deflection test, etc.; you may need to use "compressed popcorn pellets". And I'd be torn between a compressed-air, a spring-deflection (solenoid), or an "elastic" slingshot "shooter". You WILL want to "advise caution" for anyone tempted to "load the hopper" with peanuts, gummy bears or (gods forbid) jawbreakers ("you could put an EYE out with one of those, sonny!")...
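The "silence-burst-silence" check can be sketched with nothing more than short-frame energy. This is a generic energy-threshold detector, not the call-progress-monitoring circuitry described above; the threshold and frame counts are hypothetical tuning parameters.

```python
import math


def frame_rms(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_bursts(rms, threshold, min_silence, min_burst, max_burst):
    """Return (start, end) frame indices (end exclusive) of silence-burst-silence events.

    A burst is a run of frames with RMS >= threshold lasting between min_burst
    and max_burst frames, with at least min_silence quiet frames on both sides.
    """
    loud = [r >= threshold for r in rms]
    n = len(loud)
    events = []
    i = 0
    while i < n:
        if not loud[i]:
            i += 1
            continue
        start = i
        while i < n and loud[i]:
            i += 1
        end = i
        quiet_before = start >= min_silence and not any(loud[start - min_silence:start])
        quiet_after = end + min_silence <= n and not any(loud[end:end + min_silence])
        if min_burst <= end - start <= max_burst and quiet_before and quiet_after:
            events.append((start, end))
    return events


# Synthetic example: 5 quiet frames, a 3-frame burst, 5 quiet frames.
rms = [0.01] * 5 + [0.5] * 3 + [0.01] * 5
print(find_bursts(rms, threshold=0.1, min_silence=3, min_burst=2, max_burst=10))  # [(5, 8)]
```

For example, at a 16 kHz sample rate, 160-sample frames are 10 ms each, so `min_silence=20` asks for 200 ms of quiet on each side of the burst. Each event found this way is only a candidate; the camera still has to confirm an open mouth.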
Feedback would be appreciated, PARTICULARLY if you take a look into the acoustico-spatial sensor approach (I'll split the royalties if it turns out to be as feasible as my "gut feel" says it should be), but also if you consider attempting the "search for a gaping mouth" visual-sensor approach. Note: there are optimizations lurking, e.g. "constantly look for a NEW open mouth while you WAIT for an 'impulse' audio trigger", since precalculated targeting yields an IMMEDIATE "launch" response with no calculation gap. If there is any visible projector "swivel", it would probably be pretty cool to watch it target yawns. Even if the audible trigger IS a dog barking, a hand-clap, or a finger-snap, it would STILL qualify as a 'Score!!' as far as most people are concerned...
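The "look continuously, fire on trigger" optimization amounts to caching the latest aim solution from the vision loop so the audio loop can act on it immediately. A minimal sketch, assuming separate vision and audio loops that call into one shared object; `Targeter`, its methods, and the 0.2 s freshness window are all hypothetical.

```python
import threading
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class AimSolution:
    pan_deg: float
    tilt_deg: float
    timestamp: float  # seconds, time.monotonic() clock


class Targeter:
    """Holds the latest aim solution so an audio trigger can fire with no calc gap."""

    def __init__(self, max_age_s=0.2):
        self.max_age_s = max_age_s  # how stale a solution may be and still be used
        self._latest: Optional[AimSolution] = None
        self._lock = threading.Lock()  # vision and audio loops may run on different threads

    def on_open_mouth(self, pan_deg, tilt_deg, now=None):
        """Vision loop: call whenever a NEW open mouth has been located and aimed at."""
        now = time.monotonic() if now is None else now
        with self._lock:
            self._latest = AimSolution(pan_deg, tilt_deg, now)

    def on_trigger(self, now=None):
        """Audio loop: return a fresh solution to fire at, or None to pass on the event."""
        now = time.monotonic() if now is None else now
        with self._lock:
            sol = self._latest
        if sol is None or now - sol.timestamp > self.max_age_s:
            return None
        return sol


t = Targeter()
t.on_open_mouth(pan_deg=12.0, tilt_deg=4.5, now=10.00)
print(t.on_trigger(now=10.05))  # fresh: fire immediately
print(t.on_trigger(now=11.00))  # stale: None
```

The freshness window is the design knob: too short and a trigger often finds nothing to aim at, too long and the mouth may have closed or moved.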
So if I give one of these to my daughter, will it have a meltdown when I go for a visit? She has 5 kids... A typical scenario has me entering the living room to a barrage of 'Hi Pop-Pop', 'Pop-Pop's here', and just plain ole 'Pop-Pop! Pop-Pop!' (from the 2- and 4-year-olds).
And on that thought about the dog barking... just watch how fast dogs evolve the ability to say whatever words deliver food from our robots. Special algorithms will need to be developed to do 'hacker doggy' detection.
I am also wondering whether sound reflection inside a room will be much of a problem for a word-based technology like this one. Echoes could muddy the sound enough that detecting multiple instances of a word, and then using those detections to locate, aim, and fire, becomes unreliable...
Plus, as with many words, "Pop" can be pronounced with nuance by different people: Pop (muted second 'p') and PoP (lip bounce on the second 'P').
The Clapper merely listens for sharp spikes, I guess... which is why it is not always recommended for 'loud' households or homes with dogs, but also why it works in general. 2 loud claps (or 3)... ta da.
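A spike-pattern listener of that kind can be sketched as: accept a spike only if it is not too close to the previous accepted one (treating close followers as ringing or room echo), group spikes separated by short gaps into a sequence, then act on sequences of 2 or 3. This is a guess at the general idea, not a description of how the Clapper is actually built; the timing constants and function names are assumptions.

```python
def count_claps(spike_times, refractory_s=0.1, max_gap_s=0.7):
    """Group sharp-spike timestamps (seconds) into clap sequences.

    Spikes within refractory_s of the previously accepted clap are treated as
    ringing or room echo of that clap and ignored. A gap longer than max_gap_s
    ends the current sequence. Returns one clap count per sequence.
    """
    sequences = []
    count = 0
    last = None
    for t in sorted(spike_times):
        if last is not None and t - last < refractory_s:
            continue  # too soon: echo of the previous clap
        if last is not None and t - last > max_gap_s:
            sequences.append(count)  # long pause: close the previous sequence
            count = 0
        count += 1
        last = t
    if count:
        sequences.append(count)
    return sequences


def is_clapper_command(n_claps):
    """Hypothetical rule: 2 or 3 claps in a row count as a command."""
    return n_claps in (2, 3)


# Two claps (each with an echo), a pause, then a single stray bang.
spikes = [0.00, 0.03, 0.40, 0.42, 2.00]
print([is_clapper_command(n) for n in count_claps(spikes)])  # [True, False]
```

The refractory window is also a crude answer to the echo concern: reflections arriving within a few tens of milliseconds are simply folded into the original event.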
And you are right Larry... Much of the hard work on aiming has been done already on much tougher targets.