With an eye toward safety and ergonomics in factory automation, finding simpler and more cost-effective ways to control factory floor equipment is of utmost concern. Touchless control via gesture is one way to implement safer and more intuitive human-machine interfaces. Beyond human-machine interfaces, these 3D systems can also guide and control robotic systems in new ways. This article introduces the basics of three 3D imaging methodologies: time of flight (ToF), structured light, and stereo vision. It then considers how time of flight can enhance various automation applications, both from a functional and a safety point of view.
3D imaging methodologies
Stereo vision operates on the same principle as human vision. Two precisely aligned standard 2D cameras (the eyes) capture the scene, and the system controller (the brain) processes the two simultaneously captured images to determine the depth of each object in the field of view. The benefit of such a system is that it uses standard off-the-shelf cameras, which allow whatever resolution the application needs. Also, the cost of these cameras is well understood and typically can be fairly low. The drawbacks of stereo vision are:
- the physical alignment is critical and, therefore, potentially costly; maintaining this precise alignment can be a real challenge in a factory environment
- the image-processing algorithms are very complex and require more powerful processors to run in real time
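The depth calculation at the heart of a stereo system is triangulation: once a feature is matched between the two images, depth follows from the disparity between the two pixel locations. A minimal sketch, with a hypothetical focal length and baseline chosen for illustration:

```python
# Depth from disparity for a matched feature in a stereo pair.
# Z = f * B / d, where f is the focal length in pixels, B the
# baseline (camera separation), and d the disparity in pixels.
# The focal length and baseline below are assumed example values.

FOCAL_LENGTH_PX = 800.0  # assumed focal length, pixels
BASELINE_M = 0.06        # assumed 6 cm separation between cameras

def depth_from_disparity(disparity_px: float) -> float:
    """Return the distance (m) to a point seen with this disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return FOCAL_LENGTH_PX * BASELINE_M / disparity_px

# A feature matched 48 pixels apart between the left and right images:
print(depth_from_disparity(48.0))  # 1.0 m
```

The formula also shows why alignment matters: an error of even one pixel in the disparity measurement shifts the computed depth, and the shift grows with distance, since disparity shrinks as objects move away.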
Figure: A factory floor control room that can be managed through gesture recognition. (Source: Texas Instruments)
Structured light systems are what most of us are familiar with when we think of 3D imaging; this is the approach currently used by Kinect from Microsoft. The basic principle is to illuminate the scene with a series of disruptive light patterns while a standard 2D camera acquires the scene. The processor interpolates the sequence of reflected images to generate the depth of the objects in the scene. The benefit of this system is that by using higher-resolution cameras, as well as more elaborate light patterns, a very precise depth map can be generated. One drawback is the inherent latency of having to analyze multiple image frames to determine depth. Additionally, the camera system's native resolution is reduced, because the image processing requires interpolation between pixels. The light patterns must also travel a minimum distance before the image processing system can accurately interpret the returned reflections, so the approach does not scale well to close-interaction applications. The final drawback is shared with stereo vision: precise alignment between the illumination source and the camera system is required to properly analyze the light patterns.
Time-of-flight (ToF) systems work based on a universal constant: the speed of light. The scene is illuminated by a pulsed infrared illumination. The reflected energy is received by a special depth sensor. The sensor has a special pixel architecture that allows it to understand the phase of the received energy at each pixel location. This received phase is compared to the known emitted phase, and simple math converts phase to time. Since the energy is traveling at the speed of light, it is one simple additional calculation to convert this to distance.
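The "simple math" converting phase to distance can be sketched directly. For continuous-wave ToF, the measured phase shift φ at modulation frequency f maps to distance as d = c·φ / (4π·f); the factor of 4π rather than 2π accounts for the round trip out and back. The 20 MHz modulation frequency below is an assumed example value, not a figure from the article:

```python
import math

C = 299_792_458.0    # speed of light, m/s
MOD_FREQ_HZ = 20e6   # assumed 20 MHz emitter pulse frequency

def distance_from_phase(phase_rad: float) -> float:
    """Convert the measured phase shift (radians) at one pixel
    into distance (m): d = c * phi / (4 * pi * f).
    The extra factor of 2 in the denominator reflects the
    round trip the light makes to the object and back."""
    return C * phase_rad / (4 * math.pi * MOD_FREQ_HZ)

# A pixel reporting a pi/2 phase shift at 20 MHz:
print(round(distance_from_phase(math.pi / 2), 3))  # ~1.874 m
```

Because this is a per-pixel arithmetic operation rather than a multi-frame search, the depth map emerges at the sensor's native resolution with no downstream matching step.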
The most significant benefit of time of flight over the alternatives is that no post-processing is required after the camera to calculate depth. This benefits the end product in several ways: it eliminates any latency introduced by post-processing, lowers the cost of the processing system, and ensures the sensor's native resolution is maintained through the depth map. Another benefit is that the mechanical alignment of the illumination source to the receptor is not critical. Since only the relative phase matters, the mechanical configuration is more forgiving.
Time of flight is a very scalable solution. Much can be done just by changing the illumination power, optical field of view, and emitter pulse frequency. You can move from a close-interaction system capable of tracking fingers out to about 1 m, to a far-interaction gaming system capable of tracking multiple players as they battle their way through their virtual playground. Because the depth camera is specialized, it tends to cost more than the standard 2D cameras used in the alternatives. Typically, this increased cost is offset by the lower-cost mechanical requirements, as well as the lower-cost post-processing requirements.
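The role of emitter pulse frequency in this scaling can be made concrete. Because depth is recovered from phase, the measurement wraps once the phase exceeds 2π, giving an unambiguous range of c / (2f): higher modulation frequencies trade range for finer depth resolution. The three frequencies below are hypothetical close/mid/far configurations, not values from the article:

```python
C = 299_792_458.0  # speed of light, m/s

def unambiguous_range(mod_freq_hz: float) -> float:
    """Maximum distance before the round-trip phase wraps
    past 2*pi: range = c / (2 * f)."""
    return C / (2 * mod_freq_hz)

# Assumed example configurations, from close interaction to far:
for f in (80e6, 20e6, 5e6):
    print(f"{f / 1e6:.0f} MHz -> {unambiguous_range(f):.2f} m")
```

This is one reason the same sensor architecture can serve both finger-tracking and room-scale applications: changing the modulation frequency (along with illumination power and optics) re-targets the working range without redesigning the camera.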