With an eye toward safety and ergonomics in factory automation, finding simpler and more cost-effective ways to control factory floor equipment is a major concern. Touchless gesture control is one way to implement safer and more intuitive human-machine interfaces. Beyond human-machine interfaces, 3D imaging systems can also be used to guide and control robotic systems in new ways. This article introduces the basics of three 3D imaging methodologies: stereo vision, structured light, and time of flight (ToF). We then consider how time of flight can enhance various automation applications, from both a functional and a safety point of view.
3D imaging methodologies
Stereo vision operates on the same principle as human vision. Two precisely aligned standard 2D cameras (the eyes) capture the scene simultaneously. The system controller (the brain) processes the two images to determine the depth of each object in the field of view. The benefit of such a system is that it uses standard off-the-shelf cameras, so the resolution can be as high as the application requires. The cost of these cameras is also well understood and typically fairly low. The drawbacks of stereo vision are:
- the physical alignment is critical and therefore potentially costly; this precise alignment can also be a real challenge to maintain in a factory environment
- the image processing algorithms are very complex and require more powerful processors to run in real time
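The depth calculation behind stereo vision can be sketched with the standard pinhole triangulation relation: depth = focal length × baseline / disparity. The article does not give this formula or any camera parameters, so the function name, parameter names, and numbers below are illustrative assumptions, and the sketch assumes two identical, rectified (row-aligned) cameras.

```python
def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Return the depth in meters of a point seen by two rectified cameras.

    focal_length_px: focal length expressed in pixels (assumed equal for both cameras)
    baseline_m:      distance between the two camera centers, in meters
    disparity_px:    horizontal shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative values
        # indicate a bad match or cameras that are not properly rectified.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical rig: 800 px focal length, 10 cm baseline.
    nominal = depth_from_disparity(800.0, 0.10, 40.0)
    # The same point if misalignment shifts the measured disparity by one pixel.
    shifted = depth_from_disparity(800.0, 0.10, 39.0)
    print(f"nominal depth: {nominal:.3f} m")
    print(f"depth with 1 px disparity error: {shifted:.3f} m")
```

Because depth is inversely proportional to disparity, a one-pixel disparity error moves the estimate by several centimeters in this hypothetical rig, which illustrates why physical alignment is so critical.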
Structured light systems are what most of us picture when we think of 3D imaging. This approach is currently used by the Microsoft Kinect. The basic principle is to illuminate the scene with a series of light patterns. A standard 2D camera acquires the scene, and the processor analyzes the sequence of reflected images to generate the depth of the objects in it. The benefit of this system is that, with higher-resolution cameras and more elaborate light patterns, a very precise depth map can be generated. The main drawback is the inherent latency of having to analyze multiple image frames to determine depth. Additionally, the camera system's native resolution is reduced because the image processing requires interpolation between pixels. Because the light patterns must travel a minimum distance before the image processing system can accurately interpret the returned patterns, the approach does not scale well to close-interaction applications. The final drawback is similar to that of stereo vision: precise alignment between the illumination source and the camera system is required to analyze the light patterns properly.
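To illustrate why a sequence of frames is needed, here is a minimal sketch of decoding one pixel under the assumption that the projector uses binary Gray-code stripe patterns. The article does not say which patterns are used, so this is one common choice rather than the method of any particular product, and the function names are hypothetical. Each frame contributes one bit per pixel, so N frames can distinguish 2^N projector columns.

```python
from typing import Sequence


def gray_to_binary(gray: int) -> int:
    """Convert a Gray-coded integer to its plain binary value."""
    binary = gray
    shift = gray >> 1
    while shift:
        binary ^= shift
        shift >>= 1
    return binary


def decode_projector_column(bits: Sequence[int]) -> int:
    """Recover the projector column index seen by one camera pixel.

    bits: one thresholded observation (0 = dark, 1 = lit) per captured
          frame, most significant pattern first.
    """
    gray = 0
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("each observation must be 0 or 1")
        gray = (gray << 1) | bit
    return gray_to_binary(gray)


if __name__ == "__main__":
    # Three frames resolve 2**3 = 8 columns; this pixel saw lit, lit, lit.
    print(decode_projector_column([1, 1, 1]))
```

Once the projector column for a pixel is known, depth follows from triangulation between the projector and the camera, which is why their relative alignment matters.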
Time-of-flight (ToF) systems are based on a universal constant: the speed of light. The scene is illuminated by a pulsed infrared light source, and the reflected energy is received by a special depth sensor. The sensor's pixel architecture allows it to measure the phase of the received energy at each pixel location. This received phase is compared to the known emitted phase, and simple math converts phase to time. Since the energy is