The central processing unit is a computational device used in desktop PCs and laptop computers. (Image source: Intel)
As machine learning applications continue to grow, their processing requirements are growing in tandem. At the core of machine learning is the neural network, which is built from the artificial neuron: mathematically, a nonlinear function applied to a weighted sum of its inputs. Collections of these artificial neurons, organized into layers, form a computational network. As machine learning is applied to data-intensive applications, such as Internet search, natural language processing (NLP), and image recognition, Moore’s Law is rapidly plateauing. The central processing unit (CPU) that has provided the computational power for today’s computing machines is unable to meet the demands of these “n-dimensional” datasets.
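As an illustrative sketch (not from the article), a single artificial neuron can be written as a weighted sum of its inputs passed through a nonlinearity; the sigmoid is used here as one common choice:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus a bias, passed through a sigmoid nonlinearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes the sum into (0, 1)

# Example: a two-input neuron with hand-picked weights
output = neuron([0.5, -1.0], [0.8, 0.2], bias=0.1)
print(output)
```

Stacking many such neurons into layers, with each layer's outputs feeding the next layer's inputs, yields the networks the article describes.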
The central processing unit (CPU), or microprocessor, is a complex electronic circuit consisting of millions of transistors. This circuitry is the computational brain of a computer, capable of executing the instructions of a program to perform basic computing tasks. These tasks consist of arithmetic, logic, control, and input/output (I/O) operations. Machine learning applications, such as linear regression and convolutional neural networks (CNNs), can be run on a CPU. However, the CPU’s largely serial, linear processing limits its ability to handle n-dimensional datasets.
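As a hedged sketch of the kind of machine learning workload a CPU handles comfortably, the short NumPy program below fits a linear regression by ordinary least squares (the data and parameters are made up for illustration):

```python
import numpy as np

# Synthetic data for a noisy line with true slope m=3 and intercept b=2.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=100)

# Build a design matrix with a bias column and solve the least-squares problem.
A = np.column_stack([x, np.ones_like(x)])
m, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(m, b)  # recovered slope and intercept, close to 3 and 2
```

A problem of this size runs in milliseconds on a CPU; it is the much larger, higher-dimensional workloads that motivate the GPU and TPU discussed next.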
Pictured are the electronic sub-circuits of a typical CPU or microprocessor architecture. (Image source: Heath Electronics)
The graphics processing unit (GPU) is the next evolutionary step in processing capability. Unlike the CPU, the GPU is designed to rapidly alter and manipulate memory to accelerate the creation of images. Video gaming systems, such as Sony’s PlayStation and Nintendo’s consoles, use GPUs to manage the images displayed on a high-definition multimedia interface (HDMI) monitor. To create 3D images, a GPU uses texture mapping, which requires high compute density and deep pipelines spread across hundreds of cores. A GPU can perform many matrix operations simultaneously, whereas a CPU handles them largely one at a time; the GPU is therefore a parallel processing device. Although the GPU’s parallelism makes it computationally faster than a CPU, the GPU is still not optimized for n-dimensional tensor computations.
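The parallelism the GPU exploits can be seen in the structure of a matrix multiply: every entry of the result is an independent dot product, so all of them can be computed at once. A minimal NumPy sketch (run here on the CPU, but the decomposition is what a GPU parallelizes across its cores):

```python
import numpy as np

# Each of the 2*4 entries of C is an independent dot product of a row of A
# with a column of B -- exactly the work a GPU spreads across many cores.
A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = np.ones((3, 4), dtype=np.float32)

C = A @ B
print(C.shape)  # (2, 4)
```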
Pictured is the NVIDIA Tesla K80 graphics processing unit. (Image source: NVIDIA)
Google introduced the tensor processing unit (TPU) at its 2016 I/O conference in Mountain View, California, having started development of the chip in 2013. The TPU is a custom hardware solution for supporting new machine learning research: an application-specific integrated circuit (ASIC) built to excel at n-dimensional mathematics.
An n-dimensional array, known as a tensor, is at the core of machine learning algorithms. A tensor can be a scalar (a single value), a vector (a one-dimensional array of inputs), or a matrix (a two-dimensional array of inputs), among higher-rank forms. Google’s TensorFlow is a framework that supports the training, testing, and production deployment of machine learning applications. With TensorFlow, TPUs can be managed to deliver an order-of-magnitude improvement in performance per watt for machine learning through better optimization techniques. These optimization techniques are rooted in calculus, such as the gradient-based methods used to train neural networks.
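The scalar/vector/matrix progression is just increasing tensor rank. A quick NumPy sketch (TensorFlow's `tf.constant` would be the direct analogue):

```python
import numpy as np

scalar = np.array(5.0)               # rank-0 tensor: a single value
vector = np.array([1.0, 2.0, 3.0])   # rank-1 tensor: multiple inputs
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])      # rank-2 tensor: a matrix of inputs

for t in (scalar, vector, matrix):
    print(t.ndim, t.shape)  # rank and shape grow together
```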
Google’s TPU is an ASIC designed specifically for tensor-based ML applications. (Image source: Google)
TPUs have been deployed in Google’s datacenters since 2015. These datacenters accelerate their linear algebra workloads with the TPU’s matrix multiply unit, a 65,536-element array of 8-bit multiply-accumulate (MAC) cells that offers high computational throughput. Power consumption is also lower than that of a comparable GPU or CPU because the machine learning application dispatches work to the TPU only when performing tensor operations.
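The multiply-accumulate operation at the heart of that matrix unit is simple to sketch: each MAC cell multiplies two 8-bit values and adds the product into a wider accumulator so the running sum cannot overflow. An illustrative NumPy version (the actual TPU wires tens of thousands of these cells together):

```python
import numpy as np

a = np.array([10, 20, 30], dtype=np.uint8)  # 8-bit activations
w = np.array([3, 2, 1], dtype=np.uint8)     # 8-bit weights

# Accumulate 8-bit products in a wider 32-bit register, as a MAC unit does.
acc = np.int32(0)
for x, y in zip(a, w):
    acc += np.int32(x) * np.int32(y)  # widen before multiplying to avoid overflow

print(int(acc))  # 10*3 + 20*2 + 30*1 = 100
```

A full matrix multiply is just many of these dot products, which is why packing 65,536 MAC cells into one unit yields such high throughput for tensor math.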
For additional technical information on TPUs, artificial intelligence/machine learning instructor Siraj Raval has produced a YouTube video. In the video, Raval provides TPU code, developed in Colaboratory, for benchmarking CPU versus TPU speed on a simple addition problem. Also, Google’s technical paper on the TPU’s in-datacenter performance analysis can be obtained from the arxiv.org website.
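The benchmarking idea can be sketched on the CPU alone with Python's standard timing tools; the harness below is a hypothetical stand-in for Raval's Colab notebook (the TPU side would require a TPU runtime), timing a simple array addition:

```python
import time
import numpy as np

def benchmark(fn, repeats=5):
    """Return the best wall-clock time over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

a = np.random.rand(500, 500)
b = np.random.rand(500, 500)
elapsed = benchmark(lambda: a + b)  # simple elementwise addition on the CPU
print(f"CPU add: {elapsed:.6f} s")
```

Taking the best of several runs reduces noise from the operating system; the same harness could time the equivalent TPU operation for a side-by-side comparison.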
Don Wilcher is a passionate teacher of electronics technology and an electrical engineer with 26 years of industrial experience. He has worked on industrial robotics systems, automotive electronic modules and systems, and embedded wireless controls for small consumer appliances.