The Titan V contains 21 billion transistors and delivers 110 teraflops of deep learning performance. Nvidia is targeting the Titan V specifically at developers who work in AI and deep learning. Company founder and CEO Jensen Huang said in a press statement that the Titan V is the most powerful GPU ever developed for the PC. “Our vision for Volta was to push the outer limits of high performance computing and AI. We broke new ground with its new processor architecture, instructions, numerical formats, memory architecture and processor links. With Titan V, we are putting Volta into the hands of researchers and scientists all over the world.”
A World Made of Tensors
Perhaps no company is more invested in the concept of tensors than Google. In late 2015 the search giant released TensorFlow, an open-source framework for deep learning development that has quickly become popular. As described by Google, “TensorFlow is an open-source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.”
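To make the data flow graph idea concrete, here is a minimal sketch in plain Python. It does not use the TensorFlow API, and the names `Node`, `constant`, `matmul`, and `add` are hypothetical helpers invented for this illustration. Each node holds a mathematical operation, and the edges between nodes carry small tensors stored as nested lists.

```python
# Toy data flow graph: nodes are operations, edges carry tensors (nested lists).
# Illustration only; it does not use or mirror the TensorFlow API.

class Node:
    """A graph node: an operation plus the upstream nodes that feed it."""

    def __init__(self, op, *inputs):
        self.op = op          # callable applied to the incoming tensors
        self.inputs = inputs  # incoming edges (other Node objects)

    def evaluate(self):
        # Pull tensors along the incoming edges, then apply this node's operation.
        return self.op(*(node.evaluate() for node in self.inputs))


def constant(value):
    """A source node with no inputs that emits a fixed tensor."""
    return Node(lambda: value)


def matmul(a, b):
    """Multiply two 2-D tensors stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two 2-D tensors of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Build the graph for y = x @ w + bias; nothing is computed until evaluate().
x = constant([[1.0, 2.0]])
w = constant([[3.0], [4.0]])
bias = constant([[0.5]])
y = Node(add, Node(matmul, x, w), bias)

result = y.evaluate()
print(result)  # [[11.5]]
```

Because the graph is described before anything is computed, a framework built this way is free to decide where each node runs, which is the property behind the "one or more CPUs or GPUs" claim in Google's description.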
Google's tensor processing unit (TPU) runs all of the company's cloud-based deep learning apps and is at the heart of the AlphaGo AI. (Image source: Google)
TensorFlow's library of machine learning applications, which includes facial recognition, computer vision, and, of course, search, has proved so popular that in 2016 Intel committed to optimizing its processors to run TensorFlow. In 2017 Google also released a Lite version of TensorFlow for mobile and Android developers.
But Google isn't letting software be the end of its AI ambitions. In 2016 the company released the first generation of a new processor it calls the tensor processing unit (TPU). Google's TPU is an ASIC built specifically with machine learning in mind and is tailor-made for running TensorFlow. The second-generation TPUs were announced in May 2017 and, according to Google, are able to deliver up to 180 teraflops of performance.
In a study released in June 2017 as part of the 44th International Symposium on Computer Architecture (ISCA) in Toronto, Canada, Google compared its TPUs with the Intel Haswell CPUs and Nvidia K80 GPUs deployed in the same data centers. It found that the TPUs performed, on average, 15 to 30 times faster than the GPUs and CPUs. The TPUs' tera-operations per second (TOPS) per watt were also about 30 to 80 times higher. Google now says that TPUs are driving all of its online services, including Search, Street View, Google Photos, and Google Translate.
In a paper detailing its latest TPUs, Google engineers said the need for TPUs arose as far back as six years ago when Google found itself integrating deep learning into more and more of its products. “If we considered a scenario where people use Google voice search for just three minutes a day and we ran deep neural nets for our speech recognition system on the processing units we were using [at the time], we would have had to double the number of Google data centers!” the Google engineers wrote.
Google engineers said that, in designing the TPU, they employed what they call “systolic design.” “The design is called systolic because the data flows through the chip in waves, reminiscent of the way that the heart pumps blood. The particular kind of systolic array in the [matrix multiplier unit] MXU is optimized for power and area efficiency in performing matrix multiplications, and is not well suited for general-purpose computation. It makes an engineering tradeoff: limiting registers, control and operational flexibility in exchange for efficiency and much higher operation density.”
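The wave-like data movement the engineers describe can be illustrated with a small software simulation. The sketch below models a generic, textbook "output-stationary" systolic array, in which each cell keeps a running sum while operands pass through it. That is an assumption made for simplicity and is not a description of how the TPU's MXU is actually wired. The function name `systolic_matmul` is hypothetical.

```python
# Cycle-by-cycle simulation of a textbook output-stationary systolic array.
# Assumption: this is a generic teaching model, not Google's MXU design.

def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on an n x m grid of multiply-accumulate cells."""
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # running sum held inside each cell
    a_reg = [[0] * m for _ in range(n)]  # A operands, moving left to right
    b_reg = [[0] * m for _ in range(n)]  # B operands, moving top to bottom

    # The last useful operand pair reaches the far corner at cycle n + m + k - 3.
    for t in range(n + m + k - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    # Row i of A enters from the left edge, delayed by i cycles.
                    s = t - i
                    new_a[i][j] = a[i][s] if 0 <= s < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # passed on by the left neighbour
                if i == 0:
                    # Column j of B enters from the top edge, delayed by j cycles.
                    s = t - j
                    new_b[i][j] = b[s][j] if 0 <= s < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # passed on by the neighbour above
        a_reg, b_reg = new_a, new_b

        # Every cell does one multiply-accumulate per cycle on whatever it holds.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


product = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # [[19, 22], [43, 50]]
```

Each cell only ever talks to its immediate neighbours and needs just a couple of registers, which is one way to picture the tradeoff in the quote: very little control logic and flexibility per cell, in exchange for packing many multiply-accumulate units into the same area.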