Engineering Deep Learning Hardware at the University Level

It's not just big tech companies developing processors for artificial intelligence. University researchers are also actively investigating how to build hardware for sophisticated AI applications like deep learning.

In addition to conducting its own research into AI-specific processors, MIT is introducing coursework to train the next generation of engineers in building hardware for deep learning and AI applications.

Traditional hardware for deep learning (DL) has consisted of CPUs, Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), and Application-Specific Integrated Circuits (ASICs). Recent improvements in the digital processing efficiency of FPGAs have been used to increase the computational throughput of deep learning systems. But Vivienne Sze, an associate professor of electrical engineering at MIT, says that although these hardware units have been used to develop deep learning algorithms and applications, their efficiency at performing the multidimensional computations these networks require has been subpar.

Vivienne Sze and Joel Emer discussing hardware architecture design for an MIT deep learning class.
(image source: Lillie Paquette / MIT School of Engineering)

Performing powerful artificial intelligence (AI) tasks requires energy-efficient chips. That requirement raises a fundamental question that Sze, along with Joel Emer, a senior research scientist at Nvidia and professor of electrical engineering at MIT, and their team of MIT researchers have been working on for years: How can you write algorithms that map well onto hardware so they can run faster?

In 2016 their research culminated in a new chip, called Eyeriss, that the research team says is optimized for neural network computing. According to a paper on Eyeriss published in the IEEE Journal of Solid-State Circuits, the chip is powerful and energy-efficient enough to let sophisticated AI applications run locally on mobile devices.

Eyeriss is a state-of-the-art accelerator for deep convolutional neural networks (CNNs), a class of deep learning network commonly applied to analyzing visual imagery. The chip is optimized for the energy efficiency of the complete DL system, including accesses to off-chip DRAM, and can be reconfigured for various CNN architectures. Because CNNs are widely used in AI systems, the data throughput and energy efficiency of the hardware that runs them matter a great deal. Processing large datasets, and moving data between on-chip and off-chip memory, requires significant energy, and on traditional CPU-based chips these processing and data-movement functions consume considerable energy.
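To make the data-movement point concrete, here is a minimal access-counting sketch in Python. The per-access energy costs in `COST` are hypothetical, illustrative numbers (not measurements of Eyeriss or of any CPU), and `weight_energy` is a made-up helper that only counts weight fetches; a fuller model would also track input activations and partial sums.

```python
# Hypothetical per-access energy costs, in arbitrary units relative to one MAC.
# These are illustrative assumptions, not measurements of any real chip.
COST = {"mac": 1, "local": 1, "dram": 200}


def weight_energy(num_macs, num_weights, reuse_locally):
    """Energy of the MACs plus weight fetches in a simple access-counting model."""
    if reuse_locally:
        # Each weight is read from DRAM once, then served from local storage
        # for every MAC that needs it.
        dram_reads = num_weights
        local_reads = num_macs
    else:
        # Every MAC fetches its weight from DRAM again.
        dram_reads = num_macs
        local_reads = 0
    return (num_macs * COST["mac"]
            + local_reads * COST["local"]
            + dram_reads * COST["dram"])


# A 3x3 filter over a 6x6 input gives a 4x4 output: 16 outputs * 9 MACs = 144 MACs.
print(weight_energy(144, 9, reuse_locally=False))  # 28944
print(weight_energy(144, 9, reuse_locally=True))   # 2088
```

Under these assumed costs, keeping weights in local storage cuts the modeled energy by more than an order of magnitude; the exact ratio depends entirely on the cost numbers chosen.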

According to the MIT researchers, Eyeriss uses a dataflow technique called Row Stationary (RS) processing to achieve low energy consumption and high throughput. The chip's spatial architecture of 168 processing elements (PEs), combined with the RS dataflow, lets it reconfigure how computations are mapped onto the hardware. That mapping improves energy efficiency by reusing data held locally, which reduces DRAM accesses and data movement inside the chip.
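The row-stationary idea can be sketched functionally: a 2D convolution is split into 1D row convolutions, each assigned to one PE that keeps its filter row in local storage while an input row streams past. The Python below is a simplified model under that assumption; it ignores the chip's actual PE array shape, buffers, and on-chip network, and the function names are hypothetical. It checks the decomposition against a direct 2D convolution.

```python
def conv2d_direct(ifmap, filt):
    """Reference 'valid' 2D convolution (cross-correlation, as in CNN layers)."""
    H, W = len(ifmap), len(ifmap[0])
    R, S = len(filt), len(filt[0])
    return [[sum(filt[i][s] * ifmap[y + i][x + s]
                 for i in range(R) for s in range(S))
             for x in range(W - S + 1)]
            for y in range(H - R + 1)]


def pe_conv1d(filt_row, ifmap_row):
    """One hypothetical PE: the filter row stays 'stationary' in local storage
    while the input row slides past it, producing one row of partial sums."""
    S = len(filt_row)
    return [sum(w * a for w, a in zip(filt_row, ifmap_row[x:x + S]))
            for x in range(len(ifmap_row) - S + 1)]


def conv2d_row_stationary(ifmap, filt):
    """2D convolution built from 1D row convolutions, RS-style.
    PE (i, j) handles filter row i and input row i + j; adding the partial-sum
    rows of PEs (0..R-1, j) yields output row j."""
    H, R = len(ifmap), len(filt)
    out = []
    for j in range(H - R + 1):        # one column of PEs per output row
        row = None
        for i in range(R):            # one PE per filter row
            psum = pe_conv1d(filt[i], ifmap[i + j])
            row = psum if row is None else [a + b for a, b in zip(row, psum)]
        out.append(row)
    return out


ifmap = [[r * 6 + c for c in range(6)] for r in range(6)]
filt = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert conv2d_row_stationary(ifmap, filt) == conv2d_direct(ifmap, filt)
```

In this model each PE reuses its filter row across every position of its input row, and each input row is shared by the PEs along a diagonal (all pairs with the same i + j), which is the kind of local reuse that reduces trips to DRAM.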

Deep learning is a subset of machine learning, which is itself a branch of computer science. DL draws inspiration from the structure and function of the biological brain to create families of algorithms called artificial neural networks. The idea is to develop learning algorithms that are both efficient and easy to use.
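As a minimal illustration of what an artificial neural network computes, the Python sketch below models one fully connected layer. Each artificial neuron forms a weighted sum of its inputs (a chain of multiply-accumulate, or MAC, operations, the workload that DL accelerators such as Eyeriss are built to speed up) and then applies a nonlinearity. The names `neuron` and `dense_layer` and the choice of a ReLU activation are illustrative assumptions, not details from the article.

```python
def relu(x):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of inputs followed by ReLU."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w  # one multiply-accumulate (MAC)
    return relu(acc)


def dense_layer(inputs, weight_rows, biases):
    """A layer of neurons, each with its own weight vector and bias."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


print(dense_layer([1.0, 2.0], [[0.5, -1.0], [1.0, 1.0]], [0.0, 0.5]))  # [0.0, 3.5]
```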

A diagram of the Eyeriss chip. The chip, designed by researchers at MIT, has the power and energy efficiency to potentially enable mobile devices to run deep learning applications.
(image source: IEEE Journal of Solid-State Circuits)

DL relies on learning data representations rather than on task-specific algorithms. Learning data representations allows a machine learning system to automatically discover, from raw data, the representations it needs for detecting features or classifying that data. Learning can be supervised (using labeled input data, as when a neural network is trained to classify examples) or unsupervised (using unlabeled input data, as in clustering). DL models are loosely inspired by the information-processing and communication patterns of biological nervous systems. One practical application of DL is classifying text, documents, and images, for example to discover or mine information on websites.
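The difference between supervised and unsupervised learning can be sketched with two tiny, deliberately non-deep methods on 1D data: a nearest-centroid classifier that learns from labeled points, and k-means clustering that groups unlabeled points. Both are stand-ins chosen for brevity rather than DL models, and all function names here are hypothetical.

```python
def nearest_centroid_fit(points, labels):
    """Supervised: labels are given, so each class centroid is the mean of its points."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + p
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, p):
    """Assign a new point to the class with the closest centroid."""
    return min(centroids, key=lambda y: abs(p - centroids[y]))


def kmeans_1d(points, k, iters=10):
    """Unsupervised: no labels; k cluster centers are discovered from the data alone."""
    # Initialize centers from evenly spaced points of the sorted data.
    centers = sorted(points)[::max(1, len(points) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[idx].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
model = nearest_centroid_fit(data, ["a", "a", "a", "b", "b", "b"])
print(predict(model, 1.1), predict(model, 4.9))  # a b
print(kmeans_1d(data, k=2))                      # centers near 1.0 and 5.0
```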

Development of chip architectures targeted directly at AI processing has ballooned in recent years, driven by demand for AI applications in everything from manufacturing to healthcare, entertainment, and even retail. Companies like Nvidia, Microsoft, and Google have entered a hotly contested battle over who can provide the best hardware for running deep learning and other AI algorithms. For Nvidia, the answer lies in high-powered GPUs. Microsoft is looking to leverage FPGAs, while Google is working with a processor of its own design called the Tensor Processing Unit (TPU). Earlier this year, chipmaking giant Intel entered the race by announcing a prototype “neuromorphic” chip called Loihi, which it says will allow devices to perform advanced deep learning processing at the edge.

More and more institutions are exploring deep learning hardware at the university level as well. In 2017 Sze and Emer began teaching a course at MIT, “Hardware Architecture for Deep Learning.” Regarding the goals of the course, Sze told MIT News, “The goal of the class is to teach students the interplay between two traditionally separate disciplines...How can you write algorithms that map well onto hardware so they can run faster? And how can you design hardware to better support the algorithm? It’s one thing to design algorithms, but to deploy them in the real world you have to consider speed and energy consumption.”
Sze and Emer have also written a tutorial journal article on building hardware for DL that provides an in-depth discussion and analysis of how to design such computing devices.

Don is a passionate teacher of electronics technology and an electrical engineer with 26 years of industrial experience. He has worked on industrial robotics systems, automotive electronic modules and systems, and embedded wireless controls for small consumer appliances. He's currently developing 21st-century educational products focusing on the Internet of Things for makers, engineers, technicians, and educators. He is also a Certified Electronics Technician with ETA International and a book author.

