GPUs or CPUs for Deep Learning?

Photo by Jordan Harrison from Pexels

The repurposing and expansion of GPUs for neural network calculations has revolutionized deep neural network architectures and made large, general-purpose models like Inception and ResNet computationally feasible. Why is this? What makes GPUs so much better suited than CPUs for these calculations?

When explaining this in person, I find it most helpful to draw an analogy with 3D rendering. As some may already know, a 3D model is composed of a tessellation of tiny triangles, each of which is rendered using its material, lighting, position, and other attributes. In other words, the exact same type of calculation must be performed on every triangle. Likewise, the gradient of the same loss function has to be back-propagated through a neural network, updating every neuron. In both cases, a single (grouped) operation is applied to a very large set of elements, whether those are neurons or triangles.
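To make the "same operation everywhere" idea concrete, here is a minimal NumPy sketch of a gradient update on one layer. The shapes and learning rate are arbitrary placeholders, not values from any particular model; the point is that a single vectorized statement applies identical arithmetic to roughly a million weights, which is exactly the pattern a GPU is built to parallelize.

```python
import numpy as np

# A toy fully connected layer: the same update rule touches every weight at once.
# The shapes and learning rate here are illustrative placeholders.
rng = np.random.default_rng(0)
weights = rng.standard_normal((1024, 1024))  # roughly a million parameters
grads = rng.standard_normal((1024, 1024))    # gradient of the loss w.r.t. each weight

learning_rate = 0.01
weights -= learning_rate * grads  # one identical arithmetic operation per weight
```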

Although this may make sense so far, what attributes of a GPU make it better at performing the same operation across many data points? The answer lies in the hardware tradeoffs that a GPU can make but a CPU cannot. Modern CPUs devote a great deal of silicon to the classic stages of fetching and decoding instructions; GPUs spend only a fraction of that and dedicate the rest to the actual computation being performed. Because a GPU's primary job is to apply the same operation to many different data points, far more cores, and more specifically ALUs (arithmetic logic units), can be packed into the same chip. This is why GPUs often have orders of magnitude more cores than CPUs. Whereas CPUs optimize general-purpose performance with tricks such as out-of-order execution and speculative execution, GPUs need none of that hardware at all.
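As a rough illustration, assuming PyTorch and a CUDA-capable card (the matrix sizes below are arbitrary), the exact same multiplication can be dispatched to either device; on the GPU it is fanned out across thousands of simple ALUs rather than a handful of sophisticated, control-heavy CPU cores.

```python
import torch

# The same large matrix multiply on CPU and, if available, on GPU.
# Sizes are arbitrary; actual speedups depend entirely on your hardware.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

c_cpu = a @ b  # runs on a handful of general-purpose CPU cores

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    c_gpu = a_gpu @ b_gpu      # identical math, spread across thousands of GPU ALUs
    torch.cuda.synchronize()   # GPU kernels launch asynchronously; wait for completion
```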

To put it another way: GPUs and CPUs have distinct purposes. Performing a graphics task on a CPU or doing general-purpose calculations on a GPU will result in extremely poor performance. The trick is knowing when to use which one.
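In practice, deep-learning code usually makes that choice with a one-line device check, along these lines (a PyTorch-style sketch; the tensor shape is just an example):

```python
import torch

# Prefer the GPU when one is present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batch = torch.randn(32, 3, 224, 224).to(device)  # e.g. a batch of 32 RGB images
```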