When DNNs Become More Human: DNNs vs. Transfer Learning vs. Continual Learning

As new approaches to deep neural networks (DNNs) and deep learning continue to emerge, it’s important to understand the differences between them. Not all DNNs are created equal, and small tweaks in their architecture can have profound implications for their applicability to real-world scenarios.

(original article on Inside BIGDATA)

For practical purposes, the choice of different ‘shades’ of Neural Networks – from traditional DNNs, to Transfer Learning, to new more brain-like approaches like Continual Learning – can make the difference between a failed experiment and a realized deployment, especially in real-world scenarios closer to the ones where humans operate.

How Today’s DNNs Work

Neural network algorithms are the backbone of AI. They derive their power from the ability to learn from data, as opposed to being preprogrammed to perform any given function. Most neural networks learn through a formalism called Backpropagation. Introduced in the 1970s and widely rediscovered and adopted in recent years, Backpropagation lets networks match and sometimes even surpass human-level performance in an ever-increasing list of tasks, from playing chess to detecting an intruder in a security camera feed.

Unfortunately, this performance comes at a heavy price. Backpropagation networks are very sensitive to new information and are susceptible to catastrophic interference: when something new is learned, it can wipe out old information. To mitigate this problem, researchers made the training process a lot slower and froze learning once the target performance was reached, so that adding new information does not compromise what was learned earlier. As a result, retraining a DNN on new data requires keeping all of the old data stored and training on the old and new data together.

So, today’s DNNs are slow to train, static after training (because updating them is glacially slow), and sometimes impractical in real applications.
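The catastrophic interference described above can be reproduced in miniature. The sketch below is a toy illustration, not a real DNN: it assumes a two-weight linear model trained by plain gradient descent, and the two tasks (task_a, task_b) are made-up data points chosen so that one set of weights could satisfy both. Training on the second task alone still destroys the first, while retraining on all stored data recovers both.

```python
# Toy illustration only: a two-weight linear model, not a real DNN.

def predict(w, x):
    """Linear model: weighted sum of the inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train(w, data, lr=0.1, epochs=500):
    """Plain stochastic gradient descent on squared error."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def mse(w, data):
    """Mean squared error of the model on a dataset."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

# Two hypothetical tasks that a single set of weights CAN satisfy together
# (w = [-3, 5] solves both).
task_a = [((1.0, 1.0), 2.0)]
task_b = [((1.0, 0.0), -3.0)]

w_a = train([0.0, 0.0], task_a)               # learn task A
w_seq = train(w_a, task_b)                    # then learn task B without A's data
w_joint = train([0.0, 0.0], task_a + task_b)  # retrain on all stored data

print("Task A error after learning A:       ", mse(w_a, task_a))
print("Task A error after then learning B:  ", mse(w_seq, task_a))
print("Error on A and B after joint training:", mse(w_joint, task_a + task_b))
```

In this toy setup the error on task A goes from near zero to a large value once task B is learned on its own, even though a joint solution exists. Only the run that has access to both datasets finds it, which is why conventional retraining needs the old data kept around.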

How about Transfer Learning?

Today, Transfer Learning is a pretty popular approach where a previously developed DNN is ‘recycled’ as the starting point from which a DNN learns a second task. Essentially, with Transfer Learning nothing really changes with respect to the traditional DNN methodology, except that you can train on a bit less data.

For example, take a DNN model that was trained on ImageNet and can recognize 1,000 object classes (cars, dogs, cats, etc.). If you want to train a new model to tell two specific breeds of dogs apart, Transfer Learning would learn these breeds by initializing a new network with the weights from the bigger 1,000-class network, which already knows a bit about what dogs look like. And while this may make the training of the new network faster and more reliable, there is a catch: the new network can now recognize only these two breeds. Instead of the 1,000 objects the original network learned, it can identify just two.

So, even though this training would be faster than training from scratch, it could still take anywhere from hours to days, depending on the size of the dataset, and the resulting network will know much less.
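A minimal sketch of this trade-off follows, under loud assumptions: the backbone function stands in for the pretrained layers and is kept frozen (one common variant of Transfer Learning; fine-tuning all weights is another), the Head class is a hypothetical perceptron-style classifier, and the breed names and data points are invented for illustration. The point is that the new head's label set replaces the old one.

```python
import math

def backbone(x):
    """Stand-in for pretrained layers; these weights are frozen (never updated)."""
    return [math.tanh(0.5 * x[0] + x[1]), math.tanh(x[0] - 0.5 * x[1])]

class Head:
    """Hypothetical classification layer trained with the multiclass perceptron rule."""

    def __init__(self, labels, n_features=2):
        self.labels = list(labels)
        self.w = [[0.0] * n_features for _ in self.labels]
        self.b = [0.0] * len(self.labels)

    def predict(self, feats):
        scores = [sum(wi * f for wi, f in zip(row, feats)) + b
                  for row, b in zip(self.w, self.b)]
        return self.labels[scores.index(max(scores))]

    def fit(self, samples, epochs=20):
        for _ in range(epochs):
            for x, label in samples:
                feats = backbone(x)  # reuse the pretrained features as-is
                guess = self.predict(feats)
                if guess != label:
                    good = self.labels.index(label)
                    bad = self.labels.index(guess)
                    self.w[good] = [w + f for w, f in zip(self.w[good], feats)]
                    self.w[bad] = [w - f for w, f in zip(self.w[bad], feats)]
                    self.b[good] += 1.0
                    self.b[bad] -= 1.0

original_labels = ["car", "dog", "cat"]  # stands in for the 1,000 original classes
breed_head = Head(["beagle", "poodle"])  # the new head replaces the old one

# Invented training points for the two breeds.
breed_data = [((1.0, 0.2), "beagle"), ((0.8, 0.1), "beagle"),
              ((-1.0, -0.3), "poodle"), ((-0.9, 0.0), "poodle")]
breed_head.fit(breed_data)

print(breed_head.predict(backbone((0.9, 0.15))))
print(breed_head.labels)
```

Whatever input is presented, the retrained model can only answer with one of the two breed labels; "car" and "cat" are no longer possible outputs.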

Then, How about Continual Learning?

There is another category of DNNs that is gaining traction, belonging to the camp called Continual (or Lifelong) Learning. One implementation, called Lifelong-DNN (L-DNN) and inspired by brain neurophysiology, is able to add new information on the fly. Unlike DNNs and Transfer Learning, it uses a completely different methodology in which the iterative processes typical of Backpropagation are mathematically approximated by instantaneous ones, in an architecture that introduces new processes, layers, and dynamics with respect to traditional DNNs.

When it comes to training, each piece of data you encounter is used only once. This translates into massive gains in training speed: on the same hardware, L-DNN can train between 10,000 and 50,000 times faster than a traditional DNN.
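The article does not spell out L-DNN's internals, so the sketch below does not attempt to reproduce them. It swaps in a much simpler, well-known technique, a nearest-class-mean (prototype) classifier, purely to illustrate what "train once per sample and add new classes on the fly" can look like. The class name PrototypeLearner, the labels, and the feature vectors are all assumptions made for illustration; in practice the features would come from some fixed feature extractor.

```python
class PrototypeLearner:
    """Nearest-class-mean classifier: one running average (prototype) per class."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sum
        self.counts = {}  # label -> number of samples seen

    def learn_one(self, features, label):
        """Single-pass update: each sample is seen once and never stored."""
        if label not in self.sums:
            # A brand-new class is added without touching existing prototypes.
            self.sums[label] = [0.0] * len(features)
            self.counts[label] = 0
        self.sums[label] = [s + f for s, f in zip(self.sums[label], features)]
        self.counts[label] += 1

    def predict(self, features):
        """Return the label whose prototype is closest to the input."""
        def squared_distance(label):
            n = self.counts[label]
            return sum((s / n - f) ** 2
                       for s, f in zip(self.sums[label], features))
        return min(self.sums, key=squared_distance)

learner = PrototypeLearner()

# Initial knowledge: two classes, two invented samples each.
for feats, label in [((0.0, 1.0), "car"), ((0.1, 0.9), "car"),
                     ((1.0, 0.0), "dog"), ((0.9, 0.1), "dog")]:
    learner.learn_one(feats, label)

before = learner.predict((0.05, 0.95))

# Later, a new class arrives as a single sample; no retraining on old data.
learner.learn_one((1.0, 1.0), "poodle")

print(before, learner.predict((0.05, 0.95)))  # old class before and after
print(learner.predict((0.95, 0.9)))           # newly added class
```

Because each class keeps its own prototype, adding "poodle" leaves the "car" and "dog" prototypes untouched, which is the property that gradient-based retraining on new data alone does not provide.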

So Which Way is the Best Way?

When it comes to training algorithms, there are several approaches to choose from, and no, they are not all the same. If you have effectively unlimited compute power, data, and time, then using a traditional DNN makes sense. If you do not, L-DNN may be the only way to cope with use cases where data is scarce, computation must be fast and local, and the model needs frequent updates.

As AI continues to be integrated into every facet of our lives, including tough real-world scenarios, we’ll see novel approaches to neural networks such as L-DNN emerge that more closely approximate the continual learning and flexibility of humans. After all, AI is built in our image!