How an overlooked feature of deep learning networks can turn into a major breakthrough for AI
This is a guest post. The views expressed here are solely those of the author and do not represent positions of IEEE Spectrum or the IEEE.
At an early age, as we take our first steps into the world of math and numbers, we learn that one apple plus another apple equals two apples. We learn to count real things. Only later are we introduced to a weird concept: zero… or the number of apples in an empty box.
This article first appeared in IEEE Spectrum on 18 December 2019.
The concept of “zero” revolutionized math once Indian and Arab scholars made it part of the Hindu-Arabic numeral system, which the Italian mathematician Fibonacci later popularized in Europe. While today we comfortably use zero in all our mathematical operations, the concept of “nothing” has yet to enter the realm of artificial intelligence.
In a sense, AI and deep learning still need to learn how to recognize and reason with nothing.
Is it an apple or a banana? Neither!
In a typical task, a deep neural network (DNN) might be trained to visually recognize a fixed number of classes, say, pictures of apples and bananas. Given enough high-quality data, deep learning algorithms are very good at producing precise, low-error, confident classifications.
The problem arises when a third object that was not in the training set, such as an orange, appears in front of the DNN. The network is then forced to “guess” and assign the orange to whichever known class it most resembles: an apple!
Basically, for a DNN trained on apples and bananas, the world is made entirely of apples and bananas. It can’t conceive of the whole fruit basket.
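To see why the network has to guess, consider the softmax output layer that closed-set classifiers commonly use: it turns the network’s raw scores (logits) into probabilities over the known classes, and those probabilities always sum to 1. The minimal Python sketch below assumes such a layer; the logit values for the orange are made up for illustration.

```python
import math

CLASSES = ["apple", "banana"]  # the only world this classifier knows

def softmax(logits):
    """Turn raw scores into probabilities that always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def closed_set_predict(logits):
    """Return the most probable known class; 'neither' is not an option."""
    probs = softmax(logits)
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best], probs[best]

# Hypothetical logits for a photo of an orange: weak evidence for both
# classes, slightly more for "apple".
label, confidence = closed_set_predict([0.3, -0.2])
print(label, round(confidence, 2))  # apple 0.62
```

Even though the evidence for “apple” is weak, the output still reads as 62% apple, because the remaining probability mass has nowhere to go except “banana.”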
Enter the world of nothing
Although its value may not be obvious in every application, the idea of “nothing,” or a “class zero,” is extremely useful in several ways when training and deploying a DNN.
During training, if a DNN can classify items as “apple,” “banana,” or “nothing,” its developers can tell when it hasn’t effectively learned to recognize a particular class. Likewise, if pictures of a certain fruit keep yielding “nothing” responses, perhaps the developers need to add another class of fruit to identify, such as oranges.
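A minimal sketch of that training-time check, assuming the model’s validation predictions (including “nothing”) and the human-assigned labels are already available as lists of strings. The function names and the 20% cutoff are hypothetical choices for illustration.

```python
from collections import Counter

def nothing_rate_by_label(true_labels, predictions):
    """Fraction of validation images, per human-assigned label, that the model called 'nothing'."""
    totals = Counter(true_labels)
    nothings = Counter(t for t, p in zip(true_labels, predictions) if p == "nothing")
    return {label: nothings[label] / totals[label] for label in totals}

def flag_problem_classes(rates, max_rate=0.2):
    """Labels whose 'nothing' rate exceeds max_rate (an arbitrary, illustrative cutoff)."""
    return sorted(label for label, rate in rates.items() if rate > max_rate)

# Hypothetical validation run: oranges were never trained on, bananas are learned poorly.
truth = ["apple", "apple", "banana", "banana", "orange", "orange"]
preds = ["apple", "apple", "banana", "nothing", "nothing", "nothing"]
rates = nothing_rate_by_label(truth, preds)
print(rates)                        # {'apple': 0.0, 'banana': 0.5, 'orange': 1.0}
print(flag_problem_classes(rates))  # ['banana', 'orange']
```

A high “nothing” rate for a class the network was trained on (bananas here) hints that it has not learned that class well; a high rate for a label absent from training (oranges) suggests adding it as a new class.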
Meanwhile, in a deployment scenario, a DNN trained to recognize healthy apples and bananas can answer “nothing” if there is a deviation from the prototypical fruit it has learned to recognize. In this sense, the DNN may act as an anomaly detection network—aside from classifying apples and bananas, it can also, without further changes, signal when it sees something that deviates from the norm.
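One simple way to give a classifier a “nothing” answer is a nearest-prototype rule with a rejection distance: average each class’s feature vectors into a prototype, assign an input to the nearest prototype, and answer “nothing” if even the nearest one is farther away than a threshold. The sketch below illustrates that idea under stated assumptions and is not the method described in this article: the two-dimensional vectors are hypothetical stand-ins for features a DNN would extract, and the threshold of 0.3 is arbitrary.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_prototypes(features, labels):
    """Average each class's feature vectors into a single prototype."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify_or_nothing(vec, prototypes, max_distance):
    """Nearest known class, or 'nothing' if even the nearest prototype is too far away."""
    label, dist = min(
        ((lbl, euclidean(vec, proto)) for lbl, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= max_distance else "nothing"

# Hypothetical 2-D feature vectors standing in for DNN features.
protos = fit_prototypes(
    [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]],
    ["apple", "apple", "banana", "banana"],
)
print(classify_or_nothing([0.85, 0.2], protos, max_distance=0.3))  # apple
print(classify_or_nothing([0.6, 0.45], protos, max_distance=0.3))  # nothing
```

The same function that labels apples and bananas doubles as an anomaly detector: anything outside every class’s neighborhood comes back as “nothing.”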
As of today, there are no easy ways to train a standard DNN so that it can provide the functionality above.
One new approach, called a lifelong DNN, naturally incorporates the concept of “nothing” into its architecture. A lifelong DNN does this by using feedback mechanisms to determine whether an input closely matches what it has learned in the past or is instead a mismatch.
This mechanism resembles how humans learn: we subconsciously and continuously check if our predictions match our world. For example, if somebody plays a trick on you and changes the height of your office chair, you’ll immediately notice it. That’s because you have a “model” of the height of your office chair that you’ve learned over time—if that model is disconfirmed, you realize the anomaly right away. We humans continuously check that our classifications match reality. If they don’t, our brains notice and emit an alert. For us, there are not only apples and bananas; there’s also the ability to reason that “I thought it was an apple, but it isn’t.”
A lifelong DNN captures this mechanism in its functioning, so it can output “nothing” when the model it has learned is disconfirmed.
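The article does not spell out the lifelong DNN’s internals. As an illustration only, the sketch below borrows the match test from fuzzy adaptive resonance theory (ART), a family of neural models built on this kind of expectation check: a bottom-up “choice” score proposes the best-fitting learned category, a top-down match ratio checks whether the input really fits that category’s stored template, and a failed check (a mismatch reset) moves on to the next candidate. Standard fuzzy ART would commit a new category when every candidate fails; answering “nothing” instead is this sketch’s own assumption, as are the templates, the vigilance of 0.8 and the alpha of 0.001. Learning is left out.

```python
def complement_code(a):
    """Fuzzy ART input coding [a, 1 - a]; keeps |I| equal to len(a) for inputs in [0, 1]."""
    return list(a) + [1.0 - x for x in a]

def fuzzy_and(x, y):
    """Element-wise minimum, the fuzzy AND used by fuzzy ART."""
    return [min(p, q) for p, q in zip(x, y)]

def art_classify(a, categories, vigilance=0.8, alpha=0.001):
    """
    categories: list of (template, label) pairs, templates in complement-coded space.
    Returns the label of the first category that passes the vigilance test, else 'nothing'.
    """
    I = complement_code(a)
    # Bottom-up guess: rank categories by the fuzzy ART choice function.
    ranked = sorted(
        categories,
        key=lambda c: sum(fuzzy_and(I, c[0])) / (alpha + sum(c[0])),
        reverse=True,
    )
    for template, label in ranked:
        # Top-down check: how much of the input does this template account for?
        match = sum(fuzzy_and(I, template)) / sum(I)
        if match >= vigilance:
            return label  # resonance: the expectation is confirmed
        # Mismatch reset: fall through and try the next-best category.
    return "nothing"

# Templates committed from one hypothetical example each (fast learning sets w = I).
categories = [
    (complement_code([0.9, 0.1]), "apple"),
    (complement_code([0.1, 0.9]), "banana"),
]
print(art_classify([0.85, 0.2], categories))  # apple
print(art_classify([0.6, 0.45], categories))  # nothing
```

Lowering the vigilance to 0.6 lets the same orange-like input resonate with “apple” (its match ratio is 0.675), which shows how this one parameter sets how strict the expectation check is.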
Nothing to work with, no problem
Armed with a basic understanding of “nothing” using the example of apples and bananas, let’s now consider how this would play out in real-world applications beyond fruit identification.
Consider the manufacturing sector, where machines are tasked with producing massive volumes of products. Training a traditional computer-vision system to recognize different abnormalities in a product—say, surface scratches—is very challenging. On a well-run manufacturing line there aren’t many examples of what “bad” products look like, and “bad” can take an endless number of forms. There simply isn’t an abundance of data about bad products that can be used to train the system.
But with a lifelong DNN, a developer could train the computer-vision system to recognize what different examples of “good” products look like. Then, when the system detects a product that doesn’t match its definition of good, it can categorize that item as an anomaly to be handled appropriately.
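A minimal Python sketch of learning from “good” products only, assuming each product has already been reduced to a few numeric features (the hypothetical “brightness” and “edge density” values below; in practice these might come from a DNN). It fits a per-feature mean and standard deviation on good samples and flags any product whose largest per-feature z-score exceeds a threshold. This is a deliberately simple one-class detector swapped in for illustration, not the lifelong DNN’s approach, and the threshold of 3 is a common rule of thumb rather than a tuned value.

```python
import statistics

def fit_good_model(good_samples):
    """Per-feature mean and standard deviation, estimated from good products only."""
    columns = list(zip(*good_samples))
    means = [statistics.fmean(col) for col in columns]
    # Guard against a zero spread so the z-score below never divides by zero.
    stds = [max(statistics.stdev(col), 1e-9) for col in columns]
    return means, stds

def anomaly_score(sample, model):
    """Largest absolute z-score across features: how far the sample strays from 'good'."""
    means, stds = model
    return max(abs(x - m) / s for x, m, s in zip(sample, means, stds))

def check_product(sample, model, threshold=3.0):
    return "ok" if anomaly_score(sample, model) <= threshold else "anomaly"

# Hypothetical [brightness, edge_density] features for five good products.
GOOD_SAMPLES = [[0.50, 0.10], [0.52, 0.12], [0.49, 0.11], [0.51, 0.09], [0.50, 0.10]]
model = fit_good_model(GOOD_SAMPLES)
print(check_product([0.505, 0.105], model))  # ok
print(check_product([0.50, 0.30], model))    # anomaly (e.g., a scratch adds edges)
```

No scratched examples were needed: the detector only knows what “good” looks like and reports everything else as an anomaly.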
For manufacturers, lifelong DNNs and the ability to detect anomalies can save time and improve efficiency in the production line. There may be similar benefits for countless other industries that are increasingly relying on AI.
Who knew that “nothing” could be so important?