There are three main ingredients needed for intelligent robots to be ubiquitous, smart, and useful. I like to call these three ingredients Mind, Brain, and Body. Let’s look at how these three enabling technologies have evolved, and why the time is now for the emergence of intelligent machines.
Body is the easy one: robots need to be cheaply produced. This is happening thanks to innovations in rapid prototyping and 3D printing, and to the plummeting cost of sensors due to the massive adoption of mobile devices, which contain many of the sensors robots need.
Mind is what Neurala does: we design the software to make these robots learn and behave (perceive, navigate) intelligently. We pioneered Deep Learning algorithms (before they were Deep… they were called ‘Neural’ networks!), and make them work on robots and drones.
However, Body and Mind alone are hopeless without what I call the Brain. By Brain I refer not to algorithms, or intelligence per se, but to the computing substrate that can run a Mind in real time on a Body.
Not all computing substrates, of course, fit the bill. We need low-power yet powerful processors that can fit on a mobile platform; in terms of computing density, this can be summarized as GFLOP/Watt, the amount of computation delivered per unit of power. See our article in IEEE Spectrum on this topic.
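To make the GFLOP/Watt idea concrete, here is a minimal Python sketch that ranks processors by computing density. The two processors and all of their numbers are invented purely for illustration (they are not measurements of any real chip), and the function name `gflops_per_watt` is hypothetical.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Computing density: sustained GFLOP/s divided by power draw in watts."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return gflops / watts


# Hypothetical processors; the figures are made up for this example.
candidates = {
    "desktop_card": {"gflops": 4000.0, "watts": 200.0},
    "mobile_module": {"gflops": 500.0, "watts": 10.0},
}

# Computing density of each candidate.
density = {name: gflops_per_watt(**spec) for name, spec in candidates.items()}

# The best "Brain" for a mobile Body is the one with the highest density,
# not necessarily the one with the highest raw GFLOP count.
best = max(density, key=density.get)
print(density, best)
```

In this made-up comparison the slower chip wins, because it delivers more computation for every watt it draws, which is what matters on a battery-powered robot or drone.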
In a sense, this is no different from a recipe for a good pizza Margherita! All three ingredients need to be present simultaneously, and be of good quality, to unlock the potential of robots: an inexpensive Body, an intelligent Mind, and a powerful (yet not power-hungry!) Brain.
What has really changed in the past few years, and what really underlies much of the recent progress in Artificial Intelligence, including the recent (almost viral) success of Deep Networks, is that hardware now exists that can support a Mind. Let’s look at this hardware in more detail.
Minds, in particular Deep (Neural) Networks, are algorithms whose execution and training methods are highly conducive to parallelism. Graphics processing units (GPUs), originally designed for computer gaming and 3D graphics processing, have been used since the early 2000s for general-purpose, computationally intensive models in application domains ranging from physical modeling to image processing and computational intelligence. More recently, fast-moving startups such as Movidius have introduced a different kind of processor, called a visual processing unit (VPU, a term also occasionally associated with GPUs), which provides another quantum leap in the battle between the two opposing “gods”: GFLOP and Watt. More on VPUs later: let’s start with GPUs.
The term GPGPU is often used to refer to the backbone of the revolution: General-Purpose computation on Graphics Processing Units. GPUs, chips whose main technological push comes from the huge revenues of the gaming market, and which more recently are finding their way into mobile devices, are in reality high-performance many-core processors that can be used to accelerate a wide range of applications, from physics to chemistry, computer vision, and neuroscience. And, of course, Deep Networks for robotics and other applications.
GPU-accelerated computing leverages GPUs to reduce the time needed to run complex applications: it offloads the compute-intensive portions of the application to the GPU, while the remainder of the code still runs on the CPU. An intuitive way to understand why this is faster is to look at how CPUs and GPUs compute: while CPUs are designed with a few cores optimized for sequential processing, GPUs have been built as massively parallel processors consisting of thousands of smaller cores, which are best at handling many tasks simultaneously.
Neurala itself started in 2006 as a container for a patent on using GPUs as general-purpose neural simulators: we realized very quickly, back in the early 2000s, that GPUs were going to change the game of AI and make it real not only for big companies, but for everyday uses and consumers.
It was a cold and rainy winter in 2004 when Anatoli Gorchetchnikov, Heather Ames, and I got really interested in general-purpose computing on graphics processing cards. At the time, there was no CUDA or OpenCL available: programming GPUs was really tough. I call that “BC” (Before CUDA) times… But we tried, with some very good results, to port some of the models we used to GPUs. The results were so good that we wrote a patent around that work, and we created a company, Neurala, to contain it. Years later, the company has raised money, entered TechStars, and launched products, but that is another story!
First of all, a disclaimer: the GeForce 6800 used at the time is today a museum piece… nevertheless, many of the arguments that follow are still relevant.
As you may have heard, Deep Networks or Artificial Neural Networks (ANNs) are highly parallel, computationally intensive mathematical models that capture the dynamics and adaptability of biological neurons. Neurons are cells that make up most of the nervous system. An adult human brain weighs approximately 1.3 kg and includes roughly 100 billion neurons. Each neuron’s axon communicates signals to other neurons (each neuron is connected to approximately 10,000 others).
The artificial counterparts of biological neurons are dynamical systems requiring numerical integration to calculate their variables at every moment of time as a function of previous values of one or more of these variables. In certain classes of ANNs, neurons are described in terms of complex systems of differential equations, and a significant degree of recurrence can characterize their connectivity matrix. Biologically realistic neural networks of this kind are used to model brain areas responsible for perceptual, cognitive, and memory functions. You can read more details in this post.
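As a rough illustration of what "numerical integration" means here, the Python sketch below advances a tiny recurrent network with the forward Euler method. The neuron model is an assumption chosen for simplicity (a leaky integrator, tau * dv_i/dt = -v_i + external_i + sum_j w_ij * v_j), not the specific equations used in the work described above, and all names and parameter values are hypothetical.

```python
def euler_step(v, weights, external, tau, dt):
    """Advance every neuron by one forward-Euler step.

    Assumed model (illustrative only):
        tau * dv_i/dt = -v_i + external_i + sum_j weights[i][j] * v_j
    """
    new_v = []
    for i in range(len(v)):
        # Recurrent input: weighted sum of the PREVIOUS activity of all neurons.
        recurrent = sum(w_ij * v_j for w_ij, v_j in zip(weights[i], v))
        dv_dt = (-v[i] + external[i] + recurrent) / tau
        # New value computed from previous values only.
        new_v.append(v[i] + dt * dv_dt)
    return new_v


def simulate(v0, weights, external, tau=10.0, dt=0.1, steps=2000):
    """Integrate the network for a fixed number of time steps."""
    v = list(v0)
    for _ in range(steps):
        v = euler_step(v, weights, external, tau, dt)
    return v


# Two neurons: neuron 0 receives a constant external input,
# neuron 1 is driven only by neuron 0 through a connection of weight 0.5.
weights = [[0.0, 0.0],
           [0.5, 0.0]]
external = [1.0, 0.0]

final = simulate([0.0, 0.0], weights, external)
print(final)  # activities settle close to 1.0 and 0.5
```

Within one time step, each neuron's new value depends only on the previous state of the network, so the loop over neurons in `euler_step` could in principle be run for all neurons at once. That independence is what makes this kind of model a natural fit for massively parallel hardware.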
Of course, the roots of the theory and of GPU computation go really deep into the past. How Deep? And how Deep into the past go the theories that are the backbone of the recent surge in popularity of these algorithms? Jürgen Schmidhuber (pronounced You_again Shmidhoobuh), a professor at IDSIA in Lugano, Switzerland, provides a great historical survey that shows how the origins of the Deep Network field go back to the previous millennium. He is right: Deep Networks are the result of the combination, permutation, mutation, and refinement of algorithms that have been researched and fielded over the past few decades.
However, the introduction of ever smaller and more power-efficient GPUs is changing the range of applicability of these technologies. Recently, the main player in GPUs, NVIDIA, unveiled the NVIDIA Jetson TX1 Developer Kit, built around a mobile GPU which, in my opinion, will be a game-changer in enabling more portable and ubiquitous artificial intelligence applications, from mobile robots to drones.
Smaller GPUs, bigger drone and robotics markets. As simple as that.
Now, what about the VPU (visual processing unit)?
VPUs differ from GPUs in that they were designed from the ground up to allow mobile devices to run intensive computer vision algorithms (whereas GPUs were adapted from processing graphics to machine vision). For instance, Movidius has developed an ad hoc chip architecture to greatly optimize the GFLOP vs. Watt “fight”. Their Myriad 2 is the industry’s first always-on vision processor. In a sense, GPUs were the beginning, but they may not be the end of where “brain” computing is going, and companies like Movidius will surely push the boundaries of what can be done by introducing new computing frameworks.
It’s a good time to be around: with several forces pushing simultaneously to solve the Mind, Brain, and Body problem, resistance is futile.