The technology behind our autonomous vehicle project is largely inspired by recent breakthroughs in machine learning. Specifically, we are using convolutional neural networks trained with supervised learning. You might be wondering how we code the rules that let the autonomous car steer itself. The reality is that we do not tell the car what to do; we just train it to accomplish the task.
In 2016, NVIDIA introduced this concept of end-to-end learning for self-driving cars. For readers familiar with ML, here is how NVIDIA described it:
“We trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. This end-to-end approach proved surprisingly powerful. With minimum training data from humans the system learns to drive in traffic on local roads with or without lane markings and on highways. It also operates in areas with unclear visual guidance such as in parking lots and on unpaved roads.”
At CES in 2016, NVIDIA showcased this technology in a real self-driving car. Our program is largely based on their proposed idea. You can read about their approach here.
To begin with, every decision the vehicle makes is based on images captured by a camera mounted on the front of the vehicle.
A Quick Intro to Artificial Neural Networks
This will be a quick, general overview of CNNs. The algorithm was made popular in 2012 by Geoffrey Hinton, whom many consider the godfather of A.I. Hinton and his team proposed a convolutional neural network architecture, known as AlexNet, for classifying objects. Their work won the ImageNet challenge in 2012. Granted, CNNs existed long before AlexNet; however, only recently have we had the computational power to train large CNNs on GPUs.
What is a convolutional neural network? First, let’s talk about artificial neural networks, which usually have the following components:
- Input layer
- Middle layers (hidden neurons)
- Output layer
- Weights & biases
- Loss function
Essentially, a neural network has a large number (often millions) of tunable parameters, called weights, which loosely emulate the synapses between human neurons. Upon construction, all of the weights in a neural network are essentially random (though drawn from a carefully chosen distribution). These weights give the network the flexibility to learn and to adapt. Until training adjusts these weights, the neural network is useless.
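To make the components above a little more concrete, here is a minimal sketch of a tiny fully connected network in NumPy. It is only an illustration, not the network used in this project: the layer sizes, the function names init_params and forward, and the choice of He initialization are all assumptions made for the example.

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    """Create random weights and zero biases for each pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Assumption: He initialization (std = sqrt(2 / n_in)), a common
        # choice of "carefully chosen randomness" for ReLU layers.
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((w, b))
    return params

def forward(params, x):
    """Pass x through every layer: ReLU on hidden layers, linear output."""
    for w, b in params[:-1]:
        x = np.maximum(0.0, x @ w + b)
    w, b = params[-1]
    return x @ w + b

# Input layer of 4 values, one hidden layer of 8 neurons, 1 output value
params = init_params([4, 8, 1])
x = np.array([[0.5, -1.0, 0.25, 2.0]])
prediction = forward(params, x)

n_tunable = sum(w.size + b.size for w, b in params)
print(prediction.shape)  # (1, 1)
print(n_tunable)         # 49
```

With random weights the prediction is meaningless, which is exactly the point: the structure is in place, but nothing has been learned yet.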
The training process is similar to teaching a human: show the network a lot of good examples and hopefully it will imitate them. In this case, we will show it images of human driving and the corresponding steering angles (the recorded human steering angles are called training labels). On a more mathematical level, all neural networks have a loss function, which quantifies how wrong the network’s prediction is. Our goal is to minimize the loss function so that the network’s predictions are as close as possible to the training labels. To do that, we use a method called backpropagation (backward propagation of errors), which works out how much each weight contributed to the error so that the weights can be adjusted accordingly. As a result, the next time the network sees a training image, its output is closer to the training label, that is, closer to our desired output.
Training requires some input and some desired output. In my case, the input is an image of what’s in front of the cart, and the output is the steering angle, which is sent to the Arduino motor controller. Our training dataset contains 90,000 driving images captured by Udacity on a 2016 Lincoln. The goal for our convolutional neural network is to optimize the weights so that it outputs a steering angle very similar to what a human driver would have chosen.
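The idea of minimizing a loss by repeatedly adjusting the weights can be sketched with a single linear neuron. This is a toy illustration under stated assumptions, not our training code: the data is synthetic, the loss is mean squared error, and the gradient is written out by hand (for a one-layer model, that is all backpropagation amounts to). Names such as true_weights and learning_rate are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 examples with 3 features each; the labels come
# from a known linear rule so we can check what the model learns.
inputs = rng.normal(size=(100, 3))
true_weights = np.array([0.5, -1.0, 2.0])
labels = inputs @ true_weights

def mse_loss(weights, x, y):
    """Mean squared error between predictions and labels."""
    return np.mean((x @ weights - y) ** 2)

weights = rng.normal(size=3)  # start from random weights
initial_loss = mse_loss(weights, inputs, labels)

learning_rate = 0.1
for step in range(200):
    error = inputs @ weights - labels
    # Gradient of the MSE loss with respect to each weight
    gradient = 2.0 * inputs.T @ error / len(labels)
    # Nudge the weights in the direction that reduces the loss
    weights -= learning_rate * gradient

final_loss = mse_loss(weights, inputs, labels)
print(initial_loss, final_loss)
print(weights)
```

After training, the loss should be far smaller than where it started and the learned weights should sit very close to true_weights. A real network repeats the same loop, only with many more weights and with the gradient passed backward through every layer.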
Convolutional Neural Network
Convolutional Neural Networks are very similar to ordinary Neural Networks: they are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product and optionally follows it with a non-linearity. The whole network still expresses a single differentiable score function: in the case of our self-driving car, from the raw image pixels on one end to a steering prediction at the other. They still have a loss function on the last (fully-connected) layer (e.g. SVM/Softmax for classification; a regression task such as steering prediction would typically use something like mean squared error), and all the tips and tricks developed for training regular Neural Networks still apply.
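The "dot product followed by a non-linearity" can be sketched for a single convolutional filter. This is a minimal illustration, not our network: the 5x5 image and the vertical-edge kernel are hand-picked for the example, whereas in a real CNN the kernel values are weights learned during training. The helper names conv2d and relu are hypothetical.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and take
    the dot product of the kernel with each patch it covers."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x):
    """Non-linearity: keep positive responses, zero out the rest."""
    return np.maximum(0.0, x)

# Toy image: dark on the left, bright on the right (a vertical edge)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-picked kernel that responds to dark-to-bright vertical edges
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = relu(conv2d(image, kernel))
print(feature_map)  # every row is [3. 3. 0.]
```

The filter responds strongly where the edge is and stays at zero over the uniform region. Because the same small kernel is reused at every position, a convolutional layer needs far fewer weights than a fully connected layer looking at the same image.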
The beauty of this system is that as a programmer, I do not need to explicitly tell the car to follow any rules. Instead, the computer figures all of this out during training.
I hope that was a helpful overview. If you want to learn more about neural networks, this guy explains it much better than I can. In a future post, I will discuss the architecture and training process of our convolutional neural network. Feel free to leave a comment below if you have any questions. Thanks!