Introduction

The technology behind our autonomous vehicle project is largely inspired by recent breakthroughs in machine learning. Specifically, we are using convolutional neural networks trained with supervised learning. You might be wondering how we code the rules that let the autonomous car steer itself. The reality is that we do not tell the car what to do; we just train it to accomplish the task.

In 2016, NVIDIA introduced this concept of end-to-end learning for self-driving cars. For readers familiar with ML, here is how NVIDIA summarized the approach:

“We trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. This end-to-end approach proved surprisingly powerful. With minimum training data from humans the system learns to drive in traffic on local roads with or without lane markings and on highways. It also operates in areas with unclear visual guidance such as in parking lots and on unpaved roads.”

At CES in 2016, NVIDIA showcased this technology in a real self-driving car. Our program is largely based on their proposed idea. You can read about their approach here.


The Hardware

First of all, all the decisions that the vehicle makes are based on images captured by a camera mounted on the front of the vehicle.

[Figure: A self-driving car made by Google. The camera mounting is similar to ours.]

The Software

A Quick Intro to Artificial Neural Networks

This will be a quick overview of CNNs. The approach was popularized in 2012 by Geoffrey Hinton, whom many consider the godfather of A.I. Hinton and his team proposed a convolutional neural network architecture for classifying objects, known as AlexNet, and their work won the ImageNet challenge that year. Granted, CNNs existed well before AlexNet; however, only recently have we had the computational power to train large CNNs on GPUs.

What is a convolutional neural network? First, let’s talk about artificial neural networks, which usually have the following components:

  1. Input layer
  2. Middle layers (hidden neurons)
  3. Output layer
  4. Weights & biases
  5. Loss function

[Figure: An artificial neural network]
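
To make the five components above a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the layer sizes, the four "pixel" values, the tanh activation, the squared-error loss, and the helper names `make_layer` and `layer_forward` are hypothetical and are not taken from our actual model.

```python
import math
import random

random.seed(42)  # fixed seed so the example is repeatable


def make_layer(n_inputs, n_outputs):
    """Create one layer: a random weight per (neuron, input) pair, a zero bias per neuron."""
    weights = [[random.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases


def layer_forward(inputs, weights, biases, activation):
    """Each neuron takes a weighted sum of its inputs, adds its bias, then applies the activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def squared_error(prediction, label):
    """Loss function: how far the prediction is from the label."""
    return (prediction - label) ** 2


# 1. Input layer: four made-up pixel intensities between 0 and 1.
pixels = [0.1, 0.8, 0.5, 0.3]

# 2. Middle (hidden) layer: three neurons. 4. Weights & biases live in each layer.
hidden_weights, hidden_biases = make_layer(n_inputs=4, n_outputs=3)

# 3. Output layer: a single neuron that produces the steering prediction.
output_weights, output_biases = make_layer(n_inputs=3, n_outputs=1)

hidden = layer_forward(pixels, hidden_weights, hidden_biases, math.tanh)
prediction = layer_forward(hidden, output_weights, output_biases, lambda z: z)[0]

# 5. Loss function: compare against a made-up human steering angle.
human_angle = 0.25
loss = squared_error(prediction, human_angle)

print(f"prediction: {prediction:.3f}, loss: {loss:.3f}")
```

Because the weights are random, the prediction is arbitrary at this point; training is what makes it useful.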

Essentially, a neural network has a large number (often millions) of tunable parameters, called weights, which loosely emulate the synapses between human neurons. Upon construction, all of the weights in a neural network are essentially random (though drawn from a carefully chosen distribution). These weights give the network the flexibility to learn and adapt. Until training adjusts these weights, the network is useless.

The training process is similar to teaching a human: show the network a lot of good examples and hopefully it will imitate them. In this case, we show it images of human driving and the corresponding steering angles (the human steering angles are called training labels). On a more mathematical level, all neural networks have a loss function, which quantifies how wrong the network’s prediction is. Our goal is to minimize the loss function so that the network’s predictions are as close as possible to the training labels. To do that, we use a method called backpropagation (backward propagation of errors) to work out how much each weight contributed to the error and to adjust the current weights accordingly. As a result, the next time the network sees a training image, its output is closer to the training label, that is, closer to our desired output.
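
The loop described above can be sketched on a deliberately tiny network: one input, one tanh hidden neuron, and one linear output. This is only an illustration under stated assumptions. The training data, the mean-squared-error loss, the learning rate, and the function names (`predict`, `gradients`) are made up for this example and are not our real training code.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Made-up training data: one input feature per example and its "human" label.
inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]
labels = [0.5 * x for x in inputs]


def predict(w1, w2, x):
    """Tiny network: input -> one tanh hidden neuron -> one linear output."""
    return w2 * math.tanh(w1 * x)


def mean_squared_error(w1, w2):
    """Loss function: average squared gap between predictions and labels."""
    errors = [(predict(w1, w2, x) - y) ** 2 for x, y in zip(inputs, labels)]
    return sum(errors) / len(errors)


def gradients(w1, w2):
    """Backpropagation: send the error backward through each step using the chain rule."""
    grad_w1 = 0.0
    grad_w2 = 0.0
    for x, y in zip(inputs, labels):
        hidden = math.tanh(w1 * x)                       # forward pass
        output = w2 * hidden
        d_output = 2.0 * (output - y) / len(inputs)      # dLoss/dOutput
        grad_w2 += d_output * hidden                     # dLoss/dw2
        d_hidden = d_output * w2                         # error passed back to the hidden neuron
        grad_w1 += d_hidden * (1.0 - hidden ** 2) * x    # tanh'(z) = 1 - tanh(z)^2
    return grad_w1, grad_w2


# The weights start out random, so the untrained predictions are poor.
w1 = random.uniform(-1.0, 1.0)
w2 = random.uniform(-1.0, 1.0)
initial_loss = mean_squared_error(w1, w2)

learning_rate = 0.1
for _ in range(2000):
    grad_w1, grad_w2 = gradients(w1, w2)
    w1 -= learning_rate * grad_w1   # nudge each weight against its gradient
    w2 -= learning_rate * grad_w2

final_loss = mean_squared_error(w1, w2)
print(f"loss before training: {initial_loss:.4f}")
print(f"loss after training:  {final_loss:.4f}")
```

Real networks have millions of weights and use a library to compute the gradients, but the idea is the same: measure the error, trace it back to each weight, and take a small step that reduces it.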

 

[Figure: Backpropagation, the technique that powers neural networks.]

 

Training requires some input and some desired output. In my case, the input is an image of what’s in front of the cart, and the output is the steering angle, which is sent to the Arduino motor controller. Our training dataset contains 90,000 driving images captured by Udacity on a 2016 Lincoln. The goal for our convolutional neural network is to optimize the weights so that it outputs a steering angle very similar to what a human driver would have chosen.

Convolutional Neural Network

Convolutional Neural Networks are very similar to ordinary Neural Networks: they are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product and optionally follows it with a non-linearity. The whole network still expresses a single differentiable score function: in the case of our self-driving car, from the raw image pixels on one end to a steering prediction at the other. CNNs still have a loss function (e.g. SVM/Softmax) on the last (fully-connected) layer, and all the tips/tricks we developed for learning regular Neural Networks still apply.
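
What makes a layer "convolutional" is that each neuron looks at only a small patch of the image and the same weights (the kernel) are reused at every position. The sketch below shows that dot-product-plus-non-linearity step in plain Python. The 4x5 image, the hand-written vertical-edge kernel, and the function name `convolve2d` are assumptions for illustration; in a real CNN the kernel values are learned during training rather than written by hand.

```python
def relu(value):
    """A common non-linearity: negative responses are clipped to zero."""
    return max(0.0, value)


def convolve2d(image, kernel, bias=0.0):
    """Slide the kernel over the image (no padding, stride 1).

    At each position the neuron computes a dot product between the kernel
    and the image patch under it, adds a bias, and applies ReLU. As in most
    deep learning libraries, the kernel is not flipped (cross-correlation).
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            dot = sum(kernel[i][j] * image[r + i][c + j]
                      for i in range(k_rows) for j in range(k_cols))
            row.append(relu(dot + bias))
        feature_map.append(row)
    return feature_map


# Made-up grayscale image: dark on the left, bright on the right.
image = [
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
]

# Hand-written kernel that responds to dark-to-bright vertical edges.
vertical_edge_kernel = [
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
]

feature_map = convolve2d(image, vertical_edge_kernel)
print(feature_map)  # [[0.0, 3.0, 3.0], [0.0, 3.0, 3.0]]
```

The output is zero where the patch is uniformly dark and positive where the patch straddles the edge, which is the kind of low-level feature the early layers of a CNN tend to pick up.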

 

[Figure: The basic architecture for CNNs]

 

The beauty of this system is that as a programmer, I do not need to explicitly tell the car to follow any rules. Instead, the computer figures all of this out during training.

I hope that was a helpful overview. If you want to learn more about neural networks, this guy explains it much better than I can. In a future post, I will discuss the architecture and training process of our convolutional neural network. Feel free to leave a comment below if you have any questions. Thanks!
