In part 1 of this series, I discussed the algorithm behind the autonomous steering system, specifically, convolutional neural networks. In this post, I will dive deeper into the details of the networks that we are using, as well as the development process. 

If you just came across my blog, I am excited to share with you that I am building a self-driving golf cart. To learn more about the project, please visit the project page.

The ConvNet Architecture

This is an example of what our ConvNet might look like

If you have seen or used ConvNets before, this architecture might look very familiar. A typical ConvNet consists of several convolutional layers, pooling layers, activation layers, and maybe some dense layers. The architecture of a ConvNet often determines its performance. Newer techniques, such as shortcut connections and dense blocks, can markedly improve performance. In this project, I did not use those techniques, but other tricks and methods helped. Below, I will talk about the architectures that I experimented with. All of them have the same input layer, with shape (160, 320, 3), and the same output layer, which produces a single scalar value.

The Comma AI Architecture

The first one is inspired by the Comma AI self-driving research project.

I use Keras as my deep learning framework. This is an overview of the Keras implementation of the Comma AI model.

This network performed quite well on the validation dataset (validation is covered below) even though it is relatively shallow. Another advantage is that training and inference times are very low. Michael suggested that I use ELU (exponential linear unit) for the activation functions, which performed better than the traditional ReLU.
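To make the difference between the activations concrete, here is a minimal pure-Python sketch of ReLU, leaky ReLU, and ELU on single values. The default `slope` and `alpha` values are common choices and are assumptions, not settings taken from the models in this post. In Keras, the equivalent is the built-in `elu` activation.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Leaky ReLU: a small linear slope for negative inputs."""
    return x if x > 0 else slope * x


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


# Compare the three activations on a few sample inputs.
for x in (-3.0, -1.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  "
          f"leaky={leaky_relu(x):.3f}  elu={elu(x):.3f}")
```

Unlike ReLU, ELU keeps a non-zero output (and gradient) for negative inputs, and unlike leaky ReLU it saturates smoothly toward `-alpha` instead of growing linearly.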

NVIDIA-Inspired Network

In the previous post, I mentioned the NVIDIA self-driving research, specifically their end-to-end system. The second network we used is largely inspired by the CNN in their paper.

NVIDIA-Inspired model overview

This deeper network achieves performance similar to the previous model with fewer parameters. The inference time is also pretty low. However, the NVIDIA-inspired model scored lower on the validation dataset.

Small VGG Style Network

This network is inspired by the VGG network architecture, which placed first in localization and second in classification at the 2014 ImageNet challenge.

VGG style model view

As you can see, the network is much larger than the previous ones. It has 8 times as many parameters as the Comma AI network. In theory, more parameters mean more learning capacity. In practice, however, the validation score for this network is low, and the inference time is a little slower. (Please note that Kaiming He et al. addressed this problem in their ResNet papers. They added shortcut connections between Conv layers to mitigate the vanishing gradient issue. For simplicity’s sake, I didn’t utilize any of those techniques.)
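For a feel of where the parameters in a ConvNet come from, here is a small sketch that counts the trainable weights of a convolutional layer and a dense layer. The layer sizes in the example are hypothetical and are not taken from the model summaries in this post.

```python
def conv2d_params(kernel_h: int, kernel_w: int,
                  in_channels: int, filters: int) -> int:
    """Weights plus one bias per filter for a standard 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units: int, out_units: int) -> int:
    """Weights plus one bias per output unit for a fully connected layer."""
    return (in_units + 1) * out_units


# Hypothetical example: a 3x3 convolution mapping 64 channels to 64 filters,
# versus a dense layer fed by a flattened 10x20x64 feature map.
conv = conv2d_params(3, 3, in_channels=64, filters=64)
dense = dense_params(10 * 20 * 64, 512)

print(f"conv layer:  {conv:,} parameters")
print(f"dense layer: {dense:,} parameters")
```

In this made-up example the single dense layer holds far more parameters than the convolution, which is one reason two networks of similar depth can differ so much in size.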

Training the ConvNet

Initially, the model was trained on an AWS EC2 instance. Because of the cost, I decided to ask for some funding from my school. Eventually, the school supported the project with an NVIDIA GTX 1060 6GB. The vehicle was powered by the NVIDIA Jetson TX2, and inference ran on the Jetson. Please see the validation section for details on evaluating the performance.

(Please note that the vehicle computer has been updated. Check out this post for more information.)

I trained each of these models for around 6 hours, which is usually how long it took before some overfitting happened. The training dataset, which is provided by Udacity, contains 20,000 labeled images. According to Udacity, the data was collected around the San Francisco Bay Area with a Lincoln MKZ. The images are 480x640x3 RGB color images with no post-processing. I am very thankful for their open source work.
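The dataset frames (480x640x3) are larger than the network input (160x320x3), so they have to be reduced somewhere along the way. The post does not say how this was done. The sketch below shows one possible approach, assumed purely for illustration: crop a band of rows and keep every second pixel. The crop window (`top=160`) is hypothetical, and a real pipeline would more likely use an image library with proper interpolation.

```python
def crop_and_subsample(image, top, height, step):
    """Crop `height` rows starting at `top`, then keep every `step`-th pixel.

    `image` is a list of rows; each row is a list of (r, g, b) tuples.
    """
    rows = image[top:top + height:step]
    return [row[::step] for row in rows]


# Synthetic 480x640 frame standing in for a dataset image.
frame = [[(r % 256, c % 256, 0) for c in range(640)] for r in range(480)]

# Hypothetical crop: drop the top 160 rows, then halve the resolution.
small = crop_and_subsample(frame, top=160, height=320, step=2)

print(len(small), len(small[0]), len(small[0][0]))  # 160 320 3
```

Cropping before downsampling changes the aspect ratio from 3:4 to 1:2 without stretching the image, at the cost of discarding part of the frame.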

A frame from the Udacity dataset

Analyzing the Results

Just looking at the loss of the network on the training dataset can be deceiving. Oftentimes the network overfits, which gives you a very low loss on the training set but very poor performance in the real world. Up until this point, I have not explained overfitting.🤔 It’s a common issue that deep learning models face. Imagine that you are trying to learn a new language. Instead of actually understanding it, you merely memorize all the questions and answers on the test. This will do you no good in the real world, but it still guarantees a high score on the test. I know this analogy can fall short in many cases, but you get the point. The goal is to train a network that performs really well on the training data as well as in the real world. This is why it’s critical to have a validation dataset to independently verify the training results.
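One common way to act on this is early stopping: keep watching the validation loss and stop once it has not improved for a few epochs. Below is a minimal sketch of that idea. The loss values and the `patience` setting are made up for illustration and are not results from this project. Keras offers similar behavior through its `EarlyStopping` callback.

```python
def best_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a list of validation losses.

    Training "stops" once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_idx = 0
    best_loss = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_idx, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_idx, epoch
    return best_idx, len(val_losses) - 1


# Made-up curves: training loss keeps falling, validation loss turns upward.
train_losses = [0.30, 0.20, 0.15, 0.11, 0.08, 0.06, 0.05]
val_losses = [0.32, 0.24, 0.21, 0.20, 0.22, 0.25, 0.29]

best, stop = best_epoch(val_losses, patience=3)
print(f"best epoch: {best}, stopped at epoch: {stop}")
```

The gap between a still-falling training loss and a rising validation loss is the signature of overfitting described above.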


During training, we used the Udacity validation dataset. (I think the validation data is better than the training data.) This gives us an unbiased way to judge the performance. After training the neural network, we needed a way to evaluate our results beyond the loss function. With a shared evaluation method, we can also compare our results to state-of-the-art results. It turns out the answer was right in front of us: we can use the Udacity validation set to score the performance and compare that result to the Udacity self-driving challenge #2 results. The following graph shows the steering predicted by the network alongside the human steering values from the validation set.
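One common way to turn this comparison into a single number is the root-mean-square error (RMSE) between predicted and human steering values. Treat the choice of metric as an assumption, since the post does not state which score was used. The steering values below are invented for illustration.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Invented steering angles: network predictions vs. human driver.
predicted_steering = [0.02, -0.10, 0.15, 0.30, -0.05]
human_steering = [0.00, -0.12, 0.20, 0.25, -0.02]

print(f"RMSE: {rmse(predicted_steering, human_steering):.4f}")
```

A lower RMSE means the predicted steering tracks the human driver more closely, and because it is computed on held-out data it is not inflated by overfitting to the training set.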



We used two ways to visualize the training results: the Udacity validation data and the Udacity driving simulator. Both show that the trained network performs decently.

This is not to say it has no issues. The vehicle tends to make corrections only when it’s close to the edge of the road. This might be because the road in the simulator is too wide. Also, different network architectures perform differently. (A cross comparison is not shown in this blog.) I acquired more training data a few weeks ago, but I have not yet had a chance to train the network further. The dataset grew from 40,000 to 90,000 images. I hope this will improve the performance.

As always, please leave a comment below if you have any questions. You can also email me. Thank you very much!

Posted by: NeilNie

Student at Columbia University, School of Engineering and Applied Sciences. Previously a software engineer intern at Apple. More than six years of experience developing iOS and macOS applications. Experienced in electrical engineering and microcontrollers. From publishing several apps to presenting a TEDx talk on machine learning, I strive to use my knowledge, skills, and passion to positively impact the world.
