In part 1 of this series, I discussed the algorithm behind the autonomous steering system: convolutional neural networks. In this post, I will dive deeper into the details of the networks we are using, as well as the development process.
If you just came across my blog, welcome! I am building a self-driving golf cart, and I'm excited to share it with you. To learn more about the project, please visit the project page.
The ConvNet Architecture
If you have seen or used ConvNets before, these architectures might look very familiar. A typical ConvNet consists of several convolutional layers, pooling layers, activation layers, and possibly some dense (fully connected) layers. The architecture of a ConvNet largely determines its performance. Novel techniques, such as shortcut connections and dense blocks, can markedly improve performance. In this project, I did not use those techniques, but other tricks and methods helped with performance. Below, I will describe the architectures that I experimented with. All of them share the same input layer (160x320x3) and the same output layer, a single scalar value (the predicted steering angle).
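To see how a 160x320x3 input shrinks as it moves through a ConvNet, the sketch below traces the feature-map shape layer by layer. The layer stack is hypothetical (three strided convolutions chosen for illustration, not the exact layers of any model in this post), and the padding rules follow the Keras/TensorFlow convention: "same" gives ceil(size / stride), "valid" gives floor((size - kernel) / stride) + 1.

```python
import math

def conv_output_size(size, kernel, stride, padding):
    """Length of one spatial dimension after a conv or pooling layer."""
    if padding == "same":
        # 'same' padding (Keras/TensorFlow convention): ceil(size / stride)
        return math.ceil(size / stride)
    if padding == "valid":
        # 'valid' padding: no zero-padding, the kernel must fit inside the input
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")

def trace_shapes(input_shape, layers):
    """Return the (height, width, channels) shape after each conv layer."""
    h, w, c = input_shape
    shapes = [(h, w, c)]
    for kernel, stride, filters, padding in layers:
        h = conv_output_size(h, kernel, stride, padding)
        w = conv_output_size(w, kernel, stride, padding)
        c = filters  # output channel count equals the number of filters
        shapes.append((h, w, c))
    return shapes

# Hypothetical conv stack: (kernel, stride, filters, padding) for each layer.
LAYERS = [
    (8, 4, 16, "same"),
    (5, 2, 32, "same"),
    (5, 2, 64, "same"),
]

shapes = trace_shapes((160, 320, 3), LAYERS)
h, w, c = shapes[-1]
flattened = h * w * c  # length of the vector fed to the dense layers

for shape in shapes:
    print(shape)
print("flattened features:", flattened)
```

After the last convolution, the feature map is flattened and passed through dense layers that end in a single unit, which produces the scalar steering output.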
The Comma AI Architecture
The first one is inspired by the Comma AI self-driving research project.
This network performed quite well on the validation dataset (validation is discussed below), even though it is relatively shallow. Its advantage is that training and inference times are very low. Michael suggested that I use ELU (exponential linear unit) activations, which performed better than the traditional ReLU.
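To make the difference between these activation functions concrete, here is a small pure-Python sketch of ReLU, leaky ReLU, and ELU. The parameter values (a leaky slope of 0.01 and ELU alpha = 1.0) are common defaults assumed for illustration; the post does not state which values were used.

```python
import math

def relu(x):
    # Zero for negative inputs, so the gradient there is zero as well.
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # A small linear slope for negative inputs instead of a hard zero.
    return x if x > 0 else slope * x

def elu(x, alpha=1.0):
    # A smooth exponential curve for negative inputs that saturates at -alpha.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

for x in (-3.0, -1.0, 0.0, 2.0):
    print(f"x={x:5.1f}  relu={relu(x):6.3f}  leaky={leaky_relu(x):7.3f}  elu={elu(x):7.3f}")
```

All three are identical for positive inputs. ELU differs from leaky ReLU in that its negative side is smooth and bounded at -alpha rather than a straight line, which lets negative activations pull the mean activation toward zero while still passing gradients. In Keras, ELU can be selected with `activation="elu"`.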
NVIDIA-Inspired Network
In the previous post, I mentioned NVIDIA's self-driving research, specifically their end-to-end system. The second network we used is largely inspired by the CNN in their paper.
This deeper network performs comparably to the previous model while using fewer parameters, and its inference time is also low. However, it scored lower than the Comma AI model on the validation dataset.
Small VGG Style Network
This network is inspired by the VGG architecture, which placed first in localization and second in classification at the ImageNet (ILSVRC) challenge in 2014.
As you can see, the network is much larger than the previous ones: it has 8 times as many parameters as the Comma AI network. In theory, more parameters mean more learning capacity. In practice, however, this network's validation score is low, and its inference time is also somewhat slower. (Kaiming He et al. addressed this problem in the ResNet papers by adding shortcut connections between convolutional layers, which make very deep networks easier to train and ease the vanishing gradient issue. For simplicity's sake, I didn't use any of those techniques.)
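Parameter counts like these can be computed by hand: a convolutional layer has (kernel × kernel × input channels + 1) × filters weights and biases, and a dense layer has (inputs + 1) × outputs. The sketch below applies those formulas to a hypothetical model (three conv layers whose last output is a 10x20x64 feature map, followed by Dense(512) and Dense(1)); it is an illustration, not a count for any of the networks above.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a square 2D convolution layer."""
    # Each filter has kernel * kernel * in_channels weights and one bias.
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return (in_features + 1) * out_features

# Hypothetical model: (kernel, in_channels, filters) for each conv layer ...
CONV_LAYERS = [(8, 3, 16), (5, 16, 32), (5, 32, 64)]
# ... whose last layer outputs a 10x20x64 map, flattened for Dense(512) -> Dense(1).
FLATTENED = 10 * 20 * 64
DENSE_LAYERS = [(FLATTENED, 512), (512, 1)]

conv_total = sum(conv_params(k, c_in, f) for k, c_in, f in CONV_LAYERS)
dense_total = sum(dense_params(n_in, n_out) for n_in, n_out in DENSE_LAYERS)
total = conv_total + dense_total

print("conv parameters: ", conv_total)
print("dense parameters:", dense_total)
print("total:           ", total)
```

In this toy model, nearly 99% of the parameters sit in the first dense layer, because its size scales with the flattened feature map. That is one plausible reason why a deeper network that downsamples more before flattening can end up with fewer parameters, and why widening the convolutional layers, as VGG-style networks do, grows the parameter count quickly.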
Training the ConvNet
Initially, the model was trained on an AWS ECS instance. Because of the cost, I asked my school for funding, and the school eventually supported the project with an NVIDIA GTX 1060 6GB. The vehicle was powered by an NVIDIA Jetson TX2, which ran the inference. Please see the validation section for details on how the performance was evaluated.
(Please note that the vehicle computer has since been updated; check out this post for more information.)
I trained each of these models for around 6 hours, which is usually how long it took before they started to overfit. The training dataset, provided by Udacity, contains 20,000 labeled images. According to Udacity, the data was collected around the San Francisco Bay Area with a Lincoln MKZ. The images are 480x640x3 RGB color images with no post-processing. I am very thankful for their open-source work.
Analyzing the Results
Looking only at the network's loss on the training dataset can be deceiving. Often the network overfits: it achieves a very low loss on the training set but performs poorly in the real world. Up until this point, I have not explained overfitting. 🤔 It's a common issue that deep learning models face. Imagine that you are trying to learn a new language, but instead of actually understanding it, you merely memorize all the questions and answers on the test. This will do you no good in the real world, but it still guarantees a high score on the test. I know this analogy falls short in many ways, but you get the point. The goal is to train a network that performs well both on the training data and in the real world. This is why it's critical to have a validation dataset to independently verify the training results.
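One common way to act on a validation set is early stopping: keep the weights from the epoch with the best validation loss and stop once that loss has not improved for a few epochs. The sketch below is a minimal pure-Python illustration under assumptions of my own: the function names, the `patience` value, and the chronological hold-out split are all hypothetical (Udacity already provides a separate validation set). A chronological split is used because consecutive driving frames are nearly identical, so a random split would leak near-duplicates into validation.

```python
def chronological_split(samples, val_fraction=0.2):
    """Hold out the last `val_fraction` of a time-ordered list as validation data."""
    cut = len(samples) - int(len(samples) * val_fraction)
    return samples[:cut], samples[cut:]

def early_stopping_epoch(val_losses, patience=3):
    """Index of the epoch to keep: the best validation loss seen before
    `patience` epochs pass without improvement."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # validation loss stopped improving: a sign of overfitting
    return best_epoch

train, val = chronological_split(list(range(10)))
print(train, val)

# Toy validation losses (illustrative only): they bottom out, then rise again.
toy_val_losses = [0.30, 0.22, 0.18, 0.17, 0.19, 0.21, 0.24]
print("keep weights from epoch", early_stopping_epoch(toy_val_losses))
```

Keras offers the same idea as the `EarlyStopping` callback (for example, monitoring `val_loss` with a `patience` setting), so in practice this logic rarely needs to be written by hand.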
During training, we used the Udacity validation dataset (I think the validation data is of better quality than the training data). This gives us an unbiased way to judge performance. After training the neural network, we needed a way to evaluate our results beyond the loss function, ideally one that would let us compare our results with state-of-the-art results. It turns out the answer was right in front of us: we can use the Udacity validation set to score the performance and compare that score with the results of the Udacity self-driving challenge #2. The following graph shows the predicted steering from the network and the human steering values from the validation set.
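To my knowledge, Udacity challenge #2 ranked submissions by the root-mean-square error (RMSE) between predicted and human steering angles; the sketch below assumes that metric. The function name and the toy values are hypothetical and are not results from this project.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and human steering angles."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("expected two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy values (not real results): predicted vs. human steering angles.
predicted = [0.10, -0.20, 0.00]
human = [0.10, 0.00, 0.10]
print(f"RMSE: {rmse(predicted, human):.4f}")
```

Lower is better, and because errors are squared before averaging, a few large steering mistakes weigh more heavily than many small ones. Scoring the whole validation set this way yields a single number that can be placed next to the challenge leaderboard.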
We used two ways to visualize the training results: the Udacity validation data and the Udacity driving simulator. Both show that the training results are quite decent.
This is not to say the system has no issues. The vehicle tends to make corrections only when it is close to the edge of the road, possibly because the road in the simulator is too wide. Also, different network architectures perform differently (a cross-comparison is not shown in this post). I acquired more training data a few weeks ago, increasing the dataset from 40,000 to 90,000 images, but I have not yet had a chance to train the network further. I hope this will improve the performance.
As always, please leave a comment below if you have any questions. Or you can email me at email@example.com. Thank you very much!
- Deep learning steering prediction
- Semantic segmentation
- Drive-by-wire system (DBW)