Developing a Robust and Efficient Autonomous Racing Agent
Problem Statement:
The Challenge:
Autonomous racing, particularly within simulated environments like AWS DeepRacer, presents the significant challenge of training intelligent agents that can navigate a track quickly, efficiently, and consistently. The goal is to maximize track completion while minimizing lap times and avoiding off-track incidents. Achieving this requires a sophisticated control policy that can learn from continuous environmental feedback.
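In DeepRacer, these objectives are encoded in a reward function evaluated at every simulation step. As a rough illustration only (not the reward used in this project), the sketch below rewards staying near the centerline, adds a small speed bonus, and heavily penalizes leaving the track; the parameter names follow the standard DeepRacer params dictionary.

```python
def reward_function(params):
    """Illustrative DeepRacer-style reward: stay centered, stay on track,
    and prefer higher speed. A sketch, not this project's actual reward."""
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    speed = params['speed']

    # Near-zero reward if any wheel leaves the track.
    if not all_wheels_on_track:
        return 1e-3

    # Reward decays linearly as the car drifts from the centerline.
    half_width = 0.5 * track_width
    centering = max(0.0, 1.0 - distance_from_center / half_width)

    # Small speed bonus discourages the agent from crawling.
    return float(centering + 0.1 * speed)
```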
Current Approaches and Their Limitations:
Traditional reinforcement learning (RL) approaches, while effective, must constantly balance exploration (discovering new, potentially better actions) against exploitation (utilizing known good actions). On-policy algorithms such as Proximal Policy Optimization (PPO) are known for their stability and reliable convergence, but they can be data-hungry because each policy update requires fresh interactions with the environment. Off-policy algorithms such as Soft Actor-Critic (SAC), on the other hand, are valued for their data efficiency, since they can reuse past experience stored in a replay buffer, but they can be less stable during training.
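To make this contrast concrete, the following schematic sketch (simplified; not the internal implementation of the DeepRacer training service) shows the core policy objectives of the two algorithms: PPO's clipped surrogate, whose probability-ratio clipping keeps each update close to the data-collecting policy (stability, at the cost of needing fresh on-policy data), and SAC's entropy-regularized actor objective, which is optimized from a replay buffer of past experience (data efficiency, with more moving parts to keep stable).

```python
import numpy as np

def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized), averaged over a batch.

    Clipping the ratio pi_new/pi_old limits how far a single update can move
    the policy away from the one that collected the data.
    """
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.minimum(unclipped, clipped).mean()

def sac_actor_objective(q_values, logp, alpha=0.2):
    """SAC actor objective (to be maximized), averaged over a replay-buffer batch.

    The -alpha * logp entropy bonus keeps exploration alive, while the critic's
    Q-values are learned off-policy, allowing old experience to be reused.
    """
    return (q_values - alpha * logp).mean()
```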
The Problem We Address:
This project aims to develop and compare high-performance autonomous driving agents for AWS DeepRacer by leveraging and analyzing two prominent reinforcement learning algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC).
Specifically, we seek to answer the following questions:
- Can we effectively train a DeepRacer agent to achieve optimal lap times and consistent track navigation using both PPO and SAC algorithms under varying track conditions and reward functions?
- How do the performance characteristics of PPO and SAC (e.g., convergence speed, final lap times, stability, exploration-exploitation balance, and data efficiency) compare when applied to the DeepRacer environment?
- What are the optimal hyperparameter configurations for both PPO and SAC in the DeepRacer context, and how do these configurations influence the resulting driving policy?
Our Goal:
By systematically training and evaluating DeepRacer models with both PPO and SAC, this project aims to:
- Identify the strengths and weaknesses of each algorithm in the context of autonomous racing on AWS DeepRacer.
- Provide insights into the practical application of these algorithms for developing robust and efficient self-driving agents.
- Contribute to a deeper understanding of the trade-offs involved in selecting and tuning RL algorithms for real-world (or simulated-to-real) control problems.
- Ultimately, create a DeepRacer model capable of achieving consistently fast and reliable laps, demonstrating the power of reinforcement learning for autonomous navigation.
Project Details:
Model name - StudentRacer-PPO
Environment selection - 2022 re:Invent Championship
Track direction - Counterclockwise
Race type - Head-to-head racing
No. of bot vehicles - 1
Bot vehicle speed - 0.5 m/s
Lane changes - Not allowed
Training algorithm - PPO
Gradient descent batch size - 64
Number of epochs - 10
Learning rate - 0.0003
Entropy - 0.01
Discount factor - 0.99
Loss type - Huber
Number of experience episodes between each policy-updating iteration - 20
Action space - Discrete (see the enumeration sketch below)
Steering angle granularity - 5
Maximum steering angle - 30 degrees
Speed granularity - 2
Maximum speed - 1 m/s
Vehicle selection - The Original DeepRacer
Stop condition - 60 minutes of training time
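For reference, the discrete action space implied by the granularity settings above can be enumerated as in the sketch below. It assumes evenly spaced steering angles across ±30 degrees and evenly spaced speeds up to the maximum (as the DeepRacer console generates them), giving 5 × 2 = 10 candidate actions; the authoritative action list is the one shown in the console.

```python
import numpy as np

# Action-space settings taken from the project details above.
MAX_STEERING_DEG = 30.0
STEERING_GRANULARITY = 5
MAX_SPEED = 1.0          # m/s
SPEED_GRANULARITY = 2

# Evenly spaced steering angles including 0, e.g. [-30, -15, 0, 15, 30].
steering_angles = np.linspace(-MAX_STEERING_DEG, MAX_STEERING_DEG,
                              STEERING_GRANULARITY)

# Evenly spaced speeds up to the maximum, e.g. [0.5, 1.0].
speeds = [MAX_SPEED * (i + 1) / SPEED_GRANULARITY
          for i in range(SPEED_GRANULARITY)]

# Cartesian product: one discrete action per (steering angle, speed) pair.
action_space = [{'steering_angle': float(a), 'speed': float(s)}
                for a in steering_angles for s in speeds]

for idx, action in enumerate(action_space):
    print(idx, action)
```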
Training Results:

Evaluation Results:

Final Output:
