
Training an AI Lunar Lander using Deep Q-Learning

Problem Statement:

The Lunar Lander environment presents a classic control problem where the objective is to safely land a spacecraft on a designated landing pad. The agent must learn to maneuver the lander, controlling its main and side engines, to achieve a soft and precise landing.

 

The challenge lies in optimizing the lander's trajectory and engine firings to:

  • Minimize horizontal and vertical velocity at touchdown.

  • Maintain a near-horizontal orientation (angle close to 0) during descent and landing.

  • Ensure both landing legs are in contact with the ground for a stable landing.

  • Avoid crashing into the moon's surface or flying outside the designated viewport.

  • Achieve a high cumulative reward, which is influenced by proximity to the landing pad, movement speed, lander angle, leg contact, and fuel consumption.

 

The problem specifically involves training an agent to select from a discrete action space (do nothing, fire left engine, fire main engine, fire right engine) to control the lander, given an 8-dimensional observation space providing real-time information about the lander's position, velocity, angle, angular velocity, and leg contact status. The ultimate goal is to develop an intelligent agent capable of consistently achieving a score of 200 points or more, signifying a successful and efficient landing.

Project Details:

 

 

Description

 

This environment is a classic rocket trajectory optimization problem. According to Pontryagin’s maximum principle, it is optimal to fire the engine at full throttle or turn it off. This is the reason why this environment has discrete actions: engine on or off.

 

There are two environment versions: discrete or continuous. The landing pad is always at coordinates (0,0). The coordinates are the first two numbers in the state vector. Landing outside of the landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt.
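
As a quick illustration of the two versions (a sketch assuming the Gymnasium API; the environment ID is LunarLander-v3 in recent Gymnasium releases and LunarLander-v2 in older ones):

import gymnasium as gym

# Discrete-action version (the one used in this project): 4 actions.
env_discrete = gym.make("LunarLander-v3")

# Continuous version: two real-valued throttle commands instead of 4 discrete actions.
env_continuous = gym.make("LunarLander-v3", continuous=True)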

 

Action Space

 

There are four discrete actions available:

  • 0: do nothing

  • 1: fire left orientation engine

  • 2: fire main engine

  • 3: fire right orientation engine
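
As a quick check (a minimal sketch assuming the Gymnasium API), the action space can be inspected and a random action sampled like this:

import gymnasium as gym

env = gym.make("LunarLander-v3")
print(env.action_space)              # Discrete(4)

action = env.action_space.sample()   # random integer in {0, 1, 2, 3}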

Observation Space

 

The state is an 8-dimensional vector: the coordinates of the lander in x & y, its linear velocities in x & y, its angle, its angular velocity, and two booleans that represent whether each leg is in contact with the ground or not.
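
For illustration, the sketch below (assuming the Gymnasium API) resets the environment and unpacks the 8 state components; the variable names are just illustrative labels:

import gymnasium as gym

env = gym.make("LunarLander-v3")
state, info = env.reset(seed=42)     # state is a NumPy array of shape (8,)

x, y, vx, vy, angle, angular_velocity, left_leg, right_leg = state
print(f"position=({x:.2f}, {y:.2f})  velocity=({vx:.2f}, {vy:.2f})")
print(f"angle={angle:.2f}  angular velocity={angular_velocity:.2f}")
print(f"leg contact: left={bool(left_leg)}, right={bool(right_leg)}")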

 

Rewards

 

After every step a reward is granted. The total reward of an episode is the sum of the rewards for all the steps within that episode.

 

For each step, the reward:

  • is increased/decreased the closer/further the lander is to the landing pad.

  • is increased/decreased the slower/faster the lander is moving.

  • is decreased the more the lander is tilted (angle not horizontal).

  • is increased by 10 points for each leg that is in contact with the ground.

  • is decreased by 0.03 points each frame a side engine is firing.

  • is decreased by 0.3 points each frame the main engine is firing.

 

The episode receives an additional reward of -100 or +100 points for crashing or landing safely, respectively.

An episode is considered a solution if it scores at least 200 points.
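
To make the scoring concrete, here is a minimal rollout sketch (assuming the Gymnasium API) that sums the per-step rewards of one episode under a random policy; this episode score is what the 200-point threshold refers to:

import gymnasium as gym

env = gym.make("LunarLander-v3")
state, info = env.reset()

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()          # random policy, just for illustration
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward                      # per-step shaping plus engine costs
    done = terminated or truncated              # crash, landing, out of view, or time limit

print(f"episode score: {total_reward:.1f}")     # a trained agent should reach 200+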

Starting State

 

The lander starts at the top center of the viewport with a random initial force applied to its center of mass.

 

Episode Termination

 

The episode finishes if:

 

  1. the lander crashes (the lander body comes into contact with the moon);

  2. the lander gets outside of the viewport (x coordinate is greater than 1);

  3. the lander is not awake. From the Box2D docs, a body which is not awake is a body which doesn’t move and doesn’t collide with any other body:

    1. When Box2D determines that a body (or group of bodies) has come to rest, the body enters a sleep state which has very little CPU overhead. If a body is awake and collides with a sleeping body, then the sleeping body wakes up. Bodies will also wake up if a joint or contact attached to them is destroyed.

Project Key Flow Chart for Lunar Landing:

 

 

  • Installation of required packages and importing the libraries

    • Installing Gymnasium

    • Importing the Libraries
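
As a hedged sketch of this step (the exact package list depends on the implementation), Gymnasium with Box2D support can be installed and the typical libraries imported like this:

# Installation (shell): the box2d extra provides the LunarLander environment.
#   pip install "gymnasium[box2d]" torch numpy matplotlib

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import gymnasium as gym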

 

  • Building the AI

    • Creating the Architecture of Neural Network
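
For the "Building the AI" step, a minimal sketch of a Q-network in PyTorch (assuming PyTorch as the framework; the layer sizes here are illustrative, not the project's exact values):

import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps an 8-dimensional state to Q-values for the 4 discrete actions."""

    def __init__(self, state_size=8, action_size=4, hidden_size=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)           # one Q-value per action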

 

  • Training the AI

    • Setting up the Environment

    • Initializing the Hyperparameters

    • Implementing Experience Replay

    • Implementing the DQN Class

    • Initializing the DQN Agent

    • Training the DQN Agent
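
The training sub-steps above can be summarised in one hedged sketch: hyperparameter values, the soft-update scheme, and the epsilon decay are illustrative assumptions, not the project's exact settings.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import gymnasium as gym

# Illustrative hyperparameters (assumed values, tune as needed).
BUFFER_SIZE = 100_000   # replay memory capacity
BATCH_SIZE = 64         # minibatch size
GAMMA = 0.99            # discount factor
LR = 5e-4               # learning rate
TAU = 1e-3              # soft-update rate for the target network

class QNetwork(nn.Module):
    """Same architecture as sketched above: state in, one Q-value per action out."""
    def __init__(self, state_size=8, action_size=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_size, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, action_size))
    def forward(self, state):
        return self.net(state)

class ReplayMemory:
    """Experience replay: store transitions, sample random minibatches."""
    def __init__(self, capacity=BUFFER_SIZE):
        self.memory = deque(maxlen=capacity)
    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))
    def sample(self, batch_size=BATCH_SIZE):
        batch = random.sample(self.memory, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        as_t = lambda a, dt=torch.float32: torch.as_tensor(a, dtype=dt)
        return (as_t(states), as_t(actions, torch.int64), as_t(rewards),
                as_t(next_states), as_t(dones))
    def __len__(self):
        return len(self.memory)

class DQNAgent:
    """Epsilon-greedy acting plus learning from replayed minibatches."""
    def __init__(self, state_size=8, action_size=4):
        self.action_size = action_size
        self.local_qnet = QNetwork(state_size, action_size)
        self.target_qnet = QNetwork(state_size, action_size)
        self.target_qnet.load_state_dict(self.local_qnet.state_dict())
        self.optimizer = optim.Adam(self.local_qnet.parameters(), lr=LR)
        self.memory = ReplayMemory()
    def act(self, state, epsilon):
        if random.random() < epsilon:                      # explore
            return random.randrange(self.action_size)
        with torch.no_grad():                              # exploit
            q = self.local_qnet(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax())
    def step(self, state, action, reward, next_state, done):
        self.memory.push(state, action, reward, next_state, done)
        if len(self.memory) >= BATCH_SIZE:
            self.learn()
    def learn(self):
        states, actions, rewards, next_states, dones = self.memory.sample()
        # Q-learning target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states.
        with torch.no_grad():
            next_q = self.target_qnet(next_states).max(dim=1).values
            targets = rewards + GAMMA * next_q * (1.0 - dones)
        current_q = self.local_qnet(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = F.mse_loss(current_q, targets)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        # Soft-update the target network toward the local network.
        for t, l in zip(self.target_qnet.parameters(), self.local_qnet.parameters()):
            t.data.copy_(TAU * l.data + (1.0 - TAU) * t.data)

# Training loop sketch: decay epsilon, track the score of every episode.
env, agent = gym.make("LunarLander-v3"), DQNAgent()
scores, epsilon = [], 1.0
for episode in range(1000):
    state, _ = env.reset()
    score, done = 0.0, False
    while not done:
        action = agent.act(state, epsilon)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        agent.step(state, action, reward, next_state, float(done))
        state, score = next_state, score + reward
    scores.append(score)
    epsilon = max(0.01, 0.995 * epsilon)   # gradually shift from exploring to exploiting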

 

  • Visualizing the Results
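
For this final step, a hedged plotting sketch (assuming matplotlib and the scores list collected in the training sketch above):

import matplotlib.pyplot as plt
import numpy as np

# 'scores' is the list of per-episode returns collected during training.
window = 100
rolling = [np.mean(scores[max(0, i - window):i + 1]) for i in range(len(scores))]

plt.plot(scores, alpha=0.4, label="episode score")
plt.plot(rolling, label=f"{window}-episode rolling mean")
plt.axhline(200, linestyle="--", label="solved threshold (200)")
plt.xlabel("episode")
plt.ylabel("score")
plt.legend()
plt.show()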

Final Output:
