Train “undying” Flappy Bird using Reinforcement Learning on Java


Flappy Bird is a mobile game released in 2013 that became hugely popular thanks to its simple gameplay (flap or no-flap). With the growth of Deep Learning (DL) and Reinforcement Learning (RL), we can now train an AI agent to control the bird's actions. Today, we will walk through the process of creating such an agent in Java. For the game itself, we used a simple open-source Flappy Bird game written in Java. For training, we used Deep Java Library (DJL), a Java-based deep learning framework, to build the training network and the RL algorithm.

In this article, we will start from the basics of RL and walk through the key components used to build the training architecture. If at any time you cannot follow our code and would like to try the game, you can refer to our RL repo.

The RL Architecture

In this section, we will introduce the main algorithms and networks we used, to help you better understand how we trained the model. This project takes a similar approach to DeepLearningFlappyBird, a Python Flappy Bird RL implementation. The main RL algorithm is Q-Learning, and the model it trains is a Convolutional Neural Network (CNN). At each game action stage, we store the current state of the bird, the action the agent took, and the next state of the bird. These are used as the training data for the CNN.

CNN Training Overview

The input data for training is four consecutive frames of the game. We stack these four images to form an “observation” of the bird. An observation here means time-series data represented by a sequence of images. Each image is converted to grayscale to reduce the training load. The array representation of the observation has shape (batch size, 4 (frames), 80 (width), 80 (height)), and each element is a pixel value of one frame. This data is fed into the CNN, which computes an output of shape (batch size, 2). The second dimension of the output represents the confidence for each possible next action (flap, no-flap).
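The frame stacking described above can be sketched in plain Java without any DJL types. The class name FrameStack and its methods are hypothetical, and repeating the first frame four times at the start of a game is an assumption, not something the article states.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: keeps the last four 80x80 grayscale frames as one observation. */
final class FrameStack {
    static final int FRAMES = 4;
    static final int SIZE = 80;

    private final Deque<float[][]> frames = new ArrayDeque<>();

    /** Adds the newest frame; the oldest frame is dropped once four are held. */
    void push(float[][] frame) {
        if (frame.length != SIZE || frame[0].length != SIZE) {
            throw new IllegalArgumentException("expected an 80x80 frame");
        }
        if (frames.isEmpty()) {
            // Assumption: at the start of a game, repeat the first frame to fill the stack.
            for (int i = 0; i < FRAMES; i++) {
                frames.addLast(frame);
            }
            return;
        }
        frames.removeFirst();
        frames.addLast(frame);
    }

    /** Returns the observation with shape (4, 80, 80), oldest frame first. */
    float[][][] observation() {
        if (frames.isEmpty()) {
            throw new IllegalStateException("no frames pushed yet");
        }
        return frames.toArray(new float[0][][]);
    }
}
```

A batch of such observations would then have the shape (batch size, 4, 80, 80) mentioned above.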

We compare the recorded action against the output confidence to compute the loss. The model is then updated through backpropagation and parameter optimization. The agent continuously updates the training data to achieve better results.
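The article does not show its loss function, so the following is only a minimal sketch of the standard Q-learning form, not the project's code. The class QLearningLoss, its method names, and the discount factor gamma are assumptions introduced for illustration.

```java
/** Hypothetical helper illustrating a Q-learning target and a squared-error loss. */
final class QLearningLoss {

    /**
     * Target value for one step: the reward alone if the game ended,
     * otherwise the reward plus the discounted best value of the next state.
     * gamma is an assumed discount factor between 0 and 1.
     */
    static float target(float reward, float[] nextQ, boolean terminal, float gamma) {
        if (terminal) {
            return reward;
        }
        float best = nextQ[0];
        for (float q : nextQ) {
            best = Math.max(best, q);
        }
        return reward + gamma * best;
    }

    /** Squared error between the target and the predicted value of the action actually taken. */
    static float loss(float[] predictedQ, int actionIndex, float target) {
        float diff = target - predictedQ[actionIndex];
        return diff * diff;
    }
}
```

Here predictedQ stands for the two-element network output for one observation, and actionIndex selects the entry for the action that was recorded.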

Training data

After the action stage, we create preObservation and currentObservation. As mentioned before, these are just stacks of images that represent a sequence of movements. We then bundle preObservation, currentObservation, action, reward and terminal together as a step and store it in the replayBuffer. The replayBuffer is a training dataset of limited size that is dynamically updated with the latest actions.

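public void step(NDList action, boolean training) {
    if (action.singletonOrThrow().getInt(1) == 1) {
        // Body missing from the source; presumably this makes the bird flap.
    }
    // The previous observation becomes preObservation; a new one is built from the current image.
    NDList preObservation = currentObservation;
    currentObservation = createObservation(currentImg);
    FlappyBirdStep step = new FlappyBirdStep(manager.newSubManager(),
            preObservation, currentObservation, action, currentReward, currentTerminal);
    if (training) {
        // Body missing from the source; per the text above, the step is stored in the replayBuffer here.
    }
    if (gameState == GAME_OVER) {
        // Body missing from the source; presumably the game is reset here.
    }
}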
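To illustrate what "a limited size and dynamically updated" means for the replayBuffer, here is a minimal sketch of a fixed-capacity buffer that overwrites its oldest entry once full. SimpleReplayBuffer is a hypothetical class written for this article, not the buffer the project uses, and the type parameter T stands in for a step such as FlappyBirdStep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical fixed-size replay buffer; T stands in for a step type. */
final class SimpleReplayBuffer<T> {
    private final Object[] steps;
    private final Random random;
    private int next = 0; // slot the next step will be written to
    private int size = 0; // number of slots currently filled

    SimpleReplayBuffer(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.steps = new Object[capacity];
        this.random = new Random(seed);
    }

    /** Stores a step, overwriting the oldest one once the buffer is full. */
    void addStep(T step) {
        steps[next] = step;
        next = (next + 1) % steps.length;
        if (size < steps.length) {
            size++;
        }
    }

    /** Draws batchSize steps uniformly at random, with replacement. */
    @SuppressWarnings("unchecked")
    List<T> sampleBatch(int batchSize) {
        if (size == 0) {
            throw new IllegalStateException("buffer is empty");
        }
        List<T> batch = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            batch.add((T) steps[random.nextInt(size)]);
        }
        return batch;
    }

    int size() {
        return size;
    }
}
```

Sampling at random rather than in order is a common choice for replay buffers because consecutive game frames are highly correlated.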

Three stages of RL

There are three different stages of RL used to generate better training data:

  • Observation Stage: Most actions are random with a small portion of actions coming from the AI agent
  • Exploration Stage: Random actions and AI agent actions are combined
  • Training Stage: Actions are primarily produced by the AI agent

During the exploration stage, we choose between a random action and the AI agent's action for the bird. At the beginning of training, random actions are used most of the time, since the actions generated by the AI agent are generally poor. After that, we gradually increase the probability of taking the AI agent’s action until it ultimately becomes the only decision maker. The parameter that controls the ratio between random and AI agent actions is called epsilon. It changes continuously throughout the training process.

public NDList chooseAction(RlEnv env, boolean training) {
    // With probability epsilon (the current value of exploreRate), take a random action.
    if (training && RandomUtils.random() < exploreRate.getNewValue(counter++)) {
        return env.getActionSpace().randomAction();
    } else {
        return baseAgent.chooseAction(env, training);
    }
}
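How epsilon might change across the three stages can be sketched as a simple schedule: constant during observation, linearly decreasing during exploration, then fixed at a small final value. The class EpsilonSchedule, its parameter names, and the linear shape of the decay are assumptions for illustration; the article does not specify how its exploreRate is computed.

```java
/** Hypothetical epsilon schedule covering the observation, exploration and training stages. */
final class EpsilonSchedule {
    private final float initialEpsilon;
    private final float finalEpsilon;
    private final long observeSteps;
    private final long exploreSteps;

    EpsilonSchedule(float initialEpsilon, float finalEpsilon, long observeSteps, long exploreSteps) {
        if (exploreSteps <= 0) {
            throw new IllegalArgumentException("exploreSteps must be positive");
        }
        this.initialEpsilon = initialEpsilon;
        this.finalEpsilon = finalEpsilon;
        this.observeSteps = observeSteps;
        this.exploreSteps = exploreSteps;
    }

    /** Returns the probability of taking a random action at the given step. */
    float valueAt(long step) {
        if (step < observeSteps) {
            return initialEpsilon; // observation stage: epsilon stays at its starting value
        }
        long explored = step - observeSteps;
        if (explored >= exploreSteps) {
            return finalEpsilon; // training stage: the agent decides almost every action
        }
        // Exploration stage: move linearly from initialEpsilon to finalEpsilon.
        float fraction = (float) explored / exploreSteps;
        return initialEpsilon + fraction * (finalEpsilon - initialEpsilon);
    }
}
```

For example, new EpsilonSchedule(1.0f, 0.1f, 100, 1000) keeps epsilon at 1.0 for the first 100 steps and reaches 0.1 after step 1100; these numbers are illustrative only.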
