Q-learning is an off-policy reinforcement learning algorithm that seeks to find the best action to take given the current state. It is considered off-policy because it can learn from actions chosen outside the policy it is learning, such as random exploratory actions, so it does not need to follow that policy while training. More specifically, Q-learning seeks to learn a policy that maximizes the total reward.

Today we will try to find the shortest path connecting the Start and End vertices, using Q-Learning and the C language. For our implementation, we have considered the following undirected, unweighted graph:

Graph with 8 Vertices (numbered 0 to 7; edges 0-1, 1-2, 1-4, 1-5, 2-3, 2-7, 4-5, 4-7 and 5-6)

And for convenience, we have assumed the end vertex to be Node 7. Once you understand how the code works, you can modify the end vertex to any node you like, or even take it as an input from the user!

Let’s Code!!

We begin with including the required C libraries as well as defining a macro. Along with this, we will define certain global variables.

#include <stdio.h>
#include <stdlib.h>
#define RAN_LIM 500000

double qMatrix[8][8], rMatrix[8][8], gammaLR = 0.8;
int max_index[8], available_acts[8];
int ran_top = 0, ran[RAN_LIM];

Here, qMatrix is a 2D array that represents our Q-Matrix, and rMatrix is a 2D array that holds the rewards/points. Both matrices are indexed like adjacency matrices for our graph. We have set gammaLR to 0.8 and refer to it as our learning rate; note that in standard Q-learning notation, gamma usually denotes the discount factor, which controls how much future rewards count. We will explain the other variables as and when they are used.
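To make the roles of these globals more concrete, here is a minimal sketch of a single Q-value update. It assumes the simplified deterministic rule Q(s, a) = R(s, a) + gamma * max Q(a, ·), where taking action a moves the agent to node a. The names `N_STATES`, `max_q` and `update_q` are illustrative and are not part of this article's listing, whose own update step may differ.

```c
#include <assert.h>

#define N_STATES 8 /* assumption: 8 nodes, matching the 8x8 matrices */

/* Returns the largest Q-value in the row of `state`. */
double max_q(double q[N_STATES][N_STATES], int state)
{
    double best = q[state][0];
    for (int j = 1; j < N_STATES; j++) {
        if (q[state][j] > best) {
            best = q[state][j];
        }
    }
    return best;
}

/* Applies one update for the move state -> action and returns the new value.
   Assumes that taking `action` moves the agent to node `action`. */
double update_q(double q[N_STATES][N_STATES], double r[N_STATES][N_STATES],
                int state, int action, double gamma)
{
    assert(state >= 0 && state < N_STATES);
    assert(action >= 0 && action < N_STATES);
    q[state][action] = r[state][action] + gamma * max_q(q, action);
    return q[state][action];
}
```

Under this rule, a move into the goal with reward 100 gets the value 100 while the Q-Matrix is still all zeros, and a move into that predecessor node then gets roughly 0.8 * 100 = 80.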

Let us start our understanding with the main function. Initially all the required variables are initialized.

//Main Function Begins

    int i, j;
    int initial_state, final_state = 7;
    int current_state, size_av_actions, action;
    double final_max=0.0, scores[100000], score=0.0; //rMatrix is the global declared above; redeclaring it here would shadow it
//Main Function Continued*

As mentioned earlier, we have fixed our final state to Node 7 (‘final_state’). While training the Q-Matrix, we need to keep track of the current state and the next state, which are represented by ‘current_state’ and ‘action’ respectively. We shall understand the use of other variables later in the code.

In the following code, we are doing 3 things.

  1. Firstly, we will take an initial state as input from the user.
  2. Secondly, we will generate an array that will contain **random numbers** ranging from 0 to 7 (both inclusive).
  3. And finally, we will fill in the values of the Q-Matrix as well as the R-Matrix according to the previously mentioned graph.

You may use any graph of your choice, but remember to change the inputs of the matrices accordingly. Our Q-Matrix will initially contain only 0 values. In our R-Matrix, we will put the value ‘0’ for adjacent nodes, ‘-1’ for non-adjacent nodes and ‘100’ for moves that lead into Node 7 (the final vertex), including staying at Node 7 itself. In short, we are rewarding the moves that take us to the Final Node (7).
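The reward scheme just described can also be written with a small edge list, which is easier to change when you swap in a different graph. This is an alternative sketch and not the article's listing: `N_STATES`, `N_EDGES`, `edges` and `fill_rewards` are names introduced here for illustration, and the nine edges are those of the graph used in this article.

```c
#include <assert.h>

#define N_STATES 8 /* assumption: nodes are numbered 0..7 */
#define N_EDGES 9

/* Undirected edges of the example graph, one pair per edge. */
static const int edges[N_EDGES][2] = {
    {0, 1}, {1, 2}, {1, 4}, {1, 5}, {2, 3}, {2, 7}, {4, 5}, {4, 7}, {5, 6}
};

/* Fills r with -1 (no edge), 0 (edge) or 100 (move that ends on `goal`). */
void fill_rewards(double r[N_STATES][N_STATES], int goal)
{
    assert(goal >= 0 && goal < N_STATES);

    for (int i = 0; i < N_STATES; i++) {
        for (int j = 0; j < N_STATES; j++) {
            r[i][j] = -1.0;
        }
    }

    for (int e = 0; e < N_EDGES; e++) {
        int a = edges[e][0];
        int b = edges[e][1];
        /* Each undirected edge is stored in both directions. */
        r[a][b] = (b == goal) ? 100.0 : 0.0;
        r[b][a] = (a == goal) ? 100.0 : 0.0;
    }

    /* Staying at the goal is rewarded as well. */
    r[goal][goal] = 100.0;
}
```

Calling `fill_rewards(rMatrix, 7)` should produce the same values as the condition-based loops in this article, with the end vertex passed as a parameter.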

//Main Function Continued*

    //Input Initial State
    printf("Enter the initial state: ");
    scanf("%d",&initial_state);

    //Random Number from 0 to 7
    //rand() is not seeded here, so the same sequence is produced on every run
    for (i = 0; i < RAN_LIM; i++)
    {
        ran[i] = rand() % 8;
    }

    for (i = 0; i < 8; i++)
    {
        for (j = 0; j < 8; j++)
        {
            rMatrix[i][j] = -1.0;
            qMatrix[i][j] = 0.0;

            //Edges in the forward direction (i -> j)
            if ((i == 0 && j == 1) || (i == 1 && j == 5) || (i == 5 && j == 6) ||
                (i == 5 && j == 4) || (i == 1 && j == 2) || (i == 2 && j == 3) ||
                (i == 2 && j == 7) || (i == 4 && j == 7) || (i == 1 && j == 4))
            {
                rMatrix[i][j] = 0.0;
            }

            //Same edges in the reverse direction (j -> i), since the graph is undirected
            if ((j == 0 && i == 1) || (j == 1 && i == 5) || (j == 5 && i == 6) ||
                (j == 5 && i == 4) || (j == 1 && i == 2) || (j == 2 && i == 3) ||
                (j == 2 && i == 7) || (j == 4 && i == 7) || (j == 1 && i == 4))
            {
                rMatrix[i][j] = 0.0;
            }

            if ((i == 2 && j == 7) || (i == 7 && j == 7) ||(i == 4 && j == 7))
            {
                rMatrix[i][j] = 100.0;
            }
        }
    }
//Main Function Continued**
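The listing above stores whatever scanf reads into initial_state, so a value outside 0 to 7 would later index past the ends of the matrices. As a hedged sketch of one way to guard against that, the helper below parses and range-checks a state from a line of text. The names `N_STATES` and `parse_state` are hypothetical and do not appear in the article's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define N_STATES 8 /* assumption: valid states are 0..7 */

/* Parses a state index from `text`. Returns 1 and stores the value in
   *state when the text starts with an integer in [0, N_STATES);
   otherwise returns 0 and leaves *state unchanged. */
int parse_state(const char *text, int *state)
{
    int value;

    assert(text != NULL && state != NULL);

    if (sscanf(text, "%d", &value) != 1) {
        return 0; /* not a number */
    }
    if (value < 0 || value >= N_STATES) {
        return 0; /* out of range for the 8x8 matrices */
    }
    *state = value;
    return 1;
}
```

In main, a line read with fgets could be passed to parse_state, and the prompt repeated until it returns 1.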

Let us now take a look at our R-Matrix.

//Main Function Continued**

    printf("\nPoints Matrix : \n");
    for (i = 0; i < 8; i++)
    {
        for (j = 0; j < 8; j++)
        {
            printf("%f\t",rMatrix[i][j]);
        }
        printf("\n");
    }
    printf("\n\n\n");

    printf("%f", rMatrix[7][7]);
//Main Function Continued***

R-Matrix
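With the R-Matrix filled in, the moves that are legal from a state can be read off a single row: any entry that is not negative marks an edge. The sketch below collects those moves into an array. It assumes the -1 / 0 / 100 reward scheme described earlier; `N_STATES` and `collect_actions` are illustrative names, and the way the article's own code fills `available_acts` may differ.

```c
#include <assert.h>

#define N_STATES 8 /* assumption: 8 nodes, matching the 8x8 matrices */

/* Writes every node j reachable from `state` (reward entry >= 0) into
   `out`, in increasing order, and returns how many were found. */
int collect_actions(double r[N_STATES][N_STATES], int state, int out[N_STATES])
{
    int count = 0;

    assert(state >= 0 && state < N_STATES);

    for (int j = 0; j < N_STATES; j++) {
        if (r[state][j] >= 0.0) {
            out[count++] = j;
        }
    }
    return count;
}
```

For the graph in this article, calling it for state 2 should report three available actions: nodes 1, 3 and 7.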

