Is There an Error in My Q Deep Learning Algorithm? Need Help Troubleshooting!

  • Thread starter: Superposed_Cat
  • Tags: Algorithm
SUMMARY

The forum discussion centers on troubleshooting a Q deep learning algorithm that fails to learn after 100,000 game plays and 1,000 training iterations. The user implements a reward system where rewards are -10 for obstacles and 10 for goals, but the algorithm does not improve. Key suggestions include adding print statements for debugging and simplifying the while loop conditions to enhance clarity and functionality. The mentor emphasizes the importance of understanding the program's flow and suggests restructuring the conditional logic for better performance.

PREREQUISITES
  • Understanding of Q-learning algorithms and reinforcement learning principles
  • Familiarity with C# programming and syntax
  • Knowledge of debugging techniques, including logging and print statements
  • Experience with neural network training and backpropagation methods
NEXT STEPS
  • Implement logging using log4c API to track variable states during execution
  • Refactor the while loop conditions for clarity and efficiency
  • Explore alternative reward structures to optimize learning outcomes
  • Learn about using switch-case constructs for cleaner conditional logic in C#
USEFUL FOR

Developers and data scientists working on reinforcement learning projects, particularly those implementing Q-learning algorithms in C#. This discussion is also beneficial for anyone seeking to improve their debugging skills in deep learning applications.

Superposed_Cat
[mentor note: code blocks added for readability and syntax highlighting]

Hey all, I've been trying to implement a Q deep learning algorithm, but I'm having an issue: it's not working. After 100,000 game plays, using 1000 iterations to train each step (although I have tried lower numbers for both), it's still not learning. The network and game are in the linked image: http://imgur.com/a/hATfB
C:
    double maxQval;
    double[] inputvec;
    int MaxQ = GetRandDir(state, out maxQval, out inputvec);   // inputvec is the board
    double[] QtarVec = new double[] { 0, 0, 0, 0 };
    double r = GetR((int)state[0], (int)state[1]);             // GetR is the reward

    QtarVec[MaxQ] = Qtar(r, maxQval);                          // backprop a vector of 0's, except Qtar replaces one value

    associator.Train(50, new double[][] { inputvec }, new double[][] { QtarVec });
The training-data pair for backprop is (the input I linked in the image, QTarget = r + gamma * MaxQ), where MaxQ is the max network output layer activation or a random one (epsilon greedy). r is the reward obtained from each move: -10 for an obstacle and 10 for the goal (although I have tried just 10 for the goal and 0 for everything else). Here is the training code.
C:
public void Train(int nTrails)
{
    double[] state = new double[] { 1, 1 };   // initial position
    int its = 0;

    for (int i = 0; i < nTrails; i++) {
        while (((state[0] < 4) && (state[1] < 4))
               && ((state[0] * 100 > 0) && (state[1] * 100 > 0))
               && (state[0] != 3 && state[1] != 3)) {          // while on board and not at goal position
            double temp = r.NextDouble();
            int next = -1;
            lines.Add(new Vector2((float)(state[0] * 100), (float)(state[1] * 100)));

            if (temp < epsilon) {
                next = TrainRandIt(state);   // move in a random direction, backprop
            } else {
                next = TrainMaxIt(state);    // move in the max-activation direction, backprop
            }

            if (next == 0) {                 // updating position
                state[0]++;
            } else if (next == 1) {
                state[0]--;
            } else if (next == 2) {
                state[1]++;
            } else if (next == 3) {
                state[1]--;
            }
        }
    }

    state[0] = 1;
    state[1] = 1;   // resetting game
}

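To make the target construction described above concrete, here is a minimal Python sketch (the original is C#; the function name and the gamma/epsilon values here are made up, not from the post) of building a backprop target that is all zeros except for r + gamma * maxQ in the chosen action's slot:

```python
import random

GAMMA = 0.9  # discount factor -- assumed value, not stated in the post

def q_target_vector(q_values, reward, epsilon=0.1):
    """Build the backprop target described above: a vector of zeros,
    except the chosen action's slot holds reward + GAMMA * max Q."""
    if random.random() < epsilon:
        action = random.randrange(len(q_values))                       # explore
    else:
        action = max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
    target = [0.0] * len(q_values)
    target[action] = reward + GAMMA * max(q_values)
    return action, target
```

With epsilon set to 0 the greedy branch always fires, which makes the construction easy to check by hand.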
Any help appreciated.
 
Last edited by a moderator:
My suggestion would be to add print statements, or use a logging API such as log4c, to annotate your code and run it through its paces, printing out variables and relevant array values, then study the printouts carefully.

You should understand, step by step, how your program works; if the output says otherwise, then there is your mistake.
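For instance, a minimal sketch of that kind of instrumentation (Python rather than the post's C#; all names here are hypothetical) logs every quantity the Q update depends on at each step:

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")

def log_step(iteration, state, action, reward, q_values):
    # Record everything the update depends on, so the printout can be
    # checked line by line against what you expect to happen.
    msg = (f"it={iteration} state={state} action={action} "
           f"reward={reward} q={q_values}")
    logging.debug(msg)
    return msg
```

A call like `log_step(0, [1, 1], 2, -10.0, [0.0, 0.0, 0.0, 0.0])` inside the training loop produces one line per move, which quickly shows whether the state updates and rewards match your expectations.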

Two areas of concern: the while loop conditions are overly complicated, with an unnecessary level of parentheses around the first two terms. I would have used if statements, and perhaps a for loop to break out of.

Looking at the while conditions I have to ask why write:

##(state[0]<4 && state[0]*100>0 && state[0]!=3)##

when this would be better:

##(state[0]<4 && state[0]>0 && state[0]!=3)##

Similarly for state[1]
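In other words, each coordinate needs only a single range-plus-goal test. A Python sketch (the helper name is made up; bounds taken from the original condition, with the redundant `x * 100 > 0` replaced by a plain `x > 0`):

```python
def in_play(x):
    # On the board (0 < x < 4) and not at the goal coordinate 3.
    # x * 100 > 0 is equivalent to x > 0, so the factor of 100 adds nothing.
    return 0 < x < 4 and x != 3

# The whole while condition then collapses to:
#     while in_play(state[0]) and in_play(state[1]): ...
```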

Also, stylistically, the compound if statement would be better served by a switch-case construct.
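In C# that would be a switch over `next`; the same idea can also be table-driven. A Python sketch (helper names are made up; the direction indices mirror the post's if/else chain):

```python
# Direction index -> (dx, dy), matching the post's chain:
# 0 -> x+1, 1 -> x-1, 2 -> y+1, 3 -> y-1.
MOVES = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}

def apply_move(state, direction):
    # One lookup replaces four branches.
    dx, dy = MOVES[direction]
    return [state[0] + dx, state[1] + dy]
```

Either form (switch or lookup table) makes it harder to accidentally drop or duplicate a branch than a chain of if/else if statements.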
 
