Is There an Error in My Deep Q-Learning Algorithm? Need Help Troubleshooting!

  • Thread starter Superposed_Cat
  • Tags: Algorithm
In summary, the thread discusses an implementation of a deep Q-learning algorithm that is not learning after 100,000 game plays with 1,000 training iterations per step. The poster links an image of the network and game and shares the relevant code. The reply suggests adding print statements or using a logging API to trace the program's behavior, points out that the while loop conditions are overly complicated, and recommends simpler comparisons and a switch statement for readability.
  • #1
Superposed_Cat
[mentor note: code blocks added for readability and syntax highlighting]

Hey all, I've been trying to implement a deep Q-learning algorithm, but I'm having an issue: it's not working. After 100,000 game plays, using 1,000 iterations to train each step (although I have tried lower numbers for both), it's still not learning. The network and game are shown in the linked image: http://imgur.com/a/hATfB
C#:
    double maxQval;
    double[] inputvec;
    int MaxQ = GetRandDir(state, out maxQval, out inputvec); // inputvec is the board
    double[] QtarVec = new double[] { 0, 0, 0, 0 };
    double r = GetR((int)state[0], (int)state[1]);           // GetR returns the reward

    QtarVec[MaxQ] = Qtar(r, maxQval); // backprop target: vector of 0's except Qtar replaces one value

    associator.Train(50, new double[][] { inputvec }, new double[][] { QtarVec });
The training-data pair for backprop is (the input I linked in the image, QTarget = r + gamma * MaxQ). MaxQ is the max network output layer activation, or a random one (epsilon-greedy). r is the reward obtained from each move: -10 for an obstacle and 10 for the goal (although I have tried just 10 for the goal and 0 for everything else). Here is the training code:
C#:
    public void Train(int nTrails)
    {
        double[] state = new double[] { 1, 1 }; // initial position

        for (int i = 0; i < nTrails; i++)
        {
            // while on the board and not at the goal position
            while ((state[0] < 4 && state[1] < 4)
                   && (state[0] * 100 > 0 && state[1] * 100 > 0)
                   && (state[0] != 3 && state[1] != 3))
            {
                double temp = r.NextDouble();
                int next = -1;
                lines.Add(new Vector2((float)(state[0] * 100), (float)(state[1] * 100)));

                if (temp < epsilon)
                {
                    next = TrainRandIt(state); // move in a random direction, backprop
                }
                else
                {
                    next = TrainMaxIt(state); // move in the max activation direction, backprop
                }

                // updating position
                if (next == 0)
                {
                    state[0]++;
                }
                else if (next == 1)
                {
                    state[0]--;
                }
                else if (next == 2)
                {
                    state[1]++;
                }
                else if (next == 3)
                {
                    state[1]--;
                }
            }

            // resetting the game so the next trial starts from the initial position
            state[0] = 1;
            state[1] = 1;
        }
    }

Any help appreciated.
 
  • #2
My suggestion would be to add print statements, or to use a logging API such as log4net, to instrument your code; then run it through its paces, printing out variables and relevant array values, and study the printouts carefully.

You should understand, step by step, how your program works; if the output says otherwise, then there is your mistake.
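For example, here is a minimal tracing sketch; LogStep is a hypothetical helper (not part of the posted code), and the values passed to it assume the variables in your Train loop:

C#:
    using System;

    public static class QLog
    {
        // Writes one line per training step so the printouts can be studied afterwards.
        public static void LogStep(int trial, double[] state, int action, double reward, double qTarget)
        {
            Console.WriteLine(
                "trial={0} state=({1},{2}) action={3} r={4} Qtar={5}",
                trial, state[0], state[1], action, reward, qTarget);
        }
    }

Calling QLog.LogStep(...) just before associator.Train(...) would show whether the reward and the backprop target are what you expect at each move.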

Two areas of concern: the while loop conditions are overly complicated, with an unnecessary level of parentheses on the first two terms. I would have used if statements and perhaps a for loop with a break, as sketched below.
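For instance, the loop guard could be unpacked into explicit checks (a sketch using the same state array from the posted code; maxSteps is a hypothetical safety cap, not in the original):

C#:
    for (int step = 0; step < maxSteps; step++)
    {
        // stop when the state leaves the board
        if (state[0] <= 0 || state[0] >= 4 || state[1] <= 0 || state[1] >= 4)
            break;

        // the original condition also exits as soon as either coordinate reaches 3
        if (state[0] == 3 || state[1] == 3)
            break;

        // ... choose a move, backprop, update state ...
    }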

Looking at the while conditions, I have to ask why write:

##(state[0]<4 && state[0]*100>0 && state[0]!=3)##

when this would be better, since multiplying by 100 does not change the sign:

##(state[0]<4 && state[0]>0 && state[0]!=3)##

Similarly for state[1].

Also, stylistically, the compound if statement would be better served by a switch statement.
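For instance, the position update could read (a sketch reusing the state and next variables from the posted loop):

C#:
    // updating position: one case per direction
    switch (next)
    {
        case 0: state[0]++; break;
        case 1: state[0]--; break;
        case 2: state[1]++; break;
        case 3: state[1]--; break;
        default: break; // next == -1: no direction was chosen
    }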
 

1. What is the deep Q-learning algorithm?

Deep Q-learning is a type of reinforcement learning that uses a neural network to approximate the optimal action-selection policy for a given environment. It is often used in sequential decision-making tasks where the best action at each step depends on the actions taken before it.

2. How does the deep Q-learning algorithm work?

The algorithm uses a neural network to learn the Q-value of each possible action in a given state; the Q-value represents the expected future reward for taking that action in that state. The algorithm uses this information to update its policy and to choose the best action in each state.
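For reference, the standard Q-learning update that the network is trained to approximate is

##Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]##

where ##\alpha## is the learning rate, ##\gamma## is the discount factor, ##r## is the immediate reward, and ##s'## is the next state. In the deep variant, the table of Q-values is replaced by a neural network trained toward the target ##r + \gamma \max_{a'} Q(s',a')##.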

3. What are the advantages of using the deep Q-learning algorithm?

One of the main advantages of deep Q-learning is its ability to handle complex, high-dimensional state spaces. It can also learn from experience and improve its performance over time. In its standard form it selects actions from a discrete set, which suits a wide range of applications; continuous action spaces require further extensions.

4. What are some real-world applications of the deep Q-learning algorithm?

Deep Q-learning has been successfully applied in various fields, including robotics, finance, and gaming. For example, it has been used to train robots to perform complex tasks such as grasping and manipulation, in algorithmic trading to make decisions on buying and selling stocks, and in gaming to create AI agents that can learn and adapt to different game environments.

5. Are there any limitations to the deep Q-learning algorithm?

One limitation of deep Q-learning is that it requires a large amount of training data to learn an accurate policy, which can be a challenge in real-world applications where data is limited or expensive to obtain. It may also struggle in environments with sparse rewards, since it can take a long time to learn the optimal policy in those cases.
