Is There an Error in My Deep Q-Learning Algorithm? Need Help Troubleshooting!

  • Thread starter: Superposed_Cat
  • Tags: Algorithm
AI Thread Summary
The discussion centers on issues encountered while implementing a deep Q-learning algorithm for a game, where the algorithm fails to learn after extensive training. The user reports running 100,000 game plays with 1,000 training iterations per step, yet the network shows no improvement. Key components include a reward system, where rewards are assigned based on game outcomes, and a training method that uses both random moves and maximum-activation moves (epsilon-greedy exploration).

Concerns are raised that the while loop conditions are overly complicated, and that simplifying them would improve both clarity and correctness. Recommendations include adding print statements or using a logging library to trace variable values and better understand the program's behavior. The discussion also suggests that restructuring the conditional logic, for example with a switch statement, would improve the code's readability.
Superposed_Cat
[mentor note: code blocks added for readability and syntax highlighting]

Hey all, I've been trying to implement a deep Q-learning algorithm, but I'm having an issue: it's not working. After 100,000 game plays, using 1,000 iterations to train each step (although I have tried lower numbers for both), it's still not learning. The network and game are in the linked image: http://imgur.com/a/hATfB
C:
double maxQval;
double[] inputvec;
int MaxQ = GetRandDir(state, out maxQval, out inputvec); // inputvec is the board
double[] QtarVec = new double[] { 0, 0, 0, 0 };
double r = GetR((int)state[0], (int)state[1]);           // GetR returns the reward

QtarVec[MaxQ] = Qtar(r, maxQval);                        // backprop vector of 0s, except Qtar replaces one value

associator.Train(50, new double[][] { inputvec }, new double[][] { QtarVec });
The training-data pair for backprop is (the input I linked in the image, QTarget = r + gamma * MaxQ), where MaxQ is the maximum network output-layer activation, or a random one (epsilon-greedy). r is the reward obtained from each move: -10 for an obstacle and 10 for the goal (although I have also tried just 10 for the goal and 0 for everything else).
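For reference, in standard Q-learning notation that target is ##Q_{target} = r + \gamma \, \max_{a'} Q(s', a')##, where ##\gamma## is the discount factor (gamma in my code) and the max runs over the next state's action values. Here is the training code: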
C:
public void Train(int nTrails)
{
    double[] state = new double[] { 1, 1 }; // initial position
    int its = 0;

    for (int i = 0; i < nTrails; i++) {
        while (((state[0] < 4) && (state[1] < 4))
               && ((state[0] * 100 > 0) && (state[1] * 100 > 0))
               && (state[0] != 3 && state[1] != 3)) {       // while on board and not at goal position
            double temp = r.NextDouble();
            int next = -1;
            lines.Add(new Vector2((float)(state[0] * 100), (float)(state[1] * 100)));

            if (temp < epsilon) {
                next = TrainRandIt(state);   // move in a random direction, backprop
            } else {
                next = TrainMaxIt(state);    // move in the max-activation direction, backprop
            }

            if (next == 0) {                 // updating position
                state[0]++;
            } else if (next == 1) {
                state[0]--;
            } else if (next == 2) {
                state[1]++;
            } else if (next == 3) {
                state[1]--;
            }
        }
    }

    state[0] = 1;
    state[1] = 1;  // resetting game
}

Any help appreciated.
 
My suggestion would be to add print statements, or use a logging API such as log4c, to annotate your code; then run it through its paces, printing out variable and relevant array values, and study the printouts carefully.

You should understand, step by step, how your program works; if the output says otherwise, then there is your mistake.
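For example (a minimal sketch; the format is arbitrary, and I'm assuming the variables from your posted Train method), a single trace line inside the while loop would show whether the state ever changes and which branch runs:

C:
// Sketch: place inside the while loop, after `next` has been assigned.
Console.WriteLine(
    "trial=" + i +
    " state=(" + state[0] + ", " + state[1] + ")" +
    " temp=" + temp +
    " next=" + next);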

There are two areas of concern. First, the while loop conditions are overly complicated, with an unnecessary level of parentheses around the first two terms. I would have used if statements, and perhaps a for loop with a break to exit.

Looking at the while conditions, I have to ask why you would write:

##(state[0]<4 && state[0]*100>0 && state[0]!=3)##

when this would be better:

##(state[0]<4 && state[0]>0 && state[0]!=3)##

Similarly for state[1].
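Put together, a minimal sketch of that simplification (OnBoardAndNotAtGoal is a name I made up; the bounds and the goal test deliberately mirror your original condition, which exits as soon as either coordinate equals 3):

C:
// Sketch: express the loop condition in one readable place.
static bool OnBoardAndNotAtGoal(double[] state)
{
    bool onBoard = state[0] > 0 && state[0] < 4
                && state[1] > 0 && state[1] < 4;
    bool atGoal  = state[0] == 3 || state[1] == 3; // same effect as the original exit test
    return onBoard && !atGoal;
}

// Usage in Train:
// while (OnBoardAndNotAtGoal(state)) { ... }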

Second, stylistically, the compound if statement would be better served by a switch/case construct.
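For example (a sketch using your next encoding; the direction comments are my guess at what each value means):

C:
switch (next)
{
    case 0: state[0]++; break;   // move in +x
    case 1: state[0]--; break;   // move in -x
    case 2: state[1]++; break;   // move in +y
    case 3: state[1]--; break;   // move in -y
}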
 