# C/++/# "Q deep learning" algorithm

1. Dec 9, 2016

### Superposed_Cat

Hey all, I've been trying to implement a Q deep learning algorithm but am having an issue: it's not working. After 100,000 game plays, using 1000 iterations to train each step (although I have tried lower numbers for both), it's still not learning. The network and game are shown in the linked image: http://imgur.com/a/hATfB
Code (C):

double maxQval;
double[] inputvec;
int MaxQ = GetRandDir(state, out maxQval, out inputvec);//input vec is board
double[] QtarVec = new double[] { 0, 0, 0, 0 };
double r = GetR((int)state[0], (int)state[1]);     // GetR is reward

QtarVec[MaxQ] = Qtar(r, maxQval);               // backprop vector of 0's except Qtar replaces a value

associator.Train(50, new double[][] { inputvec }, new double[][] { QtarVec });
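The snippet above calls a `Qtar` helper that is not shown. A minimal sketch of what it might compute, assuming the one-step target r + gamma * MaxQ described in this post, is given here. The class name `QTargetSketch`, the discount value 0.9, and the `BuildTargetVector` helper are assumptions for illustration only, not the original code.

```csharp
using System;

public static class QTargetSketch
{
    // Discount factor. 0.9 is an assumed value; the post does not state gamma.
    public const double Gamma = 0.9;

    // One-step Q-learning target: reward plus discounted Q-value.
    public static double Qtar(double r, double maxQval)
    {
        return r + Gamma * maxQval;
    }

    // Builds the backprop target the way the snippet above does:
    // zeros everywhere except the chosen action, which gets the Q target.
    public static double[] BuildTargetVector(int action, double r, double maxQval, int nActions = 4)
    {
        double[] target = new double[nActions]; // initialised to 0.0
        target[action] = Qtar(r, maxQval);
        return target;
    }
}
```

For example, `QTargetSketch.BuildTargetVector(2, 10, 0)` yields `{ 0, 0, 10, 0 }`.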

The training pair for backprop is (the input shown in the linked image, QTarget = r + gamma * MaxQ), where MaxQ is the maximum output-layer activation of the network, or a randomly chosen one (epsilon-greedy). r is the reward obtained from each move: -10 for an obstacle and 10 for the goal (although I have also tried just 10 for the goal and 0 for everything else). Here is the training code.
Code (C):

public void Train(int nTrails)
{
    double[] state = new double[] { 1, 1 }; // initial position
    int its = 0;

    for (int i = 0; i < nTrails; i++) {
        // while on the board and not at the goal position
        while (((state[0] < 4) && (state[1] < 4))
               && ((state[0] * 100 > 0) && (state[1] * 100 > 0))
               && (state[0] != 3 && state[1] != 3)) {
            double temp = r.NextDouble();
            int next = -1;
            lines.Add(new Vector2((float)(state[0] * 100), (float)(state[1] * 100)));

            if (temp < epsilon) {
                next = TrainRandIt(state);  // move in a random direction, backprop
            } else {
                next = TrainMaxIt(state);   // move in the max-activation direction, backprop
            }

            // updating position
            if (next == 0) {
                state[0]++;
            } else if (next == 1) {
                state[0]--;
            } else if (next == 2) {
                state[1]++;
            } else if (next == 3) {
                state[1]--;
            }
        }
    }

    state[0] = 1;
    state[1] = 1;  // resetting game
}
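
The loop above delegates the choice of move to `TrainRandIt` and `TrainMaxIt`, which are not shown. As a rough sketch of the epsilon-greedy choice alone, assuming four output activations (one per direction), the selection step might look like this. `EpsilonGreedySketch`, `SelectAction`, and `ArgMax` are hypothetical names, and the backprop step is left out.

```csharp
using System;

public static class EpsilonGreedySketch
{
    // Index of the largest activation (the first one wins on ties).
    public static int ArgMax(double[] qValues)
    {
        int best = 0;
        for (int i = 1; i < qValues.Length; i++)
        {
            if (qValues[i] > qValues[best]) best = i;
        }
        return best;
    }

    // With probability epsilon pick a random direction, otherwise the best one.
    public static int SelectAction(double[] qValues, double epsilon, Random rng)
    {
        if (rng.NextDouble() < epsilon)
        {
            return rng.Next(qValues.Length); // explore: uniform over directions
        }
        return ArgMax(qValues);              // exploit: max activation
    }
}
```

With `epsilon` set to 0 the choice is always the max-activation direction, which makes it easy to check the network's outputs in isolation.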

Any help appreciated.

Last edited by a moderator: Dec 13, 2016
2. Dec 13, 2016

### Staff: Mentor

My suggestion would be to add print statements, or use the log4c API, to instrument your code. Then run it through its paces, printing out the variables and relevant array values, and study the printouts carefully.

You should understand step by step how your program works, and if the output says otherwise, then there is your mistake.
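
As a minimal sketch of that kind of tracing, using plain `Console.WriteLine` rather than a logging library, one line per step could be printed as follows. The `TraceSketch` class and its parameter list are assumptions about what is worth printing, not part of the posted code.

```csharp
using System;
using System.Globalization;

public static class TraceSketch
{
    // Formats one training step as a single line of text.
    public static string FormatStep(int trial, double[] state, int action, double reward, double[] target)
    {
        // Convert each target entry with the invariant culture so the
        // output does not depend on the machine's locale.
        string[] parts = Array.ConvertAll(target, v => v.ToString(CultureInfo.InvariantCulture));
        return string.Format(CultureInfo.InvariantCulture,
            "trial={0} state=({1},{2}) action={3} reward={4} target=[{5}]",
            trial, state[0], state[1], action, reward, string.Join(", ", parts));
    }

    // Writes the formatted step to standard output.
    public static void LogStep(int trial, double[] state, int action, double reward, double[] target)
    {
        Console.WriteLine(FormatStep(trial, state, action, reward, target));
    }
}
```

Calling `LogStep` at the top of the while loop would show, for instance, whether every trial really starts at (1, 1). In the posted code the reset of `state` sits after the `for` loop rather than inside it, which is the sort of thing such a printout makes visible.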

There are two areas of concern. First, the while loop conditions are overly complicated, with an unnecessary level of parentheses on the first two terms. I would have used if statements, and perhaps a for loop to break out of.

Looking at the while conditions, I have to ask why write:

    (state[0] < 4 && state[0] * 100 > 0 && state[0] != 3)

when this would be better:

    (state[0] < 4 && state[0] > 0 && state[0] != 3)

Similarly for state[1].
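
One way to make the condition easier to read and to test is to move it into a small predicate. The sketch below keeps the same logic as the posted loop condition, minus the redundant `* 100`; `BoardSketch`, `ShouldContinue`, and `InPlay` are hypothetical names.

```csharp
using System;

public static class BoardSketch
{
    // True while a single coordinate is strictly between 0 and 4
    // and not equal to 3, exactly as in the posted condition.
    private static bool InPlay(double c)
    {
        return c > 0 && c < 4 && c != 3;
    }

    // Mirrors the posted while condition: every term is joined by &&,
    // so it reduces to both coordinates being "in play".
    public static bool ShouldContinue(double[] state)
    {
        return InPlay(state[0]) && InPlay(state[1]);
    }
}
```

Written this way it is apparent that the loop stops as soon as either coordinate reaches 3. If the goal is meant to be the single cell (3, 3), the last test would instead need to be `!(state[0] == 3 && state[1] == 3)`; which one is intended depends on the board in the linked image.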

Second, stylistically, the compound if statement would be better expressed as a switch/case construct.
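
A sketch of the position update written as a switch follows, using the same direction codes as the posted code (0 and 1 change `state[0]`, 2 and 3 change `state[1]`). The `MoveSketch` class and `ApplyMove` method are hypothetical names.

```csharp
using System;

public static class MoveSketch
{
    // Applies one move to the state in place.
    public static void ApplyMove(double[] state, int next)
    {
        switch (next)
        {
            case 0: state[0]++; break;
            case 1: state[0]--; break;
            case 2: state[1]++; break;
            case 3: state[1]--; break;
            default: break; // any other value (e.g. -1): leave the state unchanged
        }
    }
}
```

The `default` branch makes explicit what the if/else chain did silently: a value of `next` outside 0..3 leaves the position untouched.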