- #1
Superposed_Cat
[mentor note: code blocks added for readability and syntax highlighting]
Hey all, I've been trying to implement a deep Q-learning algorithm, but it's not working: after 100 000 game plays, with 1000 training iterations per step (I have tried lower numbers for both), it's still not learning. The network and game are shown in the linked image: http://imgur.com/a/hATfB
The training pair for backprop is (input as shown in the image, QTarget = r + gamma * MaxQ), where MaxQ is the maximum output-layer activation of the network, or a randomly chosen one (epsilon-greedy). r is the reward obtained from each move: -10 for an obstacle and 10 for the goal (although I have also tried 10 for the goal and 0 for everything else). Here is the training code.
C:
double maxQval;
double[] inputvec;
int MaxQ = GetRandDir(state, out maxQval, out inputvec);//input vec is board
double[] QtarVec = new double[] { 0, 0, 0, 0 };
double r = GetR((int)state[0], (int)state[1]); // GetR is reward
QtarVec[MaxQ] = Qtar(r, maxQval); // backprop vector of 0's except Qtar replaces a value
associator.Train(50, new double[][] { inputvec }, new double[][] { QtarVec });
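For comparison, here is a minimal sketch of how the target vector is commonly built in Q-learning; it is not the code used above. It differs from the snippet in two ways: the target starts from the network's current outputs instead of zeros, so that only the entry for the action taken contributes an error, and the bootstrap term uses the largest output for the state reached after the move. `BuildTarget`, its parameters, and the sample numbers are hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Linq;

public static class QTargetSketch
{
    // Hypothetical helper: builds the backprop target for one transition.
    // currentQ: network outputs for the state the move was made from.
    // nextQ:    network outputs for the state reached after the move.
    public static double[] BuildTarget(double[] currentQ, int action, double reward,
                                       double[] nextQ, double gamma, bool terminal)
    {
        // Start from the current predictions so the untaken actions produce zero error.
        double[] target = (double[])currentQ.Clone();

        // No bootstrap term when the episode ended on this move.
        double future = terminal ? 0.0 : gamma * nextQ.Max();

        // Only the entry for the action actually taken is overwritten.
        target[action] = reward + future;
        return target;
    }

    public static void Main()
    {
        // Sample values only; a real run would take both vectors from the network.
        double[] currentQ = { 0.2, 0.5, 0.1, 0.4 };
        double[] nextQ = { 1.0, 3.0, 2.0, 0.0 };
        double[] target = BuildTarget(currentQ, 2, -10.0, nextQ, 0.9, false);
        Console.WriteLine(string.Join(", ", target));
    }
}
```

With an all-zero target, every untaken action is pulled toward 0 on each update, which may be one reason a network fails to learn. Whether that applies here depends on how `associator.Train` treats the target vector.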
C:
public void Train(int nTrails)
{
    double[] state = new double[] { 1, 1 }; // initial position
    int its = 0;
    for (int i = 0; i < nTrails; i++) {
        while (((state[0] < 4) && (state[1] < 4))
               && ((state[0] * 100 > 0) && (state[1] * 100 > 0))
               && (state[0] != 3 && state[1] != 3)) { // while on board and not at goal position
            double temp = r.NextDouble();
            int next = -1;
            lines.Add(new Vector2((float)(state[0] * 100), (float)(state[1] * 100)));
            if (temp < epsilon) {
                next = TrainRandIt(state); // move in a random direction, backprop
            } else {
                next = TrainMaxIt(state); // move in the max-activation direction, backprop
            }
            if (next == 0) { // updating position
                state[0]++;
            } else if (next == 1) {
                state[0]--;
            } else if (next == 2) {
                state[1]++;
            } else if (next == 3) {
                state[1]--;
            }
        }
    }
    state[0] = 1;
    state[1] = 1; // resetting game
}
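As a point of comparison, the sketch below shows one way an episode loop of this shape could be arranged. It rests on assumptions not confirmed in the post: playable cells have coordinates 1 to 3, the goal is the single cell (3, 3), and obstacles are ignored. Under those assumptions it resets the position at the start of every episode and treats only (3, 3) as the goal, whereas a test of the form `state[0] != 3 && state[1] != 3` already stops when either coordinate reaches 3. `Run`, `chooseMove` (standing in for `TrainRandIt`/`TrainMaxIt`) and the `maxSteps` cap are hypothetical additions.

```csharp
using System;

public static class EpisodeLoopSketch
{
    // Assumed layout: playable cells are 1..3 on both axes, goal at (3, 3).
    const int Min = 1, Max = 3, GoalX = 3, GoalY = 3;

    static bool OnBoard(int x, int y) => x >= Min && x <= Max && y >= Min && y <= Max;
    static bool AtGoal(int x, int y) => x == GoalX && y == GoalY;

    // Runs nTrials episodes. chooseMove stands in for the training step and
    // returns a direction 0..3. Returns how many episodes ended on the goal.
    public static int Run(int nTrials, int maxSteps, Func<int, int, int> chooseMove)
    {
        int goals = 0;
        for (int i = 0; i < nTrials; i++)
        {
            int x = 1, y = 1; // reset at the start of every episode
            int steps = 0;
            while (OnBoard(x, y) && !AtGoal(x, y) && steps < maxSteps)
            {
                int next = chooseMove(x, y);
                if (next == 0) x++;
                else if (next == 1) x--;
                else if (next == 2) y++;
                else if (next == 3) y--;
                steps++;
            }
            if (AtGoal(x, y)) goals++;
        }
        return goals;
    }

    public static void Main()
    {
        // Fixed stand-in policy: increase x until it is 3, then increase y.
        int goals = Run(5, 50, (x, y) => x < 3 ? 0 : 2);
        Console.WriteLine(goals); // prints 5
    }
}
```

The step cap is only there so that a policy that wanders without leaving the board cannot loop forever.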
Any help appreciated.