Is There an Error in My Deep Q-Learning Algorithm? Need Help Troubleshooting!

  • Thread starter: Superposed_Cat
  • Tags: Algorithm
AI Thread Summary
The discussion centers on issues encountered while implementing a deep Q-learning algorithm for a game, where the algorithm fails to learn after extensive training. The user reports running 100,000 game plays with 1,000 training iterations per step, yet the network shows no improvement. Key components include a reward system, where rewards are assigned based on game outcomes, and a training method that uses both random moves and maximum-activation moves (epsilon-greedy exploration).

Concerns are raised that the while loop conditions are overly complicated, and that simplifying them would improve both clarity and correctness. Recommendations include adding print statements or using a logging library to trace variable values and better understand the program's behavior. The discussion also suggests that restructuring the conditional logic, for example with a switch statement, would improve the code's readability.
Superposed_Cat
[mentor note: code blocks added for readability and syntax highlighting]

Hey all, I've been trying to implement a deep Q-learning algorithm, but I'm having an issue: it's not working. After 100,000 game plays, using 1,000 iterations to train each step (although I have tried lower numbers for both), it's still not learning. The network and game are in the linked image: http://imgur.com/a/hATfB
C:
double maxQval;
double[] inputvec;
int MaxQ = GetRandDir(state, out maxQval, out inputvec); // inputvec is the board
double[] QtarVec = new double[] { 0, 0, 0, 0 };
double r = GetR((int)state[0], (int)state[1]);           // GetR returns the reward

QtarVec[MaxQ] = Qtar(r, maxQval);                        // backprop vector of 0s, except Qtar replaces one value

associator.Train(50, new double[][] { inputvec }, new double[][] { QtarVec });
The training-data pair for backprop is (the input I linked in the image, QTarget = r + gamma * MaxQ), where MaxQ is the maximum network output-layer activation, or a random one (epsilon-greedy). r is the reward obtained from each move: -10 for an obstacle and 10 for the goal (although I have also tried just 10 for the goal and 0 for everything else).
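For reference, in standard Q-learning notation that target is ##Q_{target} = r + \gamma \, \max_{a'} Q(s', a')##, where ##\gamma## is the discount factor (gamma in my code) and the max runs over the next state's action values. Here is the training code: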
C:
public void Train(int nTrails)
{
    double[] state = new double[] { 1, 1 }; // initial position
    int its = 0;

    for (int i = 0; i < nTrails; i++) {
        while (((state[0] < 4) && (state[1] < 4))
               && ((state[0] * 100 > 0) && (state[1] * 100 > 0))
               && (state[0] != 3 && state[1] != 3)) {       // while on board and not at goal position
            double temp = r.NextDouble();
            int next = -1;
            lines.Add(new Vector2((float)(state[0] * 100), (float)(state[1] * 100)));

            if (temp < epsilon) {
                next = TrainRandIt(state);   // move in a random direction, backprop
            } else {
                next = TrainMaxIt(state);    // move in the max-activation direction, backprop
            }

            if (next == 0) {                 // updating position
                state[0]++;
            } else if (next == 1) {
                state[0]--;
            } else if (next == 2) {
                state[1]++;
            } else if (next == 3) {
                state[1]--;
            }
        }
    }

    state[0] = 1;
    state[1] = 1;  // resetting game
}

Any help appreciated.
 
My suggestion would be to add print statements, or use a logging API such as log4c, to annotate your code; then run it through its paces, printing out variable and relevant array values, and study the printouts carefully.

You should understand, step by step, how your program works; if the output says otherwise, then there is your mistake.
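For example (a minimal sketch; the format is arbitrary, and I'm assuming the variables from your posted Train method), a single trace line inside the while loop would show whether the state ever changes and which branch runs:

C:
// Sketch: place inside the while loop, after `next` has been assigned.
Console.WriteLine(
    "trial=" + i +
    " state=(" + state[0] + ", " + state[1] + ")" +
    " temp=" + temp +
    " next=" + next);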

There are two areas of concern. First, the while loop conditions are overly complicated, with an unnecessary level of parentheses around the first two terms. I would have used if statements, and perhaps a for loop with a break to exit.

Looking at the while conditions, I have to ask why you would write:

##(state[0]<4 && state[0]*100>0 && state[0]!=3)##

when this would be better:

##(state[0]<4 && state[0]>0 && state[0]!=3)##

Similarly for state[1].
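Put together, a minimal sketch of that simplification (OnBoardAndNotAtGoal is a name I made up; the bounds and the goal test deliberately mirror your original condition, which exits as soon as either coordinate equals 3):

C:
// Sketch: express the loop condition in one readable place.
static bool OnBoardAndNotAtGoal(double[] state)
{
    bool onBoard = state[0] > 0 && state[0] < 4
                && state[1] > 0 && state[1] < 4;
    bool atGoal  = state[0] == 3 || state[1] == 3; // same effect as the original exit test
    return onBoard && !atGoal;
}

// Usage in Train:
// while (OnBoardAndNotAtGoal(state)) { ... }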

Second, stylistically, the compound if statement would be better served by a switch/case construct.
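For example (a sketch using your next encoding; the direction comments are my guess at what each value means):

C:
switch (next)
{
    case 0: state[0]++; break;   // move in +x
    case 1: state[0]--; break;   // move in -x
    case 2: state[1]++; break;   // move in +y
    case 3: state[1]--; break;   // move in -y
}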
 