C/++/# "Q deep learning" algorithm

  1. Dec 9, 2016 #1
    [mentor note: code blocks added for readability and syntax highlighting]

    Hey all, I've been trying to implement a deep Q-learning algorithm, but I'm having an issue: it's not working. After 100,000 game plays, using 1,000 iterations to train each step (although I have tried lower numbers for both), it's still not learning. The network and game are in the linked image: http://imgur.com/a/hATfB
    Code (C#):

        double maxQval;
        double[] inputvec;
        int MaxQ = GetRandDir(state, out maxQval, out inputvec);   // input vec is board
        double[] QtarVec = new double[] { 0, 0, 0, 0 };            // target vector: zeros except at the chosen action
        double r = GetR((int)state[0], (int)state[1]);             // GetR is reward

        QtarVec[MaxQ] = Qtar(r, maxQval);                          // backprop vector of 0's except Qtar replaces a value

        associator.Train(50, new double[][] { inputvec }, new double[][] { QtarVec });   // train the network on this single (board, target) pair
     
    The training data pair for backprop is (the input I linked in the image, QTarget = r + gamma * MaxQ), where MaxQ is the max network output layer activation or a random one (epsilon greedy). r is the reward obtained from each move: -10 for an obstacle and 10 for the goal (although I have also tried just 10 for the goal and 0 for everything else).
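    Qtar just builds that Bellman target; a minimal sketch of it (the gamma value below is only a placeholder, not the one from my actual code):
    Code (C#):

        private const double gamma = 0.9;    // placeholder discount factor, value assumed

        private double Qtar(double r, double maxQval)
        {
            return r + gamma * maxQval;       // QTarget = reward + discounted max Q of the next state
        }
     
    Here is the training code.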
    Code (C#):

    public void Train(int nTrails)
    {
        double[] state = new double[] { 1, 1 };   // initial position
        int its = 0;

        for (int i = 0; i < nTrails; i++) {
            while (((state[0] < 4) && (state[1] < 4))
                   && ((state[0] * 100 > 0) && (state[1] * 100 > 0))
                   && (state[0] != 3 && state[1] != 3)) {          // while on board and not at goal position
                double temp = r.NextDouble();
                int next = -1;
                lines.Add(new Vector2((float)(state[0] * 100), (float)(state[1] * 100)));

                if (temp < epsilon) {
                    next = TrainRandIt(state);    // move in a random direction, backprop
                } else {
                    next = TrainMaxIt(state);     // move in the max activation direction, backprop
                }

                if (next == 0) {                  // updating position
                    state[0]++;
                } else if (next == 1) {
                    state[0]--;
                } else if (next == 2) {
                    state[1]++;
                } else if (next == 3) {
                    state[1]--;
                }
            }
        }

        state[0] = 1;
        state[1] = 1;   // resetting game
    }
     
    Any help appreciated.
     
    Last edited by a moderator: Dec 13, 2016
  2. Dec 13, 2016 #2

    jedishrfu

    Staff: Mentor

    My suggestion would be to add print statements (or use the log4c API) to annotate your code, run it through its paces printing out variable and relevant array values, and study the printouts carefully.

    You should understand, step by step, how your program works; if the output says otherwise, then there is your mistake.
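
    For example, something along these lines at the bottom of the while loop in Train (just a sketch; print whatever variables you care about):
    Code (C#):

        // rough tracing sketch: dump the loop state once per move
        Console.WriteLine("trial {0}: state = ({1}, {2}), temp = {3}, next = {4}",
                          i, state[0], state[1], temp, next);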

    Two areas of concern: first, the while loop conditions are overly complicated, with an unnecessary level of parens on the first two terms. I would have used if statements, and perhaps a for loop with a break to exit.

    Looking at the while conditions, I have to ask why write:

    ##(state[0]<4 && state[0]*100>0 && state[0]!=3)##

    when this would be better:

    ##(state[0]<4 && state[0]>0 && state[0]!=3)##

    Similarly for state[1].
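
    For example, the whole condition could collapse into a small helper, something like this (a sketch only; the name InPlay is mine, and the bounds are just what I read off your code):
    Code (C#):

        // hypothetical helper: coordinate is inside the board and not at the goal value
        private static bool InPlay(double v)
        {
            return v > 0 && v < 4 && v != 3;
        }

        // the loop header then reads:
        // while (InPlay(state[0]) && InPlay(state[1])) { ... }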

    Also, stylistically, the compound if statement would be better served with a switch/case construct.
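
    Something along these lines (a sketch only, keeping the same position updates as your if/else chain):
    Code (C#):

        switch (next)
        {
            case 0: state[0]++; break;
            case 1: state[0]--; break;
            case 2: state[1]++; break;
            case 3: state[1]--; break;
            default: break;              // next == -1: no move was chosen
        }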
     