Is There an Error in My Deep Q-Learning Algorithm? Need Help Troubleshooting!

  • Thread starter Superposed_Cat
  • Tags: Algorithm
In summary, the thread discusses an implementation of a deep Q-learning algorithm that is not learning after 100,000 game plays with 1,000 training iterations per step. The poster links an image of the network and game and shares the relevant code. The reply suggests adding print statements or using a logging API to trace the program's behavior, points out that the while loop conditions are overly complicated, and recommends simpler comparisons and a switch statement for readability.
  • #1
Superposed_Cat
[mentor note: code blocks added for readability and syntax highlighting]

Hey all, I've been trying to implement a deep Q-learning algorithm, but I'm having an issue: it's not working. After 100,000 game plays, using 1,000 iterations to train each step (although I have tried lower numbers for both), it's still not learning. The network and game are shown in the linked image: http://imgur.com/a/hATfB
C#:
    double maxQval;
    double[] inputvec;
    int MaxQ = GetRandDir(state, out maxQval, out inputvec); // inputvec is the board
    double[] QtarVec = new double[] { 0, 0, 0, 0 };
    double r = GetR((int)state[0], (int)state[1]);           // GetR returns the reward

    QtarVec[MaxQ] = Qtar(r, maxQval); // backprop target: vector of 0's except Qtar replaces one value

    associator.Train(50, new double[][] { inputvec }, new double[][] { QtarVec });
The training-data pair for backprop is (the input I linked in the image, QTarget = r + gamma * MaxQ). MaxQ is the max network output layer activation, or a random one (epsilon-greedy). r is the reward obtained from each move: -10 for an obstacle and 10 for the goal (although I have tried just 10 for the goal and 0 for everything else). Here is the training code:
C#:
    public void Train(int nTrails)
    {
        double[] state = new double[] { 1, 1 }; // initial position

        for (int i = 0; i < nTrails; i++)
        {
            // while on the board and not at the goal position
            while ((state[0] < 4 && state[1] < 4)
                   && (state[0] * 100 > 0 && state[1] * 100 > 0)
                   && (state[0] != 3 && state[1] != 3))
            {
                double temp = r.NextDouble();
                int next = -1;
                lines.Add(new Vector2((float)(state[0] * 100), (float)(state[1] * 100)));

                if (temp < epsilon)
                {
                    next = TrainRandIt(state); // move in a random direction, backprop
                }
                else
                {
                    next = TrainMaxIt(state); // move in the max activation direction, backprop
                }

                // updating position
                if (next == 0)
                {
                    state[0]++;
                }
                else if (next == 1)
                {
                    state[0]--;
                }
                else if (next == 2)
                {
                    state[1]++;
                }
                else if (next == 3)
                {
                    state[1]--;
                }
            }

            // resetting the game so the next trial starts from the initial position
            state[0] = 1;
            state[1] = 1;
        }
    }

Any help appreciated.
 
  • #2
My suggestion would be to add print statements, or to use a logging API such as log4net, to instrument your code; then run it through its paces, printing out variables and relevant array values, and study the printouts carefully.

You should understand, step by step, how your program works; if the output says otherwise, then there is your mistake.
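For example, here is a minimal tracing sketch; LogStep is a hypothetical helper (not part of the posted code), and the values passed to it assume the variables in your Train loop:

C#:
    using System;

    public static class QLog
    {
        // Writes one line per training step so the printouts can be studied afterwards.
        public static void LogStep(int trial, double[] state, int action, double reward, double qTarget)
        {
            Console.WriteLine(
                "trial={0} state=({1},{2}) action={3} r={4} Qtar={5}",
                trial, state[0], state[1], action, reward, qTarget);
        }
    }

Calling QLog.LogStep(...) just before associator.Train(...) would show whether the reward and the backprop target are what you expect at each move.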

Two areas of concern: the while loop conditions are overly complicated, with an unnecessary level of parentheses on the first two terms. I would have used if statements and perhaps a for loop with a break, as sketched below.
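For instance, the loop guard could be unpacked into explicit checks (a sketch using the same state array from the posted code; maxSteps is a hypothetical safety cap, not in the original):

C#:
    for (int step = 0; step < maxSteps; step++)
    {
        // stop when the state leaves the board
        if (state[0] <= 0 || state[0] >= 4 || state[1] <= 0 || state[1] >= 4)
            break;

        // the original condition also exits as soon as either coordinate reaches 3
        if (state[0] == 3 || state[1] == 3)
            break;

        // ... choose a move, backprop, update state ...
    }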

Looking at the while conditions, I have to ask why write:

##(state[0]<4 && state[0]*100>0 && state[0]!=3)##

when this would be better, since multiplying by 100 does not change the sign:

##(state[0]<4 && state[0]>0 && state[0]!=3)##

Similarly for state[1].

Also, stylistically, the compound if statement would be better served by a switch statement.
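For instance, the position update could read (a sketch reusing the state and next variables from the posted loop):

C#:
    // updating position: one case per direction
    switch (next)
    {
        case 0: state[0]++; break;
        case 1: state[0]--; break;
        case 2: state[1]++; break;
        case 3: state[1]--; break;
        default: break; // next == -1: no direction was chosen
    }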
 

1. What is the deep Q-learning algorithm?

Deep Q-learning is a type of reinforcement learning that uses a neural network to approximate the optimal action-selection policy for a given environment. It is often used in sequential decision-making tasks where the best action at each step depends on the actions taken before it.

2. How does the deep Q-learning algorithm work?

The algorithm uses a neural network to learn the Q-value of each possible action in a given state; the Q-value represents the expected future reward for taking that action in that state. The algorithm uses this information to update its policy and to choose the best action in each state.
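For reference, the standard Q-learning update that the network is trained to approximate is

##Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]##

where ##\alpha## is the learning rate, ##\gamma## is the discount factor, ##r## is the immediate reward, and ##s'## is the next state. In the deep variant, the table of Q-values is replaced by a neural network trained toward the target ##r + \gamma \max_{a'} Q(s',a')##.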

3. What are the advantages of using the deep Q-learning algorithm?

One of the main advantages of deep Q-learning is its ability to handle complex, high-dimensional state spaces. It can also learn from experience and improve its performance over time. In its standard form it selects actions from a discrete set, which suits a wide range of applications; continuous action spaces require further extensions.

4. What are some real-world applications of the deep Q-learning algorithm?

Deep Q-learning has been successfully applied in various fields, including robotics, finance, and gaming. For example, it has been used to train robots to perform complex tasks such as grasping and manipulation, in algorithmic trading to make decisions on buying and selling stocks, and in gaming to create AI agents that can learn and adapt to different game environments.

5. Are there any limitations to the deep Q-learning algorithm?

One limitation of deep Q-learning is that it requires a large amount of training data to learn an accurate policy, which can be a challenge in real-world applications where data is limited or expensive to obtain. It may also struggle in environments with sparse rewards, since it can take a long time to learn the optimal policy in those cases.
