# Neural Network Not Working

Summary:
I've no idea why it's not working ...
Hey, guys.

So, I've developed a basic multilayered, feedforward neural network from scratch in Python. However, I cannot for the life of me figure out why it is still not working. I've double checked the math like ten times, and the actual code is pretty simple. So, I have absolutely no idea where I am going wrong.

You can see my neural net on my github pages website if you follow the link below:

The web page is very neat and written like a tutorial. However, it's not a very good tutorial, because the program doesn't work.

I just cannot for the life of me figure out why. If you are interested, you may peruse that web page and maybe tell me where I'm going wrong. I will appreciate it. If you are not interested, that is okay, too.

Thanks.

jedishrfu
Mentor
Sadly, this is not a reasonable request. No one is going to slog through Python code to figure out why it doesn't work, especially neural net code, where you use training data to teach the network and then test data to make sure it works.

There are just so many places where it could go wrong: swapped indices, wrong indices, a math function not returning what you expect.

My suggestion is to step through the code with a good Python IDE like PyCharm and see if you can figure out why it is misbehaving. It may take a long time, but through persistent effort you will succeed, or you'll decide to become a manager so you can boss other people into finding the bug.

You can also start at the bottom and make sure each of your low-level methods/functions works as expected. Maybe even create test cases to run them through several types of input and exercise each if-branch and loop in the code.
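For example, low-level pieces like the activation function and its derivative can be spot-checked directly. This is just a sketch: `sigmoid` and `sigmoid_prime` here are the usual logistic function, not necessarily the OP's exact code.

```python
import numpy as np

def sigmoid(z):
    # Standard logistic activation.
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Analytic derivative of the logistic function.
    s = sigmoid(z)
    return s * (1.0 - s)

# Spot-check known values.
assert np.isclose(sigmoid(0.0), 0.5)
assert np.isclose(sigmoid_prime(0.0), 0.25)

# Cross-check the derivative against a central finite difference.
z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
assert np.isclose(sigmoid_prime(z), numeric, atol=1e-6)
```

The finite-difference cross-check is the useful trick here: it catches a wrong derivative formula even when the function itself is correct.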

You know it's not working, which means you've seen something that tells you it's not working, so you are the best one to debug it. We don't have that knowledge.

PeterDonis
Mentor
2020 Award
it is still not working

What does "not working" mean? By itself that statement is so vague that I don't see how you could expect anyone to have any useful input.

Well, there's no obvious error. The program runs, but it doesn't do what it's supposed to. The training algorithm does not seem to be working correctly. After training, the network outputs complete nonsense.

I am comparing it with the MLP class in the Sklearn library. The Sklearn MLP can handle what I'm trying to do perfectly. However, my network outputs complete nonsense after training.

It also does weird things that should not be possible, like computing identical losses after instantiating and training the class multiple times, and outputting matrices that are composed entirely of a single duplicated vector or row.

The code is based entirely on the math. So, I'm guessing I did the math wrong, but I can't see where.

Staff Emeritus
What does "not working" mean? By itself that statement is so vague that I don't see how you could expect anyone to have any useful input.
but it doesn't do what it's supposed to.
It also does weird things that should not be possible

Well, that certainly clears things up!

@jedishrfu and @PeterDonis are right. "Debug my code for me" is not a reasonable request, especially if you can't give a clear, simple, and reproducible way to generate the error. That should be your next step: figuring out the minimum code and the minimum data needed to reproduce the error. Without that, we don't have a hope of figuring it out. With that, maybe you can figure it out without any help.
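As a sketch of what "minimum code and minimum data" can look like: a tiny fixed dataset plus a fixed seed, so every run is identical and anyone can rerun it. Here I use the scikit-learn MLP mentioned in the thread as the working baseline; the XOR data and parameters are just placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# A tiny, fixed dataset (XOR) that anyone can paste and rerun.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# A fixed random_state makes every run identical, so the reported
# behavior is reproducible by others.
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
clf.fit(X, y)
print(clf.predict(X))   # baseline to compare the custom net against
```

Swapping the custom network in for `MLPClassifier` on the same four rows pins down exactly where the two diverge.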

PeterDonis
Mentor
2020 Award
it doesn't do what it's supposed to

So what is it supposed to do? And what different thing is it actually doing? We can't read your mind and we can't see your computer screen by clairvoyance.

I am comparing it with the MLP class in the Sklearn library.

Comparing what? Expected outputs for some set of test inputs? What are they? How are we supposed to know any of this if you don't tell us the specifics?

You're right. I'm a little burnt out on this thing, though. I'm not sure how to give a precise description of what is not working when I have no idea what's not working.

I was hoping to get a second opinion on the math that I'm using, but it would be tough to write it all out in its entirety. If someone is very familiar with feedforward neural nets, I was hoping they might be able to take a quick look at the math I'm using and verify whether it's correct or not.

There are only slight differences between my formulas and the formulas found on the wiki page for backpropagation under matrix multiplication. What I am calling the gradient of the cost function ##C## with respect to the layer's weighted input ##z##, or ##\nabla_{z}C##, the wiki page calls simply delta, or ##\delta##.

The main differences I'm seeing between my formulas and the wiki page's formulas are that the wiki page uses ##\nabla_{W^{(l)}}C=\delta^{(l)}(a^{(l-1)})^{T}## where I use ##\nabla_{W^{(l)}}C=(a^{(l-1)})^{T}\delta^{(l)}##, and the wiki page uses ##\delta^{(l-1)}=(f^{(l-1)})^{'}\circ\left[(W^{(l)})^{T}\delta^{(l)}\right]## where I use ##\delta^{(l-1)}=(f^{(l-1)})^{'}\circ\left[\delta^{(l)}(W^{(l)})^{T}\right]##, with ##\circ## denoting the elementwise (Hadamard) product.

I think the reason for the slight difference in the formulas is that the wiki page is doing a special case of row-wise backpropagation, whereas I am doing backpropagation on the entire data set at once.

In the wiki page, ##\delta^{(l)}## is a vector, which is most likely the error of a single example in the data set, but I am making ##\delta^{(l)}## a matrix, which is the error term for all of the data in the set. Is it not possible to do backpropagation on all of the data at once? It's intuitive to me to do backpropagation on all of the data and make ##\delta^{(l)}## a matrix, because the loss function ##C## is a function of all of the data. In my model, the loss/cost function is ##C(\hat{Y};Y)##, where the model output ##\hat{Y}## and the target matrix ##Y## are both matrices. So, the gradient of the loss/cost function must also be a matrix, right? How could it be a vector? And what would be the point of separating the data into batches if the gradient is being computed one row at a time?
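For what it's worth, the two conventions agree when examples are stored as rows; a quick NumPy sketch (shapes and names are illustrative) shows that the batched gradient equals the sum of the per-example outer products from the wiki's column-vector formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Row convention: each of the n examples is a row.
n, d_in, d_out = 5, 3, 2
A_prev = rng.normal(size=(n, d_in))   # a^(l-1): previous-layer activations
delta = rng.normal(size=(n, d_out))   # delta^(l): one error row per example

# Batched weight gradient, shape (d_in, d_out), matching W^(l).
grad_W_batched = A_prev.T @ delta

# The wiki's per-example (column-vector) formula, summed over the batch.
grad_W_rowwise = sum(np.outer(A_prev[i], delta[i]) for i in range(n))

assert np.allclose(grad_W_batched, grad_W_rowwise)
```

So making ##\delta^{(l)}## a matrix is fine: the batched product just accumulates the per-example gradients in one matrix multiplication.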

It's also not a lot of code. I'm just saying if anyone is interested, they can look over it. If not, that's fine.

jedishrfu
So what is it supposed to do? And what different thing is it actually doing? We can't read your mind and we can't see your computer screen by clairvoyance.

Comparing what? Expected outputs for some set of test inputs? What are they? How are we supposed to know any of this if you don't tell us the specifics?

I'm not sure if that's entirely relevant, but I do understand your point. The error is not from setting up the problem or applying it to a specific problem. A neural net takes an input matrix and a target matrix, runs a backpropagation gradient descent algorithm, and produces an output matrix that should approximate the target matrix.

The error is somewhere within the class itself. I think it may have something to do with how I developed the math model for the neural network, but it's still eluding me what exactly is wrong with the math. If the math is actually correct, then it must be some weird bug, maybe from shared references. Who knows, but I want to make sure the math is correct. Perhaps I should have rephrased the OP.

You might consider posting your code to e.g. http://pythonfiddle.com/, or another Python online 'fiddle' IDE, and working on it there. If you make it publicly visible (with notes for what is or isn't working), perhaps someone who has worked through the same tutorial might run across your code.

How might the loss function compute a total loss that is exactly the same every time you train the network? Shouldn't that be impossible, since the weights are initialized randomly? I am not specifying any seed for the random weights, and it is unlikely that NumPy would seed the random number generator exactly the same way multiple times. This should not be possible, and it is likely not due to an incorrect mathematical formula.

So, I'm starting to think that perhaps the math is correct, even though it is a little bit different than what is written on Wikipedia. This makes me think it's a coding error, but I only have like 80 lines of code here, and I'm not seeing it.

atyy
In the wiki page, ##\delta^{(l)}## is a vector, which is most likely the error of a single example in the data set, but I am having ##\delta^{(l)}## be a matrix, which is the error term for all of the data in the set. Is it not possible to do backpropagation on all of the data at once?

You can do backpropagation on all the data at one time (called 'batch' gradient descent), in mini-batches (called 'mini-batch stochastic gradient descent') or one sample at a time (called 'stochastic gradient descent', terminology varies): https://ruder.io/optimizing-gradient-descent/
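The split can be sketched in a few lines of NumPy (shapes illustrative): shuffle once per epoch, then slice. Batch gradient descent is the single-batch extreme, and stochastic gradient descent is the one-row-per-step extreme.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))   # toy data: 100 examples, 4 features
batch_size = 32                 # batch GD: len(X); SGD: 1

# One epoch: shuffle the row indices, then slice into mini-batches.
perm = rng.permutation(len(X))
batches = [X[perm[i:i + batch_size]] for i in range(0, len(X), batch_size)]

print([b.shape[0] for b in batches])   # [32, 32, 32, 4]
```

Every row appears in exactly one batch per epoch, so the gradients over a full epoch cover the whole data set regardless of the batch size.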

jedishrfu
Mentor
I would test your NumPy assumption about the random weights, or see how they are used in the calculation; perhaps you're doing something to make them "predictable".

I thought that doing the gradient over all of the data or batches of the data made more sense. The Wikipedia article was confusing me. I guess it tries to simplify things by talking about a single vector, but that just totally confused me.

The output was the same no matter what the weights were, with 100 neurons and 1 hidden layer. The weights kept changing, but the output would still be the same, for some reason. That's why the loss was a constant over all random weights.

This is an output I am getting after training with 30 neurons and 1 hidden layer, which doesn't make sense. For some reason, every row in the output matrix is exactly the same. It makes no sense, but I think I can say this is probably not due to a mathematical error.
Python:
>>> FFNN.forwardpropagation( X )
array([[1.57180290e-20, 9.99999881e-01, 5.68244953e-23, ...,
        3.26637459e-14, 8.78497753e-74, 2.42167327e-14],
       [1.57180290e-20, 9.99999881e-01, 5.68244953e-23, ...,
        3.26637459e-14, 8.78497753e-74, 2.42167327e-14],
       [1.57180290e-20, 9.99999881e-01, 5.68244953e-23, ...,
        3.26637459e-14, 8.78497753e-74, 2.42167327e-14],
       ...,
       [1.57180290e-20, 9.99999881e-01, 5.68244953e-23, ...,
        3.26637459e-14, 8.78497753e-74, 2.42167327e-14],
       [1.57180290e-20, 9.99999881e-01, 5.68244953e-23, ...,
        3.26637459e-14, 8.78497753e-74, 2.42167327e-14],
       [1.57180290e-20, 9.99999881e-01, 5.68244953e-23, ...,
        3.26637459e-14, 8.78497753e-74, 2.42167327e-14]])

pbuk
Gold Member
Write comments in your code: for an algorithm like this I would expect more lines of comment than there are lines of code.

Except in trivial loops where i, j and k are ok, make your indices look like indices. If you are frequently traversing over a fixed number of elements, set that number as a 'constant' at the beginning of the function: so instead of
Python:
for hidden_layer in range( len( self.network ) - 1, 0, -1 ) :
write something like
Python:
# Set some reusable loop bounds.
last_network_layer = len(self.network) - 1
...
# Traverse the hidden layers top down.
for hidden_layer_index in range( last_network_layer, 0, -1 ) :

Remember Python lists are indexed from 0 to len(myList) - 1.

Make use of augmented assignment operators (like -=) where relevant, and be clear and consistent with your variable names (grad_b seems to be the derivative with respect to the bias, but what is grad_w?): so instead of
Python:
self.network[ hidden_layer ] = self.network[ hidden_layer ] - r*grad_w
self.bias[ hidden_layer ] = self.bias[ hidden_layer ] - r*grad_b
write something like
Python:
# Reduce the hidden layer's network and bias values by their scaled derivatives.
self.network[ hidden_layer_index ] -= step_size * grad_network
self.bias[ hidden_layer_index ] -= step_size * grad_bias

kith
He seems to be applying the gradient descent row-wise, treating ##\delta## as a vector instead of a matrix, which I don't understand. He uses batches, yet splits each batch into vectors, so it doesn't make any sense to me.

Summary:: I've no idea why it's not working ...

I recommend changing your color scheme.

One thing that I noticed is that your derivative code doesn't match your math.

How might the loss function compute a total loss that is exactly the same every time you train the network? Shouldn't that be impossible, since the weights are initialized randomly? I am not specifying any seed for the random weights, and it would be unlikely that Numpy would seed the random variables exactly the same multiple times.

If you're not seeding the random number generator, then I would expect it to be exactly the same weights every time. Seeding the random number generator (with a different number each time you run the program, e.g. with the time) is how you would get different ones.

You might have a bunch of simple low-level mistakes here and there. I recommend tracing through your code, checking that every line does what you expect it to do. If your output is not even changing, then something must be wrong. Is it entering your loops? When it is supposed to be updating the outputs, what is happening?

Also, I don't know what your inputs and target outputs are.

No need to stress over a possible math error, when you probably have some simple coding errors anyway.

I recommend changing your color scheme.

My whole reason for doing this is (1) to understand a basic feedforward neural network entirely and (2) to make a multi-part instructional video to post on YouTube to spruce up my online portfolio and potentially make some money. So, I will be screen recording the tutorial while using the Dark Reader browser extension, hence the strange color scheme. However, I can't seem to get my code to work!!! I was so close to finishing this ... I even have about 120 PowerPoint slides painstakingly deriving all of the math. I put so much effort into this.

If you're not seeding the random number generator, then I would expect it to be exactly the same weights every time. Seeding the random number generator (with a different number each time you run the program, e.g. with the time) is how you would get different ones.
I believe NumPy automatically seeds its random number generator, probably from the current time or the operating system's entropy, so each run should get different initial weights.
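A quick sketch with NumPy's Generator API is consistent with that: generators created without an explicit seed draw their seed from OS entropy, so their streams differ, while an explicit seed reproduces the stream exactly.

```python
import numpy as np

# No explicit seed: each generator is seeded from OS entropy,
# so two fresh generators almost surely produce different draws.
a = np.random.default_rng().standard_normal(5)
b = np.random.default_rng().standard_normal(5)
assert not np.allclose(a, b)

# Explicit seed: the stream is exactly reproducible.
c = np.random.default_rng(42).standard_normal(5)
d = np.random.default_rng(42).standard_normal(5)
assert np.allclose(c, d)
```

So identical losses across runs point at the code consuming the weights, not at the random number generator.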

One thing that I noticed is that your derivative code doesn't match your math.
The math is showing the gradient applied to the ##m^{th}## example. The code is applying the gradient to the entire input matrix. At least, that's what I think it's doing.

Oh, you meant for the activation function. I believe I made a typo in the math. Good catch.

No need to stress over a possible math error, when you probably have some simple coding errors anyway.

I'm thinking the math is most likely correct, because when I start rearranging things, the matrix multiplications fail due to the operands' dimensions not matching up.


I made the recommended changes and tried to make it a little more readable, but I still cannot find the error. I'm thinking it must be a mathematical error, because the code is very simple, and I don't see any error in the code.

Tom.G
Try flow-charting it.
The attention to detail required, along with the parallel processing of your visual channel, is surprisingly effective.

Python:
class FeedForwardNeuralNetwork( FeedForwardNeuralNetwork ) :

    def __forwardpropagation( self, X : Input_Matrix, A : Dict, Z : Dict ) -> Output_Matrix :
        ...
        for layer_n in range( 1, len( self.weights ) + 1 ) :
            Z[ layer_n ] = np.matmul( A[ layer_n - 1 ], self.weights[ layer_n ] ) + self.biases[ layer_n ]
        ...
It looks like you have an off-by-one error. The loop goes over layer_n = 1 to len(self.weights). On the last iteration, when layer_n = len(self.weights), you are accessing self.weights[ len( self.weights ) ].

This should give an IndexError (list index out of range), which makes me think that either len( self.weights ) == 0, so the loop isn't even entered, or __forwardpropagation is never actually called. Maybe the first loop condition in the train function, np.sqrt( totgrad ) > convergence, is never true?

Have you tried tracing the execution? What happens? Is it even going into the loops? At which point does something unexpected happen?

If you have to, input something simple, so you know what the results should look like in each operation.

Edit:

Also, you have the same off-by-one error in train, again trying to access weights[ len(weights) ], which tells us again that the code in that inner loop is never executed (because if it were, you would get an error and the program would halt).

Or maybe your program is halting with errors and you're not noticing it?

Also, why are weights[ 0 ] and biases[ 0 ] left undefined?

Yeah, it might be a little confusing, but I'm not using lists to store the weights and biases. I'm using two dictionaries whose keys are integer layer numbers. The input layer is the zeroth layer and does not have any weights or biases, which is why there is no weights[0] or biases[0].

I decided to use a dictionary because it's easier to populate and repopulate. Also, the math I derived treats the input layer as ##l = 0## and the output layer as ##l = \text{number of hidden layers} + 1##. So, I wanted to follow that convention.

Also, when storing the weighted input of each layer ##Z## and the output of each layer ##A##, it is easier to follow mathematically if they are dictionaries with layer number as keys. That's why I decided not to use ##i## in the for loops.

So, there are no lists in this program lol, except for the keyword argument "hidden_layer."
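A stripped-down sketch of that convention (illustrative names and sizes, not my actual class): dictionaries keyed by layer number ##l = 1 \dots L##, with nothing stored for the input layer ##l = 0##.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 3]   # layer 0 = input, 1 = hidden, 2 = output

# Weights and biases keyed by layer number; layer 0 has neither.
weights = {l: rng.normal(size=(layer_sizes[l - 1], layer_sizes[l]))
           for l in range(1, len(layer_sizes))}
biases = {l: np.zeros(layer_sizes[l]) for l in range(1, len(layer_sizes))}

# Forward pass over layers 1..L, mirroring the math's numbering.
A = {0: rng.normal(size=(5, layer_sizes[0]))}   # 5 examples as rows
for l in range(1, len(weights) + 1):
    Z = A[l - 1] @ weights[l] + biases[l]
    A[l] = 1.0 / (1.0 + np.exp(-Z))   # sigmoid activation

print(A[2].shape)   # (5, 3)
```

With integer keys the loop `range(1, len(weights) + 1)` is valid for the dictionary even though it would be out of bounds for a list, which explains the earlier off-by-one confusion.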

I figured out why the program outputs an array of identical vectors, as seen in post #13.

It is because the weighted input to hidden layer 1 is a matrix of large numbers, due to the dot product over a large number of features, so the output of layer 1 is an array of nothing but 1s: the sigmoid activation function returns approximately 1 for any input larger than about 5, and the weighted input matrix to the first hidden layer has values of around 50. Then, when the next layer receives this array of 1s, it applies its weights and activation function and outputs an array of identical rows, and so on.
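A quick sketch reproduces the effect (the feature count and shapes here are illustrative, not my actual data): with unscaled standard-normal weights, the pre-activations have a standard deviation of ##\sqrt{784} \approx 28##, deep in the sigmoid's flat region, while scaling the initial weights by ##1/\sqrt{\text{fan in}}## keeps them of order 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 784))   # 10 examples, many features

# Unscaled weights: entries of X @ W_big have std ~sqrt(784) ~ 28,
# so the sigmoid saturates to ~0 or ~1 almost everywhere.
W_big = rng.normal(size=(784, 30))
A_big = sigmoid(X @ W_big)

# Scaling initial weights by 1/sqrt(fan_in) keeps pre-activations
# of order 1, so the outputs stay spread out and rows stay distinct.
W_small = W_big / np.sqrt(784)
A_small = sigmoid(X @ W_small)

print(np.mean((A_big > 0.99) | (A_big < 0.01)))    # most entries saturated
print(np.mean((A_small > 0.99) | (A_small < 0.01)))  # almost none
```

So shrinking the initial weights (or standardizing the inputs) should stop the first layer from collapsing every example onto the same saturated vector.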
