Python Neural Network: training succeeds but testing fails

AI Thread Summary
The discussion focuses on a neural network (NN) training issue where the NN fails to produce expected outputs for specific test data after training on a limited dataset. The training data consists of triplets, where the first two numbers determine the output, while the third number is deemed irrelevant. Despite training, the NN incorrectly returns 1 for the input [000], suggesting it may have learned an incorrect pattern from the training data. The conversation highlights the challenge of training NNs on small datasets with fixed outputs, as multiple hypotheses can fit the data, leading to unpredictable results. The NN's inability to generalize from the training data to the test data is a key concern.
ChrisVer
Science Advisor
Hi, I was trying to work through a basic NN training problem I found online. The training data are the triplets:
[101]->1 , [011]->1, [001]->0 , [111]->0 , [100]->1
That is, the 3rd number carries no information: the output should be 0 if the first two numbers are identical and 1 otherwise (an XOR of the first two bits). The last data point is there to also let the network learn what happens when the 3rd argument is 0.
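For reference, the intended rule written out in plain Python (my own paraphrase of the pattern above, not code from the tutorial):
Python:
# intended target: output 1 iff the first two bits differ (XOR); the third bit is ignored
def target(a, b, c):
    return int(a != b)

print(target(1, 0, 1), target(1, 1, 1))  # 1 0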

I train on these data and then try to apply what the NN learned to the testing data:
[000], [010], [110]
Unfortunately I get 1 back... I assume the problem is that it somehow learns from the last training point that if the 3rd argument is 0, it should evaluate to 1. When I changed the last ->1 to ->0, the tested data also evaluated to 0.
So I moved one more data point into the training set, [110]->0,
and tested on [000] and [010]... In that case, both evaluate to 1. I don't understand why it fails so miserably on 000... any idea?

The code is given below (it's based on the basic NN code given in many online tutorials, but written as a class)...
Python:
import numpy as np
import matplotlib.pyplot as plt
class NN:
    def __init__(self, input_data , output_data=None, testing=None ):
        self.input_data = input_data
        self.testing = testing

        # output_data is None when reusing trained weights for testing
        self.output_data = output_data
        # randomly initialize weights
        np.random.seed(1)
        if testing:
            self.weights0 = testing[0]
            self.weights1 = testing[1]
        else:
            # uniform in [-1, 1); note the hidden layer size is tied to the number of training rows
            self.weights0 = 2*np.random.random([self.input_data.shape[1], self.input_data.shape[0]]) - 1.
            self.weights1 = 2*np.random.random([self.input_data.shape[0], 1]) - 1.

        self.errors = None
        self.xerrors = []

        self.results = {
                        "OutputTrain": None,
                        "Matrix0": None,
                        "Matrix1": None
                        }

    def sigmoid(self, x, derivative=False):
        # for the derivative, x is assumed to already be a sigmoid output
        if derivative: return x*(1-x)
        return 1/(1+np.exp(-x))

    def train(self, Nsteps):
        self.errors = np.zeros((self.input_data.shape[0], Nsteps))
        for step in range(Nsteps):
            l0 = self.input_data
            l1 = self.sigmoid(np.dot(l0, self.weights0))  # hidden layer; (6x3) x (3x6) = 6x6 during training here
            l2 = self.sigmoid(np.dot(l1, self.weights1))  # output layer; (6x6) x (6x1) = 6x1

            if not self.testing:
                l2_err = self.output_data - l2                         # output-layer error
                delta_l2 = l2_err * self.sigmoid(l2, derivative=True)  # backprop through output sigmoid
                l1_err = delta_l2.dot(self.weights1.T)                 # propagate error to hidden layer
                delta_l1 = l1_err * self.sigmoid(l1, derivative=True)  # backprop through hidden sigmoid

                for i in range(self.output_data.shape[0]):
                    self.errors[i][step] = l2_err[i, 0]
                self.xerrors.append(step)

                self.weights1 += l1.T.dot(delta_l2)  # weight updates with an implicit learning rate of 1
                self.weights0 += l0.T.dot(delta_l1)
        self.results["OutputTrain"]=l2
        self.results["Matrix1"] = self.weights1
        self.results["Matrix0"] = self.weights0    def summary(self):
        print("Training Results : ")
        print("\t Output data (trained) : ")
        print(self.results["OutputTrain"])
        print("\t Matrix 1 : ")
        print(self.results["Matrix0"])
        print("\t Matrix 2 : ")
        print(self.results["Matrix1"])    def plot(self):
        x = np.array(self.xerrors)
        cols = {0:'black',1:'r',2:'g',3:'b',4:'m',5:'y'}
        for i in range(self.output_data.shape[0]):
            plt.plot(x, np.array(self.errors[i]), cols[i], label="Entry %s" % i)
        # legend and axis limits only need to be set once, outside the loop
        plt.legend(loc='upper right', shadow=True, fontsize='x-large', framealpha=0.05)
        plt.gca().set_ylim([-2., 2.])
        plt.title("Error vs Ntrial")
        plt.show()

if __name__=="__main__":
    x = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [1, 0, 0],
                  [1, 1, 0],
                  [1, 0, 1],
                  [1, 1, 1]])

    # output
    y = np.array([[0],
                  [1],
                  [1],
                  [0],
                  [1],
                  [0]])

    NeuralNetwork = NN(x,y)
    NeuralNetwork.train(25000)
    #NeuralNetwork.plot()
    #NeuralNetwork.summary()

    print("Test Data")
    data_test = np.array([[0,0,0],[0,1,0]])#,[1,1,0]])
    # reuse the trained weight matrices; with fixed weights, train() only runs forward passes,
    # so a single step is enough
    tester = NN(data_test, testing=[NeuralNetwork.results["Matrix0"], NeuralNetwork.results["Matrix1"]])
    tester.train(1)
    #tester.summary()
    print(tester.results["OutputTrain"])
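As an aside, since no weights change in testing mode, the class isn't really needed for prediction; a standalone forward pass gives the same numbers. A minimal sketch (the predict helper below is my own naming, not part of the original code):
Python:
import numpy as np

def sigmoid(x):
    return 1/(1+np.exp(-x))

def predict(weights0, weights1, data):
    # two-layer forward pass with fixed weight matrices
    l1 = sigmoid(np.dot(data, weights0))
    l2 = sigmoid(np.dot(l1, weights1))
    return l2

# usage with the matrices produced by the trained network above:
# print(predict(NeuralNetwork.results["Matrix0"], NeuralNetwork.results["Matrix1"], data_test))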
 
Your training dataset is compatible with the hypotheses "(A XOR B) OR NOT C" in the original case and "(A XOR B) AND C" in the modified case, and probably something similar in the third case.
How is the NN supposed to know that you didn't want these, but "A XOR B"?

In general, training a neural net on such a small input space, with a fixed result for each possible input, rarely works, because there are always multiple hypotheses compatible with the data.
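A quick way to see this (my own check, not from the thread): both rules reproduce all five training labels, so the data cannot distinguish between them, and they first disagree on the unseen input [000]:
Python:
train = [((1,0,1), 1), ((0,1,1), 1), ((0,0,1), 0), ((1,1,1), 0), ((1,0,0), 1)]

xor_ab       = lambda a, b, c: int(a != b)                # the intended rule
xor_or_not_c = lambda a, b, c: int((a != b) or (c == 0))  # also fits the training set

# both hypotheses match every training label...
assert all(xor_ab(*x) == y and xor_or_not_c(*x) == y for x, y in train)
# ...but they differ on [0,0,0]: 0 vs 1, which is exactly where the net returns 1
print(xor_ab(0, 0, 0), xor_or_not_c(0, 0, 0))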
 