Convexity issues in a new type of network - NLP

moyo · Jun 1, 2018

Hello all!

I'm glad to be here...

I have written a paper on a new type of network for NLP, but I have lingering questions about how efficient and effective it is... This is a link to it on my blog:

ABSTRACT submitted

So the following is a more formal framing of efficiency and effectiveness in this network that I would appreciate input on.

I have a type of network... It has one output node connected to all 4 hidden layer nodes, which are connected to the 12 input layer nodes in the following way. Each hidden node is connected to exactly 9 input nodes, and the sets of 3 input nodes left unused by each hidden node are disjoint... This just means that no 2 hidden nodes are connected to the exact same nine input nodes... or, more generally, no two hidden nodes share any missing input nodes between them...
If this is clear so far, there is another condition... The input to each input node is a number, and each node in the hidden layer simply sums all the numbers from the input nodes it is connected to...

*This is not a neural network*... All a node in one layer does is collect a number from each of the nodes it is connected to, sum them, and pass the result on to the next layer... no bias, no weights and no activation function... It's really simple (for now) lol... Now we could input random numbers into the input layer and compute a value at the end... So just imagine we had an expected output number for a particular input of numbers and were to use this to calculate some form of loss... but we don't really have variables to adjust...
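A minimal sketch of the forward pass described above may help pin the setup down. The particular choice of which 3 inputs each hidden node skips is an assumption (any disjoint split would do), and the names `missing`, `connected` and `forward` are made up for illustration. Since 4 hidden nodes each skip 3 inputs and the skipped sets are disjoint, the skipped sets cover all 12 inputs, so every input feeds exactly 3 hidden nodes and the output works out to 3 times the sum of the inputs.

```python
# Sketch of the sum-only network: 12 inputs, 4 hidden nodes, 1 output.
# Assumption: hidden node j skips inputs 3j, 3j+1, 3j+2 (any disjoint split works).
N_INPUTS = 12
N_HIDDEN = 4

# For each hidden node, the 3 inputs it is NOT connected to.
missing = [set(range(3 * j, 3 * j + 3)) for j in range(N_HIDDEN)]
# For each hidden node, the 9 inputs it IS connected to.
connected = [[i for i in range(N_INPUTS) if i not in missing[j]]
             for j in range(N_HIDDEN)]


def forward(x):
    """Sum the connected inputs in each hidden node, then sum the hidden nodes."""
    hidden = [sum(x[i] for i in conn) for conn in connected]
    return sum(hidden)


x = [0.5, -1.0, 2.0, 3.0, 0.0, 1.5, -2.5, 4.0, 1.0, 0.25, -0.75, 2.0]
print(forward(x))   # output of the network
print(3 * sum(x))   # same value: each input is counted exactly 3 times
```

So for this simplest version the whole network collapses to a single linear function of the inputs.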

So we will say the variables we want to adjust ARE the input numbers, and we want to adjust them in the direction that brings the output of a forward pass closer to what we expect according to some criterion... So say we expect the number 10... and we get 5 with a random input of numbers; then we adjust the variables (numbers) by backpropagation with GD (gradient descent) until they produce the number 10 at the output...
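The adjustment step can be sketched as plain gradient descent on the inputs. This assumes a squared-error loss, which the post does not specify, and the learning rate, step count and function names are illustrative choices. For the sum-only network the output is linear in the inputs, so this loss is convex, and a loss of zero is reachable for any real target because the output can take any real value.

```python
import random

N_INPUTS, N_HIDDEN = 12, 4
# Assumed wiring: hidden node j skips inputs 3j, 3j+1, 3j+2.
missing = [set(range(3 * j, 3 * j + 3)) for j in range(N_HIDDEN)]
connected = [[i for i in range(N_INPUTS) if i not in missing[j]]
             for j in range(N_HIDDEN)]


def forward(x):
    """Each hidden node sums its 9 inputs; the output sums the hidden nodes."""
    return sum(sum(x[i] for i in conn) for conn in connected)


def grad(x, target):
    """Gradient of (forward(x) - target)**2 with respect to each input."""
    err = forward(x) - target
    # d(output)/d(x_i) = number of hidden nodes that input i feeds (3 here).
    fan_out = [sum(i in conn for conn in connected) for i in range(N_INPUTS)]
    return [2.0 * err * f for f in fan_out]


def fit_inputs(target, lr=0.004, steps=50, seed=0):
    """Start from random inputs and move them downhill on the squared error."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(N_INPUTS)]
    for _ in range(steps):
        g = grad(x, target)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


x_fit = fit_inputs(target=10.0)
print(forward(x_fit))  # should be very close to 10
```

With this wiring each step multiplies the output error by (1 - 216 * lr), so the step size has to stay below 2/216 for the iteration to converge.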

...My first question is: is this a convex optimisation problem, and how can we tell that the global minimum error is zero?
Another question: what if we use a more complex expression to calculate the value of a node? For example, each node in the hidden layer receives input from 9 nodes, and we scale that sum by multiplying it with the values of the three nodes left out of the 12.
So then it becomes a fully connected layer, and in some connections (3 of them for each node in the hidden layer) the value of the connected node multiplies instead of adds to the value that the other nodes give... Will it still be a convex function?
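One way to probe the multiplicative variant is to look for two input vectors whose midpoint has a higher loss than their average, which a convex function never allows. The sketch below reads "scale by the three left-out nodes" as (sum of the 9) times (product of the 3), which is an interpretation, and again assumes a squared-error loss; the names are illustrative.

```python
from math import prod

N_INPUTS, N_HIDDEN = 12, 4
# Assumed wiring: hidden node j leaves out inputs 3j, 3j+1, 3j+2.
missing = [list(range(3 * j, 3 * j + 3)) for j in range(N_HIDDEN)]
connected = [[i for i in range(N_INPUTS) if i not in missing[j]]
             for j in range(N_HIDDEN)]


def forward_mult(x):
    """Hidden node = (sum of its 9 inputs) * (product of the 3 left-out inputs)."""
    total = 0.0
    for j in range(N_HIDDEN):
        s = sum(x[i] for i in connected[j])
        p = prod(x[i] for i in missing[j])
        total += s * p
    return total


def loss(x, target):
    return (forward_mult(x) - target) ** 2


target = 36.0
a = [1.0] * N_INPUTS    # every hidden node gives 9 * 1 = 9, output 36
b = [-1.0] * N_INPUTS   # every hidden node gives (-9) * (-1) = 9, output 36
mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]  # all zeros, output 0

print(loss(a, target), loss(b, target), loss(mid, target))  # 0.0 0.0 1296.0
```

Both endpoints have zero loss while the point halfway between them does not, so under these assumptions the loss is not convex in the inputs, even though a global minimum of zero still exists for this target.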

Note that there is an interplay of features here: each number in the input contributes to the output by being added in some nodes and multiplying the value of others...

If you have come this far... could we have it that, of the three numbers, one is cubed, another squared and the last left alone before multiplying?
Would it still be convex... or get even more convoluted?
How can we generalise which changes to the system take us further from a convex loss function than others?
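A small midpoint test can be reused to compare variants like this one. The sketch below assumes the hidden node computes (sum of the 9) * a^3 * b^2 * c for the three left-out inputs a, b, c, with a squared-error loss; the helper name `violates_midpoint_convexity` is made up. Finding one violating pair shows the loss is not convex; finding none proves nothing. As a rule of thumb, a squared-error loss is convex for every possible target only when the output is an affine function of the adjustable inputs, and any product of adjustable inputs breaks that.

```python
N_INPUTS, N_HIDDEN = 12, 4
# Assumed wiring: hidden node j leaves out inputs 3j, 3j+1, 3j+2.
missing = [list(range(3 * j, 3 * j + 3)) for j in range(N_HIDDEN)]
connected = [[i for i in range(N_INPUTS) if i not in missing[j]]
             for j in range(N_HIDDEN)]


def forward_powers(x):
    """Hidden node = (sum of 9 inputs) * a**3 * b**2 * c for left-out a, b, c."""
    total = 0.0
    for j in range(N_HIDDEN):
        a, b, c = (x[i] for i in missing[j])
        s = sum(x[i] for i in connected[j])
        total += s * (a ** 3) * (b ** 2) * c
    return total


def violates_midpoint_convexity(f, p, q):
    """True if f at the midpoint of p and q exceeds the average of f(p) and f(q)."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return f(mid) > (f(p) + f(q)) / 2


target = 36.0


def loss(x):
    return (forward_powers(x) - target) ** 2


zeros = [0.0] * N_INPUTS   # output 0, loss 1296
ones = [1.0] * N_INPUTS    # output 36, loss 0
print(violates_midpoint_convexity(loss, zeros, ones))  # True
```

The same helper can be pointed at any other node function by swapping out `forward_powers`.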
Ultimately

======
Is there guaranteed to be one shared function, used for each hidden layer node's computation, that will give us a convex loss function? And what would be the general form of such a function?
=====

Part two of this
===

We connect 100 parallel networks of this type. We will not attach their output nodes to one another, so we will have 100 output nodes. We also do not merge the hidden layer units of the 100, but we keep a note of which nodes belonged to the hidden layer of each of the 100 networks. Also, each of those 100 has between 1 and 10 nodes in its hidden layer.

We need to keep this note in memory because we will delete ALL the input neurons and set the input layer up again as follows. These 100 networks are then connected only because they are connected to samples of the input layer nodes, with each sample set intersecting with many others.
To clarify:
We pick a random selection of 20 nodes across the hidden layer partitions of ALL 100 networks and label them "A". Then we omit them from the next step, which is to perform this process again with the remaining nodes and label them "B"... Then we omit all A's and B's from the next step, and repeat the process until we have none left to label.
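The labelling procedure can be sketched as follows. Shuffling all hidden nodes once and cutting the list into chunks of 20 gives the same result as repeatedly drawing 20 at random from whatever is left. The label naming beyond "Z" (AA, AB, ...) is an assumption, needed because up to 1000 hidden nodes can produce more than 26 groups, and all function names are illustrative. The last group may hold fewer than 20 nodes.

```python
import random
import string


def label_name(n):
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA', ... (spreadsheet-style, an assumption)."""
    name = ""
    n += 1
    while n > 0:
        n, r = divmod(n - 1, 26)
        name = string.ascii_uppercase[r] + name
    return name


def label_hidden_nodes(sizes, group_size=20, seed=0):
    """Map each (network_id, node_index) hidden node to a random group label."""
    rng = random.Random(seed)
    nodes = [(net, k) for net, n in enumerate(sizes) for k in range(n)]
    rng.shuffle(nodes)  # random order, then cut into consecutive groups
    labels = {}
    for start in range(0, len(nodes), group_size):
        label = label_name(start // group_size)
        for node in nodes[start:start + group_size]:
            labels[node] = label
    return labels


rng = random.Random(1)
sizes = [rng.randint(1, 10) for _ in range(100)]  # hidden nodes per network
labels = label_hidden_nodes(sizes)
print(len(labels), "hidden nodes,", len(set(labels.values())), "labels")
```

Keeping the `(network_id, node_index)` key is what preserves the note of which network each hidden node came from.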

Now we will connect each of the "A,B,C..." nodes to the input layer in the following way.

Remember, first we remove all the input layer nodes, then populate the input layer as follows:
======
Create 3 nodes and place them in the input layer.

Connect ALL "A"s to those 3 nodes... (these will be the 3 nodes we use to modify the sum from the others by multiplying instead of adding).

Create ANOTHER 3 nodes and place them in the input layer.

Connect ALL "B"s to those 3 nodes.

Repeat for all A, B, C...

=======

Then:
Take each of the hidden nodes from each of the 100 networks we joined together.
Remember, we still know which hidden layer nodes came from which of the 100 networks.
So we take all nodes of, say, type B and connect them to the input nodes of the other nodes that they came with in each particular network of the hundred.

E.g. if network 2 has nodes labelled "F,O,B,S" and network 35 has nodes labelled "P,B,E,C,B,G", then we take the nodes labelled B in both networks and connect them to the input layer nodes of all the remaining nodes, i.e. "F,O,S,P,E,C,G".

If B was also in network 15, which is "Y,B", then we include Y in the list of remaining nodes.

We do this for all A, B, C... nodes. Note that B's are not in the list of remaining nodes by definition, so the B's within a network are never connected together.
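The wiring rule can be sketched using the example networks 2, 35 and 15 from above. This reads the rule as: every node with a given label multiplies by that label's own 3 input nodes, and adds the input nodes owned by every other label that shares a network with it anywhere. That union across networks is an interpretation of the example, and the input node names such as "B0" are hypothetical.

```python
def build_wiring(networks):
    """networks: dict network_id -> list of labels of its hidden nodes."""
    all_labels = sorted({lab for labs in networks.values() for lab in labs})
    # 3 fresh input nodes per label (the "multiplying" inputs for that label).
    own_inputs = {lab: [f"{lab}{k}" for k in range(3)] for lab in all_labels}

    # Labels that co-occur with each label in at least one network.
    companions = {lab: set() for lab in all_labels}
    for labs in networks.values():
        for lab in set(labs):
            companions[lab] |= set(labs) - {lab}

    # Additive inputs of a label: the input nodes owned by its companions.
    added_inputs = {
        lab: sorted(n for c in companions[lab] for n in own_inputs[c])
        for lab in all_labels
    }
    return own_inputs, companions, added_inputs


networks = {2: list("FOBS"), 35: list("PBECBG"), 15: list("YB")}
own, comp, added = build_wiring(networks)
print(sorted(comp["B"]))   # ['C', 'E', 'F', 'G', 'O', 'P', 'S', 'Y']
print(own["B"])            # ['B0', 'B1', 'B2']
print(len(added["B"]))     # 24 additive input nodes (8 companions * 3)
```

Note that the second B in network 35 does not add B to its own companion list, which matches the remark that B's are never connected to each other.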

======We have now completed the network. :)
=============
*pat on back*

So if we randomly assign an expected value of either 0 or 1 to each of the 100 output nodes, then we backpropagate the loss of each output separately...

This would be the equivalent of classifying each of 100 sentences as either spam or not spam... Of course there is no pattern in how the A,B,C... are distributed, but if this still works, then it means each word A,B,C... will have a range of nuances that are being captured in this system, and for this example we are installing into the system the definition of spam and non-spam... i.e. we are forcing the words to follow a pattern. If we get another random sample of words and classify them as, say, spam, then they will be fitting this definition... Of course, really the definition is NOT spam and non-spam, but 0-ness and 1-ness. This means that we could deliberately set up a system like this with real words, and arrange for the spam examples to have an expected output of 0 and the non-spam ones an expected output of 1.

The system, upon convergence and with enough examples, will have captured the relationships between words that belong to a spam sentence by adjusting the input layer's variables.

If we store those variables, we can use them on a random sentence to see if it's spam or not.

Now for the ultimate questions... (why I put you through this)

Does this network pass all the requirements for convexity and a global minimum with a value of zero?

For any expected value?

And what type of computation involving the 3 nodes connected to each A,B... and their complement should we use to ensure we do in fact pass these two requirements?

lofgran · Jun 1, 2018

Thank you for sharing your paper on the new type of network for NLP. I am intrigued by your approach and would like to offer some input and suggestions for further development.

Firstly, I would like to clarify that this network does not seem to be a neural network in the traditional sense. It appears to be a simple computational model where each node in the hidden layer sums the values from its connected input nodes and passes it on to the next layer. This is an interesting approach and could potentially have its own advantages and limitations.

In terms of efficiency and effectiveness, it would be helpful to have more information on the specific application and tasks that this network is being used for. It would also be beneficial to have some comparative studies with other existing networks and their performance on similar tasks.

Moving on to your questions, it is difficult to determine whether this is a convex optimization problem without knowing the specific criteria for determining the expected output. However, it is possible to analyze the convexity of the loss function based on the given information. The use of complex expressions to calculate the value of a node could potentially make the function non-convex, especially if the multiplication of values is involved. It would be helpful to provide more details on the specific computation and its effects on the convexity of the function.

In terms of generalizing the results and determining which changes take us further from a convex loss function, it would be helpful to have a clear understanding of the variables and their effects on the overall network. It would also be beneficial to have a larger sample size and more diverse data to improve the generalizability of the results.

With regard to the second part of your post, I appreciate the effort put into creating a complex network by connecting 100 parallel networks. However, it would be helpful to have a clear explanation of the purpose and goals of this network, as well as the specific tasks it is being used for. Additionally, I would suggest conducting comparative studies to evaluate the performance of this network against other existing networks.

In conclusion, I believe that your network has potential and could be further developed with more detailed information and comparative studies. I hope my input and suggestions are helpful in your research and I look forward to seeing the progress of your work.
