jambaugh said:
You can solve for K1 and K2 analytically from your first two equations... provided you know H, G, HG, and H2G. Two of these four can be eliminated, or rather replaced by H0 and G0, using your second two equations.
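For concreteness: if the first two equations have the form ##HG = K_1 H G## and ##H_2G = K_2 H G## (the form I'll assume below), then a single experiment with all four concentrations known gives the constants directly:
$$K_1 = \frac{HG}{H\,G}, \qquad K_2 = \frac{H_2G}{H\,G}.$$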
It is unclear what you mean by "optimize K1 and K2." If you're talking about least-square error, you're estimating them from random data so as to minimize that error, which you need to be able to express. There will be some choices to make in defining your error function, based on which variables you consider dependent and which independent, and on how much you're willing to gloss over that to keep the math simpler.
You speak of the 12 experiments represented by 12 equations, but what exactly is the experimental data? Which variables are being observed? (And what is the meaning of the three additional quantities A1, A2, and A3 that you're throwing in there?)
Before you go about throwing algorithms at the problem, you should understand in principle how it would be solved analytically. It looks to me like you're trying to do a non-linear regression problem. So if (I'm making an assumption here in order to proceed) your experimental data consists of a set of values for your four main variables (H, G, HG, H2G), you can think of that as 12 random points in an abstract 4-dimensional space.
You spoke of H0 and G0 being different for each experiment, so you cannot assume these points lie on the same surface defined by the second pair of equations you gave (for a single choice of H0 and G0); for this problem I would guess you can ignore those constants and stick to the relationship defined by the first two equations. I see two ways to approach this. For a specific choice of K1 and K2 you have a 4 − 2 = 2-dimensional surface in that 4-space. As you have it written, giving HG and H2G as functions of H and G, I would assume you'd treat H and G as independent variables, and thus look at the error as the difference between the (HG, H2G) values from your data and the values predicted by the equations for the given K1, K2. You would then define some type of error function as:
$$E_1^2 = \sum (HG - K_1 H G)^2$$
$$E_2^2 = \sum (H_2G - K_2 H G)^2$$
$$\mathrm{Err}^2 = a E_1^2 + b E_2^2$$
where the sums run over the 12 experiments. (Since each error term depends only on its own parameter, the weights a and b won't affect anything: they change the shape of the paraboloid but not the location of its minimum.)
And the task is to choose K1 and K2 to minimize this error. That's a classic multi-variable optimization problem; the derivatives of the error with respect to each K are:
$$\frac{\partial\,\mathrm{Err}^2}{\partial K_1} = -2a \sum H G\,(HG - K_1 H G)$$
$$\frac{\partial\,\mathrm{Err}^2}{\partial K_2} = -2b \sum H G\,(H_2G - K_2 H G)$$
Then you set these to zero and solve...
(I'm going to divide by the sample size to convert sums to averages, which I write in angle brackets: ##\langle H \rangle = \frac{1}{12}\sum H##, etc.)
This gives the optimizing equations as:
$$\langle H G \cdot HG \rangle = K_1 \langle (H G)^2 \rangle$$
$$\langle H G \cdot H_2G \rangle = K_2 \langle (H G)^2 \rangle$$
So you should average (or sum) over your 12 experiments the products ##P_1 = H \cdot G \cdot HG##, ##P_2 = H \cdot G \cdot H_2G##, and ##P = (H \cdot G)^2##; your least-square-error estimates are then:
$$K_1 = \langle P_1 \rangle / \langle P \rangle = \sum P_1 / \sum P$$
and
$$K_2 = \langle P_2 \rangle / \langle P \rangle = \sum P_2 / \sum P.$$
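As a minimal numerical sketch of these estimators (with synthetic stand-in data, since I don't know what your measurements look like):

```python
import numpy as np

# Synthetic stand-in data for the 12 experiments (replace with real measurements).
rng = np.random.default_rng(0)
H = rng.uniform(0.1, 1.0, 12)                    # free host
G = rng.uniform(0.1, 1.0, 12)                    # free guest
HG = 2.5 * H * G + rng.normal(0.0, 0.01, 12)     # 1:1 complex, "true" K1 = 2.5
H2G = 0.8 * H * G + rng.normal(0.0, 0.01, 12)    # 2:1 complex, "true" K2 = 0.8

# Optimizing equations: K1 = <P1>/<P>, K2 = <P2>/<P> with P = (H G)^2
P = H * G
K1 = np.sum(HG * P) / np.sum(P ** 2)
K2 = np.sum(H2G * P) / np.sum(P ** 2)
print(K1, K2)   # should recover roughly 2.5 and 0.8
```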
That is the direct non-linear regression using the form of your equations, with my assumption about dependent vs. independent variables. You can also work it reversing the dependent and independent pairs, or using a wilder combination. You can also ignore dependence altogether and minimize the direct distance from the points to the regression surface (a much nastier optimization problem). There may also be, in your application, a reason to account for the second two equations, which you could bring into the mix by using them to eliminate two of your first four variables. I don't think this will have any effect on the answer (subject to keeping the same choice of independent variables), given they have a linear relationship.
I hope that was helpful.
Firstly, thank you so much for your help.
You wrote this term to calculate the error: ##E_1^2 = \sum (HG - K_1 H G)^2##.
This assumes I know HG. I don't actually know H, G, H2G, or HG in any of the experiments. I only know H0, G0, and these constants, which I labeled ##A_1, A_2, A_3##.
The constants ##A_1, A_2, A_3## are the same across all 12 experiments, but the values of ##A_0, H_0, G_0## change.
Also, very sorry: I forgot the ##A_0## term in the experimental equations, so I edited them above. They are really all of the form $$A_{0m}H_{0m}=A_1H+A_2HG+A_3H_2G\\\\m\in[1,12]\\\\G_0=G_{0m}$$
So... if I were going to do what you said, I would have something like this for the error ##E##:
$$E^2=\sum_{m=1}^{12} ( A_{0m}H_{0m}-A_1H-A_2HG-A_3H_2G)^2 $$
or, substituting ##HG = K_1 H G## and ##H_2G = K_2 H G##, and using the mass balance ##H = H_{0m} - HG - 2H_2G## for the ##A_1## term:
$$
E^2 = \sum_{m=1}^{12} \Big( A_{0m}H_{0m} - A_1\big(H_{0m} - K_1 H G - 2K_2 H G\big) - A_2 K_1 H G - A_3 K_2 H G \Big)^2
$$
So to calculate this residual sum:
Step 1: guess ##K_1## and ##K_2##.
Step 2: solve for H, G, HG, H2G given ##K_1, K_2, H_{0m}, G_{0m}##.
Step 3: plug all of those numbers into the residual calculation to get the error.
Step 4: take the derivative of the residual with respect to ##K_1## and ##K_2##, set ##dE^2/dK_1 = 0## and ##dE^2/dK_2 = 0##, and solve for the ##K_1, K_2## that satisfy both.
Step 5: repeat until the residual error no longer changes noticeably (a rough sketch of this loop in code follows below).
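In code, I imagine the loop looking something like this. It's only a sketch: the values of ##A_1, A_2, A_3## and the per-experiment ##A_{0m}, H_{0m}, G_{0m}## are placeholders, for step 2 I assume the mass balances are ##H_0 = H + HG + 2H_2G## and ##G_0 = G + HG + H_2G##, and instead of the analytic derivative in step 4 it lets a derivative-free minimizer vary ##K_1, K_2##:

```python
import numpy as np
from scipy.optimize import fsolve, minimize

# Placeholder constants and per-experiment totals (substitute real values).
A1, A2, A3 = 1.0, 2.0, 3.0
A0 = np.linspace(0.5, 1.5, 12)   # A_{0m}
H0 = np.linspace(0.2, 1.0, 12)   # H_{0m}
G0 = np.linspace(0.3, 1.2, 12)   # G_{0m}

def solve_species(K1, K2, H0m, G0m):
    """Step 2: solve the assumed mass balances for the free concentrations H, G."""
    def balances(x):
        H, G = x
        HG, H2G = K1 * H * G, K2 * H * G
        return [H + HG + 2 * H2G - H0m,   # H0 = H + HG + 2*H2G
                G + HG + H2G - G0m]       # G0 = G + HG + H2G (assumed form)
    H, G = fsolve(balances, [H0m, G0m])
    return H, G, K1 * H * G, K2 * H * G

def residual_sq(K):
    """Step 3: sum of squared residuals over the 12 experiments."""
    K1, K2 = K
    total = 0.0
    for m in range(12):
        H, G, HG, H2G = solve_species(K1, K2, H0[m], G0[m])
        total += (A0[m] * H0[m] - A1 * H - A2 * HG - A3 * H2G) ** 2
    return total

# Steps 1, 4, 5 rolled together: a derivative-free search over (K1, K2)
# repeats the guess/solve/score cycle until the residual stops improving.
# (In practice you'd also want to constrain K1, K2 to stay positive.)
result = minimize(residual_sq, x0=[1.0, 1.0], method="Nelder-Mead")
print(result.x, result.fun)
```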
I am a bit confused about step 4, because if I guess K1 and K2, then the residual is just a number, and the derivative of a constant is 0. But if I compute H, G, HG, H2G from ##K_1, K_2, H_{0m}, G_{0m}##, then take the derivative of the residual with respect to ##K_1## and ##K_2## while pretending these are unknowns again... then maybe I can solve for the ##K_1, K_2## that make both derivatives equal 0?