Optimization with Newton's method

AI Thread Summary
The discussion focuses on optimizing a system of equations solved using Newton's method, specifically to minimize the sum of square residuals (SSR) with respect to parameters K1 and K2. The current approach uses a time-consuming binary search or brute-force sweep over K1 and K2 values, taking about 40 minutes per experiment. Participants suggest using gradient descent, which requires the derivatives of the SSR function with respect to K1 and K2, and debate whether these partial derivatives can be obtained from the output of Newton's method. Customizing Newton's method or exploring alternative optimization techniques, such as quasi-random search methods, is also recommended to improve efficiency.
fahraynk
I have a system of equations which I solved with Newton's method.
Call Newton's method a function ##NM = f(K_1, K_2)##. ##K_1## and ##K_2## are inputs and a vector ##x = (x_1, x_2, x_3, x_4)## is the output.

I have another function, SSR, the sum of square residuals. It looks like this:
$$\sum_i \left(\frac{v_{0_i}-v_1X_1-v_3X_3-2v_3X_4}{H_{t_i}}\right)^2$$
##v_1## and ##v_3## are constants; ##v_0## and ##H_t## are experimental values which are known for each of several experiments. I sum over all the experiments.
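
In code form it's just this (a minimal sketch; the array and argument names are stand-ins for my actual data):

```python
import numpy as np

def ssr(X, v0, Ht, v1, v3):
    """Sum of squared residuals over all experiments.

    X      : (X1, X2, X3, X4) from Newton's method (X2 does not
             appear in the residual)
    v0, Ht : arrays of experimental values, one entry per experiment
    v1, v3 : known constants
    """
    X1, X2, X3, X4 = X
    r = (v0 - v1 * X1 - v3 * X3 - 2 * v3 * X4) / Ht
    return np.sum(r ** 2)
```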

Right now, I solve the system by Newton's method for given values of ##K_1## and ##K_2##. I then use a binary search or brute force to check ##K_1## and ##K_2## over a range to find the minimum SSR.

The problem is that it takes a long time, about 40 minutes per experiment, and I have maybe 100 experiments to calculate. I would rather use something like gradient descent.

To do gradient descent I would need the derivative of the SSR function with respect to ##K_1## and ##K_2##.

$$\frac{dSSR}{dK_1}=\frac{dSSR}{dX_1}\frac{dX_1}{dK_1}+ \dots + \frac{dSSR}{dX_4}\frac{dX_4}{dK_1} \\ \frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}$$

I can approximate the derivative of ##NM## with respect to ##K_1## by ##\frac{NM(K_1+dK_1,\,K_2)-NM(K_1,\,K_2)}{dK_1}##, but I have no idea if it is even possible to take the derivative of ##X_n## with respect to Newton's method!
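
Concretely, the approximation I have in mind looks like this (a rough sketch; newton_solve is a stand-in name for my Newton's-method routine):

```python
import numpy as np

def dNM_dK1(K1, K2, dK=1e-6):
    """Forward-difference approximation of dNM/dK1 (a 4-vector).

    newton_solve(K1, K2) -> (X1, X2, X3, X4) is a placeholder for
    the routine that solves the system by Newton's method.
    """
    X_base = np.asarray(newton_solve(K1, K2))
    X_step = np.asarray(newton_solve(K1 + dK, K2))
    return (X_step - X_base) / dK
```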

Is it possible? Or does anyone know a different way I can optimize this other than brute force and binary search?
 
fahraynk said:
I have a system of equations which I solved with Newton's method.
Call Newton's method a function ##NM = f(K_1, K_2)##. ##K_1## and ##K_2## are inputs and a vector ##x = (x_1, x_2, x_3, x_4)## is the output.

I have another function, SSR, the sum of square residuals. It looks like this:
$$\sum_i \left(\frac{v_{0_i}-v_1X_1-v_3X_3-2v_3X_4}{H_{t_i}}\right)^2$$
##v_1## and ##v_3## are constants; ##v_0## and ##H_t## are experimental values which are known for each of several experiments. I sum over all the experiments.

Let's name the function and state its arguments. Do you mean ##S(X_1,X_3,X_4) = \sum_i \left(\frac{v_{0_i}-v_1X_1-v_3X_3-2v_3X_4}{H_{t_i}}\right)^2## ?

Right now, I solve the system by Newton's method for given values of ##K_1## and ##K_2##.
It would be better to say that you solve the system "using" a given ##K_1## and ##K_2## for the values ##x_1,x_2,x_3,x_4##, if that's what you mean.

I then use a binary search or brute force to check ##K_1## and ##K_2## over a range to find the minimum SSR.

It's best to state optimization problems in the standard form: "Minimize ... with respect to choices of ... subject to the constraints ...".

In your case, the problem appears to be:

Minimize ##S(X_1,X_3,X_4) = \sum_i \left(\frac{v_{0_i}-v_1X_1-v_3X_3-2v_3X_4}{H_{t_i}}\right)^2##
with respect to choices of ##K_1,K_2##
subject to the constraints:
##(X_1,X_2,X_3,X_4) = NM(K_1,K_2)##
where ##NM(K_1,K_2)## is the result of solving some system of equations by Newton's method for the variables ##(X_1,X_2,X_3,X_4)##.
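
Stated in that form, the constraint can be folded into the objective and a general-purpose minimizer applied to a function of ##K_1,K_2## alone. A minimal sketch using scipy's derivative-free Nelder-Mead method (newton_solve and ssr stand in for your actual routines):

```python
from scipy.optimize import minimize

def objective(K):
    X = newton_solve(K[0], K[1])  # (X1, X2, X3, X4) for this (K1, K2)
    return ssr(X)                 # sum of squared residuals

# x0 is an arbitrary starting guess for (K1, K2)
result = minimize(objective, x0=[1.0, 1.0], method="Nelder-Mead")
print(result.x, result.fun)       # minimizing (K1, K2) and its SSR
```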

To do gradient descent I would need the derivative of the SSR function with respect to ##K_1## and ##K_2##.

$$\frac{dSSR}{dK_1}=\frac{dSSR}{dX_1}\frac{dX_1}{dK_1}+ \dots + \frac{dSSR}{dX_4}\frac{dX_4}{dK_1} \\ \frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}$$

The correct notation would use partial derivatives, even though these are a pain in the neck to write in LaTeX.

##\frac{\partial S}{\partial K_1} = \frac{\partial S}{\partial X_1} \frac{\partial X_1}{\partial K_1} + ...##

I can approximate the derivative of ##NM## with respect to ##K_1## by ##\frac{NM(K_1+dK_1,\,K_2)-NM(K_1,\,K_2)}{dK_1}##, but I have no idea if it is even possible to take the derivative of ##X_n## with respect to Newton's method!

The notation "##\frac{dX_n}{dK1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}##" makes no sense in the context of your problem.

The vector-valued function ##NM(K_1,K_2)## can be represented as 4 component functions.
##X_1 = NM_1(K_1,K_2)##
##X_2 = NM_2(K_1,K_2)##
##X_3 = NM_3(K_1,K_2)##
##X_4 = NM_4(K_1,K_2)##

##\frac{\partial X_1}{\partial K_1} = \frac{\partial NM_1}{\partial K_1}##
The problem is that it takes a long time, about 40 minutes per experiment
Which step of the problem takes a long time? Using Newton's method? There are many tricks that can be used to speed up programs that solve particular sets of equations. For example, the final result of 2 steps of Newton's method can sometimes be implemented more concisely as a single sequence of code. Custom-made programs for solving particular sets of equations can be much faster than a general-purpose equation solver.
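
As an illustration, a hand-written Newton iteration for a fixed 4-equation system, solving the linear system at each step instead of forming an inverse, is often much faster than a general solver. A sketch, assuming you can supply G and its Jacobian J as functions:

```python
import numpy as np

def newton_root(G, J, x0, tol=1e-10, max_iter=50):
    """Solve G(x) = 0 by Newton's method.

    G : function returning the 4-vector of residuals at x
    J : function returning the 4x4 Jacobian of G at x
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = G(x)
        if np.linalg.norm(g) < tol:
            return x
        # Solve J(x) dx = g instead of computing J(x)^{-1} explicitly
        x = x - np.linalg.solve(J(x), g)
    raise RuntimeError("Newton's method did not converge")
```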

Is it possible? Or does anyone know a different way I can optimize this other than brute force and binary search?
There are quasi-random search methods, such as Simulated Annealing.
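
scipy ships a variant, for example (a sketch: the ##(K_1,K_2)## bounds here are placeholders, and objective is the folded-in SSR from the sketch above):

```python
from scipy.optimize import dual_annealing

# Search the (K1, K2) box globally; no derivatives required
result = dual_annealing(objective, bounds=[(0.01, 100.0), (0.01, 100.0)])
print(result.x, result.fun)
```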

 
Stephen Tashi said:
##\frac{\partial S}{\partial K_1} = \frac{\partial S}{\partial X_1} \frac{\partial X_1}{\partial K_1} + ...##

The notation "##\frac{dX_n}{dK1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}##" makes no sense in the context of your problem.

The vector-valued function ##NM(K_1,K_2)## can be represented as 4 component functions.
##X_1 = NM_1(K_1,K_2)##
##X_2 = NM_2(K_1,K_2)##
##X_3 = NM_3(K_1,K_2)##
##X_4 = NM_4(K_1,K_2)##

##\frac{\partial X_1}{\partial K_1} = \frac{\partial NM_1}{\partial K_1}##

Thanks for your help!

Ah! I see that I made a big mistake. I think the partial of SSR should be
$$\frac{\partial SSR}{\partial K_1}=\frac{\partial SSR}{\partial X_1}\frac{\partial X_1}{\partial K_1}+ \dots + \frac{\partial SSR}{\partial X_4}\frac{\partial X_4}{\partial K_1}$$
Originally I had the partial of ##X_n## with respect to ##NM##, which was weird like you said.

Newton's method looks like this: $$x = x_{0} - J(K_1,K_2)^{-1}\,G(H_t,G_t,K_1,K_2)$$
##G## is a vector consisting of 4 equations, and ##J^{-1}## is the inverse of the Jacobian matrix.
The output for a particular ##x_n##, ##n \in [1,4]##, is a large equation, ##x_n = f_n(x_1,x_2,x_3,x_4,K_1,K_2,H_t,G_t)##. I think I could either compute the derivative of this function symbolically, a giant mess, or approximate the derivative by taking the value of ##x_n## from Newton's method at two values of ##K_1## close together and dividing by ##dK_1##.
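
Actually, there may be a third option: since the converged ##x## satisfies ##G(x,K_1,K_2)=0##, implicit differentiation gives ##\frac{\partial x}{\partial K_1} = -J^{-1}\frac{\partial G}{\partial K_1}##, reusing the same Jacobian that Newton's method already builds. A sketch (J and dG_dK1 are stand-ins for my actual functions):

```python
import numpy as np

def dx_dK1(x_star, K1, K2, J, dG_dK1):
    """Sensitivity of the converged solution x* to K1.

    Because G(x*, K1, K2) = 0 at convergence, implicit
    differentiation gives  dx*/dK1 = -J^{-1} dG/dK1.
    """
    return -np.linalg.solve(J(x_star, K1, K2), dG_dK1(x_star, K1, K2))
```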

You asked, "Which step of the problem takes a long time?"
Newton's method takes the longest, because for any given initial guess, a certain percentage of the data does not converge. I ended up making a systematic method of guessing that tries every possible combination of guesses, but that is a maximum of 100 guesses in the worst case for a single choice of ##K_1## and ##K_2##. Plus, for every ##K_1## I have to check a ton of ##K_2## values with this method, so it just takes a long time. I could try to develop better guesses, or lower the number of guesses to 50 or so, but I would rather just use gradient descent if possible because it's a cooler algorithm.
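
One thing I might try is warm-starting: when sweeping neighboring ##(K_1,K_2)## values, the previous converged solution is usually a good initial guess for the next solve, which should cut down the number of guesses. Roughly (x_initial and k_pairs are placeholders, and newton_solve here is a stand-in that accepts an initial guess):

```python
# Continuation / warm-starting: reuse the last converged solution
# as the initial guess for the next nearby (K1, K2) pair.
x_guess = x_initial            # starting guess for the first solve
for K1, K2 in k_pairs:         # k_pairs: the (K1, K2) values to sweep
    x_guess = newton_solve(K1, K2, x0=x_guess)
```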
 