Optimization with Newton's method


Discussion Overview

The discussion centers around optimizing a system of equations solved using Newton's method, specifically focusing on the sum of square residuals (SSR) as a performance metric. Participants explore alternatives to brute force and binary search for optimization, including gradient descent and other methods.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes using Newton's method to solve a system of equations for given inputs K1 and K2, resulting in outputs x1, x2, x3, x4.
  • Another participant suggests a more precise formulation of the optimization problem, emphasizing the need to state it in standard form.
  • There is discussion about the need to compute derivatives of the SSR function with respect to K1 and K2 for gradient descent, with some participants questioning the feasibility of certain derivative expressions.
  • One participant mentions that Newton's method takes a long time to compute, suggesting that custom programming might speed up the process.
  • Alternative optimization methods, such as quasi-random search methods like Simulated Annealing, are proposed by participants.
  • A participant acknowledges a mistake in their earlier notation regarding derivatives and clarifies their understanding of the partial derivatives involved.
  • Concerns are raised about the convergence of Newton's method and the efficiency of guessing initial values for K1 and K2.

Areas of Agreement / Disagreement

Participants express various viewpoints on the optimization problem, with no clear consensus on the best approach. Disagreements arise regarding the feasibility of certain derivative calculations and the efficiency of different optimization methods.

Contextual Notes

Limitations include unresolved mathematical steps related to the derivatives of the SSR function and the specific forms of the equations involved in Newton's method. The discussion also highlights the dependency on the definitions and formulations provided by participants.

Who May Find This Useful

Readers interested in numerical optimization, particularly in the context of solving nonlinear systems of equations, may find the insights and proposed methods relevant.

fahraynk
I have a system of equations which I solved with Newton's method.
Call Newton's method a function ##NM = f(K_1, K_2)##. ##K_1## and ##K_2## are inputs and a vector ##x = (x_1, x_2, x_3, x_4)## is the output.

I have another function, SSR, the sum of square residuals. It looks like this:
$$\sum \left(\frac{v_0-v_1X_1-v_3X_3-2v_3X_4}{H_t}\right)^2$$
##v_1## and ##v_3## are constants; ##v_0## and ##H_t## are experimental values which are known for several experiments. I sum over all the experiments.

Right now, I solve the system by Newton's method for a given ##K_1## and ##K_2##. I then use a binary search or brute force to check every ##K_1## and ##K_2## over a range to find the minimum SSR.

The problem is that it takes a long time, about 40 minutes per experiment, and I have maybe 100 experiments to calculate. I would rather use something like gradient descent.

To do gradient descent I would need to take the derivative of the SSR function with respect to ##K_1## and ##K_2##.

$$\frac{dSSR}{dK_1}=\frac{dSSR}{dX_1}\frac{dX_1}{dK_1}+ \dots + \frac{dSSR}{dX_4}\frac{dX_4}{dK_1} \\ \frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}$$

I can approximate the derivative of Newton's method with respect to ##K_1## with ##(NM(K_1+dK_1)-NM(K_1))/dK_1##, but I have no idea if it is even possible to take the derivative of ##X_n## with respect to Newton's method!
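A minimal sketch of that finite-difference idea, assuming hypothetical stand-ins `newton_solve(K1, K2)` (the Newton solve returning the four ##x## values) and `ssr(x)` (the residual sum over the experiments):

```python
import numpy as np

def grad_ssr(K1, K2, newton_solve, ssr, h=1e-6):
    """Central-difference estimate of (dSSR/dK1, dSSR/dK2) for the
    composite map (K1, K2) -> SSR(newton_solve(K1, K2))."""
    f = lambda a, b: ssr(newton_solve(a, b))
    dK1 = (f(K1 + h, K2) - f(K1 - h, K2)) / (2 * h)
    dK2 = (f(K1, K2 + h) - f(K1, K2 - h)) / (2 * h)
    return np.array([dK1, dK2])

def gradient_descent(K0, newton_solve, ssr, lr=1e-2, steps=200):
    """Plain gradient descent on SSR over (K1, K2)."""
    K = np.asarray(K0, dtype=float)
    for _ in range(steps):
        K -= lr * grad_ssr(K[0], K[1], newton_solve, ssr)
    return K
```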

Is it possible? Or does anyone know a different way to optimize this other than brute force and binary search?
 
fahraynk said:
I have a system of equations which I solved with Newton's method.
Call Newton's method a function ##NM = f(K_1, K_2)##. ##K_1## and ##K_2## are inputs and a vector ##x = (x_1, x_2, x_3, x_4)## is the output.

I have another function, SSR, the sum of square residuals. It looks like this:
$$\sum \left(\frac{v_0-v_1X_1-v_3X_3-2v_3X_4}{H_t}\right)^2$$
##v_1## and ##v_3## are constants; ##v_0## and ##H_t## are experimental values which are known for several experiments. I sum over all the experiments.

Let's name the function and state its arguments. Do you mean ##S(X_1,X_3,X_4) = \sum_i \left(\frac{v_{0_i}-v_1X_1-v_3X_3-2v_3X_4}{H_{t_i}}\right)^2##?

Right now, I solve the system by Newton's method for a given ##K_1## and ##K_2##.
It would be better to say that you solve the system "using" a given ##K_1## and ##K_2## for the values ##x_1,x_2,x_3,x_4##, if that's what you mean.

I then use a binary search or brute force to check every ##K_1## and ##K_2## over a range to find the minimum SSR.

It's best to state optimization problems in the standard form: "Minimize ... with respect to choices of ... subject to the constraints ...".

In your case, the problem appears to be:

Minimize ##S(X_1,X_3,X_4) = \sum_i \left(\frac{v_{0_i}-v_1X_1-v_3X_3-2v_3X_4}{H_{t_i}}\right)^2##
with respect to choices of ##K_1,K_2##
subject to the constraints:
##(X_1,X_2,X_3,X_4) = NM(K_1,K_2)##
where ##NM(K_1,K_2)## is the result of solving some system of equations by Newton's method for the variables ##(X_1,X_2,X_3,X_4)##.
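A hedged Python sketch of that nested structure; `G`, `x_init`, and the data arrays `v0`, `v1`, `v3`, `Ht` are placeholders for the poster's system and data, and scipy's general-purpose `fsolve` stands in for the Newton solve:

```python
import numpy as np
from scipy.optimize import fsolve, minimize

def NM(K, G, x_init):
    # Inner solve: find x with G(x, K) = 0 (fsolve standing in for
    # a hand-rolled Newton iteration).
    return fsolve(lambda x: G(x, K), x_init)

def S(K, G, x_init, v0, v1, v3, Ht):
    # Objective: SSR evaluated at the solution the inner solve returns.
    x = NM(K, G, x_init)
    r = (v0 - v1 * x[0] - v3 * x[2] - 2 * v3 * x[3]) / Ht
    return np.sum(r ** 2)

# Outer minimization over (K1, K2); Nelder-Mead needs no gradient.
# result = minimize(S, [K1_guess, K2_guess],
#                   args=(G, x_init, v0, v1, v3, Ht), method="Nelder-Mead")
```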

To do gradient descent I would need to take the derivative of the SSR function with respect to ##K_1## and ##K_2##.

$$\frac{dSSR}{dK_1}=\frac{dSSR}{dX_1}\frac{dX_1}{dK_1}+ \dots + \frac{dSSR}{dX_4}\frac{dX_4}{dK_1} \\ \frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}$$

The correct notation would use partial derivatives - even though these are a pain in the neck to write in LaTeX.

##\frac{\partial S}{\partial K_1} = \frac{\partial S}{\partial X_1} \frac{\partial X_1}{\partial K_1} + ...##

I can approximate the derivative of Newton's method with respect to ##K_1## with ##(NM(K_1+dK_1)-NM(K_1))/dK_1##, but I have no idea if it is even possible to take the derivative of ##X_n## with respect to Newton's method!

The notation "##\frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}##" makes no sense in the context of your problem.

The vector-valued function ##NM(K_1,K_2)## can be represented as 4 component functions:
##X_1 = NM_1(K_1,K_2)##
##X_2 = NM_2(K_1,K_2)##
##X_3 = NM_3(K_1,K_2)##
##X_4 = NM_4(K_1,K_2)##

##\frac{\partial X_1}{\partial K_1} = \frac{\partial NM_1}{\partial K_1}##
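Numerically, those component partials can be estimated by finite differences and plugged into the chain rule. A sketch, with `NM` the hypothetical solver from above and `dS_dx` a hypothetical function returning ##(\partial S/\partial X_1, \dots, \partial S/\partial X_4)##:

```python
import numpy as np

def dS_dK1(K1, K2, NM, dS_dx, h=1e-6):
    """Chain rule dS/dK1 = sum_n (dS/dXn)(dXn/dK1), with the partials
    dNM_n/dK1 estimated by a central difference."""
    x_plus = np.asarray(NM(K1 + h, K2))
    x_minus = np.asarray(NM(K1 - h, K2))
    dx_dK1 = (x_plus - x_minus) / (2 * h)   # [dX1/dK1, ..., dX4/dK1]
    x = np.asarray(NM(K1, K2))
    return float(np.dot(dS_dx(x), dx_dK1))
```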
The problem is that it takes a long time, about 40 minutes per experiment
Which step of the problem takes a long time - using Newton's method? There are many tricks that can be used to speed up programs that solve particular sets of equations. For example, the final result of 2 steps of Newton's method can sometimes be implemented more concisely as a single sequence of code. Custom-made programs for solving particular sets of equations can be much faster than a general-purpose equation solver.

Is it possible? Or does anyone know a different way to optimize this other than brute force and binary search?
There are quasi-random search methods, such as Simulated Annealing.
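For instance, scipy ships a simulated-annealing-style global optimizer, `dual_annealing`, which only needs the objective over box bounds. In this sketch `objective` is a placeholder that would wrap the Newton solve plus the SSR evaluation for a candidate ##(K_1, K_2)##:

```python
from scipy.optimize import dual_annealing

def objective(K):
    # Placeholder: evaluate Newton solve + SSR for candidate (K1, K2).
    raise NotImplementedError

# bounds = [(K1_lo, K1_hi), (K2_lo, K2_hi)]
# result = dual_annealing(objective, bounds=bounds, maxiter=500)
# print(result.x, result.fun)
```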

 
Stephen Tashi said:
##\frac{\partial S}{\partial K_1} = \frac{\partial S}{\partial X_1} \frac{\partial X_1}{\partial K_1} + ...##

The notation "##\frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}##" makes no sense in the context of your problem.

The vector-valued function ##NM(K_1,K_2)## can be represented as 4 component functions:
##X_1 = NM_1(K_1,K_2)##
##X_2 = NM_2(K_1,K_2)##
##X_3 = NM_3(K_1,K_2)##
##X_4 = NM_4(K_1,K_2)##

##\frac{\partial X_1}{\partial K_1} = \frac{\partial NM_1}{\partial K_1}##
Thanks for your help!

Ah! I see that I made a big mistake. I think the partial of SSR should be
$$\frac{\partial{SSR}}{\partial{k_1}}=\frac{\partial{SSR}}{\partial{x_1}}\frac{\partial{x_1}}{\partial{k_1}}+ \dots + \frac{\partial{SSR}}{\partial{x_4}}\frac{\partial{x_4}}{\partial{k_1}}$$
Originally I had the partial of ##x_1## with respect to NM, which was weird like you said.

Newton's method looks like this: $$x=x_{0}-J(k_1,k_2)^{-1}\,G(H_t,G_t,k_1,k_2)$$
##G## is a vector consisting of 4 equations, and ##J^{-1}## is the inverse of the Jacobian matrix.
The output for a particular ##x_n## for ##n \in [1,4]## is a large equation, ##x_n=f_n(x_1,x_2,x_3,x_4,k_1,k_2,H_t,G_t)##. I think I could either compute the derivative of this function, a giant mess, or I can approximate the derivative by taking the value of ##x_n## from Newton's method at two values of ##k_1## close together and dividing by ##dk_1##.
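In code, that update is usually written without forming ##J^{-1}## explicitly: solve ##J\,\delta = G## and subtract. A sketch, assuming hypothetical callables `G_func(x)` and `J_func(x)` for the system and its Jacobian:

```python
import numpy as np

def newton_iterate(x0, G_func, J_func, tol=1e-10, max_iter=50):
    """Newton iteration x <- x - J(x)^{-1} G(x), via a linear solve."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = G_func(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(J_func(x), g)
    return x
```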

You asked "Which step of the problem takes a long time?"
Newton's method takes the longest, because for whatever initial guess I input, a certain percentage of the data does not converge. I ended up making a systematic method of guessing that tries every possible combination of guesses, but it is a maximum of 100 guesses in the worst case for a choice of ##K_1## and ##K_2##. Plus, for every ##K_1## I have to check a ton of ##K_2## values with this method, so it just takes a long time. I could try to develop better guesses, or lower the number of guesses to 50 or so, but I would rather just use gradient descent if possible because it's a cooler algorithm.
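One possible way to reduce the guess count is warm-starting: when sweeping nearby ##K_1, K_2## values, reuse the previous converged solution as the next initial guess. A sketch, assuming a hypothetical `newton_solve(K1, K2, x0=...)` that raises `RuntimeError` when it fails to converge:

```python
def scan_with_warm_start(K_pairs, newton_solve, x_default):
    """Sweep (K1, K2) pairs, reusing the last converged x as the next
    initial guess and falling back to a default guess on failure."""
    x_prev = x_default
    results = {}
    for K1, K2 in K_pairs:
        try:
            x_prev = newton_solve(K1, K2, x0=x_prev)
        except RuntimeError:
            x_prev = newton_solve(K1, K2, x0=x_default)
        results[(K1, K2)] = x_prev
    return results
```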
 
