Optimization with Newton's method

AI Thread Summary
The discussion focuses on optimizing a system of equations solved using Newton's method, specifically minimizing the sum of square residuals (SSR) with respect to parameters K1 and K2. The current approach uses a time-consuming binary search or brute-force sweep over K1 and K2 values, taking about 40 minutes per experiment. Participants suggest using gradient descent, which requires the derivatives of the SSR function with respect to K1 and K2, and debate whether these partial derivatives can be obtained from the output of Newton's method. Customizing Newton's method or exploring alternative optimization techniques, such as quasi-random search methods like simulated annealing, is also recommended to improve efficiency.
fahraynk
I have a system of equations which I solved with Newton's method.
Call Newton's method a function ##NM = f(K_1,K_2)##: ##K_1## and ##K_2## are inputs and a vector ##x = (x_1,x_2,x_3,x_4)## is the output.

I have another function, SSR, the sum of square residuals. It looks like this:
$$\sum \left(\frac{v_0-v_1X_1-v_3X_3-2v_3X_4}{H_t}\right)^2$$
##v_1## and ##v_3## are constants; ##v_0## and ##H_t## are experimental values which are known for several experiments. I sum over all the experiments.

Right now, I solve the system by Newton's method for a given ##K_1## and ##K_2##. I then use a binary search or brute force to check every ##K_1## and ##K_2## over a range to find the minimum SSR.

The problem is it takes a long time, about 40 minutes per experiment, and I have maybe 100 experiments to calculate. I would rather use something like gradient descent.

To do gradient descent I would need to take the derivative of the SSR function with respect to ##K_1## and ##K_2##.

$$\frac{dSSR}{dK_1}=\frac{dSSR}{dX_1}\frac{dX_1}{dK_1}+ \dots + \frac{dSSR}{dX_4}\frac{dX_4}{dK_1} \\ \frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}$$

I can approximate the derivative of Newton's method with respect to ##K_1## with ##(NM(K_1+dK_1)-NM(K_1))/dK_1##, but I have no idea if it is even possible to take the derivative of ##X_n## with respect to Newton's method!

Is it possible? Or does anyone know a different way I can optimize this other than brute force and binary search?
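For concreteness, here is a minimal sketch of what finite-difference gradient descent on SSR could look like. It assumes a hypothetical `newton_solve(K1, K2)` returning the solution vector and a list `data` of `(v0, Ht)` pairs; none of these names come from the thread.

```python
def ssr(K1, K2, newton_solve, data, v1, v3):
    """Sum of squared residuals over all experiments."""
    x1, x2, x3, x4 = newton_solve(K1, K2)          # hypothetical solver
    return sum(((v0 - v1*x1 - v3*x3 - 2*v3*x4) / Ht)**2
               for v0, Ht in data)                  # data = [(v0, Ht), ...]

def gradient_descent(f, K1, K2, lr=1e-3, h=1e-6, steps=500):
    """Plain gradient descent on f(K1, K2) with central-difference gradients."""
    for _ in range(steps):
        g1 = (f(K1 + h, K2) - f(K1 - h, K2)) / (2*h)
        g2 = (f(K1, K2 + h) - f(K1, K2 - h)) / (2*h)
        K1, K2 = K1 - lr*g1, K2 - lr*g2
    return K1, K2
```

Each gradient estimate costs four Newton solves, so whether this beats the grid sweep depends on how many grid points it replaces.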
 
fahraynk said:
I have a system of equations which I solved with Newton's method.
Call Newton's method a function ##NM = f(K_1,K_2)##: ##K_1## and ##K_2## are inputs and a vector ##x = (x_1,x_2,x_3,x_4)## is the output.

I have another function, SSR, the sum of square residuals. It looks like this:
$$\sum \left(\frac{v_0-v_1X_1-v_3X_3-2v_3X_4}{H_t}\right)^2$$
##v_1## and ##v_3## are constants; ##v_0## and ##H_t## are experimental values which are known for several experiments. I sum over all the experiments.

Let's name the function and state its arguments. Do you mean ##S(X_1,X_3,X_4) = \sum_i \left(\frac{v_{0_i}-v_1X_1-v_3X_3-2v_3X_4}{H_{t_i}}\right)^2## ?

Right now, I solve the system by Newton's method for a given ##K_1## and ##K_2##.
It would be better to say that you solve the system "using" a given ##K_1## and ##K_2## for the values ##x_1,x_2,x_3,x_4##, if that's what you mean.

I then use a binary search or brute force to check every ##K_1## and ##K_2## over a range to find the minimum SSR.

It's best to state optimization problems in the standard form: that would be "Minimize ... with respect to choices of ... subject to the constraints: ...".

In your case, the problem appears to be:

Minimize ##S(X_1,X_3,X_4) = \sum_i \left(\frac{v_{0_i}-v_1X_1-v_3X_3-2v_3X_4}{H_{t_i}}\right)^2##
with respect to choices of ##K_1,K_2##
subject to the constraints:
##(X_1,X_2,X_3,X_4) = NM(K_1,K_2)##
where ##NM(K_1,K_2)## is the result of solving some system of equations by Newton's method for the variables ##(X_1,X_2,X_3,X_4)##.
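Stated in this form, the constraint can simply be substituted into the objective and the whole thing handed to a derivative-free minimizer, which sidesteps differentiating through Newton's method entirely. A sketch using SciPy's Nelder-Mead, where `newton_solve`, `data`, `v1`, `v3`, and the starting point are all assumed rather than taken from the thread:

```python
from scipy.optimize import minimize

def objective(K):                                   # K = [K1, K2]
    x1, x2, x3, x4 = newton_solve(K[0], K[1])       # hypothetical solver
    return sum(((v0 - v1*x1 - v3*x3 - 2*v3*x4) / Ht)**2
               for v0, Ht in data)

res = minimize(objective, x0=[1.0, 1.0], method="Nelder-Mead")
K1_opt, K2_opt = res.x                              # x0 is an illustrative guess
```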

To do gradient descent I would need to take the derivative of the SSR function with respect to ##K_1## and ##K_2##.

$$\frac{dSSR}{dK_1}=\frac{dSSR}{dX_1}\frac{dX_1}{dK_1}+ \dots + \frac{dSSR}{dX_4}\frac{dX_4}{dK_1} \\ \frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}$$

The correct notation would use partial derivatives, even though these are a pain in the neck to write in LaTeX.

##\frac{\partial S}{\partial K_1} = \frac{\partial S}{\partial X_1} \frac{\partial X_1}{\partial K_1} + ...##

I can approximate the derivative of Newton's method with respect to ##K_1## with ##(NM(K_1+dK_1)-NM(K_1))/dK_1##, but I have no idea if it is even possible to take the derivative of ##X_n## with respect to Newton's method!

The notation "##\frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}##" makes no sense in the context of your problem.

The vector-valued function ##NM(K_1,K_2)## can be represented as 4 component functions.
##X_1 = NM_1(K_1,K_2)##
##X_2 = NM_2(K_1,K_2)##
##X_3 = NM_3(K_1,K_2)##
##X_4 = NM_4(K_1,K_2)##

##\frac{\partial X_1}{\partial K_1} = \frac{\partial NM_1}{\partial K_1}##
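These partials can be estimated numerically without knowing anything about the inside of the solver. A central-difference sketch, again with a hypothetical `newton_solve`:

```python
import numpy as np

def nm_jacobian(K1, K2, newton_solve, h=1e-6):
    """4x2 matrix of dX_n/dK_j, estimated by central differences."""
    J = np.empty((4, 2))
    J[:, 0] = (np.array(newton_solve(K1 + h, K2))
               - np.array(newton_solve(K1 - h, K2))) / (2*h)
    J[:, 1] = (np.array(newton_solve(K1, K2 + h))
               - np.array(newton_solve(K1, K2 - h))) / (2*h)
    return J
```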
The problem is it takes a long time, about 40 minutes per experiment
Which step of the problem takes a long time? Using Newton's method? There are many tricks that can be used to speed up programs that solve particular sets of equations. For example, the final result of 2 steps of Newton's method can sometimes be implemented more concisely as a single sequence of code. Custom-made programs for solving particular sets of equations can be much faster than using a general-purpose equation solver.
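As an illustration of the custom-solver point, a bare-bones Newton iteration for a 4-variable system; `G` and `jac` stand in for the problem-specific residual vector and its Jacobian:

```python
import numpy as np

def newton(x0, G, jac, tol=1e-10, max_iter=50):
    """Solve G(x) = 0 by Newton's method for a specific small system."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(jac(x), G(x))   # cheaper than forming J^{-1}
        x = x - step
        if np.linalg.norm(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")
```

Solving the linear system directly instead of inverting the Jacobian is both faster and numerically better behaved, which matters when this runs thousands of times inside a parameter sweep.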

Is it possible ? Or does anyone know a different way I can optimize this other than brute force and binary search?
There are quasi-random search methods, such as Simulated Annealing.
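SciPy ships a generalized variant of this; a sketch reusing the `objective` from above, with purely illustrative bounds on ##K_1## and ##K_2##:

```python
from scipy.optimize import dual_annealing

res = dual_annealing(objective, bounds=[(1e-3, 1e3), (1e-3, 1e3)])
K1_opt, K2_opt = res.x
```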

 
Stephen Tashi said:
##\frac{\partial S}{\partial K_1} = \frac{\partial S}{\partial X_1} \frac{\partial X_1}{\partial K_1} + ...##

The notation "##\frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}##" makes no sense in the context of your problem.

The vector-valued function ##NM(K_1,K_2)## can be represented as 4 component functions.
##X_1 = NM_1(K_1,K_2)##
##X_2 = NM_2(K_1,K_2)##
##X_3 = NM_3(K_1,K_2)##
##X_4 = NM_4(K_1,K_2)##

##\frac{\partial X_1}{\partial K_1} = \frac{\partial NM_1}{\partial K_1}##

Thanks for your help!

Ah! I see that I made a big mistake. I think the partial of SSR should be
$$\frac{\partial{SSR}}{\partial{K_1}}=\frac{\partial{SSR}}{\partial{X_1}}\frac{\partial{X_1}}{\partial{K_1}}+ \dots + \frac{\partial{SSR}}{\partial{X_4}}\frac{\partial{X_4}}{\partial{K_1}}$$
Originally I had the partial of ##x_1## with respect to NM, which was weird like you said.
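In code, the corrected chain rule splits naturally: the ##\partial SSR/\partial X_n## factors are analytic because the residual is linear in the ##X_n##, and the ##\partial X_n/\partial K_j## factors can come from finite differences. A sketch reusing `nm_jacobian` and the assumed names from the earlier snippets:

```python
import numpy as np

def dssr_dK(K1, K2, newton_solve, data, v1, v3):
    """Chain rule: dSSR/dK_j = sum_n (dSSR/dx_n)(dx_n/dK_j)."""
    x = np.array(newton_solve(K1, K2))
    dS_dx = np.zeros(4)                     # analytic part; dSSR/dx_2 = 0
    for v0, Ht in data:
        r = (v0 - v1*x[0] - v3*x[2] - 2*v3*x[3]) / Ht
        dS_dx += (2*r/Ht) * np.array([-v1, 0.0, -v3, -2*v3])
    J = nm_jacobian(K1, K2, newton_solve)   # 4x2, from the earlier sketch
    return dS_dx @ J                        # gradient w.r.t. (K1, K2)
```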

Newton's method looks like this: $$x=x_{0}-G(H_t,G_t,K_1,K_2)\,J(K_1,K_2)^{-1}$$
G is a vector consisting of 4 equations, and ##J^{-1}## is the inverse of the Jacobian matrix.
The output for a particular ##x_n## for ##n \in [1,4]## is a large equation, ##x_n=f_n(x_1,x_2,x_3,x_4,K_1,K_2,H_t,G_t)##. I think I could either compute the derivative of this function, a giant mess, or I can approximate the derivative by taking the value of ##x_n## from Newton's method at two values of ##K_1## close together and dividing by ##dK_1##.
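There is a third option between the giant analytic expression and finite differences: implicit differentiation. At a converged root, ##G(x, K_1, K_2) = 0##, so ##J \,\partial x/\partial K_1 = -\partial G/\partial K_1##, and the Jacobian already built for the Newton iteration gives exact sensitivities at the cost of one extra linear solve. A sketch, where `dG_dk1` is a hypothetical function returning ##\partial G/\partial K_1## at the root:

```python
import numpy as np

def dx_dk1(x_star, jac, dG_dk1):
    """Implicit function theorem at a converged root x*:
    J(x*) dx/dK1 = -dG/dK1(x*)."""
    return np.linalg.solve(jac(x_star), -dG_dk1(x_star))
```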

You asked "Which step of the problem takes a long time?"
Newton's method takes the longest, because for whatever initial guess I input, a certain percentage of the data does not converge. I ended up making a systematic method of guessing that tries every possible combination of guesses, but it is a maximum of 100 guesses, worst case, for a choice of ##K_1## and ##K_2##. Plus, for every ##K_1## I have to check a ton of ##K_2## values with this method, so it just takes a long time. I could try to develop better guesses, or lower the number of guesses to 50 or so, but I would rather just use gradient descent if possible because it's a cooler algorithm.
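One hedged suggestion for the guessing problem: when sweeping nearby ##(K_1, K_2)## values, the converged solution from the previous grid point is usually an excellent initial guess for the next one, which can eliminate most of the retries. A sketch on top of the `newton` routine above, with `G_full`, `jac_full`, `k_grid`, and `x_default` all hypothetical:

```python
x_guess = x_default                       # starting guess for the first point
for K1, K2 in k_grid:                     # ordered so neighbors are close
    x = newton(x_guess,
               G=lambda xi: G_full(xi, K1, K2),       # residual with K's bound in
               jac=lambda xi: jac_full(xi, K1, K2))   # matching Jacobian
    x_guess = x                           # warm start the next grid point
```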
 