Solving Equation 1 with the Secant Method

In summary, the article explains how to fit a curve to data points and how to solve a least squares problem to find estimates for a and b. Equation 1 is the equation that is fit to the data points, and Equation 2 is the least squares objective that is minimized to find the estimates for a and b.
  • #1
a.mlw.walker
So I have studied some of the numerical methods such as Newton-Raphson, but I want to try to solve this particular problem using the secant method. The problem is something I am reading about in an article, and I am trying to follow the maths through.

I am wondering, though, how it is done.

Equation 1 (attached) is the equation I start with. The data sets I have are 50 values for [itex]t(k \cdot 2\pi)[/itex]. I also know [itex]C_0[/itex], but I am trying to find estimates for a and b.

Please can someone explain how this may be done.

The article explains fitting the equation to a curve, and states that a, b and [itex]C_0[/itex] in equation 2 (attached) appear non-linearly.

It then goes on to say that, once the curve has been fitted by minimizing equation 2, the result becomes equation 3 (attached).

Does anyone understand what this means, and what I would need to do to solve equation 1 with the secant method for a and b?

Thank you
 

Attachments

  • equation1.bmp
  • equation2.bmp
  • equation3.bmp
  • #2
I can explain equation 2. If I have a series of data points [itex](x_i, y_i)[/itex] that I want to fit with a function y = f(x,a,b) in a least squares sense, the task is to minimize the sum of squares of the vertical distances between each data point and the curve (i.e., y − f). Thus, I'm trying to minimize:

[tex] S(a,b) = \sum \big( y_i - f(x_i,a,b) \big)^2 [/tex]
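For concreteness, here is a minimal Python sketch of evaluating that objective. The model f(x,a,b) = a·exp(−bx) and the data points are made up purely for illustration, since we can't see equation 1:

[code]
import numpy as np

# Made-up example data (x_i, y_i); the real data would replace these.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.2, 0.7, 0.4, 0.25])

def f(x, a, b):
    # Stand-in fitting function; substitute the actual equation 1 here.
    return a * np.exp(-b * x)

def S(a, b):
    # Sum of squared vertical distances between the data and the curve.
    return np.sum((y - f(x, a, b)) ** 2)

print(S(2.0, 0.5))  # value of the objective at a trial (a, b)
[/code]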

I have no idea about equation 3. Question: can you post the article or the relevant section?
 
  • #3
If you want to know the steps for solving nonlinear least squares problems, they are summarized below:

1. Linearize the function [itex] f(x_i,a,b) [/itex] at initial guesses for [itex]a = a_0[/itex] and [itex]b = b_0[/itex] using the first two terms of the Taylor series:

[tex] f \approx f_0 + f_a \delta a + f_b \delta b [/tex]

where [itex] f_a = \partial f / \partial a [/itex] and [itex] f_b = \partial f / \partial b [/itex]

2. Form the least squares problem using the linearized function:

[tex] F =\sum ( f_0 + f_a \delta a + f_b \delta b - y_i )^2 [/tex]

3. To minimize F, set partial derivatives of F with respect to [itex] \delta a [/itex] and [itex] \delta b [/itex] equal to zero:

[tex] \begin{align*}
F_{\delta a} &= 0 = 2 \sum ( f_0 + f_a \delta a + f_b \delta b - y_i ) f_a \\
F_{\delta b} &= 0 = 2 \sum ( f_0 + f_a \delta a + f_b \delta b - y_i ) f_b
\end{align*} [/tex]

The net result is the following normal equation for nonlinear least squares analysis:

[tex] J^T J z = J^T (y - f) [/tex]

where [itex] z = \begin{bmatrix} \delta a & \delta b \end{bmatrix}^T [/itex] and J is the Jacobian of first partial derivatives of f.

4. Solve the linear equation from step 3 for z and update guesses for a & b:

[tex] \begin{align*}
a_1 &= a_0 + \delta a \\
b_1 &= b_0 + \delta b
\end{align*} [/tex]

5. Update J and f based on the updated values for a and b.

6. Repeat steps 4 & 5 until [itex] \delta a [/itex] and [itex] \delta b [/itex] are sufficiently small (a sketch of this loop in code is given below).
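Here is a minimal Python sketch of steps 1–6. The model and its partial derivatives are stand-ins (again a·exp(−bx)), since the actual equation 1 isn't visible here; the loop structure is what matters:

[code]
import numpy as np

def f(x, a, b):
    # Stand-in fitting function; substitute the real equation 1.
    return a * np.exp(-b * x)

def jacobian(x, a, b):
    fa = np.exp(-b * x)             # f_a = df/da
    fb = -a * x * np.exp(-b * x)    # f_b = df/db
    return np.column_stack([fa, fb])

def gauss_newton(x, y, a0, b0, tol=1e-8, max_iter=50):
    a, b = a0, b0
    for _ in range(max_iter):
        J = jacobian(x, a, b)            # Jacobian at current guesses
        r = y - f(x, a, b)               # residuals (y - f)
        # Normal equations: J^T J z = J^T (y - f), z = [da, db]
        z = np.linalg.solve(J.T @ J, J.T @ r)
        a, b = a + z[0], b + z[1]        # step 4: update the guesses
        if np.all(np.abs(z) < tol):      # step 6: da, db sufficiently small
            break
    return a, b
[/code]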
 
Last edited:
  • #4
The tie to Levenberg-Marquardt is as follows. The (square) matrix [itex] J^T J [/itex] can often be ill-conditioned at initial guesses of the function coefficients, making the equation [itex] J^T J z = J^T (y-f) [/itex] difficult if not impossible to solve. The (simpler) version of Levenberg-Marquardt modifies the problem by adding a small number [itex] \lambda [/itex] to the diagonals of [itex] J^T J [/itex], resulting in the following modified equation:

[tex] (J^T J + \lambda I) z = J^T (y-f) [/tex]

which is more numerically stable the larger [itex] \lambda [/itex] is. The trick is to make [itex] \lambda [/itex] as small as possible to solve the problem, then gradually reduce its magnitude to zero as the solution progresses.

One other note. The matrix [itex] J^T J [/itex] doesn't need to be computed at all if QR matrix factorization is used. Then, all that's needed is to factor [itex] J = QR [/itex] then solve the triangular system [itex] R z = Q^T (y-f) [/itex]. See link below for more explanation.

http://www.alkires.com/teaching/ee103/Rec8_LLSAndQRFactorization.htm

The next question might be how to add [itex] \lambda [/itex] to the diagonals of [itex]J^T J[/itex] if [itex]J^T J[/itex] is never computed? Actually, it is done by augmenting J and (y-f) as follows:

[tex] J^* = \begin{bmatrix} J \\ \sqrt{\lambda} I \end{bmatrix}
\qquad \qquad
(y-f)^* = \begin{bmatrix} (y-f) \\ 0 \end{bmatrix} [/tex]

Then factor [itex] J^* = QR [/itex] and solve [itex] R z = Q^T (y-f)^* [/itex].
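Here is a minimal NumPy sketch of one such damped step via the augmented QR form, assuming J and the residual (y − f) have already been evaluated at the current guesses:

[code]
import numpy as np

def lm_step(J, r, lam):
    # One Levenberg-Marquardt step; J is the m x n Jacobian, r = (y - f),
    # lam is the damping parameter lambda.
    m, n = J.shape
    # Augment J with sqrt(lambda)*I and the residual with zeros, so that
    # J^T J + lambda*I is never formed explicitly.
    J_aug = np.vstack([J, np.sqrt(lam) * np.eye(n)])
    r_aug = np.concatenate([r, np.zeros(n)])
    # Factor J* = QR, then solve the triangular system R z = Q^T (y - f)*.
    Q, R = np.linalg.qr(J_aug)
    z = np.linalg.solve(R, Q.T @ r_aug)
    return z  # z = [delta a, delta b]
[/code]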

This is probably more than you wanted to know...
 
Last edited:
  • #5
So in layman's terms, can you see how I can use Levenberg-Marquardt to solve equation 1 to find approximations of a and b, if I have lots of data for [itex]t_k[/itex]?
 
  • #6
Yes, but I presume you mean ordered pairs [itex]\big( k_i, t(k_i) \big) [/itex].
 
  • #7
Well, if [itex]t_k[/itex] = equation 1, I want to solve equation 2 for a and b. Is that the same?
 
  • #8
Sort of. We need to be careful with terminology: [itex]t_k[/itex] is an experimentally determined data point. Equation 1 is the idealized function that is supposed to represent the data, but doesn't do so precisely because of experimental error associated with the data points. The method to estimate the parameters a & b is least squares. Equation 2 is the least squares problem to solve based on the desired fitting function (i.e. equation 1).

Thus, if you have a bunch of experimentally determined data points [itex]t_k[/itex], estimates of the parameters a & b can be found by minimizing equation 2. I hope this helps explain the situation.

One thing that seems odd is the use of k in equation 2. The index k is meant to represent the index of data points, but k is also used within the fitting function itself (i.e. equation 1). Either that's a typo or the "x" values associated with the data points are really 0, 1, 2, ..., n.
 
  • #9
OK, will check the k issue, thanks for that. So what you are saying is once I have the data points for [itex]t_k[/itex], I can use the least squares method to solve for nonlinear estimates of a and b.

So it may just be syntax, but the [itex]S(a,b,c_0)[/itex] confuses me a bit; does that mean I am solving for a, b and [itex]c_0[/itex]? Does the normal least squares method work for finding estimates of two unknown variables?
 
  • #10
So what you are saying is once I have the data points for [itex]t_k[/itex], I can use the least squares method to solve for nonlinear estimates of a and b.

Correct. Least squares can be used to find estimates for a and b. But, since Equation 1 is nonlinear in a & b, the nonlinear least squares algorithm needs to be used, which means an iterative solution process based on initial guesses for a & b.

So it may just be syntax, but the [itex]S(a,b,c_0)[/itex] confuses me a bit; does that mean I am solving for a, b and [itex]c_0[/itex]?

Hard to say without seeing the article, but it may be subtle terminology. Out of context I would read equation 2 as involving three unknowns. But you said in an earlier post that you already knew [itex]c_0[/itex]. If that's true, I would write equation 2 as S(a,b) = ...

Does the normal least squares method work for finding estimates of two unknown variables?

Least squares can be used for finding estimates for as many unknowns as you have (and the computational resources to calculate). I've heard of problems with thousands of unknowns or more.
 
Last edited:
  • #11
OK, so will I end up with a couple of simultaneous equations to get each approximation for a and b?
 
  • #12
Correct, but it is an iterative process because the problem is nonlinear. You start with initial guesses for a & b. Then solve the least squares problem for [itex]\delta a[/itex] and [itex]\delta b[/itex] based on a linearized version of equation 1, update the values of a & b, and repeat until you have converged values for a & b. That's the way nonlinear least squares works.

I think you are at a disadvantage because you are jumping into a nonlinear least squares problem before understanding the basics of linear least squares.
 
  • #13
Hi again, sorry long delay.

Have been looking over what I have on least squares curve fitting, and I was wondering how to tell what order my curve fit will have to be to give an accurate representation of the data. The article I got it from used that Levenberg-Marquardt method; did that mean they didn't need to worry about the order, it just found a good result for a, b and [itex]c_0[/itex]? My notes on least squares say that the final equation representing the data will be a quadratic?
 
  • #14
This is confusing. Please explain what you mean by "order" and "quadratic". It almost sounds like you are talking about polynomial fitting functions.

Can't you post the relevant pages of the article? You are asking how to interpret an article we can't see.
 
  • #15
Sorry, I was confusing both of us when I tried to match your post from Sep 21 '09, 02:56 PM with my notes on least squares. I will have to find a good tutorial on this Levenberg-Marquardt thing now. I understand that it is doing linear regression and then the Gauss-Newton method to minimize, but I'm not too sure about the Gauss-Newton bit at the moment.
 
  • #16
Wow, I never saw that tutorial at the bottom of all your posts:

3. Linear Least Squares, Non-Linear Least Squares, Optimization

Better have a read.
 

What is the Secant Method?

The Secant Method is a numerical method used to find the root of a function. It is an iterative process that approximates the root by using two initial guesses and calculating a new estimate from the slope of the secant line through the two points.
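As a quick illustration, here is a minimal Python sketch of the iteration; the example function and the initial guesses are made up:

[code]
def secant(f, x0, x1, tol=1e-10, max_iter=100):
    """Find a root of f from two initial guesses x0, x1."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:                  # flat secant line; cannot proceed
            break
        # New estimate from the slope of the secant line
        # through (x0, f0) and (x1, f1).
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:        # estimates barely changing: converged
            return x2
        x0, x1 = x1, x2
    return x1

# Example: root of x^2 - 2 (i.e. sqrt(2)) from guesses 1 and 2.
print(secant(lambda x: x * x - 2.0, 1.0, 2.0))
[/code]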

How does the Secant Method differ from other root-finding methods?

Unlike Newton's Method, the Secant Method does not require the function's derivative to be available. Unlike the Bisection Method, it does not need two initial guesses that bracket the root. However, it is not guaranteed to converge and can be sensitive to the initial guesses.

When is the Secant Method most useful?

The Secant Method is most useful when the function's derivative is unavailable or expensive to evaluate, since it approximates the derivative from function values alone. It is also useful when the function is not easily solvable by hand.

What is the convergence rate of the Secant Method?

The convergence rate of the Secant Method is approximately 1.618, also known as the golden ratio. This means that, near the root, the number of correct digits grows by a factor of about 1.618 with each iteration, somewhat slower than Newton's Method, which roughly doubles the correct digits each step.

How do I know if the Secant Method has converged to the root?

To determine if the Secant Method has converged to the root, you can check if the current estimate is close enough to the previous estimate. If the difference between the two estimates is smaller than a predetermined tolerance level, then the method can be considered to have converged. It is also important to check if the function value at the estimate is close to zero.
