# Question about Least Squares Fitting

1. Aug 7, 2011

### bhr11

Hey,

I have a graph for which I'm supposed to fit two linear least-squares lines and minimize the combined residuals (the lines intersect)... I would really appreciate some info about how to do this, or what this type of data analysis is called, so I can google the step-by-step method.

Thanks!

Last edited: Aug 7, 2011
2. Aug 8, 2011

### hotvette

Do you know which points are supposed to go with each line, or is that part of the task?

3. Aug 8, 2011

### bhr11

Yeah, I know which points go with each line.

4. Aug 8, 2011

### Ray Vickson

Your question is not 100% clear, but I will *assume* you have y = a + b*x for x < r and y = g + h*x for x > r. If r is known, and if y should be continuous at x = r, then you can write y = a + b*r + c*(x-r) for x > r. So if points 1,...,m are for x < r and points m+1,...,n are for x > r, you need to minimize

S = sum{(y_j - a - b*x_j)^2: j=1,...,m} + sum{(y_j - a - b*r - c*(x_j - r))^2: j=m+1,...,n}.

If r is known, this is a function of a, b and c. If estimating r is itself part of the problem, then r is also a variable in the optimization, and in that case you have a nonlinear least-squares problem because of the terms b*r and c*r. However, many efficient and effective solution methods exist, including, for example, the EXCEL Solver tool.
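For a known r, the sum S above is an ordinary linear least-squares problem in a, b and c. A minimal sketch in Python, assuming NumPy is available; the function name `fit_fixed_r` and the sample data are made up for illustration:

```python
import numpy as np

def fit_fixed_r(x, y, r):
    """Fit y = a + b*x (x <= r) and y = a + b*r + c*(x - r) (x > r) for a known r.

    Returns the coefficients (a, b, c) and the residual sum of squares S.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each row holds the multipliers of (a, b, c) for one data point:
    # min(x, r) is x on the left and r on the right; max(x - r, 0) is 0 on the left.
    A = np.column_stack([np.ones_like(x), np.minimum(x, r), np.maximum(x - r, 0.0)])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    S = float(np.sum((y - A @ coeffs) ** 2))
    return coeffs, S

# Hypothetical data lying exactly on y = 1 + 2x (x <= 3) and y = 7 - (x - 3) (x > 3).
x = [0, 1, 2, 4, 5, 6]
y = [1, 3, 5, 6, 5, 4]
(a, b, c), S = fit_fixed_r(x, y, r=3.0)
print(a, b, c, S)
```

Because the data were built from a broken line with its kink at r = 3, the fit should recover a, b, c close to 1, 2, -1 with S essentially zero.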

Good luck.

RGV

5. Aug 8, 2011

### hotvette

In that case, it seems to me that you have two totally separate least squares problems that can be analyzed independently of one another. The formula for the least-squares line through a set of data points is at the bottom of the following link:

http://www.ies.co.jp/math/java/misc/least_sq/least_sq.html [Broken]
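Since the link no longer works, here is a short sketch of the standard closed-form straight-line fit, applied to each point set separately, plus the x-coordinate where the two fitted lines cross. It assumes NumPy; the helper names `fit_line` and `intersection_x` and the sample points are illustrative only:

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope*x through the points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

def intersection_x(line1, line2):
    """x where a1 + b1*x equals a2 + b2*x; undefined for parallel lines."""
    (a1, b1), (a2, b2) = line1, line2
    if b1 == b2:
        raise ValueError("lines are parallel, no unique intersection")
    return (a2 - a1) / (b1 - b2)

# Hypothetical point sets, one per line.
left = fit_line([0, 1, 2], [1, 3, 5])    # points on y = 1 + 2x
right = fit_line([4, 5, 6], [6, 5, 4])   # points on y = 10 - x
print(left, right, intersection_x(left, right))
```

For these made-up points the two lines cross at x = 3, which lies between the two groups of points.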

Last edited by a moderator: May 5, 2017
6. Aug 8, 2011

### bhr11

Hotvette - No that wouldn't work but thanks anyways

RGV - That method is exactly what I need. Thank you. Just wondering what it's called. Also, how would I go about minimizing S (assuming I have r)? Would I take the derivatives wrt a, b, c, set them to zero, solve for a, b, c, and then plug them back into the sum equation? Would the Excel Solver tool allow me to do this?

I would really appreciate any input

Last edited: Aug 9, 2011
7. Aug 9, 2011

### Ray Vickson

If the problem is as I described in my previous response, then your suggestion can lead to an incorrect solution; I have an example where that happens. The intersection of the two lines lies in the wrong place; that is, the point where we need to switch from formula 1 to formula 2 lies inside one of the regions, so the wrong formula is applied at some data points. Actually, in my previous response I neglected the necessary constraint x_m <= r <= x_{m+1} in the variable-r case. If we omit this constraint the solution is the same as yours, but sometimes this is wrong.

RGV

Last edited: Aug 9, 2011
8. Aug 9, 2011

### Ray Vickson

I don't know if the method has a name; it is just one of the standard types of problem examined in an optimization course, for example. I constructed a fake example with points X1 = [0.5, 1.2, 3.1, 3.8, 4.5] for the first list and X2 = [6, 7.2, 8.1, 8.9, 9.3] for the second list. So, we need 4.5 <= r <= 6 in the previous notation.

If you know the value of r you can set dS/da = 0, etc., and solve the linear system. If r is also unknown you also need to try using the condition dS/dr = 0. This will give a slightly nonlinear system to solve, which might be nasty in some cases. Possibly the value of r obtained in this way will violate the required constraint, in which case the optimal value of r will lie at one of the two endpoints (either r = 4.5 or r = 6 in my case), so this would just need the solution of two fixed-r problems.

However, if you use an optimization package, all that is unnecessary: just ask to minimize S(a,b,c) [or S(a,b,c,r)] directly. For example, in EXCEL you put a, b, etc., in some cells and the final formula for S in some target cell, then ask Solver to minimize the target cell by varying the "variable cell" entries. If you have constraints such as 4.5 <= r <= 6, you just add them as r >= 4.5 and r <= 6 separately. (Solver works most efficiently if constraints are written with all variables on the left and only constants on the right.) For highly nonlinear problems it is advisable to help Solver by giving a reasonable starting point for at least some of the variables. For example, you could supply a starting value of r, such as r = 5, and let Solver correct that value. (For the case where r is not variable, you just have a purely quadratic unconstrained optimization, which Solver handles without any problem.)
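As a rough stand-in for handing S(a,b,c,r) to a solver, the sketch below scans r over the allowed interval 4.5 <= r <= 6 and solves the linear fixed-r problem at each trial value, keeping the r with the smallest S. This is a plain grid search, not what Excel Solver does internally. It assumes NumPy; the x values are the ones from the post, but the y values are invented (generated without noise from a broken line with its kink at r = 5), since the post gives none:

```python
import numpy as np

def fit_fixed_r(x, y, r):
    """Solve the linear least-squares problem for a fixed r; return (a, b, c) and S."""
    A = np.column_stack([np.ones_like(x), np.minimum(x, r), np.maximum(x - r, 0.0)])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs, float(np.sum((y - A @ coeffs) ** 2))

# x values from the post (X1 followed by X2).
x = np.array([0.5, 1.2, 3.1, 3.8, 4.5, 6.0, 7.2, 8.1, 8.9, 9.3])
# ASSUMPTION: made-up, noise-free y values from a = 1, b = 0.5, c = 1.5, r = 5.
true_r = 5.0
y = 1.0 + 0.5 * np.minimum(x, true_r) + 1.5 * np.maximum(x - true_r, 0.0)

# Trial breakpoints restricted to the constraint 4.5 <= r <= 6.
r_grid = np.linspace(4.5, 6.0, 151)
S_values = [fit_fixed_r(x, y, r)[1] for r in r_grid]
best_r = float(r_grid[int(np.argmin(S_values))])
(a, b, c), S = fit_fixed_r(x, y, best_r)
print(best_r, a, b, c, S)
```

With real, noisy data the minimum may sit at one of the endpoints of the grid, which corresponds to the endpoint case described above.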

RGV

Last edited: Aug 9, 2011
9. Aug 10, 2011

### Ray Vickson

Sorry for the weird formatting. For the past few days my computer was unavailable, so I had to do all my postings from an iPhone, and that produced what you see above.

RGV

10. Aug 15, 2011

### hotvette

Seems to me the following approach is the least amount of work:

1. Solve independent least squares problems as stated in post #5

2. If the intersection of the two lines is within the required interval, problem finished

3. If the intersection is outside the required interval, use the method from post #4 for fixed r at the boundary of the interval closest to the intersection point from the previous step.
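The three steps can be sketched as follows, assuming NumPy. The function names and the sample data are hypothetical. One deliberate departure from step 3: rather than assume the nearer boundary is the better one, the sketch solves the fixed-r problem at both boundaries and keeps the smaller S, which also covers the case of parallel lines.

```python
import numpy as np

def fit_fixed_r(x, y, r):
    """Continuous two-segment fit with the breakpoint fixed at r; returns (a, b, c)."""
    x = np.asarray(x, dtype=float)
    A = np.column_stack([np.ones_like(x), np.minimum(x, r), np.maximum(x - r, 0.0)])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return tuple(coeffs)

def sum_sq(params, r, x, y):
    """Combined residual sum of squares S for given (a, b, c) and r."""
    a, b, c = params
    x = np.asarray(x, dtype=float)
    model = a + b * np.minimum(x, r) + c * np.maximum(x - r, 0.0)
    return float(np.sum((np.asarray(y, dtype=float) - model) ** 2))

def two_line_fit(x1, y1, x2, y2):
    x = np.concatenate([x1, x2])
    y = np.concatenate([y1, y2])
    lo, hi = float(np.max(x1)), float(np.min(x2))  # required interval for r
    # Step 1: independent fits (np.polyfit returns slope first, then intercept).
    b1, a1 = np.polyfit(x1, y1, 1)
    b2, a2 = np.polyfit(x2, y2, 1)
    # Step 2: accept the independent fits if they cross inside [lo, hi].
    if b1 != b2:
        r = (a2 - a1) / (b1 - b2)
        if lo <= r <= hi:
            params = (a1, b1, b2)
            return params, r, sum_sq(params, r, x, y)
    # Step 3 (cautious variant): try both boundaries, keep the smaller S.
    candidates = [(fit_fixed_r(x, y, r), r) for r in (lo, hi)]
    params, r = min(candidates, key=lambda pr: sum_sq(pr[0], pr[1], x, y))
    return params, r, sum_sq(params, r, x, y)

# Hypothetical data whose independent fits cross at x = 3, inside [2, 4].
print(two_line_fit([0, 1, 2], [1, 3, 5], [4, 5, 6], [6, 5, 4]))
```

Solving both boundary problems costs one extra small linear solve, so little is lost by not relying on the closest-boundary rule.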

11. Aug 15, 2011

### Ray Vickson

This method is actually the same as the constrained version with variable r: if we neglect the constraints on r and set dS/dr = 0 (along with the others), we get a system of equations that has essentially the same solution for a, b, c and the intercept as we would get from two separate least-squares fits. (I did not mention this fact in my earlier posts.) Then, if the intersection point is feasible, we are done; otherwise, we solve the known-r versions, which involve just linear equations. I am not sure the solution is always to take the boundary point closest to the infeasible unconstrained value, although that does seem intuitively reasonable. In any case, solving two problems (one for each boundary point) does not seem onerous. [It *would* be true that taking the closest boundary point is optimal if the level surfaces of S(a,b,c,r) were convex, but with S having (possibly) non-convex terms, this is no longer automatic. Maybe it is still OK if there are only "slight" non-convexities, but that would need more investigation, and it hardly seems worth doing.]
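One way to see why the unconstrained variable-r problem reduces to two separate fits, sketched under the assumption that the two slopes differ (b ≠ c). The symbols d_j and e_j are shorthand introduced here for the residuals on the first and second point sets:

$$
d_j = y_j - a - b x_j \quad (j \le m), \qquad
e_j = y_j - a - b r - c\,(x_j - r) \quad (j > m)
$$

$$
\begin{aligned}
\frac{\partial S}{\partial r} &= -2\,(b - c) \sum_{j>m} e_j = 0, \\
\frac{\partial S}{\partial c} &= -2 \sum_{j>m} (x_j - r)\, e_j = 0, \\
\frac{\partial S}{\partial a} &= -2 \Big( \sum_{j \le m} d_j + \sum_{j>m} e_j \Big) = 0, \\
\frac{\partial S}{\partial b} &= -2 \Big( \sum_{j \le m} x_j d_j + r \sum_{j>m} e_j \Big) = 0.
\end{aligned}
$$

With b ≠ c, the first condition gives sum e_j = 0 over j > m. Substituting that into the other three leaves sum x_j e_j = 0 over j > m, and sum d_j = 0 and sum x_j d_j = 0 over j <= m. These are the normal equations of two independent straight-line fits, where the second line has slope c and intercept a + (b - c)*r, so r is the x-coordinate at which the two fitted lines intersect.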

RGV