Question about Least Squares Fitting

In summary, the Solver Tool can be used to minimize the residual sum S as a function of a, b, and c (and of r, if the breakpoint is unknown). If r is known, the problem is an ordinary linear least-squares problem; if r must also be estimated, the problem becomes nonlinear.
  • #1
bhr11
Hey,

I have a graph for which I am supposed to fit two least-squares lines and minimize the combined residuals (the lines intersect)... I would really appreciate some info about how to do this, or what this type of data analysis is called so I can google the step-by-step method.

Thanks!
 
  • #2
Do you know which points are supposed to go with each line, or is that part of the task?
 
  • #3
Yeah, I know which points go with each line
 
  • #4
Your question is not 100% clear, but I will *assume* you have y = a + b*x for x < r and y = g + h*x for x > r. If r is known, and if y should be continuous at x = r, then you can write y = a + b*r + c*(x - r) for x > r. So if points 1,...,m are for x < r and m+1,...,n are for x > r, you need to minimize

S = sum{(y_j - a - b*x_j)^2 : j = 1,...,m} + sum{(y_j - a - b*r - c*(x_j - r))^2 : j = m+1,...,n}.

If r is known, this is a function of a, b and c. If estimation of r itself is part of the problem, then r is also a variable in the optimization, but in that case you have a nonlinear least-squares problem because of the presence of the terms b*r and c*r. However, many efficient and effective solution methods exist, including, for example, the EXCEL Solver Tool.
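This fixed-r case can be sketched in a few lines of Python with NumPy (not part of the original post): the shared parameters (a, b, c) enforce continuity at x = r, so one ordinary least-squares solve over a stacked design matrix minimizes S. The data and r = 5.0 below are made-up illustrative values.

```python
import numpy as np

def fit_piecewise_fixed_r(x, y, r):
    """Least-squares fit of a continuous two-piece line with known breakpoint r.

    Model: y = a + b*x for x < r, and y = a + b*r + c*(x - r) for x >= r.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    left = x < r
    # Design matrix rows: [1, x_j, 0] for x_j < r, and [1, r, x_j - r] otherwise.
    A = np.zeros((len(x), 3))
    A[:, 0] = 1.0
    A[left, 1] = x[left]
    A[~left, 1] = r
    A[~left, 2] = x[~left] - r
    (a, b, c), _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    return a, b, c

# Example: noise-free points with a true kink at r = 5, slopes 1 and -0.5.
x = np.array([0.5, 1.2, 3.1, 3.8, 4.5, 6.0, 7.2, 8.1, 8.9, 9.3])
r = 5.0
y = np.where(x < r, 1.0 + 1.0 * x, 1.0 + 1.0 * r - 0.5 * (x - r))
a, b, c = fit_piecewise_fixed_r(x, y, r)
print(a, b, c)  # recovers a = 1, b = 1, c = -0.5 for this noise-free data
```

Because the residuals are linear in (a, b, c), no iterative solver is needed here; the nonlinearity only appears once r becomes a variable.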

Good luck.

RGV
 
  • #5
bhr11 said:
Yeah, i know which points go with each line

In that case, it seems to me that you have two totally separate least-squares problems that can be analyzed independently of one another. The formula for the least-squares line through a set of data points is at the bottom of the following link:

http://www.ies.co.jp/math/java/misc/least_sq/least_sq.html [Broken]
 
  • #6
Hotvette - No that wouldn't work but thanks anyways

RGV - That method is exactly what I need. Thank you. Just wondering, what is it called? Also, how would I go about minimizing S (assuming I have r)? Would I take the derivatives wrt a, b, c, solve for a, b, c, and then plug them back into the sum? Would the Excel Solver Tool allow me to do this? I would really appreciate any input.
 
  • #7
If the problem is as I described in my previous response, then your suggestion can lead to an incorrect solution; I have an example where that happens (because the intersection of the two lines lies in the wrong place; that is, the point where we need to switch from formula 1 to formula 2 lies inside one of the regions, so the wrong formula is applied at some data points). Actually, in my previous response I neglected the necessary constraint x_m <= r <= x_{m+1} in the variable-r case. If we omit this constraint, the solution is the same as yours, but sometimes this is wrong.

RGV
 
  • #8
I don't know if the method has a name; it is just one of the standard types of problems examined in an optimization course, for example. I constructed a fake example with points X1 = [0.5, 1.2, 3.1, 3.8, 4.5] for the first list and X2 = [6, 7.2, 8.1, 8.9, 9.3] for the second list, so we need 4.5 <= r <= 6 in the previous notation.

If you know the value of r, you can set dS/da = 0, etc., and solve the linear system. If r is also unknown, you can also try using the condition dS/dr = 0. This gives a slightly nonlinear system to solve, which might be nasty in some cases. The value of r obtained in this way may violate the required constraint, in which case the optimal value of r will lie at one of the two endpoints (either r = 4.5 or r = 6 in my case), so this would just need the solution of two fixed-r problems.

However, if you use an optimization package, all that is unnecessary: just ask it to minimize S(a,b,c) [or S(a,b,c,r)] directly. For example, in EXCEL you put a, b, etc., in some cells and the final formula for S in a target cell, then ask Solver to minimize the target cell by varying the "variable cell" entries. If you have constraints such as 4.5 <= r <= 6, you add them as r >= 4.5 and r <= 6 separately. (Solver works most efficiently if constraints are written with all variables on the left and only constants on the right.) For highly nonlinear problems it is advisable to help Solver by giving a reasonable starting point for at least some of the variables. For example, you could supply a starting value of r, such as r = 5, and let Solver correct that value. (For the case where r is not a variable, you just have a purely quadratic unconstrained optimization, which Solver handles without any problem.)

RGV
 
  • #9
Sorry for the weird formatting. For the past few days my computer was unavailable, so I had to do all my posting from an iPhone, and that produced what you see above.

RGV
 
  • #10
Seems to me the following approach requires the least work:

1. Solve independent least squares problems as stated in post #5

2. If the intersection of the two lines is within the required interval, problem finished

3. If the intersection is outside the required interval, use the method from post #4 with r fixed at the boundary of the interval closest to the intersection point from the previous step.
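A minimal Python sketch of these three steps (my own illustration; the interval [4.5, 6] and the data are taken as assumptions):

```python
import numpy as np

# Step 0: the two point sets, with y-values lying on known lines for a
# checkable example (line 1: y = 2 + 0.8x; line 2: y = 7.5 - 0.3x).
x1 = np.array([0.5, 1.2, 3.1, 3.8, 4.5]); y1 = 2.0 + 0.8 * x1
x2 = np.array([6.0, 7.2, 8.1, 8.9, 9.3]); y2 = 7.5 - 0.3 * x2

# Step 1: two independent ordinary least-squares line fits.
q1, p1 = np.polyfit(x1, y1, 1)  # slope, intercept
q2, p2 = np.polyfit(x2, y2, 1)

# Step 2: intersection of the two fitted lines: p1 + q1*r = p2 + q2*r.
r_hat = (p2 - p1) / (q1 - q2)
lo, hi = 4.5, 6.0
if lo <= r_hat <= hi:
    print("done: breakpoint at r =", r_hat)
else:
    # Step 3: refit with r fixed at the nearer endpoint (post #4's method).
    r_fixed = lo if r_hat < lo else hi
    print("intersection infeasible; refit with r =", r_fixed)
```

For these exactly collinear points the intersection comes out at r = 5, which is inside [4.5, 6], so step 3 is not needed.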
 
  • #11
hotvette said:
Seems to me the following approach requires the least work:

1. Solve independent least squares problems as stated in post #5

2. If the intersection of the two lines is within the required interval, problem finished

3. If the intersection is outside the required interval, use the method from post #4 with r fixed at the boundary of the interval closest to the intersection point from the previous step.

This method is actually equivalent to the variable-r version: if we neglect the constraints on r and set dS/dr = 0 (along with the others), we get a system of equations that has essentially the same solutions for a, b, c and the intersection point as two separate least-squares fits. (I did not point this out before.) Then, if the intersection point is feasible, we are done; otherwise, we solve the known-r versions, which involve only linear equations. I am not sure that the solution is always the boundary point closest to the infeasible unconstrained value, although that does seem intuitively reasonable. In any case, solving two problems (one for each boundary point) is not onerous. [It *would* be true that taking the closest boundary point is optimal if the level surfaces of S(a,b,c,r) were convex, but since S has (possibly) non-convex terms, this is no longer automatic. Maybe it is still OK if the non-convexities are only "slight", but that would need more investigation, and it hardly seems worth doing.]

RGV
 

1. What is Least Squares Fitting?

Least Squares Fitting is a statistical method used to find the best fit line or curve for a set of data points. It is commonly used in regression analysis to determine the relationship between two variables.

2. How does Least Squares Fitting work?

Least Squares Fitting works by minimizing the sum of the squared differences between the predicted values from the fitted line or curve and the actual data points. This is achieved by adjusting the parameters of the line or curve until the sum of the squared differences is minimized.
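As a minimal illustration of this idea (an added sketch, not part of the original answer), one can fit y = a + b*x by solving the closed-form normal equations; the made-up points below lie exactly on a line so the answer is checkable.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.5 + 2.0 * x  # points exactly on a line, for a verifiable result

# Minimize ||A @ [a, b] - y||^2 with design matrix A = [1, x].
A = np.column_stack([np.ones_like(x), x])
# Normal equations: (A^T A) [a, b] = A^T y
a, b = np.linalg.solve(A.T @ A, A.T @ y)
print(a, b)  # a = 1.5, b = 2.0
```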

3. What are the assumptions of Least Squares Fitting?

The assumptions of Least Squares Fitting include a linear relationship between the variables, normally distributed errors, and constant variance of errors. It also assumes that the errors are independent of each other and that there are no influential outliers in the data.

4. What is the difference between Ordinary Least Squares and Weighted Least Squares?

Ordinary Least Squares assumes that all data points have equal importance in determining the best fit line or curve, while Weighted Least Squares takes into account the variability of the data points and assigns weights accordingly. This means that data points with lower variability have a higher weight in the fitting process.
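A small sketch of the weighted variant (added here for illustration; the data and weights are invented): each squared residual is multiplied by a weight w_j, e.g. the reciprocal of that point's variance, so noisier points influence the fit less.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 9.0])   # roughly y = 1 + 2x
w = np.array([1.0, 1.0, 1.0, 0.1, 0.1])   # downweight the last two points

A = np.column_stack([np.ones_like(x), x])
W = np.diag(w)
# Weighted normal equations: (A^T W A) [a, b] = A^T W y
a, b = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
print(a, b)
```

Setting all weights equal to 1 reduces this to ordinary least squares.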

5. When is Least Squares Fitting not appropriate?

Least Squares Fitting may not be appropriate when the assumptions of the method are violated. This can happen when the relationship between the variables is non-linear, the errors are not normally distributed, or there are influential outliers in the data. In these cases, alternative methods such as nonlinear regression or robust regression may be more suitable.
