Question about Least Squares Fitting

  1. Aug 7, 2011 #1
    Hey,

    I have a graph to which I'm supposed to fit two least-squares lines while minimizing the combined residuals (the lines intersect). I would really appreciate some info about how to do this, or what this type of data analysis is called, so I can google the step-by-step method.

    Thanks!
     
    Last edited: Aug 7, 2011
  3. Aug 8, 2011 #2

    hotvette

    Homework Helper

    Do you know which points are supposed to go with each line, or is that part of the task?
     
  4. Aug 8, 2011 #3
    Yeah, I know which points go with each line.
     
  5. Aug 8, 2011 #4

    Ray Vickson

    Science Advisor
    Homework Helper

    Your question is not 100% clear, but I will *assume* you have y = a + b*x for x< r and y = g + h*x for x > r. If r is known, and if y should be continuous at x = r, then you can write y = a + b*r + c*(x-r) for x > r, so if points 1,...,m are for x < r and m+1,...,n are for x > r, you need to minimize S = sum{(y_j - a - b*x_j)^2: j=1,...,m} + sum{(y_j - a - b*r - c*(x_j - r))^2: j=m+1,...,n}. If r is known, this is a function of a, b and c. If estimation of r itself is part of the problem, then r is also a variable in the optimization, but in that case you have a nonlinear least-squares problem because of the presence of terms b*r and c*r. However, many efficient and effective solution methods exist, including, for example, using the EXCEL Solver Tool.
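    For the known-r case, the minimization of S over a, b and c is an ordinary linear least-squares problem, so it can be solved through the normal equations. The following is only a minimal sketch of that idea in plain Python; the function names `solve_linear_system` and `fit_fixed_r` are made up for illustration, and the caller is assumed to have already split the points into the x < r and x > r groups.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs], copied so the inputs are not modified.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_fixed_r(x_left, y_left, x_right, y_right, r):
    """Return (a, b, c) minimizing S for a known breakpoint r.

    Model: y = a + b*x for the left points,
           y = a + b*r + c*(x - r) for the right points.
    """
    # Each row holds the coefficients of (a, b, c) for one data point.
    rows = [[1.0, x, 0.0] for x in x_left] + [[1.0, r, x - r] for x in x_right]
    ys = list(y_left) + list(y_right)
    # Normal equations: (A^T A) p = A^T y.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(ata, aty)
```

    A call such as `fit_fixed_r(x_left, y_left, x_right, y_right, r)` returns the three coefficients; the second line's slope is c and its value at x = r is a + b*r, so continuity at r holds by construction.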

    Good luck.

    RGV
     
  6. Aug 8, 2011 #5

    hotvette

    Homework Helper

    In that case, it seems to me that you have two totally separate least squares problems that can be analyzed independently from one another. Formula for least squares line to a set of data points is at the bottom of the following link:

    http://www.ies.co.jp/math/java/misc/least_sq/least_sq.html [Broken]
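    The linked page is not reproduced here, but the standard closed-form result for a single least-squares line y = a + b*x is well known. Writing N for the number of points in one line's own list (N is a symbol introduced for this note, not one used in the thread), it reads:

```latex
b = \frac{N \sum_{j} x_j y_j - \sum_{j} x_j \sum_{j} y_j}
         {N \sum_{j} x_j^2 - \left( \sum_{j} x_j \right)^2},
\qquad
a = \frac{\sum_{j} y_j - b \sum_{j} x_j}{N}.
```

    Applying this once to each list of points gives the two independent fits.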
     
    Last edited by a moderator: May 5, 2017
  7. Aug 8, 2011 #6
    Hotvette - No that wouldn't work but thanks anyways

    RGV - That method is exactly what I need. Thank you. Just wondering what it's called. Also, how would I go about minimizing S (assuming I have r)? Would I take the derivatives with respect to a, b, c, set them to zero, and solve for a, b, c? Would the Excel Solver tool allow me to do this?


    I would really appreciate any input
     
    Last edited: Aug 9, 2011
  8. Aug 9, 2011 #7

    Ray Vickson

    Science Advisor
    Homework Helper

    If the problem is as I described in my previous response, then your suggestion can lead to an incorrect solution. I have an example where that happens because the intersection of the two lines lies in the wrong place; that is, the point where we need to switch from formula 1 to formula 2 lies inside one of the regions, so the wrong formula is applied at some data points. Actually, in my previous response I neglected the necessary constraint x_m <= r <= x_{m+1} in the variable-r case. If we omit this constraint the solution is the same as yours, but sometimes this is wrong.

    RGV
     
    Last edited: Aug 9, 2011
  9. Aug 9, 2011 #8

    Ray Vickson

    Science Advisor
    Homework Helper

    I don't know if the method has a name; it is just one of the standard types of problem examined in an optimization course, for example. I constructed a fake example with points X1 = [0.5, 1.2, 3.1, 3.8, 4.5] for the first list and X2 = [6, 7.2, 8.1, 8.9, 9.3] for the second list. So, we need 4.5 <= r <= 6 in the previous notation. If you know the value of r you can set dS/da = 0, etc., and solve the linear system. If r is also unknown you also need to try using the condition dS/dr = 0. This will give a slightly nonlinear system to solve, which might be nasty in some cases. Possibly the value of r obtained in this way will violate the required constraint, in which case the optimal value of r will lie at one of the two endpoints (either r = 4.5 or r = 6 in my case), so this would just need the solution of two fixed-r problems.

    However, if you use an optimization package, all that is unnecessary: just ask to minimize S(a,b,c) [or S(a,b,c,r)] directly. For example, in EXCEL you put a, b, etc., in some cells and the final formula for S in some target cell, then ask Solver to minimize the target cell by varying the "variable cell" entries. If you have constraints such as 4.5 <= r <= 6, you just add them as r >= 4.5 and r <= 6 separately. (Solver works most efficiently if constraints are written with all variables on the left and only constants on the right.) For highly nonlinear problems it is advisable to help Solver by giving a reasonable starting point for at least some of the variables. For example, you could supply a starting value of r, such as r = 5, and let Solver correct that value. (For the case where r is not variable, you just have a purely quadratic unconstrained optimization, which Solver handles without any problem.)
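    For the fixed-r case, the conditions dS/da = dS/db = dS/dc = 0 can be written out explicitly. The following is a worked sketch using the S from post #4; the residual symbols e_j and f_j are shorthand introduced here and do not appear in the thread.

```latex
e_j = y_j - a - b x_j \quad (j = 1, \dots, m), \qquad
f_j = y_j - a - b r - c (x_j - r) \quad (j = m+1, \dots, n)

\frac{\partial S}{\partial a} = -2 \sum_{j=1}^{m} e_j - 2 \sum_{j=m+1}^{n} f_j = 0

\frac{\partial S}{\partial b} = -2 \sum_{j=1}^{m} x_j e_j - 2 r \sum_{j=m+1}^{n} f_j = 0

\frac{\partial S}{\partial c} = -2 \sum_{j=m+1}^{n} (x_j - r) f_j = 0
```

    Since e_j and f_j are linear in a, b and c, these are three linear equations in three unknowns.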

    RGV
     
    Last edited: Aug 9, 2011
  10. Aug 10, 2011 #9

    Ray Vickson

    Science Advisor
    Homework Helper

    Sorry for the weird formatting. For the past few days my computer was unavailable, so I had to do all my postings from an iPhone, and that produced what you see above.

    RGV
     
  11. Aug 15, 2011 #10

    hotvette

    Homework Helper

    Seems to me the following approach is the least amount of work:

    1. Solve independent least squares problems as stated in post #5

    2. If the intersection of the two lines is within the required interval, problem finished

    3. If the intersection is outside the required interval, use the method from post #4 with r fixed at the boundary of the interval closest to the intersection point from the previous step.
     
  12. Aug 15, 2011 #11

    Ray Vickson

    Science Advisor
    Homework Helper

    This method is actually the same as the constrained version with variable r: if we neglect the constraints on r and set dS/dr = 0 (along with the others) we get a system of equations that essentially has the same solutions for a, b, c and the intercept as what we would get from two separate least-squares fits. (I did not mention this fact in my earlier posts.) Then, if the intersection point is feasible, we are done; otherwise, we solve the known-r versions, which involve just linear equations. I am not sure the solution is always to take the boundary point closest to the infeasible unconstrained value, although that does seem intuitively reasonable. In any case, solving two problems (one for each boundary point) does not seem onerous. [It *would* be true that taking the closest boundary point is optimal for the case in which the level surfaces of S(a,b,c,r) are convex, but with S having (possibly) non-convex terms, this is no longer automatic. Maybe it is still OK if there are only "slight" non-convexities, but that would need more investigation, and it hardly seems worth doing.]
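    The procedure discussed in the last two posts (two independent fits, a feasibility check on the intersection, then a fixed-r fit at each boundary point if needed) might be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: all function names are hypothetical, the feasible interval is taken to be [max of the first x list, min of the second x list], and the fixed-r fit uses the substitution u = x - r with v the fitted value at the breakpoint, which is a reparametrization chosen here for brevity rather than anything from the thread.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - slope * sx) / n, slope


def fit_fixed_r(x1, y1, x2, y2, r):
    """Return (a, b, c) for the continuous two-line fit with r held fixed.

    Internally the model is y = v + b*u (left) and y = v + c*u (right),
    where u = x - r and v is the common value of both lines at x = r.
    """
    def sums(xs, ys):
        us = [x - r for x in xs]
        return sum(us), sum(u * u for u in us), sum(u * y for u, y in zip(us, ys))

    q1, s1, p1 = sums(x1, y1)
    q2, s2, p2 = sums(x2, y2)
    n = len(x1) + len(x2)
    total_y = sum(y1) + sum(y2)
    # Eliminate b and c from the normal equations, leaving one equation for v.
    v = (total_y - q1 * p1 / s1 - q2 * p2 / s2) / (n - q1 * q1 / s1 - q2 * q2 / s2)
    b = (p1 - v * q1) / s1
    c = (p2 - v * q2) / s2
    return v - b * r, b, c  # a = v - b*r


def sse(x1, y1, x2, y2, a, b, c, r):
    """The objective S(a, b, c, r) from post #4."""
    left = sum((y - a - b * x) ** 2 for x, y in zip(x1, y1))
    right = sum((y - a - b * r - c * (x - r)) ** 2 for x, y in zip(x2, y2))
    return left + right


def fit_two_lines(x1, y1, x2, y2):
    """Return (a, b, c, r) following the three-step procedure."""
    lo, hi = max(x1), min(x2)  # assumed feasible interval for r
    a, b = fit_line(x1, y1)
    g, h = fit_line(x2, y2)
    if b != h:
        r = (g - a) / (b - h)  # where a + b*r equals g + h*r
        if lo <= r <= hi:
            return a, b, h, r
    # Intersection infeasible: solve the fixed-r problem at both boundary
    # points and keep whichever gives the smaller S.
    candidates = []
    for r in (lo, hi):
        a, b, c = fit_fixed_r(x1, y1, x2, y2, r)
        candidates.append((sse(x1, y1, x2, y2, a, b, c, r), a, b, c, r))
    _, a, b, c, r = min(candidates)
    return a, b, c, r
```

    Comparing both boundary points, rather than only the one closest to the infeasible intersection, sidesteps the convexity question raised above at the cost of one extra linear solve.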

    RGV
     