How do you create best fit line?

  1. Apr 28, 2008 #1
    Like the title says: if you have a bunch of data, you can create a best-fit line.
    For example, back in school, for a straight line y = mx + c, you just needed 2 points to get the line. But what if you have more than 2 points, say 10? What is the best-fit line equation, and how do you find it mathematically?

    I'm sure Excel can do it through the trendline method, but back to basics: how do people do it? I'm looking more for a polynomial of 2nd order. And the data is not like 1, 4, 9, 16, 25, which you can deduce to be y = x^2, but more like 'double' or 'float' values, so getting a best-fit line and predicting future values is a lot harder.

    Just wondering if anyone has ever looked at something like this, or how people find coefficients from data they have accumulated (which is how it happens in real life: you collect the information and then deduce your own equation to reflect how it changes).

    Is there a book or other resource I could look at? That would be helpful too.

    Thanks
     
  2. jcsd
  3. Apr 28, 2008 #2
    There are lots and lots of books on this topic.

    First you will have to decide what kind of curve you want to fit to your data, say a straight line, a parabola, a polynomial of degree 27, or some exponential function (though this might be harder). Each function in the "pool" you decided to choose from (for example, the straight lines) is then determined by a certain number of parameters (for straight lines there are two of them), and the goal is to find the "best" values for these parameters.

    For this you have to think about which parameters are "good" and which are "bad"; that is, you have to define some measure of how well a given curve (corresponding to a certain function in your "pool") approximates your data. One way to do that is to interpret your data pairs (they should be pairs) as measurements (x, m(x)). Certainly, if a function f is to approximate these data well, f(x) should be about equal to m(x), so one very common measure is the sum of the squares (f(x)-m(x))^2, summed over all your data points. You then want to find the function that minimizes this sum, which is why the method is called a "least squares fit".
    If you're fitting a straight line, there is a rather easy general formula giving you the best values for the two parameters. If you consider other families of approximating functions, such formulas might be long or might not exist at all.
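    As a rough illustration of the straight-line case, here is a minimal Python sketch of the usual closed-form least-squares result: the slope is the sum of (x - mean_x)(y - mean_y) divided by the sum of (x - mean_x)^2, and the intercept follows from the means. The function name fit_line and the sample numbers are made up for this example; they are not from the thread.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = m*x + c through the points (xs[i], ys[i])."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations in x and of cross deviations in x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = sxy / sxx            # slope that minimizes the sum of squared residuals
    c = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, c


# Hypothetical noisy measurements, invented for illustration only
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
m, c = fit_line(xs, ys)
print(f"y = {m:.3f} x + {c:.3f}")
```

    Only the y values are treated as noisy here; the x values are taken as exact.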
    Note that this method does not treat the two components of your data pairs the same way; rather, it inherently assumes that one coordinate is the measurement and thus has an error, while the other coordinate has no error. This is often a reasonable assumption, but sometimes it is not, in which case you will have to modify the procedure.

    So mathematically it is all about minimization and best approximation in function spaces endowed with some topology, which is why the methods used in the theoretical analysis of your problem are typically those of functional analysis.
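    For the second-order polynomial asked about in the first post, the minimization is still concrete: setting the derivatives of the sum of squares with respect to each coefficient to zero gives a small linear system (the "normal equations"). The sketch below is one possible way to solve it in plain Python; the name fit_polynomial and the choice of Gaussian elimination are assumptions made for this example, not something prescribed in the thread.

```python
def fit_polynomial(xs, ys, degree=2):
    """Least-squares coefficients [a0, a1, ..., ad] of a0 + a1*x + ... + ad*x**d."""
    n = degree + 1
    if len(xs) != len(ys) or len(xs) < n:
        raise ValueError("need at least degree + 1 (x, y) pairs")
    # Normal equations A a = b with A[j][k] = sum(x^(j+k)) and b[j] = sum(y * x^j)
    A = [[float(sum(x ** (j + k) for x in xs)) for k in range(n)] for j in range(n)]
    b = [float(sum(y * x ** j for x, y in zip(xs, ys))) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system; too few distinct x values")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(A[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / A[row][row]
    return coeffs


# Hypothetical data scattered around y = 1 + 2x + 3x^2, invented for illustration
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 5.8, 17.1, 33.9, 57.2, 85.9]
a0, a1, a2 = fit_polynomial(xs, ys, degree=2)
print(f"y = {a0:.3f} + {a1:.3f} x + {a2:.3f} x^2")
```

    The normal equations become badly conditioned as the degree grows, so this direct approach is only reasonable for low degrees such as 2 or 3.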
     
    Last edited: Apr 28, 2008
  4. Apr 29, 2008 #3

    HallsofIvy


  5. Apr 29, 2008 #4
    Thanks for the explanation and the link. Will look into it.
     
  6. Apr 29, 2008 #5
    Least squares can be generalized to polynomials. But a common approach to fitting data that you might wish to use is cubic spline interpolation:

    http://www.physics.utah.edu/~detar/phycs6720/handouts/cubic_spline/cubic_spline/node1.html

    The idea is that you use your x-coordinates to create intervals, and you fit a cubic polynomial to each interval such that the function and its first and second derivatives are continuous at the end points of each interval.
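    To make the interval idea concrete, here is a minimal pure-Python sketch of a natural cubic spline. The "natural" end condition (second derivative set to zero at the first and last point) is an assumption of this example and may differ from what the linked handout uses; the function name natural_cubic_spline is also made up. Unlike a least-squares fit, the resulting curve passes exactly through every data point.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs[i], ys[i])."""
    n = len(xs) - 1  # number of intervals
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(xs[i + 1] <= xs[i] for i in range(n)):
        raise ValueError("x values must be strictly increasing")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Right-hand side of the tridiagonal system for the quadratic coefficients c
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal solve
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] and c[n] stay 0 (natural end condition)
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Pick the interval containing x; values outside the data use the end pieces
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Hypothetical data, invented for illustration
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8]
spline = natural_cubic_spline(xs, ys)
print(spline(1.5))
```

    Because the spline reproduces every point, it also reproduces any noise in the data, so it suits smooth interpolation between measurements more than prediction beyond them.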
     