Finding a function to 'fit' data? Regression?


Discussion Overview

The discussion revolves around finding a suitable polynomial function to fit a set of planar data points that do not necessarily follow a linear trend. Participants explore various methods of regression and function fitting, considering both statistical and non-statistical approaches.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants suggest that the choice of polynomial form may depend on the context of the data, such as logistic or exponential growth in population simulations.
  • Others argue that if the data points are known without any additional context, one could seek a function that either exactly covers or approximately fits the points without predefined polynomial forms.
  • One participant mentions using Excel to visualize data and apply various trendlines, noting that the maximum number of terms in a regression equation is limited by the number of data points.
  • Another participant states that a polynomial of degree N can fit N points exactly, but as the degree decreases, the accuracy may also decrease, suggesting a trial-and-error approach to find an acceptable fit.
  • It is noted that an (N-1) polynomial will fit all points but may not be useful for extrapolation due to potential oscillations outside the data range.
  • Some participants propose using spline curves for a stable fit that interpolates well between points, while cautioning about biases in numerical regression methods.
  • One participant introduces Locally Weighted Regression as a method that can effectively fit data points, although it may not yield an explicit polynomial equation.

Areas of Agreement / Disagreement

Participants express differing views on the best approach to fitting data points with polynomials, with no consensus reached on a single method or degree of polynomial to use. The discussion remains unresolved regarding the optimal strategy for function fitting.

Contextual Notes

Participants highlight limitations such as the dependence on the context of the data, the potential for overfitting with high-degree polynomials, and the challenges of extrapolation. There are also concerns about biases in regression software that may affect results.

cAm
Say I have a set of points that aren't necessarily linear, but are planar, and all follow a 'general' trend in the same direction. Say, something like this:

http://img343.imageshack.us/img343/2218/pointdistribution1jo.jpg


This is an entirely random example, but hopefully it'll help you get the picture. What I need to know is how to create a polynomial function that curves, generally following these points. I'm researching regression, but from what I can see, I have to know what form the polynomial will take and then solve for the coefficients. I need to know how it could be done without knowing the form of the polynomial.
 
In most cases it depends on the situation itself. For example, in a population simulation you know the growth is going to be logistic or exponential, depending on the limiting factor. If the points don't come from a real set of data, making a best-fit function is meaningless (and misses the point of statistics). So in most cases you are the one who determines which function fits best, based on the given factors.
Let's take your graph, for example: it could be linear, quadratic, cubic, etc.
Also, from that graph you don't know what is going on outside of this domain, so the best-fit function is only valid with respect to this domain (maybe a little larger).
 
Well, suppose I don't need it for statistical purposes, so its meaninglessness in that sense doesn't matter. Say I just have that set of points and I know nothing else about them. What would be the best way to find a function that covers those points, either passing through them exactly or fitting them approximately? And I would want the function to be of whatever order best fits the data, not an order defined by me.
 
To think of it, I would start with dumping all data into Excel and plotting it, then use Excel's plot options to add a "Trendline" to the data series (I think that's what Excel calls a least-squares line fit, same as an ordinary least squares regression). You can play around with different functional forms that are part of the Excel plot package (polynomial up to the 6th, but also log, exponential, and a few others). See which one seems to fit the best. It should be intuitive because it is visual. Then you can pursue more hi-tech options. One constraint is, you can only plot one variable at a time (e.g. x1 vs. y, x2 vs. y, etc.). If you have multiple vars you can still plot them one by one and try to get an overall sense, but it will not be exact (because you'd be ignoring co-variations between x1 and x2, etc.).

More generally, the max # of terms on the right-hand side of a regression equation is your number of data points minus one (or less, under other conditions). There are stats packages (e.g., SAS) with "canned" procedures that start with all the powers up to an arbitrary power, and then sequentially eliminate terms based on the "significance level" of the corresponding coefficient. The alternative is to try this manually (if you decide to use Excel for this then manual is the only option): run the regression with all powers up to "P" and then eliminate them one by one, each time dropping the term with the lowest absolute "t-stat" value. At most this will take P - 1 regression runs (at which point you'd be down to a single term alone). Hopefully you will find a set of powers that are individually and jointly significant.
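The manual elimination procedure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `backward_eliminate`, the cutoff `t_threshold=2.0` (roughly a 5% significance level) and the synthetic data are all hypothetical, and a real stats package would do considerably more checking.

```python
import numpy as np

def backward_eliminate(x, y, max_power=5, t_threshold=2.0):
    """Fit y on powers of x, dropping the least significant power each round.

    Hypothetical helper: the intercept is always kept, and elimination stops
    when every remaining power has |t| >= t_threshold or one power is left.
    """
    powers = list(range(1, max_power + 1))
    while True:
        # Design matrix: a column of ones plus one column per remaining power.
        X = np.column_stack([np.ones_like(x)] + [x ** p for p in powers])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        dof = len(y) - X.shape[1]                    # residual degrees of freedom
        sigma2 = resid @ resid / dof                 # estimated noise variance
        cov = sigma2 * np.linalg.inv(X.T @ X)        # coefficient covariance
        t_stats = np.abs(beta / np.sqrt(np.diag(cov)))[1:]   # skip the intercept
        if len(powers) == 1 or t_stats.min() >= t_threshold:
            return powers, beta
        powers.pop(int(t_stats.argmin()))            # drop the weakest term

# Made-up data: a quadratic trend plus noise (an assumption for the demo).
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 60)
y = 2.0 + 3.0 * x**2 + rng.normal(0.0, 0.5, x.size)

powers, beta = backward_eliminate(x, y)
print("retained powers:", powers)
print("coefficients (intercept first):", beta)
```

On data like this the squared term should survive elimination; other powers may occasionally survive by chance, which is the usual caveat with stepwise procedures.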
Edit: Ugh, this was a spammer's resurrection of an ancient thread :/ Sorry about replying, I didn't notice.

A polynomial of degree N-1 (and so also one of degree N) can be made to match any dataset of N points, provided the x values are distinct. So if you need dead-on accuracy, that might be the way to go. As the degree of your polynomial drops below that, you generally start to lose accuracy.

The only way to find the simplest one that suits you is to start at N and then work your way down until you reach the amount of error you can live with.

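I don't think it is possible for a polynomial of degree N-1 to be a better least-squares match than one of degree N, assuming each is the optimal fit for its degree and that there are at least N datapoints.

Actually, when I think about it, it's trivial to prove: every polynomial of degree N-1 is also a polynomial of degree N with a zero leading coefficient, so the best degree-N fit can never do worse. You can therefore safely stop once you reach your comfort zone with errors.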
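A rough sketch of that search, assuming NumPy and some made-up data: fit every degree from 0 up to (number of points - 1), record the sum of squared errors, and pick the lowest degree whose error is within a tolerance. The names `sse_for_degree` and `tolerance`, and the sample points, are hypothetical.

```python
import numpy as np

# Made-up sample: 8 noisy points from one period of a sine wave (an assumption).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.05, x.size)

def sse_for_degree(x, y, degree):
    """Sum of squared errors of the least-squares polynomial of this degree."""
    V = np.vander(x, degree + 1)                 # columns x^degree ... x^0
    coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)
    return float(np.sum((V @ coeffs - y) ** 2))

# Degrees 0 .. N-1; the last one interpolates the points exactly.
errors = {d: sse_for_degree(x, y, d) for d in range(x.size)}

tolerance = 0.05                                 # the error "you can live with"
simplest = min(d for d, e in errors.items() if e <= tolerance)

for d, e in errors.items():
    print(f"degree {d}: SSE = {e:.3e}")
print("lowest acceptable degree:", simplest)
```

Because the fits are nested, the printed errors never increase with the degree, which is the monotonicity argued above. Whether to scan upward or downward is a matter of taste; both reach the same degree.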

k
 
A polynomial of degree n-1 will hit all n points, but is unlikely to be useful for interpolating or extrapolating experimental data, because such curves often contain maxima and minima far outside the range of recorded values.
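To make the oscillation concrete, here is a small NumPy sketch using a standard textbook example (Runge's function sampled at 11 equally spaced points). The test function and the sample count are assumptions chosen for illustration, not data from this thread.

```python
import numpy as np
from numpy.polynomial import Polynomial

def runge(t):
    """Runge's function, a classic stress test for polynomial interpolation."""
    return 1.0 / (1.0 + 25.0 * t**2)

x = np.linspace(-1.0, 1.0, 11)           # 11 "recorded" points
y = runge(x)

# A degree n-1 = 10 polynomial through all 11 points.
p = Polynomial.fit(x, y, deg=len(x) - 1)

dense = np.linspace(-1.0, 1.0, 2001)     # evaluate between the points
values = p(dense)

# How far the curve leaves the range spanned by the recorded values.
overshoot = max(values.max() - y.max(), y.min() - values.min())

print("hits every point:", np.allclose(p(x), y, atol=1e-8))
print("recorded range:  ", y.min(), "to", y.max())
print("polynomial range:", values.min(), "to", values.max())
```

The curve passes through every sample yet swings well outside the recorded range near the ends of the interval, even before any extrapolation is attempted.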

If you eyeball the type of curve you expect to see and count its turning points (local maxima and minima), then a regression to a polynomial of degree one greater will often provide a good first approximation, but the curve will not usually hit your points.

If you need a polynomial-type function that is both stable and hits your points, then you may benefit from using a spline curve, which is piece-wise defined as polynomials (e.g. cubic spline), but is continuous and has continuous first and second derivatives as well. This type of function fitting is particularly well-suited for interpolation, but is not usually very useful for extrapolating beyond the end of the observed data.
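As a sketch of the idea, the following builds a cubic spline with plain NumPy. It assumes "natural" end conditions (zero second derivative at both ends), which is only one of several common choices; the function name `natural_cubic_spline` and the sample data are hypothetical.

```python
import numpy as np

def natural_cubic_spline(xk, yk):
    """Return a function evaluating the natural cubic spline through (xk, yk).

    Assumes xk is strictly increasing. M holds the second derivatives at the
    knots, with M[0] = M[n] = 0 (the "natural" end condition).
    """
    xk = np.asarray(xk, dtype=float)
    yk = np.asarray(yk, dtype=float)
    n = len(xk) - 1
    h = np.diff(xk)                              # interval widths
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0                      # natural end conditions
    for i in range(1, n):                        # continuity of the first derivative
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)

    def evaluate(t):
        t = np.asarray(t, dtype=float)
        i = np.clip(np.searchsorted(xk, t) - 1, 0, n - 1)   # interval index
        left, right = xk[i + 1] - t, t - xk[i]
        return ((M[i] * left**3 + M[i + 1] * right**3) / (6.0 * h[i])
                + (yk[i] / h[i] - M[i] * h[i] / 6.0) * left
                + (yk[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * right)

    return evaluate

# Made-up data: 11 samples of a peaked curve (an assumption for the demo).
xk = np.linspace(-1.0, 1.0, 11)
yk = 1.0 / (1.0 + 25.0 * xk**2)
spline = natural_cubic_spline(xk, yk)

dense = np.linspace(-1.0, 1.0, 2001)
print("hits every point:", np.allclose(spline(xk), yk))
print("spline range:", spline(dense).min(), "to", spline(dense).max())
```

The spline passes through every knot and stays close to the range of the data between them. In practice a library routine is the more robust choice; this hand-rolled version is only meant to show the mechanics.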

When using software packages, be aware that some numerical regression suites are biased with respect to the orientation of the axes. For instance, suppose your cloud of data points suggests a line X=0. The regression to a line may instead give a result Y=0, even if this line has a worse fit! There are numerical methods that avoid this bias, but you need to be sure that the suite you are using has this feature.
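One plausible reading of this bias is that ordinary least squares only measures vertical distances, so it cannot produce a vertical line. The NumPy sketch below compares it with an orthogonal (total least squares) fit obtained from the singular value decomposition; the synthetic cloud and all variable names are assumptions made for illustration.

```python
import numpy as np

# Made-up cloud scattered around the vertical line X = 0:
# narrow in x, wide in y.
rng = np.random.default_rng(2)
x = rng.normal(0.0, 0.3, 10_000)
y = rng.normal(0.0, 3.0, 10_000)
pts = np.column_stack([x, y])

# Ordinary least squares of y on x (minimises vertical distances only).
slope, intercept = np.polyfit(x, y, 1)

# Orthogonal fit: the line through the centroid along the direction of
# largest variance, taken from the SVD of the centred points.
centroid = pts.mean(axis=0)
_, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
direction, normal = vt[0], vt[1]             # unit vectors along / across the line

def mean_sq_perp_distance(point_on_line, unit_normal):
    """Mean squared perpendicular distance from the cloud to a line."""
    return float(np.mean(((pts - point_on_line) @ unit_normal) ** 2))

ols_normal = np.array([-slope, 1.0]) / np.hypot(slope, 1.0)
ols_err = mean_sq_perp_distance(np.array([0.0, intercept]), ols_normal)
tls_err = mean_sq_perp_distance(centroid, normal)

print("OLS line: y =", slope, "* x +", intercept)
print("orthogonal-fit direction:", direction)
print("mean squared perpendicular distance, OLS vs orthogonal:", ols_err, tls_err)
```

The ordinary regression returns a nearly horizontal line close to Y=0, while the orthogonal fit recovers a nearly vertical direction with a much smaller perpendicular error.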
 
Hey all,

Concerning that issue, I don't really know whether statistical information alone can tell you which degree of polynomial to fit or what type of function to use. Since I have been working for some time with fitted value iteration in machine learning, I can answer as follows:
Given any set of points, Locally Weighted Regression does about the best job possible; it is really awesome.
The only problems are performance and, if you care about having one overall polynomial as an explicit equation, the lack of one: with this method you are in effect fitting infinitely many straight lines through your points, each using only the data in some vicinity of the query point, weighted by distance. :)
I think it is really nice :) Hope that this can help :)
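A minimal sketch of locally weighted (linear) regression in NumPy, for anyone who wants to try it. The Gaussian kernel, the `bandwidth` value, the function name `lwr_predict` and the sample data are all assumptions; other kernels and local polynomial degrees are equally common.

```python
import numpy as np

def lwr_predict(x_train, y_train, x_query, bandwidth=0.3):
    """Predict at each query point with a weighted straight-line fit.

    Points near the query get weights close to 1; distant points get weights
    close to 0 (Gaussian kernel with the given bandwidth).
    """
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    X = np.column_stack([np.ones_like(x_train), x_train])   # intercept + slope
    preds = np.empty(x_query.size)
    for k, xq in enumerate(x_query):
        w = np.exp(-0.5 * ((x_train - xq) / bandwidth) ** 2)
        sw = np.sqrt(w)                          # weighted LS via row scaling
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y_train * sw, rcond=None)
        preds[k] = beta[0] + beta[1] * xq
    return preds

# Made-up data: a noisy sine curve (an assumption for the demo).
rng = np.random.default_rng(3)
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x) + rng.normal(0.0, 0.1, x.size)

grid = np.linspace(0.0, 2.0 * np.pi, 50)
smooth = lwr_predict(x, y, grid)
print("max deviation from the underlying sine:", np.max(np.abs(smooth - np.sin(grid))))
```

Each prediction needs its own small least-squares solve, which is the performance cost mentioned above, and the result is a table of values rather than one explicit equation.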
 
