Finding a function to 'fit' data? Regression?

In summary: They can provide a curve fit that doesn't hit your points, but the error tolerances for the coefficients are so large that the curve could take on almost any shape within those error bounds. I don't know which packages do and don't do this, but I've heard the claims about some of them.

In summary, when trying to fit a polynomial function to a set of points, it is important to consider the type of curve you expect to see and the number of points available. A polynomial of N-th degree can perfectly fit N points, but may not be useful for interpolating or extrapolating data. A spline curve can provide a stable and continuous fit that is well-suited for interpolation, but may not
  • #1
cAm
Say I have a set of points that aren't necessarily linear, but are planar, and all follow a 'general' trend in the same direction. Say, something like this:

http://img343.imageshack.us/img343/2218/pointdistribution1jo.jpg


This is an entirely random example, but hopefully it'll help you get the picture. What I need to know is how to create a polynomial function that curves, generally following these points. I'm researching regression, but from what I can see, I have to know what form the polynomial will take and then solve for the coefficients. I need to know how it could be done without knowing the form of the polynomial.
 
  • #2
The point is that in most cases it depends on the situation itself. For example, in a population simulation you know the growth is going to be logistic or exponential, depending on the limiting factors. If the points don't come from a real set of data, it is meaningless to make a best-fit function (and it defeats the point of statistics). So in most cases you are the one who determines which function fits best, based on the given factors.
Let's take your graph as an example: it could be linear, quadratic, cubic, etc.
Also, in that graph you don't know what is going on outside this domain, so the best-fit function only holds with respect to this domain (maybe a little larger).
 
  • #3
Well, suppose I don't need it for statistical purposes, so it doesn't matter that it's meaningless in that sense. Say I just have that set of points and know nothing else about them. What would be the best way to find a function that covers those points, either passing through them exactly or fitting them approximately? And I would want the function to be of whatever order best fits the data, not an order defined by me.
 
  • #4
To think of it, I would start with dumping all data into Excel and plotting it, then use Excel's plot options to add a "Trendline" to the data series (I think that's what Excel calls a least-squares line fit, same as an ordinary least squares regression). You can play around with different functional forms that are part of the Excel plot package (polynomial up to the 6th, but also log, exponential, and a few others). See which one seems to fit the best. It should be intuitive because it is visual. Then you can pursue more hi-tech options. One constraint is, you can only plot one variable at a time (e.g. x1 vs. y, x2 vs. y, etc.). If you have multiple vars you can still plot them one by one and try to get an overall sense, but it will not be exact (because you'd be ignoring co-variations between x1 and x2, etc.).

More generally, the max # of terms on the right-hand-side of a regression equation is your data points minus one (or less, under other conditions). There are stats. packages (e.g., SAS) with "canned" procedures that start with all the powers up to an arbitrary power, and then sequentially eliminate terms based on the "significance level" of the corresponding coefficient. The alternative is to try this manually (if you decide to use Excel for this then manual is the only option); run the regression with all powers up to "P" and then eliminate one by one, starting with the lowest "t-stat" value. At most this will take P - 1 regression runs (at which point you'd be down to a single term alone). Hopefully you will find a set of powers that are individually and jointly significant.
 
  • #5
Edit: Ugh, this was a spammer's resurrection of an ancient thread :/ Sorry about replying, I didn't notice.

A polynomial of degree N-1 can be made to pass exactly through any dataset of N points (as long as no two points share an x-value), so a polynomial of the N-th degree certainly can. So if you need dead-on accuracy, that might be the way to go. As the degree of your polynomial drops below that, the fit will generally lose accuracy.

The only way to find the simplest one that suits you is to start at N and then work your way down until you reach the amount of error you can live with.

I don't think it is possible for a polynomial of degree N-1 to be a better match than one of degree N, assuming they are both optimal (in the least-squares sense) for their degree and that there are at least N datapoints.

Actually, when I think about it, it's trivial to prove: every polynomial of degree N-1 is also a polynomial of degree N whose leading coefficient is zero, so the best degree-N fit can never have a larger error. So you can safely stop once you reach your comfort zone with errors.
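To illustrate the "work your way down" idea, here is a minimal sketch in plain Python. It assumes the fit is an ordinary least-squares fit solved through the normal equations, and the helper names (`solve`, `fit_polynomial`, `sse`) and the data values are made up for illustration. In practice a library routine such as `numpy.polyfit` would be the more robust choice.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[degree]*x**degree."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)


def sse(xs, ys, coeffs):
    """Sum of squared errors of the polynomial on the data."""
    return sum(
        (y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )


# Hypothetical data, roughly quadratic.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 4.2, 8.8, 16.1]

# Degrees 0 .. N-1 for N points; the last one interpolates the data.
errors = [sse(xs, ys, fit_polynomial(xs, ys, d)) for d in range(len(xs))]
for d, e in enumerate(errors):
    print(f"degree {d}: SSE = {e:.6f}")
```

The error can only go down (or stay the same) as the degree goes up, so you would pick the lowest degree whose error you can accept.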

k
 
  • #6
A polynomial of degree n-1 will hit all n points, but is unlikely to be useful for interpolating or extrapolating experimental data, because such curves often contain maxima and minima far outside the range of the recorded values.

If you eyeball the type of curve you expect to see, and count the number of turning points (local maxima and minima), then trying a regression to a polynomial of degree one greater will often provide a good first approximation, but the curve will not usually hit your points.

If you need a polynomial-type function that is both stable and hits your points, then you may benefit from using a spline curve, which is piece-wise defined as polynomials (e.g. cubic spline), but is continuous and has continuous first and second derivatives as well. This type of function fitting is particularly well-suited for interpolation, but is not usually very useful for extrapolating beyond the end of the observed data.
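As a rough sketch of the spline idea, the following plain-Python code builds a natural cubic spline (second derivative set to zero at both ends, which is one common choice and an assumption here). The function name `natural_cubic_spline` and the sample points are made up for illustration; a library such as SciPy (`scipy.interpolate.CubicSpline`) would normally be used instead.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. On [xs[j], xs[j+1]] the spline is
    ys[j] + b[j]*t + c[j]*t**2 + d[j]*t**3 with t = x - xs[j].
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Right-hand side of the tridiagonal system for the c coefficients.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]

    # Forward sweep of the tridiagonal solve.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]

    # Back substitution; c[0] = c[n] = 0 is the "natural" end condition.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Pick the segment containing x (clamped to the first/last segment).
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t * t + d[j] * t ** 3

    return evaluate


# Hypothetical data points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8]
spline = natural_cubic_spline(xs, ys)
print([round(spline(x), 3) for x in (0.5, 1.5, 2.5, 3.5)])
```

The spline passes through every data point, and values between the points stay close to the data instead of swinging wildly the way a single high-degree polynomial can.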

When using software packages, be aware that some numerical regression suites are biased with respect to the orientation of the axis. For instance, suppose your cloud of data points suggests a line X=0. The regression to a line may instead give a result Y=0, even if this line has a worse fit! There are numerical methods that avoid this bias, but you need to be sure that the suite you are using has this feature.
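A small sketch of the axis bias, in plain Python. Ordinary least squares of y on x only measures vertical distances, so it cannot produce a vertical line. Orthogonal (total least squares) regression, which measures perpendicular distances, is one method without that bias; using it here is my assumption, since the post does not name a specific method. The function names and data are made up for illustration.

```python
import math


def centered_sums(xs, ys):
    """Return (Sxx, Syy, Sxy), the centered sums of squares and products."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line y = a + b*x."""
    sxx, _, sxy = centered_sums(xs, ys)
    return sxy / sxx


def orthogonal_angle(xs, ys):
    """Angle (radians, from the x-axis) of the orthogonal-regression line."""
    sxx, syy, sxy = centered_sums(xs, ys)
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


# Hypothetical cloud of points scattered tightly around the line X = 0.
xs = [0.1, -0.1, 0.0, 0.0, -0.1, 0.1]
ys = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]

slope = ols_slope(xs, ys)
angle = orthogonal_angle(xs, ys)
print(f"OLS slope: {slope:.3g} (a nearly horizontal line)")
print(f"orthogonal fit angle: {math.degrees(abs(angle)):.1f} degrees from the x-axis")
```

For this data the ordinary regression reports an essentially flat line, while the orthogonal fit recovers the near-vertical direction of the cloud.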
 
  • #7
Hey all,

Concerning that issue, I don't really know whether statistical information alone can tell you which degree of polynomial to fit or what type of function to use. Since I have been working for some time with fitted value iteration in machine learning, I can answer as follows:
Given any set of points, Locally Weighted Regression does a very good job; it is really awesome.
The only problems are performance, and the case where you want a single overall polynomial as an explicit equation, because this method in effect fits a separate straight line at every query point, using only the data in its vicinity and weighting those points by distance. :)
I think it is really nice :) Hope this helps :)
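For anyone curious, locally weighted regression can be sketched in a few lines of plain Python. This version assumes a Gaussian weighting kernel and a local straight-line fit; the function name `lwr_predict`, the bandwidth parameter `tau`, and the data are made up for illustration.

```python
import math


def lwr_predict(xs, ys, x0, tau=1.0):
    """Predict y at x0 from a straight line fitted with weights centered on x0.

    Points near x0 get weights close to 1, distant points get weights
    close to 0; tau controls how quickly the weights fall off.
    """
    w = [math.exp(-((x - x0) ** 2) / (2.0 * tau ** 2)) for x in xs]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # Closed-form solution of the weighted least-squares line.
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0


# Hypothetical curved data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8, -1.0]

for x0 in (0.5, 2.5, 4.5):
    print(f"x = {x0}: predicted y = {lwr_predict(xs, ys, x0):.3f}")
```

Note that a new weighted fit is computed for every query point, which is where the performance cost comes from, and that there is no single equation for the whole curve.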
 

What is the purpose of finding a function to 'fit' data?

The purpose of finding a function to 'fit' data is to identify a mathematical relationship between a set of input data and corresponding output data. This function can then be used to make predictions or analyze the data in a more meaningful way.

What is regression analysis?

Regression analysis is a statistical method used to identify the relationship between a dependent variable and one or more independent variables. This method helps to understand how the dependent variable changes when the independent variables are manipulated.

What types of regression analysis are commonly used?

The most commonly used types of regression analysis include linear regression, polynomial regression, logistic regression, and multiple regression. Each type is suitable for different types of data and can be used to find a function that best fits the data.

What factors should be considered when choosing a regression model?

There are several factors that should be considered when choosing a regression model, such as the type of data, the relationship between the variables, the number of variables, and the level of complexity needed to accurately represent the data. It is important to choose a model that best fits the data and is appropriate for the research question.

How can the accuracy of a regression model be evaluated?

The accuracy of a regression model can be evaluated by looking at the coefficient of determination (R-squared), which indicates how well the model fits the data. Additionally, the residuals (the difference between the predicted and actual values) can be analyzed to determine if the model adequately captures the variation in the data. Other methods such as cross-validation can also be used to assess the accuracy of a regression model.
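As a minimal sketch, R-squared can be computed from the observed values and the model's predictions as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample values are made up for illustration.

```python
def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # residuals
    ss_tot = sum((y - mean_y) ** 2 for y in ys)  # spread around the mean
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and two extreme "models".
ys = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(ys, ys)          # predictions equal the data
mean_only = r_squared(ys, [2.5] * 4)  # always predicts the mean of ys
print(perfect, mean_only)
```

A model that reproduces the data exactly scores 1, and a model that always predicts the mean scores 0, so values in between indicate how much of the variation the model captures.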
