Python 2.7: Fit a model to data


Discussion Overview

The discussion revolves around fitting a known logarithmic model to a set of data using Python, specifically focusing on how to quantify the closeness of the fit without relying solely on traditional methods like line of best fit. Participants explore various statistical approaches and seek a function that can accommodate error in the data.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant expresses a desire to know how closely their data fits a known logarithmic model, mentioning that traditional methods like curve_fit and linregress do not meet their needs.
  • Another participant suggests that a quantitative definition of "being close" is necessary to proceed with the analysis.
  • A different participant proposes evaluating the square of the residuals as a way to assess the fit against the known curve, indicating that this could provide a measure similar to least-squares fitting.
  • One participant seeks a Python function that can calculate the correlation coefficient between two datasets while incorporating errors in one of the datasets, referencing numpy.corrcoef as a starting point.
  • Another participant recommends using the square of the residuals and then consulting the chi-squared distribution to derive a single numerical value for the fit quality.

Areas of Agreement / Disagreement

Participants have not reached a consensus on the best method to quantify the fit of the model to the data. Multiple approaches are suggested, and there is ongoing exploration of the problem without a definitive resolution.

Contextual Notes

The discussion highlights the need for a clear definition of fit quality and the challenges of incorporating error into statistical measures, which remain unresolved.

EnSlavingBlair
Hi,

I'm trying to quantify how well a known function fits a set of data. I'm not interested in the data's line of best fit or anything; I just want to know how close the data is to my model. I've tried using curve_fit and linregress, but neither really gives me what I'm after. My data follows a logarithmic curve, which I've been plotting on log-log scales to get a gradient of about -4, close to my model's value of -3.9, but I'd like to know exactly how close. Linregress is the closest match so far, since it gives the correlation coefficient (how well the data follows the line of best fit), but it's still not exactly what I want.

import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def line(x, a, b):
    return a*x + b

x = np.log(np.arange(len(coll_ave)))
x = x[1:]  # drop the first point to avoid the whole ln(0) = -infinity thing
y = np.log(coll_ave[1:])
popt, pcov = curve_fit(line, x, y, sigma=error[1:])
grad, inter, r_value, p_value, std_err = stats.linregress(x, y)

These give me great info, just not quite what I'm looking for. As far as I'm aware, polyfit doesn't work for linear models, and I'd rather work with the log-log of my data than the raw data, since I know what gradient I'm after; I also have the equation for the model, so really it doesn't matter. If there's a numpy or scipy function for this, that would be great, or a modification to curve_fit or linregress that would make it work.

Thanks for the help :D
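One direct way to quantify "exactly how close" the fitted gradient is to the model value is to express the difference in units of the fit's standard error, which linregress already returns. A minimal sketch, using made-up stand-in data since coll_ave is not shown:

```python
import numpy as np
from scipy import stats

np.random.seed(0)

# Hypothetical stand-in for the log-log data (coll_ave is not shown),
# generated to follow a power law with exponent close to -3.9
x = np.log(np.arange(1, 50))
y = -3.9 * x + 1.0 + np.random.normal(0.0, 0.05, x.size)

grad, inter, r_value, p_value, std_err = stats.linregress(x, y)

# How far is the fitted gradient from the model's -3.9, in standard errors?
model_slope = -3.9
z = (grad - model_slope) / std_err
p = 2.0 * stats.norm.sf(abs(z))  # two-sided p-value for "slope equals model"
```

A small p here would suggest the data's slope is genuinely different from -3.9; a large p means the difference is within the fit's uncertainty.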
 
You'll need some quantitative definition of "being close" first. What is the output you want to get?
 
I kind of lost you... but it sounds like you want to skip the fitting and evaluate your known curve against the data as if it had been fitted from them. Can you just evaluate the square of the residuals, then, just as you would have had to do if you were fitting with a least-squares method? But, as mfb said, you are going to need more of a reference to know how good a fit a given residual indicates.
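The idea above can be sketched in a few lines: hold the model fixed (no fitting) and sum the squared differences between the data and the model's predictions. The model parameters and data here are hypothetical placeholders:

```python
import numpy as np

# Hypothetical fixed model: a log-log line with the slope discussed above
def model(x, a=-3.9, b=1.0):
    return a * x + b

x = np.linspace(1.0, 5.0, 20)
rng = np.random.RandomState(42)              # reproducible fake "data"
data = model(x) + rng.normal(0.0, 0.1, x.size)

residuals = data - model(x)                  # no fitting: model is fixed
ss_res = np.sum(residuals ** 2)              # sum of squared residuals
```

On its own, ss_res has no absolute scale; it only becomes interpretable when compared against the measurement errors, which is where the chi-squared suggestion later in the thread comes in.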
 
I guess what I'm asking is whether there is a Python function that will determine the correlation coefficient of two sets of data. Your questions have certainly helped me figure things out. I want something like numpy.corrcoef(), but with the ability to include errors for one of the 1-D arrays.

From http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html, it takes:
"A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables."
and a few other arguments.

However, is it possible to include errors for one of the 1-D arrays? One array would represent my 'model', which does not need errors, and the other array would be my data, which comes with errors that I want to take into account. I'm not sure what the maths for something like that would be, though; I have not done a great deal of statistics.
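Neither numpy nor scipy ships an errors-aware corrcoef, but one common approach is a weighted Pearson correlation with weights w = 1/error**2, so noisy points count for less. A hand-rolled sketch (the function name and example arrays are illustrative, not a library API):

```python
import numpy as np

def weighted_corr(x, y, w):
    # Weighted Pearson correlation coefficient; with w = 1/error**2 this
    # down-weights the points with the largest measurement errors.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    mx = np.average(x, weights=w)
    my = np.average(y, weights=w)
    cxy = np.average((x - mx) * (y - my), weights=w)
    cxx = np.average((x - mx) ** 2, weights=w)
    cyy = np.average((y - my) ** 2, weights=w)
    return cxy / np.sqrt(cxx * cyy)

# Example: fixed model values vs. hypothetical "data" with per-point errors
model = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
data  = np.array([0.1, 0.9, 2.2, 2.9, 4.1])
error = np.array([0.1, 0.1, 0.3, 0.1, 0.1])

r = weighted_corr(model, data, 1.0 / error ** 2)
```

With uniform weights this reduces to the ordinary Pearson coefficient that numpy.corrcoef computes.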
 
I think the suggestion from gsal should work: take the square of the residuals, then look up the probability with the chi-squared distribution to get a single number.
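Putting that together: weight each squared residual by its measurement error, sum to get a chi-squared statistic, and look up its tail probability. The arrays below are hypothetical placeholders; note the degrees of freedom equal the number of points because the model is fixed rather than fitted:

```python
import numpy as np
from scipy import stats

# Hypothetical data, per-point errors, and fixed model values at the same points
model = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
data  = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
error = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

# Chi-squared statistic: squared residuals weighted by the measurement errors
chi2_val = np.sum(((data - model) / error) ** 2)

# Degrees of freedom = number of points, since no parameters were fitted
dof = len(data)
p_value = stats.chi2.sf(chi2_val, dof)  # probability of a worse chi2 by chance
```

A p_value near 0 would indicate the model is a poor description of the data given the stated errors; a chi2_val of roughly dof is what a good fit looks like.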
 
