Python 2.7: Fit a model to data


Discussion Overview

The discussion revolves around fitting a known logarithmic model to a set of data using Python, specifically focusing on how to quantify the closeness of the fit without relying solely on traditional methods like line of best fit. Participants explore various statistical approaches and seek a function that can accommodate error in the data.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant expresses a desire to know how closely their data fits a known logarithmic model, mentioning that traditional methods like curve_fit and linregress do not meet their needs.
  • Another participant suggests that a quantitative definition of "being close" is necessary to proceed with the analysis.
  • A different participant proposes evaluating the square of the residuals as a way to assess the fit against the known curve, indicating that this could provide a measure similar to least-squares fitting.
  • One participant seeks a Python function that can calculate the correlation coefficient between two datasets while incorporating errors in one of the datasets, referencing numpy.corrcoef as a starting point.
  • Another participant recommends using the square of the residuals and then consulting the chi-squared distribution to derive a single numerical value for the fit quality.

Areas of Agreement / Disagreement

Participants have not reached a consensus on the best method to quantify the fit of the model to the data. Multiple approaches are suggested, and there is ongoing exploration of the problem without a definitive resolution.

Contextual Notes

The discussion highlights the need for a clear definition of fit quality and the challenges of incorporating error into statistical measures, which remain unresolved.

EnSlavingBlair
Hi,

I'm trying to quantify how well a known function fits a set of data. I'm not interested in the data's line of best fit or anything; I just want to know how close the data is to my model. I've tried using curve_fit and linregress, but neither really gives me what I'm after. My data follows a logarithmic curve, which I've been plotting on log-log scales to get a gradient of about -4, close to my model's value of -3.9, but I'd like to know exactly how close. Linregress is the closest match so far, since it gives the correlation coefficient (how well the data follows the line of best fit), but it's still not exactly what I want.

import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def line(x, a, b):
    return a*x + b

x = np.log(np.arange(len(coll_ave)))
x = x[1:]  # drop the first point to avoid the whole ln(0) = -infinity thing
y = np.log(coll_ave[1:])
popt, pcov = curve_fit(line, x, y, sigma=error[1:])
grad, inter, r_value, p_value, std_err = stats.linregress(x, y)

These give me great info, just not quite what I'm looking for. As far as I'm aware, polyfit doesn't work for linear models, and I'd rather work with the log-log of my data than the raw data, since I know what gradient I'm after; I also have the equation for the model, so really it doesn't matter. If there's a numpy or scipy function for this, that would be great, or a modification to curve_fit or linregress that would make it work.

Thanks for the help :D
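One direct way to quantify "exactly how close" the fitted gradient is to the model value is to express the difference in units of the fit's standard error, which linregress already returns. A minimal sketch, using made-up stand-in data since coll_ave is not shown:

```python
import numpy as np
from scipy import stats

np.random.seed(0)

# Hypothetical stand-in for the log-log data (coll_ave is not shown),
# generated to follow a power law with exponent close to -3.9
x = np.log(np.arange(1, 50))
y = -3.9 * x + 1.0 + np.random.normal(0.0, 0.05, x.size)

grad, inter, r_value, p_value, std_err = stats.linregress(x, y)

# How far is the fitted gradient from the model's -3.9, in standard errors?
model_slope = -3.9
z = (grad - model_slope) / std_err
p = 2.0 * stats.norm.sf(abs(z))  # two-sided p-value for "slope equals model"
```

A small p here would suggest the data's slope is genuinely different from -3.9; a large p means the difference is within the fit's uncertainty.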
 
You'll need some quantitative definition of "being close" first. What is the output you want to get?
 
I kind of lost you... but it sounds like you want to skip the fitting and evaluate your known curve against the data as if it had been fitted from them. Can you just evaluate the square of the residuals, then, just as you would have had to do if you were fitting with a least-squares method? But, as mfb said, you are going to need more of a reference to know how good a fit a given residual indicates.
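The idea above can be sketched in a few lines: hold the model fixed (no fitting) and sum the squared differences between the data and the model's predictions. The model parameters and data here are hypothetical placeholders:

```python
import numpy as np

# Hypothetical fixed model: a log-log line with the slope discussed above
def model(x, a=-3.9, b=1.0):
    return a * x + b

x = np.linspace(1.0, 5.0, 20)
rng = np.random.RandomState(42)              # reproducible fake "data"
data = model(x) + rng.normal(0.0, 0.1, x.size)

residuals = data - model(x)                  # no fitting: model is fixed
ss_res = np.sum(residuals ** 2)              # sum of squared residuals
```

On its own, ss_res has no absolute scale; it only becomes interpretable when compared against the measurement errors, which is where the chi-squared suggestion later in the thread comes in.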
 
I guess what I'm asking is whether there is a Python function that will determine the correlation coefficient of two sets of data. Your questions have certainly helped me figure things out. I want something like numpy.corrcoef(), but with the ability to include errors for one of the 1-D arrays.

From http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html, it takes:
"A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables."
and a few other arguments.

However, is it possible to include errors for one of the 1-D arrays? One array would represent my 'model', which does not need errors, and the other array would be my data, which comes with errors that I want to take into account. I'm not sure what the maths for something like that would be, though; I have not done a great deal of statistics.
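Neither numpy nor scipy ships an errors-aware corrcoef, but one common approach is a weighted Pearson correlation with weights w = 1/error**2, so noisy points count for less. A hand-rolled sketch (the function name and example arrays are illustrative, not a library API):

```python
import numpy as np

def weighted_corr(x, y, w):
    # Weighted Pearson correlation coefficient; with w = 1/error**2 this
    # down-weights the points with the largest measurement errors.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    mx = np.average(x, weights=w)
    my = np.average(y, weights=w)
    cxy = np.average((x - mx) * (y - my), weights=w)
    cxx = np.average((x - mx) ** 2, weights=w)
    cyy = np.average((y - my) ** 2, weights=w)
    return cxy / np.sqrt(cxx * cyy)

# Example: fixed model values vs. hypothetical "data" with per-point errors
model = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
data  = np.array([0.1, 0.9, 2.2, 2.9, 4.1])
error = np.array([0.1, 0.1, 0.3, 0.1, 0.1])

r = weighted_corr(model, data, 1.0 / error ** 2)
```

With uniform weights this reduces to the ordinary Pearson coefficient that numpy.corrcoef computes.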
 
I think the suggestion from gsal should work: take the square of the residuals, then look up the probability with the chi-squared distribution to get a single number.
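Putting that together: weight each squared residual by its measurement error, sum to get a chi-squared statistic, and look up its tail probability. The arrays below are hypothetical placeholders; note the degrees of freedom equal the number of points because the model is fixed rather than fitted:

```python
import numpy as np
from scipy import stats

# Hypothetical data, per-point errors, and fixed model values at the same points
model = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
data  = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
error = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

# Chi-squared statistic: squared residuals weighted by the measurement errors
chi2_val = np.sum(((data - model) / error) ** 2)

# Degrees of freedom = number of points, since no parameters were fitted
dof = len(data)
p_value = stats.chi2.sf(chi2_val, dof)  # probability of a worse chi2 by chance
```

A p_value near 0 would indicate the model is a poor description of the data given the stated errors; a chi2_val of roughly dof is what a good fit looks like.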
 
