Python 2.7: Fit a model to data

EnSlavingBlair · May 23, 2015

Hi,

I'm trying to find out how well a known function fits a set of data. I'm not interested in the data's line of best fit or anything, I just want to know how close the data is to my model. I've tried using curve_fit and linregress, but neither really gives me what I'm after. My data follows a logarithmic curve, which I've been plotting on log-log scales to get a gradient of about -4. That is close to my model (-3.9), but I'd like to know exactly how close. So far linregress is the closest match for what I'm after, as it gives the correlation coefficient (how well the data follows the line of best fit), but it's still not exactly what I want.

import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def line(x, a, b):
    return a*x + b

# start the index at 1 to avoid the whole ln(0)=infinity thing
x = np.log(np.arange(1, len(coll_ave)))
y = np.log(coll_ave[1:])
popt, pcov = curve_fit(line, x, y, sigma=error[1:])
grad, inter, r_value, p_value, std_err = stats.linregress(x, y)

These give me great info, just not quite what I'm looking for. As far as I'm aware, polyfit doesn't work for linear models, and I'd rather work with the loglog of my data than the raw data, as I know what gradient I'm after, but I have the equation for the model as well, so really it doesn't matter. If there's a numpy or scipy version, that would be great. Or a modification to curve_fit or linregress that would make it work.

Thanks for the help :D

mfb · May 23, 2015

You'll need some quantitative definition of "being close" first. What is the output you want to get?

gsal · May 23, 2015

I kind of lost you...but it sounds like you want to skip the fitting and evaluate your known curve against the data as if it had been fitted from them...can you just evaluate the square of the residuals, then, just as you would have had to do if you were fitting with a least-square method? But, as mfb said, you are going to need more of a reference to know how good a fit a given residual indicates.
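
A minimal sketch of what evaluating the residuals against a fixed model could look like. Nothing is fitted here: the model is simply evaluated at the data points. The slope of -3.9 comes from the thread, while the intercept, the helper names `known_model` and `sum_squared_residuals`, and the data values are made up for illustration.

```python
import numpy as np

def sum_squared_residuals(x, y, model):
    """Return the residuals and their sum of squares for a fixed (not fitted) model."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - model(x)
    return residuals, float(np.sum(residuals ** 2))

def known_model(log_x):
    # Hypothetical model in log-log space: slope -3.9 (from the thread),
    # intercept 2.0 (assumed purely for this example).
    return -3.9 * log_x + 2.0

# Made-up data: the model plus small offsets, standing in for real measurements.
log_x = np.log(np.arange(1, 6))
log_y = known_model(log_x) + np.array([0.1, -0.1, 0.0, 0.2, -0.2])

residuals, ssr = sum_squared_residuals(log_x, log_y, known_model)
```

On its own the sum of squared residuals has no scale, which is the point made above: it needs a reference, such as the measurement errors, before it says anything about how good the agreement is.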

EnSlavingBlair · May 24, 2015

I guess what I'm asking is if there is a python function that will determine the correlation coefficient of 2 sets of data. Your questions have certainly helped me figure things out. I want something like numpy.corrcoef(), but with the ability to include errors for one of the 1D arrays that are in it.

From http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html it takes:
"A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables."
and a few other variables.

However, is it possible to include errors for one of the 1D arrays? One array would represent my 'model', which does not need errors, and the other array would be my data, which comes with errors that I want to take into account. I'm not sure what the maths for something like that would be though, I have not done a great deal of statistics.

mfb · May 24, 2015

I think the suggestion from gsal should work. Take the square of the residuals, then look up the probability with the chi²-distribution to get a single number.
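
One way this could be sketched with scipy, assuming the errors are independent, roughly Gaussian, and given as one standard deviation per data point. Each residual is divided by its error before squaring, and the sum is compared with the chi² distribution via `scipy.stats.chi2.sf`. The function name `chi2_test` and the numbers are hypothetical. With a fully known model no parameters are fitted, so the number of degrees of freedom is taken to be the number of data points.

```python
import numpy as np
from scipy import stats

def chi2_test(y, y_model, sigma, n_fitted_params=0):
    """Chi-square of data against a model, weighted by the data errors.

    Returns (chi2, dof, p_value), where p_value is the probability of a
    chi2 at least this large if the model is correct and the errors are
    independent and Gaussian (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    chi2 = float(np.sum(((y - y_model) / sigma) ** 2))
    dof = y.size - n_fitted_params  # nothing fitted, so dof = number of points
    p_value = float(stats.chi2.sf(chi2, dof))
    return chi2, dof, p_value

# Made-up numbers standing in for the model values, the data and its errors.
y_model = np.array([10.0, 8.0, 6.0, 4.0])
y_data = np.array([11.0, 7.0, 6.0, 5.0])
errors = np.array([1.0, 1.0, 1.0, 1.0])

chi2, dof, p_value = chi2_test(y_data, y_model, errors)
```

A chi² close to the number of degrees of freedom suggests the data are consistent with the model within the errors. A very small p-value suggests they are not, and a p-value very close to 1 can hint that the errors are overestimated.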
