# Regression Analysis on Theoretical Model

1. Mar 25, 2014

### samee

Hi everyone. I'm a graduate student and am struggling with something that may be trivial. My research involves creating a mathematical model to represent a real system. I have data points from the real system that I want to compare my model to. How do I do a regression analysis and get an r^2 value for how well the data points fit my model?

Excel only wants to fit the data to a line and then do a regression analysis on that. I tried to figure out how to do it by hand, but I'm only finding linear regression analyses...

The model is something like this:

y = a/(1 + b*x), where a and b are material constants.

I went ahead and plugged the data points from the experimental publication into my model. I know a regression analysis says something about how close my data is to my model, but for the life of me I cannot find anything useful online to help me calculate this.

Can anyone help me out here?
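One common way to get an r² for a nonlinear model whose parameters are already fixed is the coefficient of determination, R² = 1 − SS_res/SS_tot, computed from the model's own predictions rather than from a fitted line. A minimal sketch, assuming NumPy; the data arrays and the values of `a` and `b` are made-up placeholders, not the published measurements:

```python
import numpy as np

def model(x, a, b):
    """The proposed model y = a / (1 + b*x)."""
    return a / (1.0 + b * x)

def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    For a nonlinear model this can even be negative if the model
    does worse than simply predicting the mean of the observations.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)  # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical example data (placeholders only).
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
y = np.array([10.1, 8.2, 7.0, 6.1, 5.3])
a, b = 10.0, 2.0  # assumed parameter values, e.g. taken from theory

print(r_squared(y, model(x, a, b)))
```

For a nonlinear model, R² loses part of the meaning it has in linear regression (it is no longer a squared correlation and need not lie between 0 and 1), so it is usually best reported together with a residual measure.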

2. Mar 25, 2014

### maajdl

You simply need to assume values for your parameters
and calculate χ², which is the sum of (y(xi) - yi)²/σi² over your observations.
Here, xi and yi are the observations and σi is the uncertainty on (y(xi) - yi).
You can then try to find the values of the parameters that make χ² a minimum
(which you can do with the Excel Solver).

The value of χ² at the minimum and how it behaves near the minimum will allow you to estimate the precision of your parameters.
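The same recipe can be sketched outside Excel with SciPy's `curve_fit`, which minimizes exactly this weighted sum when given per-point uncertainties and estimates the parameter covariance from the (linearized) behaviour of χ² near the minimum. This is a sketch under assumptions: the data and the σi values below are placeholders, and a real analysis needs actual uncertainties:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    """y = a / (1 + b*x)"""
    return a / (1.0 + b * x)

# Hypothetical observations x_i, y_i and uncertainties sigma_i (placeholders).
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
y = np.array([10.1, 8.2, 7.0, 6.1, 5.3])
sigma = np.full_like(y, 0.2)  # assumed uncertainty on each y_i

# curve_fit minimizes sum(((y - model(x, *p)) / sigma)**2), i.e. chi^2.
# absolute_sigma=True treats sigma as real uncertainties, so pcov is not rescaled.
popt, pcov = curve_fit(model, x, y, p0=[10.0, 1.0], sigma=sigma, absolute_sigma=True)
a_fit, b_fit = popt
a_err, b_err = np.sqrt(np.diag(pcov))  # 1-sigma parameter uncertainties

chi2_min = np.sum(((y - model(x, a_fit, b_fit)) / sigma) ** 2)
dof = len(x) - len(popt)  # observations minus fitted parameters

print(f"a = {a_fit:.3f} +/- {a_err:.3f}")
print(f"b = {b_fit:.3f} +/- {b_err:.3f}")
print(f"chi^2 = {chi2_min:.3f} with {dof} degrees of freedom")
```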

Have a look at "Numerical Recipes": http://www.nr.com and http://apps.nrbook.com/c/index.html (chapter 15).

Last edited: Mar 25, 2014
3. Mar 25, 2014

### Stephen Tashi

Are you modeling any random errors? Do errors in measurement occur only for y or do they also occur for x?

4. Mar 25, 2014

### samee

My work is only the model. I'm comparing the model to a set of published data from another research group. Aside from reading the paper, I really have no idea how they did their experiment. In particular, they didn't publish any error analysis for their data points, so I don't know what to do about that.

I remember working with things like this back when I was an undergrad, but it's been years. Can I do a regression analysis without an error estimate for the physical data?

5. Mar 26, 2014

### FactChecker

Fit a linear regression to your (x, 1/y) data. That will give a model z = A*x + B, where z = 1/y and A and B are the results of the linear regression. Those results can be compared to your model. As others have stated, you should be more specific about whether and how you are inserting random errors into your model.
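The link between the straight line and y = a/(1 + b*x) is algebraic: taking reciprocals gives 1/y = 1/a + (b/a)*x, so the slope A corresponds to b/a and the intercept B to 1/a. A minimal NumPy sketch of that linearize-then-fit approach, with placeholder data:

```python
import numpy as np

# Hypothetical data (placeholders only).
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
y = np.array([10.1, 8.2, 7.0, 6.1, 5.3])

# Linearize y = a / (1 + b*x):  1/y = 1/a + (b/a) * x,  i.e.  z = A*x + B.
z = 1.0 / y
A, B = np.polyfit(x, z, 1)  # degree-1 fit returns [slope, intercept]

# Map the straight-line coefficients back to the model parameters.
a_est = 1.0 / B  # B = 1/a
b_est = A / B    # A = b/a  ->  b = A * a = A / B

# r^2 of the straight-line fit in the transformed (x, 1/y) space.
z_pred = A * x + B
r2_linear = 1.0 - np.sum((z - z_pred) ** 2) / np.sum((z - z.mean()) ** 2)

print(a_est, b_est, r2_linear)
```

Because taking 1/y changes how measurement errors are weighted (small y values get amplified), the a and b from this shortcut generally differ somewhat from a direct nonlinear fit, and this r² describes the fit in (x, 1/y) space rather than in the original units.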

6. Mar 27, 2014

### samee

Ok, I've been working on this and am still confused. I think I'm missing some key information at a very basic level, and that's what's killing me. Basically, I'm modeling the mechanical performance of a material, specifically its elastic and shear response, as a function of its porosity. I built a model that accounts for 3 different sets of inclusions within the material (using Hill's tensor) and determines the relation between porosity and each component of the mechanical performance tensors. Specifically, I'm interested in the transverse and longitudinal elastic moduli, E1 and E3, and the transverse and longitudinal shear moduli, G13 and G12. Without getting into too many details, I came up with E1 = E0/(1 + P·H·E0), where E0 is the elastic response if there were no porosity, H is the sum of the Hill's tensors for the different pore types, P is the porosity of the material, and E1 is the longitudinal elastic modulus. I have similar expressions with different constants for E3, G12, and G13.
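In the notation of the original question, y = a/(1 + b*x), the E1 expression corresponds to y = E1, x = P, a = E0 and b = H·E0, assuming H acts as a single scalar constant for this modulus (an assumption; which tensor component is summed is not stated). Taking reciprocals shows that 1/E1 is linear in porosity:

$$E_1 = \frac{E_0}{1 + P\,H\,E_0} \quad\Longrightarrow\quad \frac{1}{E_1} = \frac{1 + P\,H\,E_0}{E_0} = \frac{1}{E_0} + H\,P$$

So a straight-line fit of 1/E1 against P would have intercept 1/E0 and slope H.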

The model is done and the paper is written, and at the last minute before we had to submit it, my adviser asked me to add a statistical evaluation of how well our model fits the experimental data we were comparing it to. He does not have experience with statistical analysis, and neither do I. He told me to do an r^2 regression analysis.

After looking at it more in depth, my understanding of the r^2 regression analysis is that it relates the x and y variables to see if there's a correlation. What I want is to see how well my model fits some data points. So I'm pretty sure he must have been mistaken when he asked me for a regression analysis r^2 value. u/maajdl suggested that I need chi^2, and I think he's completely right. His formula involved x and y, but I only have one input, and his equation involved the error, which I have no idea what to use for. I did not include error in my model, and the published experimental data points I'm comparing my model to also seem to have no error estimates.
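When the published data come without error estimates, one common fallback (an assumption, not something the experimental paper supports) is to treat every point as having the same unknown uncertainty σ. Minimizing χ² then reduces to ordinary least squares, and σ can be estimated from the residual scatter as σ̂² = SS_res/(n − p), where p is the number of parameters fitted to this data set (p = 0 if a and b come from theory). A minimal NumPy sketch with placeholder numbers:

```python
import numpy as np

def model(x, a, b):
    """y = a / (1 + b*x); x plays the role of porosity, the single input."""
    return a / (1.0 + b * x)

# Placeholder data and parameter values (not the published measurements).
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
y = np.array([10.1, 8.2, 7.0, 6.1, 5.3])
a, b = 10.0, 2.0
n_fitted = 0  # parameters fitted to THIS data set (0 if a, b are fixed by theory)

residuals = y - model(x, a, b)
ss_res = np.sum(residuals ** 2)                    # chi^2 with sigma_i = 1 for every point
rmse = np.sqrt(ss_res / len(y))                    # typical model-data mismatch, in y's units
sigma_hat = np.sqrt(ss_res / (len(y) - n_fitted))  # estimated common uncertainty

print(f"SS_res = {ss_res:.4f}, RMSE = {rmse:.4f}, sigma_hat = {sigma_hat:.4f}")
```

With σ estimated from the same residuals, the reduced χ² equals 1 by construction, so it can no longer serve as an independent test of fit; what remains informative is the size of the scatter (RMSE) relative to the measured values.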

So, Wikipedia gave me this:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

In plain text, in case the LaTeX doesn't render: chi2 = sum(from i=1 to n) (Oi - Ei)^2/Ei

Here E is the theoretical output and O is the experimental output. Basically, this looks only at the experimental and theoretical values, so I thought it was perfect, right? I got a value of 2.007. Yay!
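The formula is easy to evaluate, but for continuous measurements (as opposed to counts) its value depends on the units of the data: scaling every Oi and Ei by k scales the sum by k. That is one way to see why a bare value like 2.007 has no fixed meaning here. A minimal sketch with placeholder numbers, not the published data:

```python
import numpy as np

def pearson_statistic(observed, expected):
    """sum((O_i - E_i)^2 / E_i), the statistic from the Wikipedia formula."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.sum((observed - expected) ** 2 / expected)

# Placeholder moduli in GPa (not the published data).
O = np.array([10.1, 8.2, 7.0, 6.1, 5.3])         # "experimental" values
E = np.array([10.0, 8.333, 7.143, 6.25, 5.556])  # "theoretical" values

stat_gpa = pearson_statistic(O, E)
stat_mpa = pearson_statistic(O * 1000, E * 1000)  # same data expressed in MPa

# The value scales with the units: stat_mpa is 1000 times stat_gpa.
print(stat_gpa, stat_mpa)
```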

But wait. What is this 2.007? What in the world do I do with it? There are some graphs on Wikipedia that talk about degrees of freedom and are very confusing... and I don't know how that relates to this at all.
http://en.wikipedia.org/wiki/Pearson's_chi-squared_test#Goodness_of_fit
I know a few things about chi2 in general from my undergrad days... I know that it represents the amount of dispersion of the experimental values from the theoretical ones. But I don't know what this says about anything or what to do with it. I'm just really lost on all of this. Any help, explained on a fundamental level as if I were an undergrad or a high-school student, would be really appreciated.
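The degrees-of-freedom curves on that Wikipedia page turn a χ² value into a probability: the chance of getting a χ² at least that large if the model were correct. This step is only valid when the statistic really follows a χ² distribution (counts in bins, as in Pearson's test, or residuals divided by known σi). A minimal sketch assuming SciPy; the number of points and fitted parameters are placeholders:

```python
from scipy.stats import chi2

def chi2_p_value(chi2_value, n_points, n_fitted_params, binned=False):
    """Upper-tail probability P(X >= chi2_value) for X ~ chi^2(dof).

    dof = n_points - n_fitted_params for a weighted-residual chi^2;
    Pearson's binned test loses one further degree of freedom because
    the total count is constrained to match (binned=True).
    """
    dof = n_points - n_fitted_params - (1 if binned else 0)
    return dof, chi2.sf(chi2_value, dof)

# Example: chi^2 = 2.007 from 5 points with 2 fitted parameters (placeholders).
dof, p = chi2_p_value(2.007, n_points=5, n_fitted_params=2)
print(dof, p)
```

A very small p (for example below 0.05) suggests the model does not fit within the stated uncertainties; a χ²/dof far below 1 often means the uncertainties were overestimated.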

u/FactChecker: I'm missing how the linear model compares to my model; if you could explain a little more, I would really appreciate it. I'm not sure where to go with your comment.

7. Mar 28, 2014

### Stephen Tashi

That goodness-of-fit test assumes you have divided the data into discrete "bins". Since you have continuous variates, you'd have to define each bin by giving intervals for the data. For example, a bin might be the set { 1.7 < x < 2.9, 0.38 < y < 9.6 }. So unless there is some "natural" way to bin your results, that method doesn't apply. (You should note the comments in the article that say not to have too few counts in bins.)
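For contrast, this is the kind of data Pearson's test is designed for: counts of observations falling into bins, compared with the counts the model predicts for those same bins. A minimal sketch assuming SciPy's `chisquare`; the counts are invented purely for illustration:

```python
import numpy as np
from scipy.stats import chisquare

# Invented counts: how many observations fell in each of 4 bins,
# and how many the model predicts for the same bins (totals must match).
observed_counts = np.array([18, 22, 31, 29])
expected_counts = np.array([20.0, 25.0, 30.0, 25.0])

# ddof = number of model parameters estimated from these same counts
# (0 here); the test then uses k - 1 - ddof = 3 degrees of freedom.
result = chisquare(observed_counts, f_exp=expected_counts, ddof=0)

print(result.statistic, result.pvalue)
```

The warning about sparse bins corresponds to the usual guideline that expected counts per bin should not be too small (often quoted as roughly 5 or more).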

You might need a "sociological" approach to this statistical problem. Browse the Journals where your paper will be submitted and see what authors do when they compare theoretical to experimental curves - especially look at those papers where the authors don't do an elaborate job. Whatever words they say will give you a hint about what to do.

In academia, authors of papers usually respond to questions about their work from other academics. Consider contacting the authors of the experimental results and asking them about the precision of the equipment they used.

Is it correct that you have already determined values for 'a' and 'b' in the equation? If so, what method did you use to do that? Are 'a' and 'b' given from theory?