lavoisier
I wonder if someone can please help me understand why a nonlinear regression I'm attempting doesn't work, and suggest how I can tackle it.
It's based on this equation:
[itex](1-y) \cdot (D \cdot y+K) = n \cdot x \cdot y[/itex]
where x is the independent variable, a real number usually between 6400 and 640000, and y is the dependent variable, a real number ∈ (0,1).
D, K and n are real constants. D is known and is usually 1000.
K and n are the ones we want to estimate. K is a real positive number; n is also positive and is usually expected to be a small integer (1, 2 or 3 most of the time). But we accept that n may be non-integer.
The book I took the theory from says the equation can be linearised, so one can run the usual least squares fit.
Here's the equation they suggest:
[itex]\frac {x} {1-y} = \frac {K} {n} \cdot \frac {1} {y} + \frac {D} {n} [/itex]
Of course the idea is to plot the LHS vs 1/y, find a line fitting the points, and back-calculate K and n from the slope, intercept and D.
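For concreteness, here is a sketch of that linearised fit (Python with numpy, which is just my stand-in for whatever least-squares tool you prefer; the data are the simulated points from the PS below, generated with K=5000, D=1000, n=1):

```python
import numpy as np

# Simulated data from the PS, generated with K=5000, D=1000, n=1
x = np.array([640000.0, 256000.0, 64000.0, 6400.0])
y = np.array([0.007763881556107663, 0.019229347046974,
              0.07345007480164349, 0.4603886792339626])
D = 1000.0

# Linearised form: x/(1-y) = (K/n)*(1/y) + D/n,
# so slope = K/n and intercept = D/n.
slope, intercept = np.polyfit(1.0 / y, x / (1.0 - y), 1)

n_est = D / intercept   # n from the intercept
K_est = slope * n_est   # K from the slope
print(n_est, K_est)     # with exact simulated data this recovers n=1, K=5000
```

With noise-free data this recovers the true parameters; the trouble described below appears with real, noisy y.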
However, when I tried this on my data, in some cases the intercept came out negative, which gave a negative n and hence (since the slope was always positive) a negative K.
This is physically impossible, given the theory from which the initial equation is derived.
So I'm guessing there is something wrong with the regression (or with the data, or both?).
Then I remembered reading that linearisation-based methods are not so good, as they cause problems with the distribution of the error, or something to that effect, and that nonlinear methods are often preferable.
So I solved the initial equation for y, plotted the experimental y vs the calculated y and tried (using Excel's solver) to find the values of K and n that gave the smallest sum of squared differences between the two.
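In case it helps, here is the closed form I used: solving the initial equation for y gives the quadratic D·y² + (n·x + K − D)·y − K = 0, and the root in (0,1) is the positive one. A sketch of the model and the sum-of-squares objective (numpy standing in for Excel's solver; data again the simulated points from the PS):

```python
import numpy as np

x = np.array([640000.0, 256000.0, 64000.0, 6400.0])
y = np.array([0.007763881556107663, 0.019229347046974,
              0.07345007480164349, 0.4603886792339626])
D = 1000.0

def model(x, K, n):
    """Positive root of D*y**2 + (n*x + K - D)*y - K = 0."""
    b = n * x + K - D
    return (-b + np.sqrt(b * b + 4.0 * D * K)) / (2.0 * D)

def sse(K, n):
    """Sum of squared differences between experimental and calculated y."""
    r = y - model(x, K, n)
    return float(np.dot(r, r))

print(sse(5000.0, 1.0))  # essentially zero at the true parameters
```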
Unfortunately this gave even worse results than the linear method!
Although the sum of squares did reach a minimum as required, n became extremely large, and so did K. Again, the theory does not agree with that: n should be quite small, and probably an integer.
I also noticed that, by fixing n at a small value (say 1 or 2) and fitting K alone, the result was not too bad — but then I could put in any value for n and still find a K with a satisfactory sum of squares. Same if I fixed K at a reasonable value and fitted n alone.
So basically it looked like I could choose the value of K or n myself, which sounds absurd.
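To illustrate what I mean (a numpy sketch with a simple grid search over K standing in for the solver; data are the simulated points from the PS): fixing n at 1, 2 or 3 and fitting K alone gives a near-perfect fit every time.

```python
import numpy as np

x = np.array([640000.0, 256000.0, 64000.0, 6400.0])
y = np.array([0.007763881556107663, 0.019229347046974,
              0.07345007480164349, 0.4603886792339626])
D = 1000.0

def model(x, K, n):
    # Positive root of D*y**2 + (n*x + K - D)*y - K = 0
    b = n * x + K - D
    return (-b + np.sqrt(b * b + 4.0 * D * K)) / (2.0 * D)

# For each fixed n, grid-search K for the smallest sum of squares
K_grid = np.logspace(3, 5, 4001)   # K from 1e3 to 1e5
best = {}
for n in (1.0, 2.0, 3.0):
    sse = np.array([np.sum((y - model(x, K, n)) ** 2) for K in K_grid])
    i = int(np.argmin(sse))
    best[n] = (K_grid[i], sse[i])
    print(f"n={n}: best K ~ {K_grid[i]:.0f}, SSE = {sse[i]:.2e}")
# Every n yields a small SSE, with the best K scaling roughly with n —
# these data pin down the ratio K/n far better than K and n separately.
```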
Further analysis revealed that in the linearised equation, the term D/n was usually several orders of magnitude smaller than K/(n y), so I guess small experimental errors in the measurement of y can have an enormous effect on the intercept and cause a really bad estimation of n.
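Some numbers for the simulated data, to show the scale of this (another small numpy check, comparing the two terms K/(n·y) and D/n from the linearised equation — their ratio simplifies to K/(D·y)):

```python
import numpy as np

y = np.array([0.007763881556107663, 0.019229347046974,
              0.07345007480164349, 0.4603886792339626])
K, D, n = 5000.0, 1000.0, 1.0

# Ratio of the y-dependent term to the constant term in the linearised form
ratio = (K / (n * y)) / (D / n)   # = K / (D * y)
print(ratio)   # roughly 644 at the smallest y down to ~11 at the largest
```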
But for the nonlinear method, I have no idea why it shouldn't work.
Does anyone know why this is happening?
And can you please suggest what I should do to estimate my parameters K and n more 'reliably'?
Thank you!
L
PS
Here's some simulated data, calculated using K=5000, D=1000, n=1.
[x=640000, y=0.007763881556107663], [x=256000.0, y=0.019229347046974], [x=64000.0, y=0.07345007480164349], [x=6400.0, y=0.4603886792339626]
The real data I have are often extremely close to the simulated ones, with very small relative errors, but the estimation of K and n as independent parameters suffers from the problem I described above.
Here's some real data for 4 separate experiments y1, y2, y3 and y4 (sorry, poor formatting, I don't know how to paste tables):
x        y1      y2      y3      y4
640000   0.003   0.058   0.009   0.007
256000   0.011   0.133   0.022   0.014
64000    0.028   0.371   0.065   0.047
6400     0.224   0.856   0.51    0.339