Mathematica FindFit/NonlinearModelFit with non-Gaussian residuals

Summary:
The discussion centers on the assumptions and implications of using nonlinear transformations in model fitting, particularly in the context of NonlinearModelFit and FindFit. It highlights that NonlinearModelFit assumes normally distributed residuals for maximum likelihood estimation and confidence interval construction, while FindFit lacks explicit mention of this assumption. The conversation raises concerns about how nonlinear transformations can alter parameter estimates and the sum of squared residuals (SSR), suggesting that different transformations may lead to different 'best fit' parameters. The participants emphasize the importance of understanding the underlying mathematical assumptions, such as the distribution of residuals and homoscedasticity, when applying fitting methods. They express a desire for clearer documentation that outlines these assumptions to avoid confusion in statistical modeling. The discussion concludes with a recognition of the complexity involved in choosing appropriate transformations and the need for a more comprehensive approach to fitting models that considers these factors.
FunkyDwarf
Hi,

I note that in the 'more info' for NonlinearModelFit it says that it assumes the values are normally distributed around the mean response function y, which I understand is required if one wants to use maximum likelihood methods and construct confidence intervals etc.

However, there appears to be no such mention in FindFit, and my understanding (which may be way off) is that Gaussian residuals aren't so important if you just want to estimate parameters, only if you want to do confidence intervals and other inference.

Is this correct? If so, why, when I transform my function (and data), do a fit and then transform back, do I get different parameter values compared to just fitting the 'naked' untransformed model and data? Is this due to some artefact of the algorithm being used (in this case NMinimize), or is it a deeper issue? Is there not a one-to-one mapping between the sum of the squared residuals and the parameters that minimize it?
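A toy illustration of what I mean (Python rather than Mathematica, with a made-up one-parameter model and fabricated data, so just a sketch): minimize the SSR for y = a*x directly, then minimize the SSR after the nonlinear transform t(y) = y^2, and the two minima land at different values of a.

```python
# Toy sketch (assumed model y = a*x, made-up noisy data): minimizing the
# sum of squared residuals (SSR) before and after a nonlinear transform
# generally yields different parameter estimates.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1.0, 5.0, 50)
y = 2.0 * x + rng.normal(0.0, 0.5, size=x.size)  # data scattered around y = 2x

a_grid = np.linspace(1.0, 3.0, 20001)  # brute-force grid over the parameter

# SSR for the untransformed model y = a*x
ssr_raw = ((y[None, :] - a_grid[:, None] * x[None, :]) ** 2).sum(axis=1)
a_raw = a_grid[np.argmin(ssr_raw)]

# SSR after transforming both sides with t(y) = y^2: fit y^2 = (a*x)^2
ssr_sq = ((y[None, :] ** 2 - (a_grid[:, None] * x[None, :]) ** 2) ** 2).sum(axis=1)
a_sq = a_grid[np.argmin(ssr_sq)]

print(a_raw, a_sq)  # both near 2, but the minima sit at different points
```

The squared transform weights the large-y points much more heavily, so the "distance" being minimized is no longer the same notion of distance, and the best-fit parameter shifts.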

Thanks in advance!
 
I always assume that documentation may contain errors and/or omissions. Perhaps this explains the difference in the documentation you describe.

If you do not care how accurate an estimate is, then I suppose it doesn't matter what methods are used or what assumptions are required; just say all the estimates are about zero.

If you do a non-linear transformation then all the errors between the measured point and the unknown model will be changed and any estimation process will use those changed values.

Imagine your model is y ~= 100 and your measured data points lie between 1 and 1000. If you average all your data, the errors lie between -99 and +900. But if you do a log10 transform on your data, the errors lie between -2 and +1. One approach or the other, or perhaps both, is going to make some very questionable calculations with those errors.
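Those ranges are easy to check (a Python stand-in using just the two endpoint data values from the example above):

```python
# Checking the error ranges for the example: model y ~= 100, data
# spanning 1 to 1000. Raw errors are wildly asymmetric; log10 errors
# are compressed into a narrow range with the opposite skew.
import math

model = 100.0
lo, hi = 1.0, 1000.0

raw_errors = (lo - model, hi - model)              # (-99.0, 900.0)
log_errors = (math.log10(lo) - math.log10(model),
              math.log10(hi) - math.log10(model))  # (-2.0, 1.0)

print(raw_errors, log_errors)
```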

Try generating a few uniformly distributed random numbers between 1 and 1000 and take their mean. Compare that with taking the log of each point, averaging the logs, and taking the antilog (i.e. the geometric mean). Sometimes the results are close; sometimes they are not. But that simple example can show some of what is happening with transforms.
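The suggested experiment, sketched in Python (a fixed seed and a sample size of 10 are my choices, purely for reproducibility):

```python
# Draw uniform random numbers in [1, 1000]; compare the plain mean with
# the "log, average, antilog" value (the geometric mean).
import math
import random

random.seed(42)  # fixed seed so the run is reproducible
data = [random.uniform(1.0, 1000.0) for _ in range(10)]

arith_mean = sum(data) / len(data)
geo_mean = 10 ** (sum(math.log10(v) for v in data) / len(data))

print(arith_mean, geo_mean)  # the two "averages" can differ substantially
```

By the AM-GM inequality the geometric mean is always below the arithmetic mean here, sometimes by a lot, which is exactly the transform effect being described.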

As you noted, there is a great deal of mathematics and assumptions behind the scenes that are often not appropriately explained when dealing with fitting models and dealing with errors.
 
Hi Bill,

Thanks for the response. I suspected as much, specifically that nonlinear transforms are going to cause headaches, but it was not so much the individual values, or even particular statistics, that I was after; it was more the general approach of, say, least squares. For example, it seems that if you minimize the sum of the squared residuals (SSR) for the original function and for the transformed function, the 'best fit' parameters are different, i.e. the minima occur at different points in parameter space.

After thinking about it a bit this is perhaps not so strange since you can always concoct some weird nonlinear transform that squishes your function and data in different ways so that your notion of 'distance' between function and data isn't conserved across your set when you move from one function to the transformed function.

I guess what I'm getting at is that it seems intuitive that minimizing your SSR is a reasonable metric by which to determine your best fit, but it seems arbitrary once you consider the number of transforms you could perform (of course most aren't sensible). Is there any 'global' approach one can use? Presumably this would bring us back to maximum likelihood and all its baggage?

Thanks again!
 
Minimizing your SSR may seem reasonable, but that probably depends on many unstated assumptions: that the residuals have a symmetric, perhaps even Gaussian, distribution; homoscedasticity; that you have a parametric statistical problem as opposed to a non-parametric one. I suspect the list is even longer. All of those things tend to be ignored when the mechanical process of grinding out a sum of squares is introduced.

You should certainly verify this from an authority, but I think I recall that THE justified and acceptable transformation is the one that gives a Gaussian distribution of the residuals and homoscedasticity.
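A sketch of that point (assumed multiplicative-noise model with made-up numbers, not anything from the thread): if the noise is multiplicative, y = f(x)*exp(eps), the raw residuals spread out as f(x) grows, while the log residuals have constant spread, so the log transform is the one that delivers homoscedasticity.

```python
# Made-up multiplicative-noise data: compare residual spread before and
# after a log transform. Raw residuals are heteroscedastic; log residuals
# recover the constant-variance noise term eps.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 2000)
f = np.exp(0.5 * x)                            # assumed true mean response
y = f * np.exp(rng.normal(0.0, 0.2, x.size))   # multiplicative noise

raw_resid = y - f
log_resid = np.log(y) - np.log(f)

lo, hi = x < 3.0, x > 8.0  # compare spread at small vs large x
print(raw_resid[lo].std(), raw_resid[hi].std())  # very different
print(log_resid[lo].std(), log_resid[hi].std())  # about the same
```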

I've wished I could find a stats text which would start with the usually unstated requirements, clearly explain why those were the case and then proceed to the theorem that would use all this.
 
Me too =)
 
Buried here somewhere, which I'll never find again, is an intro stats text which is oriented around teaching students to "eyeball the data" and then be able to estimate the statistics with a good deal of precision, make decisions with a good degree of confidence, etc. A couple of Google searches don't find the title. It was cute enough when I saw it on a college bookstore shelf that I bought a copy and meant to try that.

But getting the assumptions out in front of stats would be more important.
 
