- #1

digfarenough

First, the errors in my n data points around the fit are not i.i.d. Gaussian (the data have an artifact that the models will not be able to fit easily, and the artifact introduces dependencies and a non-Normal distribution of errors), so I worry about using the sum of the squared residuals (RSS) divided by n as the likelihood. I recall reading somewhere that using RSS/n as the likelihood in AICc can still produce reasonable results even if the errors are not Gaussian, though I think they were still independent in that case.
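For reference, the RSS-based form of AICc that this refers to is usually written as n·ln(RSS/n) + 2k plus a small-sample correction. The sketch below is a minimal illustration of that formula under the i.i.d. Gaussian assumption, which is exactly the assumption in doubt here. The function name `aicc_from_rss` is made up for illustration, and counting the error variance as one of the k parameters is one common convention, assumed here.

```python
import math


def aicc_from_rss(rss, n, k):
    """AICc for a least-squares fit, assuming i.i.d. Gaussian errors.

    rss: residual sum of squares of the fit
    n:   number of data points
    k:   number of estimated parameters (assumed convention: the model's
         parameters plus one for the error variance)
    """
    if rss <= 0:
        raise ValueError("rss must be positive")
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    # Least-squares AIC, dropping constants shared by all models
    aic = n * math.log(rss / n) + 2 * k
    # Small-sample correction term
    return aic + 2 * k * (k + 1) / (n - k - 1)
```

Only differences in AICc between models fitted to the same data are meaningful, so the dropped constant does not matter for the comparison.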

An alternative came to mind, but I can't get my thinking straight on it.

The fits often hit local minima, so I do repeated non-linear least squares fits from random start points. Some of the models will overfit the data, and in these some of the parameters will vary wildly, as one unnecessary parameter takes on extreme values that are balanced by extreme values of another unnecessary parameter.

[As a sidenote: I may be misunderstanding likelihood in the context of a fit. Likelihood is the probability of an observed outcome given certain parameters, but in a fit the resulting shape is entirely determined by the parameters, so the likelihood would seem to be 1. However, since likelihood is a function of a statistical model, perhaps I should think of the fitting itself as being part of the process. Then my data would become the "parameters" of the statistical fitting process, which produces a distribution over the estimated parameters of the model, playing the part of the "outcome" in the definition of likelihood.]

That probably makes no sense, so: By considering the vectors of estimated parameters from all the repetitions of the fit, I get points sampled from a distribution in parameter space. This is sort of an empirical estimate of likelihood that shows whether the repeated fittings tend to cluster in one part of parameter space or whether they are all spread out in space. The former would suggest the model in question is a good fit and the latter that the model is overfitting. One of these points has the lowest RSS and so is the best fit. This suggests to me I can use the estimated parameter distribution density at the point with the lowest RSS as the likelihood of the fit for the AICc computation.

Am I all mixed up here? Is this a bad idea? I still can't quite figure out how to go from a cloud of points in parameter space to a single value of likelihood to plug into the AICc expression, though. I considered using multivariate kernel density estimation to smooth the discrete distribution out, but I worry I'm making things more and more complicated.
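To make the kernel density idea concrete, here is a minimal sketch of how a cloud of fitted parameter vectors could be reduced to one density value at the lowest-RSS fit. It uses a product of Gaussian kernels with one bandwidth per parameter. The function name, the `(params, rss)` layout of the fit results, and the bandwidth values are all assumptions for illustration. Whether this density can legitimately stand in for a likelihood in AICc is the open question above, and the code only shows the mechanics.

```python
import math


def gaussian_kde_at(point, samples, bandwidths):
    """Product-Gaussian kernel density estimate evaluated at `point`.

    point:      parameter vector where the density is wanted
    samples:    list of parameter vectors (one per repeated fit)
    bandwidths: one positive kernel width per parameter (user-chosen)
    """
    d = len(point)
    # Normalising constant of the d-dimensional product kernel
    norm = 1.0
    for h in bandwidths:
        norm *= h * math.sqrt(2.0 * math.pi)
    total = 0.0
    for s in samples:
        z2 = sum(((point[j] - s[j]) / bandwidths[j]) ** 2 for j in range(d))
        total += math.exp(-0.5 * z2)
    return total / (len(samples) * norm)


def density_at_best_fit(fits, bandwidths):
    """fits: list of (params, rss) pairs from the repeated fits."""
    best_params, _ = min(fits, key=lambda f: f[1])
    cloud = [params for params, _ in fits]
    return gaussian_kde_at(best_params, cloud, bandwidths)
```

A tight cluster of fits gives a high density at the best point and a widely scattered cloud gives a low one. The value depends strongly on the chosen bandwidths and on the units of each parameter, which is one reason it may not be comparable across models with different numbers of parameters.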

Could it be better to simply throw out models where the best fit had very large confidence intervals and only use those with reasonable intervals in the AICc comparison, using RSS/n as the likelihood?
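The simpler screening option could look something like the sketch below: drop any model whose best fit has a parameter with a very wide confidence interval, then compare the remaining models by AICc. The record fields (`name`, `aicc`, `estimates`, `ci_half_widths`) and the relative-width threshold are assumptions made for illustration, not an established criterion.

```python
def select_model(candidates, max_rel_ci=1.0):
    """Return the name of the lowest-AICc model among those whose
    parameter confidence intervals are all reasonably narrow.

    candidates: list of dicts with hypothetical keys
        "name", "aicc", "estimates", "ci_half_widths"
    max_rel_ci: largest allowed CI half-width relative to |estimate|
    """
    kept = []
    for c in candidates:
        rel_widths = [
            hw / abs(est) if est != 0 else float("inf")
            for est, hw in zip(c["estimates"], c["ci_half_widths"])
        ]
        if max(rel_widths) <= max_rel_ci:
            kept.append(c)
    if not kept:
        return None  # every model was screened out
    return min(kept, key=lambda c: c["aicc"])["name"]
```

A relative width is used here so that parameters on different scales are treated alike, but it breaks down for parameters whose true value is near zero, so an absolute threshold per parameter may be more appropriate in some cases.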

Any input would be appreciated. I have searched at length online but can't seem to find the appropriate search terms to give me the answers I need, and I am not well versed in this aspect of statistics.