Underdetermined vs Overdetermined Systems

Discussion Overview

The discussion revolves around the comparison of underdetermined and overdetermined systems in the context of fitting a model to data sets. Participants explore the implications of each system type on model fitting, parameter estimation, and concerns about overfitting.

Discussion Character

  • Debate/contested
  • Mathematical reasoning
  • Technical explanation

Main Points Raised

  • One participant describes a model involving multiple independent variables and unknown parameters, highlighting the distinction between using an overdetermined system with more observations than unknowns and an underdetermined system with fewer observations.
  • Another participant argues that the choice between underdetermined and overdetermined systems depends on the specific goals of the analysis, noting that underdetermined systems allow for infinite solutions while overdetermined systems can yield a unique least squares solution.
  • A concern is raised about overfitting due to having too many variables, with a participant expressing uncertainty about whether to use regularized regression techniques for the underdetermined system or the overdetermined system.
  • One suggestion involves using both data sets to create an even more overdetermined system, though the efficacy of this approach is questioned.
  • Participants discuss the importance of minimizing the number of fit-dependent parameters to avoid overfitting, suggesting that simpler models may be more reliable even if they do not fit the data as closely.

Areas of Agreement / Disagreement

Participants do not reach a consensus on which system is better for fitting the model, as opinions vary based on the context and specific goals of the analysis. The discussion remains unresolved regarding the optimal approach to take.

Contextual Notes

Participants express concerns about overfitting and the reliability of results when fitting models with many parameters. There is also mention of regularization techniques and the potential trade-offs between model complexity and data fitting.

CoSurShe
I'm trying to create a model which is of the form

$$y = (a_0 + a_1 l)\left[b_0 + \sum_{m=1}^{M} b_m \cos(mx - \alpha_m)\right]\left[c_0 + \sum_{n=1}^{N} c_n \cos(nz - \beta_n)\right]$$

In the above system, l, x and z are independent variables and y is the dependent variable. The a, b and c terms are the unknowns. To solve for these unknowns, I have two separate data sets that I can use. Using data set 1 creates an overdetermined system, giving me more observations than unknowns, while data set 2 creates an underdetermined system with fewer observations than unknowns. In such a case, which approach would be better, underdetermined or overdetermined, and why?
 
Neither is very good! Which is "better" depends upon what you want to do and what you mean by "better". The "underdetermined" system allows an infinite number of "solutions", but you can determine a subset (actually a subspace) of all possible combinations: the set of all combinations that exactly satisfy the system. The "overdetermined" system does not have any exact solution, but you can determine the unique solution that comes closest to satisfying the system in the least-squares sense.
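The two situations described above can be illustrated with small random systems in NumPy (the matrices here are placeholders, not the poster's actual model): `np.linalg.lstsq` returns the unique least-squares solution in the overdetermined case and the minimum-norm exact solution in the underdetermined case.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined: 10 observations, 3 unknowns -> no exact solution in
# general, but a unique least-squares solution exists.
A_over = rng.standard_normal((10, 3))
y_over = rng.standard_normal(10)
x_ls, residuals, rank, _ = np.linalg.lstsq(A_over, y_over, rcond=None)

# Underdetermined: 3 observations, 10 unknowns -> infinitely many exact
# solutions; lstsq returns the one of minimum Euclidean norm.
A_under = rng.standard_normal((3, 10))
y_under = rng.standard_normal(3)
x_min_norm, *_ = np.linalg.lstsq(A_under, y_under, rcond=None)

# The minimum-norm solution satisfies the underdetermined system exactly.
print(np.allclose(A_under @ x_min_norm, y_under))
```

Any other exact solution of the underdetermined system differs from `x_min_norm` by a vector in the null space of `A_under`, which is the "subspace of solutions" mentioned above.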
 
Thanks for the reply. I need to fit the model described above to either of the two available data sets and use the residual to perform a separate set of analyses. The real concern is that I have too many variables to fit, so I fear overfitting and hence unreliable, inaccurate results. I am aware of regularization procedures and other steps to mitigate overfitting. What I am not certain about is whether a regularized regression technique or a partial least squares method of finding the coefficients for the underdetermined system is better suited for removing the trend described by the model above, compared to using the same model with the other data set, i.e. a regularized overdetermined system.
 
One thing you could do, in principle, is to use both data sets to fit the parameters. Then you have an even more over-determined data set than with data set 2 alone. Is this a good idea? Hard to say.
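Mechanically, combining the two data sets just means stacking their design matrices and observation vectors before fitting (the shapes below are illustrative, not taken from the thread):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical design matrices and observations for the two data sets.
A1, y1 = rng.standard_normal((30, 5)), rng.standard_normal(30)  # data set 1
A2, y2 = rng.standard_normal((3, 5)), rng.standard_normal(3)    # data set 2

# Stack both data sets into one, even more overdetermined, system.
A_both = np.vstack([A1, A2])
y_both = np.concatenate([y1, y2])
x, *_ = np.linalg.lstsq(A_both, y_both, rcond=None)
```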

In general, what you *want* to achieve is a parameterization of your model that needs as few fit-dependent parameters as possible (for example, by fixing parameters or functional forms using asymptotic expansions, known constraints, symmetry, etc.). You could then (a) check whether the model is reasonable by fitting it to a subset of the data and checking whether it reasonably reproduces the rest, and (b) if this works, use all the data you have for least-squares fitting (or maximum-likelihood fitting, or whatever you like) of the model's parameters, to extend its range of applicability as far as possible.
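Steps (a) and (b) amount to a simple holdout check followed by a full refit; a minimal sketch with synthetic data (the linear model and noise level here are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for the real problem: 40 observations, 4 parameters.
A = rng.standard_normal((40, 4))
true_x = np.array([1.0, -0.5, 0.25, 2.0])
y = A @ true_x + 0.05 * rng.standard_normal(40)

# (a) Fit on a subset of the data ...
train, test = slice(0, 30), slice(30, 40)
x_fit, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)

# ... and check how well the held-out observations are reproduced.
rms_test = np.sqrt(np.mean((A[test] @ x_fit - y[test]) ** 2))

# (b) If the held-out error is acceptable, refit using all the data.
x_all, *_ = np.linalg.lstsq(A, y, rcond=None)
```

If `rms_test` is much larger than the training residual, that is a warning sign that the model is overfitted or misspecified.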

Basically, the more parameters you need to fit, the more susceptible your system becomes to overfitting, and thus to becoming unreliable (and possibly erratic) as soon as you step outside the range of the data used for fitting. If in doubt, I would always consider an under-fitted model with fewer parameters that reasonably reproduces larger data sets more trustworthy than an over-fitted model that more closely reproduces the data set it was fitted on. Some of the most successful models in all of physics (e.g., http://dx.doi.org/10.1063/1.464913) achieved their success mainly because they had few parameters and thus little room for overfitting, which increased their applicability even beyond the originally envisioned applications.
 
