Underdetermined vs Overdetermined Systems

SUMMARY

This discussion focuses on the comparison between underdetermined and overdetermined systems in modeling, specifically in the context of fitting a model of the form $$y = (a_0 + a_1 l)\left[b_0 + \sum_{m=1}^{M} b_m \cos(mx - \alpha_m)\right]\left[c_0 + \sum_{n=1}^{N} c_n \cos(nz - \beta_n)\right].$$ The author notes that using data set 1 results in an overdetermined system, while data set 2 leads to an underdetermined system. The choice between these depends on the modeling goals: an underdetermined system admits infinitely many exact solutions, while an overdetermined system provides a unique least-squares solution. The discussion emphasizes avoiding overfitting by considering regularization techniques, and the potential benefits of combining both data sets for parameter fitting.

PREREQUISITES
  • Understanding of linear regression and least squares fitting
  • Familiarity with regularization techniques in statistical modeling
  • Knowledge of parameter estimation in mathematical modeling
  • Basic concepts of underdetermined and overdetermined systems
NEXT STEPS
  • Research regularization techniques such as Lasso and Ridge regression
  • Explore the principles of partial least squares regression
  • Study the implications of overfitting in statistical models
  • Investigate methods for combining multiple data sets for parameter fitting
USEFUL FOR

Researchers, data scientists, and statisticians involved in modeling complex systems, particularly those concerned with parameter fitting and overfitting issues in their analyses.

CoSurShe
I'm trying to create a model which is of the form

$$y = (a_0 + a_1 l)\left[b_0 + \sum_{m=1}^{M} b_m \cos(mx - \alpha_m)\right]\left[c_0 + \sum_{n=1}^{N} c_n \cos(nz - \beta_n)\right]$$

In the above system, l, x and z are independent variables and y is the dependent variable. The a, b and c terms are the unknowns. To solve for these unknowns, I have two separate data sets that I can use. Using data set 1 creates an overdetermined system, giving me more observations than unknowns, while data set 2 creates an underdetermined system with fewer observations than unknowns. In such a case, which approach would be better, underdetermined or overdetermined, and why?
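As a side note, it helps to count the unknowns explicitly before comparing them to the number of observations. A small sketch (my own tally, not from the thread) for the model above:

```python
def n_unknowns(M, N):
    """Raw parameter count for y = (a0 + a1*l) * [b-factor] * [c-factor].

    a-factor: a0, a1                              -> 2
    b-factor: b0 plus (b_m, alpha_m), m = 1..M    -> 1 + 2*M
    c-factor: c0 plus (c_n, beta_n),  n = 1..N    -> 1 + 2*N

    Note: the product form has a scale ambiguity (multiplying one
    factor by k and dividing another by k leaves y unchanged), so
    the number of *independently identifiable* parameters is two
    less than this raw count.
    """
    return 2 + (1 + 2 * M) + (1 + 2 * N)

print(n_unknowns(3, 3))  # 16
```

Comparing this count against the number of rows in each data set is what decides whether that set makes the fit over- or underdetermined.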
 
Neither is very good! Which is "better" depends upon what you want to do and what you mean by "better". The underdetermined system allows an infinite number of solutions, but you can determine a subset (actually a subspace) of all possible parameter combinations: the set of all combinations that exactly satisfy the system. The overdetermined system generally has no exact solution, but you can determine the unique solution that comes closest to satisfying the system in the least-squares sense.
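For concreteness, here is a minimal NumPy sketch of the two cases (generic random matrices standing in for the actual design matrix, so this is only an illustration of the distinction, not of the poster's model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined: 10 observations, 3 unknowns -> no exact solution in
# general, but a unique least-squares solution.
A_over = rng.standard_normal((10, 3))
y_over = rng.standard_normal(10)
w_ls, residuals, rank, _ = np.linalg.lstsq(A_over, y_over, rcond=None)

# Underdetermined: 3 observations, 10 unknowns -> infinitely many exact
# solutions; the pseudoinverse picks the minimum-norm one.
A_under = rng.standard_normal((3, 10))
y_under = rng.standard_normal(3)
w_min = np.linalg.pinv(A_under) @ y_under

print(np.allclose(A_under @ w_min, y_under))  # True: fits the data exactly
```

Any vector of the form `w_min + v`, with `v` in the null space of `A_under`, is an equally exact solution, which is precisely the "subspace of solutions" mentioned above.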
 
Thanks for the reply. I need to fit the model described above to one of the two available data sets and use the residual to perform a separate set of analyses. The real concern is that I have too many parameters to fit, and I fear overfitting and hence unreliable, inaccurate results. I am aware of regularization procedures and other steps to mitigate overfitting. What I am not certain about is whether a regularized regression technique or a partial least squares method for finding the coefficients of the underdetermined system is better suited for removing the trend described by the model above, compared with using the same model on the other data set, i.e., a regularized overdetermined system.
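One of the regularization procedures mentioned here, ridge (Tikhonov) regression, has a simple closed form that works for both the over- and the underdetermined case. A minimal sketch, using a generic random design matrix rather than the poster's model:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge (Tikhonov) regularized least squares:

        w = argmin ||X w - y||^2 + lam * ||w||^2
          = (X^T X + lam * I)^{-1} X^T y

    For lam > 0 the system matrix is always invertible, so this is
    well defined even when X has fewer rows than columns; larger lam
    shrinks the coefficients toward zero.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 12))   # underdetermined: 5 obs, 12 unknowns
y = rng.standard_normal(5)
w = ridge_fit(X, y, lam=0.1)
print(w.shape)  # (12,)
```

Lasso (an L1 penalty) behaves differently: it drives some coefficients exactly to zero, which amounts to selecting a sparser model rather than merely shrinking it.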
 
One thing you could do, in principle, is to use both data sets to fit the parameters. You would then have an even more overdetermined system than with data set 1 (the overdetermined one) alone. Is this a good idea? Hard to say.

In general, what you *want* is a parameterization of your model that needs as few fitted parameters as possible (for example, by fixing parameters or functional forms using asymptotic expansions, known constraints, symmetry, etc.). You could then (a) check whether the model is reasonable by fitting it to a subset of the data and seeing whether it reasonably reproduces the other data, and (b) if this works, use all the data you have to fit the model's parameters by least squares (or maximum likelihood, or whatever you like), to extend its range of applicability as far as possible.
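Step (a) above is a plain holdout check, which can be sketched in a few lines. Here `fit` and `predict` are user-supplied callables standing in for whatever fitting procedure is used (the example plugs in ordinary least squares on synthetic data, purely for illustration):

```python
import numpy as np

def holdout_check(X, y, fit, predict, train_frac=0.7, seed=0):
    """Fit on a random subset of the data and report the RMS
    residual on the held-out remainder."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, test = idx[:n_train], idx[n_train:]
    params = fit(X[train], y[train])
    resid = y[test] - predict(X[test], params)
    return np.sqrt(np.mean(resid ** 2))

# Usage: plain least squares as the fitter, on synthetic linear data.
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda X, w: X @ w
rng = np.random.default_rng(2)
X = rng.standard_normal((40, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(40)
print(holdout_check(X, y, fit, predict))
```

If the holdout residual is comparable to the in-sample residual, the model generalizes; if it is much larger, the model is likely overfitted.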

Basically, the more parameters you need to fit, the more susceptible your model becomes to overfitting, and thus to becoming unreliable (and possibly erratic) as soon as you step outside the range of the data it was fitted on. If in doubt, I would always consider an under-fitted model with fewer parameters, which reasonably reproduces larger data sets, as more trustworthy than an over-fitted model which more closely reproduces the one data set it was fitted on. Some of the most successful models in all of physics (e.g., http://dx.doi.org/10.1063/1.464913) achieved their success mainly because they had few parameters and thus little room for overfitting, which extended their applicability even beyond the originally envisioned applications.
 
