Underdetermined vs Overdetermined Systems

In summary, the original poster is trying to fit a model with unknown parameters using either of two data sets: one yields an overdetermined system and the other an underdetermined system. They are concerned about overfitting and the resulting reliability and accuracy of the results. They are aware of regularization procedures but are unsure whether regularized regression or partial least squares is better suited to the underdetermined system. One reply stresses the importance of finding a parameterization of the model with as few fitted parameters as possible to avoid overfitting. It also suggests using both data sets to fit the parameters and checking the model's reliability by fitting it to a subset of the data and testing it on the rest. In general, that reply argues that an underfitted model with fewer parameters is more trustworthy than an overfitted model with more.
  • #1
CoSurShe
I'm trying to create a model which is of the form

$$y = (a_0 + a_1 l)\left[b_0 + \sum_{m=1}^{M} b_m \cos(m x - \alpha_m)\right]\left[c_0 + \sum_{n=1}^{N} c_n \cos(n z - \beta_n)\right]$$

In the above system, l, x, and z are the independent variables and y is the dependent variable. The a, b, and c terms are the unknowns. To solve for these unknowns, I have two separate data sets. Data set 1 gives an overdetermined system, with more observations than unknowns, while data set 2 gives an underdetermined system, with fewer observations than unknowns. In such a case, which approach would be better, the underdetermined or the overdetermined one, and why?
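To make the parameter count concrete, here is a minimal Python/NumPy sketch of the model. It assumes that the phases α_m and β_n are fitted as well (the post lists only the a, b, and c terms as unknowns), which gives 4 + 2M + 2N parameters. The function names `model` and `n_free_parameters` and all numerical values are hypothetical. The usage lines also illustrate a point relevant to overfitting: because y is a product of three factors, scaling one factor by k and another by 1/k leaves y unchanged, so two overall scale factors cannot be determined from any data set. One common fix, assuming they are nonzero, is to hold b_0 and c_0 fixed at 1.

```python
import numpy as np


def model(l, x, z, a, b0, b, alpha, c0, c, beta):
    """y = (a0 + a1 l) [b0 + sum_m b_m cos(m x - alpha_m)] [c0 + sum_n c_n cos(n z - beta_n)]."""
    l, x, z = (np.asarray(v, dtype=float) for v in (l, x, z))
    b, alpha, c, beta = (np.asarray(v, dtype=float) for v in (b, alpha, c, beta))
    m = np.arange(1, b.size + 1)            # harmonic indices 1..M
    n = np.arange(1, c.size + 1)            # harmonic indices 1..N
    factor_l = a[0] + a[1] * l
    factor_x = b0 + np.sum(b * np.cos(np.multiply.outer(x, m) - alpha), axis=-1)
    factor_z = c0 + np.sum(c * np.cos(np.multiply.outer(z, n) - beta), axis=-1)
    return factor_l * factor_x * factor_z


def n_free_parameters(M, N):
    """Count a0, a1 | b0, b_1..b_M, alpha_1..alpha_M | c0, c_1..c_N, beta_1..beta_N."""
    return 2 + (1 + 2 * M) + (1 + 2 * N)


rng = np.random.default_rng(1)
l, x, z = rng.uniform(0.0, 2 * np.pi, size=(3, 50))     # hypothetical sample points
a = np.array([1.0, 0.2])
b, alpha = np.array([0.5, 0.1]), np.array([0.3, 1.0])   # M = 2
c, beta = np.array([0.4]), np.array([0.7])               # N = 1

y1 = model(l, x, z, a, 1.0, b, alpha, 2.0, c, beta)
# Scale the (a0 + a1 l) factor by 2 and the whole b-bracket by 1/2: y is unchanged.
y2 = model(l, x, z, 2 * a, 0.5, 0.5 * b, alpha, 2.0, c, beta)

print(np.allclose(y1, y2))            # True
print(n_free_parameters(M=2, N=1))    # 10
```

With those two scales fixed, the count of independently determinable parameters drops to 2 + 2M + 2N. That count is the one to compare with the number of observations in each data set.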
 
  • #2
Neither is very good! Which is "better" depends on what you want to do and what you mean by "better". The "underdetermined" system allows an infinite number of "solutions", but you can determine the subset (for a linear system, actually an affine subspace) of all parameter combinations that exactly satisfy the system. The "overdetermined" system does not, in general, have any exact solution, but you can determine the solution that comes closest to satisfying the system in the least-squares sense; that solution is unique provided the columns of the system matrix are linearly independent.
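A minimal NumPy sketch of both cases, using small toy linear systems that are not from the thread. For the underdetermined system, `np.linalg.lstsq` returns the minimum-norm exact solution, and adding any multiple of a null-space vector gives another exact solution. For the overdetermined system, it returns the least-squares solution, and the nonzero residual shows that no exact solution exists.

```python
import numpy as np

# Underdetermined: 2 equations, 3 unknowns (toy numbers, not the poster's data).
A_under = np.array([[1.0, 1.0, 0.0],
                    [0.0, 1.0, 1.0]])
b_under = np.array([2.0, 3.0])

# lstsq returns the minimum-norm solution when there are more unknowns than equations.
x_min, *_ = np.linalg.lstsq(A_under, b_under, rcond=None)
null_vec = np.array([1.0, -1.0, 1.0])     # A_under @ null_vec == 0
x_other = x_min + 2.0 * null_vec          # a different, equally exact solution

# Overdetermined: fit y = p0 + p1 * t through 4 points (4 equations, 2 unknowns).
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0, 4.0])
A_over = np.column_stack([np.ones_like(t), t])
p, ssr, rank, _ = np.linalg.lstsq(A_over, y, rcond=None)

print(x_min)                 # ≈ [0.3333 1.6667 1.3333]
print(A_under @ x_other)     # ≈ [2. 3.]
print(p, ssr)                # ≈ [0.9 0.9] [0.7]: nonzero residual, so no exact solution
```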
 
  • #3
Thanks for the reply. I need to fit the model described above to either of the two available data sets and use the residuals to perform a separate set of analyses. The real concern is that I have too many parameters to fit, and I fear overfitting and the resulting (un)reliability and inaccuracy of the results. I am aware of regularization procedures and other steps to mitigate overfitting. What I am not certain about is whether finding the coefficients of the underdetermined system with a regularized regression technique or a partial least squares method is better suited to removing the trend described by the model above, compared with fitting the same model to the other data set as a regularized overdetermined system.
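As a hedged sketch of the regularized option for the underdetermined case, the code below computes ridge (Tikhonov) solutions for a random stand-in design matrix with 5 observations and 12 unknowns. These sizes are hypothetical, not the poster's data, and the helper name `ridge` is illustrative. It also keeps the residuals that would feed a later analysis. Partial least squares is not shown; scikit-learn provides it as `PLSRegression` if you want to compare. As λ grows, the solution norm shrinks and the residual sum of squares grows. As λ → 0, the ridge solution approaches the minimum-norm exact fit, whose residuals are essentially zero. That is worth keeping in mind when the residuals themselves are the quantity of interest.

```python
import numpy as np


def ridge(A, y, lam):
    """Minimise ||A x - y||^2 + lam * ||x||^2 (ridge / Tikhonov regularization).

    Uses the equivalent form A^T (A A^T + lam I)^{-1} y, which needs only an
    n_obs x n_obs solve and is convenient when n_obs < n_params.
    """
    n_obs = A.shape[0]
    return A.T @ np.linalg.solve(A @ A.T + lam * np.eye(n_obs), y)


rng = np.random.default_rng(0)
A = rng.normal(size=(5, 12))      # stand-in design matrix: 5 observations, 12 unknowns
y = rng.normal(size=5)            # stand-in observations

x_min_norm = np.linalg.pinv(A) @ y          # exact fit with the smallest norm (lam -> 0)
lams = [1e-8, 0.1, 1.0, 10.0]
sols = [ridge(A, y, lam) for lam in lams]
residuals = [y - A @ x for x in sols]       # what a later residual analysis would use
norms = [float(np.linalg.norm(x)) for x in sols]
rss = [float(r @ r) for r in residuals]

for lam, nrm, r in zip(lams, norms, rss):
    print(f"lam={lam:g}  ||x||={nrm:.3f}  RSS={r:.3g}")
```

In practice λ would be chosen by something like cross-validation rather than by hand; the values above only show the trend.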
 
  • #4
One thing you could do, in principle, is to use both data sets to fit the parameters. Then you have an even more overdetermined system than with data set 1 alone. Is this a good idea? Hard to say.
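A minimal sketch of fitting both data sets jointly. It assumes a model that is linear in its parameters (a toy one-harmonic model, not the full model above) and a known noise level σ for each data set; the σ values, sizes, and `design` helper are hypothetical. The small set alone (2 points, 3 parameters) would be underdetermined. Dividing each row by its σ before stacking turns the joint fit into weighted least squares, so the smaller, noisier set does not get the same weight per point as the larger, cleaner one.

```python
import numpy as np


def design(x):
    # Toy linear-in-parameters model: y = p0 + p1 cos(x) + q1 sin(x) (hypothetical).
    return np.column_stack([np.ones_like(x), np.cos(x), np.sin(x)])


rng = np.random.default_rng(2)
p_true = np.array([1.0, 0.5, -0.2])

# Synthetic data set 1: many precise points. Data set 2: few noisier points.
sigma1, sigma2 = 0.05, 0.2
x1 = rng.uniform(0.0, 2 * np.pi, 20)
x2 = rng.uniform(0.0, 2 * np.pi, 2)
y1 = design(x1) @ p_true + rng.normal(0.0, sigma1, x1.size)
y2 = design(x2) @ p_true + rng.normal(0.0, sigma2, x2.size)

# Stack both sets, scaling each row by 1/sigma (weighted least squares).
A = np.vstack([design(x1) / sigma1, design(x2) / sigma2])
y = np.concatenate([y1 / sigma1, y2 / sigma2])
p_joint, *_ = np.linalg.lstsq(A, y, rcond=None)

print(A.shape)     # (22, 3): 22 equations for 3 unknowns
print(p_joint)
```

Whether joint fitting is a good idea still depends on whether both data sets really follow the same model, which is the "hard to say" part.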

In general, what you *want* to achieve is a parameterization of your model that needs as few fitted parameters as possible (for example, by fixing parameters or functional forms using asymptotic expansions, known constraints, symmetry, etc.). You could then (a) check whether the model is reasonable by fitting it to a subset of the data and checking whether it reasonably reproduces the remaining data, and (b) if this works, use all the data you have for least-squares fitting (or maximum-likelihood fitting, or whatever you prefer) of the model parameters, to extend its range of applicability as far as possible.
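One concrete reparameterization of this kind applies to the phase-shifted cosines in the model. Since cos(mx - α_m) = cos α_m cos(mx) + sin α_m sin(mx), each term b_m cos(mx - α_m) can be written as p_m cos(mx) + q_m sin(mx), with p_m = b_m cos α_m and q_m = b_m sin α_m. A single bracket then becomes linear in (b_0, p, q) and can be fitted by ordinary least squares, with b_m and α_m recovered afterwards. This NumPy sketch uses noiseless synthetic data with hypothetical "true" values. The full model is a product of three brackets, so it remains nonlinear in its parameters even after this step.

```python
import numpy as np

# Hypothetical "true" values for one bracket with M = 2 harmonics.
b0_true = 1.0
b_true = np.array([0.6, 0.25])
alpha_true = np.array([0.4, -1.1])

x = np.linspace(0.0, 2 * np.pi, 40, endpoint=False)
m = np.arange(1, b_true.size + 1)
g = b0_true + np.sum(b_true * np.cos(np.outer(x, m) - alpha_true), axis=1)

# Linear design matrix: columns 1, cos(m x), sin(m x) for m = 1..M.
X = np.column_stack([np.ones_like(x), np.cos(np.outer(x, m)), np.sin(np.outer(x, m))])
coef, *_ = np.linalg.lstsq(X, g, rcond=None)
b0_fit, p, q = coef[0], coef[1:1 + m.size], coef[1 + m.size:]

b_fit = np.hypot(p, q)           # amplitudes b_m (taken >= 0)
alpha_fit = np.arctan2(q, p)     # phases alpha_m in (-pi, pi]
print(b0_fit, b_fit, alpha_fit)  # ≈ 1.0, [0.6 0.25], [0.4 -1.1]
```

Taking b_m ≥ 0 and α_m in (-π, π] also removes the sign/phase ambiguity of the original form, one small step toward fewer effectively free parameters.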

Basically, the more parameters you need to fit, the more susceptible your system becomes to overfitting, and thus to becoming unreliable (and possibly erratic) as soon as you step outside the range of the data included in the fit. If in doubt, I would always consider an underfitted model with fewer parameters, which reasonably reproduces larger data sets, more trustworthy than an overfitted model that more closely reproduces the data set it was fitted on. Some of the most successful models in all of physics (e.g., http://dx.doi.org/10.1063/1.464913) achieved their success mainly because they had few parameters and thus little room for overfitting, which extended their applicability even beyond the originally envisioned applications.
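The effect described above can be seen in a small hold-out experiment, a sketch of check (a). The synthetic data are an assumption: one harmonic plus noise. They are split in half, and Fourier models with an increasing number of harmonics K are fitted to the first half and evaluated on the second; `fourier_design` and `rms` are illustrative helpers. Because the models are nested, the training error can only decrease as parameters are added. The held-out error is the honest measure, and it typically stops improving, and often worsens, once K exceeds the true structure. The exact numbers depend on the random seed.

```python
import numpy as np


def fourier_design(x, K):
    """Columns 1, cos(k x), sin(k x) for k = 1..K, i.e. 1 + 2K parameters."""
    k = np.arange(1, K + 1)
    return np.column_stack([np.ones_like(x), np.cos(np.outer(x, k)), np.sin(np.outer(x, k))])


def rms(r):
    return float(np.sqrt(np.mean(r ** 2)))


rng = np.random.default_rng(3)
x = rng.uniform(0.0, 2 * np.pi, 30)
y = 1.0 + 0.5 * np.cos(x - 0.3) + rng.normal(0.0, 0.1, x.size)   # synthetic, true K = 1

train, test = slice(0, 15), slice(15, None)    # fit on one half, check on the other
train_rms, test_rms = [], []
for K in range(7):                             # 1 to 13 parameters, all < 15 training points
    X = fourier_design(x, K)
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    train_rms.append(rms(y[train] - X[train] @ coef))
    test_rms.append(rms(y[test] - X[test] @ coef))
    print(f"K={K}  params={X.shape[1]:2d}  train RMS={train_rms[-1]:.3f}  "
          f"held-out RMS={test_rms[-1]:.3f}")
```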
 

1. What is the difference between underdetermined and overdetermined systems?

Underdetermined systems have fewer equations than unknown variables, while overdetermined systems have more equations than unknown variables. For linear systems, this means that an underdetermined system has either no solution or infinitely many (infinitely many whenever it is consistent), while an overdetermined system usually has no exact solution and has one (or infinitely many) only if its equations happen to be consistent.

2. How do you solve underdetermined systems?

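Underdetermined systems can be solved by using additional information or constraints to narrow the infinite set of solutions down to a unique one. Common options are choosing the minimum-norm solution (for example, via the Moore-Penrose pseudoinverse), adding a regularization penalty such as ridge (Tikhonov) regression, imposing extra equations or constraints known from the problem, or using a method tailored to the specific system. Plain least squares alone does not single out one solution, because every exact solution has zero residual.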
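A minimal NumPy sketch of narrowing an underdetermined system to a unique solution with one extra piece of information. The toy system and the constraint x_1 + x_2 = 0 are hypothetical. The general solution is a particular solution (here the minimum-norm one from the pseudoinverse) plus any vector in the null space of A, which is read off the SVD. The extra constraint then fixes the single null-space coordinate. The division in the code works only because there is exactly one null direction and one constraint; with more of either, you would solve a small linear system instead.

```python
import numpy as np

# Underdetermined: 2 equations, 3 unknowns (toy numbers for illustration).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

x_p = np.linalg.pinv(A) @ b              # minimum-norm particular solution
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:].T                 # columns span the null space of A (3 x 1 here)

# Extra information (hypothetical): we know that x1 + x2 = 0.
c = np.array([1.0, 1.0, 0.0])
d = 0.0

# Every solution is x_p + null_basis @ t; pick t so that c . x = d.
t = (d - c @ x_p) / (c @ null_basis)
x_unique = x_p + null_basis @ t
print(x_unique)                          # ≈ [-0.5  0.5  1.5]
```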

3. What are some real-life examples of underdetermined and overdetermined systems?

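Underdetermined systems arise when there are more unknowns than equations, for example when a model has more parameters than there are observations to fit it to (as with data set 2 in this thread), or in some optimization and inverse problems. Overdetermined systems arise when there are more equations than unknowns, for example when fitting a model with a few parameters to many measurements, as in ordinary linear regression (as with data set 1 in this thread).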

4. How do you know if a system is underdetermined or overdetermined?

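A system is underdetermined if it has fewer equations than unknown variables, and overdetermined if it has more equations than unknown variables. You can determine this by writing out the system of equations and comparing the number of equations with the number of unknowns. For a linear system, it is also worth comparing the rank of the coefficient matrix with the number of unknowns, because linearly dependent (redundant) equations add no new information.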
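For linear systems, here is a minimal NumPy sketch of this check that also uses ranks (the Rouché-Capelli theorem). The system A x = b is consistent exactly when rank(A) equals the rank of the augmented matrix [A | b], and a consistent system has a unique solution exactly when rank(A) equals the number of unknowns. The function name `classify` and the example systems are illustrative. The last two examples show that the count alone does not settle the matter: an underdetermined system can have no solution, and an overdetermined one can have infinitely many.

```python
import numpy as np


def classify(A, b):
    """Return (label by equation count, number of exact solutions) for A x = b."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n_eq, n_unk = A.shape
    if n_eq < n_unk:
        label = "underdetermined"
    elif n_eq > n_unk:
        label = "overdetermined"
    else:
        label = "square"
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
    if rank_A < rank_Ab:
        solutions = "none"              # b is not in the column space of A
    elif rank_A == n_unk:
        solutions = "one"
    else:
        solutions = "infinitely many"   # consistent, with a non-trivial null space
    return label, solutions


print(classify([[1, 1, 0], [0, 1, 1]], [2, 3]))               # ('underdetermined', 'infinitely many')
print(classify([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 2, 2, 4]))  # ('overdetermined', 'none')
print(classify([[1, 0], [0, 1], [1, 1]], [1, 2, 3]))          # ('overdetermined', 'one')
print(classify([[1, 1, 1], [1, 1, 1]], [1, 2]))               # ('underdetermined', 'none')
print(classify([[1, 1], [2, 2], [3, 3]], [1, 2, 3]))          # ('overdetermined', 'infinitely many')
```

For a nonlinear model such as the one in this thread, the same idea applies locally to the Jacobian of the model with respect to its parameters.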

5. Can a system be both underdetermined and overdetermined?

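By equation count alone, no: a system has fewer, as many, or more equations than unknowns, and a system with the same number of equations as unknowns is often called square or well-determined. In terms of the information the equations carry, however, a system can be both: if some equations are linearly dependent, some combinations of the unknowns can be overconstrained (with conflicting equations) while others are not constrained at all. Such systems are sometimes called mixed-determined. Adding or removing equations, or fixing some unknowns, can also move a system from one category to another.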
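The following NumPy sketch shows why equation count alone can mislead, using a hypothetical toy system. It has three equations and two unknowns, so by count it is overdetermined, but every equation involves only x_1 + x_2. The equations conflict, so there is no exact solution, and x_1 - x_2 is completely unconstrained, so the least-squares solution is not unique. `np.linalg.lstsq` returns the minimum-norm least-squares solution and reports rank 1. Because the rank is below the number of unknowns, it returns an empty residual array, so the misfit is computed by hand.

```python
import numpy as np

# 3 equations, 2 unknowns, but every row only constrains x1 + x2 (toy example).
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

x_ls, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)   # minimum-norm least-squares solution
ssr = float(np.sum((b - A @ x_ls) ** 2))               # misfit computed by hand

# x1 - x2 is not constrained at all: moving along the null vector (1, -1)
# leaves the misfit unchanged.
x_shift = x_ls + 5.0 * np.array([1.0, -1.0])
ssr_shift = float(np.sum((b - A @ x_shift) ** 2))

print(rank, x_ls, ssr, ssr_shift)   # rank 1, x ≈ [1. 1.], both misfits ≈ 2.0
```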
