Using SVD to determine the redundancy of a fit.

  • Thread starter: WarPhalange
  • Tags: fit, SVD
In summary, SVD (Singular Value Decomposition) is a mathematical method used to determine the redundancy of a fit, or how much information is duplicated in a data set. It breaks down the data into its essential components, allowing for the identification of redundant features or variables. This can be useful in various fields, such as data analysis, machine learning, and signal processing, to improve the efficiency and accuracy of models and algorithms.
  • #1
WarPhalange
I have a project I am doing for a professor and unfortunately I cannot get ahold of him to help me out, so I figured I'd ask you guys. Of course, I tried to ask The Google about this first and didn't get anywhere. Here is what I am trying to do:

assignment said:
Linear least squares fitting. Choose some odd-ball function, say g(x). Create a set of "data" by choosing x_i and some σ_i, generating y_i normally distributed about g(x_i). Choose a set of functions that might plausibly fit the data as y = Σ_j a_j f_j(x). Perform a least squares fit by solving the normal equations in matrix form. You should determine the condition number and use SVD to determine whether there is any redundancy in your choice of f_j(x) and fix the fit. Finally, you should evaluate chi-squared to see whether the fit is adequate.

The part I am having trouble with is using SVD to determine redundancy. My f(x) is an n'th order polynomial (I get to decide what 'n' is). I successfully found a fit to my generated data, but don't know where to go from there.

What I found online was to take the matrix A from Ax=b, do SVD on that, and go from there in order to solve the system of equations. I already have a solution though, so I don't know what to do.

One idea I had was to make a new matrix like so:

| a0   a1·x1   a2·x1²   a3·x1³ |
| a0   a1·x2   a2·x2²   a3·x2³ |
| a0   a1·x3   a2·x3²   a3·x3³ |

et cetera, with actual numbers instead of 'a' and 'x' of course, take the SVD of that, trim the three matrices down to only the nonzero singular values (so drop any '0' elements in the diagonal matrix), transform back, and then divide out the various x's to get new values for the a's, but I'm not sure whether that would accomplish anything at all.

Thanks in advance for the help.
 
  • #2

Thank you for reaching out for help with your project. It sounds like you have already made some good progress in fitting your data and are now trying to use SVD to determine redundancy in your choice of functions. SVD can definitely be a useful tool for this task, and I can offer some guidance on how to use it in your case.

First, let's review the concept of redundancy in this context. In a least squares fit, we want to find the set of coefficients (in your case, the 'a' values) that minimizes the difference between the model (the sum of functions f_j(x)) and the data. However, if the chosen functions are linearly dependent (meaning one function can be written as a linear combination of the others), then the coefficients are no longer uniquely determined: the same model can be produced by many different combinations of coefficient values. This makes the normal equations ill-conditioned and can lead to overfitting or numerical instability.

Now, onto using SVD to determine redundancy. You are correct in thinking that you can take the matrix A from Ax = b and perform SVD on it. This factors A into three matrices, A = UΣVᵀ, where Σ is a diagonal matrix of singular values: the scaling factors that connect the columns of U to the rows of Vᵀ. The larger a singular value, the more that direction contributes to the matrix.
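To make the factorization concrete, here is a minimal NumPy sketch. The sample points and the cubic polynomial basis are made up for illustration; the point is just that `svd` returns the three factors and that they reproduce A:

```python
import numpy as np

# Illustrative design matrix for a cubic basis f_j(x) = x**j,
# evaluated at a handful of sample points.
x = np.linspace(-1.0, 1.0, 8)
A = np.vander(x, N=4, increasing=True)   # columns are 1, x, x**2, x**3

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The factorization reproduces A (up to floating-point rounding),
# and the singular values come back sorted from largest to smallest.
assert np.allclose(A, U @ np.diag(s) @ Vt)
print(s)
```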

To determine redundancy, look at the singular values in Σ. If any of them are very small compared to the largest one, the corresponding directions contribute almost nothing to A, which means your basis functions are (nearly) linearly dependent. The ratio of the largest singular value to the smallest is exactly the condition number your assignment asks for. In your case, a tiny singular value would mean that some combination of your chosen functions is not necessary for the fit.
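For example, here is a NumPy sketch with a deliberately redundant basis: the last column is 2 + 3x, a linear combination of the constant and linear columns, so one singular value collapses to (numerical) zero and the condition number blows up:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 10)

# Deliberately redundant basis (illustrative): the last column is
# a linear combination of the first two, so A is rank-deficient.
A = np.column_stack([np.ones_like(x), x, x**2, 2.0 + 3.0*x])

s = np.linalg.svd(A, compute_uv=False)
cond = s[0] / s[-1]          # condition number; huge for a redundant basis

# Flag singular values that are tiny relative to the largest one.
tol = s[0] * max(A.shape) * np.finfo(float).eps
redundant = int(np.sum(s < tol))
print(cond, redundant)       # one near-zero singular value => one redundancy
```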

To address this redundancy, rather than building a new matrix out of the fitted a's, the usual approach is to work with the design matrix you already have, the one with entries A_ij = f_j(x_i). Perform SVD on it, and when inverting, replace the reciprocal of any singular value below a tolerance with zero; solving with the resulting pseudoinverse eliminates the redundant directions without you having to guess which function to drop. Equivalently, the right singular vector belonging to the small singular value tells you which combination of your f_j(x) is redundant, so you can remove a function from the basis and refit.
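One concrete way to carry that out, sketched with NumPy (the "data" and basis are invented; here the redundancy is a deliberately duplicated column), is the truncated-SVD solve:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0*x + rng.normal(0.0, 0.1, x.size)   # noisy "data" about g(x) = 1 + 2x

# Redundant basis (illustrative): the third column duplicates the second.
A = np.column_stack([np.ones_like(x), x, x])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Zero out the reciprocal of any singular value below the cutoff
# instead of dividing by it: the truncated-SVD (pseudoinverse) solution.
tol = s[0] * max(A.shape) * np.finfo(float).eps
s_inv = np.where(s > tol, 1.0/s, 0.0)
a = Vt.T @ (s_inv * (U.T @ y))

# The fit still matches the data; the slope coefficient is simply
# split evenly between the two identical columns.
print(a)
```

This is the minimum-norm least squares solution, which is why the duplicated coefficient is shared equally rather than exploding.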

I hope this helps guide you in using SVD to determine redundancy in your least squares fit. If you have any further questions or need clarification, please don't hesitate to ask. Good luck with your project!
 

1. What is SVD and how does it determine redundancy in a fit?

SVD stands for singular value decomposition, which is a mathematical method used to factor a matrix into three separate matrices. It is commonly used in data analysis and machine learning to identify patterns and relationships in data. In terms of determining redundancy in a fit, SVD can be used to identify correlated variables or features in a dataset.

2. How does SVD differ from other methods of identifying redundancy?

Other methods of identifying redundancy, such as correlation analysis and principal component analysis, also use mathematical techniques to identify patterns in data. However, SVD is unique in that it can handle highly complex and multidimensional datasets, whereas other methods may struggle with these types of data.

3. Can SVD be used to determine the optimal number of features in a fit?

Yes, SVD can be used to determine the optimal number of features in a fit by looking at the singular values of the decomposed matrix. The larger the singular value, the more important the corresponding feature is in explaining the variability in the data. Therefore, by looking at the magnitude of the singular values, one can determine the most important features to include in a fit.
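As a small sketch of that criterion (the singular values below are invented for illustration), counting the values that are significant relative to the largest one gives the effective number of features:

```python
import numpy as np

# Illustrative singular values from some decomposed design matrix.
s = np.array([12.0, 3.5, 0.8, 1e-13, 3e-15])

# Count the values that are significant relative to the largest one.
tol = s[0] * 1e-10
effective_rank = int(np.sum(s > tol))
print(effective_rank)   # 3 of the 5 candidate features carry information
```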

4. Are there any limitations to using SVD for determining redundancy in a fit?

Like any statistical method, SVD has its limitations. One potential limitation is that it assumes a linear relationship between variables, so it may not be suitable for non-linear data. Additionally, SVD can be computationally intensive, so it may not be practical for extremely large datasets.

5. How can the results of SVD be interpreted to make decisions about a fit?

The results of SVD can be interpreted by looking at the singular values and corresponding features. By selecting the features with the largest singular values, one can determine the most important variables in a fit. Additionally, the relationships between features can be examined to identify any potential redundancies or correlations. This information can then be used to make decisions about which features to include in a fit or to simplify the model by removing redundant features.
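A short NumPy sketch of that interpretation (the basis is contrived so that the third function equals the sum of the first two): the right singular vector belonging to the near-zero singular value spells out exactly which combination of features is redundant.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 12)

# Contrived basis with a hidden dependence: f2 = f0 + f1.
A = np.column_stack([np.ones_like(x), x, 1.0 + x])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The right singular vector for the near-zero singular value spans the
# null space: c0*f0 + c1*f1 + c2*f2 = 0 identifies the redundancy.
null_vec = Vt[-1]
print(np.round(null_vec / null_vec[0], 6))   # proportional to (1, 1, -1)
```

Reading off the null vector: the combination f0 + f1 - f2 vanishes, i.e. f2 duplicates information already in f0 and f1, so it can be dropped.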
