Determine optimal vectors for least squares

Pete99
Hello all,

I have a set of measurements that I want to fit to a linear model with a fairly large number of parameters. I do this as \theta=(H^T H)^{-1}H^T x, where \theta are the parameters and x is the vector of measurements. The problem is that I'd like to reduce the number of parameters in the fit: I'd like to choose the subset of N parameters that gives the best fit, such that no other combination of N parameters works better.
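
Just to make it concrete, this is roughly what I am doing now (a Python/NumPy sketch; the H and x here are random placeholders, not my real data):

```python
import numpy as np

# Placeholder problem: m measurements, p candidate parameters (made up for illustration).
rng = np.random.default_rng(0)
m, p = 100, 25
H = rng.normal(size=(m, p))                              # design matrix, one column per parameter
x = H @ rng.normal(size=p) + 0.1 * rng.normal(size=m)    # simulated measurements

# theta = (H^T H)^{-1} H^T x; lstsq is numerically safer than forming the inverse explicitly.
theta, rss, rank, sv = np.linalg.lstsq(H, x, rcond=None)
```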

Is there any way to determine the best subset of N parameters without having to try all possible combinations? I have seen that with order-recursive least squares I can add parameters sequentially to improve the fit, but that approach does not guarantee that the N parameters I end up with are the best combination.

Thank you very much for any help,
 

Hey Pete99.

If you want the best fit for, say, N parameters, where N is greater than zero but smaller than the full set, then it sounds like you need to either find a way to project your inputs down to an appropriate sub-space and do least squares on that, or do the other thing, which is to do least squares first and then project the calculated parameters down to a reduced form.

In other words, this boils down to taking your vector x and projecting it down to some sub-space, in the same way we project, say, an arbitrary point in R^3 onto a two-dimensional plane.

The nitty-gritty you will have to figure out is the actual projection itself, and this will depend on what exactly you call an 'optimal' configuration of parameters.

I would start off by doing the least squares and then projecting your parameters down to some sub-space, rather than projecting your vector down before you do the least squares.

If you are trying to fit a linear model to data like you would in a statistical analysis (like a regression), though, I would not use this method but instead use what is called Principal Components Analysis (PCA).

PCA is a very old, well-understood technique and comes as a feature or a library in many statistical applications. It works by creating a basis of uncorrelated variables, ordered from the basis vector that contributes the most variance down to the one that contributes the least.

Thus if you want a model for N parameters, you pick the first N basis components of the PCA output and use these basis vectors as your regression model.
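
Roughly, the idea looks like this (a Python/NumPy sketch of the general recipe, not any particular package's implementation; in R you would use something like prcomp() and lm() instead):

```python
import numpy as np

def pca_regression(H, x, N):
    """Regress x on the first N principal components of the columns of H (plus an intercept)."""
    Hc = H - H.mean(axis=0)                              # centre each column
    U, s, Vt = np.linalg.svd(Hc, full_matrices=False)    # rows of Vt = principal directions, ordered by variance
    scores = Hc @ Vt[:N].T                               # data expressed in the first N components
    A = np.column_stack([np.ones(len(x)), scores])       # intercept + component scores
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    x_hat = A @ coef                                     # fitted values from the reduced model
    return coef, x_hat
```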

I'd strongly recommend thinking about PCA if you are trying to fit a multi-dimensional linear model, because the calculation is very quick and you can fit the reduced linear model just as quickly and see how good the fit is for yourself.

In R it should take about 30 minutes to an hour if you are familiar with R, and longer if you are not; if you already know the major packages, you could probably just read the documentation for this.
 
Thank you chiro for your detailed response.

Sorry if the notation is not very rigorous, and correct me if something I say is plainly wrong. I guess I forgot to mention that the vectors (the columns of H) that I want to use are already defined and have some physical meaning. As far as I know PCA does not let me keep these specific vectors, but the idea is exactly what I want to do.

I would like to choose a subset of N columns from the matrix H (say H_N), such that no other subset of N columns from H gives a better fit in the least-squares sense.

In PCA I would get a set of orthonormal vectors (v_1,v_2,\dots) that I can use to do my fit. And since they are orthogonal, if H_1=v_1 is the best "single-vector" fit to my data, the best "two-vector" fit will be H_2=[v_1,v_2], and so on.
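
(If I have the reasoning right, the orthonormal case is easy because the residual of the least-squares fit using a subset S of orthonormal vectors is \|x\|^2-\sum_{i\in S}(v_i^T x)^2, so each vector contributes independently and the best subset of size N is simply the N vectors with the largest |v_i^T x|.)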

In my case, since the vectors in H are not orthogonal, even if the best "single-vector" fit is H_1=h_1, there is no guarantee that the best "two-vector" fit will contain h_1; it could be any other pair (for instance H_2=[h_3,h_{24}]). My problem is that I don't know how to choose those vectors without trying all possible combinations of two vectors.
 
So what exactly is the criterion? Do you want to, say, rank some variables over others in the selection process? For example, do you always want a model that captures a particular kind of variable even if it doesn't contribute much to the actual regression model?

Also, what you can do is take a variable out when you do the PCA, see what it produces, and then look at what has been calculated in the output components.

Also, there are routines that search for the best set of N variables for a regression, some of them exhaustively, in contrast to the PCA approach. You might want to look at the step() routine in R and other similar selection routines.
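
To give a feel for what a stepwise routine does underneath, here is a generic Python sketch of greedy forward selection. It is not what step() actually implements (step() in R adds and drops terms based on AIC), just the basic idea:

```python
import numpy as np

def forward_select(H, x, N):
    """Greedily add, one at a time, the column that most reduces the residual
    sum of squares. Fast, but not guaranteed to find the best N-column subset."""
    chosen, remaining = [], list(range(H.shape[1]))
    for _ in range(N):
        def rss(j):
            cols = chosen + [j]
            theta, *_ = np.linalg.lstsq(H[:, cols], x, rcond=None)
            return np.sum((x - H[:, cols] @ theta) ** 2)
        best = min(remaining, key=rss)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```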
 

Not exactly. I do want to use the variables that contribute the most to the actual fit, but I want to choose them from a set of parameters that have some physical meaning in my problem.

Let's say that my model has three physical parameters that contribute to the output as x=[h_1 h_2 h_3][\theta_1 \theta_2 \theta_3]^T, where the h are column vectors and the \theta are the parameters.

Say that my measurement is the vector x=[1, 1, 0]^T, and that the h vectors in my model are h_1=[1, 0, 0]^T, h_2=[0, 1, 0]^T, and h_3=[0.9, 0.9, 0.1]^T.

If I want to use only one of the three possible parameters, I would choose \theta_3, because the vector h_3 is very close to x. This is very easy to find, because I just have to try the three possibilities. However, if I want to use two parameters, the best choice is \theta_1 and \theta_2, since the vector x lies in the plane spanned by h_1 and h_2.
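
A quick numerical check of this toy example (a Python/NumPy sketch that simply brute-forces the subsets):

```python
import numpy as np
from itertools import combinations

x = np.array([1.0, 1.0, 0.0])
H = np.array([[1.0, 0.0, 0.9],
              [0.0, 1.0, 0.9],
              [0.0, 0.0, 0.1]])          # columns are h1, h2, h3

def rss(cols):
    cols = list(cols)
    theta, *_ = np.linalg.lstsq(H[:, cols], x, rcond=None)
    return np.sum((x - H[:, cols] @ theta) ** 2)

print(min(combinations(range(3), 1), key=rss))   # (2,)   -> h3 alone is the best single vector
print(min(combinations(range(3), 2), key=rss))   # (0, 1) -> h1 and h2 are the best pair
```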

In my problem I have ~25 parameters, and I would like to use no more than ~10 to fit the data (because of restrictions on the processing that I have to do later). My problem is: how can I choose, say, the 8 parameters out of the total of 25 that will provide the best fit to my data in the least-squares sense?
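
To be explicit about what I am trying to avoid, the brute-force version would look something like this (a sketch; with 25 columns, choosing 8 already means about a million subsets, and choosing 10 about 3.3 million):

```python
from itertools import combinations
import numpy as np

def best_subset(H, x, N):
    """Exhaustive search: try every subset of N columns of H and keep the one
    with the smallest least-squares residual. Only feasible for small problems."""
    best_cols, best_rss = None, np.inf
    for cols in combinations(range(H.shape[1]), N):
        cols = list(cols)
        theta, *_ = np.linalg.lstsq(H[:, cols], x, rcond=None)
        rss = np.sum((x - H[:, cols] @ theta) ** 2)
        if rss < best_rss:
            best_cols, best_rss = cols, rss
    return best_cols, best_rss
```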

I am not familiar with R, so I am not sure what step() does, but I will take a look to see if it can help me.
 