Hessian of least squares estimate behaving strangely

In summary, the conversation concerns a nonlinear least squares estimation on a function of 14 variables using the quasi-Newton algorithm in MATLAB. The point estimates seem reasonable, but the Hessian matrix does not: every value in row 5 and column 5 is zero except for the entry at (5, 5), which is 1, and 13 of the 14 eigenvalues are approximately zero while the last is about 170,000, even though the function is not badly scaled. Respondents suggest trying different methods for computing the eigenvalues and inverting the Hessian, and switching to a more robust algorithm such as Levenberg-Marquardt for ill-conditioned problems.
  • #1
Jeffack
I am doing a nonlinear least squares estimation on a function of 14 variables (meaning that, to estimate ##y=f(x)##, I minimize ##\Sigma_i(y_i-(\hat x_i))^2## ). I do this using the quasi-Newton algorithm in MATLAB. This also gives the Hessian (matrix of second derivatives) at the minimizing point. My point estimates all seem reasonable, but the Hessian does not:

Every value in row ##5## and in column ##5## is zero, except for the entry at (##5, 5##), which is 1. Several of the other entries are also zero.

To find standard errors, you invert the Hessian and take the square roots of its diagonal entries. When I do this, all of the standard errors are near 1, and 4 of them are exactly 1.
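In MATLAB terms, the computation is essentially this (a minimal sketch, not my exact code; myObjective and theta0 are placeholder names):

Code:
% fminunc's sixth output is the (quasi-Newton) Hessian approximation
% at the solution; myObjective and theta0 are placeholders.
[thetaHat, fval, flag, out, grad, H] = fminunc(@myObjective, theta0);
se = sqrt(diag(inv(H)));   % standard errors from the inverse Hessian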

I went and looked back at the function, and I couldn't see anything blatantly wrong. When I change the value of the 5th parameter, the value of the function changes (as it should); that is pretty much the end of my troubleshooting ability. I don't think the function is badly scaled; all of the parameters are between -2 and 4.

The last thing I should mention is that I had MATLAB calculate the eigenvalues of the Hessian. The first 13 of them were approximately zero, and the last one was 170,000.
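The eigenvalue check itself was just the following (sketch; H as above):

Code:
ev = eig(H)   % 13 eigenvalues come out approximately zero, one near 1.7e5
cond(H)       % with that spread, the condition number is enormous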

Any idea what's going on here? I've calculated the Hessians for very similar functions and not had this issue.
 
  • #2
Jeffack said:
I am doing a nonlinear least squares estimation on a function of 14 variables (meaning that, to estimate ##y=f(x)##, I minimize ##\Sigma_i(y_i-(\hat x_i))^2## ).

Did you mean that you minimize ##\Sigma_i(y_i - \hat y_i)^2##?

I don't understand whether "##y##" and "##x##" denote vectors or scalars.

If you have a scalar ##y## that is a function of 14 scalar variables ##x_1, x_2, \dots, x_{14}##, then are you minimizing ##\Sigma_j (y_j - f(x_{1,j}, x_{2,j}, x_{3,j}, \dots, x_{14,j}))^2##, where ##j## is the index of the ##j##th sample?
 
  • #3
Sorry about that. You are correct: ##y## is an n-by-1 vector, where n is the number of observations, and ##x_1, x_2, \dots, x_{14}## are all n-by-1 vectors. ##y_j## is the ##j##th element of vector ##y##, and ##x_{k,j}## is the ##j##th element of vector ##x_k##. The problem should have been written as you wrote it:

##\Sigma_j (y_j - f(x_{1,j}, x_{2,j}, x_{3,j},...x_{14,j} ))^2##
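In MATLAB terms, a minimal sketch of this setup (assuming the model also depends on the parameter vector ##\theta## that the optimizer varies; f, X, y, and theta0 are placeholders):

Code:
% X is n-by-14 (column k holds the vector x_k), y is n-by-1, and
% f(X, theta) returns the n-by-1 vector of model predictions.
sse = @(theta) sum((y - f(X, theta)).^2);   % residual sum of squares
thetaHat = fminunc(sse, theta0);            % quasi-Newton by default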
 
  • #4
Without knowing some specifics, I can only suggest that you see what happens if you change some of the data you are using.

What is the function you are fitting to the data?

Perhaps a MATLAB user can tell you what's going on if you show some of the MATLAB code.
 
  • #5
Jeffack said:
I do this using the quasi-Newton algorithm in MATLAB. [...] The last thing I should mention is that I had MATLAB calculate the eigenvalues of the Hessian. The first 13 of them were approximately zero, and the last one was 170,000.
I believe Levenberg-Marquardt is more robust than quasi-Newton methods for computational nonlinear regression, and it generally works better for ill-conditioned problems. Have you tried different methods for computing the eigenvalues, or for inverting your Hessian matrix? An eigenvalue spectrum with 13 near-zero values and one at 170,000 suggests that the quasi-Newton Hessian approximation is very poor, which can be caused by an ill-conditioned problem: if, say, your function is nearly flat around the starting point, the derivative calculations may be flawed, and every subsequent update to the inverse-Hessian approximation, and hence the eigenvalues, inherits that error. You also have to make sure you meet the conditions (usually strict in practice) that variable metric algorithms require.
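For instance, a minimal sketch with lsqnonlin (placeholder names; the standard errors here use the Gauss-Newton approximation ##H \approx 2J^TJ## rather than the quasi-Newton Hessian):

Code:
% resFun(theta) should return the n-by-1 residual vector y - f(X, theta).
opts = optimoptions('lsqnonlin', 'Algorithm', 'levenberg-marquardt');
[thetaHat, resnorm, res, flag, out, lambda, J] = ...
    lsqnonlin(resFun, theta0, [], [], opts);    % LM does not take bounds
sigma2 = resnorm / (numel(res) - numel(thetaHat));  % residual variance
covMat = sigma2 * inv(full(J' * J));  % J is returned as a sparse matrix
se     = sqrt(diag(covMat));          % standard errors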

I would be curious to see the MATLAB code; however, that's a problem I would definitely suggest solving in R!
 

1. Why is the Hessian matrix important in least squares estimation?

The Hessian matrix is important in least squares estimation because it describes the curvature of the cost function. Newton-type methods combine this curvature information with the gradient to choose better search directions and step sizes than plain steepest descent, making parameter updates more efficient; at the minimum, its inverse also yields the covariance estimates used for standard errors.
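As a hypothetical illustration, one Newton update on a toy quadratic cost, where the Hessian's curvature rescales the gradient step:

Code:
% Quadratic cost 0.5*theta'*H*theta - b'*theta with Hessian H.
H = [4 1; 1 3];        % curvature of the cost surface
b = [1; 2];
theta = [0; 0];        % current iterate
g = H*theta - b;       % gradient at theta
theta = theta - H \ g; % Newton step; for a quadratic cost a single
                       % step lands exactly on the minimizer H\b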

2. How does the Hessian matrix affect the performance of the least squares estimate?

The Hessian matrix can affect the performance of the least squares estimate in several ways. If the matrix is well-conditioned, it can lead to faster convergence and more accurate estimates. However, if the matrix is poorly conditioned or singular, it can lead to slow convergence and unstable estimates.

3. What are some signs that the Hessian of the least squares estimate is behaving strangely?

Some signs that the Hessian of the least squares estimate is behaving strangely include slow convergence, large variations in the estimated parameters, and a high condition number of the matrix. These issues can also lead to overfitting or underfitting of the model.

4. How can we address issues with the Hessian in least squares estimation?

One way to address issues with the Hessian in least squares estimation is to use regularization techniques, such as ridge regression, to improve the conditioning of the matrix. Another approach is to use alternative optimization algorithms that are less sensitive to the Hessian matrix, such as stochastic gradient descent.
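As a hypothetical sketch of the first idea, a ridge-style term added to an ill-conditioned Hessian shifts every eigenvalue away from zero:

Code:
H = [170000 0; 0 1e-8];              % toy Hessian with a huge eigenvalue spread
lambda = 1e-4;                       % regularization strength (problem-dependent)
Hreg = H + lambda * eye(size(H,1));  % every eigenvalue increases by lambda
fprintf('cond before: %.2e, after: %.2e\n', cond(H), cond(Hreg));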

5. Can the Hessian of the least squares estimate be ignored?

No, the Hessian of the least squares estimate should not be ignored. It provides important information about the curvature of the cost function and can greatly impact the performance of the model. Ignoring the Hessian can lead to inaccurate estimates and poor performance of the model.
