Hessian of least squares estimate behaving strangely

Discussion Overview

The discussion revolves around issues encountered during a nonlinear least squares estimation involving a function of 14 variables, specifically focusing on the behavior of the Hessian matrix derived from the optimization process. Participants explore the implications of the Hessian's structure and eigenvalues in the context of parameter estimation and standard error calculation.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes their use of the quasi-Newton algorithm in MATLAB for nonlinear least squares estimation, noting that the Hessian matrix has unusual characteristics, particularly with many zero entries.
  • Another participant seeks clarification on the notation used, specifically whether the variables are vectors or scalars, and suggests a correction in the formulation of the minimization problem.
  • The original poster confirms the corrected formulation and clarifies that the variables are n-by-1 vectors, with one entry per observation.
  • One participant suggests experimenting with different data or providing more details about the function being fitted to understand the Hessian's behavior better.
  • Another participant proposes that the Levenberg-Marquardt method may be more robust than quasi-Newton methods for this type of problem and questions the reliability of the eigenvalue calculations and the Hessian inversion process.
  • Concerns are raised about the potential ill-conditioning of the problem, which may affect the accuracy of the Hessian and its inverse.
  • There is a suggestion to share MATLAB code for further insights, along with a recommendation to consider using R for solving the problem.

Areas of Agreement / Disagreement

Participants express differing views on the methods used for the estimation and the implications of the Hessian's structure. There is no consensus on the best approach to resolve the issues presented.

Contextual Notes

Participants note the potential for ill-conditioning in the optimization problem, which may affect the Hessian's properties and the reliability of the standard error estimates derived from it. Specific assumptions about the function being fitted and the data used remain unspecified.

Jeffack
I am doing a nonlinear least squares estimation on a function of 14 variables (meaning that, to estimate ##y=f(x)##, I minimize ##\Sigma_i(y_i-(\hat x_i))^2## ). I do this using the quasi-Newton algorithm in MATLAB. This also gives the Hessian (matrix of second derivatives) at the minimizing point. My point estimates all seem reasonable, but the Hessian does not:

Every value in row ##5## and in column ##5## is zero, except for the entry at (##5, 5##), which is 1. Several of the other entries are also zero.

To find standard errors, you invert the Hessian and take the square root of the diagonals. When I do this, all of the estimates are near 1, and 4 of them are exactly 1.

I went and looked back at the function, and I couldn't see anything blatantly wrong. When I change the value of the 5th parameter, the value of the function changes (as it should); that is pretty much the end of my troubleshooting ability. I don't think the function is badly scaled; all of the parameters are between -2 and 4.

The last thing I should mention is that I had MATLAB calculate the eigenvalues of the Hessian. The first 13 of them were approximately zero, and the last one was 170,000.

Any idea what's going on here? I've calculated the Hessians for very similar functions and not had this issue.
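
For concreteness, a minimal sketch of the kind of setup described above, assuming the MATLAB routine is fminunc with the 'quasi-newton' algorithm (the post does not name the function, so fminunc, X, y, model, and theta0 are all placeholder assumptions):

Matlab:
% Assumed setup: y is n-by-1, X is n-by-14, model(X, theta) returns the fitted
% values, and theta is the 14-by-1 parameter vector being estimated.
sse = @(theta) sum((y - model(X, theta)).^2);          % sum of squared residuals

opts   = optimoptions('fminunc', 'Algorithm', 'quasi-newton');
theta0 = zeros(14, 1);                                 % placeholder starting point
[thetaHat, fval, exitflag, output, grad, H] = fminunc(sse, theta0, opts);

% Note: with 'quasi-newton', H is the BFGS approximation of the Hessian at the
% solution, not an exact matrix of second derivatives.
se = sqrt(diag(inv(H)));                               % standard errors as described above
ev = eig(H);                                           % eigenvalues, to inspect conditioning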
 
Jeffack said:
I am doing a nonlinear least squares estimation on a function of 14 variables (meaning that, to estimate ##y=f(x)##, I minimize ##\Sigma_i(y_i-(\hat x_i))^2## ).

Did you mean that you minimize ##\Sigma_i(y_i - \hat y_i)^2## ?

I don't understand whether "##y##" and "##x##" denote vectors or scalars.

If you have a scalar ##y## that is a function of 14 scalar variables ##x_1, x_2, ...x_{14} ## then are you minimizing ##\Sigma_j (y_j - f(x_{1,j}, x_{2,j}, x_{3,j},...x_{14,j} ))^2 ## where ##j## is the index for the ##j##th sample ?
 
Sorry about that. You are correct: ##y## is an n-by-1 vector, where n is the number of observations, and ##x_1, x_2,...x_{14}## are all n-by-1 vectors. ##y_i## is the ##i##th element of vector ##y##, and ##x_{k,i}## is the ##i##th element of vector ##x_k##. The problem should have been written as you wrote it:

##\Sigma_j (y_j - f(x_{1,j}, x_{2,j}, x_{3,j},...x_{14,j} ))^2##
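
As a concrete reading of that objective, a short sketch (the names X, y, and model are hypothetical, since the actual function being fitted is not shown in the thread):

Matlab:
% X is n-by-14 with column k holding the vector x_k; y is n-by-1.
% model(X, theta) returns the n-by-1 vector of fitted values f(x_{1,j}, ..., x_{14,j}).
res = @(theta) y - model(X, theta);        % residual for each observation j
sse = @(theta) sum(res(theta).^2);         % the sum being minimized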
 
Without knowing some specifics, I can only suggest that you see what happens if you change some of the data you are using.

What is the function you are fitting to the data ?

Perhaps a MATLAB user can tell you what's going on if you show some of the MATLAB code.
 
Jeffack said:
I do this using the quasi-Newton algorithm in MATLAB. ... Every value in row ##5## and in column ##5## is zero, except for the entry at (##5, 5##), which is 1. Several of the other entries are also zero. ... The first 13 of [the eigenvalues of the Hessian] were approximately zero, and the last one was 170,000.

I believe Levenberg-Marquardt is more robust than quasi-Newton methods for computational nonlinear regression, and it generally handles ill-conditioned problems better. Also, have you tried different methods for calculating the eigenvalues, or even for inverting your Hessian matrix? Thirteen eigenvalues near zero next to one of 170,000 suggests the Hessian is severely ill-conditioned, so its computed inverse is probably a very poor approximation. This could be caused by an ill-conditioned starting function: if, say, the function is nearly flat around the starting point, the derivative calculations may be inaccurate, and those errors propagate through later iterations into the approximate inverse Hessian and into the eigenvalues. You have to make sure you meet the conditions (usually strict in practical use) that variable-metric algorithms require.

I would be curious to see the MATLAB code; that said, this is a problem I would definitely suggest solving in R!
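
For comparison, a sketch of the Levenberg-Marquardt route using MATLAB's lsqnonlin, which works on the residual vector rather than the summed objective; the residual handle, X, y, and theta0 are placeholders, and the covariance formula is the usual Gauss-Newton approximation rather than anything taken from the thread:

Matlab:
% Levenberg-Marquardt on the residuals r(theta) = y - model(X, theta).
res    = @(theta) y - model(X, theta);                  % n-by-1 residual vector
theta0 = zeros(14, 1);                                  % placeholder starting point
opts   = optimoptions('lsqnonlin', 'Algorithm', 'levenberg-marquardt');
[thetaHat, resnorm, residual, exitflag, output, lambda, J] = lsqnonlin(res, theta0, [], [], opts);

J = full(J);                                            % lsqnonlin returns a sparse Jacobian
n = numel(y);  p = numel(thetaHat);
sigma2   = resnorm / (n - p);                           % residual variance estimate
covTheta = sigma2 * inv(J' * J);                        % Gauss-Newton covariance approximation
se       = sqrt(diag(covTheta));                        % standard errors
kappa    = cond(J' * J);                                % large values flag ill-conditioning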
 
