Least-squares optimization of a complex function

AI Thread Summary
The discussion centers on a least-squares optimization problem whose cost function is built from complex-valued residuals in two variables, z_1 and z_2. The main challenge is calculating the gradient of the cost function, since the presence of the conjugate of the residual complicates the differentiation. Participants suggest that treating the real and imaginary parts of the complex variables separately simplifies the problem and allows iterative methods to be applied. There is also a discussion of when to use the conjugated versus the un-conjugated product in defining the functional, since minimizing the un-conjugated functional does not always minimize the residuals. The conversation concludes that the proper tool is a gradient operator defined with respect to complex quantities.
elgen
Dear all,

I have a least square optimization problem stated as below

\xi(z_1, z_2) = \sum_{i=1}^{M} ||r_i(z_1, z_2)||^2

where \xi denotes the cost function and r_i denotes the i-th residual, a complex function of z_1 and z_2.

My question is about ||\cdot||. Many textbooks deal only with real functions and say that this is the Euclidean norm, which is defined as the conjugated inner product of the residual, i.e. ||r||^2 = conj(r)*r.

My question is: when I apply the gradient descent method to solve this problem, how do I calculate \nabla \xi? In particular, since \xi includes conj(r), we cannot simply take the derivative with respect to z_1 and z_2, as conj(r) is not an analytic function.

Should I use the un-conjugated inner product to define the norm in this LS optimization with a complex residual function?

Any feedback is welcome. Thank you.


elgen
 
Your function is a real function of four real parameters, the real and imaginary parts of z1 and z2. Recall that, if f(z1, z2) = a(z1, z2) + i b(z1, z2), where a and b are real functions, then || f ||^2 = a^2 + b^2. Hope this helps.
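
To make this concrete, here is a minimal sketch in Python of the four-real-parameter approach. The model f, the data, and the starting point are made up for illustration, and scipy.optimize.least_squares serves as the iterative solver:

```python
import numpy as np
from scipy.optimize import least_squares

# Made-up observed data and model, purely for illustration
f_obs = np.array([3 + 0j, 3 - 1j, 1 + 2j])

def f(z1, z2):
    return np.array([z1 + z2, z1 * z2, z1 - z2])

def residuals(x):
    # Pack the four real parameters into two complex variables
    z1 = x[0] + 1j * x[1]
    z2 = x[2] + 1j * x[3]
    r = f_obs - f(z1, z2)
    # Stack real and imaginary parts; the solver then minimizes
    # sum(a^2 + b^2) = sum ||r_i||^2
    return np.concatenate([r.real, r.imag])

sol = least_squares(residuals, x0=[1.0, 0.0, 1.0, 0.0])
z1 = sol.x[0] + 1j * sol.x[1]
z2 = sol.x[2] + 1j * sol.x[3]
print(z1, z2)  # recovers z1 = 2+1j, z2 = 1-1j for this data
```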
 
Took me some time to figure it out. The functional involves four real variables, and I applied iterative methods to solve the non-linear least-squares problem and obtained the correct answer. Your feedback definitely helped. Thx a lot.

To satisfy my own curiosity, I also defined the functional as simply the product of the residual with itself (no conjugation). It becomes

\xi(z_1,z_2)=\sum_{i=1}^M r_i(z_1, z_2)r_i(z_1,z_2).

By treating z_1 and z_2 as two variables (not treating the real and imaginary parts separately), I was also able to get the right answer.

This leads to my hypothesis: if the residual r(z_1,z_2) is an analytic function of the complex variables, we can treat these variables just like real numbers and apply the iterative methods.
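
As a quick numerical check of this hypothesis, here is a sketch of Newton's method run directly in complex arithmetic on a made-up pair of analytic residuals (the residuals and the starting point are invented for illustration):

```python
import numpy as np

# Made-up analytic residuals: r1 = z1 + z2 - s, r2 = z1*z2 - p
s, p = 3 + 1j, 2 + 2j

def r(z):
    z1, z2 = z
    return np.array([z1 + z2 - s, z1 * z2 - p])

def J(z):
    # Jacobian of r with respect to (z1, z2); well-defined since r is analytic
    z1, z2 = z
    return np.array([[1, 1], [z2, z1]], dtype=complex)

z = np.array([1 + 0j, 2 + 2j])  # arbitrary starting point
for _ in range(20):
    z = z + np.linalg.solve(J(z), -r(z))  # ordinary Newton step in complex arithmetic
print(z, np.abs(r(z)))  # converges to a root, here z1 = 2, z2 = 1 + 1j
```

The iteration never separates real and imaginary parts, yet it drives both residuals to zero, consistent with the hypothesis.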

I am also curious: is there any difference between these two functionals? When should the conjugated functional be used over the un-conjugated one, and vice versa?

Thx for the feedback again.
 
On second thought, minimizing the residuals r_i(z_1,z_2) is not the same as minimizing the functional

\xi(z_1,z_2)=\sum_{i=1}^{M}r_i(z_1,z_2) r_i(z_1,z_2)

If r_1=3 and r_2=3i, these residuals are not zero. However, \xi = 3^2 + (3i)^2 = 9 - 9 = 0.

The functional defined using the conjugated product has the property that it is minimized exactly when each residual is minimized.
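
A two-line numerical check of this point:

```python
import numpy as np

r = np.array([3, 3j])
print(np.sum(r * r))           # 0j: the un-conjugated functional vanishes
print(np.sum(np.conj(r) * r))  # (18+0j): the conjugated functional does not
```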
 
If the residual is defined as r_i=f_i^{obs} -f_i(z_1,z_2), I am still not sure how to apply the gradient method to the cost function if I don't have an analytic expression for f_i. I mean, let
\xi = \Re\{ f_i^{obs} - f_i(z_1,z_2) \}^2 + \Im\{ f_i^{obs} - f_i(z_1,z_2) \}^2.
Should I proceed as
\frac{\partial \xi}{\partial z_1} = -2 \Re\{ f_i^{obs}-f_i(z_1,z_2) \} \Re\{ \frac{\partial f_i}{\partial z_1} \} - 2 \Im\{ f_i^{obs}-f_i(z_1,z_2) \} \Im\{ \frac{\partial f_i}{\partial z_1} \}

\frac{\partial \xi}{\partial z_2} = -2 \Re\{ f_i^{obs}-f_i(z_1,z_2) \} \Re\{ \frac{\partial f_i}{\partial z_2} \} - 2 \Im\{ f_i^{obs}-f_i(z_1,z_2) \} \Im\{ \frac{\partial f_i}{\partial z_2} \}
and take the second derivative as
\frac{\partial^2 \xi}{\partial z_1^2} = 2 \Re\{ \frac{\partial f_i}{\partial z_1} \}^2 - 2\Re\{ f_i^{obs}-f_i(z_1,z_2) \}\Re\{\frac{\partial^2 f_i}{\partial z_1^2}\} + 2 \Im\{ \frac{\partial f_i}{\partial z_1} \}^2 - 2\Im\{ f_i^{obs}-f_i(z_1,z_2) \} \Im\{ \frac{\partial^2 f_i}{\partial z_1^2} \}

\frac{\partial^2 \xi}{\partial z_2^2} = 2 \Re\{ \frac{\partial f_i}{\partial z_2} \}^2 - 2\Re\{ f_i^{obs}-f_i(z_1,z_2) \}\Re\{\frac{\partial^2 f_i}{\partial z_2^2}\} + 2 \Im\{ \frac{\partial f_i}{\partial z_2} \}^2 - 2\Im\{ f_i^{obs}-f_i(z_1,z_2) \} \Im\{ \frac{\partial^2 f_i}{\partial z_2^2} \} ?

Thx.
 
The key is to define a gradient operator with respect to complex quantities for a scalar, real-valued functional. See:

Brandwood, D. H., "A complex gradient operator and its application in adaptive array theory," IEE Proceedings H - Microwaves, Optics and Antennas, 1983.
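
To sketch how this applies here (assuming each f_i is analytic in z_1 and z_2, so that \partial f_i/\partial z_k is the ordinary complex derivative): for \xi = \sum_i ||r_i||^2 with r_i = f_i^{obs} - f_i(z_1,z_2), the real derivatives from the earlier post combine with their counterparts with respect to the imaginary parts into a single complex gradient,

\frac{\partial \xi}{\partial \Re z_k} + i\,\frac{\partial \xi}{\partial \Im z_k} = -2 \sum_{i=1}^{M} \left( f_i^{obs} - f_i \right) \overline{\left( \frac{\partial f_i}{\partial z_k} \right)}, \qquad k = 1, 2,

whose real part reproduces (summed over i) the \partial\xi/\partial z_k expressions above. Steepest descent then takes the compact form z_k \leftarrow z_k + \mu \sum_i r_i\, \overline{\partial f_i/\partial z_k}, with the factor of 2 absorbed into the step size \mu. Below is a minimal numerical sketch in Python; the model f, the data, the starting point, and the step size are all made up, and the complex derivatives are estimated by finite differences since no closed-form expression for f_i is assumed:

```python
import numpy as np

# Made-up observed data and analytic model, purely for illustration
f_obs = np.array([3 + 0j, 3 - 1j, 1 + 2j])

def f(z1, z2):
    return np.array([z1 + z2, z1 * z2, z1 - z2])

def dfdz(z1, z2, h=1e-7):
    # Finite-difference estimates of the complex derivatives df_i/dz_k;
    # a single step in the real direction suffices because f is assumed analytic
    f0 = f(z1, z2)
    return (f(z1 + h, z2) - f0) / h, (f(z1, z2 + h) - f0) / h

z1, z2 = 1.0 + 0j, 1.0 + 0j  # arbitrary starting point
mu = 0.05                    # step size
for _ in range(2000):
    r = f_obs - f(z1, z2)    # residuals r_i
    d1, d2 = dfdz(z1, z2)
    # Steepest descent via the complex gradient:
    # z_k <- z_k + mu * sum_i r_i * conj(df_i/dz_k)
    z1 += mu * np.sum(r * np.conj(d1))
    z2 += mu * np.sum(r * np.conj(d2))

print(z1, z2, np.sum(np.abs(f_obs - f(z1, z2))**2))  # cost approaches 0
```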
 