Least-square optimization of a complex function


Discussion Overview

The discussion revolves around the least-square optimization of a complex function, specifically focusing on the cost function defined in terms of a residual function that is complex-valued. Participants explore the implications of using conjugated versus un-conjugated norms in the optimization process and the challenges of calculating gradients in this context.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions how to compute the gradient of the cost function when it includes the conjugate of the residual, noting that the conjugate is not analytic.
  • Another participant clarifies that the cost function can be viewed as a real function of four real parameters, suggesting that the norm can be defined as the sum of the squares of the real and imaginary parts.
  • A different participant shares their experience of successfully applying iterative methods to solve the least-square problem without using the conjugate, proposing that if the residual is analytic, the variables can be treated as real numbers.
  • One participant raises a concern that minimizing the residuals does not necessarily minimize the functional defined without conjugation, providing an example where the functional could be zero while the residuals are not.
  • Another participant expresses uncertainty about how to proceed with gradient calculations when an analytic expression of the function is not available, presenting a potential approach for calculating the gradient and second derivatives.
  • A later reply introduces the idea of defining a gradient operator with respect to complex quantities for scalar-real valued functionals, referencing a specific academic work on the topic.

Areas of Agreement / Disagreement

Participants express differing views on the use of conjugated versus un-conjugated norms in the optimization process, and there is no consensus on the best approach for calculating gradients in the context of complex functions. The discussion remains unresolved regarding the implications of these different methods.

Contextual Notes

Participants highlight the complexity of defining norms and gradients in the context of complex functions, noting the challenges posed by non-analytic components and the need for careful consideration of definitions and assumptions in their approaches.

elgen
Dear all,

I have a least square optimization problem stated as below

[tex]\xi(z_1, z_2) = \sum_{i=1}^{M} ||r_i(z_1, z_2)||^2[/tex]

where [tex]\xi[/tex] denotes the cost function and each [tex]r_i[/tex] denotes a residual, a complex-valued function of [tex]z_1, z_2[/tex].

My question is about [tex]||\cdot||[/tex]. Many textbooks only deal with real functions and say that this is the Euclidean norm, defined as the conjugated inner product of the residual, i.e. [tex]||r||^2 = conj(r)\,r[/tex].

My question is: when I apply the gradient descent method to solve this problem, how do I calculate [tex]\nabla \xi[/tex]? In particular, since [tex]\xi[/tex] includes [tex]conj(r)[/tex], we cannot take the derivative with respect to [tex]z_1, z_2[/tex] directly, as [tex]conj(r)[/tex] is not an analytic function.

Should I use the un-conjugated inner product for the definition of the norm for this LS optimization with a complex residual function?

Any feedback is welcome. Thank you.
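(A quick numeric illustration of that last point, a throwaway sketch rather than part of the original question: the difference quotient of [tex]conj(z)[/tex] depends on the direction of approach, so no single complex derivative exists.)

```python
# The difference quotient of conj(z) gives different limits along the real
# and imaginary axes, so conj(z) has no complex derivative (non-analytic).
z = 1 + 1j
h = 1e-8

along_real = ((z + h).conjugate() - z.conjugate()) / h              # -> 1
along_imag = ((z + 1j * h).conjugate() - z.conjugate()) / (1j * h)  # -> -1
```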


elgen
 
Your function is a real function of four real parameters, the real and imaginary parts of z1 and z2. Recall that, if f(z1, z2) = a(z1, z2) + i b(z1, z2), where a and b are real functions, then || f ||^2 = a^2 + b^2. Hope this helps.
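This reformulation can be sketched in a few lines (my own toy example with a made-up linear residual [tex]r_i = a_i z_1 + z_2 - b_i[/tex], not code from the thread): treat [tex]\xi[/tex] as a real function of the four real variables and run plain gradient descent with a finite-difference gradient.

```python
# Treat xi(z1, z2) = sum_i |r_i|^2 as a real function of four real variables
# (Re z1, Im z1, Re z2, Im z2) and minimize it by plain gradient descent.
# The linear residual r_i = a_i*z1 + z2 - b_i and the data are made up.

# toy data generated from a known solution z1 = 1+2j, z2 = 0.5-1j
true_z1, true_z2 = 1 + 2j, 0.5 - 1j
data = [(complex(t, -t), complex(t, -t) * true_z1 + true_z2) for t in range(1, 6)]

def xi(x):
    # cost as a real function of the four real parameters
    x1, y1, x2, y2 = x
    z1, z2 = complex(x1, y1), complex(x2, y2)
    return sum(abs(a * z1 + z2 - b) ** 2 for a, b in data)

def grad(f, x, h=1e-6):
    # central-difference gradient of a real function of real variables
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x = [0.0, 0.0, 0.0, 0.0]
for _ in range(3000):
    x = [xk - 0.005 * gk for xk, gk in zip(x, grad(xi, x))]

z1, z2 = complex(x[0], x[1]), complex(x[2], x[3])
```

With this step size the iteration recovers the solution used to generate the data; in practice one would use a library least-squares routine on the stacked real and imaginary parts instead of hand-rolled descent.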
 
Took me some time to figure it out. The functional involves four real variables, and I applied iterative methods to solve the non-linear least-square problem and obtained the correct answer. Your feedback definitely helped. Thx a lot.

Out of my own curiosity, I also defined the functional as simply the product of the residual with itself (no conjugation). It becomes

[tex]\xi(z_1,z_2)=\sum_{i=1}^M r_i(z_1, z_2)r_i(z_1,z_2)[/tex].

By treating z_1 and z_2 as two variables (not treating the real and imaginary parts separately), I was also able to get the right answer.

This leads to my hypothesis: if the residual [tex]r(z_1,z_2)[/tex] is an analytic function of the complex variables, we can treat these variables just as real numbers and apply the iterative methods.

I am also curious: is there any difference between these two functionals? When should the conjugated functional be used over the un-conjugated one, and vice versa?

Thx for the feedback again.
 
On second thought, minimizing the residuals [tex]r_i(z_1,z_2)[/tex] is not the same as minimizing the functional

[tex]\xi(z_1,z_2)=\sum_{i=1}^{M}r_i(z_1,z_2) r_i(z_1,z_2)[/tex]

If [tex]r_1=3[/tex] and [tex]r_2=3i[/tex], neither residual is zero, yet [tex]\xi = 3^2 + (3i)^2 = 9 - 9 = 0[/tex].

The functional defined using the conjugated product, by contrast, is minimized exactly when each residual is minimized.
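A quick numeric check of this point (a throwaway snippet, not from the thread):

```python
# With r1 = 3 and r2 = 3i, the un-conjugated sum r1*r1 + r2*r2 vanishes even
# though neither residual is zero, while the conjugated (Euclidean) sum does not.
residuals = [3 + 0j, 3j]

unconjugated = sum(r * r for r in residuals)            # 9 + (3i)^2 = 0
conjugated = sum(r.conjugate() * r for r in residuals)  # |3|^2 + |3i|^2 = 18
```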
 
If the residual is defined as [tex]r_i=f_i^{obs} -f_i(z_1,z_2)[/tex], I am still not sure how to apply the gradient method to the cost function if I don't have an analytic expression for [tex]f_i[/tex]. I mean, let
[tex]\xi = \Re\{ f_i^{obs} - f_i(z_1,z_2) \}^2 + \Im \{ f_i^{obs} - f_i(z_1,z_2) \}^2[/tex].
Should I proceed as
[tex] \frac{\partial \xi}{\partial z_1} = -2 \Re\{ f_i^{obs}-f_i(z_1,z_2) \} \Re\{ \frac{\partial f_i}{\partial z_1} \} - 2 \Im\{ f_i^{obs}-f_i(z_1,z_2) \} \Im\{ \frac{\partial f_i}{\partial z_1} \} [/tex]
[tex] \frac{\partial \xi}{\partial z_2} = -2 \Re\{ f_i^{obs}-f_i(z_1,z_2) \} \Re\{ \frac{\partial f_i}{\partial z_2} \} - 2 \Im\{ f_i^{obs}-f_i(z_1,z_2) \} \Im\{ \frac{\partial f_i}{\partial z_2} \} [/tex]
and take the second derivative as
[tex] \frac{\partial^2 \xi}{\partial z_1^2} = 2 \Re\{ \frac{\partial f_i}{\partial z_1} \}^2 - 2\Re\{ f_i^{obs}-f_i(z_1,z_2) \}\Re\{\frac{\partial^2f_i}{\partial z_1^2}\} + 2 \Im\{ \frac{\partial f_i}{\partial z_1} \}^2 -2\Im\{ f_i^{obs}-f_i(z_1,z_2) \} \Im\{ \frac{\partial^2f_i}{\partial z_1^2} \} [/tex]
[tex] \frac{\partial^2 \xi}{\partial z_2^2} = 2 \Re\{ \frac{\partial f_i}{\partial z_2} \}^2 - 2\Re\{ f_i^{obs}-f_i(z_1,z_2) \}\Re\{\frac{\partial^2f_i}{\partial z_2^2}\} + 2 \Im\{ \frac{\partial f_i}{\partial z_2} \}^2 -2\Im\{ f_i^{obs}-f_i(z_1,z_2) \} \Im\{ \frac{\partial^2f_i}{\partial z_2^2} \} [/tex] ?
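As a sanity check of the first of these formulas (my own sketch with a made-up analytic [tex]f[/tex], not from the thread): the partial is taken along the real axis of [tex]z_1[/tex], which matches the Re/Im decomposition above, and it agrees with a finite-difference quotient of [tex]\xi[/tex].

```python
# Check the formula d(xi)/dz1 = -2 Re{r} Re{df/dz1} - 2 Im{r} Im{df/dz1}
# for a single term xi = Re{f_obs - f}^2 + Im{f_obs - f}^2, using a
# made-up analytic f(z1, z2) = z1**2 + z2 with df/dz1 = 2*z1.

def f(z1, z2):
    return z1 ** 2 + z2

def dfdz1(z1, z2):
    return 2 * z1

f_obs = 1 - 2j
z1, z2 = 0.3 + 0.7j, -0.2 + 0.1j

r = f_obs - f(z1, z2)
d = dfdz1(z1, z2)
grad_formula = -2 * r.real * d.real - 2 * r.imag * d.imag

# finite-difference quotient of xi along the real axis of z1
def xi(z1, z2):
    e = f_obs - f(z1, z2)
    return e.real ** 2 + e.imag ** 2

h = 1e-6
grad_fd = (xi(z1 + h, z2) - xi(z1 - h, z2)) / (2 * h)
```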

Thx.
 
The key is to define a gradient operator with respect to complex quantities for a scalar, real-valued functional. See:

D. H. Brandwood, "A complex gradient operator and its application in adaptive array theory," IEE Proceedings H: Microwaves, Optics and Antennas.
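As a sketch of what that looks like in practice (my own illustration of Wirtinger-style complex gradients with a made-up linear residual, not code from the thread or the paper): for [tex]\xi = \sum_i |r_i|^2[/tex] with each [tex]r_i[/tex] analytic in [tex](z_1, z_2)[/tex], the derivative with respect to [tex]conj(z_1)[/tex] is [tex]\sum_i r_i \, conj(\partial r_i/\partial z_1)[/tex], and descending along it minimizes [tex]\xi[/tex] without ever splitting into real and imaginary parts.

```python
# Complex-gradient descent on xi = sum_i |r_i|^2 with the made-up linear
# residual r_i = a_i*z1 + z2 - b_i, where dr_i/dz1 = a_i and dr_i/dz2 = 1.
# The descent direction is the derivative with respect to the conjugate
# variables: d(xi)/d(conj z1) = sum_i r_i * conj(a_i), d(xi)/d(conj z2) = sum_i r_i.

true_z1, true_z2 = 1 + 2j, 0.5 - 1j
data = [(complex(t, -t), complex(t, -t) * true_z1 + true_z2) for t in range(1, 6)]

z1, z2 = 0j, 0j
mu = 0.008  # step size, chosen small enough for this toy problem
for _ in range(4000):
    res = [a * z1 + z2 - b for a, b in data]
    g1 = sum(r * a.conjugate() for r, (a, _) in zip(res, data))  # d xi / d conj(z1)
    g2 = sum(res)                                                # d xi / d conj(z2)
    z1 -= mu * g1
    z2 -= mu * g2
```

The iterates stay complex throughout and converge to the solution used to generate the data, which is the practical appeal of the complex gradient operator.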
 
