Why Do Nonlinear Functions Often Lead to Non-Convex Cost Functions?

pamparana
I am taking an online course on linear regression. It discusses the sum-of-squared-differences cost function, and one of the points it makes is that this cost function is always convex, i.e. it has only one optimum.

Now, reading a bit more, it seems that nonlinear functions tend to give rise to non-convex cost functions, and I am trying to develop some intuition for why. So, suppose I take a model like:

$$
f(x) = w_0 x^2 + w_1 \exp(x) + \epsilon
$$

And the cost function I choose is the same, i.e. I want to minimise:

$$
J(w_0, w_1) = (f(x) - w_0 x^2 + w_1 \exp(x))^2
$$

What is the intuition for why the squared and exponential terms would give rise to non-convexities?
 
For the example that you've chosen (fixing a missing minus sign),

$$ \frac{\partial^2 J}{\partial w_0 \partial w_1} = 2 x^2 e^x \geq 0.$$

More to the point, the full Hessian of ##J## with respect to ##(w_0, w_1)## is ##2vv^T## with ##v = (x^2, e^x)^T##, a rank-one matrix, which is positive semidefinite. So ##J## is convex in the weights.
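A quick numerical check of this (a sketch using NumPy; the sample points for ##x## are arbitrary choices of mine):

```python
import numpy as np

# Hessian of J(w0, w1) = (y - w0*x^2 - w1*exp(x))^2 at a data point x.
# Because the model is linear in (w0, w1), the Hessian depends neither
# on the weights nor on y.
def hessian_J(x):
    g = np.array([x**2, np.exp(x)])   # gradient of the model w.r.t. (w0, w1)
    return 2.0 * np.outer(g, g)       # 2 v v^T: rank one, positive semidefinite

for x in [-2.0, 0.0, 1.5]:
    eigvals = np.linalg.eigvalsh(hessian_J(x))
    # one eigenvalue is 2*||g||^2 > 0, the other is 0 (up to round-off)
    print(x, eigvals)
```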

In a more general case, we will have residuals

$$ r_i = y_i - \sum_\alpha w_\alpha f_\alpha(x_i)$$

and we are minimising

$$ S = \sum_i r_i^2.$$

The matrix of second derivatives

$$ m_{\alpha\beta}= \frac{\partial^2 S}{\partial w_\alpha \partial w_\beta} = 2 \sum_i f_\alpha(x_i) f_\beta(x_i),$$

determines the convexity of ##S## as a function of the ##w_\alpha##. This is twice a Gram matrix, so it is positive semidefinite regardless of the sign of its off-diagonal entries: as long as the model is linear in the weights, the least-squares cost is convex, even when the ##f_\alpha## are nonlinear in ##x##. Non-convexity enters when the model ##g(x_i; w)## depends nonlinearly on the weights themselves. Then

$$ m_{\alpha\beta} = 2 \sum_i \left( \frac{\partial g}{\partial w_\alpha}\frac{\partial g}{\partial w_\beta} - r_i \frac{\partial^2 g}{\partial w_\alpha \partial w_\beta} \right),$$

and the second term, weighted by the residuals ##r_i = y_i - g(x_i; w)##, can make the Hessian indefinite.
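To make this concrete, here is a small numerical sketch (the model ##g(x; w) = \exp(wx)## and the single data point ##(x, y) = (1, 5)## are my own illustrative choices) showing that a model nonlinear in its weight yields a non-convex cost:

```python
import numpy as np

# Model nonlinear in the parameter: g(x; w) = exp(w * x).
# With one data point (x, y) = (1, 5), the cost is S(w) = (5 - exp(w))^2.
def S(w):
    return (5.0 - np.exp(w))**2

# Second derivative by central differences.
def d2S(w, h=1e-4):
    return (S(w + h) - 2.0 * S(w) + S(w - h)) / h**2

# Analytically, S''(w) = 4*exp(2w) - 10*exp(w).
print(d2S(0.0))  # about -6: S is locally concave here, so S is not convex
print(d2S(2.0))  # positive: convex near the minimum at w = ln 5
```

The sign flip of ##S''## comes exactly from the residual-weighted second-derivative term above: far from the fit, the large positive residual overwhelms the Gram term.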
 