Why Do Nonlinear Functions Often Lead to Non-Convex Cost Functions?

pamparana
I am taking an online course on linear regression. It discusses the sum-of-squared-differences cost function, and one of the points it makes is that this cost function is always convex, i.e. it has only one optimum.

Now, reading a bit more, it seems that nonlinear functions tend to give rise to non-convex cost functions, and I am trying to develop some intuition for why. So, suppose I take a model like:

$$
f(x) = w_0 x^2 + w_1 \exp(x) + \epsilon
$$

And the cost function I choose is the same, i.e. I want to minimise:

$$
J(w_0, w_1) = (f(x) - w_0 x^2 + w_1 \exp(x))^2
$$

What is the intuition for why the squared and exponential terms would give rise to non-convexities?
 
For the example that you've chosen (fixing a missing minus sign),

$$ \frac{\partial^2 J}{\partial w_0 \partial w_1} = 2 x^2 e^x \geq 0.$$

More to the point, the full Hessian of ##J## with respect to ##(w_0, w_1)## is ##2vv^T## with ##v = (x^2, e^x)^T##, a rank-one matrix, which is positive semidefinite. So ##J## is convex in the weights.
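A quick numerical check of this (a sketch using NumPy; the sample points for ##x## are arbitrary choices of mine):

```python
import numpy as np

# Hessian of J(w0, w1) = (y - w0*x^2 - w1*exp(x))^2 at a data point x.
# Because the model is linear in (w0, w1), the Hessian depends neither
# on the weights nor on y.
def hessian_J(x):
    g = np.array([x**2, np.exp(x)])   # gradient of the model w.r.t. (w0, w1)
    return 2.0 * np.outer(g, g)       # 2 v v^T: rank one, positive semidefinite

for x in [-2.0, 0.0, 1.5]:
    eigvals = np.linalg.eigvalsh(hessian_J(x))
    # one eigenvalue is 2*||g||^2 > 0, the other is 0 (up to round-off)
    print(x, eigvals)
```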

In a more general case, we will have residuals

$$ r_i = y_i - \sum_\alpha w_\alpha f_\alpha(x_i)$$

and we are minimising

$$ S = \sum_i r_i^2.$$

The matrix of second derivatives

$$ m_{\alpha\beta}= \frac{\partial^2 S}{\partial w_\alpha \partial w_\beta} = 2 \sum_i f_\alpha(x_i) f_\beta(x_i),$$

determines the convexity of ##S## as a function of the ##w_\alpha##. This is twice a Gram matrix, so it is positive semidefinite regardless of the sign of its off-diagonal entries: as long as the model is linear in the weights, the least-squares cost is convex, even when the ##f_\alpha## are nonlinear in ##x##. Non-convexity enters when the model ##g(x_i; w)## depends nonlinearly on the weights themselves. Then

$$ m_{\alpha\beta} = 2 \sum_i \left( \frac{\partial g}{\partial w_\alpha}\frac{\partial g}{\partial w_\beta} - r_i \frac{\partial^2 g}{\partial w_\alpha \partial w_\beta} \right),$$

and the second term, weighted by the residuals ##r_i = y_i - g(x_i; w)##, can make the Hessian indefinite.
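To make this concrete, here is a small numerical sketch (the model ##g(x; w) = \exp(wx)## and the single data point ##(x, y) = (1, 5)## are my own illustrative choices) showing that a model nonlinear in its weight yields a non-convex cost:

```python
import numpy as np

# Model nonlinear in the parameter: g(x; w) = exp(w * x).
# With one data point (x, y) = (1, 5), the cost is S(w) = (5 - exp(w))^2.
def S(w):
    return (5.0 - np.exp(w))**2

# Second derivative by central differences.
def d2S(w, h=1e-4):
    return (S(w + h) - 2.0 * S(w) + S(w - h)) / h**2

# Analytically, S''(w) = 4*exp(2w) - 10*exp(w).
print(d2S(0.0))  # about -6: S is locally concave here, so S is not convex
print(d2S(2.0))  # positive: convex near the minimum at w = ln 5
```

The sign flip of ##S''## comes exactly from the residual-weighted second-derivative term above: far from the fit, the large positive residual overwhelms the Gram term.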
 