Why Do Nonlinear Functions Often Lead to Non-Convex Cost Functions?

  • Context: Graduate
  • Thread starter: pamparana
  • Tags: optimisation
SUMMARY

The discussion centers on the relationship between nonlinear functions and non-convex cost functions in the context of regression analysis. Linear regression uses a convex sum-of-squares cost function, and the question is whether a model that is nonlinear in the input, such as \( f(x) = w_0 x^2 + w_1 \exp(x) + \epsilon \), makes the cost function \( J(w_0, w_1) = (f(x) - w_0 x^2 - w_1 \exp(x))^2 \) non-convex. Examining the Hessian matrix shows that any model that remains linear in its weights keeps the cost convex; non-convexity requires a model that is nonlinear in the parameters themselves.

PREREQUISITES
  • Understanding of linear regression and convex functions
  • Familiarity with nonlinear functions and their properties
  • Knowledge of cost functions and optimization techniques
  • Basic calculus, specifically partial derivatives and Hessians
NEXT STEPS
  • Explore the properties of non-convex optimization problems
  • Learn about the role of Hessian matrices in determining convexity
  • Study the implications of residuals in regression analysis
  • Investigate advanced regression techniques that handle non-convex cost functions
USEFUL FOR

Data scientists, machine learning practitioners, and statisticians interested in understanding the complexities of nonlinear regression models and their optimization challenges.

pamparana
I am taking an online course on linear regression. It discusses the sum-of-squared-differences cost function, and one of the points it makes is that this cost function is always convex, i.e. it has a single optimum.

Now, reading a bit further, it seems that non-linear functions tend to give rise to non-convex cost functions, and I am trying to develop some intuition for why. So, suppose I take a model like:

$$
f(x) = w_0 x^2 + w_1 \exp(x) + \epsilon
$$

And the cost function I choose is the same i.e. I want to minimise:

$$
J(w_0, w_1) = (f(x) - w_0 x^2 + w_1 \exp(x))^2
$$

What is the intuition that the squared term and the exponential term would give rise to non-convexities?
 
For the example that you've chosen (fixing a missing minus sign in the cost, so that ##J = (f(x) - w_0 x^2 - w_1 e^x)^2##), the Hessian with respect to ##(w_0, w_1)## is

$$ H = 2\begin{pmatrix} x^4 & x^2 e^x \\ x^2 e^x & e^{2x} \end{pmatrix}. $$

This matrix is positive semidefinite: its diagonal entries are nonnegative and its determinant is zero, so ##J## is convex in the weights.
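As a quick numerical sanity check (a sketch using NumPy, not from the thread), one can form the full 2×2 Hessian and confirm that its eigenvalues are nonnegative for any ##x##:

```python
import numpy as np

# Hessian of J(w0, w1) = (f(x) - w0*x**2 - w1*exp(x))**2 at a single point x.
# The residual is linear in (w0, w1), so the Hessian depends neither on the
# weights nor on the target value f(x).
def hessian(x):
    g = np.array([x**2, np.exp(x)])  # gradient of the residual w.r.t. (w0, w1)
    return 2.0 * np.outer(g, g)      # H = 2 g g^T: rank one, positive semidefinite

for x in (-2.0, 0.5, 3.0):
    eigvals = np.linalg.eigvalsh(hessian(x))
    print(f"x = {x:5.1f}  eigenvalues = {eigvals}")
```

Because the Hessian is an outer product ##2\,g g^\top##, one eigenvalue is ##2\|g\|^2 > 0## and the other is exactly zero, regardless of ##x##.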

In a more general case, we will have residuals

$$ r_i = y_i - \sum_\alpha w_\alpha f_\alpha(x_i)$$

and we are minimising

$$ S = \sum_i r_i^2.$$

The second derivatives

$$ m_{\alpha\beta}= \frac{\partial^2 S}{\partial w_\alpha \partial w_\beta} = 2 \sum_i f_\alpha(x_i) f_\beta(x_i),$$

form the Hessian of ##S## as a function of the ##w_\alpha##, and this matrix is always positive semidefinite: for any vector ##v##,

$$ v^\top m\, v = 2 \sum_i \Big(\sum_\alpha v_\alpha f_\alpha(x_i)\Big)^2 \geq 0.$$

When ##\beta = \alpha##, ##m_{\alpha\alpha}## is a sum of squares, so it is nonnegative. When ##\beta\neq \alpha##, it is possible to find ##m_{\alpha\beta}<0##, depending on the specific functions ##f_\alpha## and the data ##x_i##, but negative off-diagonal entries do not spoil positive semidefiniteness. So a least-squares cost that is linear in the weights is convex no matter how nonlinear the basis functions are in ##x##; non-convexity arises only when the model is nonlinear in the parameters themselves, e.g. ##f(x) = \exp(w x)##.
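The general case can also be checked numerically (a sketch; the data and basis functions below are illustrative, not from the thread). Building the design matrix ##F## with columns ##f_\alpha(x_i)##, the Hessian ##2F^\top F## always has nonnegative eigenvalues, even when some of its off-diagonal entries happen to be negative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=50)          # illustrative data points

# Columns are the basis functions f_alpha evaluated at the data points.
F = np.column_stack([x**2, np.exp(x), np.sin(x)])

m = 2.0 * F.T @ F                            # Hessian of S = sum_i r_i**2
print(np.round(m, 2))                        # entries may have either sign
print(np.linalg.eigvalsh(m))                 # but every eigenvalue is >= 0
```

Since ##m = 2F^\top F## is a Gram matrix, it is positive semidefinite for any choice of basis functions and data, which is exactly why such fits stay convex.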
 
