Minimization of objective function

SUMMARY

The discussion focuses on minimizing the objective function \(\tilde{J}_x\) defined as \(\mathbb{E}_{p(x,y)}[(\hat{y}(x)-y)^2] + \nu \mathbb{E}_{p(x,y)}[(\hat{y}(x)-y)\,\mathrm{tr}(\nabla_x^2\hat{y}(x))] + \nu \mathbb{E}_{p(x,y)}[\|\nabla_x\hat{y}(x)\|^2]\), where \(x\) is a vector and \(y\) is a scalar. The user asks whether learning the Calculus of Variations is necessary to solve this problem. Responses indicate that while understanding the underlying mathematics can be beneficial, it is not strictly necessary, since machine learning tools can automate the minimization. The discussion also notes that a function need not attain a minimum or maximum at all, particularly on open (non-compact) domains.

PREREQUISITES
  • Understanding of objective functions in machine learning
  • Familiarity with expectation notation \(\mathbb{E}_{p(x,y)}\)
  • Knowledge of gradient and Hessian matrices, specifically \(\nabla_x\hat{y}(x)\) and \(\nabla_x^2\hat{y}(x)\)
  • Basic concepts of regularization in deep learning
NEXT STEPS
  • Study the principles of Lagrange multipliers in optimization
  • Learn about the Calculus of Variations and its applications in machine learning
  • Explore the implementation of optimization algorithms in Python using libraries like TensorFlow or PyTorch
  • Research the implications of open sets on optimization problems in mathematical analysis
USEFUL FOR

Machine learning practitioners, data scientists, and researchers interested in optimization techniques for deep learning models.

kiuhnm
Hi,
I need to minimize, with respect to [itex]\hat{y}(x)[/itex], the following function:
[tex]\tilde{J}_x = \mathbb{E}_{p(x,y)}[(\hat{y}(x)-y)^2] + \nu \mathbb{E}_{p(x,y)}[(\hat{y}(x)-y)\,\mathrm{tr}(\nabla_x^2\hat{y}(x))] + \nu \mathbb{E}_{p(x,y)}[\|\nabla_x\hat{y}(x)\|^2],[/tex]
where [itex]x[/itex] is a vector and [itex]y[/itex] a scalar.
I found this in a book about Deep Learning (Machine Learning). I'm studying on my own and this math is a bit over my head. If you want more context, see pages 215-216 here: http://goodfeli.github.io/dlbook/contents/regularization.html
First of all, do I need to learn the Calculus of Variations to solve this?
The expression I wrote here is slightly different from the one in the book, because I think the authors forgot a trace (tr).
Thank you for your time.
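To make the three terms concrete, here is a minimal sketch in pure Python that estimates each expectation by a Monte Carlo average over toy data, with the gradient and Hessian trace computed by finite differences. The model \(\hat{y}\), the data distribution, and the value of \(\nu\) are hypothetical stand-ins, not the book's setup:

```python
import random

def y_hat(x):
    # Hypothetical model: a simple quadratic in the inputs.
    return sum(xi * xi for xi in x)

def grad(f, x, h=1e-5):
    # Central-difference gradient of f at x.
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def hessian_trace(f, x, h=1e-4):
    # Trace of the Hessian via second-order central differences
    # (only the d diagonal entries are needed, not the full matrix).
    fx = f(x)
    tr = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        tr += (f(xp) - 2 * fx + f(xm)) / (h * h)
    return tr

def objective(samples, nu=0.1):
    # Monte Carlo estimate of the three expectations in J~_x.
    sq_err = reg1 = reg2 = 0.0
    for x, y in samples:
        r = y_hat(x) - y
        sq_err += r * r
        reg1 += r * hessian_trace(y_hat, x)
        g = grad(y_hat, x)
        reg2 += sum(gi * gi for gi in g)
    n = len(samples)
    return sq_err / n + nu * reg1 / n + nu * reg2 / n

random.seed(0)
data = [([random.gauss(0, 1) for _ in range(2)], random.gauss(0, 1))
        for _ in range(50)]
print(objective(data))
```

In practice an autodiff framework would replace the finite differences, but the structure of the estimator is the same: one squared-error term plus the two \(\nu\)-weighted penalty terms, each averaged over samples from \(p(x, y)\).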
 
Hey,
I think you're looking for the method of Lagrange multipliers.

MP

EDIT: Sorry, misread the post; I thought you already wanted a solution via the calculus of variations. As to whether you'll have to learn it: not really, since in practice a machine does the minimization for you anyway. You don't need to understand why the solution works, as long as you can code it once (or get somebody else to do it).
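A toy illustration of that point: generic gradient-based minimization needs only function evaluations (or automatic differentiation), not a closed-form variational solution. This is a hypothetical sketch with a stand-in loss, not the objective from the book:

```python
# Minimal sketch of "letting the machine do it": numerical gradient
# descent on a toy loss whose minimizer is w = 3.
def loss(w):
    return (w - 3.0) ** 2

def num_grad(f, w, h=1e-6):
    # Central-difference derivative of f at w.
    return (f(w + h) - f(w - h)) / (2 * h)

w = 0.0
for _ in range(200):
    w -= 0.1 * num_grad(loss, w)
print(w)  # converges toward the minimizer w = 3
```

Frameworks like TensorFlow or PyTorch do the same thing at scale, with exact gradients from automatic differentiation instead of finite differences.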
 
And please don't forget that the function doesn't necessarily attain a min/max value unless you're working on a compact (closed and bounded) set; that is what the extreme value theorem guarantees for continuous functions.

For example, the function f(x) = x attains no minimum or maximum for x ∈ (0, 1), although you can get arbitrarily close to both the infimum and the supremum (0 and 1). You may have to consider such cases separately; that really depends on what you're doing.
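The open-interval example above can be checked mechanically: on (0, 1), any candidate minimizer of f(x) = x is beaten by half its value, which still lies in the interval, so no point attains the infimum (a small pure-Python sketch):

```python
def f(x):
    return x

# Start anywhere in the open interval (0, 1).
x = 0.5
for _ in range(10):
    better = x / 2                       # still inside (0, 1)
    assert 0 < better < 1
    assert f(better) < f(x)              # every candidate can be beaten
    x = better
print(x)  # keeps shrinking toward the infimum 0, which is never attained
```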

MP
 
