On uniform boundedness of the GD algorithm

  • Context: Graduate 
  • Thread starter Thread starter Vulture1991
  • Start date Start date
  • Tags Tags
    Algorithm Uniform
Click For Summary
SUMMARY

The discussion focuses on the uniform boundedness of the sequences generated by the Gradient Descent (GD) algorithm for functions with Lipschitz continuous gradients. Specifically, it examines the convergence of sequences ##\{x^k\}_k## and ##\{\tilde{x}^k\}_k## derived from the functions ##f## and ##\tilde{f}(x)=f(x)+x'Ax##, respectively. The key question posed is whether the distance between these two sequences remains uniformly bounded, independent of the matrix ##A## and step-size ##\alpha##. The challenge lies in proving the existence of a compact sublevel set that is independent of both variables.

PREREQUISITES
  • Understanding of Gradient Descent algorithm and its convergence properties.
  • Familiarity with Lipschitz continuous functions and their gradients.
  • Knowledge of compact sets in mathematical analysis.
  • Basic linear algebra, particularly concerning positive semi-definite matrices.
NEXT STEPS
  • Research the properties of Lipschitz continuous functions and their implications for optimization.
  • Explore the concept of compact sublevel sets in the context of optimization problems.
  • Study the effects of different step sizes on the convergence of Gradient Descent.
  • Investigate references on uniform boundedness principles in optimization algorithms.
USEFUL FOR

Mathematicians, optimization researchers, and data scientists interested in the theoretical foundations of Gradient Descent and its convergence behavior in optimization problems.

Vulture1991
Messages
4
Reaction score
0
Consider a function ##f\in\mathcal{C}^2## with Lipschitz continuous gradient (with constant ##L##)- we also assume the function is lowerbounded and has at least one minimum. Let ##\{x^k\}_k## be the sequence generated by Gradient Descent algorithm with initial point $x^0$ and step-size ##0<\alpha<2/L##:
\begin{equation}
x^{k+1}=x^k-\alpha\nabla f (x^k).
\end{equation}
We know that the sequence will converge to a critical point.

Now consider the new function ##\tilde{f}(x)=f(x)+x'Ax## with some ##A\succeq\mathbf{0}##. Let ##\{\tilde{x}^k\}_k## be the sequence generated by Gradient Descent algorithm with **the same** initial point ##\tilde{x}^0=x^0## and step-size ##0<\alpha<2/L##:
\begin{equation}
\tilde{x}^{k+1}=\tilde{x}^k-\alpha\nabla \tilde{f} (\tilde{x}^k).
\end{equation}

**Can we prove that ##\mathrm{dist}\left(\{\tilde{x}^k\}_k,\{{x}^k\}_k\right)## is uniformly bounded, independent of $A$ and step-size $\alpha$?**

I tried to prove it by assuming existence of a compact sublevel set ##\mathcal{L}=\{x:f(x)\leq f(x^0)\}## and using the fact that Gradient Descent generates a decreasing sequence of objective values (implying that the sequence remains in the compact sublevel set). However I cannot prove existence of a set independent of both $A$ and $\alpha$.
 
Physics news on Phys.org
Do you have a reference where people can look up the algorithm in order to understand your question?
 
What is ##A## here, a matrix?
 

Similar threads

  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 18 ·
Replies
18
Views
3K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 0 ·
Replies
0
Views
3K
  • · Replies 27 ·
Replies
27
Views
3K
  • · Replies 2 ·
Replies
2
Views
3K
  • · Replies 1 ·
Replies
1
Views
4K
  • · Replies 1 ·
Replies
1
Views
2K
Replies
92
Views
5K