Graduate On uniform boundedness of the GD algorithm

  • Thread starter Thread starter Vulture1991
  • Start date Start date
  • Tags Tags
    Algorithm Uniform
Click For Summary
The discussion centers on the uniform boundedness of the sequences generated by the Gradient Descent (GD) algorithm for a function with a Lipschitz continuous gradient. The original function is modified by adding a quadratic term involving a positive semi-definite matrix A, leading to a new function. The key question is whether the distance between the sequences generated by the GD algorithm for the original and modified functions remains uniformly bounded, regardless of the matrix A and the step-size α. The attempt to prove this involves considering a compact sublevel set but faces challenges in establishing independence from A and α. References for further understanding of the algorithm and its properties are sought by participants in the discussion.
Vulture1991
Messages
4
Reaction score
0
Consider a function ##f\in\mathcal{C}^2## with Lipschitz continuous gradient (with constant ##L##)- we also assume the function is lowerbounded and has at least one minimum. Let ##\{x^k\}_k## be the sequence generated by Gradient Descent algorithm with initial point $x^0$ and step-size ##0<\alpha<2/L##:
\begin{equation}
x^{k+1}=x^k-\alpha\nabla f (x^k).
\end{equation}
We know that the sequence will converge to a critical point.

Now consider the new function ##\tilde{f}(x)=f(x)+x'Ax## with some ##A\succeq\mathbf{0}##. Let ##\{\tilde{x}^k\}_k## be the sequence generated by Gradient Descent algorithm with **the same** initial point ##\tilde{x}^0=x^0## and step-size ##0<\alpha<2/L##:
\begin{equation}
\tilde{x}^{k+1}=\tilde{x}^k-\alpha\nabla \tilde{f} (\tilde{x}^k).
\end{equation}

**Can we prove that ##\mathrm{dist}\left(\{\tilde{x}^k\}_k,\{{x}^k\}_k\right)## is uniformly bounded, independent of $A$ and step-size $\alpha$?**

I tried to prove it by assuming existence of a compact sublevel set ##\mathcal{L}=\{x:f(x)\leq f(x^0)\}## and using the fact that Gradient Descent generates a decreasing sequence of objective values (implying that the sequence remains in the compact sublevel set). However I cannot prove existence of a set independent of both $A$ and $\alpha$.
 
Physics news on Phys.org
Do you have a reference where people can look up the algorithm in order to understand your question?
 
What is ##A## here, a matrix?
 

Similar threads

  • · Replies 18 ·
Replies
18
Views
3K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 27 ·
Replies
27
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 10 ·
Replies
10
Views
2K
  • · Replies 16 ·
Replies
16
Views
7K
  • · Replies 175 ·
6
Replies
175
Views
26K