What Exactly is Step Size in Gradient Descent Method?


Discussion Overview

The discussion revolves around the concept of step size, denoted as ## \alpha ##, in the gradient descent method, a numerical optimization technique used to find local or global minima of functions. Participants explore its definition, implications, and practical considerations in the context of optimization algorithms, particularly in machine learning.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant defines gradient descent and presents the formula, questioning the precise definition of step size ## \alpha ## and its dimensional analysis, which they find unclear.
  • Another participant emphasizes the importance of choosing the step size wisely, noting that a step size that is too small can prolong the algorithm's runtime, while one that is too large may cause the algorithm to overshoot the minimum.
  • A later reply provides an interpretation of ## \alpha ## in vector terms, suggesting that it determines the distance moved in the direction of the gradient vector and discusses how its value can be adjusted during optimization based on the gradient's magnitude.
  • One participant references a tutorial paper that offers an intuitive explanation of steepest descent and conjugate gradient methods, suggesting it may provide additional insights into the topic.

Areas of Agreement / Disagreement

Participants express varying interpretations of the step size and its implications, with no consensus reached on a singular definition or approach to its application in gradient descent.

Contextual Notes

There are unresolved aspects regarding the interpretation of dimensional analysis for step size and the conditions under which its value should be adjusted during optimization.

Dario56
Gradient descent is a numerical optimization method for finding a local/global minimum of a function. It is given by the following formula: $$ x_{n+1} = x_n - \alpha \nabla f(x_n) $$ There is countless content on the internet about the use of this method in machine learning. However, there is one thing I don't understand and couldn't find an answer to, even though it is basic.

What exactly is the step size ## \alpha ##?

Wikipedia states that it is a tuning parameter in the optimization algorithm, which I understand, but not enough is said about it to count as a definition. Dimensional analysis suggests its dimensions should be ## \frac{(\Delta x)^2}{\Delta y} ##, which I am not sure how to interpret.
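To make the update concrete, here is a minimal Python sketch of the iteration above on an example quadratic (the function, starting point, and value of ## \alpha ## are my own illustrative choices, not anything specific to the question):

```python
# Minimal sketch of gradient descent on f(x, y) = x^2 + 4 y^2,
# to make the role of the step size alpha concrete.
# The function, starting point and alpha are illustrative choices.
import numpy as np

def grad_f(x):
    # Gradient of f(x, y) = x^2 + 4 y^2
    return np.array([2.0 * x[0], 8.0 * x[1]])

alpha = 0.1                     # step size (learning rate)
x = np.array([3.0, 2.0])        # starting point x_0

for n in range(50):
    x = x - alpha * grad_f(x)   # x_{n+1} = x_n - alpha * grad f(x_n)

print(x)                        # close to the minimum at (0, 0)
```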
 
Because it’s a tuning parameter, you must choose one wisely. Too small and your algorithm will run too long, too large and you will miss the valley or hill you’re trying to locate.

Here’s some discussion on it:

https://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/

The blogger says it may be obsolete theory but I think it may still apply to what you’re asking about.
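A tiny numerical illustration of that trade-off (my own example, not from the blog post): on ## f(x) = x^2 ## the update becomes ## x_{n+1} = (1 - 2\alpha) x_n ##, so a very small ##\alpha## crawls toward the minimum, while ##\alpha > 1## overshoots and diverges.

```python
# Effect of the step size on f(x) = x**2 (gradient 2x, minimum at 0).
# The three step sizes below are illustrative picks.

def run(alpha, steps=20, x=1.0):
    for _ in range(steps):
        x = x - alpha * 2.0 * x   # gradient descent update
    return x

print(run(0.01))   # too small: after 20 steps still far from 0 (about 0.67)
print(run(0.4))    # reasonable: essentially at the minimum
print(run(1.1))    # too large: overshoots, |x| grows every step and diverges
```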
 
jedishrfu said:
Because it’s a tuning parameter, you must choose one wisely. Too small and your algorithm will run too long, too large and you will miss the valley or hill you’re trying to locate.

Here’s some discussion on it:

https://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/

The blogger says it may be obsolete theory but I think it may still apply to what you’re asking about.
Thank you. I've worked this out in the meantime.

I had trouble understanding this parameter because I didn't look at the gradient descent equation in vector form, and it should be seen in this light since the gradient is a vector-valued function.

The parameter ##\alpha## basically defines how far in the direction of the gradient vector we want to go. If the parameter has the value 0.5, for example, it means we move in the direction opposite to the gradient vector (opposite because we subtract the term from the position vector ##x_n##) by a distance equal to 0.5 times the magnitude of the gradient vector at the point ##x_n##.

Its value can be changed during optimization. If it is too big, we can overshoot the minimum, and if it is too small, it can take too many iterations to converge.

I would say that if the magnitude of the gradient is large, the step size can be bigger, and if the gradient is small, that means we are close to the minimum, so we should make the step size smaller in order not to overshoot it.
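For reference, one standard way of adjusting ##\alpha## automatically during optimization is a backtracking (Armijo) line search: start each iteration with a fairly large step and shrink it until the function value actually decreases enough. A rough sketch, with illustrative constants and an example function of my own choosing:

```python
# Backtracking (Armijo) line search: shrink alpha until the step gives a
# sufficient decrease in f. Generic sketch; f, grad_f and the constants
# below are illustrative choices, not something proposed in this thread.
import numpy as np

def backtracking_step(f, grad_f, x, alpha0=1.0, beta=0.5, c=1e-4):
    g = grad_f(x)
    alpha = alpha0
    # Armijo condition: f(x - alpha*g) <= f(x) - c * alpha * ||g||^2
    while f(x - alpha * g) > f(x) - c * alpha * np.dot(g, g):
        alpha *= beta            # step too large: halve it and try again
    return x - alpha * g

# Example usage on f(x, y) = x^2 + 4 y^2
f = lambda x: x[0]**2 + 4.0 * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 8.0 * x[1]])

x = np.array([3.0, 2.0])
for _ in range(30):
    x = backtracking_step(f, grad_f, x)
print(x)   # close to the minimum at (0, 0)
```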
 
