What Exactly Is the Step Size in the Gradient Descent Method?

In summary, gradient descent is a numerical optimization method for finding a local or global minimum of a function, and it is widely used in machine learning. The step size parameter, denoted ##\alpha##, is a tuning parameter that determines how far we move against the gradient (i.e. in the direction of steepest descent) at each iteration. Its value can be changed during optimization and must be chosen carefully: too large a value can overshoot the minimum, while too small a value can require far too many iterations to converge.
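As a minimal concrete sketch of how this looks in Python; the quadratic test function ##f(x) = x^2##, the starting point, and the value of ##\alpha## here are illustrative assumptions, not something from the thread:

```python
# Minimal sketch of the update x_{n+1} = x_n - alpha * grad_f(x_n),
# using f(x) = x**2 (so grad_f(x) = 2*x) purely as an illustrative example.
def gradient_descent(grad_f, x0, alpha=0.1, n_iter=50):
    x = x0
    for _ in range(n_iter):
        x = x - alpha * grad_f(x)   # move against the gradient, scaled by alpha
    return x

print(gradient_descent(grad_f=lambda x: 2 * x, x0=5.0))  # approaches 0, the minimizer of x**2
```

Changing ##\alpha## in this sketch only rescales how far each iteration moves along the same negative-gradient direction.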
  • #1
Dario56
Gradient descent is a numerical optimization method for finding a local or global minimum of a function. It is given by the following formula: $$ x_{n+1} = x_n - \alpha \nabla f(x_n) $$ There is countless content on the internet about the use of this method in machine learning. However, there is one thing I don't understand and couldn't find explained, even though it is basic.

What exactly is the step size ##\alpha##?

Wikipedia states that it is a tuning parameter of the optimization algorithm, which I understand, but not enough is said about it for that to count as a definition. Dimensional analysis says its dimensions should be ##\frac{(\Delta x)^2}{\Delta y}##, which I am not sure how to interpret.
 
  • #3
jedishrfu said:
Because it’s a tuning parameter, you must choose one wisely. Too small and your algorithm will run too long, too large and you will miss the valley or hill you’re trying to locate.

Here’s some discussion on it:

https://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/

The blogger says it may be obsolete theory but I think it may still apply to what you’re asking about.
Thank you. I've figured it out in the meantime.

I had trouble understanding this parameter because I wasn't looking at the gradient descent equation in vector form, and it should be seen in that light since the gradient is a vector-valued function.

The parameter ##\alpha## basically defines how far along the gradient direction we want to go. If the parameter has a value of, for example, 0.5, it means we move in the direction opposite to the gradient vector (opposite because we subtract it from the position vector ##x_n##) by a length equal to 0.5 times the magnitude of the gradient at the point ##x_n##.
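For example, here is one such update step in vector form as a small Python sketch; the function ##f(x, y) = x^2 + y^2## and the starting point are made up purely for illustration:

```python
import numpy as np

# One update step x_{n+1} = x_n - alpha * grad_f(x_n) for the illustrative
# function f(x, y) = x**2 + y**2, whose gradient at (x, y) is (2x, 2y).
x_n = np.array([2.0, -1.0])
grad = 2 * x_n                 # gradient of f at x_n: (4, -2)
alpha = 0.5
x_next = x_n - alpha * grad    # a step of length 0.5 * ||grad|| against the gradient
print(x_next)                  # [0. 0.] -- half the gradient's length, straight to the minimum here
```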

Its value can be changed during optimization. If it is too big we can miss the minimum, and if it is too small it can take too many iterations to converge.

I would say that if the magnitude of the gradient is large, the step size can be bigger, and if the gradient is small, that means we are close, so we need to make the step size smaller in order not to miss the minimum we are approaching.
 
  • #4

1. What is the definition of step size in the gradient descent method?

The step size in the gradient descent method refers to the size of the update made to the parameters in each iteration of the algorithm. It determines how far the algorithm moves in the direction of steepest descent, and it ultimately affects the convergence and accuracy of the solution.

2. How is the step size chosen in the gradient descent method?

The step size is typically chosen through trial and error or by using a heuristic approach. It is important to strike a balance between taking large steps to converge faster and taking small steps to avoid overshooting the minimum. Techniques such as line search and learning rate schedules can also be used to determine the step size.
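As one concrete example of a line-search strategy, here is a sketch of backtracking line search using the Armijo sufficient-decrease condition; the test function, constants, and starting point are illustrative assumptions:

```python
import numpy as np

def backtracking_step(f, grad_f, x, alpha0=1.0, beta=0.5, c=1e-4):
    """Shrink the step size until the Armijo sufficient-decrease condition holds."""
    g = grad_f(x)
    alpha = alpha0
    # Accept alpha once f decreases by at least c * alpha * ||g||^2.
    while f(x - alpha * g) > f(x) - c * alpha * np.dot(g, g):
        alpha *= beta          # otherwise shrink the step and try again
    return x - alpha * g, alpha

# Illustrative use on f(x, y) = x**2 + 10*y**2 (an assumed test function).
f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
x_new, alpha = backtracking_step(f, grad_f, np.array([1.0, 1.0]))
print(x_new, alpha)            # the accepted step size depends on the local curvature
```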

3. What happens if the step size is too large in the gradient descent method?

If the step size is too large, the algorithm may overshoot the minimum and fail to converge. This can lead to oscillations or divergence, resulting in a suboptimal solution or no solution at all. It is important to choose an appropriate step size to ensure the algorithm converges to a minimum.
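A quick hypothetical demonstration on ##f(x) = x^2##, where the update becomes ##x_{n+1} = (1 - 2\alpha)x_n##, so any ##\alpha > 1## makes the iterates oscillate and grow:

```python
# For f(x) = x**2 the update is x <- (1 - 2*alpha) * x, so |1 - 2*alpha| > 1
# (i.e. alpha > 1) makes each iterate larger in magnitude than the last.
def run(alpha, x=1.0, n_iter=10):
    for _ in range(n_iter):
        x = x - alpha * 2 * x
    return x

print(run(alpha=0.1))   # ~0.11: shrinking towards the minimum at 0
print(run(alpha=1.1))   # ~6.2: the iterates oscillate in sign and diverge
```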

4. Can the step size change during the gradient descent process?

Yes, the step size can be adjusted during the gradient descent process. This is known as adaptive step size or dynamic learning rate. It allows the algorithm to take larger steps in the beginning when the parameters are far from the minimum and smaller steps as it gets closer to the minimum, improving convergence and stability.
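A minimal sketch of one such schedule, an inverse-time decay ##\alpha_t = \alpha_0 / (1 + kt)##; the decay form, the constants, and the test function are illustrative assumptions rather than the only choice:

```python
# Gradient descent with a simple inverse-time decay of the step size:
# alpha_t = alpha0 / (1 + decay * t). Function and constants are illustrative.
def gradient_descent_decay(grad_f, x0, alpha0=0.4, decay=0.1, n_iter=100):
    x = x0
    for t in range(n_iter):
        alpha_t = alpha0 / (1 + decay * t)   # larger steps early, smaller steps later
        x = x - alpha_t * grad_f(x)
    return x

print(gradient_descent_decay(grad_f=lambda x: 2 * x, x0=5.0))  # close to 0 for f(x) = x**2
```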

5. How does the step size affect the speed of convergence in the gradient descent method?

The step size directly affects the speed of convergence of the gradient descent method. A larger step size can lead to faster convergence, but it also increases the risk of overshooting the minimum. On the other hand, a smaller step size may take longer to converge, but it is less likely to overshoot the minimum and may result in a more accurate solution. The choice of step size should be based on the problem at hand and the desired trade-off between speed and accuracy.
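To illustrate the trade-off, a small hypothetical experiment counting iterations until convergence on ##f(x) = x^2##; the tolerance and step sizes are made-up values for illustration:

```python
# Count iterations until |x| < tol for f(x) = x**2, starting from x = 1.0.
def iterations_to_converge(alpha, x=1.0, tol=1e-6, max_iter=100_000):
    for i in range(max_iter):
        if abs(x) < tol:
            return i
        x = x - alpha * 2 * x
    return max_iter

print(iterations_to_converge(0.4))    # 9 iterations: each step multiplies x by 0.2
print(iterations_to_converge(0.01))   # hundreds of iterations: each step multiplies x by 0.98
```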
