# Functionals and calculus of variations

I have been studying calculus of variations and have been struggling to conceptualise why we have functionals of the form $$I[y]= \int_{a}^{b} F\left(x,y,y' \right) dx$$ In particular, why is the integrand $F\left(x,y,y' \right)$ a function of both $y$ and its derivative $y'$?

My thoughts on the matter are that, as the functional $I$ is itself dependent on the entire function $y(x)$ over the interval $x\in [a,b]$, if $I$ is expressed as an integral over this interval then the 'size' of the integral will depend on how $y$ varies over this interval (i.e. its rate of change $y'$ over the interval $x\in [a,b]$), and hence the integrand will depend on $y$ and its derivative $y'$ (and, in general, higher-order derivatives of $y$). I'm not sure if this is a correct understanding and I'm hoping that someone can enlighten me on the subject (particularly if I'm wrong). Thanks.

Think of the motion of a car. The independent variable is time t, but to describe its path you have to give its (initial) position _and_ its (initial) velocity, the derivative of position.

Can one infer from this, then, that as we initially need to specify the position and the velocity in order to describe the configuration of a physical system, any function $F$ characterising the dynamics of the system over a given interval must be a function of both position and velocity? (In doing so, we can describe the dynamics of the system at any point in the interval that we are considering by specifying the position and velocity at that point and plugging these values into $F$.)

I'm trying to get an understanding of it in the abstract sense as well, without relating it to any particular physical problem: why would the integrand be a function of some function and its derivatives (first order and possibly higher order)?

My understanding is that the calculus of variations uses notation that treats $y$ and $y'$ as independent variables, even though they aren't actually independent (as you point out). However, the theory still works.

Yeah, I guess I'm really trying to understand why the integrand is treated as a function of $y$ and $y'$, why not just $y$? What's the justification/mathematical (and/or) physical reasoning behind it?

Is it just that if you wish to be able to describe the configuration of a physical system and how that configuration evolves in time you need to specify the positions of the components of the system and also how those positions change in time (i.e. the derivatives of the positions). Hence, as we wish the Lagrangian of the system to characterise its dynamics, this implies that the Lagrangian should be a function of both position and velocity?!

The function $F$ depends on the problem you are trying to solve. For example, if you are trying to minimize the energy of a system, the energy consists of kinetic energy (which depends on $y'$) and potential energy (which depends on $y$). However, if you are trying to minimize the length of a curve, the integrand is $\sqrt{1+y'^2}$ (since $ds=\sqrt{1+y'^2}\,dx$), which does not depend on $y$. So the form of $F$ depends on the problem at hand. Does that help?
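
To make the two cases concrete, here is a minimal numerical sketch. The helper `functional`, the midpoint rule, and the particular "energy-like" integrand $\tfrac12 y'^2+\tfrac12 y^2$ are illustrative assumptions, not something taken from a specific textbook problem. It evaluates $I[y]=\int_a^b F(x,y(x),y'(x))\,dx$ for a line and for the same line shifted upwards.

```python
import math

def functional(F, y, dy, a, b, n=10_000):
    """Approximate I[y] = integral_a^b F(x, y(x), y'(x)) dx with the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h          # midpoint of the k-th subinterval
        total += F(x, y(x), dy(x))     # F only ever sees three numbers
    return total * h

def arc_length_integrand(x, y_val, dy_val):
    # Depends on the slope only, not on the value of y.
    return math.sqrt(1.0 + dy_val ** 2)

def energy_like_integrand(x, y_val, dy_val):
    # Hypothetical example: a "kinetic" term in y' plus a "potential" term in y.
    return 0.5 * dy_val ** 2 + 0.5 * y_val ** 2

line = lambda x: x            # y(x) = x
shifted = lambda x: x + 1.0   # y(x) = x + 1, same slope
slope = lambda x: 1.0         # y'(x) = 1 for both curves

length_line = functional(arc_length_integrand, line, slope, 0.0, 1.0)
length_shifted = functional(arc_length_integrand, shifted, slope, 0.0, 1.0)
energy_line = functional(energy_like_integrand, line, slope, 0.0, 1.0)
energy_shifted = functional(energy_like_integrand, shifted, slope, 0.0, 1.0)

print(length_line, length_shifted)
print(energy_line, energy_shifted)
```

Shifting the curve leaves the arc length unchanged (both are $\sqrt2$) because that integrand ignores $y$, while the energy-like functional changes from $2/3$ to $5/3$ because its integrand uses both $y$ and $y'$.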

Thanks. I understand in those cases, but would what I said be correct in the more general case of applying the principle of stationary action to a physical system, i.e. one wishes to describe the state of some system at time $t_{0}$ and what state it evolves to at a later (fixed) time $t_{1}$. To do so one must specify the coordinates $q_{i}$ of the components of the system and also how those coordinates change in time, i.e. their derivatives, $\dot{q}_{i}$ in the time interval $t\in [t_{0}, t_{1}]$. Thus, we require a function of the form $$\mathcal{L}= \mathcal{L}\left(q_{i}(t),\dot{q}_{i}(t)\right)$$ to completely specify the state of the system at any time $t\in [t_{0}, t_{1}]$. From this, we define a functional, the action such that $$S\left[q_{i}(t)\right] = \int_{t_{0}}^{t_{1}} \mathcal{L}\left(q_{i}(t),\dot{q}_{i}(t)\right) dt$$ that associates a number to each path $\vec{q}(t)=\left(q_{i}(t)\right)$ between the (fixed) states $\vec{q}\left(t_{0}\right)$ and $\vec{q}\left(t_{1}\right)$. We then invoke the principle of stationary action to assert that the actual physical path taken between these two points is the one which satisfies $\delta S = 0$. Would this be a correct interpretation?

That is a good description and matches how I understand calculus of variations in the context of general physical systems.

Great. Thanks very much for your help!

As a follow-up. Would it be fair to say then, that as $S\left[\vec{q}(t)\right]$ contains information about all possible paths between the points $\vec{q}\left(t_{0}\right)$ and $\vec{q}\left(t_{1}\right)$, this implies that the integrand will be a function of the values of those paths and their derivatives at each point $t\in [t_{0}, t_{1}]$. Now, as at each point along the path in the interval $t\in [t_{0}, t_{1}]$, once we have specified the position we are free to specify how that position changes (i.e. the velocity) at that point independently, as we are considering all possible paths. However, upon imposing the principle of stationary action, we are choosing a particular path, i.e. the one which extremises the action. This re-introduces the explicit dependence of $\dot{q}_{i}(t)$ on $q_{i}(t)$ via the relation $$\delta\dot{q}_{i}(t) = \frac{d}{dt}\left(\delta q_{i}(t)\right)$$ Apologies to re-iterate, just trying to fully firm up the concept in my mind.

Sorry, please ignore the post above - I realised the error in what I was writing after posting it and the forum won't let me delete it now!

Instead of the above post, is the following a correct summary (pertaining to the Lagrangian and why it is dependent on position and velocity):

The state of a mechanical system at a given time $t_{0}$ is completely specified by the positions of the particles within it, along with their corresponding velocities. Thus, if we wish to describe the state of this system at some later time $t$ in some fixed time interval, then we need to specify how the system evolves over this interval, i.e. we require a function which depends on the positions of the particles and also the rate at which those positions are changing (i.e. their velocities) at each point within the time interval (a requirement if we wish to consider external forces acting on the particles). This motivates us to consider a function $\mathcal{L}= \mathcal{L}\left(q_{i}(t), \dot{q}_{i}(t)\right)$ which completely specifies the state of a mechanical system at each point $t \in [t_{0},t_{1}]$.

My intuition is that the Lagrangian is sort of a cost function. You might not care about y' in some problems. So, you could imagine a problem in which your cost per unit time to travel from point A to point B in a fixed amount of time is strictly a function of position. You would then try to spend as much of your time in the areas of lower cost to minimize your travel expenses. But let's say you want to discourage speeding as well, so you want to penalize higher velocities. My intuition is that it's easier to apply that speeding penalty if you make the Lagrangian also a function of velocity. For example, you could add the speed squared or cubed or whatever you want. So, it's natural to want to introduce y' as a variable to be able to put that into your cost function. It's a pretty flexible construction, so you can imagine that we can just try to penalize any path that doesn't follow the laws of physics that we want, and hopefully, that will give you a description of physics. When you work out the details, it does turn out to work.

Stephen Tashi
I haven't yet heard a mathematical answer to the question of why the integrand is treated as a function of $y$ and $y'$ rather than just $y$. Let me reiterate the question, emphasizing the mathematical aspect.

When we have a function such as $y = 3x + x^2$ we denote it as $y = f(x)$, not as $y = f(x,3x,x^2)$, even though evaluating $f$ involves the intermediate steps of evaluating $3x$ and $x^2$. So why is an expression like $F(x,y,y')$ necessary in discussing the integrand in the calculus of variations? Isn't computing $y'$ from $y$ an intermediate step in the process? If we are given $y$ we can find $y', y'', \ldots$ etc. Why not just write the integrand as $F(x,y)$ or even $F(x)$? After all, the integration $\int_a^b F(...) dx$ is ordinary integration. The integrand must be a function of $x$.

My conjecture for an explanation:

In the expression $I(y) = \int_a^b F(x,y,y') dx$ we see that it's $I(y)$ instead of $I(y,y')$ so the fact that finding $y'$ is needed as an intermediate step isn't recognized in the left hand side.

If we have function like $z = x + 3x + x^2$ we can choose to describe it in a way that exhibits intermediate calculations. For example, let $y = 3x + x^2$ and $F(a,b) = a + b$. Then we can write $z = F(x,y)$.

By analogy the notation $F(x,y,y')$ indicates a particular choice of representing the integrand that takes pains to exhibit intermediate calculations. It's not a simple algebraic expression. The computation implied by $F(x,y,y')$ is an algorithm. As far as I can see, there is nothing incorrect about notation like $I(y) = \int_a^b G(x,y) dx$ to describe the same functional. It's just that the processes described by $F$ and $G$ would be technically different. Thinking of $F$ and $G$ as computer routines, the routine $F$ requires that you compute $y'$ and then give it as input to $F$. The routine $G$ does not.

So I think the notation $F(x,y,y')$ is not a necessary notation. It is a permissible notation that may be helpful if it reminds us of the steps involved in forming the integrand.
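
The "computer routine" picture above can be sketched directly. This is only an illustration under stated assumptions: the integrand $y^2+y'^2$, the helper `derivative`, and the choice that `G` receives the function `y` itself are all hypothetical choices made here.

```python
import math

def F(x, y_val, dy_val):
    """Routine F: takes three plain numbers; the caller must supply y'(x)."""
    return y_val ** 2 + dy_val ** 2

def derivative(y, x, h=1e-5):
    """Central-difference estimate of y'(x); needs access to the function y."""
    return (y(x + h) - y(x - h)) / (2.0 * h)

def G(x, y):
    """Routine G: takes a number x and the function y, and computes y'(x) itself."""
    return F(x, y(x), derivative(y, x))

x0 = 0.3
value_F = F(x0, math.sin(x0), math.cos(x0))  # caller computes y and y' first
value_G = G(x0, math.sin)                    # G does the intermediate step internally

print(value_F, value_G)
```

Both routines describe the same integrand for $y=\sin$; they differ only in who performs the intermediate step of differentiating, and in whether the second argument is a number or a function.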

Thanks for your help on the matter.
Would it be fair to say the following:

The configuration of a system at a given instant in time is completely determined by specifying the coordinates of each of the particles within the system at that instant. However, using just this information one cannot determine the configuration of the system at subsequent instants in time. To do so requires knowledge of the rate of change of these positions at the instant considered. For given values of the coordinates the system can have any velocities (as we are considering the coordinates and velocities of the particles at the same instant in time), and this will affect the configuration of the system after an infinitesimal time interval, $dt$. Thus, by simultaneously specifying the coordinates and velocities of the particles at a given instant in time, we can, in principle, calculate its subsequent time evolution. This means that, if the coordinates and velocities of the particles are specified at a given instant, $t_{0}$, then the accelerations of those particles are uniquely defined at that instant, enabling one to construct equations of motion for the system.

Following the principle of stationary action, we are motivated to consider a function which summarises the dynamics of a physical system at each given instant in time (over some finite time interval), along all possible paths that the system could take between two fixed configurations, $\vec{q} (t_{0})$ and $\vec{q} (t_{1})$. As such, taking into account the discussion above, we can infer that for this function to successfully summarise the dynamics of the system at each point, it is sufficient for it to be a function of the coordinates $q_{i}(t)$ and the velocities $\dot{q}_{i} (t)$ of the constituent components of the system, i.e. a function of the form $\mathcal{L} =\mathcal{L} (q_{i} (t), \dot{q}_{i} (t))$ (we need not consider higher order derivatives as it is known that the dynamical state of the system, at a given instant in time, is completely specified by the values of its coordinates and velocities at that instant).

Given this, we can then attribute a value to the dynamics of the system, depending on the path, $\vec{q} (t)= (q_{1},\ldots ,q_{n})$, that it takes between the two fixed configurations, $\vec{q} (t_{0})$ and $\vec{q} (t_{1})$. We do so by defining a functional, the action, as follows $$S[\vec{q} (t)] = \int_{t_{0}}^{t_{1}}\mathcal{L} (q_{i} (t), \dot{q}_{i} (t)) dt$$ The principle of stationary action then asserts that the actual path taken by the system between these two fixed configurations is the one for which the action is extremised (i.e. the path which gives an extremal value to this integral).

Stephen Tashi
The question I have about the physics that followed is: what does it say about mathematical notation like $F(x,y,y')$ or $G(x,y)$ when $y$ is a function of $x$? Thinking of $F$ and $G$ as being implemented by computer algorithms, what does the argument $y$ represent?

One possibility is that $y$ represents a function. In many computer languages an argument can be a function instead of a single number. If we give an algorithm the ability to access the function $y(x)$ then it can in principle compute $y', y'', y'''$. This is the convention that applies to the notation $I(y)$. In that notation, $y$ represents a function.

Another possibility is that $y$ represents a single numerical value. In that case, notation like $G(x,y)$ does not represent giving $G$ the knowledge of the function $y(x)$. So we cannot assume that the algorithm $G$ can compute $y'(x)$.

Under the convention that arguments are single numerical values then I don't see how the algorithm $F(x,y,y')$ can reconstruct any information about $y''$ (acceleration) from pure mathematics. To do that, it would have to know the behavior of $y'$ in an interval. Are you saying we have a physical situation where the knowledge of position and velocity at one point in time is sufficient to compute the subsequent behavior of the system (and hence compute any derivative of that behavior that is desired)?

(There is another recent thread where someone remarks that physicists often use ambiguous notation that makes it difficult to distinguish between a function and a single numerical value that comes from evaluating that function.)

I was following the Landau-Lifshitz book on classical mechanics to be honest, where they describe it in a similar manner. I think what is perhaps meant is that, using this information as initial conditions for an equation of motion, one can uniquely determine the acceleration at that initial instant?!
My thoughts were that for each possible path between two points, the Lagrangian is a function of the coordinates and velocities of this path, such that at each instant in the time interval the Lagrangian characterises the dynamics of the system if it were to follow that path (i.e. by plugging the values of the coordinates and velocities at each instant along the path into the Lagrangian we can characterise the dynamics of the system along that path).

Fredrik
Staff Emeritus
Gold Member
The notation ##F(x,y,y')## is pretty bad in my opinion. It should be ##F(x,y(x),y'(x))##. ##y## is a function. ##y(x)## is an element of the codomain of ##y##, so it's typically a number. ##F## doesn't take functions as input. It takes three real numbers.

Similarly, I would never write ##S[\vec q(t)]##, because ##\vec q(t)## is an element of ##\mathbb R^3##, not a function. (It's a "function of t" in the sense that its value is determined by the value of t, but it's still not a function). I would write
$$S[\vec q]=\int_a^b L(\vec q(t),\vec q'(t),t)\mathrm dt.$$ (When I do calculations with a pen and paper, I will of course abuse the notation to avoid having to write everything out). ##L## is usually something very simple. In the classical theory of a single particle moving in 1 dimension, as influenced by a potential ##V:\mathbb R\to\mathbb R##, it can be defined by ##L(r,s,u)=\frac{1}{2}ms^2-V(r)## for all ##r,s,u\in\mathbb R##. Note that this ensures that ##L(q(t),q'(t),t)=\frac{1}{2}mq'(t)^2-V(q(t))## for all t.
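
The distinction between $L$ (a function of three real numbers) and $S$ (a functional of the path) can be sketched in code. This is a rough illustration only: the harmonic potential $V(r)=\tfrac12 kr^2$, the values of `m` and `k`, and the midpoint-rule quadrature are assumptions chosen here for convenience.

```python
import math

m = 1.0   # hypothetical mass
k = 1.0   # hypothetical spring constant

def V(r):
    """Assumed potential V(r) = k r^2 / 2 (not taken from the thread)."""
    return 0.5 * k * r ** 2

def L(r, s, u):
    """Lagrangian: a function of three real numbers (position, velocity, time)."""
    return 0.5 * m * s ** 2 - V(r)

def action(q, dq, a, b, n=20_000):
    """S[q] = integral_a^b L(q(t), q'(t), t) dt, approximated by the midpoint rule.

    Unlike L, this takes the functions q and q' themselves as input.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += L(q(t), dq(t), t)   # L is evaluated on numbers at one instant
    return total * h

a, b = 0.0, math.pi / 2

# Path solving q'' = -q with q(a) = 0, q(b) = 1.
S_physical = action(math.sin, math.cos, a, b)

# Straight-line path with the same endpoints.
S_line = action(lambda t: 2.0 * t / math.pi, lambda t: 2.0 / math.pi, a, b)

print(S_physical, S_line)
```

With these assumed values the path $q(t)=\sin t$ gives an action of $0$, while the straight line with the same endpoints gives $1/\pi-\pi/12\approx 0.0565$, so the solution of the equation of motion has the smaller action of the two.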

Exactly. That's what I was trying to allude to in my description. The Lagrangian is a function of the values of the coordinates and velocities of the particle at each given instant over the time interval considered.

Would what I said in the post (above yours) about why the Lagrangian is a function of coordinates and velocities, in a more general sense, be correct? (I know that for conservative systems it assumes the form $\mathcal{L}=T-V$, but I was trying to justify to myself the reasoning as to why we consider the Lagrangian to be a function of position and velocity in the first place, before considering any particular cases, in which the components, such as $T$ and $V$, are clearly functions of the coordinates and velocities?)

Also, is what I said about the action (in previous post), i.e. as a means of attributing a value to the characteristic dynamics of a system due to it following a particular path, $\vec{q}$, enabling us to distinguish the actual physical path taken by the system (using variational techniques), correct?

On the question of whether knowledge of position and velocity at one point in time is sufficient to compute the subsequent behaviour of the system (and hence any derivative of that behaviour that is desired), I was following Landau-Lifshitz:

"If all the coordinates and velocities are simultaneously specified, it is known from experience that the state of the system is completely determined and its subsequent motion can, in principle, be calculated. Mathematically, this means that, if all the coordinates $q$ and velocities $\dot{q}$ are given at some instant, the accelerations $\ddot{q}$ at that instant are uniquely defined."

(Mechanics, L.D. Landau & E.M. Lifshitz)

Fredrik
That sounds like the theorem that says (roughly) that if f is a nice enough function, then the differential equation ##\vec x''(t)=f(\vec x(t),\vec x'(t),t)## has a unique solution for each initial condition ##\vec x(t_0)=\vec x_0##, ##\vec x'(t_0)=\vec v_0##.
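
As a rough numerical illustration of that initial-value statement (a sketch only: the force law $f(x,v,t)=-x$, the function name `integrate_rk4`, and the use of a classical fourth-order Runge-Kutta step are assumptions made here), specifying $x(t_0)$ and $x'(t_0)$ fixes the acceleration at $t_0$ and lets us step forward to later times:

```python
import math

def f(x, v, t):
    """Hypothetical right-hand side of x''(t) = f(x, x', t): a unit oscillator."""
    return -x

def integrate_rk4(f, x0, v0, t0, t1, n=1000):
    """Advance (x, v) from t0 to t1 with n classical Runge-Kutta (RK4) steps."""
    h = (t1 - t0) / n
    x, v, t = x0, v0, t0
    for _ in range(n):
        k1x, k1v = v, f(x, v, t)
        k2x = v + 0.5 * h * k1v
        k2v = f(x + 0.5 * h * k1x, v + 0.5 * h * k1v, t + 0.5 * h)
        k3x = v + 0.5 * h * k2v
        k3v = f(x + 0.5 * h * k2x, v + 0.5 * h * k2v, t + 0.5 * h)
        k4x = v + h * k3v
        k4v = f(x + h * k3x, v + h * k3v, t + h)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += h
    return x, v

# Initial position and velocity at t0 = 0 determine everything that follows.
x_end, v_end = integrate_rk4(f, x0=0.0, v0=1.0, t0=0.0, t1=1.0)
print(x_end, v_end)
```

For this assumed force law the exact solution is $x(t)=\sin t$, so the computed values at $t=1$ can be compared with $\sin 1$ and $\cos 1$.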

Lagrangian mechanics is based on a slightly different theorem (I don't recall actually seeing such a theorem, but I'm fairly sure that one exists): A unique solution for each boundary condition ##\vec x(t_a)=\vec x_a##, ##\vec x(t_b)=\vec x_b##.

Does that explain what Landau means in the section that I quoted?
(The text that I quoted was from their section leading onto formulating lagrangian mechanics).

Yes, good point. Is what I said about the lagrangian correct though?

The Lagrangian is a function of the coordinates and velocities of this path, such that at each instant in the time interval the Lagrangian characterises the dynamics of the system if it were to follow that path (i.e. by plugging the values of the coordinates and velocities at each instant along the path into the Lagrangian we can characterise the dynamics of the system along that path).
Then, we introduce the action as a means of attributing a value to the characteristic dynamics of a system due to it following a particular path, $\vec{q}$, enabling us to distinguish the actual physical path taken by the system (using variational techniques and specifying the boundary conditions, $\vec{q}(t_{0})$ and $\vec{q}(t_{1})$).

I'm just trying to justify to myself a bit more what the lagrangian is, why it's a function of both coordinates and velocities (I assume that to be able to fully specify the dynamics of the system at each instant in the time interval considered, one needs to know the positions of all the particles and the rate of change of those positions at that point?!), and understand a bit more what the action actually is?!

Fredrik
It's very hard to comment on whether a description in words of something mathematical is "correct". The wordy description isn't going to be as precise. If it was, we wouldn't have needed the mathematical description.

Regarding the Landau-Lifshitz section you quoted: I don't know what they meant exactly. I'm puzzled by the fact that they're saying that if you know the positions and the velocities at an instant, then you know the accelerations at that instant. The theorem I mentioned says that if you know the positions and the velocities at an instant, then you know the function that gives you the positions at all times. Then you can use it to determine velocities, accelerations and other things, at all times.

As for whether what you said about the Lagrangian is correct: as I said, it's very difficult to comment, but I will give it a try.

You wrote: "The lagrangian is a function of the coordinates and velocities of this path." I wouldn't say that, since you only plug in the positions and velocities at one time. You could say that it's a function of the coordinates and velocities at a point on the path.

There's a much fancier way to say this. The set of all "positions" is a manifold called the system's configuration space. A velocity is a tangent vector at some point in the configuration space, so it's an element of the tangent space at that point. The set of all pairs ##(x,v)## where x is a point in the manifold and v is a tangent vector at x, is called the tangent bundle. The Lagrangian is a function from the tangent bundle (of the system's configuration space) into ##\mathbb R##.

You continued: "such that, at each instant in time along the time interval the lagrangian characterises the dynamics of the system if it were to follow that path (i.e. by plugging in the values of the coordinates and velocities at each instant in time along the path into the lagrangian we can characterise the dynamics of the system along that path)." What path? The Lagrangian doesn't take a path as input. I suppose you could, for each t, define a function ##L_t## by ##L_t[q]=L(q(t),q'(t),t)## for all paths q. But what does it mean to characterize the dynamics of the system?

You said you are trying to justify to yourself a bit more what the Lagrangian is. The way I see it, Newtonian, Lagrangian and Hamiltonian mechanics are three different approaches to how to add matter and interactions to an otherwise empty spacetime. To define a classical theory of physics, we must specify the matter content of spacetime, and its interactions. If we want to use the theorem about differential equations that says one solution for each initial condition on the positions and velocities, then we define the theory by writing down a force and postulating that the path is found by solving the equation called Newton's 2nd law. If we want to use the other theorem, the one that guarantees one solution for each boundary condition, we define the theory by writing down a Lagrangian and postulating that the path is found by solving the Euler-Lagrange equation. In both of these approaches, the function that defines the theory is essentially just guessed. I don't know if you can describe what it is in a meaningful way.

I suppose that you could say something like this: The action assigns a "badness score" to each path in configuration space. Each path in configuration space defines a path in the tangent bundle. The Lagrangian tells us how different parts of the tangent bundle contribute to the "badness score" of a path through those parts.

As for why it's a function of both coordinates and velocities, and what the action actually is: that theorem is just as useful if the Lagrangian is independent of one or more of those variables. But there are conserved quantities in that case. (If L is independent of a position coordinate, the corresponding momentum will not change with time). So if we want a theory in which the momenta are changing (e.g. when we give something a push), we can't make L independent of the positions. I think something similar can be said about the velocities, but I haven't really thought about what that would be.
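
A short sketch of the conserved-quantity remark, using the standard Euler-Lagrange equation (the symbol $p_i$ for the conjugate momentum is notation introduced here, not taken from the posts above):

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_i}-\frac{\partial L}{\partial q_i}=0,\qquad p_i:=\frac{\partial L}{\partial \dot q_i}.$$

If $L$ does not depend on the coordinate $q_i$, then $\partial L/\partial q_i=0$ and the equation reduces to

$$\frac{dp_i}{dt}=0,$$

so $p_i$ is constant along the solution. If instead $L$ did not depend on any velocity, then $\partial L/\partial \dot q_i=0$ and the equation would reduce to

$$\frac{\partial L}{\partial q_i}=0,$$

which contains no time derivatives at all. It would then only restrict which positions are allowed, rather than give a second-order equation of motion, which is one way to see why a useful Lagrangian needs some velocity dependence.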


Yeah, sorry I didn't explain myself too well. I meant, like you said, that at each instant, $t$, the lagrangian is a function of the coordinates and velocities evaluated at that instant.
Would it be fair to say that we wish the lagrangian to contain all information about the dynamics of the system, the forces acting on it..., and this implies that for a general physical system, it should be a function of position and velocity (as quantities such as the configuration of the system, and potentials will depend on position, and also quantities such as momentum, etc. will depend on velocity)?!

Sorry to be a pain, I think my main issue is trying to conceptualise in my mind why we consider the Lagrangian as a function of velocity as well as position in the first place?! I think I've confused myself, as I thought I understood it when I originally read the chapter I referred to in the Landau-Lifshitz book on mechanics!

Fredrik