# Functionals and calculus of variations

1. Sep 29, 2014

### "Don't panic!"

I have been studying the calculus of variations and have been struggling to conceptualise why we have functionals of the form $$I[y]= \int_{a}^{b} F\left(x,y,y' \right) dx$$ In particular, why is the integrand $F\left(x,y,y' \right)$ a function of both $y$ and its derivative $y'$?

My thoughts on the matter are that, as the functional $I$ depends on the entire function $y(x)$ over the interval $x\in [a,b]$, if $I$ is expressed as an integral over this interval then the 'size' of the integral will depend on how $y$ varies over the interval (i.e. its rate of change $y'$ over $x\in [a,b]$), and hence the integrand will depend on $y$ and its derivative $y'$ (and, in general, higher-order derivatives of $y$). I'm not sure if this is a correct understanding and I'm hoping that someone can enlighten me on the subject (particularly if I'm wrong). Thanks.

2. Sep 29, 2014

### rdt2

Think of the motion of a car. The independent variable is time t, but to describe its path you have to give its (initial) position _and_ its (initial) velocity, the derivative of position.

3. Sep 29, 2014

### "Don't panic!"

Can one infer from this, then, that as we initially need to specify the position and the velocity in order to describe the configuration of a physical system, any function $F$ characterising the dynamics of the system over a given interval must be a function of both position and velocity? (In doing so, we can describe the dynamics of the system at any point in the interval that we are considering by specifying the position and velocity at that point and plugging these values into $F$.)

I'm trying to get an understanding of it in the abstract sense as well, without relating it to any particular physical problem: why would the integrand be a function of some function and its derivatives (first order and possibly higher order)?

4. Oct 1, 2014

### davidmoore63@y

My understanding is that the calculus of variations uses notation that treats $y$ and $y'$ as independent variables, even though they aren't actually independent (as you point out). However, the theory still works.

5. Oct 1, 2014

### "Don't panic!"

Yeah, I guess I'm really trying to understand why the integrand is treated as a function of $y$ and $y'$, why not just $y$? What's the justification/mathematical (and/or) physical reasoning behind it?

Is it just that, if you wish to describe the configuration of a physical system and how that configuration evolves in time, you need to specify the positions of the components of the system and also how those positions change in time (i.e. the derivatives of the positions)? Hence, as we wish the Lagrangian of the system to characterise its dynamics, this implies that the Lagrangian should be a function of both position and velocity?

Last edited: Oct 1, 2014
6. Oct 1, 2014

### davidmoore63@y

The function $F$ depends on the problem you are trying to solve. For example, if you are trying to minimize the energy of a system, the energy consists of kinetic energy (which depends on $y'$) and potential energy (which depends on $y$). However, if you are trying to minimize the length of a curve, the integrand is $\sqrt{1+y'^2}$ (from $ds=\sqrt{1+y'^2}\,dx$), which does not depend on $y$. So the form of $F$ depends on the problem at hand. Does that help?
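As a rough numerical illustration of the two cases just mentioned, here is a minimal Python sketch. The helper names (`integrate`, `functional`, `arc_length_F`, `energy_F`) and the particular potential term $\tfrac{1}{2}y^2$ are assumptions made up for this example, not anything fixed by the theory. Each integrand is an ordinary function of three numbers; the arc-length one simply ignores the value slot.

```python
import math

def integrate(f, a, b, n=2000):
    """Composite midpoint rule for the ordinary integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def functional(F, y, dy, a, b):
    """I[y] = integral of F(x, y(x), y'(x)) dx, with y and dy given as callables."""
    return integrate(lambda x: F(x, y(x), dy(x)), a, b)

def arc_length_F(x, y_val, dy_val):
    # Arc-length integrand: only the slope slot is used.
    return math.sqrt(1.0 + dy_val ** 2)

def energy_F(x, y_val, dy_val):
    # Energy-like integrand: "kinetic" term from the slope, "potential" term
    # from the value. The potential 0.5 * y^2 is an illustrative assumption.
    return 0.5 * dy_val ** 2 + 0.5 * y_val ** 2

line = lambda x: x            # y(x) = x
shifted = lambda x: x + 5.0   # same slope, different values
slope = lambda x: 1.0         # y'(x) for both curves

length_line = functional(arc_length_F, line, slope, 0.0, 1.0)
length_shifted = functional(arc_length_F, shifted, slope, 0.0, 1.0)
energy_line = functional(energy_F, line, slope, 0.0, 1.0)
energy_shifted = functional(energy_F, shifted, slope, 0.0, 1.0)

print(length_line, length_shifted)   # equal: shifting y does not change the length
print(energy_line, energy_shifted)   # different: this integrand does see y
```

Shifting the curve vertically leaves the length unchanged but changes the energy-like quantity, which is the sense in which the form of the integrand depends on the problem.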

7. Oct 1, 2014

### "Don't panic!"

Thanks. I understand in those cases, but would what I said be correct in the more general case of applying the principle of stationary action to a physical system, i.e. one wishes to describe the state of some system at time $t_{0}$ and what state it evolves to at a later (fixed) time $t_{1}$. To do so one must specify the coordinates $q_{i}$ of the components of the system and also how those coordinates change in time, i.e. their derivatives, $\dot{q}_{i}$ in the time interval $t\in [t_{0}, t_{1}]$. Thus, we require a function of the form $$\mathcal{L}= \mathcal{L}\left(q_{i}(t),\dot{q}_{i}(t)\right)$$ to completely specify the state of the system at any time $t\in [t_{0}, t_{1}]$. From this, we define a functional, the action such that $$S\left[q_{i}(t)\right] = \int_{t_{0}}^{t_{1}} \mathcal{L}\left(q_{i}(t),\dot{q}_{i}(t)\right) dt$$ that associates a number to each path $\vec{q}(t)=\left(q_{i}(t)\right)$ between the (fixed) states $\vec{q}\left(t_{0}\right)$ and $\vec{q}\left(t_{1}\right)$. We then invoke the principle of stationary action to assert that the actual physical path taken between these two points is the one which satisfies $\delta S = 0$. Would this be a correct interpretation?
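For reference, here is a sketch of the standard step from $\delta S = 0$ to equations of motion. It assumes the endpoint variations vanish, $\delta q_{i}(t_{0}) = \delta q_{i}(t_{1}) = 0$, and it treats $q_{i}$ and $\dot{q}_{i}$ as separate slots of $\mathcal{L}$ when taking partial derivatives. Using $\delta\dot{q}_{i} = \frac{d}{dt}\left(\delta q_{i}\right)$ and integrating by parts (the boundary term drops out because of the fixed endpoints):

$$\delta S = \int_{t_{0}}^{t_{1}} \sum_{i}\left(\frac{\partial \mathcal{L}}{\partial q_{i}}\,\delta q_{i} + \frac{\partial \mathcal{L}}{\partial \dot{q}_{i}}\,\delta \dot{q}_{i}\right) dt = \int_{t_{0}}^{t_{1}} \sum_{i}\left(\frac{\partial \mathcal{L}}{\partial q_{i}} - \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{q}_{i}}\right)\delta q_{i}\, dt$$

Since the $\delta q_{i}$ are arbitrary inside the interval, $\delta S = 0$ requires

$$\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{q}_{i}} - \frac{\partial \mathcal{L}}{\partial q_{i}} = 0$$

for each $i$. The two partial derivatives only make sense if $\mathcal{L}$ has distinct arguments for the coordinate and the velocity, even though along any given path the velocity is fixed by the coordinate function.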

8. Oct 1, 2014

### davidmoore63@y

That is a good description and matches how I understand calculus of variations in the context of general physical systems.

9. Oct 1, 2014

### "Don't panic!"

Great. Thanks very much for your help!

10. Oct 1, 2014

### "Don't panic!"

As a follow-up. Would it be fair to say then, that as $S\left[\vec{q}(t)\right]$ contains information about all possible paths between the points $\vec{q}\left(t_{0}\right)$ and $\vec{q}\left(t_{1}\right)$, this implies that the integrand will be a function of the values of those paths and their derivatives at each point $t\in [t_{0}, t_{1}]$. Now, as at each point along the path in the interval $t\in [t_{0}, t_{1}]$, once we have specified the position we are free to specify how that position changes (i.e. the velocity) at that point independently, as we are considering all possible paths. However, upon imposing the principle of stationary action, we are choosing a particular path, i.e. the one which extremises the action. This re-introduces the explicit dependence of $\dot{q}_{i}(t)$ on $q_{i}(t)$ via the relation $$\delta\dot{q}_{i}(t) = \frac{d}{dt}\left(\delta q_{i}(t)\right)$$ Apologies to re-iterate, just trying to fully firm up the concept in my mind.

11. Oct 2, 2014

### "Don't panic!"

Sorry, please ignore the post above - I realised the error in what I was writing after posting it and the forum won't let me delete it now!

Instead of the above post, is the following a correct summary (pertaining to the Lagrangian and why it is dependent on position and velocity):

The state of a mechanical system at a given time $t_{0}$ is completely specified by the positions of the particles within it, along with their corresponding velocities. Thus, if we wish to describe the state of this system at some later time $t$ in some fixed time interval, then we need to specify how the system evolves over this interval, i.e. we require a function which depends on the positions of the particles and also the rate at which those positions are changing (i.e. their velocities) at each point within the time interval (a requirement if we wish to consider external forces acting on the particles). This motivates us to consider a function $\mathcal{L}= \mathcal{L}\left(q_{i}(t), \dot{q}_{i}(t)\right)$ which completely specifies the state of a mechanical system at each point $t \in [t_{0},t_{1}]$.

12. Oct 2, 2014

### homeomorphic

My intuition is that the Lagrangian is sort of a cost function. You might not care about y' in some problems. So, you could imagine a problem in which your cost per unit time to travel from point A to point B in a fixed amount of time is strictly a function of position. You would then try to spend as much of your time in the areas of lower cost to minimize your travel expenses. But let's say you want to discourage speeding as well, so you want to penalize higher velocities. My intuition is that it's easier to apply that speeding penalty if you make the Lagrangian also a function of velocity. For example, you could add the speed squared or cubed or whatever you want. So, it's natural to want to introduce y' as a variable to be able to put that into your cost function. It's a pretty flexible construction, so you can imagine that we can just try to penalize any path that doesn't follow the laws of physics that we want, and hopefully, that will give you a description of physics. When you work out the details, it does turn out to work.
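The cost-function picture can be tried out numerically. The following is a minimal sketch under made-up assumptions: a flat position cost, a penalty proportional to speed squared, and two hypothetical trips from $q=0$ to $q=1$ in unit time. The names `travel_cost`, `steady` and `rushed` are invented for the illustration.

```python
def travel_cost(q, dq, t0, t1, position_cost, speed_weight, n=4000):
    """Midpoint-rule estimate of the integral over [t0, t1] of
    position_cost(q(t)) + speed_weight * q'(t)**2."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h
        total += position_cost(q(t)) + speed_weight * dq(t) ** 2
    return h * total

# Two hypothetical trips with the same endpoints and the same duration.
steady = (lambda t: t, lambda t: 1.0)                  # constant speed
rushed = (lambda t: t ** 3, lambda t: 3.0 * t ** 2)    # slow start, fast finish

flat = lambda q: 1.0  # position cost that is the same everywhere (assumption)

# Without a velocity argument in the cost, the two trips are indistinguishable.
no_penalty_steady = travel_cost(*steady, 0.0, 1.0, flat, speed_weight=0.0)
no_penalty_rushed = travel_cost(*rushed, 0.0, 1.0, flat, speed_weight=0.0)

# With the speed penalty switched on, the rushed trip costs more.
penalised_steady = travel_cost(*steady, 0.0, 1.0, flat, speed_weight=1.0)
penalised_rushed = travel_cost(*rushed, 0.0, 1.0, flat, speed_weight=1.0)

print(no_penalty_steady, no_penalty_rushed)
print(penalised_steady, penalised_rushed)
```

With the penalty weight at zero the integrand cannot tell the trips apart; once the velocity slot is used, the steady trip comes out cheaper.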

13. Oct 3, 2014

### Stephen Tashi

I haven't heard a mathematical answer to that question yet. Let me reiterate the question emphasizing the mathematical aspect.

When we have a function such as $y = 3x + x^2$ we denote it as $y = f(x)$, not as $y = f(x,3x,x^2)$, even though evaluating $f$ involves the intermediate steps of evaluating $3x$ and $x^2$. So why is an expression like $F(x,y,y')$ necessary in discussing the integrand in the calculus of variations? Isn't computing $y'$ from $y$ an intermediate step in the process? If we are given $y$ we can find $y', y'',...$ etc. Why not just write the integrand as $F(x,y)$ or even $F(x)$? After all, the integration $\int_a^b F(...) dx$ is ordinary integration. The integrand must be a function of $x$.

My conjecture for an explanation:

In the expression $I(y) = \int_a^b F(x,y,y') dx$ we see that it's $I(y)$ instead of $I(y,y')$ so the fact that finding $y'$ is needed as an intermediate step isn't recognized in the left hand side.

If we have a function like $z = x + 3x + x^2$ we can choose to describe it in a way that exhibits intermediate calculations. For example, let $y = 3x + x^2$ and $F(a,b) = a + b$. Then we can write $z = F(x,y)$.

By analogy the notation $F(x,y,y')$ indicates a particular choice of representing the integrand that takes pains to exhibit intermediate calculations. It's not a simple algebraic expression. The computation implied by $F(x,y,y')$ is an algorithm. As far as I can see, there is nothing incorrect about notation like $I(y) = \int_a^b G(x,y) dx$ to describe the same functional. It's just that the processes described by $F$ and $G$ would be technically different. Thinking of $F$ and $G$ as computer routines, the routine $F$ requires that you compute $y'$ and then give it as input to $F$. The routine $G$ does not.

So I think the notation $F(x,y,y')$ is not a necessary notation. It is a permissible notation that may be helpful if it reminds us of the steps involved in forming the integrand.
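The "computer routine" reading can be made concrete. The sketch below is only one possible interpretation: it assumes that the $y$ passed to $G$ is the whole function, while $F$ receives plain numbers. The integrand $y^2 + y'^2$ and the finite-difference step are arbitrary choices for the demonstration.

```python
import math

def F(x, y_val, dy_val):
    """Takes three plain numbers; knows nothing about the curve they came from."""
    return y_val ** 2 + dy_val ** 2

def G(x, y_func, h=1e-5):
    """Takes the whole function y, differentiates it internally with a
    central difference, then forms the same integrand as F."""
    dy_val = (y_func(x + h) - y_func(x - h)) / (2.0 * h)
    return F(x, y_func(x), dy_val)

def integrate(f, a, b, n=2000):
    """Composite midpoint rule for the ordinary integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

y = math.sin    # sample curve
dy = math.cos   # its derivative, computed by the caller

# Routine F: the caller computes y'(x) and passes the value in.
I_via_F = integrate(lambda x: F(x, y(x), dy(x)), 0.0, 1.0)

# Routine G: the caller passes the function and G works out y'(x) itself.
I_via_G = integrate(lambda x: G(x, y), 0.0, 1.0)

print(I_via_F, I_via_G)
```

Both routes describe the same functional of `y`; they differ only in where the derivative is computed and in whether the second argument is a number or a function.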

14. Oct 3, 2014

### "Don't panic!"

Thanks for your help on the matter.
Would it be fair to say the following:

The configuration of a system at a given instant in time is completely determined by specifying the coordinates of each of the particles within the system at that instant. However, using just this information one cannot determine the configuration of the system at subsequent instants in time. To do so requires knowledge of the rate of change of these positions at the instant considered. For given values of the coordinates the system can have any velocities (as we are considering the coordinates and velocities of the particles at the same instant in time), and this will affect the configuration of the system after an infinitesimal time interval, $dt$. Thus, by simultaneously specifying the coordinates and velocities of the particles at a given instant in time, we can, in principle, calculate its subsequent time evolution. This means that, if the coordinates and velocities of the particles are specified at a given instant, $t_{0}$, then the accelerations of those particles are uniquely defined at that instant, enabling one to construct equations of motion for the system.

Following the principle of stationary action, we are motivated to consider a function which summarises the dynamics of a physical system at each given instant in time (over some finite time interval), along all possible paths that the system could take between two fixed configurations, $\vec{q} (t_{0})$ and $\vec{q} (t_{1})$. As such, taking into account the discussion above, we can infer that for this function to successfully summarise the dynamics of the system at each point, it is sufficient for it to be a function of the coordinates $q_{i} (t)$ and the velocities $\dot{q}_{i} (t)$ of the constituent components of the system, i.e. a function of the form $\mathcal{L} =\mathcal{L} (q_{i} (t), \dot{q}_{i} (t))$ (we need not consider higher order derivatives as it is known that the dynamical state of the system, at a given instant in time, is completely specified by the values of its coordinates and velocities at that instant). Given this, we can then attribute a value to the dynamics of the system, depending on the path, $\vec{q} (t)= (q_{1},\ldots ,q_{n})$, that it takes between the two fixed configurations, $\vec{q} (t_{0})$ and $\vec{q} (t_{1})$. We do so by defining a functional, the action, as follows $$S[\vec{q} (t)] = \int_{t_{0}}^{t_{1}}\mathcal{L} (q_{i} (t), \dot{q}_{i} (t)) dt$$ The principle of stationary action then asserts that the actual path taken by the system between these two fixed configurations is the one for which the action is extremised (i.e. the path which gives an extremal value to this integral).

Last edited: Oct 3, 2014
15. Oct 3, 2014

### Stephen Tashi

The question I have about the physics that followed is: what does it say about mathematical notation like $F(x,y,y')$ or $G(x,y)$ when $y$ is a function of $x$? Thinking of $F$ and $G$ as being implemented by computer algorithms, what does the argument $y$ represent?

One possibility is that $y$ represents a function. In many computer languages an argument can be a function instead of a single number. If we give an algorithm the ability to access the function $y(x)$ then it can in principle compute $y', y'', y'''$. This is the convention that applies to the notation $I(y)$. In that notation, $y$ represents a function.

Another possibility is that $y$ represents a single numerical value. In that case, notation like $G(x,y)$ does not represent giving $G$ the knowledge of the function $y(x)$. So we cannot assume that the algorithm $G$ can compute $y'(x)$.

Under the convention that arguments are single numerical values then I don't see how the algorithm $F(x,y,y')$ can reconstruct any information about $y''$ (acceleration) from pure mathematics. To do that, it would have to know the behavior of $y'$ in an interval. Are you saying we have a physical situation where the knowledge of position and velocity at one point in time is sufficient to compute the subsequent behavior of the system (and hence compute any derivative of that behavior that is desired)?

(There is another recent thread where someone remarks that physicists often use ambiguous notation that makes it difficult to distinguish between a function and a single numerical value that comes from evaluating that function.)

16. Oct 4, 2014

### "Don't panic!"

I was following the Landau-Lifshitz book on classical mechanics to be honest, where they describe it in a similar manner. I think what is perhaps meant is that, using this information as initial conditions for an equation of motion, one can uniquely determine the acceleration at that initial instant?!
My thoughts were that for each possible path between two points, the Lagrangian is a function of the coordinates and velocities of this path, such that, at each instant in the time interval, the Lagrangian characterises the dynamics of the system if it were to follow that path (i.e. by plugging the values of the coordinates and velocities at each instant along the path into the Lagrangian we can characterise the dynamics of the system along that path).

17. Oct 4, 2014

### Fredrik

The notation $F(x,y,y')$ is pretty bad in my opinion. It should be $F(x,y(x),y'(x))$. $y$ is a function. $y(x)$ is an element of the codomain of $y$, so it's typically a number. $F$ doesn't take functions as input. It takes three real numbers.

Similarly, I would never write $S[\vec q(t)]$, because $\vec q(t)$ is an element of $\mathbb R^3$, not a function. (It's a "function of t" in the sense that its value is determined by the value of t, but it's still not a function). I would write
$$S[\vec q]=\int_a^b L(\vec q(t),\vec q'(t),t)\mathrm dt.$$ (When I do calculations with a pen and paper, I will of course abuse the notation to avoid having to write everything out). $L$ is usually something very simple. In the classical theory of a single particle moving in 1 dimension, as influenced by a potential $V:\mathbb R\to\mathbb R$, it can be defined by $L(r,s,u)=\frac{1}{2}ms^2-V(r)$ for all $r,s,u\in\mathbb R$. Note that this ensures that $L(q(t),q'(t),t)=\frac{1}{2}mq'(t)^2-V(q(t))$ for all t.
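The distinction between $L$ (a function of three real numbers) and $S$ (a functional of the path $q$) can be spelled out in code. This is a minimal sketch with assumed values: unit mass, a zero potential (free particle), and a midpoint-rule integral. The path names `straight` and `wiggly` are hypothetical examples sharing the same endpoints.

```python
import math

M = 1.0  # particle mass (illustrative value)

def V(r):
    """Potential; zero here, i.e. a free particle (an assumption for the demo)."""
    return 0.0

def L(r, s, u):
    """L(r, s, u) = (1/2) m s^2 - V(r). All three arguments are real numbers;
    the time slot u is accepted but unused by this particular L."""
    return 0.5 * M * s ** 2 - V(r)

def action(q, dq, a, b, n=4000):
    """S[q]: midpoint-rule estimate of the integral of L(q(t), q'(t), t) over [a, b].
    Here q and dq are functions, unlike the arguments of L."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += L(q(t), dq(t), t)
    return h * total

# Two paths from q(0) = 0 to q(1) = 1, each given with its derivative.
straight = (lambda t: t, lambda t: 1.0)
wiggly = (lambda t: t + 0.1 * math.sin(math.pi * t),
          lambda t: 1.0 + 0.1 * math.pi * math.cos(math.pi * t))

S_straight = action(*straight, 0.0, 1.0)
S_wiggly = action(*wiggly, 0.0, 1.0)

print(L(0.0, 2.0, 0.0))       # L evaluated on three numbers
print(S_straight, S_wiggly)   # S evaluated on two whole paths
```

For this free-particle choice the straight path gives the smaller action, consistent with it being the solution of $q''(t)=0$ between the given endpoints.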

18. Oct 4, 2014

### "Don't panic!"

Exactly. That's what I was trying to allude to in my description. The Lagrangian is a function of the values of the coordinates and velocities of the particle at each given instant over the time interval considered.

Would what I said in the post (above yours) about why the Lagrangian is a function of coordinates and velocities, in a more general sense, be correct? (I know that for conservative systems it assumes the form $\mathcal{L}=T-V$, but I was trying to justify to myself the reasoning as to why we consider the Lagrangian to be a function of position and velocity in the first place, before considering any particular cases, in which the components, such as $T$ and $V$, are clearly functions of the coordinates and velocities?)

Also, is what I said about the action (in previous post), i.e. as a means of attributing a value to the characteristic dynamics of a system due to it following a particular path, $\vec{q}$, enabling us to distinguish the actual physical path taken by the system (using variational techniques), correct?

19. Oct 4, 2014

### "Don't panic!"

In reference to this part I was following Landau-Lifshitz:

"If all the coordinates and velocities are simultaneously specified, it is known from experience that the state of the system is completely determined and its subsequent motion can, in principle, be calculated. Mathematically, this means that, if all the coordinates $q$ and velocities $\dot{q}$ are given at some instant, the accelerations $\ddot{q}$ at that instant are uniquely defined."

(Mechanics, L.D. Landau & E.M. Lifshitz)

That sounds like the theorem that says (roughly) that if f is a nice enough function, then the differential equation $\vec x''(t)=f(\vec x(t),\vec x'(t),t)$ has a unique solution for each initial condition $\vec x(t_0)=\vec x_0$, $\vec x'(t_0)=\vec v_0$.
Lagrangian mechanics is based on a slightly different theorem (I don't recall actually seeing such a theorem, but I'm fairly sure that one exists): a unique solution for each boundary condition $\vec x(t_a)=\vec x_a$, $\vec x(t_b)=\vec x_b$.
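Both statements can be probed numerically. The sketch below uses a hand-written classical Runge-Kutta (RK4) integrator and the harmonic oscillator $x'' = -x$ as a test case; the function names are invented for the illustration. It shows that an initial position and velocity pin down the motion, and it also suggests the boundary-value version needs care: for this particular equation, the conditions $x(0)=0$, $x(\pi)=0$ are met by $A\sin t$ for every $A$, so uniqueness can fail for some boundary data.

```python
import math

def rk4_second_order(f, x0, v0, t0, t1, n=2000):
    """Integrate x'' = f(x, x', t) from t0 to t1 with classical RK4,
    starting from x(t0) = x0 and x'(t0) = v0. Returns (x(t1), x'(t1))."""
    h = (t1 - t0) / n
    x, v, t = x0, v0, t0
    for _ in range(n):
        # Treat the equation as the first-order system x' = v, v' = f(x, v, t).
        k1x, k1v = v, f(x, v, t)
        k2x = v + 0.5 * h * k1v
        k2v = f(x + 0.5 * h * k1x, v + 0.5 * h * k1v, t + 0.5 * h)
        k3x = v + 0.5 * h * k2v
        k3v = f(x + 0.5 * h * k2x, v + 0.5 * h * k2v, t + 0.5 * h)
        k4x = v + h * k3v
        k4v = f(x + h * k3x, v + h * k3v, t + h)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += h
    return x, v

def oscillator(x, v, t):
    """Right-hand side of x'' = -x (test case chosen for the demo)."""
    return -x

# Initial-value problem: x(0) = 1, x'(0) = 0 determines the motion (exactly cos t).
x_end, v_end = rk4_second_order(oscillator, 1.0, 0.0, 0.0, 1.0)

# Two different initial velocities, same start point x(0) = 0:
# both trajectories return to x = 0 at t = pi.
end_a, _ = rk4_second_order(oscillator, 0.0, 1.0, 0.0, math.pi)
end_b, _ = rk4_second_order(oscillator, 0.0, 2.0, 0.0, math.pi)

print(x_end, math.cos(1.0))
print(end_a, end_b)
```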