# I Extremal condition in calculus of variations, geometric

1. Oct 27, 2016

### jonjacson

Hi folks,

I am a bit confused by the extremum condition used in the calculus of variations:

δ = 0

I don't understand why this rule finds extremal solutions (maxima or minima).

In ordinary differential calculus, if we have a function y = y(x) and plot it, we see that at a minimum or maximum the tangent line is horizontal. Since the slope of the tangent line is the derivative, it is straightforward to find the extreme points: we set y' = 0, and then we use the second derivative to check whether each one is a minimum, a maximum or a saddle point.

But with variations it is completely different: we have a functional (normally a definite integral), and we don't know which function minimizes or maximizes it. I have seen several approaches to finding the extremal solution:

- You imagine that you know the solution, and then you express an unknown function as this solution + ε·η, where ε is a finite quantity, as small as required, and η is another function. Then you use the ordinary rules of differential calculus to find the function that minimizes or maximizes the functional; that is, you set the derivative with respect to ε equal to zero at ε=0.
I understand what they mean, but how do I know a solution exists?

- Another line of reasoning expands around the solution in first-order, second-order, ... variations and imposes δ=0 because, for the right solution, there are no first-order changes (a similar idea to ordinary differential calculus, but applied to variations); the changes are second order instead. Or in other words, "if a function is the solution of our problem, this function will have nearby many other solutions that give almost the same result".

Does anybody know any book that shows this geometrically?

The only one I found with some geometry was the Feynman Lectures on Physics, but his reasoning looks really strange to me. I copy and paste his own words to avoid any confusion:

(there is an image attached at the end of the post)

26-5 "A more precise statement of Fermat's principle"

"Actually, we must make the statement of the principle of least time a little more accurately. It was not stated correctly above. It is incorrectly called the principle of least time and we have gone along with the incorrect description for convenience, but we must now see what the correct statement is. Suppose we had a mirror as in fig 1. What makes the light think it has to go to the mirror? The path of least time is clearly AB. So some people might say, 'Sometimes it is a maximum time'. It is not a maximum time, because certainly a curved path would take a still longer time! The correct statement is the following: a ray going in a certain particular path has the property that if we make a small change (say a one percent shift) in the ray in any manner whatever, say in the location at which it comes to the mirror, or the shape of the curve, or anything, there will be no first order change in the time; there will be only a second-order change in the time. In other words, the principle is that light takes a path such that there are many other paths nearby which take almost exactly the same time."

So he is discarding the straight line AB and saying that light instead travels from A to B passing through C, a result I just can't believe.

Honestly, I think that if I have a light at point A and I turn it on pointing at point B, the light is not going to go through C; it will take the straight line AB.

But suppose that the light mysteriously decides to go to the mirror, reflect at point C and finally arrive at point B. Don't the paths near the straight line AB also take almost exactly the same time as AB?

If you think only about this "nearby paths" reasoning, I can't see any difference between the paths at all. Since all of them are made of straight lines travelled at constant speed, small changes in the paths will make small changes in the travel times, and that will be the same for AB, for ACB, for AEB or for ADB. So why is it that δ=0 gives us the right (according to Feynman) result ACB instead of AB or any other path?

#### Attached Files:

• Untitled.png
Last edited: Oct 27, 2016
2. Oct 28, 2016

### BvU

Hi,

From the picture it's clear to me that ACB' is shorter than AEB' and therefore ACB is shorter than AEB. What do you say?
And at C you have a case $\delta = 0$ which you do not have at D or E.

Note that $\delta$ is not a number but a deviation at any point along the path. You can use the same reasoning to establish that a straight line AC is the shortest path from A to C.

3. Oct 28, 2016

### jonjacson

(If you don't have time, you could skip to point 5 at the end of my answer; maybe an answer to that question will help me understand all the others.)

First of all, thanks for your answer; this topic is very important to me. If I understand this, then I can follow all the subsequent theory easily, but I don't get the point. It is not even clear to me what δ is. So I want to clarify two ideas: what δ really is, and why δ=0 means you get the correct function.

1.- According to Feynman, the travel time along the path AB is even shorter than along ACB, but AB is NOT the right answer, even though it takes the shortest time.
A statement I think is false. What do you think?

According to him it is not all about the shortest time; the only relevant thing is that the paths near a solution take almost the same time as the real solution. I can't see why AB does NOT meet this criterion and ACB does.

Here is the source, in case you don't believe me:

http://www.feynmanlectures.caltech.edu/I_26.html

See section 26-5, at the end. He says clearly that the least-time path is AB, but that the light instead prefers going to the mirror. He discards AB as the right answer because the fact that it takes the minimum time doesn't matter; what matters for Feynman is what happens for the paths in the neighborhood. I don't know if every day I get a little bit more stupid or what, but this looks completely absurd to me.

2.- Let's talk about what δ is.

In the first work of Euler's that I read, he broke a hypothetical curve into several straight segments. Assuming the function changes at only one of its vertices, the variation of the y value is the difference between the value of the new function and the old one at the same x.

So now x, the independent variable, is fixed, and we change the function itself from F1(x) to F2(x), where 1 and 2 are just labels for the two functions and x0 is fixed. These two functions differ only at the point x0, so we have:

δy = F2(x0) - F1(x0)

That is the variation of the function sought, the one that will make the definite integral U = ∫ y(x) dx a minimum or maximum. So on one side we have the variations of the functions we are searching for, y = y(x), and on the other the variation of the definite integral U. If we use y1 in the integral we get U1, and if we put y2 in the integral we get U2; we define δU = U2 - U1, and we say that δU=0 gives us the right extremal solution, the function y = y(x).

I don't understand why δU=0 gives us the right answer.

http://www.17centurymaths.com/contents/Euler'smaxmin.htm (Thanks to Ian Bruce for his translations)

It is document E296, the penultimate one on that page. In the section "THE APPLICATION OF THE CALCULUS OF VARIATIONS TO THE SOLUTION OF ISOPERIMETRIC PROBLEMS TO BE TAKEN OF THE BROADEST POSSIBLE SIGNIFICANCE", point 62 is really difficult to understand, but point 63 is a lot clearer. I copy and paste:

"Some proposed formula U defines that relation between the two variables x and y, by which if the value of U may be determined and which may be extended from the value x = 0 as far as to the value x = a , that function is going to give rise to a maximum or minimum.

Therefore we may consider the relation between x and y now as found, thus so that a maximum or minimum value of U may arise ; and it is clear, if the relation between x and y may be changed an infinitely small amount, thence no change must arise in the value of U ; or, what amounts to the same, a variation of U or δU is required equal to zero ; and thus the equation δU=0 includes the relation sought between x and y."

For me, this is not clear.

I get it for the normal differential calculus but not for the variational calculus.

In ordinary differential calculus, near an extreme point a differential change in the independent variable x changes the value of the function y very little, or not at all, because there we can approximate the function by a horizontal straight line; moving x a little doesn't change the value of y. In the image I attach, as x1 gets closer to x, ymax - y(x1) → 0, so I understand that increasing x by dx doesn't change the value of y at all.

The important thing is that at another value of x, such as x2, a change to x2+dx does change the value of the function, because the function can't be approximated by a horizontal line there: the approximating line is inclined and obviously forces the value of y to change, even if only by a small amount.

But what happens in variational calculus?

We are now comparing two curves, a completely different geometric problem. Let's say we search for the shortest path from point A to point B.

If we make a change in the shape of a curve (even a differential one), it always forces a change in the length of the curve; the change is NOT zero.
We have two points A and B connected by the straight horizontal line y = constant (I call this value c1). Imagine we have another path, in red, close to it, that differs only by a differential amount at the point x = a+dx, where y = c1+dy. The lengths of the two lines are similar, but not equal. And that happens for the shortest line and for any other one: the length always changes.
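
A short calculation may make this comparison concrete. It is only a sketch under stated assumptions: the change of shape is written as $\epsilon\,\eta(x)$, as in the first approach mentioned earlier in the thread, with $\eta$ smooth and $\eta(x_A) = \eta(x_B) = 0$ so that both curves join A and B. The endpoint labels $x_A$, $x_B$ and the symbol $y_0$ for a generic comparison curve are notation introduced here. For the straight line $y = c_1$ the perturbed length is

$$
L(\epsilon) = \int_{x_A}^{x_B} \sqrt{1 + \epsilon^2\, \eta'(x)^2}\; dx = (x_B - x_A) + \frac{\epsilon^2}{2} \int_{x_A}^{x_B} \eta'(x)^2\, dx + O(\epsilon^4),
$$

so the length does change, but only at order $\epsilon^2$. For a curve $y_0(x)$ that is not straight,

$$
L(\epsilon) = \int_{x_A}^{x_B} \sqrt{1 + \left(y_0'(x) + \epsilon\, \eta'(x)\right)^2}\; dx = L(0) + \epsilon \int_{x_A}^{x_B} \frac{y_0'(x)\, \eta'(x)}{\sqrt{1 + y_0'(x)^2}}\; dx + O(\epsilon^2),
$$

and the coefficient of $\epsilon$ can be made non-zero by a suitable choice of $\eta$. Read this way, δ=0 does not say that the length is unchanged; it says that the part of the change proportional to $\epsilon$ vanishes.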

3.- And by the way, there is an infinite number of functions y=y(x) that are compatible with δU=0, since if you compare any function with itself the variation is always zero. So even the most bizarre curve connecting points A and B meets δU=0, since comparing that very curve with itself gives δU=0. Why is it that δU=0 selects only one solution?

In other words, if I give you a curve y1 and I ask you:

What curve meets the requirement δy=0?

Obviously the same one, y1.

4.- Coming back to your post, you will understand why I ask you these next two questions. You said:

"From the picture it's clear to me that ACB' is shorter than AEB' and therefore ACB is shorter than AEB."

I guess ACB' is a path close to ACB, obtained from it by a variation, and AEB' is a path close to AEB, also obtained through a variation. I wonder why you compare the two.

I mean, talking in terms of variations:

ACB ---> δU1 = ACB' - ACB

AEB ---> δU2 = AEB' - AEB

If you talk about the lengths of ACB' and AEB', it is clear which one is longer, but when you say δU=0, which two curves are you comparing? I thought I had to compare ACB' with ACB, and AEB' with AEB.

5.- "And at C you have a case $\delta = 0$ which you do not have at D or E."

It would be great if you could explain this to me. Can you demonstrate it?
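
One way to make the claim about C concrete is a small numerical sketch. The geometry is assumed here and is not taken from the attached figure: the mirror lies along the x-axis, A = (0, 1), B = (3, 2), and the mirror points labelled D and E are arbitrary stand-ins. The function names are likewise illustrative. The sketch computes the length of the broken path A → (x, 0) → B and estimates how fast that length changes when the mirror point is shifted slightly.

```python
import math

# Hypothetical geometry: the mirror lies along the x-axis,
# and A, B are on the same side of it.
A = (0.0, 1.0)
B = (3.0, 2.0)

def path_length(x):
    """Length of the broken path A -> (x, 0) -> B."""
    return math.hypot(x - A[0], A[1]) + math.hypot(B[0] - x, B[1])

def first_order_rate(x, h=1e-6):
    """Central-difference estimate of d(length)/dx at mirror point x."""
    return (path_length(x + h) - path_length(x - h)) / (2 * h)

# Mirror point where the angle of incidence equals the angle of reflection
# (from similar triangles).
x_c = A[0] + (B[0] - A[0]) * A[1] / (A[1] + B[1])

# D and E are arbitrary other mirror points chosen for comparison.
for label, x in [("C (equal angles)", x_c), ("D", 0.5), ("E", 2.0)]:
    print(f"{label:18s} x={x:.2f}  length={path_length(x):.6f}  "
          f"dL/dx={first_order_rate(x):+.6f}")
```

At the equal-angle point the estimated rate is zero to within rounding, so shifting the mirror point by a small amount changes the length only at second order, while at the other two points the length changes in proportion to the shift. Since the speed is constant, the same holds for the travel time. Among unobstructed paths the straight line AB has the same property, so both AB and ACB are stationary paths; the condition δ=0 does not single out only one of them.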

Last edited: Oct 28, 2016
4. Oct 29, 2016

### jonjacson

I was reading Courant and Hilbert's Methods of Mathematical Physics, and they say clearly that in a variational problem you don't know whether you are going to get an answer. Interestingly, they avoid the equation δ=0: they first develop direct methods and then indirect methods (the Euler equation), and they describe those methods using the trick of introducing a hypothetical solution and a variation in this way: Y = y + εη.

Last edited: Oct 29, 2016
5. Nov 5, 2016

### Stephen Tashi

In the usual scenario for the calculus of variations, we are trying to minimize "a function of a function". For example, we might have $U(f) = \int_{0}^{1} \sin( f(x))\, dx$ subject to "side conditions" like $f(0) = 0$ and $f(1) = 5$. The calculus of variations does not solve this minimization problem over the set of "all possible functions". It solves it over a set of functions that are, in some sense, "smooth" - e.g. we might solve it over the set of functions that are twice differentiable.

The technique of the calculus of variations begins by rephrasing the problem so it becomes (the familiar!) problem of finding the extrema of a real-valued function of a real variable.

If we make a naive attempt to do this, we might restrict our attention to a family of functions such as $f(x,\epsilon) = 5x + \epsilon(x - x^2)$ that is parameterized by the single parameter $\epsilon$; every member of this family satisfies the side conditions. Then $U(f)$ can be regarded as a function of one real variable, $U(\epsilon) = \int_{0}^{1} \sin(5x + \epsilon (x- x^2))\, dx$. We can seek the extrema of $U(\epsilon)$ in the usual manner by solving the equation $\frac{dU}{d \epsilon} = 0$.
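
The naive approach can be tried numerically. The sketch below is only an illustration under stated assumptions: it takes the family $5x + \epsilon(x - x^2)$ on the interval $[0, 1]$, where the side conditions $f(0) = 0$ and $f(1) = 5$ hold, integrates with a hand-written Simpson rule, and looks for a zero of $dU/d\epsilon$ by bisection. The helper names (`simpson`, `bump`, `bisect`) and the bracket $[-6, 0]$ are choices made here, not part of the original post.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def bump(x):
    """x - x^2, which vanishes at x = 0 and x = 1, so f(0) and f(1) stay fixed."""
    return x - x * x

def U(eps):
    """U(eps) = integral over [0, 1] of sin(5x + eps*(x - x^2))."""
    return simpson(lambda x: math.sin(5 * x + eps * bump(x)), 0.0, 1.0)

def dU(eps):
    """dU/d(eps), obtained by differentiating under the integral sign."""
    return simpson(lambda x: math.cos(5 * x + eps * bump(x)) * bump(x), 0.0, 1.0)

def bisect(g, lo, hi, tol=1e-12):
    """Find a zero of g in [lo, hi]; g(lo) and g(hi) must differ in sign."""
    g_lo = g(lo)
    if (g(hi) > 0) == (g_lo > 0):
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# dU/d(eps) changes sign between eps = -6 and eps = 0 (bracket found by trial).
eps_star = bisect(dU, -6.0, 0.0)
print(f"stationary eps = {eps_star:.6f}")
print(f"U there = {U(eps_star):.6f}, dU/d(eps) there = {dU(eps_star):.2e}")
```

Whatever value of $\epsilon$ this returns, it is only the best member of this one-parameter family; nothing guarantees that it is stationary with respect to other changes of shape.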

The naive approach considers a very restricted family of functions. The approach of the calculus of variations is more general. However, it begins with some assumptions that, at first glance, would seem to lead nowhere.

The first assumption is that we pretend to know the function $f_0(x)$ that minimizes $U(f)$. The second assumption is that we restrict our solutions to a family of functions of the form $f_0(x) + \epsilon v(x)$ where $\epsilon$ is a real number and $v(x)$ is any function that keeps $f_0(x) + \epsilon v(x)$ in the set of functions we wish to consider. For example, if we are minimizing over the set of twice differentiable functions, then $v(x)$ needs to be twice differentiable.

With those assumptions the problem becomes that of minimizing the real valued function $U(\epsilon)$. We do this by setting $\frac{dU}{d\epsilon} = 0$. This approach seems crazy for two reasons. First, we have already assumed the answer to the problem is the function $f_0$, so we already know the solution to $\frac{dU}{d\epsilon} = 0$ must be $\epsilon = 0$. Second, when we compute $\frac{dU}{d\epsilon}$ the result involves the functions $f_0(x)$ and $v(x)$ which we don't actually know.

The miracle of the calculus of variations is that analyzing the equation $\frac{dU}{d\epsilon} = 0$ still gives some useful information, even though it contains the "purely symbolic" functions $f_0(x)$ and $v(x)$.
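
To see what kind of information comes out, here is a sketch of the standard computation. It assumes (this is not stated in the post) that the functional has the form $U(f) = \int_a^b F(x, f, f')\, dx$ with $F$ and $f_0$ smooth enough to integrate by parts, and that $v(a) = v(b) = 0$ so that the side conditions are preserved. Writing $U(\epsilon)$ for $U(f_0 + \epsilon v)$,

$$
\left.\frac{dU}{d\epsilon}\right|_{\epsilon = 0}
= \int_a^b \left( \frac{\partial F}{\partial f}\, v + \frac{\partial F}{\partial f'}\, v' \right) dx
= \int_a^b \left( \frac{\partial F}{\partial f} - \frac{d}{dx} \frac{\partial F}{\partial f'} \right) v \, dx = 0,
$$

where the partial derivatives are evaluated along $f_0$ and the second step is an integration by parts whose boundary term vanishes because $v(a) = v(b) = 0$. Since this must hold for every admissible $v$, the bracket itself has to vanish:

$$
\frac{\partial F}{\partial f} - \frac{d}{dx} \frac{\partial F}{\partial f'} = 0 .
$$

The unknown $v$ has dropped out, and what remains is a differential equation for $f_0$ alone (the Euler equation). For the example $F = \sin f$, which does not depend on $f'$, this reduces to $\cos f_0(x) = 0$, which is incompatible with $f(0) = 0$. So that particular example appears to have no smooth stationary function, which ties in with the earlier question about whether a solution exists.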

Notice we only know that $\epsilon = 0$ is a solution to $\frac{dU}{d\epsilon} = 0$ when we stipulate that our family of functions is given by $f_0(x) + \epsilon v(x)$. If we had parameterized the family as $f(x) + \epsilon v(x)$ where $f(x)$ was not the function that minimizes $U(f)$ then we would not know that $\epsilon = 0$ is a solution.

In Euler's work, perhaps the significance of "$\delta U = 0$" involves a similar assumption about how $U$ is defined.

6. Nov 6, 2016

### jonjacson

Yes, it looks quite crazy as it seems you know nothing about the problem but you still can get very useful information out of it.

7. Nov 6, 2016

### Stephen Tashi

This pdf describes Euler's original "geometric" approach to the calculus of variations, and gives a figure which resembles the one you gave above: http://www.eftaylor.com/pub/HancEulerEJP.pdf The function F is assumed to be the solution of the problem of minimizing the integral, so F is not just any arbitrary function.

It's interesting how continuity plays a role. We don't merely change a single isolated value of the function when we make a "variation" of it. To draw the graph of the new function, we need segments connecting the displaced point to the original curve of F.
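
The role of the connecting segments can be illustrated with a small numerical sketch. It is not taken from the linked paper: the functional is simply the length of a polygonal curve from (0, 0) to (1, 0), and the two curves compared (a straight line and a sampled parabola), the displaced vertex index `k` and the function names are all choices made here. Displacing one vertex changes the two segments that meet there, and the sketch compares the resulting change in length with the size of the displacement.

```python
import math

def polyline_length(ys, dx):
    """Total length of the polygon through equally spaced nodes with heights ys."""
    return sum(math.hypot(dx, y1 - y0) for y0, y1 in zip(ys, ys[1:]))

def length_change(ys, dx, k, delta):
    """Change in length when vertex k alone is displaced vertically by delta."""
    moved = list(ys)
    moved[k] += delta
    return polyline_length(moved, dx) - polyline_length(ys, dx)

n = 10                                 # number of segments
dx = 1.0 / n
xs = [i * dx for i in range(n + 1)]
straight = [0.0 for _ in xs]           # the shortest path between the endpoints
bent = [x * (1 - x) for x in xs]       # an arbitrary non-straight comparison curve
k = 3                                  # interior vertex to displace

for delta in (1e-2, 1e-3, 1e-4):
    ds = length_change(straight, dx, k, delta)
    db = length_change(bent, dx, k, delta)
    print(f"delta={delta:.0e}  straight: change/delta={ds / delta:.3e}  "
          f"bent: change/delta={db / delta:.3e}")
```

For the straight line the ratio of the change to the displacement shrinks along with the displacement, so the change is of second order, whereas for the bent curve the ratio settles to a non-zero value, so the change is of first order.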

8. Nov 7, 2016

### jonjacson

Good article, thanks!