# Lagrangian: q and q-dot independence

In summary, a coordinate chart specifies the possible states of a particle, and the coordinates are independent of each other.

#### pccrp

Hello! I've read thousands of explanations of how q and q-dot are considered independent in the Lagrangian treatment of mechanics, but I just can't get it. I would really appreciate it if someone could explain how this is so (I've seen something about an a-priori independence, but I couldn't really understand it) and prove the Lagrangian equations of motion in a way that shows how this independence works. Thanks

Consider a particle moving completely freely. ##q_1,q_2,q_3## and ##\dot{q}_1,\dot{q}_2,\dot{q}_3## are coordinates of ##\mathbb{R}^{6}## which specify the possible states of the particle. As you know, a set of coordinates ##(x^i)## are independent of each other i.e. ##\frac{\partial x^{i}}{\partial x^{j}} = \delta^{i}_{j}##.


I understand they're all needed to specify the state of the system. However, how can you start from this and prove the equations?

$$\begin{aligned}
\vec{r}_i &= \vec{r}_i(q)\\
\dot{\vec{r}}_i &= \sum_j \frac{\partial \vec{r}_i}{\partial q_j}(q)\,\dot{q}_j\\
\frac{\partial \dot{\vec{r}}_i}{\partial \dot{q}_j} &= \frac{\partial \vec{r}_i}{\partial q_j}\\
\frac{\partial T}{\partial \dot{q}_j} &= \sum_i m_i \dot{\vec{r}}_i \cdot \frac{\partial \dot{\vec{r}}_i}{\partial \dot{q}_j} = \sum_i m_i \dot{\vec{r}}_i \cdot \frac{\partial \vec{r}_i}{\partial q_j}\\
\frac{d}{dt}\frac{\partial T}{\partial \dot{q}_j} &= \sum_i m_i \ddot{\vec{r}}_i \cdot \frac{\partial \vec{r}_i}{\partial q_j} + \sum_i m_i \dot{\vec{r}}_i \cdot \frac{\partial \dot{\vec{r}}_i}{\partial q_j} = \sum_i \vec{F}_i \cdot \frac{\partial \vec{r}_i}{\partial q_j} + \frac{\partial T}{\partial q_j} = Q_j + \frac{\partial T}{\partial q_j}\\
\frac{d}{dt}\frac{\partial T}{\partial \dot{q}_j} - \frac{\partial T}{\partial q_j} &= Q_j\\
\vec{F}_i &= -\frac{\partial \Pi}{\partial \vec{r}_i}\\
\frac{\partial \Pi}{\partial q_j} &= \sum_i \frac{\partial \Pi}{\partial \vec{r}_i}\cdot\frac{\partial \vec{r}_i}{\partial q_j} = -\sum_i \vec{F}_i\cdot\frac{\partial \vec{r}_i}{\partial q_j} = -Q_j\\
\frac{d}{dt}\frac{\partial T}{\partial \dot{q}_j} - \frac{\partial T}{\partial q_j} &= -\frac{\partial \Pi}{\partial q_j}\\
\frac{d}{dt}\frac{\partial T}{\partial \dot{q}_j} - \frac{\partial (T-\Pi)}{\partial q_j} &= 0\\
L &= T - \Pi\\
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_j} - \frac{\partial L}{\partial q_j} &= 0
\end{aligned}$$
Here the fifth line uses ##\frac{d}{dt}\frac{\partial\vec{r}_i}{\partial q_j}=\frac{\partial\dot{\vec{r}}_i}{\partial q_j}##, and the last step uses ##\partial\Pi/\partial\dot q_j=0##, since the potential depends only on the positions.
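A small numerical sketch of the "cancellation of dots" identity ##\partial\dot{\vec r}_i/\partial\dot q_j=\partial\vec r_i/\partial q_j## from the third line. Polar coordinates in the plane are an assumed example, and the helper names (`r`, `r_dot`, `partial_fd`) are made up for illustration; ##\dot{\vec r}## is coded as an ordinary function of four real arguments ##(\rho,\theta,\dot\rho,\dot\theta)##.

```python
import math

def r(q):
    """Position in the plane from polar coordinates q = (rho, theta)."""
    rho, theta = q
    return [rho * math.cos(theta), rho * math.sin(theta)]

def r_dot(q, qd):
    """Velocity as a function of (q, qdot): sum_j dr/dq_j * qdot_j, for polar coordinates."""
    rho, theta = q
    rhod, thetad = qd
    return [rhod * math.cos(theta) - rho * math.sin(theta) * thetad,
            rhod * math.sin(theta) + rho * math.cos(theta) * thetad]

def partial_fd(func, args, k, h=1e-6):
    """Central-difference partial derivative of a vector-valued func w.r.t. args[k]."""
    up = list(args)
    up[k] += h
    dn = list(args)
    dn[k] -= h
    return [(a - b) / (2 * h) for a, b in zip(func(up), func(dn))]

q, qd = [1.3, 0.7], [0.4, -2.0]   # sample values of (rho, theta) and (rhodot, thetadot)
for j in range(2):
    # d(rdot)/d(qdot_j): vary only slot 2 + j of the 4-argument function
    lhs = partial_fd(lambda z: r_dot(z[:2], z[2:]), q + qd, 2 + j)
    # d(r)/d(q_j)
    rhs = partial_fd(r, q, j)
    print(j, lhs, rhs)
```

The two columns agree because ##\dot{\vec r}## is linear in the ##\dot q_j##, with coefficients ##\partial\vec r/\partial q_j##.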

pccrp, I have exactly the same problem as you, and I think that many of those who try to answer the question don't really understand the problem. I think I solved it, however, and in this old thread

I try both to formulate the problem and to solve it. The trouble is that both formulating and solving it required extensive notation, which might be hard to penetrate, and even so, those who replied didn't see the problem.

There is nothing deep here whatsoever; it's just math. If ##M## is the configuration space, and ##TM## is the tangent bundle of the configuration space, then the Lagrangian is a map ##L: TM \times \mathbb{R} \rightarrow \mathbb{R}##. The coordinate charts for ##TM## have the generalized coordinates ##q^i## and generalized velocities ##\dot{q}^i## as coordinate functions i.e. ##p\in TM## can be represented as ##p = (q^1,...,q^n,\dot{q}^1,...,\dot{q}^n)## with respect to some coordinate chart. Then ##L## is just a map that takes ##(q^1,...,q^n,\dot{q}^1,...,\dot{q}^n,t)## and gives us a real number. Coordinate functions are of course independent of each other.

There is nothing deep here whatsoever; it's just math. If ##M## is the configuration space, and ##TM## is the tangent bundle of the configuration space...
OK, but perhaps not everyone knows what a tangent bundle is. I think the explanation in my post (referred to above) is actually based on the same idea as yours, but expressed differently. The important thing to note is that for every ##(2n+1)##-tuple of numbers ##(a_1,a_2,\dots,a_n,b_1,b_2,\dots,b_n,c)## (perhaps within some given bounds), there is a path ##(q_1(t),q_2(t),\dots,q_n(t))## in configuration space such that ##q_i(c)=a_i## and ##\dot q_i(c)=b_i## for ##i=1,2,\dots,n##.
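A quick numerical illustration of this point, as a sketch: the straight-line path ##q_i(t)=a_i+b_i(t-c)## is just one convenient choice (any path with the right value and slope at ##c## would do), and the function names are hypothetical. It shows that position and velocity at a given instant can be prescribed separately.

```python
def line_through(a, b, c):
    """Return the path q(t) = a + b*(t - c), componentwise."""
    def q(t):
        return [ai + bi * (t - c) for ai, bi in zip(a, b)]
    return q

def derivative(q, t, h=1e-6):
    """Central-difference estimate of dq/dt at time t, componentwise."""
    qp, qm = q(t + h), q(t - h)
    return [(p - m) / (2 * h) for p, m in zip(qp, qm)]

# arbitrary sample values for (a_1..a_n), (b_1..b_n) and c, with n = 3
a, b, c = [1.0, -2.0, 0.5], [3.0, 0.0, -4.0], 2.0
q = line_through(a, b, c)
print(q(c))               # equals a
print(derivative(q, c))   # approximately b
```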

You don't need fiber bundles to understand this. Let's just work on the space $\mathbb R^3\times \mathbb R^3 = \mathbb R^6$ of coordinates and velocities. The Lagrangian is just a function $L:\mathbb R^6\rightarrow \mathbb R$. Its value at a point is better denoted by $L(x,v)$ instead of $L(q,\dot q)$. Then the Euler-Lagrange equations read
$$\frac{\mathrm d}{\mathrm d t} \left(\frac{\partial L(x,v)}{\partial v}\bigg|_{x=q(t),v=\dot q(t)}\right) - \frac{\partial L(x,v)}{\partial x}\bigg|_{x=q(t),v=\dot q(t)}=0 \text{ .}$$
This makes all the dependences obvious. However, it's just more convenient to write
$$\frac{\mathrm d}{\mathrm d t} \frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} =0$$
instead, although it might cause confusion. You just have to keep in mind that it really means the above equation. The formulation using fiber bundles just generalizes this to spaces other than $\mathbb R^6$.
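A small numerical sketch of this reading, assuming for illustration the falling-particle Lagrangian ##L(x,v)=\frac12 mv^2-mgx## with made-up values for ##m## and ##g##. The partial derivatives are finite differences of ##L## as an ordinary function of two real numbers; only afterwards is the trajectory inserted, and then the time derivative is taken.

```python
M, G = 2.0, 9.81  # example values for mass and gravitational acceleration

def L(x, v):
    """Falling-particle Lagrangian as a plain function of two real numbers."""
    return 0.5 * M * v**2 - M * G * x

def dL_dx(x, v, h=1e-5):
    # partial derivative in the first slot; v is simply the other argument
    return (L(x + h, v) - L(x - h, v)) / (2 * h)

def dL_dv(x, v, h=1e-5):
    # partial derivative in the second slot
    return (L(x, v + h) - L(x, v - h)) / (2 * h)

def el_residual(q, qdot, t, h=1e-4):
    """d/dt [dL/dv at (q(t), qdot(t))] - dL/dx at (q(t), qdot(t))."""
    p = lambda s: dL_dv(q(s), qdot(s))   # differentiate first, then insert the path
    return (p(t + h) - p(t - h)) / (2 * h) - dL_dx(q(t), qdot(t))

# free fall from x = 10 with initial velocity 3 (a solution)
q = lambda t: 10 + 3 * t - 0.5 * G * t**2
qdot = lambda t: 3 - G * t
print(el_residual(q, qdot, 1.3))  # close to 0
```

For a path that is not a solution, e.g. uniform motion `q = lambda t: 10 + 3 * t` with `qdot = lambda t: 3.0`, the residual comes out as ##mg## instead of 0.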

As I explained in the old thread, the problem is that for this to be meaningful, the function ##L(x,v)## must be unique. If there were another function ##M(x,v)## such that ##L(q(t),\dot q(t))=M(q(t),\dot q(t))## for all paths ##q(t)##, but for which ##\partial L /\partial v\neq\partial M/\partial v##, then we wouldn't know which one of these expressions to use.

Therefore, it is important to prove that ##L(x,v)## is unique.

Anyone who is confused by this is in good company - Bill Burke dedicates his book on applied differential geometry to "all those who, like me, have wondered how in hell you can change ##\dot q## without changing ##q##".


$L(x,v)$ is given as an axiom. It completely specifies your theory. For example $L(x,v)=\frac{1}{2}m v^2 - m g x$ describes a falling particle. You don't need to prove any uniqueness properties. In fact, it is never unique. Just try $L'(x,v)=\frac{1}{2}m v^2 - m g x + v$ for example. There are always other $L'(x,v)$ that give you exactly the same equations of motion. It doesn't matter which one you choose.
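A sketch that checks this claim numerically, with sample values for ##m## and ##g## and finite-difference derivatives (both assumptions for illustration). Evaluated along any path, even one that is not a physical motion, the Euler-Lagrange expressions for ##L## and ##L'## coincide, because the extra ##v## only adds the constant 1 to ##\partial L/\partial v##, and that constant disappears under ##\frac{\mathrm d}{\mathrm dt}##.

```python
import math

M, G = 2.0, 9.81  # example values for m and g

def L(x, v):
    return 0.5 * M * v**2 - M * G * x

def L_prime(x, v):
    return L(x, v) + v  # the extra term v is the total time derivative of x

def el_residual(lag, q, qdot, t, h=1e-4):
    """Euler-Lagrange expression d/dt(d lag/dv) - d lag/dx along the path (q, qdot) at time t."""
    d_dx = lambda x, v: (lag(x + h, v) - lag(x - h, v)) / (2 * h)
    d_dv = lambda x, v: (lag(x, v + h) - lag(x, v - h)) / (2 * h)
    p = lambda s: d_dv(q(s), qdot(s))
    return (p(t + h) - p(t - h)) / (2 * h) - d_dx(q(t), qdot(t))

# an arbitrary path, deliberately NOT a solution of the equations of motion
q = lambda t: math.sin(t)
qdot = lambda t: math.cos(t)
for t in (0.0, 0.7, 2.0):
    print(el_residual(L, q, qdot, t), el_residual(L_prime, q, qdot, t))
```

Both columns equal ##m\ddot q+mg=mg-m\sin t## here, so both Lagrangians produce the same equation of motion ##\ddot q=-g##.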

Everything is well-defined the way it is usually taught. For a given $L(x,v)$, you just compute the partial derivatives of $L(x,v)$, plug in $q(t)$ and $\dot q(t)$ afterwards and then insert them into the Euler-Lagrange equations. Apart from technical conditions like differentiability, you don't need to worry about anything.

Fiber bundles? Geeze...can we make this any more complicated?

Don't think about the mathematics. Think about the physics. If q and q-dot are dependent, that means that every time a particle is in a given position, it has the same velocity. While there are problems where that is true, do you want those to be the only kind of problems you can solve?


I wrote: "You don't need fiber bundles..."

For a given trajectory $q(t)$, $q$ and $\dot q$ are related. Just try $q(t)=t^2$. Then $q = \frac{{\dot q}^2}{4}$. The point is that this is irrelevant for the Euler-Lagrange equations, because neither $q$ nor $\dot q$ gets differentiated with respect to the other variable.
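A tiny sketch of this example: along ##q(t)=t^2## the relation ##q=\dot q^2/4## holds, but it belongs to that particular trajectory. The second path ##r(t)=1+5t## is an extra illustration, not from the post: it passes through the same position ##1## with a different velocity.

```python
q = lambda t: t**2
qdot = lambda t: 2 * t   # exact derivative of t**2

# Along this trajectory, position and velocity are tied together ...
for t in (-1.5, 0.5, 3.0):
    assert q(t) == qdot(t)**2 / 4

# ... but another path through the same position can have a different velocity.
r = lambda t: 1 + 5 * t
rdot = lambda t: 5.0
print(q(1.0), qdot(1.0))   # position 1.0, velocity 2.0
print(r(0.0), rdot(0.0))   # position 1.0, velocity 5.0
```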

##L=T-V##, and for a given potential there is a given formula to calculate this in Cartesian coordinates, like the one you gave, and this can be taken as an axiom, yes. We then use the coordinate transformation to rewrite this formula in the generalized position and velocity coordinates, and here lies the problem. How can we know that the Cartesian velocities can be uniquely expressed as functions of the generalized velocities and positions? The formulas giving these expressions are not taken as axioms; they are derived in a way that does not show that they are unique. Therefore, this uniqueness must be proved.

Again, I refer to this old thread for the details:

We then use the coordinate transformation to rewrite this formula in the generalized position and velocity coordinates

No, you don't perform any coordinate transformations. If you have $L(x,v)$, the choice of coordinates has already been made and isn't changed anymore. $x$ and $v$ are already the generalized coordinates. I didn't mean to imply Cartesian coordinates when I wrote $x$. I just wanted to distinguish it symbolically from the trajectory $q(t)$.

(It's unfortunate that in the case of $TM=\mathbb R^{2N}$, the usual coordinate chart already is $(\mathbb R^{2N},\mathrm{id})$. This obfuscates what's going on a little bit. Actually, everything can even be formulated completely coordinate free. One should distinguish $L:TM\rightarrow \mathbb R$ from $L\circ f^{-1}:U\rightarrow\mathbb R$, where $(U,f)$ is a coordinate chart for $TM$. The $L(x,v)$ I'm talking about all the time, is really some $L\circ f^{-1}$. This is way too complicated however, if you just work in $\mathbb R^{2N}$.)


Drawing on books and on all your very helpful answers (thanks, by the way), I have reached a conclusion, and I would really appreciate it if you could tell me whether it's correct.

As I see it, there is a purely mathematical reason why you can consider them as independent. For example, suppose there's a function $$f(y(x),y'(x))= y + y'$$ where $y=x^2 \rightarrow y'=2x$.
If we evaluate $f(y,y')$ as a function of $x$ only, we get $f(x)=x^2+2x$; if we differentiate it w.r.t. $x$, we get $f'(x)=2x+2$.

Similarly, if we consider $y(x)$ and $y'(x)$ as independent variables and use the chain rule to differentiate $f(y,y')$ w.r.t. $x$, we have $$\frac{df(y,y')}{dx}=\frac{\partial f}{\partial y} \frac{dy}{dx}+\frac{\partial f}{\partial y'} \frac{dy'}{dx}$$ Evaluating each term, we have $$\frac{\partial f}{\partial y}=1,\qquad \frac{\partial f}{\partial y'}=1,\qquad \frac{dy}{dx}=y'(x)=2x,\qquad \frac{dy'}{dx}=\frac{d(2x)}{dx}=2.$$

Which, by substitution, gives:$$\frac{df(y,y')}{dx}=1(2x)+1(2)=2x+2$$
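A numerical sketch of this example at a few sample points, with the partial derivatives of ##f## written out by hand and the direct derivative taken by a central difference (the function names are made up): the derivative of the composed function and the chain-rule sum agree.

```python
y = lambda x: x**2
yp = lambda x: 2 * x          # y'(x)
f = lambda u, w: u + w        # f as a function of two independent slots
df_du = lambda u, w: 1.0      # partial derivative in the first slot
df_dw = lambda u, w: 1.0      # partial derivative in the second slot

def total_derivative_direct(x, h=1e-6):
    """d/dx f(y(x), y'(x)) by a central difference of the composed function."""
    F = lambda s: f(y(s), yp(s))
    return (F(x + h) - F(x - h)) / (2 * h)

def total_derivative_chain_rule(x):
    """Chain rule: df/du * dy/dx + df/dw * dy'/dx, with dy/dx = 2x and dy'/dx = 2."""
    return df_du(y(x), yp(x)) * yp(x) + df_dw(y(x), yp(x)) * 2.0

for x in (-1.0, 0.0, 2.5):
    print(x, total_derivative_direct(x), total_derivative_chain_rule(x), 2 * x + 2)
```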
As we can see, this is the same as the answer calculated previously. This illustrates (but does not prove) that we can consider them as independent, and with this result we see that $\dot{\vec{r_i}} = \sum_j \frac {\partial \vec{r_i}} {\partial q_j} \dot{q_j}$ can be considered as a function of the independent variables $q_j(t)$ and $\dot{q_j}(t)$, even though we know they're both functions of the independent variable $t$ and that there's a relation of dependence between them: $$\dot{\vec{r_i}}=\dot{\vec{r_i}}(q,\dot{q})$$ That being so, we can do as in the example and partially differentiate it w.r.t. $\dot{q_j}$, treating the $q_j$ as constants. With this result, it becomes possible to prove Lagrange's equations.

Please correct me if I'm wrong and, if possible, point me to a proof of the identity I've illustrated with an example.


I'm not sure whether you understood it. We don't "consider anything independent". In fact, given a trajectory, $q$ and $\dot q$ are not independent in general as the simple example $q(t)=t^2$ shows. The physical intuition behind this is: A particle usually has a different velocity at each point of the trajectory.

The problem many people are having (and I think you are having, too) is: When we evaluate $\frac{\partial L}{\partial q}$ and $\frac{\partial L}{\partial \dot q}$ in the Euler-Lagrange equations, why don't we need to do something like this:
$$\frac{\mathrm d L}{\mathrm d q} = \frac{\partial L}{\partial q} + \frac{\partial L}{\partial \dot q} \frac{\partial \dot q}{\partial q}$$
And the answer is that the usual way the Euler-Lagrange equations are written is a little bit of an abuse of notation. $\frac{\partial L(q,\dot q)}{\partial q}$ isn't to be interpreted as
$$\frac{\mathrm d L(q,\dot q(q))}{\mathrm d q} \text{ .}$$
It really means
$$\frac{\partial L(x,v)}{\partial x}\bigg|_{x=q(t),v=\dot q(t)} \text{ .}$$
The same goes for $\frac{\partial L}{\partial \dot q}$. At no point does $q$ need to be differentiated with respect to $\dot q$ (or the other way around). Thus it is irrelevant whether they are really dependent or not. You just differentiate $L$ with respect to its arguments and afterwards insert $q$ and $\dot q$. This is very important.
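A numerical sketch of the difference, assuming the falling-particle Lagrangian ##L(x,v)=\frac12 mv^2-mgx## and the trajectory ##q(t)=t^2## (for ##t>0##), with made-up values of ##m## and ##g##. The partial derivative in the first argument is ##-mg##; the misreading that drags ##\dot q=2\sqrt q## along when differentiating picks up an extra ##2m##.

```python
import math

M, G = 2.0, 9.81   # example values

def L(x, v):
    return 0.5 * M * v**2 - M * G * x

def partial_x(x, v, h=1e-6):
    """What dL/dq means in the Euler-Lagrange equations: vary only the first argument."""
    return (L(x + h, v) - L(x - h, v)) / (2 * h)

# Along q(t) = t**2 with t > 0, the velocity happens to be a function of position:
vel_of_pos = lambda x: 2 * math.sqrt(x)

def total_along_path(x, h=1e-6):
    """The misreading: differentiate L(q, qdot(q)) w.r.t. q, letting qdot change too."""
    return (L(x + h, vel_of_pos(x + h)) - L(x - h, vel_of_pos(x - h))) / (2 * h)

x = 4.0  # the point q = 4, reached at t = 2
print(partial_x(x, vel_of_pos(x)))   # approx -M*G = -19.62
print(total_along_path(x))           # approx -M*G + 2*M = -15.62
```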


Revising my thoughts: the terms $\frac{\partial L}{\partial q_j}$ and $\frac{\partial L}{\partial \dot q_j}$ (where $\frac{\partial L}{\partial q_j}$ treats the $\dot q_j$ as constants and vice versa) appear in Lagrange's equations of motion because, when proving them from Hamilton's principle, the Taylor expansion of $L+\delta L=L(q(t)+\delta q(t),\dot {q}(t) + \delta \dot{q}(t), t)$ makes no distinction as to whether $q$ and $\dot q$ are independent of each other. That being so, to first order this expansion can always be written $$L(q(t)+\delta q(t),\dot {q}(t) + \delta \dot{q}(t), t)= L(q(t), \dot{q}(t), t) +\sum_{j} \frac{\partial L}{\partial q_j}\delta q_j(t) + \sum_{j} \frac{\partial L}{\partial \dot{q_j}}\delta \dot{q_j}(t) + \dots$$
And if you apply this expansion to Hamilton's principle and manipulate it algebraically (recognizing that $\dot {q_j}=\frac{dq_j}{dt}$), you get $$\frac{\mathrm d}{\mathrm d t} \frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} =0$$ Since at the start of the derivation $\frac{\partial L}{\partial q_j}$ treats the $\dot q_j$ as constants and vice versa, they keep this meaning in Lagrange's equations.
Am I correct? Thanks for your help
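A numerical sketch showing that the first-order expansion needs nothing beyond treating ##L## as an ordinary function of two real arguments: the remainder after the linear terms shrinks like ##\epsilon^2##. The pendulum Lagrangian, the sample point and the variation direction are illustrative assumptions, not from the thread.

```python
import math

M, G, ELL = 1.0, 9.81, 2.0   # example mass, gravity and pendulum length

def L(x, v):
    """Pendulum Lagrangian (illustrative choice): x = angle, v = angular velocity."""
    return 0.5 * M * ELL**2 * v**2 + M * G * ELL * math.cos(x)

def dL_dx(x, v):
    return -M * G * ELL * math.sin(x)

def dL_dv(x, v):
    return M * ELL**2 * v

x, v = 0.4, 1.1            # a point (q(t), qdot(t)) at some instant
dx, dv = 0.3, -0.2         # direction of the variation (delta q, delta qdot)
for eps in (1e-1, 1e-2, 1e-3):
    exact = L(x + eps * dx, v + eps * dv)
    linear = L(x, v) + dL_dx(x, v) * eps * dx + dL_dv(x, v) * eps * dv
    print(eps, exact - linear)   # shrinks roughly like eps**2
```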


Yes, this is the idea. However, I'm not happy with phrases like "consider as independent" and "treat as constants". This sounds like one could arbitrarily choose how to interpret the derivatives. That's not the case, though. You can prove the Euler-Lagrange equations with full rigour by strictly applying the rules of calculus. There is no freedom how to interpret terms.

Here is how I would derive the Euler-Lagrange equations (leaving out all technicalities for the sake of simplicity):
We want to find the trajectory ##q(t)## that makes the action
$$S[q] = \int_{t_a}^{t_b} L(q(t),\dot q(t))\mathrm d t$$
stationary. A necessary condition for this to be true is that whenever we add a multiple of some arbitrary function ##\eta(t)## with ##\eta(t_a) = \eta(t_b) = 0## to ##q(t)##, ##S[q+\epsilon\eta]## shouldn't change to first order in ##\epsilon##. Since, given fixed ##q## and ##\eta##, ##S[q+\epsilon\eta]## is just a real-valued function of the real parameter ##\epsilon##, we can state this as
$$\frac{\mathrm d}{\mathrm d \epsilon}\bigg|_{\epsilon=0} S[q+\epsilon\eta] =0 \text{ .}$$
Now we can insert the definition of ##S[q]## and (assuming everything behaves nicely) move the derivative under the integral:
$$\frac{\mathrm d}{\mathrm d \epsilon}\bigg|_{\epsilon=0} S[q+\epsilon\eta] = \int_{t_a}^{t_b} \frac{\partial}{\partial\epsilon}\bigg|_{\epsilon=0} L(q(t)+\epsilon\eta(t),\dot q(t)+\epsilon\dot\eta(t))\mathrm d t$$
Note that the derivative acts on a function of the form ##f(g(\epsilon,t),h(\epsilon,t))##, so we can just apply the chain rule:
$$\frac{\partial}{\partial\epsilon}\bigg|_{\epsilon=0} L(q(t)+\epsilon\eta(t),\dot q(t)+\epsilon\dot\eta(t)) \\= \left[\frac{\partial L(q(t)+\epsilon\eta(t),\dot q(t)+\epsilon\dot\eta(t))}{\partial (q(t)+\epsilon\eta(t))}\frac{\partial (q(t)+\epsilon\eta(t))}{\partial\epsilon}+\frac{\partial L(q(t)+\epsilon\eta(t),\dot q(t)+\epsilon\dot\eta(t))}{\partial (\dot q(t)+\epsilon\dot\eta(t))}\frac{\partial (\dot q(t)+\epsilon\dot\eta(t))}{\partial\epsilon}\right]\bigg|_{\epsilon=0}\\=\frac{\partial L(q(t),\dot q(t))}{\partial q(t)}\eta(t)+\frac{\partial L(q(t),\dot q(t))}{\partial \dot q(t)}\dot\eta(t)$$
Now you just need to put this back into the integral, use the standard integration by parts trick, make sure the boundary term vanishes and derive the Euler-Lagrange equations, using the fact that it should hold for all ##\eta##. I have abused notation a little bit, but I hope it is clear how this is to be understood.
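For completeness, here is a sketch of those remaining steps, under the same smoothness assumptions, with ##\partial L/\partial q## and ##\partial L/\partial\dot q## short for the partial derivatives evaluated at ##(q(t),\dot q(t))##:
$$\begin{aligned}
0=\frac{\mathrm d}{\mathrm d \epsilon}\bigg|_{\epsilon=0} S[q+\epsilon\eta] &= \int_{t_a}^{t_b}\left[\frac{\partial L}{\partial q}\eta(t)+\frac{\partial L}{\partial \dot q}\dot\eta(t)\right]\mathrm d t\\
&= \left[\frac{\partial L}{\partial \dot q}\eta(t)\right]_{t_a}^{t_b}+\int_{t_a}^{t_b}\left[\frac{\partial L}{\partial q}-\frac{\mathrm d}{\mathrm d t}\frac{\partial L}{\partial \dot q}\right]\eta(t)\,\mathrm d t\\
&= \int_{t_a}^{t_b}\left[\frac{\partial L}{\partial q}-\frac{\mathrm d}{\mathrm d t}\frac{\partial L}{\partial \dot q}\right]\eta(t)\,\mathrm d t
\end{aligned}$$
The boundary term drops out because ##\eta(t_a)=\eta(t_b)=0##. Since the last integral vanishes for every such ##\eta##, the fundamental lemma of the calculus of variations gives ##\frac{\mathrm d}{\mathrm d t}\frac{\partial L}{\partial \dot q}-\frac{\partial L}{\partial q}=0##.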

OK, maybe you don't always have to invoke cartesian coordinates, but in most examples given in textbooks, such as Goldstein, this is how it is done.
Suppose, for example, we have a particle forced to move on a surface located in a gravity field. We then express the motion of the particle in terms of two parameters on the surface. Then, how can we express L=T-V in terms of the surface parameters and their time derivatives along the path?
The only way I know of is to express the kinetic and potential energy of the particle in cartesian coordinates and then transform to the surface parameters. This method is anyway implicit in the derivation in Goldstein.


It seems like you are talking about a different problem than the OP. The OP wants to know how to properly derive the Euler-Lagrange equations, given a Lagrangian. You, however, are asking how to find a Lagrangian that describes the motion on a constraint surface in ##\mathbb R^N##, given an unconstrained Lagrangian ##L:T\mathbb R^N\rightarrow\mathbb R##. In that case you need an embedding ##\varphi:M\rightarrow\mathbb R^{N}## of your constraint surface ##M## into ##\mathbb R^N## (for example, ##M## could be ##M=\{(x,y)\in\mathbb R^2:x^2+y^2=1\}##, and ##\varphi## would just be the identity of ##\mathbb R^2## restricted to ##M##). Your embedding ##\varphi## induces a map ##\varphi_*:TM\rightarrow T\mathbb R^N## (intuition: the set of tangent vectors to the constraint surface is a subset of all tangent vectors of ##\mathbb R^N##). Now we can write down a Lagrangian ##L_M:TM\rightarrow\mathbb R## by composing the maps we have: ##L_M = L\circ\varphi_*##. This is just a fancy way of saying that we restrict the domain of the Lagrangian ##L## to the vectors that are tangent to ##M##. You just need to make sure that ##M## really is a manifold in order to make the formalism work. But now you have a Lagrangian on ##TM## and you can do all the things you want, like choosing a coordinate system on ##TM## and writing down the Euler-Lagrange equations.

In reply to pccrp's chain-rule example: I'd say yes, this is the way one must think. But there is a problem that many people don't seem to see: the problem of uniqueness.

Using your example, what if we start from ##x^2+2x## and want to retrieve ##f##? It is clear that there are many (in fact, infinitely many) functions ##f(y,y')## such that ##f(x^2,2x)=x^2+2x##. ##f(y,y')=y+y'## is only one of them; another one is e.g. ##g(y,y')=\sqrt y(\frac {y'}2+2)## (assuming that ##x>0##).

Clearly, ##\frac d{dx}f(x^2,2x)=\frac d{dx}g(x^2,2x)=2x+2##, but ##\partial g/\partial y=\frac1{2\sqrt y}(\frac {y'}2+2)## and ##\partial g/\partial y'=\frac{\sqrt y}2##, very different from ##\partial f/\partial y## and ##\partial f/\partial y'##. But of course, if we calculate ##\frac d{dx}g(x^2,2x)## with the chain rule using these partial derivatives, the result is still ##2x+2##.
But in Lagrange's equations, expressions like these partial derivatives are used not only as intermediate steps, so how can we know whether we should use the partial derivatives of ##f##, of ##g##, or of one of the infinitely many other possible functions?

The answer is that we need a function ##f## which works, not only for ##y(x)=x^2##, but for all choices of the function (or path) ##y(x)##, and then, there is only one possible function ##f## (this is better explained in the other thread referred to in other posts).
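A numerical sketch of this example at the sample point ##x=1.5##, with the partial derivatives written out by hand (names are made up). On the path ##y=x^2## the two functions agree and the chain rule gives ##2x+2## for both, although their partial derivatives differ; on a second path, ##y=x^2+1## (an extra choice for illustration), they no longer agree.

```python
import math

# The two candidate functions of two independent slots (y, y')
f = lambda y, yp: y + yp
g = lambda y, yp: math.sqrt(y) * (yp / 2 + 2)   # needs y >= 0

# Their partial derivatives, written out by hand
f_y = lambda y, yp: 1.0
f_yp = lambda y, yp: 1.0
g_y = lambda y, yp: (yp / 2 + 2) / (2 * math.sqrt(y))
g_yp = lambda y, yp: math.sqrt(y) / 2

x = 1.5                          # sample point (x > 0)
y, yp, ypp = x**2, 2 * x, 2.0    # along the path y = x**2: y' = 2x, y'' = 2
print(f(y, yp), g(y, yp))        # equal values on this path
print(f_y(y, yp), g_y(y, yp))    # but different partial derivatives
# The chain rule still gives 2x + 2 for both (up to rounding)
print(f_y(y, yp) * yp + f_yp(y, yp) * ypp,
      g_y(y, yp) * yp + g_yp(y, yp) * ypp)

# On another path, y = x**2 + 1 (again y' = 2x), f and g no longer agree
y2 = x**2 + 1
print(f(y2, 2 * x), g(y2, 2 * x))
```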

But there is a problem which many people don't seem to see: the problem of uniqueness.

Really, there definitely is no problem of uniqueness here. Given a constraint surface ##M\subset\mathbb R^N## and a Lagrangian ##L## on ##\mathbb R^N##, the Lagrangian for ##M## is uniquely defined by composing ##L## with the inclusion map ##TM\hookrightarrow\mathbb R^{2N}##. This is centuries old mainstream math. You are misunderstanding something and I'm not sure what exactly. Everything comes down to applying the chain rule correctly.

The OP isn't even dealing with constraints here. He has been given a Lagrangian that he wants to work with, so there is no need to question its uniqueness.

Edit: Here's the simplest example I can think of: Let's restrict the motion of a free particle ##L = \frac{1}{2}m(\dot x^2+\dot y^2)## to a circle ##x^2+y^2=1##. We choose coordinates ##x=\cos(\varphi)## and ##y=\sin(\varphi)## on the circle and let ##\varphi=\varphi(t)##. Then the chain rule gives ##\dot x = -\sin(\varphi) \dot\varphi## and ##\dot y = \cos(\varphi) \dot\varphi##, so ##L## restricted to the circle is just ##L = \frac{1}{2}m(\sin^2(\varphi)\dot\varphi^2+\cos^2(\varphi)\dot\varphi^2) = \frac{1}{2}m\dot\varphi^2##. The only thing we had to do was to apply the chain rule. We didn't need to worry about anything else.
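The restriction can be sketched directly as a composition, in the spirit of ##L_M = L\circ\varphi_*##: the function names are hypothetical, the mass value is made up, and the pushforward is written out in the angle coordinate using the chain rule exactly as above.

```python
import math

M = 1.0  # example mass

def L_plane(x, y, vx, vy):
    """Free-particle Lagrangian on the plane, L = m (vx^2 + vy^2) / 2."""
    return 0.5 * M * (vx**2 + vy**2)

def pushforward(phi, phidot):
    """Map (phi, phidot) on the circle to a point and tangent vector in the plane (chain rule)."""
    return (math.cos(phi), math.sin(phi),
            -math.sin(phi) * phidot, math.cos(phi) * phidot)

def L_circle(phi, phidot):
    # restriction of L_plane to vectors tangent to the circle
    return L_plane(*pushforward(phi, phidot))

for phi, phidot in [(0.0, 1.0), (1.2, -0.5), (3.0, 2.0)]:
    print(L_circle(phi, phidot), 0.5 * M * phidot**2)
```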

It is possible that OP and I don't talk about exactly the same problem, but they are certainly related. The independence of ##q## and ##\dot q## is the clue.

About the circle example: very good, but suppose now that there are functions ##f(u,v)\neq -\sin(u)\,v## and/or ##g(u,v)\neq\cos(u)\,v## such that ##\dot x=f(\varphi,\dot\varphi)## and ##\dot y=g(\varphi,\dot\varphi)## for all paths ##\varphi(t)##. If we plug these into the expression for ##L## and calculate ##\partial L/\partial\varphi## and ##\partial L/\partial \dot \varphi##, the result could differ from the result obtained with your formula. How can we then know which one is right?
Therefore, we need to prove that the functions ##f## and ##g## are uniquely determined by the requirements that ##\dot x=f(\varphi,\dot\varphi)## and ##\dot y=g(\varphi,\dot\varphi)## for all paths ##\varphi(t)##.

I think your proof using tangent bundles is actually based on the same idea as my own proof, but I am not sure.

It is possible that OP and I don't talk about exactly the same problem, but they are certainly related.
I don't think so. The OP is basically asking: "Given ##L=\frac{1}{2}m\dot\varphi^2##, how do I find the Euler-Lagrange equations?" But you are asking: "Why ##L=\frac{1}{2}m\dot\varphi^2## and not something different?"

Very good, but suppose now that there are functions ##f(u,v)\neq -\sin(u)\,v## and/or ##g(u,v)\neq\cos(u)\,v## such that ##\dot x=f(\varphi,\dot\varphi)## and ##\dot y=g(\varphi,\dot\varphi)## for all paths ##\varphi(t)##.
Well, there might be such functions, but they wouldn't be relevant for the problem, since they can't have been derived using the chain rule. Thus they don't describe the same physics anymore, i.e. they don't describe a free particle constrained to a circle in our example. There is a clearly stated rule that says "use the chain rule" or (if you want to be more geometric) "concatenate the Lagrangian with the push-forward of the embedding".

If we plug in these in the expression for ##L## and calculate ##\partial L/\partial\varphi## and ##\partial L/\partial \dot \varphi##, then the result could be different from the result using your formula. How can we then know which one is right?
The one that has been derived using the chain rule is right. Substituting some arbitrary functions, just because some of their properties agree with some of the properties of the original functions isn't valid in any part of mathematics (not just in Lagrangian mechanics). Of course the result would be different if you don't follow the rules correctly.

--
Edit: You might ask: "Why the chain rule?" The answer is that the embedding ##\varphi:M\rightarrow\mathbb R^{N}## automatically induces the map ##\varphi_*:TM\rightarrow\mathbb R^{2N}## and this can be shown to give you the chain rule if you choose a coordinate system.

Last edited:
Yes, this is the idea. However, I'm not happy with phrases like "consider as independent" and "treat as constants". This sounds like one could arbitrarily choose how to interpret the derivatives. That's not the case, though. You can prove the Euler-Lagrange equations with full rigour by strictly applying the rules of calculus. There is no freedom how to interpret terms.

Here is how I would derive the Euler-Lagrange equations (leaving out all technicalities for the sake of simplicity):
We want to find the trajectory ##q(t)## that makes the action
$$S[q] = \int_{t_a}^{t_b} L(q(t),\dot q(t))\mathrm d t$$
stationary. A necessary condition for this is that whenever we add a multiple of some arbitrary function ##\eta(t)## with ##\eta(t_a) = \eta(t_b) = 0## to ##q(t)##, ##S[q+\epsilon\eta]## doesn't change to first order in ##\epsilon##. Since, for fixed ##q## and ##\eta##, ##S[q+\epsilon\eta]## is just a real-valued function of the real parameter ##\epsilon##, we can state this as
$$\frac{\mathrm d}{\mathrm d \epsilon}\bigg|_{\epsilon=0} S[q+\epsilon\eta] =0 \text{ .}$$
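To make the stationarity condition concrete, here is a numerical sketch under assumptions chosen only for illustration (not from the thread): the Lagrangian ##L=\frac12\dot q^2-\frac12 q^2## (a harmonic oscillator, for which ##q(t)=\sin t## is a solution), the interval ##[t_a,t_b]=[0,1]##, and the variation ##\eta(t)=t(1-t)##, which vanishes at both endpoints. It approximates ##S## with Simpson's rule and the ##\epsilon##-derivative with a central difference.

```python
import math

# Assumed example Lagrangian (harmonic oscillator): L(q, qdot) = qdot^2/2 - q^2/2.
def lagrangian(q, qdot):
    return 0.5 * qdot**2 - 0.5 * q**2

def action(q, qdot, ta=0.0, tb=1.0, n=2000):
    """Approximate S[q] = integral of L(q(t), qdot(t)) over [ta, tb] (Simpson's rule, n even)."""
    h = (tb - ta) / n
    total = 0.0
    for k in range(n + 1):
        t = ta + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * lagrangian(q(t), qdot(t))
    return total * h / 3

def dS_deps(q, qdot, eta, etadot, h=1e-5):
    """Central-difference estimate of d/d(eps) S[q + eps*eta] at eps = 0."""
    def S(eps):
        return action(lambda t: q(t) + eps * eta(t),
                      lambda t: qdot(t) + eps * etadot(t))
    return (S(h) - S(-h)) / (2 * h)

# Variation vanishing at t_a = 0 and t_b = 1.
def eta(t):
    return t * (1 - t)

def etadot(t):
    return 1 - 2 * t

# q(t) = sin(t) solves qddot = -q, so the first value should be ~0;
# q(t) = t does not, so the second value should be clearly nonzero.
print(dS_deps(math.sin, math.cos, eta, etadot))
print(dS_deps(lambda t: t, lambda t: 1.0, eta, etadot))
```

For the non-solution ##q(t)=t## the derivative works out by hand to ##\int_0^1(-t)\,t(1-t)\,\mathrm dt+\int_0^1(1-2t)\,\mathrm dt=-\frac1{12}##, so the second printed value should be close to ##-0.0833##.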
Now we can insert the definition of ##S[q]## and (assuming everything behaves nicely) move the derivative under the integral:
$$\frac{\mathrm d}{\mathrm d \epsilon}\bigg|_{\epsilon=0} S[q+\epsilon\eta] = \int_{t_a}^{t_b} \frac{\partial}{\partial\epsilon}\bigg|_{\epsilon=0} L(q(t)+\epsilon\eta(t),\dot q(t)+\epsilon\dot\eta(t))\mathrm d t$$
Note that the derivative acts on a function of the form ##f(g(\epsilon,t),h(\epsilon,t))##, so we can just apply the chain rule:
$$\frac{\partial}{\partial\epsilon}\bigg|_{\epsilon=0} L(q(t)+\epsilon\eta(t),\dot q(t)+\epsilon\dot\eta(t)) \\= \left[\frac{\partial L(q(t)+\epsilon\eta(t),\dot q(t)+\epsilon\dot\eta(t))}{\partial (q(t)+\epsilon\eta(t))}\frac{\partial (q(t)+\epsilon\eta(t))}{\partial\epsilon}+\frac{\partial L(q(t)+\epsilon\eta(t),\dot q(t)+\epsilon\dot\eta(t))}{\partial (\dot q(t)+\epsilon\dot\eta(t))}\frac{\partial (\dot q(t)+\epsilon\dot\eta(t))}{\partial\epsilon}\right]\bigg|_{\epsilon=0}\\=\frac{\partial L(q(t),\dot q(t))}{\partial q(t)}\eta(t)+\frac{\partial L(q(t),\dot q(t))}{\partial \dot q(t)}\dot\eta(t)$$
Now you just need to put this back into the integral, integrate the ##\dot\eta## term by parts, note that the boundary term vanishes because ##\eta(t_a)=\eta(t_b)=0##, and derive the Euler-Lagrange equations using the fact that the result must hold for all ##\eta##. I have abused notation a little bit, but I hope it is clear how this is to be understood.
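For completeness, the remaining steps can be written out roughly as follows (a sketch assuming ##L## and ##q## are smooth enough for integration by parts and for the fundamental lemma of the calculus of variations; ##\partial L/\partial q## and ##\partial L/\partial\dot q## are shorthand for the partial derivatives of ##L## with respect to its first and second slot, evaluated at ##(q(t),\dot q(t))##):
$$\int_{t_a}^{t_b}\frac{\partial L}{\partial \dot q}\,\dot\eta\,\mathrm dt = \left[\frac{\partial L}{\partial \dot q}\,\eta\right]_{t_a}^{t_b} - \int_{t_a}^{t_b}\frac{\mathrm d}{\mathrm dt}\frac{\partial L}{\partial \dot q}\,\eta\,\mathrm dt = - \int_{t_a}^{t_b}\frac{\mathrm d}{\mathrm dt}\frac{\partial L}{\partial \dot q}\,\eta\,\mathrm dt$$
since ##\eta(t_a)=\eta(t_b)=0##. Hence
$$0=\frac{\mathrm d}{\mathrm d \epsilon}\bigg|_{\epsilon=0} S[q+\epsilon\eta] = \int_{t_a}^{t_b}\left(\frac{\partial L}{\partial q}-\frac{\mathrm d}{\mathrm dt}\frac{\partial L}{\partial \dot q}\right)\eta(t)\,\mathrm dt$$
for every admissible ##\eta##, and if the bracket is continuous it must vanish identically:
$$\frac{\mathrm d}{\mathrm dt}\frac{\partial L}{\partial \dot q}-\frac{\partial L}{\partial q}=0 .$$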

Thanks a lot. My only question now is: how can we prove that we can apply the chain rule to ##L(q,\dot{q},t)## the same way we would if ##q## and ##\dot{q}## were not related to each other at all? Also, if you can point me to a proof of the general chain rule, that would be very helpful too. Thanks again.


What I have done is:
$$\frac{\mathrm d}{\mathrm d\epsilon} f(g(\epsilon,t),h(\epsilon,t)) = \frac{\partial f (x_1,x_2)}{\partial x_1}\bigg|_{x_1=g(\epsilon,t),x_2=h(\epsilon,t)}\frac{\partial g(\epsilon,t)}{\partial\epsilon}+\frac{\partial f (x_1,x_2)}{\partial x_2}\bigg|_{x_1=g(\epsilon,t),x_2=h(\epsilon,t)}\frac{\partial h(\epsilon,t)}{\partial\epsilon}$$
With the particular choice ##f(x_1,x_2) = L(x_1,x_2)##, ##g(\epsilon,t) = q(t)+\epsilon\eta(t)## and ##h(\epsilon,t)=\dot q(t)+\epsilon\dot\eta(t)##. ##q(t)## and ##\dot q(t)## are just functions of ##t##. It doesn't matter whether they are related or not, because we are doing partial derivatives with respect to ##\epsilon## only, so the ##t##-dependence or any other dependence doesn't play a role. Here's a simple example: ##f(x,t)=x g(t)+x^2##. Then ##\frac{\partial f}{\partial x} = g(t)+2x##. It doesn't matter what ##g(t)## is as long as we know that it doesn't depend on ##x##. If you want to review the chain rule, you might want to have a look at http://en.wikipedia.org/wiki/Chain_rule
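The simple example can be checked numerically for a few unrelated choices of ##g(t)## (the choices below are arbitrary, picked only for illustration). Since ##f## is quadratic in ##x##, a central difference in ##x## reproduces ##g(t)+2x## up to rounding, whatever ##g## is.

```python
import math

# The example from the post: f(x, t) = x*g(t) + x**2, where g depends on t only.
def make_f(g):
    def f(x, t):
        return x * g(t) + x**2
    return f

def partial_x(f, x, t, h=1e-6):
    """Central-difference approximation of the partial derivative df/dx at (x, t)."""
    return (f(x + h, t) - f(x - h, t)) / (2 * h)

def cubic(t):
    return t**3 - 7

# Three unrelated choices of g; the formula g(t) + 2x should fit all of them.
for g in (math.sin, math.exp, cubic):
    f = make_f(g)
    x, t = 1.5, 0.8
    print(g.__name__, partial_x(f, x, t), g(t) + 2 * x)
```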


Think I got it now. ##\dot{q}## is not a function of ##q##; it's only a function of the variable ##t##. So if we take the partial derivative of ##L(q,\dot{q},t)## with respect to ##\epsilon##, we know that the partial derivative with respect to ##q## (i.e. ##\frac{\partial L}{\partial q}##) will "see ##\dot{q}## as a constant" and vice versa, just as the chain rule predicts. Right? THANKS A LOT FOR ALL YOUR HELP, GUYS

Indeed :)

Yes, I think you got it. One last thing should be noted though: For the derivative with respect to ##\epsilon##, it doesn't matter whether ##\dot q## depends on ##t## or ##q##, as long as it doesn't depend on ##\epsilon##.
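This point can be illustrated numerically: at fixed ##t##, the ##\epsilon##-derivative of ##L(a(t)+\epsilon\eta(t),\,b(t)+\epsilon\dot\eta(t))## matches ##\frac{\partial L}{\partial x_1}\eta+\frac{\partial L}{\partial x_2}\dot\eta## both when ##b=\dot q## and when ##b## is something unrelated to ##\dot q##. The sketch assumes a pendulum-type Lagrangian ##L(x_1,x_2)=\frac12 x_2^2+\cos x_1##, the path ##q(t)=t^3##, the variation ##\eta(t)=\sin(\pi t)## and the deliberately unrelated second slot ##b(t)=q(t)^2##; none of these come from the thread.

```python
import math

# Assumed pendulum-type Lagrangian L(x1, x2) = x2**2/2 + cos(x1) and its partials.
def L(x1, x2):
    return 0.5 * x2**2 + math.cos(x1)

def dL_dx1(x1, x2):
    return -math.sin(x1)

def dL_dx2(x1, x2):
    return x2

def eta(t):
    return math.sin(math.pi * t)

def etadot(t):
    return math.pi * math.cos(math.pi * t)

def eps_derivative(a, b, t, h=1e-6):
    """d/d(eps) L(a(t) + eps*eta(t), b(t) + eps*etadot(t)) at eps = 0 (central difference)."""
    def F(eps):
        return L(a(t) + eps * eta(t), b(t) + eps * etadot(t))
    return (F(h) - F(-h)) / (2 * h)

def chain_rule(a, b, t):
    """dL/dx1 * eta + dL/dx2 * etadot, with the partials evaluated at (a(t), b(t))."""
    return dL_dx1(a(t), b(t)) * eta(t) + dL_dx2(a(t), b(t)) * etadot(t)

def q(t):
    return t**3

def qdot(t):
    return 3 * t**2

def q_squared(t):
    # Depends on q and is deliberately unrelated to qdot.
    return q(t) ** 2

for b in (qdot, q_squared):
    for t in (0.2, 0.5, 0.9):
        print(b.__name__, t, eps_derivative(q, b, t), chain_rule(q, b, t))
```

In both cases the two columns agree: only the fact that ##a## and ##b## don't depend on ##\epsilon## is used.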

THANKS A LOT FOR ALL YOUR HELP, GUYS
No problem.

Well, there might be such functions
On the contrary: there are no such functions. There is only one possible pair of functions which works for all paths: the one given by the chain rule. I would guess that this uniqueness is a direct consequence of your tangent bundle approach (but I need to study this field more; my knowledge of these issues is quite shallow at present), and anyway, I have given a more elementary proof.

The one that has been derived using the chain rule is right.
It is in fact the only one. But my point is that this uniqueness is not obvious and has to be proved.
For suppose that we don't know about this uniqueness. Suppose also that we derive the formulas ##\dot x=-\cos(\varphi)\dot\varphi## and ##\dot y=\sin(\varphi)\dot\varphi## without using the chain rule, for example by a geometric argument (like: "The norm of the velocity vector is ##|\dot\varphi|##, since the particle moves along the unit circle in the ##xy##-plane with angular velocity ##\dot \varphi##, and its direction is obtained from the direction of the position vector ##(x,y)=(\cos \varphi,\sin\varphi)## by rotating it 90 degrees counterclockwise if ##\dot\varphi>0##, clockwise if ##\dot\varphi <0##. It follows that ##\dot x=f(\varphi,\dot\varphi)=-\cos(\varphi)\dot\varphi## and ##\dot y=g(\varphi,\dot\varphi)=\sin(\varphi)\dot\varphi##.") We then plug this into the expression for ##L## and proceed by differentiating with respect to ##\varphi## and ##\dot\varphi##, thereby using the partial derivatives of ##f## and ##g##.
But then, we suddenly hesitate and say: "Are these functions ##f(u,v)=-\cos (u)\,v## and ##g(u,v)=\sin (u)\,v## really the same functions as those we obtain from the chain rule? If not, we will get the wrong result."
Of course, in this simple case it is easy to see that the chain rule gives the same functions, but one can imagine more complicated cases, with complicated coordinate transformation formulas, where geometric or other alternative methods of obtaining these functions would be considerably simpler than using the chain rule. If we then don't know that these functions are unique, we would have to check with the chain rule that we obtained the right ones. If we know that they are unique, this is not necessary.

In applications, we often see such alternative derivations. Wouldn't it be nice to not have to check with the chain rule every time? If we prove the uniqueness, we know that this can be avoided.

On the contrary: there are no such functions. There is only one possible pair of functions which work for all paths: the one given by the chain rule.
Maybe this is a true statement. But even if it weren't true, this wouldn't be a problem at all. It's just not relevant for Lagrangian mechanics.

I would guess that this uniqueness is a direct consequence of your tangent bundle approach (but I need to study this field more; my knowledge of these issues is quite shallow at present), and anyway, I have given a more elementary proof.
I didn't prove any uniqueness properties using the tangent bundles. I just explained the general formalism.

It is in fact the only one. But my point is that this uniqueness is not obvious and has to be proved.
I agree that the uniqueness of these functions isn't obvious, but I don't agree that it has to be proved in order to make the formalism work. I'm not arguing that it is wrong (in fact I don't know). I'm just arguing that no such theorem is required. Maybe it is useful in some other field of mathematics.

For suppose that we don't know about this uniqueness. Suppose also that we derive the formulas ##\dot x=-\cos(\varphi)\dot\varphi## and ##\dot y=\sin(\varphi)\dot\varphi## without using the chain rule, for example by a geometric argument (like: "The norm of the velocity vector is ##|\dot\varphi|##, since the particle moves along the unit circle in the ##xy##-plane with angular velocity ##\dot \varphi##, and its direction is obtained from the direction of the position vector ##(x,y)=(\cos \varphi,\sin\varphi)## by rotating it 90 degrees counterclockwise if ##\dot\varphi>0##, clockwise if ##\dot\varphi <0##. It follows that ##\dot x=f(\varphi,\dot\varphi)=-\cos(\varphi)\dot\varphi## and ##\dot y=g(\varphi,\dot\varphi)=\sin(\varphi)\dot\varphi##.") We then plug this into the expression for ##L## and proceed by differentiating with respect to ##\varphi## and ##\dot\varphi##, thereby using the partial derivatives of ##f## and ##g##.
But then, we suddenly hesitate and say: "Are these functions ##f(u,v)=-\cos (u)\,v## and ##g(u,v)=\sin (u)\,v## really the same functions as those we obtain from the chain rule? If not, we will get the wrong result."
If your reasoning was mathematically sound, you would automatically get the same result as if you had applied the chain rule. Otherwise, mathematics would be fundamentally flawed, since there can't be contradictory answers to the same problem. You don't need any uniqueness proof for this to work. On the other hand, if you were using handwaving arguments, a uniqueness theorem wouldn't help either.

In applications, we often see such alternative derivations. Wouldn't it be nice to not have to check with the chain rule every time? If we prove the uniqueness, we know that this can be avoided.
As I said, it's not the uniqueness of these functions that makes other derivations possible. It's the consistency of mathematics that makes them work.

If your reasoning was mathematically sound, you would automatically get the same result as if you had applied the chain rule.
I don't think you can prove this without using uniqueness.

Having established that the chain rule gives the correct answer, everything that doesn't give the same answer as the chain rule gives a wrong answer (or it gives an equally valid answer, which would be fine as well). Assuming mathematics is non-contradictory, you can't derive a wrong answer from true statements, so you can't derive an expression for ##\dot x## that yields wrong Euler-Lagrange equations.

--
By the way, having thought about this for a minute, I'm not sure what the statement is that you want to prove. Is it ##\forall t (f(q(t),\dot q(t))=f'(q(t),\dot q(t))) \Rightarrow f = f'##? If it holds for all ##t##, it also holds for ##t=0##, so if ##q(0)## and ##\dot q(0)## can be arbitrary and cover the whole domain of ##f## and ##f'##, this is trivially true and if they don't cover the whole domain of ##f## or ##f'## (e.g. ##q## is an angle between ##0## and ##2\pi##), it's easy to find counterexamples by using piecewise defined functions for example.

I meant ##\forall q\forall t (f(q(t),\dot q(t))=f'(q(t),\dot q(t))) \Rightarrow f = f'##, and my proof of this is essentially the same as yours above (for every point in the domain of ##f## and ##f'## and every allowed ##t##, there is a path such that ##(q(t),\dot q(t))## equals this point, but you are right that it is sufficient to consider ##t=0##).
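Written out, the covering argument goes roughly like this, assuming every state ##(u,v)\in\mathbb R^2## is attainable and paths only need to be differentiable. Fix a state ##(u,v)## and a time ##t_0##, and take the straight-line path
$$q(t)=u+v\,(t-t_0)\quad\Rightarrow\quad q(t_0)=u,\qquad \dot q(t_0)=v .$$
Applying the hypothesis to this path at ##t=t_0## gives
$$f(u,v)=f(q(t_0),\dot q(t_0))=f'(q(t_0),\dot q(t_0))=f'(u,v),$$
and since ##(u,v)## was arbitrary, ##f=f'##. If instead ##q## is restricted (e.g. to ##[0,2\pi)##) while ##f## and ##f'## are defined on all of ##\mathbb R^2##, the hypothesis says nothing outside the attainable states, and a piecewise definition such as
$$f'(u,v)=\begin{cases} f(u,v), & 0\le u<2\pi,\\ f(u,v)+1, & \text{otherwise}\end{cases}$$
agrees with ##f## along every admissible path while ##f'\neq f##.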
I don't agree that this is trivial, but it is easy when you think of it. All I want is that textbook authors point this out.