I am watching Susskind's derivation of Newton's F=ma from the Euler-Lagrange equations (53 minutes into the lecture), for which he uses the Lagrangian of kinetic energy minus potential energy. I have seen this done elsewhere as well. As far as I can tell, and please correct me if I'm wrong, the only reason to do this is that by convention we define F as the negative of the derivative of the potential, so subtracting potential energy in the Lagrangian leads directly to F=ma and not -F=ma. I guess my real question is: is there any reason why we can't just change the convention of having F be equal to the negative derivative of potential energy, so that we would be able to write the Lagrangian as kinetic plus potential energy (as it feels like it should be)?

Conventions are arbitrary by definition. The only reason to adopt one is to avoid confusion of yourself or others. I do like to think of a change in energy ΔE=E2-E1 so a convention that allows that seems natural.

The thing is, we always say that energy is conserved. However, total energy is supposed to be kinetic plus potential. If the Lagrangian is kinetic minus potential then we're saying that that's conserved, not energy. Am I missing something?

If the Lagrangian does not depend on time explicitly, then the quantity that is conserved is the Hamiltonian. In many cases of interest, the Hamiltonian is in fact the total energy.
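
A quick numerical check of this point, as a minimal sketch: the unit-mass harmonic oscillator with U = (1/2) k x^2, the velocity Verlet integrator, and the names `simulate`, `spread_H` and `spread_L` are my own choices for illustration, not anything from the lecture. For L = (1/2) m v^2 - U the Hamiltonian is H = p v - L = T + U, so T+U should stay (nearly) constant along the motion while T-U should not.

```python
def simulate(m=1.0, k=1.0, x0=1.0, v0=0.0, dt=1e-3, steps=10_000):
    """Integrate m*x'' = -k*x with velocity Verlet.

    Returns two lists sampled at every step: T+U (the Hamiltonian)
    and T-U (the Lagrangian).
    """
    x, v = x0, v0
    a = -k * x / m  # Newton's law with F = -dU/dx = -k*x
    total, lagrangian = [], []
    for _ in range(steps):
        T = 0.5 * m * v * v
        U = 0.5 * k * x * x
        total.append(T + U)
        lagrangian.append(T - U)
        # velocity Verlet update
        x += v * dt + 0.5 * a * dt * dt
        a_new = -k * x / m
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return total, lagrangian


total, lagrangian = simulate()
spread_H = max(total) - min(total)            # how much T+U varies
spread_L = max(lagrangian) - min(lagrangian)  # how much T-U varies
print(f"variation of T+U over the run: {spread_H:.2e}")
print(f"variation of T-U over the run: {spread_L:.2e}")
```

The variation of T+U is only integrator error and shrinks as `dt` is reduced, whereas T-U swings between roughly -E and +E every half period. The Lagrangian is the thing whose time integral is made stationary, not a conserved quantity.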

You cannot. The original derivation of Lagrange's equations is based on the concept of virtual work, which nevertheless uses Newton's law to obtain the Lagrangian equations of motion, and the Lagrangian comes out as [kinetic - potential].

The force is defined with the minus because then ## \Delta T = \int F dr = \int (-dU/dr) dr = - \Delta U ##, and then ## \Delta T + \Delta U = 0 ##, which is conservation of energy.

And with that convention, the Lagrangian has to be T - U to match Newton's equations.
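
To spell that step out, here is a sketch for one particle in one dimension, assuming ## T = \frac{1}{2} m \dot{x}^2 ## and ## U = U(x) ## (the one-dimensional setup is my simplification). With the usual sign:

[tex]\mathcal{L} = \frac{1}{2} m \dot{x}^2 - U(x) \quad\Rightarrow\quad \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{x}} - \frac{\partial \mathcal{L}}{\partial x} = m\ddot{x} + \frac{dU}{dx} = 0 \quad\Rightarrow\quad m\ddot{x} = -\frac{dU}{dx} = F[/tex]

With a plus sign instead:

[tex]\mathcal{L}' = \frac{1}{2} m \dot{x}^2 + U(x) \quad\Rightarrow\quad m\ddot{x} - \frac{dU}{dx} = 0 \quad\Rightarrow\quad m\ddot{x} = +\frac{dU}{dx} = -F[/tex]

So T + U as a Lagrangian only reproduces Newton's law if you also redefine ## F = +dU/dr ##. But then ## \Delta T = \int F dr = +\Delta U ##, and it is T - U rather than T + U that stays constant. The minus sign does not go away; it just moves from the Lagrangian into the conserved quantity.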

That's really an insufficient explanation when one can derive Lagrange's equations of motion and from there find that the Lagrangian often takes the form T-V; and if not, that still offers no explanation for why one would choose such a definition.

I haven't watched the video, but the minus comes from Hamilton's Principle, which kinda says "Of all the possible paths in a time frame the path that is taken is the one that minimizes the time integral of the DIFFERENCE between potential and kinetic energy". I could throw in more fancy terms and make this more precise, but that's what it generally states. Mathematically it looks like: [tex]{\delta \int_{t_i}^{t_f} (T-U) dt=0}[/tex].

We expand this concept along with generalized coordinates to get the Lagrangian. Since you're watching classical mechanics videos, I don't think I need to go any further!

" path that is taken is the one that minimizes the time integral"
Incorrect.
Variational principles, such as Hamilton's principle or, for that matter, Luke's variational principle in fluid mechanics, yield the equations of motion as the condition for a STATIONARY value of the integral, not necessarily a minimizing value.

I don't know whether this is helpful, or not, but there's a way to see that the minus sign is actually connected with the minus sign appearing in the relativistic space-time metric.

Relativistically, we combine the energy [itex]E[/itex] and the momentum [itex]\vec{p}[/itex] into a 4-vector [itex]P[/itex]. We combine a spatial distance [itex]\delta \vec{x}[/itex] and a temporal separation [itex]\delta t[/itex] into a 4-vector [itex]\delta X[/itex]. The action for a particle with energy-momentum 4-vector [itex]P[/itex] traveling through a spacetime interval [itex]\delta X[/itex] is given by: [itex]P \cdot \delta X = \vec{p} \cdot \delta \vec{x} - E \delta t[/itex], which we can rewrite as [itex](\vec{p} \cdot \vec{v} - E) \delta t[/itex], where [itex]\vec{v}[/itex] is the velocity [itex]\frac{\delta \vec{x}}{\delta t}[/itex].

If the particle is moving slowly compared to the speed of light, then we can write, approximately:

[itex]\vec{p} = m \vec{v}[/itex]
[itex]E = \frac{1}{2} m v^2 + U[/itex]

So the action becomes
[itex](\vec{p} \cdot \vec{v} - E) \delta t = (m v^2 - (\frac{1}{2} m v^2 + U)) \delta t = (\frac{1}{2} m v^2 - U) \delta t[/itex]

If you assume the force is derived from the gradient of a potential U, then Newton's equation can be recast as the Euler-Lagrange equation, where you define the Lagrangian as [tex] \mathcal{L} = T - U. [/tex]
The Euler-Lagrange equation closely resembles the solution of the least-action principle that people used in optics (see Maupertuis's work, for instance), and that's why they defined the action in classical mechanics and, consequently, the Hamiltonian via the canonical formalism, defining the total energy as [tex] \mathcal{H} = T + U. [/tex]