Fredrik said:
I have a follow-up question about that. How do mathematicians express and prove this identity?
\delta(x^2-a^2)=\frac{1}{2|x|}\Big(\delta(x-|a|)+\delta(x+|a|)\Big)
Once you have definitions, the way to proceed will probably be obvious.

The big problem is that if you haven't defined what an expression like \delta(x^2-a^2) might mean, you certainly can't prove any theorems about it! The 'physicist's derivation' you gave is a motivation: we would like to be able to choose a definition so that that calculation works out. And if you don't need a general theory of derivatives and composition, then you could just take identities like that as a definition, and then do a check to make sure they have the properties you want. (And take care not to use any properties you haven't checked.)
Now, if you wanted a general theory of composition, it would probably go something like this. (disclaimer: having never seen it before, I'm working this out from scratch, so it may or may not bear resemblance to how people actually do things)
The final calculation is at the end, after a separator.
First, note that duality gives us, for any function f:V \to W, a dual function f^*:W^* \to V^*. In inner-product-like notation, this is defined by
\langle f^*(\omega), v \rangle := \langle \omega, f(v) \rangle
This has a similarly simple expression in functional notation, but it will cause some notational confusion with the theory of composition I want to derive, so I won't state it.
Suppose that V and W are spaces of functions on X and Y. If we have a good map f:X \to Y, we get another kind of dual mapping f^* : W \to V defined by composition: f^*(w)(x) = w(f(x)). And, of course, we get the dual dual mapping f^{**} : V^* \to W^*, which I'm going to rename as f_*.
f^* here is sometimes called a 'pullback', and f_* a 'pushforward'.
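(Not part of the argument, but a finite-dimensional toy model may help make the pullback/pushforward pair concrete: if X and Y are finite sets, functions on them are just vectors, the pullback f^* is a 0-1 matrix, and the pushforward f_* is its transpose. Everything below -- the sets, the map, the vectors -- is my own made-up example.)

```python
import numpy as np

# Toy model: X = {0, 1, 2}, Y = {0, 1}, and f: X -> Y given by a lookup table.
f = [0, 1, 1]  # f(0)=0, f(1)=1, f(2)=1

# Pullback f^*: functions on Y -> functions on X, (f^* w)(x) = w(f(x)).
def pullback(w):
    return np.array([w[f[x]] for x in range(3)])

# As a matrix, f^* is P with P[x, y] = 1 iff f(x) == y; the pushforward
# f_* = f^{**} acting on dual vectors is then the transpose P^T.
P = np.zeros((3, 2))
for x in range(3):
    P[x, f[x]] = 1.0

w = np.array([2.0, 5.0])             # a "function" on Y
v_dual = np.array([1.0, 3.0, -2.0])  # a "dual vector" (measure) on X

# Defining property of the dual: <f_*(v_dual), w> == <v_dual, f^*(w)>
lhs = (P.T @ v_dual) @ w
rhs = v_dual @ pullback(w)
print(lhs, rhs)
```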
Suppose we have an inner product on a vector space V. For any v in V, the inner product lets us define the 'transpose' of v to be an element of V^* as follows, written in inner-product-like notation on the left, and actual inner-product notation on the right:
\langle v^T, w \rangle := \langle v, w \rangle. Henceforth, I will not explicitly write the transpose operation.
Note I've done nothing new or specific to this situation -- the above is just basic operations in the arithmetic of functions.
Now, we can already do some calculations! Let's fix a function f : \mathbb{R} \to \mathbb{R}, use the standard inner product, and let \phi be a test function. Then, we have f^*(\phi)(x) = \phi(f(x)). What about the pushforward map?
Well, let's suppose that f is invertible and increasing, with inverse g. For a test function \psi, we can make the following calculation:
\langle f_*(\psi), \phi \rangle = \langle \psi, f^*(\phi) \rangle
= \int_{-\infty}^{+\infty} \psi(x) \phi(f(x)) \, dx
= \int_{-\infty}^{+\infty} \psi(g(y)) \phi(y) g'(y) \, dy
= \langle g' g^*(\psi), \phi \rangle
thus giving us f_*(\psi) = g' g^*(\psi). (Again, recall that I'm suppressing the transpose operation)
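As a numerical sanity check of f_*(\psi) = g' g^*(\psi): everything concrete below (the choice of f, the test functions, the quadrature) is my own example, not part of the argument.

```python
import math

# Check <f_*(psi), phi> = <g' g^*(psi), phi> for a specific increasing f.
def f(x):  return x**3 + x            # strictly increasing, so invertible
def fp(x): return 3 * x**2 + 1        # f'

def g(y):                             # inverse of f, computed by bisection
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def gp(y):                            # g'(y) = 1 / f'(g(y))
    return 1.0 / fp(g(y))

def phi(x): return math.exp(-x**2)         # test function
def psi(x): return math.exp(-(x - 1)**2)   # another test function

def integrate(h, a=-8.0, b=8.0, n=4000):   # plain midpoint rule
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

# <f_*(psi), phi> = <psi, f^*(phi)> = int psi(x) phi(f(x)) dx ...
lhs = integrate(lambda x: psi(x) * phi(f(x)))
# ... should match <g' g^*(psi), phi> = int g'(y) psi(g(y)) phi(y) dy
rhs = integrate(lambda y: gp(y) * psi(g(y)) * phi(y))
print(lhs, rhs)
```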
We can also vary the calculation slightly to arrive at another interesting result:
\langle \psi \circ f, \phi \rangle = \langle f^* (\psi), \phi \rangle
= \langle f' f^* (\psi), \phi / f' \rangle
= \langle g_* \psi, \phi / f' \rangle
= \langle g_* (\psi) / f', \phi \rangle
and so I'm inspired to make the following definitions:
Definition: If f is a good function, and \omega is a distribution, then define f \omega by \langle f \omega, \phi \rangle = \langle \omega, f \phi \rangle.
Definition: If f is a good, increasing function with inverse g, then for any distribution \omega, define \omega \circ f = g_*(\omega) / f'.
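A quick check that the second definition agrees with ordinary composition when \omega is itself a smooth function: unwinding the pairings gives \langle g_*(\psi)/f', \phi \rangle = \langle \psi, g^*(\phi/f') \rangle, which we can integrate directly. (The concrete f, its inverse g, and the test functions below are my own examples.)

```python
import math

# omega o f = g_*(omega)/f' versus ordinary composition, for omega = psi smooth.
def f(x):  return math.exp(x)   # increasing, with image (0, infinity)
def fp(x): return math.exp(x)   # f' = f
def g(y):  return math.log(y)   # inverse of f

def psi(x): return math.exp(-(x - 2)**2)
def phi(x): return math.exp(-x**2)

def integrate(h, a, b, n=4000):  # plain midpoint rule (never hits endpoints)
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

# <psi o f, phi> computed directly ...
lhs = integrate(lambda x: psi(f(x)) * phi(x), -6.0, 6.0)
# ... and via the definition: <psi, g^*(phi/f')> = int psi(y) phi(g(y)) / f'(g(y)) dy
rhs = integrate(lambda y: psi(y) * phi(g(y)) / fp(g(y)), 0.0, 12.0)
print(lhs, rhs)
```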
The above still works for multivariable test functions and distributions; the appropriate condition on f is that it's invertible with positive Jacobian determinant. At this point, I'm going to assume we have also defined partial integration of multivariable distributions (i.e. evaluating a 2-variable distribution at a 1-variable test function to produce a 1-variable distribution -- this is extremely similar to tensor contraction). I will also assume we've worked out the properties of composition as defined above.
So now, my magic trick to define arbitrary composition distributions is to convert to the invertible case by adding a variable, by virtue of the fact that the following is invertible:
u = x
v = y + f(x)
with Jacobian 1.
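(A quick finite-difference sanity check of that Jacobian claim; the particular f below is an arbitrary example of mine.)

```python
import math

# The map (x, y) -> (u, v) = (x, y + f(x)) should have Jacobian determinant 1
# for any differentiable f; its Jacobian matrix is [[1, 0], [f'(x), 1]].
def f(x): return math.sin(x) + x**2   # arbitrary example

def T(x, y): return (x, y + f(x))

def jac_det(x, y, h=1e-6):
    # central differences for the 2x2 Jacobian of T at (x, y)
    du_dx = (T(x + h, y)[0] - T(x - h, y)[0]) / (2 * h)
    du_dy = (T(x, y + h)[0] - T(x, y - h)[0]) / (2 * h)
    dv_dx = (T(x + h, y)[1] - T(x - h, y)[1]) / (2 * h)
    dv_dy = (T(x, y + h)[1] - T(x, y - h)[1]) / (2 * h)
    return du_dx * dv_dy - du_dy * dv_dx

d = jac_det(0.7, -1.3)
print(d)   # should be very close to 1
```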
Now, if \omega is a one-variable distribution, it can also be viewed as a two-variable distribution by adding a dummy variable. Heuristically speaking, applying the above transformation would give
\iint \omega(v) \phi(u) \psi(v - f(u)) \, du \, dv
= \iint \omega(y + f(x)) \phi(x) \psi(y) \, dx \, dy
Note that this is well defined, because we simply composed a two-parameter distribution with an invertible function! Now, if partial integration with respect to x gives us an honest-to-goodness test function, then we have
\iint \omega(y + f(x)) \phi(x) \psi(y) \, dx \, dy
= \int g(y) \psi(y) \, dy
And so we can make the following definition:
Definition: Let \omega be a distribution, f a good function, and \phi a test function. Suppose there is a good function g such that, for every test function \psi, we have the identity \langle \omega(y), \phi(x) \psi(y - f(x)) \rangle = \langle g, \psi \rangle (where the first inner product is over both variables). Then we define \langle \omega \circ f, \phi \rangle = g(0).
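To check that this definition is at least consistent with ordinary composition, take \omega to be a smooth function, so \omega \circ f should reduce to x \mapsto \omega(f(x)). A rough numerical sketch (all concrete choices of \omega, f, \phi, \psi below are mine; note that f need not be invertible here):

```python
import math

# Consistency check of the definition for a regular distribution omega.
def omega(y): return math.exp(-y**2 / 4)
def f(x):     return x**3 - x               # not monotonic!
def phi(x):   return math.exp(-x**2)
def psi(y):   return math.exp(-(y - 0.5)**2)

def integrate(h, a=-8.0, b=8.0, n=1200):    # plain midpoint rule
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

# g(y) = int omega(y + f(x)) phi(x) dx  -- the partial integration over x
def g(y):
    return integrate(lambda x: omega(y + f(x)) * phi(x))

# <omega(y), phi(x) psi(y - f(x))>, a genuine double pairing ...
lhs = integrate(lambda x: phi(x) * integrate(lambda y: omega(y) * psi(y - f(x))))
# ... equals <g, psi> after the substitution y -> y + f(x):
rhs = integrate(lambda y: g(y) * psi(y))
# and g(0) recovers <omega o f, phi> computed the ordinary way:
direct = integrate(lambda x: omega(f(x)) * phi(x))
print(lhs, rhs, g(0), direct)
```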
--------------------------------------------------------------------------------------
Now, let's compute
\iint \delta(v) \phi(u) \psi(v - u^2 + a^2) \, du \, dv
= \int \phi(u) \psi(a^2 - u^2) \, du
= \int_{-\infty}^{a^2} \frac{\phi(\sqrt{a^2 - x}) + \phi(-\sqrt{a^2 - x})}{2 \sqrt{a^2 - x}} \psi(x) \, dx
And so we have (assuming the integrand is 'good'):
\langle \delta(x^2 - a^2), \phi(x) \rangle
= \frac{\phi(|a|) + \phi(-|a|)}{2|a|}
when a \neq 0, and finally
\delta(x^2 - a^2) = \frac{1}{2|a|}\left( \delta(x - a) + \delta(x + a) \right)
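And a crude numerical check of that final identity, approximating \delta by a narrow Gaussian (the value of a, the test function \phi, the width eps, and the grid are all my own choices):

```python
import math

# int delta_eps(x^2 - a^2) phi(x) dx should approach (phi(a) + phi(-a)) / (2|a|).
def delta_eps(t, eps=1e-3):
    # normalized Gaussian of width eps, a standard approximation of delta
    return math.exp(-t**2 / (2 * eps**2)) / (eps * math.sqrt(2 * math.pi))

def phi(x): return math.exp(-(x - 0.3)**2)   # a test function

a = 1.5

def integrate(h, lo=-5.0, hi=5.0, n=200000):  # midpoint rule, fine grid
    dx = (hi - lo) / n
    return sum(h(lo + (i + 0.5) * dx) for i in range(n)) * dx

lhs = integrate(lambda x: delta_eps(x**2 - a**2) * phi(x))
rhs = (phi(a) + phi(-a)) / (2 * abs(a))
print(lhs, rhs)
```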
Note that \delta(x^2 - a^2) is undefined for a = 0. More interestingly, you can let a be a variable rather than a constant (or maybe a 'variable constant'), and now this expression is distributional in a.