Well, part of the problem you're having is probably that I haven't written something quite accurate. I knew something was bothering me, but I hadn't yet pinned it down exactly.
Basically, the point is that, under the tempered distribution definition, \delta(x - y) is the operator that takes a pair of test functions f and g and produces the number
\delta(x - y) \star (f(x), g(y)) := \int_{-\infty}^{+\infty} f(x) g(x) \, dx
(Of course, \star is more usually written in integral notation, as \int \int \delta(x - y) f(x) g(y) \, dx \, dy.)
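If it helps to see that definition numerically, here is a rough sketch. It is only an illustration under my own assumptions: I replace \delta by a narrow unit-area Gaussian of width eps (one common approximation, not the only one), and the names nascent_delta and pair_with_delta, as well as the particular f and g, are made up for the example. The double integral of the smeared delta against f(x) g(y) should approach \int f(x) g(x) \, dx as eps shrinks.

```python
import math

def nascent_delta(u, eps):
    """Unit-area Gaussian of width eps; an assumed stand-in for delta(u)."""
    return math.exp(-u * u / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))

# Two rapidly decaying test functions, chosen only for illustration.
def f(x):
    return math.exp(-x * x)

def g(y):
    return math.exp(-(y - 1.0) ** 2)

def pair_with_delta(f, g, eps, lo=-8.0, hi=8.0, n=400):
    """Midpoint-rule estimate of the double integral of
    nascent_delta(x - y) * f(x) * g(y) over [lo, hi] x [lo, hi]."""
    h = (hi - lo) / n
    pts = [lo + (i + 0.5) * h for i in range(n)]
    fx = [f(x) for x in pts]
    gy = [g(y) for y in pts]
    total = 0.0
    for i, x in enumerate(pts):
        for j, y in enumerate(pts):
            total += nascent_delta(x - y, eps) * fx[i] * gy[j]
    return total * h * h

# Closed form of the right-hand side, int f(x) g(x) dx, for this f and g.
exact = math.sqrt(math.pi / 2) * math.exp(-0.5)

for eps in (0.5, 0.2, 0.1):
    print(eps, pair_with_delta(f, g, eps), exact)
```

The grid spacing has to stay well below eps for the sum to be trustworthy, which is why I don't push eps much smaller here.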
In Dirac notation, distributions can be written as bras or kets or other operators; for example, the operator T defined by T(x) \star f(x) := f(0) would be the bra \langle 0 \mid (assuming we were writing our functions as kets).
And so, I wonder to myself why we are writing \delta(x - y), rather than expressing this distribution as a ket or a bra or an operator of some sort... but alas I'm not seeing how to do the conversion. I think it's because both x and y are explicit variables, not hidden ones. Everything looks like it's fine, but I still feel somehow uncomfortable about it.