Well, the unfortunate thing, pedagogically, is that in teaching about eigenfunctions and eigenvalues, the most obvious operators to use for examples are the position operator and the momentum operator. But those operators don't correspond to normalizable eigenfunctions. So you're left without any examples to motivate the idea of an eigenvalue and eigenfunction.

I suppose you can use matrices to motivate the ideas. But for wave functions, it's hard to see what's a simple example that doesn't have some problem or other. Maybe the harmonic oscillator wave function being an eigenfunction of energy?
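
As a sketch of that harmonic-oscillator suggestion (the symbols ##m##, ##\omega## and ##\hbar## are the usual mass, angular frequency and reduced Planck constant, assumed here rather than taken from the thread), the ground state is a normalizable energy eigenfunction. With
$$\hat{H}=-\frac{\hbar^2}{2m}\frac{\mathrm{d}^2}{\mathrm{d} x^2}+\frac{m \omega^2}{2} x^2, \qquad \psi_0(x)=\left(\frac{m \omega}{\pi \hbar}\right)^{1/4} \exp\left(-\frac{m \omega x^2}{2 \hbar}\right),$$
differentiating twice gives
$$\psi_0''(x)=\left(\frac{m^2 \omega^2 x^2}{\hbar^2}-\frac{m \omega}{\hbar}\right)\psi_0(x) \quad\Rightarrow\quad \hat{H}\psi_0=\frac{\hbar \omega}{2}\psi_0, \qquad \int_{\mathbb{R}} \mathrm{d} x \, |\psi_0(x)|^2=1.$$
So this example has a genuine eigenvalue, ##\hbar \omega/2##, and a square-integrable eigenfunction.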

There are actually a few ways to get around the difficulty that a function like ##e^{ikx}## cannot be normalized. One way is to consider a particle in a finite region of space (a particle in a box) and let the width of the box tend to infinity. The other way is not to treat ##e^{ikx}## as a pure quantum state at all but, more realistically, to work with a wave packet. The difficulty comes from the fact that for a function like ##e^{ikx}## the particle is not confined to any region of space, so that the probability of finding it in any finite region is zero.

The probability of finding it everywhere is one - quirky distribution theory stuff. You must be careful about taking limits. Your box tending to infinity is analogous.
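
A minimal numerical sketch of the box limit may help here. It assumes a plane wave normalized on the box ##[-L/2, L/2]## as ##e^{ikx}/\sqrt{L}##; the function names, the midpoint rule and the chosen values of ##k## and ##L## are illustrative choices, not anything from the posts above.

```python
import cmath

def box_plane_wave(x, k, L):
    """Plane wave normalized on the box [-L/2, L/2]; zero outside (assumption of this sketch)."""
    if abs(x) > L / 2:
        return 0.0
    return cmath.exp(1j * k * x) / L ** 0.5

def probability(a, b, k, L, n=2000):
    """Midpoint-rule estimate of the integral of |psi|^2 over [a, b]."""
    dx = (b - a) / n
    return sum(abs(box_plane_wave(a + (i + 0.5) * dx, k, L)) ** 2 for i in range(n)) * dx

k = 2.0  # arbitrary wave number
for L in (10.0, 100.0, 1000.0):
    total = probability(-L / 2, L / 2, k, L)   # whole box
    window = probability(-1.0, 1.0, k, L)      # fixed window of width 2
    print(f"L={L:7.1f}  total={total:.6f}  P(|x|<1)={window:.6f}")
```

For every box width the total probability stays at 1, while the probability in the fixed window is ##2/L## and shrinks as the box grows. Both statements hold at every finite ##L##, which is why the order of the limits matters.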

Again: The function ##u_k(x)=\exp(\mathrm{i} k x)## does not represent a quantum state! It's not square integrable! It is a generalized eigenstate of momentum and belongs not to the Hilbert space ##\mathrm{L}^2(\mathbb{R},\mathbb{C})## but to the dual of the domain of the self-adjoint operators for position and momentum. It's very important to emphasize this in order not to be confused as a beginner in learning QT!
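
Written out as a short check (a sketch, using the standard integral representation of the ##\delta## distribution), the norm integral diverges, while the overlap of two such functions is a distribution rather than a number:
$$\int_{\mathbb{R}} \mathrm{d} x \, |u_k(x)|^2=\int_{\mathbb{R}} \mathrm{d} x \, 1=\infty, \qquad \int_{\mathbb{R}} \mathrm{d} x \, u_{k'}^*(x) u_k(x)=\int_{\mathbb{R}} \mathrm{d} x \, \exp[\mathrm{i} (k-k') x]=2 \pi \delta(k-k').$$
This "normalization to a ##\delta## distribution" is what replaces the usual ##\langle \psi|\psi \rangle=1## for generalized eigenstates.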

The Fourier transform of the generalized position eigenstate is
$$F(k)=\int_{\mathbb{R}} \mathrm{d} x \, \delta(x-x_0) \exp(-\mathrm{i} k x)=\exp(-\mathrm{i} k x_0).$$
I used the usual convention for the Fourier transform as in high-energy physics concerning the sign and the factors of ##2 \pi##, i.e., with a minus sign in the exponential for the integral over position to get the wave function in momentum representation and no factors of ##2 \pi## in this integral:
$$\tilde{\psi}(k)=\int_{\mathbb{R}} \mathrm{d} x \psi(x) \exp(-\mathrm{i} k x).$$
The inverse transformation then reads
$$\psi(x)=\int_{\mathbb{R}} \frac{\mathrm{d} k}{2 \pi} \tilde{\psi}(k) \exp(+\mathrm{i} k x).$$
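
The placement of the ##2 \pi## can be checked numerically with a round trip. The sketch below uses only the Python standard library; the Gaussian test function, the integration cutoff and the midpoint rule are arbitrary choices made for illustration.

```python
import cmath
import math

def integrate(f, a, b, n=400):
    """Midpoint rule for a (possibly complex-valued) integrand on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def psi(x):
    """Test function (an arbitrary choice): unnormalized Gaussian."""
    return math.exp(-x * x / 2)

def psi_tilde(k, cutoff=10.0):
    """Forward transform: integral dx psi(x) exp(-i k x), no factor of 2*pi."""
    return integrate(lambda x: psi(x) * cmath.exp(-1j * k * x), -cutoff, cutoff)

def psi_back(x, cutoff=10.0):
    """Inverse transform: integral dk/(2*pi) psi_tilde(k) exp(+i k x)."""
    return integrate(lambda k: psi_tilde(k) * cmath.exp(1j * k * x), -cutoff, cutoff) / (2 * math.pi)

for x in (0.0, 0.7, 1.5):
    print(x, psi(x), psi_back(x).real)
```

With this convention the Gaussian ##\exp(-x^2/2)## transforms to ##\sqrt{2 \pi} \exp(-k^2/2)##, and the printed round-trip values should agree with the original function to within the accuracy of the quadrature.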

Well, this is a pedagogical question, whether it is important. The way that a lot of science teaching works is that you tell students something that is oversimplified, at first, and then later get into the reasons why it was an oversimplification, and how it can be fixed. It's a matter of opinion whether this is doing a disservice to the student.

The treatments that I've seen start off talking about plane waves as momentum eigenstates (with a remark that they aren't square-integrable, but that there are ways of dealing with this).

Yes, and the pedagogical answer is that it is indeed important. I myself had quite some trouble understanding the concept of "generalized eigenstates", because nobody told us how to understand them properly. Of course, too much mathematical rigor is not helpful either, but sometimes some of it helps!

But you understand it now. The question is: Did it harm you for you to have to learn the more sophisticated approach on your own, as opposed to being taught it from the start?

Well, it didn't do me harm, but it also doesn't do harm to tell things correctly rather than be silent about the mathematical problems involved. I don't say that you should expose students to higher functional analysis right away, but one should give the usual plausibility arguments for how to understand distributions. The ##\delta## distribution is a good example. Among the best textbooks from a pedagogical point of view is

M. J. Lighthill, Introduction to Fourier Analysis and Generalised Functions, Cambridge University Press 1959

The way that most people learn about the more sophisticated approaches, though, is to first learn a naive approach, and then go back and learn the more sophisticated approach after learning about the problems in the naive approach. Starting off with the more rigorous approach without first trying the naive approach and learning its limitations would seem like unmotivated rigor for rigor's sake.

It didn't do any harm to me at all to learn what is in the paper I posted. I wish I had come across it before learning it by myself. I had to go through many tomes down the ANU library near where I lived at the time. It took me a long time because what was required was not in one place - but in tomes like Gelfand and Shilov and other mathematical references. It was a long sojourn sorting it out before returning to QM proper.

That's one reason I like Ballentine. It gives a good overview of what's going on without delving into deep waters like nuclear spaces.

I read the popularization Schrödinger's Cat back in about 1983. It mentioned that THE books were Dirac's Principles of QM and von Neumann's Mathematical Foundations of QM. I devoured both but recognized immediately, just as von Neumann did (he was scathing about it in his introduction), that Dirac was a crock - supremely elegant - but a crock. Von Neumann was easy because I already knew Hilbert spaces from my degree. But there had to be a reason Dirac worked. It was a long sojourn sorting it out.

I would rather have studied Ballentine first, then distribution theory, then rigged Hilbert spaces. I even knew something was cuckoo with the Dirac delta function from my study of differential equations, so I should have looked into that first.