OK, I think I now understand JesseM's definition of a "medium" as something having "a rest frame of its own". My old understanding was only partially technically correct, and completely wrong as to what motivated the definition. I'm summarizing my current understanding here. I also took into account comments from DaleSpam and rbj. Thank you all for commenting!
I'm going to use JesseM's idea that a medium has "a rest frame of its own", but with slightly different terminology:
an isotropic homogeneous medium defines a preferred reference frame in which the speed of a wave in any direction is the same.
If I have interpreted JesseM correctly, I think the uncontroversial part is:
1. I assume a Newtonian space in which Galilean relativity holds, and assign Cartesian coordinates. Instead of a box of air, I am going to use a box of solid material, and consider low-energy sound waves in the solid. If three solid boxes are put at different positions along the y-axis and pushed in the x direction with different velocities, each box defines its own preferred reference frame, so we have three preferred reference frames at any time.
2. If we have only one box, and we declare the box to be the whole universe, then we have a single preferred reference frame.
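To make points 1 and 2 concrete, here is a minimal sketch of the three-box setup, restricted to waves travelling along the x-axis. The sound speed and box velocities are made-up illustrative numbers, and the function names are my own. It only uses Galilean velocity addition to check in which frame the forward and backward wave speeds agree.

```python
def wave_velocities(c_s, box_velocity, frame_velocity):
    """Signed x-velocities of the +x-going and -x-going sound waves in a box,
    as measured in a frame moving at frame_velocity (Galilean addition)."""
    drift = box_velocity - frame_velocity  # box velocity as seen from the frame
    return (drift + c_s, drift - c_s)


def is_isotropic(c_s, box_velocity, frame_velocity, tol=1e-9):
    """True if the forward and backward wave speeds (magnitudes) agree."""
    forward, backward = wave_velocities(c_s, box_velocity, frame_velocity)
    return abs(abs(forward) - abs(backward)) < tol


C_S = 5000.0                        # assumed sound speed in the solid, m/s
BOX_VELOCITIES = [0.0, 10.0, 20.0]  # assumed box velocities along x, m/s

for v_box in BOX_VELOCITIES:
    # Each box's sound is isotropic only in the frame co-moving with that box.
    frames = [v for v in BOX_VELOCITIES if is_isotropic(C_S, v_box, v)]
    print(f"box at {v_box} m/s: isotropic in frame(s) moving at {frames} m/s")
```

Each box picks out exactly one of the three frames, namely its own rest frame, which is all that "preferred reference frame" means in point 1.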
OK, now I want to play with some definitions, and this will be a matter of taste, so I don't expect agreement:
3. In #2, since there is only one preferred reference frame, instead of saying that space is filled with a medium, we can just define the preferred reference frame to be a property of space itself. In this case, we don't need a "medium" for sound. A further justification for attributing it to the geometry of space is that even in this case we don't have just one preferred reference frame, but a preferred class of reference frames related to each other by spatial translations and rotations.
4. If we apply the operational definition of a medium in #1 to light (now in Minkowski space), we get a class of preferred reference frames, which are of course the inertial frames related by Lorentz transformations. In this case, "preferred class of reference frames" is standard nomenclature, but "equivalence class of rest frames" or "medium" is nonstandard. However, if one likes the idea that a photon has zero rest mass, then it doesn't seem so bad to say that the "preferred class of reference frames" is a "preferred class of rest frames".
5. Formally, there is a medium in situation 4, but by comparison with situation 3, we can get rid of the medium and just say that these are properties of space itself.
6. When Nobel Laureates like Wilczek or Laughlin say that quantum field theory is a descendant of the aether, they are referring to the aether or medium as being made of atoms, *not* as defining a preferred reference frame. This is because "stuff" like electrons and protons have all become waves, just like light. Furthermore, electrons, protons and photons are not fundamental; they are excitations of electron, proton and photon fields that permeate all space, and which have resting states (not frames) that are non-zero at each point in space. So the main difference between the two senses of "medium" is "a preferred class of reference frames" versus "stuff, like electrons and protons".
7. When a person says "sound requires a medium", I suggest he usually only has a vague idea of what he means. It could mean:
a) sound carries energy from one place to another, so it travels in something
b) sound travels in air, and air is made of atoms
c) sound defines a preferred reference frame (I'm sure this is usually the furthest from his mind).
The problem is that for sound, all three meanings are true and thus confused with each other!
8. For the corresponding definitions for light I would say:
a) This is called spacetime. We already have a name for it, so there is no need for a "medium" that is distinct from spacetime.
b) Light travels directly on spacetime in classical theory, but in quantum field theory, it is an excitation of a "medium" called the photon field, just as electrons are excitations of a "medium" called the electron field. We know we need the "medium", and not just its excitations, because of the Casimir effect.
c) Light defines a preferred class of reference frames, but rather than attributing this to an isotropic homogeneous medium, we attribute it to a symmetry of space, just as we would do for sound waves in a universe that was a solid box.
9. In an earlier post, I somehow felt that we only really need the idea of a medium when there are two media. I still feel that way, but I don't quite know how it fits in with the points above.
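Going back to point 4 for a moment: the reason light gives a whole class of preferred frames, rather than a single one as in point 1, can be seen from the standard one-dimensional velocity transformations. Below is a rough sketch comparing the Galilean rule with the relativistic one. The function names and the choice of units with c = 1 are my own, and the frame velocity is assumed to satisfy |v| < c.

```python
def galilean(u, v):
    """Velocity u of a signal, as seen from a frame moving at v (Newtonian)."""
    return u - v


def lorentz(u, v, c=1.0):
    """Same question using relativistic velocity addition; assumes |v| < c."""
    return (u - v) / (1.0 - u * v / c**2)


C = 1.0  # units chosen so that c = 1 (an assumption made for convenience)

for v in (0.0, 0.5, 0.9):
    # The Galilean result depends on v, so only one frame sees speed C.
    # The relativistic result for a light signal stays at +C and -C.
    print(
        f"frame at v={v}: "
        f"Galilean {galilean(C, v)}, {galilean(-C, v)}; "
        f"Lorentz {lorentz(C, v)}, {lorentz(-C, v)}"
    )
```

With the Galilean rule only the v = 0 frame sees the same speed in both directions, which is the situation for sound in a box. With the relativistic rule every inertial frame does, which is why in the light case it seems more natural to talk about a symmetry of spacetime than about a medium with one rest frame.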