Yes. But just to clarify, there are two things at work here. The first, as you mentioned, is the hope that the phase shifts are small enough that the received signal is still close to the original phase. In that case, all we do is map the received signal back to the closest valid point in phase space.
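As a rough sketch of that closest-match idea (the four phase values and bit labels below are illustrative assumptions, not from any particular standard):

```python
# Hypothetical QPSK-style constellation: four valid phases, 90 degrees apart.
VALID_PHASES = [45.0, 135.0, 225.0, 315.0]            # degrees
BITS = {45.0: "00", 135.0: "01", 225.0: "11", 315.0: "10"}

def circular_distance(a, b):
    """Distance between two angles in degrees, wrapping around at 360."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def decode_phase(received_deg):
    """Snap a received phase to the closest valid constellation point."""
    closest = min(VALID_PHASES, key=lambda p: circular_distance(received_deg, p))
    return BITS[closest]

# A small phase error still decodes correctly:
print(decode_phase(52.0))    # 7 degrees off of 45 -> "00"
# A large error gets snapped to the wrong point:
print(decode_phase(100.0))   # closer to 135 than 45 -> "01"
```

So as long as the noise pushes each point less than 45 degrees in either direction, the decoder recovers the original bits.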
The second is that we can use a filter that "learns" the noise to further aid in recovering the signal. For example, say the channel always adds a constant 45 degrees of phase shift to every signal. In that case, the closest-match strategy above would always fail. But if we had a filter that could recognize this shift, we could perfectly reconstruct the original signal.

So what we do is use a filter with a feedback loop that adjusts itself. We first send a known training signal to get an idea of how the noise is affecting the signal and initialize our filter from that (the filter would notice the constant phase shift and adjust itself to remove it). Then we start sending data, but the filter keeps using feedback to continually correct itself. For example, say that constant 45 degree phase shift becomes a constant 180 degree shift because you moved the receiver to a new location. The continuous feedback lets the filter notice the transition and adjust for it on the fly.
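Here is a toy version of that training loop, assuming a single complex tap adjusted by an LMS-style update (a real receiver uses multi-tap adaptive filters; a one-tap filter can only undo a constant gain/phase rotation, which is all this example needs):

```python
import cmath
import math
import random

# Same hypothetical four-point constellation, as unit complex numbers.
POINTS = [cmath.exp(1j * math.radians(p)) for p in (45, 135, 225, 315)]

def slice_symbol(y):
    """Hard decision: snap to the nearest constellation point."""
    return min(POINTS, key=lambda c: abs(c - y))

def lms_step(tap, received, desired, mu=0.5):
    """Nudge the tap so that tap * received moves toward desired."""
    err = desired - tap * received
    return tap + mu * err * received.conjugate()

# Training: the receiver knows which symbols were sent, so it can
# learn the channel's constant 45-degree rotation from the feedback.
random.seed(0)
channel = cmath.exp(1j * math.radians(45))   # the unknown impairment
tap = 1 + 0j
for _ in range(20):
    sent = random.choice(POINTS)
    tap = lms_step(tap, channel * sent, sent)

# The tap has converged to roughly exp(-45j degrees), cancelling the shift:
for sent in POINTS:
    assert slice_symbol(tap * channel * sent) == sent
```

During data transmission the same update keeps running in "decision-directed" mode: the sliced symbol stands in for the known training symbol, so the filter continually corrects itself as the channel drifts.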
So even if the noise starts to shift points enough to cause incorrect decoding, a good filter can remove many of these problems. Only when the filter and encoding scheme can no longer keep up with the noise do we drop down to a more robust scheme. But yes, you essentially have the gist of it.
EDIT: One thing to note about your frequency statement. There are two parts to a signal: the actual information content, and the high-frequency carrier signal it is modulated onto. For example, with only four points in phase space, the information itself needs only a small amount of bandwidth. So the actual signal is low frequency, but we modulate it onto the desired carrier frequency. The frequency we choose to carry the information can be low or high; it doesn't matter, because we are only concerned with the phase space representation. The carrier frequency is usually chosen by whatever bands we are allowed to operate in.

However, higher frequencies allow us to pack in more information by having additional channels. We may only need 1 MHz of bandwidth to encode a single channel of information. So if we have a carrier operating at 100 MHz and are using 1/10th of the band, we may only be able to fit 10 channels, from 95-105 MHz. But at 1 GHz, 1/10th of the band would allow us to have 100 channels. If each channel uses our two-bit scheme above, we get 10 times the total bandwidth by going from 100 MHz to 1 GHz. The additional channels allow us to service multiple users, like in Wi-Fi. This is why, if you have a lot of users with heavy traffic on a single access point, you will see lower bandwidth: you are allocated fewer channels.

So there are a lot of factors that go into choosing the carrier frequency and such. An engineer who actually designs or works with the standards of such devices would know more. But I just wanted to note that a lower frequency is generally associated with lower bandwidth/information content.
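The channel arithmetic above can be checked in a couple of lines. The 1 MHz per channel and the 1/10th-of-the-band figure are just the example's assumptions:

```python
def num_channels(carrier_hz, channel_bw_hz=1e6, usable_fraction=0.1):
    """How many channels fit if we may use a fixed fraction of the carrier band."""
    return int(carrier_hz * usable_fraction / channel_bw_hz)

print(num_channels(100e6))  # 100 MHz carrier -> 10 channels (95-105 MHz)
print(num_channels(1e9))    # 1 GHz carrier  -> 100 channels
```

Same fraction of the band, ten times the carrier frequency, ten times the channels.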