It seems simpler than all this to me (which might be my error, I'm a simple-minded person!). Amplitude modulation of a signal produces sum and difference frequencies (stop me here if this is not correct, but I'm sure that it is).
Singing is alternating compressions and rarefactions of the air. Wouldn't we expect that to amplitude modulate any musical instrument that is driven by our air supply? If I blow softer or harder (within limits) on a wind instrument, the sound is softer/louder - isn't that amplitude modulation?
So if I play a 400 Hz note on a wind instrument, and sing at 600 Hz, I would expect there to be a 200 Hz difference signal, the original carrier of 400 Hz, and a sum of 1000 Hz.
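Here is a rough numerical sketch of that arithmetic, nothing more. It assumes ideal textbook AM, i.e. the 400 Hz carrier multiplied by (1 + depth x the 600 Hz voice). The sample rate, the modulation depth of 0.5, and the helper names `am_signal` and `amplitude_at` are my own picks for illustration, not anything measured on a real instrument.

```python
import math

FS = 8000          # sample rate in Hz (assumed)
N = 8000           # one second of samples, so each analysis bin is 1 Hz wide
CARRIER_HZ = 400   # note played on the instrument
VOICE_HZ = 600     # sung note acting as the modulating signal
DEPTH = 0.5        # modulation depth (assumed, purely illustrative)

def am_signal():
    """Carrier whose amplitude is scaled by (1 + DEPTH * voice)."""
    out = []
    for n in range(N):
        t = n / FS
        envelope = 1.0 + DEPTH * math.cos(2 * math.pi * VOICE_HZ * t)
        out.append(envelope * math.cos(2 * math.pi * CARRIER_HZ * t))
    return out

def amplitude_at(signal, freq_hz):
    """Amplitude of the component at freq_hz (a single DFT bin, by correlation)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

signal = am_signal()
spectrum = {f: amplitude_at(signal, f) for f in (200, 400, 600, 1000)}
for f, a in spectrum.items():
    print(f"{f:5d} Hz: {a:.3f}")
```

Under these assumptions the 200 Hz and 1000 Hz components each come out at DEPTH/2 of the carrier, and the 600 Hz voice itself does not appear in the product at all.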
OP said he did not observe this difference frequency on a flute. Perhaps the flute is not very sensitive to the variations in air pressure that singing produces (and it's rather tough to sing loudly while maintaining the embouchure required to sound a note on a flute, though hopefully we have some fans of Jethro Tull and Ian Anderson here). So "not observed" may not equate with "not existing"? And perhaps, because of what I assume is the far greater non-linearity of a reed instrument, the effect is far more pronounced with a reed? I no longer have a clarinet at my disposal, so I refer here:
https://www.britannica.com/art/reed-instrument
I believe this one-sided constricted motion of the reed is why @Baluncore earlier made the comparison of the reed to a diode.
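To make the diode comparison concrete, here is a toy sketch. It assumes the reed behaves like an ideal half-wave rectifier, which is a crude stand-in and certainly not a physical reed model. The 400 Hz and 600 Hz tones are simply added, then passed through the "diode". The sample rate and the helper names `two_tones` and `amplitude_at` are my own for illustration.

```python
import math

FS = 8000   # sample rate in Hz (assumed)
N = 8000    # one second of samples, so frequencies fall on exact 1 Hz bins

def two_tones(n):
    """Instrument note (400 Hz) plus sung note (600 Hz), simply added together."""
    t = n / FS
    return math.cos(2 * math.pi * 400 * t) + math.cos(2 * math.pi * 600 * t)

def amplitude_at(signal, freq_hz):
    """Amplitude of the component at freq_hz (a single DFT bin, by correlation)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

linear = [two_tones(n) for n in range(N)]
rectified = [max(x, 0.0) for x in linear]   # ideal "diode": passes only one polarity

linear_200 = amplitude_at(linear, 200)
rectified_200 = amplitude_at(rectified, 200)
print(f"200 Hz in plain sum:      {linear_200:.3f}")
print(f"200 Hz after rectifying:  {rectified_200:.3f}")
```

The plain sum has essentially nothing at 200 Hz, while the rectified version has a clear 200 Hz component. So in this toy model the one-sided non-linearity alone is enough to produce the difference frequency.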
So, isn't that all that is needed? I don't understand, from the definition of amplitude modulation, why any sort of 'storage' is required?