More on Distributing High Quality Audio

bhobba · Apr 28, 2025

A while ago I did some posts on the modern way audio is recorded, mastered and distributed.

I have been investigating this further and am writing this post on what I found.

These days, high-quality recordings are often made in one-bit DSD (a link I provide later has details). However, DSD is hard to work with when creating masters. So a format called DXD (352.8/24, i.e., 352.8 kHz sampling at 24 bits) was devised to give audio engineers an overkill amount of leeway when mastering. Some high-quality producers release their recordings in DXD. I have one, and it sounds glorious. However, have a look at:

What About DXD? Surprise!

CD-quality 44.1/16 is good enough, provided all 16 bits are used. The full DXD bandwidth is certainly not needed; everything above 50 kHz is noise. Knowing this, some DAC manufacturers put a 50 kHz filter in their DACs.

However, is 16 bits enough? To answer that, we need to look into dithering:

24/192 Music Downloads

So, for distribution, 16 bits are more than good enough. 88.2/16 is likely all that is ever needed; even audiophile nuts do not need 24 bits. Most of the time, 44.1 sampling is enough.
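To make the dithering point concrete, here is a minimal sketch of reducing a sample to 16 bits with triangular (TPDF) dither. The function names and the choice of TPDF dither are assumptions for illustration; the linked article discusses dither in general, and real mastering tools may use shaped dither instead.

```python
import math
import random

def quantize_16bit(x, dither=True, rng=random):
    """Quantize a float sample in [-1.0, 1.0] to a 16-bit integer.

    With dither=True, triangular (TPDF) noise spanning +/-1 LSB is added
    before rounding, so the average output tracks the input even for
    signals smaller than one quantization step.
    """
    scaled = x * 32768.0
    if dither:
        # The sum of two uniform variables has a triangular distribution.
        scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    q = int(math.floor(scaled + 0.5))
    # Clamp to the legal 16-bit range.
    return max(-32768, min(32767, q))

def theoretical_snr_db(bits):
    """Ideal SNR of a full-scale sine quantized to `bits` bits (rule of thumb)."""
    return 6.02 * bits + 1.76
```

A constant input of a quarter of one 16-bit step always rounds to zero without dither. With dither, the individual outputs are noisy, but their long-run average comes out close to the true quarter step, which is why detail below the last bit is not simply lost.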

There is a sneaky way to process the DXD file so that only the audio, not the noise, is distributed. It is called lossyWAV:

lossyWAV - Hydrogenaudio Knowledgebase

It is a form of adaptive dither that allows FLAC compression to operate much more effectively.
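As a rough illustration only: the Hydrogenaudio page describes lossyWAV as reducing the bit depth of the signal block by block, so that FLAC has fewer significant bits to store in each block. The sketch below shows just that last step, rounding a block of integer samples so its low bits are zero. It is not the lossyWAV algorithm; how many bits can be dropped per block comes from lossyWAV's own noise analysis, which is not reproduced here, so `bits_to_remove` is a hypothetical parameter supplied by the caller.

```python
def zero_low_bits(block, bits_to_remove):
    """Round each integer sample to the nearest multiple of 2**bits_to_remove.

    Illustrative only: clipping at full scale is not handled, and the choice
    of bits_to_remove is assumed to come from some external analysis.
    """
    if bits_to_remove <= 0:
        return list(block)
    step = 1 << bits_to_remove
    half = step >> 1
    # Floor division keeps the rounding consistent for negative samples.
    return [((s + half) // step) * step for s in block]
```

After this step every sample in the block is a multiple of the same power of two, which is the property a lossless coder can exploit.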

One issue is that the ultrasonic noise in DXD is larger than one 16-bit quantization step, so truncating to 16 bits does not remove it. The methods of the following article can fix this, as well as explain DSD:

Fundamental Principles Behind the Sigma-Delta ADC Topology: Part 1

It is then easy for a program to determine the minimum sampling rate necessary to prevent aliasing (distortion that occurs if there is any content above half the sampling frequency) and decimate the recording to that minimum (that is, just throw away unnecessary samples while keeping the sampling rate above twice the maximum frequency present). It will usually be just 44.1 sampling, but a higher sampling rate may occasionally be required.
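A minimal sketch of that selection and decimation step, assuming the highest frequency present has already been measured and the ultrasonic noise already removed. The list of candidate rates and both function names are assumptions for illustration.

```python
STANDARD_RATES = [44100, 48000, 88200, 96000, 176400, 192000, 352800]

def minimum_rate(max_freq_hz, source_rate, candidates=STANDARD_RATES):
    """Lowest candidate rate that exceeds twice max_freq_hz and divides source_rate."""
    for rate in sorted(candidates):
        if rate > 2 * max_freq_hz and source_rate % rate == 0:
            return rate
    # Nothing lower is safe, so keep the source rate.
    return source_rate

def decimate(samples, source_rate, target_rate):
    """Keep every k-th sample.

    Only valid if there is no content above target_rate / 2; otherwise the
    discarded samples cause aliasing.
    """
    if source_rate % target_rate != 0:
        raise ValueError("target rate must divide the source rate exactly")
    step = source_rate // target_rate
    return samples[::step]
```

For a DXD source with nothing above 20 kHz, `minimum_rate(20000, 352800)` picks 44100, and decimation keeps every eighth sample. Content up to 30 kHz would need 88200 instead.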

This can be done by upsampling to 10xDSD. A little math shows it is an exact multiple of all the common sampling frequencies. This decreases noise, plus decimation is trivial.
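The "little math" can be checked directly. This sketch assumes "10xDSD" means ten times the DSD64 rate of 64 x 44.1 kHz, i.e. 28.224 MHz; that reading, the list of rates, and the function name are assumptions.

```python
DSD64_RATE = 64 * 44100        # 2,822,400 Hz
TEN_X_DSD = 10 * DSD64_RATE    # 28,224,000 Hz (assumed meaning of "10xDSD")

COMMON_RATES = [44100, 48000, 88200, 96000, 176400, 192000, 352800]

def decimation_factors(master_rate, rates):
    """Map each rate to its integer decimation factor, or None if it does not divide."""
    return {r: (master_rate // r if master_rate % r == 0 else None) for r in rates}

factors = decimation_factors(TEN_X_DSD, COMMON_RATES)
```

Under that assumption every rate in the list divides exactly (for example, a factor of 640 for 44.1 kHz and 588 for 48 kHz), so both the 44.1 and 48 kHz families can be reached by keeping every k-th sample. One exception worth noting is 384 kHz, which does not divide 28.224 MHz evenly.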

Also, using lossyWAV and FLAC, files that are close in size to lossy audio files are all that is needed for very transparent audio.

FactChecker · Apr 28, 2025

bhobba said:

I have one, and it sounds glorious.

It is interesting that you can hear the difference. That tells me there is something lacking in the usual DVD format. I have always been skeptical of the claim that the DVD format is completely satisfactory. (although it is fine for my use and hearing ability)
ADDED: I should have said CD format. I don't know if DVD audio is the same as CD.

bhobba · Apr 28, 2025

FactChecker said:

It is interesting that you can hear the difference. That tells me there is something lacking in the usual DVD format. I have always been skeptical of the claim that the DVD format is completely satisfactory. (although it is fine for my use and hearing ability)

I think you are correct.

The conjectured reason (from the article DXD - Surprise) is 'Maybe the reason is because the filtering needed at lower sample rates becomes unnecessary when you get to 352.8 or 384 kHz'

Some filtering must be used to create the DXD file from the DSD file, but it can be so gentle and is at such a high frequency that it is like no filter at all:
https://media.ifi-audio.com/wp-content/uploads/2020/02/iFi-audio-Tech-Note-The-GTO-Filter.pdf

Such a filter at 352.8, while still a filter, for all practical purposes does nothing in the audible range. It is just leaky and lets ultrasonic noise through (see the attached file for the 192k GTO filter response). It would be even better at doing nothing at 352.8.

The suggested method needs only this 'do nothing' filtering to reach lower sample rates: simply upsample to get the ultrasonic noise below 16 bits, then decimate.

The real issues with filters come at the DAC end. While decimation is simple, the upsampling filter to restore it to DXD sampling is more difficult. The upsampling filter in a product like HQ Player is one possibility.

As an aside, ideas like this are being looked into by companies like MQA (it is not MQA - just being researched by the new owners of MQA):

https://mqalabs.com/wp-content/uploads/2024/12/MQA-Labs-QRONO-White-Paper_updated.pdf

Thanks
Bill

bdrobin519 · Apr 29, 2025

As evidence, the loudest recorded album I know of is Metallica's Death Magnetic. To my knowledge, it was a record-breaking recording in terms of decibel level.

bhobba · Apr 29, 2025

Now here is something interesting. I used FLAC for compression after decimation from filtered DXD. Microsoft has, however, devised a compression method in the frequency domain, rather than transmitting the difference from a predictor like FLAC does:

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/Malvar_DCC07.pdf

It has better compression performance than FLAC.
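For contrast with the frequency-domain approach, here is a minimal sketch of the "difference from a predictor" idea, using the low-order fixed polynomial predictors that FLAC includes among its prediction modes. It only computes the residual; real FLAC also stores the first few samples verbatim, can use adaptive LPC predictors, and entropy-codes the residual. The function name is an assumption.

```python
import math

def fixed_predictor_residual(samples, order):
    """Residual left after a fixed polynomial predictor of order 0, 1 or 2."""
    if order == 0:
        return list(samples)
    if order == 1:
        # Predict each sample as equal to the previous one.
        return [samples[n] - samples[n - 1] for n in range(1, len(samples))]
    if order == 2:
        # Predict by extending the line through the previous two samples.
        return [samples[n] - 2 * samples[n - 1] + samples[n - 2]
                for n in range(2, len(samples))]
    raise ValueError("only orders 0, 1 and 2 are sketched here")

# A 1 kHz tone at 44.1 kHz, as integer samples.
tone = [round(10000 * math.sin(2 * math.pi * 1000 * n / 44100)) for n in range(441)]
peaks = [max(abs(r) for r in fixed_predictor_residual(tone, k)) for k in (0, 1, 2)]
```

For a smooth signal like this tone, the peak residual shrinks with each predictor order, so fewer bits are needed per sample. That is the gain the predictor-based approach relies on.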

DSD64 runs at 2.8 MHz, DSD128 at 5.6 MHz, and so on. These days, the latest is DSD1024, which is 45 MHz, and it is likely to go even higher in the future. The output is one-bit noise-shaped audio that, if passed through a filter like the GTO, easily reaches DXD accuracy with noise below 16 bits for DSD256 and above.
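The rates quoted above are rounded. As a small worked example, each DSD rate is its multiplier times the CD rate of 44.1 kHz (the function name is an assumption):

```python
def dsd_rate_hz(multiple, base_rate=44100):
    """Sample rate of DSD<multiple>, e.g. multiple=64 for DSD64."""
    return multiple * base_rate

# Exact rates in Hz for the common DSD multiples.
rates = {m: dsd_rate_hz(m) for m in (64, 128, 256, 512, 1024)}
```

This gives 2,822,400 Hz for DSD64, 5,644,800 Hz for DSD128, 11,289,600 Hz for DSD256, 22,579,200 Hz for DSD512 and 45,158,400 Hz for DSD1024.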

Chop it off at 16 bits, and there are rarely any frequencies above the usual 22 kHz. Convert to lossyWAV and compress using the Microsoft compression. It could be decoded at the DAC, but since it is in the frequency domain, padding it out with extra zero-valued high frequencies makes it easy to convert to PCM at some very high sampling rate. Modern DAC chips (or FPGAs) easily convert that into one-bit DSD by upsampling and noise shaping. DACs like the Direct Stream DAC feed the noise-shaped DSD into a simple high-quality audio transformer. Chips are available from companies like ESS that do the conversion for the DAC designer who does not want to program their own FPGA. I have the Chord TT2 DAC, whose designer, Rob Watts, uses FPGA chips (probably with some high-speed output switching transistors as well) and managed to coax 18 W out of it, enough to connect to speakers directly. A digital system from end to end.
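To give a feel for the noise-shaping step, here is a minimal first-order sigma-delta modulator that turns already-upsampled samples into a one-bit stream. It is a textbook sketch under simplifying assumptions, not what any particular DAC chip or FPGA design does; real converters use higher-order modulators, and the function name is an assumption.

```python
def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator: floats in [-1, 1] -> list of +1 / -1.

    The integrator accumulates the difference between the input and the
    previous one-bit output, which pushes the quantization error toward
    high frequencies.
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback
        out = 1 if integrator >= 0.0 else -1
        bits.append(out)
        feedback = float(out)
    return bits
```

Feeding in a constant 0.5 yields a stream that is +1 three times out of four, so its average is 0.5. A gentle low-pass filter, or even a transformer as in the DACs mentioned above, recovers the audio from the density of ones.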

There are interesting times ahead in audio.

Thanks
Bill
