Meridian’s pursuit of techniques to increase timing precision was inspired by recent findings in psychoacoustics and neuroscience. The temporal blur target of 10µs is based on this research. The AES paper mentioned earlier is full of citations to psychoacoustic and neuroscience literature. Timing is so important because our hearing mechanism is acutely tuned to impulsive sounds for instantly determining the “where, then what” of the object creating the sound. The distortion of these vital cues by temporal blur diminishes the sense of musical realism in a variety of ways. In the accompanying interview, Bob Stuart describes in more detail why temporal blur is so detrimental to sound quality.
However, improving the temporal response of the digital chain also places tighter requirements on the analog system. In a typical recording chain we have a microphone, amplifier, recording console, A/D, D/A, playback amplifier, and loudspeaker. To approach the target transparency of air, each step must be right and must work toward that end. As Fig. 3 shows, temporal blur is cumulative: each successive stage can inadvertently spread transient energy over a wider and wider interval. It’s not uncommon for a signal to have been subjected to a cascade of eight filters by the time it has gone from microphone to loudspeaker; the cumulative damage is significant, even though no individual stage is that bad in isolation.
Rethinking the Container
To understand how MQA can convey true high-resolution digital audio at a much lower bit rate, you must first realize that a large percentage of a conventional 192kHz/24-bit file is empty baggage that contains no useful information. A 192/24 file is like a microwave-oven-sized box storing an object the size of a paperback book.
One reason conventional PCM coding is so inefficient is that the sampling frequency is fixed. The Nyquist Theorem says that the sample rate must be at least twice the highest audio frequency of interest, so a digital system’s fixed sampling frequency is chosen to accommodate the highest frequency we want to preserve: 44.1kHz sampling to encode a 20kHz audio signal, for example. That single rate is then applied to all audio signals, no matter their frequency. It’s illuminating to consider that a 20Hz audio signal is sampled about 2200 times per waveform cycle, whereas a 20kHz audio signal is sampled slightly more than twice per cycle.
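The disparity is easy to verify with a few lines of arithmetic. A quick sketch, purely illustrative, using the 44.1kHz example from above:

```python
# Samples captured per waveform cycle at a fixed sample rate.
# Illustrative only; values follow the 44.1kHz example in the text.
sample_rate = 44_100  # Hz

for freq in (20, 1_000, 20_000):  # Hz
    samples_per_cycle = sample_rate / freq
    print(f"{freq:>6} Hz signal: {samples_per_cycle:>8.1f} samples per cycle")
```

A 20Hz tone gets 2205 samples per cycle; a 20kHz tone gets barely more than the Nyquist minimum of two.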
MQA addresses this disparity by losslessly (or virtually) dividing the audio into octave-wide sub-bands, conceptually coding each with a lower sampling rate than the ensemble. MQA is truly hierarchical, and although the example here is 192kHz, sample rates of 384kHz, 768kHz, or higher are accommodated. In fact, the mathematics includes infinite sampling (analog), since that is the real target.
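As an illustration only (MQA’s actual kernels and folding are proprietary and not described in detail here), the efficiency of octave-wide sub-bands follows from bandpass sampling: a band of bandwidth B can, in principle, be critically sampled at a rate of 2B, so each lower octave needs half the samples of the one above it:

```python
# Conceptual sketch only: not MQA's actual scheme.
# Each octave-wide band (hi/2 to hi) has bandwidth hi/2 and can in
# principle be critically sampled at 2 * bandwidth = hi samples/s.
hi = 96_000.0  # Hz: Nyquist limit of a 192kHz system

while hi > 3_000:            # show a few octaves for illustration
    lo = hi / 2              # octave-wide band: (lo, hi)
    rate = 2 * (hi - lo)     # critical sampling rate for this band alone
    print(f"band {lo/1000:6.1f}-{hi/1000:6.1f} kHz -> {rate/1000:5.1f} kHz rate")
    hi = lo
```

The top octave (48–96kHz) alone accounts for half the total rate; everything below 24kHz, where nearly all the music lives, needs only a quarter of it.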
Similarly, in conventional PCM, the quantization word length is typically fixed at 16 bits, 20 bits, or 24 bits. (The word is a binary number that represents the analog signal’s amplitude at the moment the sample is taken.) The longer the word, the more bits available to encode the amplitude information, and the greater the dynamic range that can be captured. However, there is a large inefficiency because the quantization steps are linearly spaced, whereas hearing is logarithmic. MQA, by contrast, uses fractional bits to more accurately code the critical kernel.
The term “coding space” describes the entire range of frequencies and amplitudes that can be captured by the encoding scheme. Fixed sample rate and fixed word length give us a coding space that is rectangular. For example, in a 192kHz/24-bit system, we can encode audio frequencies up to 96kHz (half the sampling frequency) with a theoretical spectral dynamic range of about 144dB. This coding space is “rectangular” because plotting the system’s frequency range along the horizontal axis and the dynamic range along the vertical axis results in a rectangular box; if the frequency axis is linear, the box’s area maps to data rate.
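The rectangle’s dimensions follow directly from the two fixed parameters. A quick sketch of the 192kHz/24-bit case, using the standard approximation of roughly 6dB of dynamic range per bit of linear PCM:

```python
import math

# The "rectangular coding space" of fixed-rate, fixed-wordlength PCM.
# Theoretical dynamic range of an N-bit linear quantizer is about
# 20*log10(2**N), i.e. roughly 6.02 dB per bit.
bits = 24
fs = 192_000  # Hz

dynamic_range_db = 20 * math.log10(2 ** bits)  # ~144.5 dB
bandwidth = fs / 2                             # 96kHz Nyquist bandwidth
data_rate = fs * bits                          # bits/s per channel: the "area"

print(f"dynamic range: {dynamic_range_db:.1f} dB over {bandwidth/1000:.0f} kHz")
print(f"raw data rate: {data_rate/1e6:.3f} Mbit/s per channel")
```

That raw 4.608Mbit/s per channel is the full rectangle, whether or not any music occupies it.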
But the actual “information space” of sounds in nature and music isn’t rectangular; it’s triangular, as Figs. 4 and 5 show. As frequency increases, the peak amplitude of natural and musical sounds decreases. Fig. 6 shows a triangle superimposed on the spectral content of an actual musical signal (the Ravel quartet). The bottom of the triangle defines the recording’s noise floor. Meridian analyzed the spectral content of thousands of recordings across all musical genres and confirmed similar spectral distributions: each one different, but of a class.
MQA encodes the information within the triangle as precisely as possible using advanced sampling kernels. Moreover, the effective sample rates it presents to the original and reconstructed signal differ from the rate used over the transmission path.
Very high frequencies can be preserved with word lengths that reflect their narrow dynamic ranges, since above the point of the triangle there is no signal, just (inaudible) noise. In fact this region is a paradox: It contains no music, but if we remove it by filtering we would blur envelope information in the octave below. Audio information above 20kHz has very little amplitude—it consists of low-level upper harmonics but is critical to reproducing the timing that cues location. Consider that the fundamental frequency of the highest note on an 88-key piano is 4186Hz. An analog-to-digital converter would never encounter full-scale signals at 96kHz, yet the conventional 192/24 PCM system is designed to encode signals that it will never see.
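The 4186Hz figure can be checked from twelve-tone equal temperament: C8, the top key of an 88-key piano, sits 39 semitones above the A4 tuning reference of 440Hz, and each semitone multiplies frequency by the twelfth root of two:

```python
# Highest fundamental on an 88-key piano, from equal temperament.
# C8 is 39 semitones above A4 (440Hz); each semitone is a factor of 2**(1/12).
a4 = 440.0
c8 = a4 * 2 ** (39 / 12)
print(f"C8 fundamental: {c8:.0f} Hz")  # ~4186 Hz
```

Everything above that frequency in a piano recording is harmonic and noise content at far lower amplitude than the fundamentals.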
In short, the combination of folding the sample rate linearly with an encoding kernel that reflects the signal greatly reduces the number of bits required to correctly capture the entire musical signal.
I must stress that this approach has nothing in common with lossy compression systems that throw away information inside the triangle deemed to be “inaudible” because those sounds are theoretically below the instantaneous human auditory masking threshold. Rather, MQA applies bits precisely where they are needed and doesn’t allocate data to potentially encode signals that will never exist. By comparison with MQA, conventional PCM is a crude and inefficient method of encoding audio. Well-meaning but uninformed skeptics, to whom large file size is the ultimate measure of resolution (and a source of comfort), may suggest that MQA ignores real musical information in its quest to reduce the bit rate. Not true: There is no musical information outside the triangle, and all the information within the triangle is preserved. Nothing is thrown away as with lossy compression systems. In fact, the signal heard in the studio is delivered and authenticated with lossless bit-for-bit precision to the listener.