Let the Revolution Begin

MQA and the Overthrow of 20th Century Audio

“There is nothing new to be discovered in physics now. All that remains is more and more precise measurement.” —Lord Kelvin in 1900, five years before Einstein’s paper on relativity.

The term “paradigm shift” was coined by Thomas S. Kuhn in his influential 1962 book The Structure of Scientific Revolutions. According to Kuhn, a paradigm shift in the physical sciences occurs when a body of evidence accumulates that suggests the principles on which a scientific discipline is founded (the paradigm) are wrong, and a new paradigm replaces the old (the shift). Kuhn shows that this process unfolds in identical fashion throughout history no matter what the discipline.

Every scientific revolution, from the Copernican model supplanting Ptolemy’s worldview, to relativity upending Newtonian physics, occurs in specific and defined phases. One of these phases is characterized by “crisis” in which a “battle” (Kuhn’s terms) breaks out between followers of the old and new paradigms. The conflict arises because discoveries made within the existing paradigm don’t quite fit that paradigm, yet the emerging paradigm is not yet accepted as scientific fact. Indeed, the emerging paradigm is often regarded as heresy.

Moreover, if you’ve spent your entire life adhering to a certain set of ideas, it’s difficult to accept that your beliefs have been based on erroneous assumptions. You are simply too invested in the old paradigm. This resistance to new ideas is so entrenched that Kuhn suggests that a revolution is complete only when all the adherents of the old paradigm have died. He quotes Max Planck: “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”

Andrew Quint’s Guest Editorial in this issue prompted me to revisit The Structure of Scientific Revolutions (I first read it in 1990), because Andrew’s description of the controversy over MQA mirrors Kuhn’s “crisis” phase of a scientific revolution. As Andrew describes, some commentators have staked out the position that PCM (or DSD) encoding is essentially perfect and therefore MQA is unnecessary at best and a fraud at worst. Unfortunately, the Internet has given voice to anyone with a keyboard, allowing individuals with absolutely no understanding of MQA’s technology, and no firsthand listening experience, to weigh in, often with vitriolic invective. There are even some respected experts in digital-audio technology and engineering who are skeptical of MQA.

These classic symptoms of Kuhn’s “crisis” phase of a scientific revolution are the result of two distinctly different paradigm shifts on which MQA is based. Bob Stuart and Peter Craven (the British mathematician who co-developed MQA with Stuart) didn’t invent the two emerging paradigms that are the foundations of MQA. Rather, they researched and discovered new ideas in other disciplines (specifically digital sampling in astronomy and medical imaging, and insights into psychoacoustics from neuroscientific advances) and applied those principles to audio. Other fields have been more open to these breakthroughs, but for some reason audio seems to be populated largely by calcified fundamentalists who cling to the past.

The first revolution MQA initiated (in the audio world, at least) is the idea that one pillar of digital audio, the so-called Nyquist-Shannon sampling theorem, while being correct for arbitrary communication, can be reconsidered for humans listening to music. Specifically, for “natural” signals such as sound or visual images (which tend to have specific characteristics and statistics), limitations imposed by conventional sampling can be overcome by a more enlightened analysis and implemented with today’s powerful digital-signal-processing technology. New techniques have been developed not for audio, but in other fields such as image processing and astronomy in which the resolution of closely spaced objects or the limitations of signal power can be of paramount importance. By exploiting signal statistics, cutting-edge medical-imaging technology can resolve data beyond the “Nyquist limit,” allowing finer resolution of visual detail or flow. There’s a direct parallel between resolving time information in musical signals and distinguishing between closely spaced objects in visual images. MQA has taken into account the human listener and the statistics of the audio signal, and adapted modern sampling theory to solve fundamental problems that have plagued digital audio since its inception. (For a technical primer on this topic, see my feature article in Issue 253 or on theabsolutesound.com. For an academic-level explanation, search on Google for “Sampling—50 Years After Shannon” by Michael Unser and “Sparse Sampling: Theory and Applications” by Pier Luigi Dragotti.)
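For readers who want to see the conventional limit referred to above, here is a minimal Python sketch of the classical Nyquist constraint only. It does not represent MQA's processing or the sparse-sampling methods cited. The function names, the 44.1kHz sample rate, and the 30kHz test tone are illustrative assumptions, not values from the text. It shows that, under uniform sampling with no prior knowledge of the signal, a tone above half the sample rate produces exactly the same samples as a lower-frequency alias.

```python
import math


def nyquist_hz(sample_rate_hz):
    """Highest frequency uniform sampling can represent unambiguously."""
    return sample_rate_hz / 2.0


def alias_hz(tone_hz, sample_rate_hz):
    """Frequency at which a sampled tone appears after folding."""
    return abs(tone_hz - sample_rate_hz * round(tone_hz / sample_rate_hz))


def sampled_cosine(tone_hz, sample_rate_hz, count):
    """The first `count` samples of a unit-amplitude cosine."""
    return [
        math.cos(2.0 * math.pi * tone_hz * n / sample_rate_hz)
        for n in range(count)
    ]


fs = 44_100    # assumed CD-style sample rate, in Hz
tone = 30_000  # assumed test tone above the Nyquist frequency, in Hz

alias = alias_hz(tone, fs)

# Compare the two sample streams value by value.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(sampled_cosine(tone, fs, 50), sampled_cosine(alias, fs, 50))
)

print(nyquist_hz(fs), alias, same)
```

With these assumed values the Nyquist frequency is 22.05kHz and the 30kHz tone folds down to 14.1kHz. The techniques described above start from a different premise: that knowledge of the signal's statistics can resolve ambiguities that uniform sampling alone cannot.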

The second paradigm shift on which MQA is based comes from neuroscience, specifically what the latest research has revealed about human hearing. Classical psychoacoustics (the old paradigm, originating in the 1930s) was based on experiments using test tones and beeps, with the subjects reporting on tones they could and couldn’t hear. The researchers approached these experiments with two assumptions. The first was that the ear was a linear microphone-like device and the brain a passive receiver that analyzed the electrical impulses, creating the sensation of sound. The second assumption was that the brain was a frequency analyzer, and that our auditory system’s resolution of timing information was implicit in our upper-frequency limit (as dictated by Fourier analysis). That is, we couldn’t discriminate timing information that implied a bandwidth greater than 20kHz. Consequently, psychoacoustics until very recently was primarily focused on the frequency domain: which pitches humans could hear, at what thresholds, along with related phenomena such as masking and the concept of critical bands.

This primacy of frequency, and the belief that the ear was a passive pick-up, has informed and permeated audio engineering ever since Harvey Fletcher’s famous experiments at Bell Labs. This paradigm, while useful in many ways, unfortunately led us astray. Audio engineering has since its birth revolved around frequency-related criteria because it was simply reflecting the psychoacoustic paradigm, leading to the design principles, instrumentation, and analysis tools used to this day. This primitive model of how humans hear reached its grotesque zenith (or nadir, if you prefer) in MP3, which is theoretically perfect according to the old paradigm of the ear as a passive linear receiver and the brain as a frequency analyzer. In the early 1990s I attended Audio Engineering Society conventions at which Karlheinz Brandenburg, the lead developer of MP3, presented papers describing his research. His hubris was on full display as he casually used terms such as “psychoacoustic redundancy” and “informational irrelevance” to explain why throwing away 90% of the bits was a good thing. We all know how that turned out (except for the Fraunhofer Institute that supported the research, which at one time reaped $100 million per year in MP3 royalties). But in Brandenburg’s defense, he was operating within the old psychoacoustic paradigm developed fifty years earlier.
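The 90% figure is easy to check with a little arithmetic. The Python sketch below compares the bit rate of CD-quality PCM (44.1kHz, 16-bit, stereo) with an MP3 stream. The 128kbps MP3 rate is an assumption chosen for illustration, since the text does not specify one, and the function names are likewise hypothetical.

```python
def cd_bitrate_bps(sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Raw bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def fraction_discarded(encoded_bps, source_bps):
    """Share of the source bit rate that the encoded stream does not carry."""
    return 1.0 - encoded_bps / source_bps


source = cd_bitrate_bps()
mp3 = 128_000  # assumed MP3 bit rate, in bits per second

discarded = fraction_discarded(mp3, source)

print(f"CD PCM: {source} bit/s")
print(f"Share of bits discarded at {mp3} bit/s: {discarded:.1%}")
```

Under these assumptions CD audio runs at 1,411,200 bits per second, and a 128kbps MP3 keeps roughly 9% of that, which is consistent with the 90% quoted above. Lower or higher MP3 rates would shift the figure accordingly.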