New Methods for Quantifying Sonic Performance

Part One: Improved Methods for Estimating Sonic Differences in Audio Systems

We created a series of up-sampled files from the ripped CD which provided intermediate sonic markers of sound quality (SQ). The scale was then assembled by subjectively judging the incremental improvement in SQ from one sonic marker to the next. Initially a “small” SQ difference was defined as a 10-point improvement, a “medium” SQ difference as 20 points, and a “large” SQ difference as 30 to 50 points. With the recording we used to conduct these tests, our scale spanned from an arbitrary 100 points for the CD up to 180 points for the high-resolution download. However, we felt there was a degree of uncertainty in our judgments when the difference between sonic standards was larger than 30 points. Therefore, for this article we created a series of closely spaced SQ standards by either up-sampling the CD rip to all resolutions possible up to 192/24, or down-sampling the original native 192/24 high-resolution recording to all possible lower resolutions. The choice of recording was also quite important (as will be made clear in Part 2 of this article). We made extensive use of a modern performance of Chabrier’s España released in 2010 by Acousence in Germany (ACO-DF-41610) and an older, standard-resolution, but excellent remastered recording of Ramirez’ Misa (LIM K2HD 040, UD). For various reasons, these were much better than the Reference Recordings Rachmaninoff Symphonic Dances we used in our earlier articles (see below).

As illustrated in Figure 1, by combining all possible up-converted files made from the ripped CD with all down-converted files made from a high-resolution 192/24 download, we created a scale extending from 100 to 210 points, with the largest increment in SQ no greater than 20 points. This 20-point interval between sonic standards was well within the accurate range of our subjective sonic judgment. Readers can construct their own sonic standards and measurement scales with the equipment they own, regardless of whether their system is better or worse than ours. They can then judge the significance of our results within the context of their own equipment, environment, and listening acuity.
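For readers who want to build their own ladder of sonic standards, the following Python sketch lists the candidate target sample rates for a given source file. It rests on an assumption the text does not spell out at this point: that "all resolutions possible up to 192/24" means the 1x, 2x, and 4x multiples of the 44.1 kHz and 48 kHz base rates. The function name `rate_ladder` is hypothetical, and bit depth is left out for simplicity.

```python
# Candidate sample rates for a ladder of "sonic standards".
# ASSUMPTION: the available rates are the 1x, 2x and 4x multiples of the
# 44.1 kHz and 48 kHz base rates, capped at 192 kHz. The article does not
# list the exact rates it used; adjust these constants to match your software.
BASE_RATES_HZ = (44_100, 48_000)
MULTIPLIERS = (1, 2, 4)
MAX_RATE_HZ = 192_000


def rate_ladder(source_rate_hz, direction):
    """Return the target rates above ('up') or below ('down') the source rate,
    in ascending order. The source rate itself is never included."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    all_rates = sorted(
        base * mult
        for base in BASE_RATES_HZ
        for mult in MULTIPLIERS
        if base * mult <= MAX_RATE_HZ
    )
    if direction == "up":
        return [r for r in all_rates if r > source_rate_hz]
    return [r for r in all_rates if r < source_rate_hz]
```

Calling `rate_ladder(44_100, "up")` gives the targets for a CD rip, and `rate_ladder(192_000, "down")` gives the targets for a 192 kHz download. Each target would then be rendered with whatever sample-rate converter the reader owns and scored by ear.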

The improved scale shown in Figure 1 was created shortly after publication of the fourth installment of our previous article on computer audio (Issue 221). Without any expectation bias, we used this scale to evaluate the effect of sample-rate conversion on SQ. Some time later, we tried plotting these subjective numerical scores as a function of either up-sampling from a ripped CD or down-sampling from a high-resolution file copied from a commercial DVD-R. When the scores were plotted on a semi-logarithmic scale, there was some data scatter, as would be expected, but we were quite surprised to find a fairly obvious linear relationship between the degree of up- or down-sampling and our subjective SQ judgment. These results are illustrated in Figure 2A.

As we pondered the significance of these observations, it became obvious that a subtler pattern was buried in the data. No matter how many times we repeated these experiments, the semi-log plots consistently displayed a pattern in which every SQ score at a sampling frequency that was a multiple of 48 kHz fell above our original best-fit line; conversely, every SQ score at a multiple of 44.1 kHz fell below the line. We then re-plotted the data treating each sampling series independently. Much to our astonishment, the data points now showed minimal scatter and a surprisingly precise linear relationship between our subjective scoring method and sampling frequency. These re-plots are shown in Figure 2B.
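The re-plotting step can be reproduced with an ordinary least-squares fit of score against the logarithm of sampling frequency, done once for the pooled data and once per sampling family. The sketch below is one way a reader might do this with their own scores. It is not the procedure used to draw Figures 2A and 2B, which the text does not describe in detail, and the function names (`fit_semilog`, `family`, `fit_by_family`, `rms_residual`) are hypothetical.

```python
import math


def fit_semilog(rates_hz, scores):
    """Least-squares fit of score = intercept + slope * log10(rate_hz).
    Returns (slope, intercept)."""
    if len(rates_hz) != len(scores) or len(rates_hz) < 2:
        raise ValueError("need at least two (rate, score) pairs")
    xs = [math.log10(r) for r in rates_hz]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all rates are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def family(rate_hz):
    """Label an integer rate in Hz as a multiple of 44.1 kHz or of 48 kHz."""
    if rate_hz % 44_100 == 0:
        return "44.1k"
    if rate_hz % 48_000 == 0:
        return "48k"
    raise ValueError(f"{rate_hz} Hz is not a multiple of 44.1 or 48 kHz")


def fit_by_family(points):
    """points: list of (rate_hz, score). Fit each sampling family separately.
    Returns {family_label: (slope, intercept)}."""
    groups = {}
    for rate_hz, score in points:
        groups.setdefault(family(rate_hz), []).append((rate_hz, score))
    return {
        label: fit_semilog([r for r, _ in pts], [s for _, s in pts])
        for label, pts in groups.items()
    }


def rms_residual(points, slope, intercept):
    """Root-mean-square vertical distance of the points from the fitted line."""
    sq = [(s - (intercept + slope * math.log10(r))) ** 2 for r, s in points]
    return math.sqrt(sum(sq) / len(sq))
```

Comparing `rms_residual` for the pooled fit against the per-family fits gives a simple number for how much of the scatter disappears once the 44.1 kHz and 48 kHz series are treated independently.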

Several important points may be drawn from these data.

  1. These results show the consistency and reproducibility of our subjective method for reliably detecting sonic differences.
  2. The results shown in Figure 2B illustrate the very significant and useful sonic improvement that can be achieved with up-sampling. Ripping CDs followed by up-sampling improves the original SQ and provides enhanced value to an existing CD collection. Up-conversion of sampling frequency per se can add no new information and cannot modify the bandwidth limits of the original recording. Improvements in SQ clarity under these circumstances are therefore most likely due to a reduction of various forms of filter-induced phase distortion. Up-conversion of bit depth does add more information in the form of more accurate mathematical estimates of volume relationships between musical fundamentals and overtones and would be expected to produce a more authentic sense of “realism” to the brain. Other mechanisms for why up-conversion improves SQ can be hypothesized that may involve reduction of systemic digital artifacts in the processing or transmission of information, ultimately explicable by conventional or non-conventional forms of jitter or S/N interference. Time and further research will be required to explain these results.
  3. On the other hand, if one were to burn a CD from a legitimate high-resolution file, the down-sampling results illustrated in Figure 2B show that one can create a CD or a 44.1kHz/16-bit file that is significantly better (a good 35 points on our scale) than a modern commercial CD. This might be of value for use in cars, portable music players, or secondary systems.
  4. The existence of a linear relationship between SQ and the degree of down- or up-sampling is revealing and provides some justification for commercial claims that up-sampling produces a better SQ akin, but not equal, to true high-resolution recordings. The industry practice of charging the same price as authentic high-resolution recordings for such artificially produced better-sounding products is not justified, since it is possible to achieve the same benefit at home by up-sampling from CDs (at least in our experience using iZotope Adv.). Neither is it justified to charge the same price for transcriptions of analog mastertape digital transfers regardless of degree of up-sampling.
  5. After hundreds of listening tests using familiar source material, it is apparent that we have achieved a very high level of consistency when using the criteria we deem most significant in making overall subjective judgments. We estimate our subjective accuracy to be on the order of ±5 points on our scale in Figure 1. We would characterize a 5-point difference as “small”, which would require 4 to 6 A/B comparisons before reaching a consistent judgment. With the number of repeated listening tests we have run, we can now identify 10-point differences with assurance in a single A/B comparison, although others with less experience with our methods might require 3 to 4 A/B comparisons, a difference we used to call “medium.” “Large” differences of 30 to 40 points might only require 1 or 2 A/B comparisons for most listeners. In our experience, a 20-point difference is now quite easy for us to categorize and can be reliably detected with only one A/B comparison. What we estimate to be a significant 10-point difference under our conditions may well be inaudible on poorer systems or to less experienced listeners. It is also possible the opposite could be true. In either case, readers can compare their results with ours using the same music file standards as we have chosen.
  6. We considered the possibility that the 44.1 versus 48 kHz difference pattern shown in Figure 2B might be related to the oft-assumed superiority of even- versus odd-order multiples of the sampling frequency of the original master recording. However, we have rejected this explanation and have concluded that these results must be an artifact of one of the following: the clock frequencies of the DAC (although recent experiments indicate the effect seems to be independent of the DAC used), the internal clocking of the Windows-based PC server we used, or the sample-rate conversion software. Assuming the recording information published by Reference Recordings is correct, this persistent pattern seems unrelated to the original master recording frequency, since the same pattern was observed in both master recordings and CDs derived from multiples of 44.1 kHz (Reference Recordings) and 48 kHz (Acousence). We are continuing to investigate the source of this effect.
  7. The large difference between the down-sampling and up-sampling slopes derived from the Acousence recordings can be explained as follows. When reducing the sampling frequency of a high-resolution file, one is actually losing information. When up-sampling from a lower to a higher bandwidth, no new information can be added or created in the process. Thus, the slope of the down-sampled line is steeper than the slope of the up-sampled line. We suggest that the up-sampled line has a positive slope at all (that is, the sound of ripped CDs improves with the degree of up-sampling) because of a reduction in the influence of anti-aliasing filters, a reduction in some form of jitter, or possibly an increased immunity of the enlarged file to the degrading effect of ambient jitter by a mechanism yet to be adequately explained.
  8. Finally, the up-sampled and down-sampled Reference Recordings lines plotted in Figure 2B are both shallow and unusually similar in slope when compared with the Acousence results. This was unexpected and suggests some problem with this recording.

All of these individual characteristics might well be causally interrelated and represent in each case the degree of low-level information retrieval in a given situation. The idea that a single sonic characteristic may encapsulate all or most of the typical audiophile criteria of SQ is actually not a new concept. Harry Pearson speculated that front-to-back depth alone might represent just such a single figure of merit for a given piece of equipment or system (see Issue 215, page 78). Unfortunately, there is no very good way of accurately measuring depth.

But we believe we have found such a single figure of merit by using height measurements of specific instruments or voices in carefully selected high-quality recordings. When we listened to our sonic quality standards again and measured the height, the measurements appeared to correlate precisely with overall perceived SQ. Fortuitously, the height of an instrument or voice above a baseline height in any given listening environment can easily be measured in inches with a high degree of accuracy. This method has revealed thought-provoking insights into the quality of various recordings. Additionally, it can be applied to gauge the SQ of a variety of system variables. In the following section we present data to support these findings.
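Readers who record both a height in inches and an SQ score for each of their own sonic standards can check how closely the two track each other with a Pearson correlation coefficient. The sketch below is only one possible check, and the function name `pearson_r` is hypothetical. The text does not say which statistic, if any, was used to assess the correlation.

```python
import math


def pearson_r(heights_in, sq_scores):
    """Pearson correlation between image heights (inches) and SQ scores.
    A value near +1 means taller images go with higher scores."""
    n = len(heights_in)
    if n != len(sq_scores) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_h = sum(heights_in) / n
    mean_s = sum(sq_scores) / n
    cov = sum((h - mean_h) * (s - mean_s) for h, s in zip(heights_in, sq_scores))
    var_h = sum((h - mean_h) ** 2 for h in heights_in)
    var_s = sum((s - mean_s) ** 2 for s in sq_scores)
    if var_h == 0 or var_s == 0:
        raise ValueError("one of the series is constant; correlation is undefined")
    return cov / math.sqrt(var_h * var_s)
```

The function takes two equal-length lists, one entry per sonic standard, and raises an error if either list never varies.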
