I’ve known for some time that clock-timing accuracy is essential to good-sounding digital-audio reproduction. But I didn’t realize just how important it was until I heard the Esoteric G-0Rb Rubidium Master Clock Generator. That’s right—the G-0Rb is an atomic clock in your equipment rack that provides an ultra-precise timing reference for the digital-to-analog conversion process.
Frankly, I thought an atomic clock with timing precision of ±0.05 parts-per-billion would be overkill. Esoteric already has reduced timing inaccuracy (clock jitter) in its products through the use of a separate clock line between transport and processor, as well as a system in which the processor is the master and the transport a slave. But apparently this wasn’t good enough, leading Esoteric’s Motoaki Ohmachi to create a timing device based on the element rubidium.
The G-0Rb looks at first glance like it could be a digital-to-analog converter. The unit is sheathed in Esoteric’s gorgeous metalwork, matching the P-03 universal transport and D-03 digital-to-analog converter in my equipment rack. The rear panel, however, reveals its uniqueness in the audio world—it holds eight BNC clock-output jacks, a BNC clock-input jack, and an AC power socket. A narrow front-panel display indicates the G-0Rb’s output frequency, which is adjustable via a row of buttons (see sidebar for specifics).
The G-0Rb can be used with Esoteric transports and processors that accept an external clock signal, as well as with products from other manufacturers that have a word-clock input. The G-0Rb’s outputs feed the clock-input jacks on a transport and digital-to-analog converter. In my case, front-panel switches on the P-03 and D-03 disengage the units’ conventional on-board clocks and accept the clock from the G-0Rb. Consequently, comparing the sound of the digital front end with and without the clock is as simple and fast as pushing two buttons.
And pushing those two buttons renders a staggering improvement in virtually every aspect of the music presentation. In fact, engaging the G-0Rb catapults the sound of the already great P-03/D-03 into another league, particularly with recordings that have a good sense of space and depth to begin with.
Features and Technology
The G-0Rb has three separate clock-output sections, each of which can be independently set to 1x, 2x, or 4x the base frequencies of 44.1kHz and 48kHz. (An additional frequency is available for synchronizing PAL-based video products.) Each section has two outputs, allowing up to six products to be locked to the G-0Rb. These clock-output frequencies correspond to the common sampling rates of 44.1kHz (CD and SACD) and 48kHz (DVD). An additional output frequency of 100kHz is also available, which Esoteric thinks might become a universal standard in the future. The 100kHz output also simplifies operation by obviating the need to change the clock-output frequency when switching between sources (CD and DVD-A, for example) or between different oversampling frequencies. The selected output frequencies are shown on a front-panel display that also has a small “Rb” indicator to show when the rubidium clock is stabilized, which takes a full ten minutes after turning on the G-0Rb. A clock input jack is provided for future applications; Esoteric mentions that the G-0Rb can be connected to an even more precise clock based on the element cesium.
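For readers who want to see the arithmetic behind those settings, here is a minimal sketch that multiplies the two base rates by the three available multipliers. The function and variable names are my own illustration and not anything from Esoteric; the 100kHz universal output and the PAL-related frequency are left out because they are not multiples of either base rate.

```python
# Base sampling rates named in the text, in hertz.
BASE_RATES_HZ = (44_100, 48_000)
# Multipliers each clock-output section can be set to.
MULTIPLIERS = (1, 2, 4)


def clock_output_table(bases=BASE_RATES_HZ, multipliers=MULTIPLIERS):
    """Return {(base_hz, multiplier): output_hz} for every combination."""
    return {(base, mult): base * mult for base in bases for mult in multipliers}


table = clock_output_table()
for (base, mult), freq in sorted(table.items()):
    print(f"{mult}x {base / 1000:g} kHz -> {freq / 1000:g} kHz")
```

The six results line up with the 44.1kHz, 48kHz, 88.2kHz, 96kHz, 176.4kHz, and 192kHz entries in the specifications.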
The heart of the G-0Rb is a rubidium core housed in a sub-enclosure filled with gas that is tuned to a microwave frequency. Rubidium is a naturally occurring element that happens to vibrate at a super-precise frequency. Rubidium clocks have been used in commercial and aerospace applications for years. Interestingly, the great recording engineer Roger Nichols (Steely Dan’s Aja and Gaucho, to name two) once created and sold a rubidium clock to synchronize digital-audio gear in recording studios.
Most other digital-audio products employ a voltage-controlled crystal oscillator (VCXO) to generate a clock. A VCXO is a crystal that vibrates when a voltage is applied across it. Because the VCXO’s output frequency is a function of the voltage across it, any ripple or variations in the power-supply voltage will cause the frequency to change—the very definition of jitter. Moreover, the VCXO’s output frequency will vary if the crystal is subject to vibration. A rubidium-based clock is not only more precise and stable than a VCXO, it is not subject to such variability in its output frequency.
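To get a feel for the ±0.05 parts-per-billion figure quoted for the G-0Rb, the back-of-the-envelope sketch below converts it into a frequency error at 44.1kHz and a worst-case drift over one day. The helper names are hypothetical, and the calculation assumes the tolerance applies as a constant fractional offset. Note that this describes long-term frequency accuracy; by itself it says nothing about cycle-to-cycle jitter.

```python
PPB = 1e-9  # one part per billion, expressed as a fraction


def frequency_error_hz(nominal_hz, tolerance_ppb):
    """Largest frequency offset allowed by a parts-per-billion tolerance."""
    return nominal_hz * tolerance_ppb * PPB


def worst_case_drift_seconds(elapsed_seconds, tolerance_ppb):
    """Accumulated time error if the clock sat at the edge of its tolerance."""
    return elapsed_seconds * tolerance_ppb * PPB


TOLERANCE_PPB = 0.05        # figure quoted in the review
WORD_CLOCK_HZ = 44_100      # CD word-clock rate
ONE_DAY_S = 24 * 60 * 60

error_hz = frequency_error_hz(WORD_CLOCK_HZ, TOLERANCE_PPB)
drift_s = worst_case_drift_seconds(ONE_DAY_S, TOLERANCE_PPB)
print(f"Frequency error at 44.1 kHz: {error_hz * 1e6:.3f} microhertz")
print(f"Worst-case drift per day:    {drift_s * 1e6:.2f} microseconds")
```

Under these assumptions the clock could be off by only a couple of millionths of a hertz, or a few millionths of a second per day.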
The G-0Rb’s build-quality is beyond reproach. The chassis is made from thick aluminum, reinforced internally, with a 5mm-thick steel bottom-plate. The feet are the same Esoteric-developed units employed in the D-03/P-03, which are designed to provide a stable, vibration-free platform for the chassis.
I’ll start with the wonderful recording of Mozart’s Piano Concerto No. 21 (Eugene Istomin and the Seattle Symphony under Gerard Schwarz on the Reference Recordings label) to illustrate many of the G-0Rb’s qualities. Engaging the G-0Rb immediately expanded the soundstage in all directions. The hall sounded bigger, and the instruments seemed to “light up” the acoustic in a way that wasn’t clearly audible without the G-0Rb. The spatial relationships between the various sections of the orchestra, and their relationship to the piano, snapped into sharp focus. I could now “feel” the air and space between the instruments, as well as hear a delicious bloom around the instruments as they projected their sounds into the surrounding acoustic. I could also hear deeper into the reverb decay; the hall sounded smaller and drier without the G-0Rb. Engaging the G-0Rb caused the sound to become more detached from the loudspeakers, lowering the impression that I was listening to a playback system.
No less dramatic was the improvement in the piano’s timbre. It had a liquidity, ease, and naturalness that I’ve never heard before from a digital audio reproduction. The hardness in the midrange, the glassy “shattering” sound on leading-edge transients, and the dynamic constriction were all gone, replaced by a silky smooth yet powerful rendering. Similarly, the string section’s tone became more “organic”; with the G-0Rb turned off, the strings were overlaid with a “chalky” texture that was simultaneously brighter and thicker. Other instruments benefited from this increased timbral realism, sounding more like the real thing and less like synthetic recreations.
Another huge improvement rendered by the G-0Rb is in the separation of individual musical lines. The clock seemed to highlight the interplay between instruments, or between sections of instruments, rather than fuse them into a homogenized whole. This is, for me, an important aspect of a system’s sonic performance. If I can hear more clearly the contributions of each instrument, I become more involved with the music.
Engaging the G-0Rb brought low-level musical details to life—the subtle intonations of a singer, the sound of fingers moving on guitar strings, the steep attack and reverberant decay of percussion instruments, all were more vivid. Despite the more incisive sound, the G-0Rb never made the presentation analytical or sterile—quite the opposite. The overall sound was simultaneously (and paradoxically) more gentle and detailed.
Cymbals sounded noticeably different with the clock turned on, with a smoother and softer rendering that was less bright and more refined. Without the G-0Rb, cymbals took on a sound reminiscent of bursts of white noise; with the clock, that coarse character was replaced by a subtle and nuanced presentation that revealed the true texture and body of the instrument. I also heard a decrease in vocal sibilance (“s” and “ch” sounds) with the G-0Rb. Overall, the top end was harder, more “spitty,” and more aggressive with the G-0Rb turned off.
The G-0Rb noticeably tightened up the bottom end, too, seemingly adding greater weight and “substance.” There wasn’t “more” bass, just greater tangibility and solidity. This quality greatly affected my perception of rhythmic drive—the propulsive quality that involves your body viscerally in the music.
For example, Roscoe Beck’s terrific bass playing on guitarist Robben Ford’s Handful of Blues [Stretch] came to life with the G-0Rb, with greater pitch-definition and heightened rhythmic power.
All these qualities were as apparent on SACD and DVD-A as they were on CD. I should mention that adding the G-0Rb elevated the sound quality of the P-03/D-03 far above the sound quality I heard from the two music servers reviewed in Issue 177. In that issue, I found that music read from a hard-disk drive sounded better than the same music played from an optical disc in the P-03. With the addition of the G-0Rb, that is no longer the case.
The Esoteric G-0Rb Master Clock Generator rendered a much bigger gain in musical realism than I would have thought possible. Engaging it made digital sound more like great analog in every respect. With the G-0Rb, I had that deep sense of ease—of melting into the listening chair and becoming completely absorbed in the music—that comes so naturally with LP playback.
The difference in the timing accuracy between the clock built into the Esoteric D-03 D/A converter and the G-0Rb must be small in an absolute sense (tens of picoseconds I would guess), but the sonic and musical differences couldn’t be more profound.
The idea of an atomic clock sitting in your equipment rack to make digital sound more like analog appears bizarre on the surface, but one listen to the G-0Rb will convince you that such a precise timing reference is a fundamental requirement of state-of-the-art digital playback.
Sidebar: A Short History of Jitter
The advent of digital audio was heralded by proclamations that the sound-quality variability inherent in analog systems was a thing of the past. Once an audio signal had been digitized, the conventional wisdom held, it was immune to degradation. If the bits were the same, the sound was the same. Digital audio either worked perfectly, or it didn’t work at all.
This was, at first glance, a startling advancement over analog systems, which introduced slight (or not so slight) cumulative distortions at every turn. Put an analog signal down a piece of wire and it degrades, to say nothing of what happens when it passes through every other component in the signal path.
But beginning in the mid-1980s critical listeners reported hearing differences where none should have existed.1 Using observational listening techniques, audiophiles noticed musically significant variations between coaxial and TosLink connections, brands of digital cables, and even in the directionality of digital cables themselves. It was an easy matter to prove that the bitstreams were identical—how could the sound change? If the sound is different then the signals must also be different, but in what way were the signals different? What was this mysterious “X factor” that caused identical digital bitstreams to exhibit an analog-like variability?
This conflict between what audiophiles heard and what the audio academics told us was impossible quickly became a wedge between the high-end and the audio-engineering establishment, particularly members of the Audio Engineering Society (to which I, too, belonged at the time). The “bits is bits” view of digital’s perfection was an article of faith among the AES faithful. The idea that bitstreams with the same “ones and zeros” could sound different when converted to analog by the same digital-to-analog converter was viewed as the epitome of audiophile lunacy, and the high-end community was subjected to open ridicule, scorn, and hostility. For example, at an AES conference I attended in London in September 1991, John Watkinson, a respected engineer, author of several textbooks on digital audio and video, and a Fellow of the AES, used his time addressing the society to attack audiophiles on this point: “Somehow I can’t conceive of an audiophile ‘one’… You can only say that [if the data are identical, the sound is identical] once, which is a problem if you have to publish a hi-fi magazine every month. It leaves an intellectual vacuum… When the term ‘audiophile’ replaced ‘hi-fi freak,’ I immediately thought of necrophiles (sic) and pedophiles. Perhaps I wasn’t far off.”
Nonetheless, critical listeners accepted the reality of what they heard, and high-end cable designers developed better-sounding digital cables. The academics continued to reject the idea of an analog-like variability in sound quality between two identical bitstreams. No researchers wanted to touch the subject for fear of ridicule from their colleagues. None of the test-equipment manufacturers thought about investigating the phenomenon or developing instruments to measure the effect. Sonic differences were, in their view, purely the result of overactive audiophile imaginations.
I learned during a 1989 press trip to JVC’s laboratories in Japan that timing inaccuracy—called “jitter”—in the digital-to-analog conversion process introduced an analog-like variability in digital playback. JVC had developed a circuit it called the “K2 Interface” that reduced jitter. The engineers explained the fundamental principles involved, along with K2’s circuit details, and then proceeded to demonstrate the salutary effect of the K2 circuit. I had one of those “Ah, ha!” moments when I first understood how timing inaccuracy could cause two identical bitstreams to sound different. The JVC engineers had known about jitter for years and treated it as simply another engineering challenge to overcome.
The “paradigm shift” that transformed jitter from audiophile lunacy to textbook orthodoxy began with an October 1991 AES paper titled “Is the AES/EBU/SPDIF Interface Flawed?” by Malcolm Hawksford and Chris Dunn. (The AES/EBU interface is the professional implementation of the consumer SPDIF interface.) Hawksford was a professor at Essex University, and Dunn one of his students. (Hawksford is now Director of the Centre for Audio Research and Engineering and Director of Postgraduate Studies within the Department of Electronic Systems Engineering.) Hawksford and Dunn laid out in precise and fascinating detail exactly how the digital interface can introduce an analog-like variability in sound quality while preserving bit-for-bit accuracy. Hawksford was no wild-eyed audiophile making claims about the audibility of a mysterious phenomenon; he was (and is) a highly respected researcher and university professor with a large body of original research. The paper shocked the audio community. Working audio engineers, who assumed the digital interface they used on a daily basis was sonically transparent, became alarmed and began experimenting themselves. Audio academics grudgingly acknowledged the phenomenon described by Hawksford and Dunn, but questioned (at least initially) how that phenomenon translated to sound-quality variations.
By the mid-1990s, jitter and its audible effects had become established fact. Test-equipment manufacturers offered jitter-analysis devices. Other academics published papers on jitter. Manufacturers of professional audio equipment began touting the low jitter in their products. Today, it’s as though the raging debates of the late 1980s and early 1990s never happened; jitter is now accepted as a source of degradation in digital-audio recording and reproduction.
1 The first such published report I’m aware of is Stereophile founder J. Gordon Holt’s review of the Sony CDP-605ES in 1986, in which Gordon reported—almost incidentally—that the coaxial connection sounded better.
Sidebar: What Exactly Is Jitter and Why Does It Matter?
This sidebar is rather technical, but bear with me and I’ll try my best to make clear what jitter is and how it degrades fidelity.
We first need some background. Most of you know that, in PCM-encoded audio, an analog waveform is sampled at some regular interval (44,100 times per second in the case of CD), and each sample generates a binary number that represents the signal’s amplitude at the instant the sample is taken. Sampling is like taking a snapshot of the analog waveform at precise intervals, and then later reconstructing the original waveform from the snapshots. Each snapshot is a 16-bit binary “word” that represents the analog signal’s amplitude. For example, a low-level signal might be represented by the binary word 000 000 000 000 0010, and a high-level signal by the binary word 110 110 010 110 101. (I’ve simplified this point for clarity.)
This series of binary words, 44,100 of them every second, is converted back into analog with a digital-to-analog converter chip. The DAC takes in a word and outputs an electrical current level that is commensurate with the word’s value. That is, if the digital word consists of all zeros, there is no output current from the DAC. If the digital word is all ones, the DAC outputs maximum current. In a 16-bit system, there are 65,536 possible output currents, corresponding to the 65,536 discrete steps in a 16-bit word. This output current is converted into a voltage that is, after low-pass filtering, a nearly exact replica of the original analog waveform.
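The round trip from analog level to 16-bit word and back to a DAC output level can be sketched in a few lines. This follows the simplified picture used above, in which all zeros means no output current and all ones means maximum current; real CD data are stored as signed two's-complement words, so treat the `encode` and `decode` helpers as illustrative assumptions rather than a description of any actual converter.

```python
import math

BITS = 16
STEPS = 2 ** BITS           # 65,536 discrete codes in a 16-bit word
SAMPLE_RATE_HZ = 44_100     # CD sampling rate


def encode(amplitude):
    """Analog level in [-1, 1] -> unsigned 16-bit word (0 = lowest level)."""
    clipped = max(-1.0, min(1.0, amplitude))
    return round((clipped + 1.0) / 2.0 * (STEPS - 1))


def decode(word):
    """16-bit word -> fraction of the DAC's maximum output current."""
    if not 0 <= word < STEPS:
        raise ValueError("word must fit in 16 bits")
    return word / (STEPS - 1)


# Take the first few "snapshots" of a 1 kHz sine wave and convert them back.
for n in range(4):
    level = math.sin(2 * math.pi * 1_000 * n / SAMPLE_RATE_HZ)
    word = encode(level)
    print(f"sample {n}: {word:016b} -> {decode(word):.5f} of full-scale current")
```

Each printed line shows one snapshot as a 16-digit binary word alongside the fraction of maximum current the idealized DAC would produce for it.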
But what tells the DAC when to convert each sample to an output current? This is where the rubber meets the road. A signal called the “word clock” is fed to the DAC. The word clock is simply a squarewave with a frequency of 44.1kHz. On the squarewave’s leading edge, a word is loaded into the DAC. On the squarewave’s trailing edge, that word is converted into an output current as shown in Figure 1. This process is repeated 44,100 times per second.1
This is where jitter matters. If the clock controlling when the samples are converted to analog isn’t a perfectly precise and stable frequency, the spacing of the “snapshots” of the original analog signal is wrong. Some samples will be too close together, others too far apart, as shown in Figure 2. The result of reassembling the samples with imprecise timing is a misshapen waveform. Specifically, timing error in the clock translates directly to an amplitude error in the reconstructed signal.
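A rough way to put numbers on that last point: for a sine wave, the amplitude error is approximately the waveform's slope multiplied by the timing error, and the slope is steepest at the zero crossing. The sketch below compares that approximation with a direct calculation and expresses the worst case in 16-bit steps. The 20kHz test tone and the jitter values are my own illustrative choices, not measurements of any product.

```python
import math


def amplitude_error(freq_hz, t_seconds, timing_error_s, peak=1.0):
    """Sine value at the mistimed instant minus its value at the ideal instant."""
    ideal = peak * math.sin(2 * math.pi * freq_hz * t_seconds)
    actual = peak * math.sin(2 * math.pi * freq_hz * (t_seconds + timing_error_s))
    return actual - ideal


def worst_case_error(freq_hz, timing_error_s, peak=1.0):
    """Steepest slope of the sine (2*pi*f*peak) times the timing error."""
    return 2 * math.pi * freq_hz * peak * timing_error_s


LSB = 2.0 / 2 ** 16  # one 16-bit step for a signal spanning -1 to +1

for jitter_ps in (10, 100, 1000):
    err = worst_case_error(20_000, jitter_ps * 1e-12)
    print(f"{jitter_ps:>5} ps at 20 kHz -> {err / LSB:.3f} of one 16-bit step")
```

Under these assumptions, a timing error of 100 picoseconds on a full-scale 20kHz tone already amounts to roughly four-tenths of one 16-bit step.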
There’s more going on than simple amplitude errors. Jitter also introduces in the reconstructed analog signal spurious sideband frequencies that are not part of the original signal, and that are not related harmonically to the signal being reconstructed. Moreover, it turns out that the ear/brain is astonishingly sensitive to these timing errors in the reconstruction of musical waveforms. Keith Johnson, co-inventor of HDCD and designer of perhaps the best A/D and D/A converters ever built, once told me that he could hear the difference between eight and 15 picoseconds of clock jitter. To put this number in perspective, light travels at the rate of about one inch per 100 picoseconds. It’s an amazingly small timing variation, but one that our aural decoding system can easily detect.
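The sideband effect can be demonstrated with a small simulation. The sketch below assumes a deliberately simple case: a 10kHz tone whose sample timing wobbles sinusoidally at 1kHz with a peak error of one nanosecond (all three numbers are arbitrary choices for illustration). It then measures the energy at 9kHz and 11kHz, frequencies that are not harmonics of the tone, and compares it with the standard small-jitter approximation of pi times signal frequency times peak jitter.

```python
import cmath
import math


def jittered_tone(f0_hz, fj_hz, jitter_peak_s, fs_hz, count):
    """Sine at f0 whose sampling instants wobble sinusoidally at fj."""
    samples = []
    for n in range(count):
        t = n / fs_hz
        t_err = jitter_peak_s * math.sin(2 * math.pi * fj_hz * t)
        samples.append(math.sin(2 * math.pi * f0_hz * (t + t_err)))
    return samples


def tone_amplitude(samples, freq_hz, fs_hz):
    """Amplitude of a single frequency component (one-bin DFT)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
              for n, x in enumerate(samples))
    return 2 * abs(acc) / len(samples)


FS, F0, FJ, JITTER = 44_100, 10_000, 1_000, 1e-9   # illustrative values
signal = jittered_tone(F0, FJ, JITTER, FS, FS)      # one second of samples

predicted = math.pi * F0 * JITTER                   # small-jitter approximation
print(f"predicted sideband level: {20 * math.log10(predicted):.1f} dB")
for f in (F0 - FJ, F0 + FJ):
    measured = tone_amplitude(signal, f, FS)
    print(f"{f} Hz: {20 * math.log10(measured):.1f} dB relative to the tone")
```

With no jitter the 9kHz and 11kHz bins would be empty; with it, a pair of spurious tones appears on either side of the signal, spaced by the jitter frequency.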
The classic sonic signature of jitter is now well known and documented: loss of space and depth; softening of the bass; hardening of timbre; a glassy sound on initial transients (most noticeable on the leading edge of upper-register piano attacks); a metallic sheen overlaying the treble; and an overall flattening of the soundstage and homogenization of instrumental images within the stage.
Jitter’s deleterious effects aren’t confined to D/A conversion; jitter in the A/D clock is just as sonically harmful. Unlike D/A jitter, however, A/D jitter is permanently encoded in the digital bitstream and no playback clock, no matter how precise, can undo the damage. That’s one reason why CDs remastered from analog tapes using modern A/D converters sound better.
It’s also worth noting that jitter matters only at the point of conversion, not when digital data are simply copied. You can copy a digital bitstream to a recording device using the inferior TosLink connection with no degradation. But if you use TosLink between the digital source and D/A converter, you’ll hear jitter’s effect because the bits are now being converted to analog and the waveforms analyzed by your brain.
1 The word-clock frequency is 44.1kHz in a non-oversampling system. I use this example for clarity. In practice, the DAC is fed words at the rate of 352.8kHz in an 8-X oversampling system (352,800 is 44,100 multiplied by 8).
Word-clock jitter can be measured with a device similar to an FM demodulator in an FM-radio tuner; the 352.8kHz word clock is analogous to the FM carrier and the jitter component analogous to the frequency-modulated audio signal. The jitter analyzer strips out the jitter component, which can be measured as an RMS voltage for an overall indication of the amount of jitter, or subjected to spectrum analysis so that you can see the jitter’s frequency. Jitter can be “white” (the jitter energy is distributed relatively smoothly across a broad band) or “correlated” (the clock is jittered at specific frequencies, often related to the frequency of the audio signal being decoded).
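A software analogue of that measurement is sketched below, assuming we already have a list of clock-edge timestamps and know the nominal clock frequency. A hardware analyzer has neither luxury, which is why it demodulates the clock instead. The synthetic 352.8kHz clock, the 50-picosecond peak jitter, and all function names are hypothetical values chosen for illustration.

```python
import math


def edge_times(clock_hz, count, jitter_peak_s=0.0, jitter_hz=0.0):
    """Edge timestamps of a synthetic clock with optional sinusoidal jitter."""
    period = 1.0 / clock_hz
    return [n * period
            + jitter_peak_s * math.sin(2 * math.pi * jitter_hz * n * period)
            for n in range(count)]


def timing_errors(edges, clock_hz):
    """Deviation of each edge from where an ideal clock would place it."""
    period = 1.0 / clock_hz
    return [t - n * period for n, t in enumerate(edges)]


def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


CLOCK_HZ = 352_800  # 8x-oversampled word clock from the footnote
edges = edge_times(CLOCK_HZ, CLOCK_HZ, jitter_peak_s=50e-12, jitter_hz=1_000)
errors = timing_errors(edges, CLOCK_HZ)
print(f"RMS jitter: {rms(errors) * 1e12:.1f} ps")
```

The list of timing errors plays the role of the demodulated jitter component: its RMS value gives the overall figure, and feeding it to a spectrum analysis would show whether the jitter is broadband or concentrated at specific frequencies.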
SPECS & PRICING
Esoteric G-0Rb Rubidium Master Clock Generator
Output frequencies: 44.1kHz, 48kHz, 88.2kHz, 96kHz, 176.4kHz, 192kHz, 100kHz (universal frequency)
Terminals: BNC coaxial
Power consumption: 81W
Dimensions: 17.4" x 6" x 13.8"
Weight: 40 lbs.
7733 Telegraph Road
Montebello, CA 90640