Esoteric P-03 Universal Disc Transport and D-03 Digital-to-Analog Converter

Equipment report
One of the great mysteries of digital audio (to me, at least) is how CD transport-mechanism quality affects the sound. I’m not talking about the differences between transports as a whole, but about the mechanism that spins the disc and reads the data.

The sonic differences between transports are the result of jitter, or timing variations, in their digital outputs; this is now a well-documented phenomenon.

But here’s the conundrum: Every CD transport mechanism recovers the same ones and zeros from the disc, whether that mechanism is a flimsy plastic job found in a $39 player or the massive VRDS-Neo mechanism in the $17,200 Esoteric P-03 transport reviewed here. You can prove this by conducting a bit-for-bit comparison between the audio data recovered from either player. So what has changed?
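For readers curious what a bit-for-bit comparison amounts to, here is a minimal sketch in Python. It assumes the audio data from each player has already been captured as raw bytes; the function names and the tiny sample byte strings are hypothetical stand-ins, not anything taken from the players or the mastering lab described in this review.

```python
import hashlib
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the index of the first differing byte, or None if identical."""
    if a == b:
        return None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the shared portion: one stream is a prefix of the other.
    return min(len(a), len(b))


def digest(data: bytes) -> str:
    """SHA-256 fingerprint, a quick way to check two captures for equality."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for audio data captured from two different players.
capture_cheap = bytes([0x00, 0x01, 0x7F, 0x80, 0xFF, 0x10])
capture_vrds = bytes(capture_cheap)

identical = first_difference(capture_cheap, capture_vrds) is None
print("bit-for-bit identical:", identical)
print("digests match:", digest(capture_cheap) == digest(capture_vrds))
```

A comparison like this says nothing about timing; it only confirms that the recovered ones and zeros are the same, which is exactly why the sonic differences are puzzling.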

True, those audio data are only part of a very complex datastream that comes off the disc. They undergo significant processing to extract the audio information.

Nevertheless, one would think that processing and clocking that data through integrated circuits (and sometimes buffers) would remove any timing errors (jitter). And if that were the case, then why not recover the data with a cheap mechanism and employ a high-precision clock to correct for the mechanism’s timing inaccuracy?

There are two more pieces to this mystery. Nakamichi’s 1000 CD transport, which has an acoustic seal on the door to its slot-loading mechanism, sounds better with the door closed. Apparently, acoustic energy from the loudspeakers impinging on the disc and transport results in a slight, but audible, degradation of the sound. Once again, the bits are the same, door open or closed.

The third piece of this mystery has baffled me for nearly 20 years. I was working in a CD mastering lab (where we transferred CD mastertapes to disc on a million-dollar laser mastering machine in a clean room), and part of my job involved trouble-shooting odd technical problems relating to mastertapes and replicated discs. A client for whom we had made discs was unhappy with the results, reporting that the discs didn’t sound as good as the mastertape. I compared the data on the mastertape with the data recovered from the replicated disc (using a CD-ROM pre-mastering system) and found, not surprisingly, that the disc and tape were bit-for-bit identical. The extremely talented electrical and optical engineers I worked with (who had designed and built the mastering machine) dismissed the artist’s claim, saying: “Bits is bits.” Unfortunately, I was unable to compare the sound of the disc with the mastertape through the same digital-to-analog converters (this was back in the day when mastertapes were on ¾" U-Matic tape decoded by a Sony PCM 1610 or 1630).

Partly out of my own curiosity and partly out of the desire to please the client, we cut another master disc and replicated new discs, this time on a different mastering machine. The client reported that the new disc sounded as he intended, and went away happy. But I was left with the question of how two CDs, each containing identical data, could sound different. Now, at least, I was armed with two discs that could be played back on a high-resolution system, and my own listening confirmed that the second disc did sound better than the first. (A similar paradox, which arose many years later, is that a CD-R made from a CD often sounds better than the original CD.)

The next step was to look at the physical differences between the two discs—discs with identical ones and zeros but with different sound. An analysis of the pit and land lengths on the two discs showed that the inferior-sounding disc had greater variations in those pit and land lengths—in other words, jitter was encoded in the disc’s physical structures. (Specifically, a histogram of the frequency variance in the discs’ nine discrete pit and land lengths showed the inferior-sounding disc had a wider bell curve than the better-sounding disc.)
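The kind of comparison described above can be illustrated with a short Python sketch. It is not the analysis we ran in the lab; it simply groups measured pit and land lengths by the nearest of the nine nominal lengths (3T through 11T in the CD format) and reports how widely each group is spread. The measurement values are invented purely for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev

# The nine nominal pit/land lengths on a CD, in units of the channel-bit period T.
NOMINAL_LENGTHS = range(3, 12)  # 3T .. 11T


def spread_by_length(measured):
    """Group measured lengths by nearest nominal length; return std dev per group."""
    groups = defaultdict(list)
    for m in measured:
        nominal = min(NOMINAL_LENGTHS, key=lambda n: abs(n - m))
        groups[nominal].append(m)
    return {n: pstdev(values) for n, values in sorted(groups.items())}


def mean_spread(measured):
    """Average of the per-length standard deviations (a crude 'bell width')."""
    spreads = spread_by_length(measured)
    return sum(spreads.values()) / len(spreads)


# Invented measurements (in units of T) for two hypothetical discs.
disc_a = [3.00, 3.02, 2.98, 5.01, 4.99, 5.00, 11.02, 10.98]
disc_b = [3.10, 2.90, 3.05, 5.12, 4.88, 5.00, 11.15, 10.85]

print("disc A mean spread:", round(mean_spread(disc_a), 4))
print("disc B mean spread:", round(mean_spread(disc_b), 4))
```

In this made-up example disc B has the wider distribution around each nominal length, which corresponds to the "wider bell curve" of the inferior-sounding disc.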

The question remains: How do timing variations in the raw bitstream recovered from a CD make their way into the analog output signal? If the sound is different, then the signals must be different. Doesn’t precise clocking eliminate transport-induced jitter? Would better clocks have removed the sonic differences I heard between the two replicated CDs? And how did sound impinging on a disc played in the Nakamichi transport effect a change in the analog output signal?

These puzzles lead to the question that opened this review: Does a massive and elaborate machined-metal transport mechanism in the Esoteric P-03 Universal Disc Transport sound better than a cheap plastic job? The P-03 is the ultimate expression of transport quality. This 71-pound, $17,200 device takes the task of recovering data from a CD quite seriously, employing what is unquestionably the best built and most elaborate mechanism yet devised for CD playback (save for the mechanism in Esoteric’s $25k P-01 transport).

I don’t have the answer to why transport-mechanism quality affects the sound, and can’t explain the reason that two CDs with identical data sound different, but I can say definitively that the P-03 transport and D-03 D/A converter are among the best-sounding digital sources I’ve ever heard.

P-03 Universal Disc Transport

The P-03 Universal ($17,200) plays any disc format, including CD, SACD, DVD-Audio, and DVD-Video. The machine’s DVD-V capabilities deserve attention. The P-03 Universal uses 14-bit video processing and the latest Anchor Bay Technologies de-interlacer and scaler to output video up to 1080p on its HDMI output. Without dwelling on the video performance, I will say that the P-03’s picture quality is easily the best I’ve seen from DVD, with nearly the dimensionality, depth, and resolution of high-definition sources. (The audio-only version of the P-03 Universal transport is known as the P-03 and carries a price of $13,300. The P-03 Universal’s video circuits can be turned off when playing music discs.)

The P-03’s rear panel reveals some unusual connection options. Digital output is via a standard coaxial RCA jack, an i.LINK (FireWire) connector, or two XLR jacks. These two XLR jacks together form Esoteric’s proprietary ES-Link in which two cables carry high-resolution stereo data, including two-channel SACD information, to the D-03 digital-to-analog converter for decoding. Note that the XLR jacks carry only stereo digital audio. The P-03 will output high-resolution multichannel digital audio from multichannel SACD discs, but only on the i.LINK (FireWire) port. With the P-03 Universal playing a DVD-V or DVD-A, encoded with Dolby Digital or DTS audio tracks, surround-sound information can be output via the RCA coaxial digital out.

Decoding the multichannel bitstream requires a multichannel digital-to-analog converter (the D-03 is a stereo-only device). By using the i.LINK output, one could daisy-chain three D-03 digital-to-analog converters to the P-03 and output 5.1 multichannel.

A Word Sync input (BNC jack) accepts a clock from the D-03. Putting the master clock in the digital-to-analog converter and slaving the transport mechanism to this clock is an essential prerequisite for any digital front end that aspires to be state-of-the-art. That’s because no matter how well designed the digital-to-analog converter, jitter (timing variations) introduced by the S/PDIF interface between transport and processor will degrade the sound. Speaking of clocks, those with the budget and the passion can further improve the timing precision of the digital-to-analog conversion process by adding one of Esoteric’s outboard clock modules. The G25U clock ($2900) connects to the transport and processor, providing even greater precision (1 part per million). For the ultimate performance, Esoteric makes the G-0s, a $13,500 device that employs a Rubidium sub-atomic clock with accuracy of 0.05 parts per billion. (The D-03’s internal clock provides an accuracy of 3 parts per million, not significantly lower than the G25U’s accuracy, but far lower than the Rubidium clock’s precision.)

A front-panel button allows you to select the upsampling rate (no upsampling, 88.2kHz, 176.4kHz), and even to convert PCM to Direct Stream Digital (DSD, the format used in SACD), for conversion to analog in the D-03 D/A converter.
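To put the quoted clock-accuracy figures in perspective, the sketch below converts each one into a worst-case frequency error. Applying the figures to the CD sampling rate of 44.1kHz is my own simplifying assumption (the actual master-clock frequencies inside these products are not stated here), and the function name is hypothetical. Note also that an accuracy figure describes long-term frequency error, which is not the same thing as short-term jitter.

```python
def max_frequency_error_hz(nominal_hz: float, accuracy_ppm: float) -> float:
    """Worst-case frequency offset for a clock of the given ppm accuracy."""
    return nominal_hz * accuracy_ppm * 1e-6


CD_SAMPLE_RATE_HZ = 44_100.0  # assumed reference frequency, for illustration only

# Accuracy figures quoted in the review, expressed in parts per million.
clock_accuracy_ppm = {
    "D-03 internal clock": 3.0,
    "G25U": 1.0,
    "G-0s Rubidium": 0.05e-3,  # 0.05 parts per billion = 0.00005 ppm
}

for name, ppm in clock_accuracy_ppm.items():
    error = max_frequency_error_hz(CD_SAMPLE_RATE_HZ, ppm)
    print(f"{name}: up to {error:.9f} Hz off at 44.1kHz")

# The two upsampling settings are integer multiples of the CD rate.
for rate_hz in (88_200, 176_400):
    print(f"{rate_hz} Hz is {rate_hz // 44_100}x the CD sampling rate")
```

Under that assumption, the internal clock's 3 parts per million works out to roughly 0.13Hz at 44.1kHz, and the G25U's 1 part per million to roughly 0.044Hz.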

The heart of the P-03 is surely the mighty VRDS-Neo transport mechanism. This device is to most CD transport mechanisms what a Ferrari is to a Yugo. Weighing in at a whopping 14 pounds, the VRDS-Neo is made like no other disc-reading device. For starters, the assembly is built around solid blocks of cut steel for rigidity. And rather than secure the disc at its center with a tiny plastic clamp, the VRDS mechanism employs a machined disc of Duralumin just larger than a CD to hold the entire disc and reduce vibration. This clamping mechanism is attached to a solid-steel “bridge” that traverses the assembly.

The motor is a custom three-phase brushless type, developed using parent company TEAC’s long experience in motor design and magnetic analysis. The spindle-shaft bearings—again designed from scratch—are made from stainless-steel balls encased in ceramic for low vibration and greater positional precision. Esoteric developed for the VRDS mechanism a novel laser-pickup structure that more precisely articulates the lens and optical pickup during disc playback. A conventional pickup is suspended from several wires, allowing it to move in many directions. The Esoteric pickup mechanism is mounted on a sled (with metal guide rails), allowing the pickup to move in only three directions (horizontal, vertical, and circular). This design reportedly results in lower vibration, less servo activity to keep the laser focused and on-track, and fewer errors. In addition, the entire sled assembly is isolated mechanically from the spindle motor to reduce vibration in the pickup.

I had hands-on time with all these transport sub-components during a visit to Esoteric’s California headquarters, which gave me an even greater appreciation not only of the engineering involved, but also of the remarkable level of execution. Every component is massively built, heavy, precise, with no apparent compromises to cost. When Esoteric wanted a sub-component, it ended up designing and building the component itself rather than sourcing it from outside suppliers. Each VRDS-Neo mechanism is made by hand and undergoes a two-day quality-control check. Very few, if any, high-end companies have the resources to design and build from scratch a piece of mechanical engineering of this sophistication.

(Esoteric has designed a new transport platform called Vertically Aligned Optical Stability Platform [VOSP] for its lower-priced products, and will supply this mechanism on an OEM basis to other companies. Over the next year, you’ll see a wide variety of CD players from other high-end manufacturers using the Esoteric VOSP mechanism.)

The P-03’s VRDS-Neo transport mechanism is housed in one of the finest examples of chassis metalwork in high-end audio. The chassis construction, drawer operation, metal finish, and precision with which the chassis is assembled are beyond reproach. For example, the chrome-plated Allen bolts holding the top panel are tightened to a precise specification with a torque wrench. Here’s another example of the level of thought and detail in the P-03: When you open the drawer to insert or remove a disc, a small door glides out of the way and a blue LED gently illuminates the tray. Even the chassis feet are custom, patented, elaborate multi-part devices designed to reduce vibration. The entire product exudes elegance, luxury, precision, and serious engineering.

D-03 D/A Converter

The D-03 digital processor is housed in a chassis that is nearly identical externally to that of the P-03, making for a handsome pair when installed in an equipment rack. The unit can decode a wide range of input signals, including high-resolution PCM and DSD. As mentioned earlier, the D-03 outputs a clock to the P-03 transport, allowing the critical clock that controls the DACs (the place where jitter matters) to be generated by a precision device rather than by the jittered clock recovered from the S/PDIF digital interface. The D-03 employs other jitter-reduction techniques, including a buffer that temporarily stores the data to remove timing variations (active only when the D-03 acts as the master clock for the P-03).

The D-03 is essentially a dual-mono DAC, with completely separate power supplies (including power transformers specifically designed and built for the D-03 by Esoteric), and separate compartments within the chassis for each channel of DAC and analog output stage. A third power transformer supplies the digital circuits. Digital-to-analog conversion is handled by Analog Devices AD1955 chips in dual-differential configuration (two DACs per audio channel) for lower noise and greater conversion accuracy. The DACs can decode PCM or DSD, which means that DSD input signals are decoded in their native format rather than being converted to PCM. The analog output stage is all-discrete, with no integrated circuits in the signal path. Output is via unbalanced RCA jacks or balanced XLRs. As noted above, two DACs per channel are employed, meaning that the balanced outputs are not compromised by the presence of a phase splitter to convert a single DAC’s unbalanced output into a balanced signal.


After living with a new component for a few months (particularly a digital source), I generally develop in my mind a shorthand synopsis of its overall sonic character. It might go something like this: “Somewhat forward perspective; sacrifices smoothness for detail resolution; deep bass extension but a little plummy in the midbass; and a touch of grain through the mids.” The reality is that all audio components impose a sonic signature on the music, some more than others. The ability to identify a component by its sound is not a good thing; it means that the product has enough of a sonic personality that its colorations overlay the music.

A school of thought in high-end audio suggests that a component can be judged purely by how different a variety of recordings sound through that component. The reasoning is that the component that resolves the biggest differences between recordings must have the least coloration and, ergo, is the superior product. Another way of evaluating a component is to see how it stacks up on a sonic checklist—tonal balance, freedom from grain, tone color, soundstaging, and the like. Finally, one can just listen to music and see how emotionally involving the experience is compared to listening through other products.

Rarely do these disparate evaluation methods converge; there are some products I admire for their audiophile attributes more than I enjoy for their musicality (they satisfy all the specific sonic checkmarks, but lack a certain indefinable magic), and others that are obviously colored but somehow manage to pull me into the music every time I sit down.

The Esoteric D-03/P-03 pair lives in the rarefied company of digital sources that are outstanding by any evaluation criterion. For starters, the Esoteric is chameleon-like in its portrayal of different recordings; in tonal balance, liquidity, space, overall perspective, and dynamics, the Esoteric’s sound is as variable as the disc you place in its drawer. The P-03/D-03 also excels in all the audiophile values; the pair hits all the right audiophile buttons. But most importantly, by a long shot, the Esoteric combination is immensely engaging musically. Put all this together and you’ve got a world-class digital source that’s among the few best I’ve heard.

I was taken aback by the Esoteric’s lack of a “sound.” I was unable to pin down any specific character I could attribute to the player. Just when I thought I’d identified a coloration, changing discs or switching to a different kind of music would prove me wrong. Listening through the P-03/D-03 was like looking back into the recording through a transparent window.

This lack of coloration conferred many wonderful attributes. The sense of hearing back through the playback chain to the original acoustic event gave the music a life and vividness that was startling. Tiny nuances in expression came to the forefront, which fostered the impression of hearing live music-making as it was happening rather than of listening to a canned reproduction. The Esoteric pair is hyper-detailed and vivid musically without a trace of sonic vividness. I heard a richly woven musical tapestry that encouraged, particularly in jazz, a constant changing of focus from one musician to another, of discovering a drum lick that perfectly complemented the soloist’s melodic line.

The Esoteric’s transparency to the source also paid dividends in reproduction of tone color. Instrumental timbres, rather than sounding overlaid by grain, hardness, or a common character, were instead natural and realistic. I was particularly taken by the Esoteric’s reproduction of oboe, bassoon, and bass clarinet, instruments that seem to convey feeling purely through their tone colors. A good example is Zappa’s The Yellow Shark, an orchestral album performed by Germany’s new-music group Ensemble Modern. The Esoteric’s purity of tone color brought out more expression in the compositions and their performances.

Recordings that combined woodwind and brass instruments in complex arrangements highlighted the Esoteric’s beautiful portrayal of timbre. Listening, for example, to the rich interplay of tone colors in trumpeter Jon Faddis’ beautiful DVD-A Remembrances [Chesky] or what is perhaps the best-sounding big-band recording ever made, Dick Hyman’s From the Age of Swing [Reference Recordings], I could clearly hear the timbre of each instrument within the overall sound, rather than hearing separate instruments congeal into a synthetic whole. I also noticed this quality on unison phrases between instruments, such as sax and trumpet. This ability to hear quiet instruments with their timbres preserved in the presence of louder instruments contributed to my ability to hear more deeply into the music. The Zappa piece “The Black Page” (the live version from Make a Jazz Noise Here), which Zappa describes as having “statistical density,” was a good example; the Esoteric unraveled the many layers of rhythmic and melodic innovation that make this composition a masterpiece. (Incidentally, there’s a very interesting entry on Wikipedia on “The Black Page.”)

The Esoteric’s bottom end was extraordinarily weighty, full, and dynamic. If the Esoteric had any identifiable sonic signature, it was a slight fullness in the midbass that added a measure of warmth to bass guitar and acoustic bass, as well as a heightened sense of power on lower-tuned toms. The thunderous tom-tom fill midway through the track “Gaia” from James Taylor’s Hourglass SACD had greater weight and heft through the Esoteric. But the added touch of bass weight didn’t detract from the sense of pitch or dynamic agility.

In the portrayal of space, and of individual images within that space, the Esoteric was world-class. The soundstage was stunningly wide, throwing images beyond the confines of the loudspeakers in an almost wrap-around effect. Image focus was tight, accompanied by a sense of air and bloom around instrumental outlines. The spatial perspective tended to be vivid and sharply defined, but was never forward, aggressive, or artificially sculpted. As mentioned earlier, the sense of space changed dramatically with the recording, from the intimacy of a solo acoustic guitar to the huge and gorgeous acoustic of Meyerson Symphony Center captured in Keith Johnson’s spectacular recordings on the Reference Recordings label.

This description applies primarily to the Esoteric’s reproduction of CD with the upsampling set to 176.4kHz, the best-sounding setting in my system. The top end was the most transparent and open at this frequency. I don’t know if it was my system or sonic taste, but the PCM-to-DSD conversion option was never preferable. It thickened the sound, made the bass woolly, and added a bit of glare. The sound was still good, but the magic was gone.

The ability to turn off the digital filter provided some fascinating listening sessions. Removing the digital filter was like opening a car’s convertible top on a crisp but sunny spring morning; the sound became more open, transparent, illuminated from within, and possessed a purity of tonal color that was breathtaking. Without the digital filter, the midrange had a stunning immediacy that reminded me of the sound of a 300B single-ended triode amplifier. The downside of no digital filter was a bit of brightness and slight etch to the leading edge of transients. On already bright recordings, the result was too analytical a presentation and a lack of ease. On most recordings, however, the slight top-end emphasis, coupled with the advantages described, made no digital filter the setting of choice for pure musical involvement.

Finally, if you enjoy musical performances on DVD, the Esoteric’s combination of world-class sound and state-of-the-art DVD reproduction is compelling. On DVDs with a stereo PCM track, run the D-03’s analog outputs into a controller with Pro Logic IIx or DTS Neo:6 Music decoding and connect the P-03’s HDMI output at 1080p to a 1920x1080 video display and you’re in for an aural and visual treat.


If I had to name on one hand the best-sounding digital source products I’ve heard, the Esoteric P-03/D-03 combination would certainly be included (along with the Spectral SDR-2000/SDR-3000, Linn CD12, and the Mark Levinson No.30.6/No.31.5). That the Esoteric is also the most luxurious looking, feeling, and operating of the group, and also plays SACD, DVD-Audio, and DVD-Video, are bonuses. Moreover, the Esoteric’s build-quality is as good as it gets, and the VRDS-Neo transport mechanism simply has no peer.

When I discover something new in familiar music (the beauty of a melodic phrase that had not struck me before, a subtle layer in a rhythmic pattern, or an extra measure of expression from the musicians), I know I’m in the presence of a special component. I had many of these musical epiphanies during my time with the Esoteric P-03 and D-03, which is, ultimately, the raison d’être of high-end audio. TAS