Critical listening—the practice of evaluating the quality of audio equipment by careful analytical listening—is very different from listening for pleasure. The goal isn’t to enjoy the musical experience, but to determine if a system or component sounds good or bad, and what specific characteristics of the sound make it good or bad. You want to critically examine what you’re hearing so that you can form judgments about the quality of reproduction. You can then use this information to evaluate and choose components, and to fine-tune a system for greater musical enjoyment.
Adapted and excerpted from Introductory Guide to High-Performance Audio Systems ©2007-2010 by Robert Harley.
Evaluating audio equipment by ear is essential—today’s technical measurements simply aren’t advanced enough to characterize the musical performance of audio products. The human hearing mechanism is vastly more sensitive and complex than the most sophisticated test equipment now available. Though technical performance is a valid consideration when choosing equipment, the ear should always be the final arbiter of good sound. Moreover, the musical significance of sonic differences between components can only be judged observationally.
The biggest problem in critical listening is finding the right words to express our perceptions. We hear things in reproduced music that are difficult to identify and verbalize. A listening vocabulary is essential not only to conveying to others what we hear, but also to defining and understanding our own perceptions. If you can attach a descriptive name to a perception, you can more easily recognize that perception when you experience it again.
By practicing critical listening, and adopting the language used here and in product reviews in high-end audio magazines, you will be much better able to assess the quality of audio equipment and make better purchasing decisions. By associating the descriptive terms with your own listening impressions, you can more precisely characterize how good products and systems are, and why they are good or bad.
I recommend that you write down your listening impressions during or immediately after a listening session. This will not only solidify those impressions, but also compel you to find words to describe what you heard. Often what we remember most strongly is the memory of an impression rather than the impression itself. I’m often asked how I can remember how a specific component sounds when I haven’t heard that product for a year or more. Because I record my sonic experiences when writing product reviews, I have formed a capsule impression of that product’s basic character. I don’t remember every detail, of course, but the mental image may be something like “slightly etched treble, very transparent and spacious soundstage, lean bass, somewhat lacking in dynamics, analytical rather than smooth.” I also associate a general value judgment with that component: Would I want to listen to it for pleasure over a long period of time?
By writing down these impressions, I not only have a written record of my experience, but also a more specific mental impression, one that I can use when making later comparisons with other components.
The rest of this article includes descriptions of a few of the sonic criteria I listen for when evaluating a component or system: tonal balance, treble, bass, overall perspective, and soundstaging. This is a partial list, excerpted from my book Introductory Guide to High-Performance Audio Systems. Other criteria include the midrange, dynamics, detail, pace, coherence, and musicality.
The first aspect of the musical presentation to listen for is the product’s overall tonal balance. How well balanced are the bass, midrange, and treble? If it sounds as though there is too much treble, we call the presentation bright. The impression of too little treble produces a dull or rolled-off sound. If the bass overwhelms the rest of the music, we say the presentation is heavy or weighty. If we hear too little bass, we call the presentation thin, lightweight, uptilted, or lean.
A product’s tonal balance is a significant—and often overwhelming—aspect of its sonic signature.
Good treble is essential to high-quality music reproduction. In fact, many otherwise excellent audio products fail to satisfy musically because of poor treble performance.
The treble characteristics we want to avoid are described by the terms bright, tizzy, forward, aggressive, hard, brittle, edgy, dry, white, bleached, wiry, metallic, sterile, analytical, screechy, and grainy. Treble problems are pervasive; look how many adjectives we use to describe them.
If a product has too much apparent treble, it overstates sounds that are already rich in high frequencies. Examples are overemphasized cymbals, excessive sibilance (“s” and “sh” sounds) in vocals, and thin-sounding or screechy violins. A product with too much apparent treble is called bright. Brightness is a prominence in the treble region, primarily between 3kHz and 6kHz. Brightness can be caused by a rising frequency response in loudspeakers, or by poor electronic design. Many digital sources and solid-state amplifiers that have a measurably flat (accurate) frequency response nevertheless add prominence to the treble.
Tizzy describes too much upper treble (6kHz–10kHz), characterized as a whitening of the treble. Tizzy cymbals have an emphasis on the upper harmonics, the sizzle and air that rides over the main cymbal sound. Tizziness gives cymbals more of an “ssssss” than a “sssshhhh” sound.
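As a rough illustration of the two treble bands just named, the sketch below takes a measured frequency response as (frequency in Hz, level in dB) pairs and reports how far the average level in each band sits above or below a midrange reference. The band edges for brightness (3kHz to 6kHz) and tizziness (6kHz to 10kHz) come from the text; the 500Hz to 2kHz reference band, the function names, and the sample data are assumptions made purely for illustration. As noted above, a component can measure flat and still sound bright, so a calculation like this can flag a rising response but cannot stand in for listening.

```python
# Band edges in Hz. The two treble bands are taken from the text;
# the midrange reference band is an assumption for this sketch.
BANDS = {
    "bright": (3000.0, 6000.0),
    "tizzy": (6000.0, 10000.0),
}
REFERENCE_BAND = (500.0, 2000.0)  # assumed reference, not from the text


def band_average(response, low_hz, high_hz):
    """Plain arithmetic mean of the dB values with low_hz <= frequency < high_hz.

    Averaging dB values directly is a crude simplification, good enough
    for a rough comparison between bands.
    """
    levels = [db for hz, db in response if low_hz <= hz < high_hz]
    if not levels:
        raise ValueError(f"no measurement points between {low_hz} and {high_hz} Hz")
    return sum(levels) / len(levels)


def treble_excess(response):
    """Return each treble band's average level relative to the reference band, in dB."""
    reference = band_average(response, *REFERENCE_BAND)
    return {
        name: band_average(response, low, high) - reference
        for name, (low, high) in BANDS.items()
    }


# Hypothetical measurement: flat through the midrange, rising in the treble.
example_response = [
    (500, 0.0), (1000, 0.0), (1500, 0.0),
    (3000, 2.0), (4000, 3.0), (5000, 4.0),
    (6000, 4.0), (8000, 2.0),
]

if __name__ == "__main__":
    for name, excess in treble_excess(example_response).items():
        print(f"{name}: {excess:+.1f} dB relative to midrange")
```

A positive result in the "bright" band suggests the kind of rising response the text associates with brightness; a positive result in the "tizzy" band points to excess upper treble.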
Forward, if applied to treble, is very similar to bright; both describe too much treble. A forward treble, however, also tends to be dry, lacking space and air.
Many of the terms listed above have virtually identical meanings. Hard, brittle, and metallic all describe an unpleasant treble characteristic that reminds one of metal being struck. In fact, the unique harmonic structure created from the impact of metal on metal is very similar to the distortion introduced by a power amplifier when it is asked to play louder than it is capable of playing.
I find the sound of the alto saxophone to be a good gauge of hard, brittle, and metallic treble, particularly lower treble. If reproduced incorrectly, sax can take on a thin, reedy, very unpleasant tone. The antithesis of this sound is rich, warm, and full. When the sax’s upper harmonics are reproduced with a metallic character, the whole instrument’s sound collapses. Interestingly, the sound of the saxophone has the most complex harmonic structure of any instrument. It’s no wonder that it is so revealing of treble problems.
White and bleached have meanings very similar to bright, but I associate them more with a thinness in the treble, often caused by a lack of energy (or what sounds like a lack of energy) in the upper midrange. With no supporting harmonic structure beneath it, the treble becomes threadbare and thin, much like an overexposed photograph. Cymbals should have a gong-like low-frequency component with a sheen over it. If cymbals sound like bursts of white noise (the sound you hear between radio stations), what you’re probably hearing is a white or bleached sound.
A particularly annoying treble characteristic is graininess. Treble grain is a coarseness overlying treble textures. I notice it most on solo violin, massed violins, flute, and female voice. On flute, treble grain is recognizable as a rough or fuzzy sound that seems to ride on top of the flute’s dynamic envelope. (That is, the grain follows the flute’s volume.) Grain makes violins sound as though they’re being played with hacksaw blades rather than bows—a gross exaggeration, but one that conveys the idea of the coarse texture added by grain.
Treble grain can be of any texture, from very fine to coarse and rough. (Think of the difference between 400-grit and 80-grit sandpaper.) The more coarse the grain, the more objectionable it is. The preceding discussion of grain applies in even larger measure to midrange textures, which I’ll discuss later.
Treble problems can foster the interesting perception that the treble isn’t integrated into the music’s harmonic tapestry, but is riding on top of it. The top end seems somehow separate from the music, not an integral part of the presentation. When this happens, we are aware of the treble as a distinct entity, not as just another aspect of the music. The treble should sound like an extension of the upper midrange rather than separate from it. If the treble calls attention to itself, be suspicious.
The most common sources of these problems are, in rough order of probability: tweeters in loudspeakers, overly reflective listening rooms, digital source components, preamplifiers, power amplifiers, cables, and dirty AC power sources.
So far, I’ve discussed only problems that over-emphasize treble. Some products tend to make the treble softer and less prominent than live music. This characteristic is often designed into the product, either to compensate for treble flaws in other components in the system, or to make the product sound more palatable. Deliberately softening the treble is the designer’s shortcut; if he can’t get the treble right, he just makes it less offensive by reducing it.
The following terms, listed in order of increasing magnitude, describe good treble performance: smooth, sweet, soft, silky, gentle, liquid, and lush. When the treble becomes overly smooth, we say it is romantic, rolled-off, or syrupy. A treble described as “smooth, sweet, and silky” is being complimented; “rolled-off and syrupy” suggests that the component goes too far in treble smoothness, and is therefore colored.
A rolled-off and syrupy treble may be a blessed relief after hearing bright, hard, and grainy treble, but it isn’t any more musically satisfying in the long run. Such a presentation tends to be bland, uninvolving, slow, thick, closed-in, and short on detail. All these terms describe the effects of a treble presentation that errs too far on the side of smoothness. The presentation will lack life, air, openness, extension, and a sense of space if the treble is too soft. The music sounds closed-in rather than being big and open.
Top-octave air describes a sense of almost unlimited treble extension, which fosters the impression of hearing the air in which the music exists. A slight treble rolloff (a reduction in the amount of treble) in loudspeakers, for example, can diminish top-octave air. Loss of top-octave air is also associated with an opaque or translucent soundstage.
The best treble presentation is one that sounds most like real music. It should have lots of energy—cymbals can, after all, sound quite aggressive in real life—yet not have a synthetic, grainy, or dry character. We don’t hear these characteristics in live music; we shouldn’t hear them in reproduced music. More importantly, the treble should sound like an integral part of the music, not a detached noise riding on top of it. If a component has a colored treble presentation, however, it is less musically objectionable if it errs on the side of smoothness rather than brightness.
Bass performance is the most misunderstood aspect of reproduced sound, among the general public and hi-fi buffs alike. The popular belief is that the more bass, the better. This is reflected in ads for “subwoofers” that promise “earthshaking bass” and the ability to “rattle pant legs and stun small animals.” The ultimate expression of this perversity is boom trucks that generate absurd amounts of extraordinarily bad bass.
We want to know how a product reproduces music, not earthquakes. What matters to the music lover isn’t the quantity of bass, but the quality of that bass. We don’t just want the physical feeling that bass provides; we want to hear subtlety and nuance. We want to hear precise pitch, a lack of coloration, and the sharp attack of plucked acoustic-bass strings. We want to hear every note and nuance in fast, intricate bass playing, not a muddled roar. If Ray Brown, Stanley Clarke, John Patitucci, Dave Holland, or Eddie Gomez is working out, we want to hear exactly what they’re doing. In fact, if the bass is poorly reproduced, we’d rather not hear much bass at all.
Correct bass reproduction is essential to satisfying musical reproduction. Low frequencies constitute music’s tonal foundation and rhythmic anchor. Unfortunately, bass is difficult to reproduce, whether by source components, power amplifiers, or—especially—loudspeakers and rooms.
Perhaps the most prevalent bass problem is a lack of pitch definition or articulation. These two terms describe the ability to hear bass lines as individual notes, each having an attack, a decay, and a specific pitch. You should hear the texture of the bass, whether it’s the sonorous resonance of a bowed doublebass or the unique character of a Fender Precision. Low frequencies contain a surprising amount of detail when reproduced correctly.
When the bass is reproduced without pitch definition and articulation, the low end degenerates into a dull roar underlying the music. You hear low-frequency content, but it isn’t musically related to what’s going on above it. You don’t hear precise notes, but a blur of sounds—the dynamic envelopes of individual instruments are completely lost. In music in which the bass plays an important rhythmic role—rock, electric blues, and some jazz—the bass guitar and kick drum seem to lag behind the rest of the music, putting a drag on the rhythm. Moreover, the kick drum’s dynamic envelope (which gives it the sense of sudden impact) is buried under the bass guitar’s sound, obscuring the drum’s musical contribution. These conditions are made worse by the common mid-fi affliction of too much bass.
Terms descriptive of this kind of bass include muddy, thick, boomy, bloated, tubby, soft, fat, congested, loose, and slow. Terms that describe excellent bass reproduction include taut, quick, clean, articulate, agile, tight, and precise. Good bass has been likened to a trampoline stretched taut; poor bass is a trampoline hanging slack.
The amount of bass in the musical presentation is very important; if you hear too much, the music is overwhelmed. Excessive bass is a constant reminder that you’re listening to reproduced music. This overabundance of bass is described as heavy.
If you hear too little bass, the presentation is thin, lean, threadbare, or overdamped. An overly lean presentation robs music of its rhythm and drive—the full, purring sound of bass guitar is missing, the depth and majesty of double bass or cello are gone, and the orchestra loses its sense of power. Thin bass makes a double bass sound like a cello, a cello like a viola. The rhythmically satisfying weight and impact of bass drums are reduced to shadows of their actual selves. Instruments’ harmonics are emphasized in relation to the fundamentals, giving the impression of well-worn cloth that’s lost its supporting structure. A thin or lean presentation lacks warmth and body. In any discussion of audio sins of commission and omission, an overly lean bass is preferable to a fat and boomy bass.
Two terms related to what I’ve just described about the quantity of bass are extension and depth. Extension is how deep the bass goes: not the midbass and upper bass (which are described by the words lean or weighty), but the very bottom end of the audible spectrum. This is the realm of kick drum and pipe organ. All but the very best systems roll off (reduce in volume) these lowermost frequencies. Fortunately, deep extension isn’t a prerequisite to high-quality music reproduction. If the system has good bass down to about 35Hz, you don’t feel much is missing. Pipe-organ enthusiasts, however, will want deeper extension and are willing to pay for it. Reproducing the bottom octave correctly can be very expensive.
Just as colorations caused by peaks and dips in frequency response can make the midrange unnatural, so also can bass be colored by peaks and dips. Bass colorations create a monotonous, droning characteristic that quickly becomes tiring. In the most extreme example, so-called “one-note bass,” bass instruments seem to have only one pitch. This impression is created by a large peak in the system’s frequency response at a specific frequency. This pitch is then reproduced more loudly than other pitches. One-note bass is also described as being thumpy. Ironically, this undesirable condition is maximized in boom trucks. The playback system is tuned to put out all its energy at one frequency for maximum physical impact. The drivers of those vehicles don’t seem to care that they’re losing the wealth of musical information conveyed by the bass.
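The "large peak at a specific frequency" behind one-note bass can be looked for in a measured response. The following is a minimal sketch, assuming the response is available as (frequency in Hz, level in dB) pairs; the 20Hz to 200Hz bass range, the function name, and the sample data are assumptions for illustration and do not come from the text. It reports the measurement point that stands furthest above the typical (median) bass level.

```python
from statistics import median


def largest_bass_peak(response, low_hz=20.0, high_hz=200.0):
    """Find the bass measurement point furthest above the median bass level.

    response: list of (frequency_hz, level_db) pairs.
    low_hz, high_hz: assumed limits of the bass range for this sketch.
    Returns (frequency_hz, height_db_above_median).
    """
    bass = [(hz, db) for hz, db in response if low_hz <= hz <= high_hz]
    if not bass:
        raise ValueError("no measurement points in the bass range")
    typical = median(db for _, db in bass)
    peak_hz, peak_db = max(bass, key=lambda point: point[1])
    return peak_hz, peak_db - typical


# Hypothetical system with a strong peak near 63Hz.
example_response = [
    (30, -3), (40, 0), (50, 1), (63, 9),
    (80, 1), (100, 0), (125, 0), (160, -1),
]

if __name__ == "__main__":
    hz, height = largest_bass_peak(example_response)
    print(f"Largest peak: {height:.1f} dB above the median bass level at {hz} Hz")
```

In this made-up data every bass note near 63Hz would be reproduced far louder than its neighbors, which is the condition the text describes as thumpy. How large a peak must be before it becomes audible as one-note bass is a judgment for the ear, not for this calculation.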
Many of the terms used to describe midrange colorations actually apply to the upper bass. Chesty, thick, and congested are all useful in describing colored bass reproduction. Bass lacking these colorations is called smooth or clean.
Much of music’s dynamic power—the ability to convey wide differences between loud and soft—is contained in the bass, and bass dynamics are very important to the satisfying reproduction of music.
A system or component that has excellent bass dynamics will provide a sense of sudden impact and explosive power. Bass drum will jump out of the presentation with startling energy. The dynamic envelope of acoustic or electric bass will be accurately conveyed, allowing the music full rhythmic expression. We call these components punchy, and use the terms impact and slam to describe good bass dynamics.
A related aspect is speed, though, as applied to bass, “speed” is somewhat of a misnomer. Low frequencies have inherently slower attacks than higher frequencies, which makes calling bass response “slow” technically incorrect. Nonetheless, the musical difference between “slow” and “fast” bass is profound. A product with fast, tight, punchy bass produces much greater rhythmic involvement with the music.
Although reproducing the sudden attack of a bass drum is vital, equally important is a system’s ability to reproduce a fast decay (i.e., how a note ends). The bass shouldn’t continue reverberating after a drum whack has stopped. Many loudspeakers store energy in their mechanical structures and radiate that energy slightly after the note itself. When this happens, the bass has overhang, a condition that makes kick drum, for example, sound bloated and slow. Music in which the drummer uses double bass drums is particularly revealing of bass overhang. If the two drums merge into a single sound, overhang is probably to blame. You should hear the attack and decay of each drum as distinct entities. Components that don’t adequately convey the sudden dynamic impact of low-frequency instruments rob music of its power and rhythmic drive.
The term perspective describes the apparent distance between the listener and the music. Perspective is largely a function of the recording (particularly the distance between the performers and the microphones), but is also affected by components in the playback system. Some products push the presentation forward, toward the listener; others sound more distant, or laid-back. The forward product presents the music in front of the loudspeakers; the laid-back product makes the music appear slightly behind the loudspeakers. Put another way, the forward product sounds as though the musicians have taken a few steps toward you; the laid-back product gives the impression that the musicians have taken a few steps back.
Another way of describing perspective is by row number in a concert hall. Some products seem to “seat” the listener at the front of the hall—in Row D, say. Others give you the impression that you’re sitting farther back; say, in Row S.
Several other terms describe perspective. Dry generally means lacking reverberation and space, but can also apply to a forward perspective. Other watchwords for a forward presentation are immediate, incisive, vivid, palpable, and present. Terms associated with laid-back include relaxed, easygoing, and gentle.
Products with a forward presentation produce a greater sense of an instrument’s presence before you, but can quickly become fatiguing. Conversely, if the presentation is too laid-back, the music is uninvolving and lacking in immediacy. If the product under evaluation distorts the perspective captured during the recording, I prefer that it err in the direction of being laid-back. When a product is overly immediate, I feel as though the music is coming at me, assaulting my ears. The reaction is to close up, to try to keep the music at arm’s length. The ideal midrange presentation has a sense of palpable presence, that feeling of the musicians existing right before you, but without sounding aggressive.
Conversely, a laid-back presentation invites the listener in, pulling her gently forward into the music, allowing her the space to explore its subtleties. It’s like the difference between having a conversation with someone who is aggressive, gets in your face, and talks too loudly, compared with someone who stands back, speaking quietly and calmly.
In loudspeakers, perspective is often the result of a peak or dip in the midrange (a peak is too much energy, a dip is too little). In fact, the midrange between 1kHz and 3kHz is called the presence region because it provides a sense of presence and immediacy. The harmonics of the human voice span the presence region; thus, the voice is greatly affected by a product’s perspective.
Untrained listeners who can’t specifically identify whether a reproduced musical presentation is forward or laid-back will feel a tension that usually translates into a desire to turn down the music if the presentation is too forward. Conversely, a laid-back presentation can make the music uninvolving rather than riveting.
Note that the terms associated with overall perspective can be used to describe specific aspects of the presentation (such as the treble) in addition to describing the overall perspective. If we say the treble is forward, we mean that it is overly prominent, sounding as if it is closer to the listener than the rest of the music.
Soundstaging is the apparent physical size of the musical presentation. When you close your eyes in front of a good playback system, you can “see” the instrumentalists and singers before you, often existing within an acoustic space such as a concert hall. The soundstage has the physical properties of width and depth, producing a sense of great size and space in the listening room. Soundstaging overlaps with imaging, or the way instruments appear as objects hanging in three-dimensional space within the recorded acoustic. A large and well-defined soundstage is most often heard when playing audiophile-grade recordings made in a real acoustic space such as a concert hall or church.
The most obvious descriptions of the soundstage are its physical dimensions—width and depth. You hear the musical presentation as existing beyond the left and right loudspeaker boundaries, and extending farther away from you than the wall behind the loudspeakers.
Of all the ways music reproduction is astounding, soundstaging is without question the most miraculous. Think about it: The two loudspeakers are driven by two-dimensional electrical signals that are nothing more than voltages that vary over time. From those two voltages, a huge, three-dimensional panorama unfolds before you. You don’t hear the music as a flat canvas with individual instruments fused together; you hear the first violinist to the left front of the presentation, the oboe farther back and toward the center, the brass behind the basses on the right, and the tambourine behind all the other instruments at the very rear. The sound is made up of individual objects existing within a space, just as you would hear at a live performance. Moreover, you hear the oboe’s timbre coming from the oboe’s position, the violin’s timbre coming from the violin’s position, and the hall reverberation surrounding the instruments. The listening room vanishes, replaced by the vast space of the concert hall—all from two voltages.
A soundstage is created in the brain by the time and amplitude differences encoded in the two audio channels. When you hear instrumental images toward the rear right of the soundstage, the ear/brain is synthesizing those aural images by processing the slightly different information in the two signals arriving at your ears. Visual perception works the same way: there is no depth information present on your retinas; your brain extrapolates the appearance of depth from the differences between the two flat images.
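To make "time and amplitude differences encoded in the two audio channels" concrete, here is a toy sketch that builds a left and right channel from a single mono signal by making one channel quieter and slightly later than the other. The function name and parameters are hypothetical, and this is only an illustration of the two cues, not a description of how any recording is actually made. Making the right channel quieter, later, or both pulls the apparent image toward the left loudspeaker.

```python
def place_image(mono, level_diff_db=0.0, delay_samples=0):
    """Build (left, right) channels from a mono signal (a list of samples).

    level_diff_db: how many dB quieter the right channel is than the left.
    delay_samples: how many samples later the right channel starts.
    Both channels are padded with silence to the same length.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    gain = 10.0 ** (-level_diff_db / 20.0)  # convert dB to a linear amplitude ratio
    left = [float(s) for s in mono] + [0.0] * delay_samples
    right = [0.0] * delay_samples + [gain * s for s in mono]
    return left, right


if __name__ == "__main__":
    # Three made-up samples; the right channel is 6 dB down and 2 samples late.
    left, right = place_image([1.0, 0.5, -0.5], level_diff_db=6.0, delay_samples=2)
    print("left: ", left)
    print("right:", right)
```

With both parameters at zero the two channels are identical, which corresponds to an image centered between the loudspeakers.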
Audio components vary greatly in their abilities to present these spatial aspects of music. Some products shrink soundstage width and shorten the impression of depth. Others reveal the glory of a fully developed soundstage. I find good soundstage performance crucial to satisfying musical reproduction. Unfortunately, many products destroy or degrade the subtle cues that provide soundstaging.
Terms descriptive of poor soundstage width are narrow and constricted—the music, squeezed together between the loudspeakers, does not envelop the listener. A soundstage lacking depth is called flat, shallow, or foreshortened. Ideally, the soundstage should maintain its width over its entire depth. A soundstage that narrows toward the presentation’s rear robs the music of its size and space.
The illusion of soundstage depth is aided by resolution of low-level spatial cues such as hall reflections and reverberation. In particular, the reverberation decay after a loud climax followed by a rest helps define the acoustic space. The loud signal is like a flash of light in a dark room; the space is momentarily illuminated, allowing you to see its dimensions and characteristics.
To produce a realistic impression of a real instrument in a real acoustic space, the reverberation and hall sound must be distinct from the image itself. Better audio components place the image within, rather than superimposed on, the recorded acoustic. Poor-quality components don’t resolve these spatial cues; they shorten soundstage depth, truncate reverberation decay, and fuse the reverberation into the instrumental images. When this happens, an audio system’s ability to transport you to the original acoustic space is diminished.
Now that we’ve covered space and depth, let’s discuss how the instrumental images appear within this space. Images should occupy a specific spatial position in the soundstage. The sound of the bassoon, for example, should appear to emanate from a specific point in space, not as a diffuse and borderless image. The same could be said for guitar, piano, sax, or any other instrument in any kind of music. The lead vocal should appear as a tight, compact, definable point in space exactly between the loudspeakers. Some products, particularly large loudspeakers, distort image size by making every instrument seem larger than life—a classical guitar suddenly sounds ten feet wide. A playback system should reveal somewhat correct image size, from a 60'-wide symphony orchestra to a solo violin. I say “somewhat” because it is impossible to re-create the correct spatial perspectives of such widely divergent sound sources through two loudspeakers spaced about 8' apart. Although image size and placement are characteristics inherent in the recording, they are dramatically affected by components in the playback system.
Terms that describe a clearly defined soundstage are focused, tight, delineated, and sharp. Image specificity also describes tight image focus and pinpoint spatial accuracy. A poorly defined soundstage is described as homogenized, blurred, confused, congested, thick, and lacking focus.
A related issue is soundstage layering. This is the ability of a sound system to resolve front-to-back cues that present some images toward the soundstage front and at varying distances toward the soundstage rear. The greater the number of layers of depth gradation, the better. Poor components produce the impression of just a few layers of depth: you hear perhaps three or four discrete levels of distance within the soundstage. The best components produce a sense of distance along a continuum: very fine gradations of depth are clearly resolved. Again, the absence of these qualities constitutes subconscious cues that what you’re hearing is artificial. When this important spatial information is revealed, however, you can more easily forget that you’re not hearing the “real thing.”
Bloom—the impression that individual instrumental images are surrounded by a halo of air—is often associated with soundstaging. Although image outlines may be clearly delineated, a soundstage with bloom has an additional sense of diffused air around the image. It is as though the instrument has a little space around it in which it can “breathe.” Bloom gives the soundstage a more natural, open, and relaxed feeling.
The term action, coined by my colleague at The Absolute Sound, Jonathan Valin, describes the sense of bloom expanding outward into space from the instrument, and the way in which the instrument’s dynamic envelope grows. Action is bloom with a dynamic component. Think of how a trumpet sounds in real life, and the way the sound seems to expand in space as the instrument gets louder. Some audio components express this characteristic better than others; those with a good sense of action sound more realistic, vibrant, and alive than those that lack action.
A product’s soundstaging performance should be evaluated with a wide range of musical signals. Some products may throw a superb soundstage at low levels, only to have that collapse when the volume increases during musical climaxes. Listen for changes in the spatial perspective with signal level.
Some products produce a crystal-clear, see-through soundstage that allows the listener to hear all the way to the back of the hall. Such a transparent soundstage has a lifelike immediacy that makes every detail clearly audible. Conversely, an opaque soundstage is thick or murky, with less of an illusion of “seeing” into space. Veiling is often used to describe a lack of transparency.
Soundstaging is toward the top of my list of sonic priorities (after tonal balance and lack of grain). The ability to present the music as a collection of individual images surrounded by space, rather than as one big image, is very important to the creation of musical realism. A large contributing factor to musical involvement is the impression that the music exists independently of the loudspeakers—that is, when we hear music as existing in space rather than attached to the loudspeakers. A soundstage with space, depth, focus, layering, bloom, and transparency is nothing short of spectacular.
To evaluate soundstaging and hear the characteristics I’ve described, you must use recordings that contain these spatial cues. Studio recordings made with multiple microphones and overdubs rarely reveal the soundstaging characteristics described. Recordings made in real acoustic spaces with stereo microphone techniques and a pure signal path are essential to hearing all aspects of soundstaging. In short, soundstaging is provided by a combination of the recording and the playback system. If soundstaging cues aren’t present in the recording, you’ll never know how well or how poorly the component or system under evaluation reveals them. Most audiophile recordings are made with purist techniques (usually two microphones) that naturally capture the spatial information present during the original musical event.
Finally, superb soundstaging is relatively fragile. You need to sit directly between the loudspeakers, and every component in the playback chain must be of high quality. Soundstaging is easily destroyed by low-quality components, a bad listening room, or poor loudspeaker placement. This isn’t to say you have to spend a fortune to get good soundstaging; many very-low-cost products do it well, but it is more of a challenge to find those bargains.