Psychoacoustics sets out to establish the relations between our auditory sensations and the physical characteristics of the acoustic stimuli. These relations are obtained using techniques of experimental psychology. The statistical response of a great number of listeners to clearly defined stimuli is established, without necessarily seeking to explain the intimate working mechanism of the ear.
The stimuli are pure tones, random tones, speech, etc. The listeners' responses are expected to reveal whether a sound is audible or not, to show whether there is a modification or not of the auditory sensations, to classify sounds perceived according to a scale, etc.
The manner of listening is very important and must be specified. A distinction is made between listening with headphones, listening in free-field conditions, and listening in a diffuse or semi-diffuse sound field.
Auditory sensation area
The frequency and intensity region of audible sounds is known as the auditory sensation area. It is delineated at the bottom by the threshold of hearing and at the top by the threshold of pain. For a given listener, the threshold of hearing is the minimum sound pressure level of a specified sound that is capable of evoking an auditory sensation. The sound reaching the ears by other paths, through the bone structure for instance, is assumed to be negligible. The threshold of pain for a given listener is the minimum sound pressure level of a specified sound which will stimulate the ear to a sensation of definite pain.
In the 2 to 4 kHz frequency range, where the ear is most sensitive, the dynamic range corresponding to the difference between the threshold of pain and the threshold of hearing is approximately 140 dB.
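Since the decibel scale is logarithmic, this 140 dB span hides an enormous range of physical pressures. A minimal sketch, assuming the standard 20 µPa reference pressure, makes the ratio explicit:

```python
import math

def spl_db(p, p_ref=20e-6):
    """Sound pressure level in dB relative to the 20 µPa reference."""
    return 20 * math.log10(p / p_ref)

def pressure_from_spl(db, p_ref=20e-6):
    """Inverse: acoustic pressure (Pa) corresponding to a given SPL."""
    return p_ref * 10 ** (db / 20)

# The ~140 dB span between the thresholds of hearing and of pain
# corresponds to a ten-million-fold ratio of acoustic pressures:
ratio = pressure_from_spl(140) / pressure_from_spl(0)
print(f"{ratio:.0f}")  # 10000000
```

In other words, the most intense tolerable sounds carry pressures some 10^7 times greater than the faintest audible ones.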
Our hearing normally deteriorates with age: this is known as presbycusis. This evolution is quantified as a hearing loss, i.e. the amount, expressed in decibels, by which the threshold of hearing of the impaired ears exceeds the standard threshold of hearing at a specified frequency. Presbycusis is more marked in men than in women, which is explained by different exposures to noise over a lifetime. It has been shown that long exposure to intense noise, at work for example, accelerates the aging of our hearing and can cause higher than normal hearing losses. Losses of around 25 dB can cause an appreciable degradation in speech intelligibility.
Loudness is the attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from soft to loud. It depends primarily upon the sound pressure of the stimulus, but also upon its frequency, waveform, and duration.
Loudness is characterized by equal-loudness contours (see picture below), showing the related values of sound pressure level and frequency required to cause a given loudness level. The equal-loudness contours depend on the stimulus and on the manner of listening.
More generally, the loudness level of a sound, in phons, is numerically equal to the sound pressure level of a free progressive wave of frequency 1000 Hz that otologically normal listeners, facing the source, judge to be as loud as the unknown sound.
Pitch is the attribute of auditory sensation in terms of which sounds may be ordered on a scale, i.e. the psychoacoustic scale, extending from low to high. Pitch is what allows us to recognize a melody. The pitch of a pure tone depends mainly on its frequency, in a nonlinear way, but also on its sound pressure, in a manner that varies considerably from one person to another.
For a harmonic sound, the pitch is practically that of the fundamental, even if the latter is absent: our ear is, therefore, sensitive to the frequency intervals between harmonics. It may be noted that pitch also depends on the amplitude of the harmonics: eliminating certain harmonics, with filters for example, modifies the pitch.
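This missing-fundamental effect can be illustrated numerically. In the sketch below (frequencies chosen arbitrarily for illustration), a spectrum containing only harmonics 2 through 6 of 220 Hz still implies 220 Hz as the common spacing of its partials, and that spacing is the pitch the ear assigns:

```python
f0 = 220.0  # fundamental frequency (Hz) -- deliberately absent from the signal

# Keep only harmonics 2 through 6: the spectrum contains no 220 Hz component.
partials = [k * f0 for k in range(2, 7)]   # 440, 660, 880, 1100, 1320 Hz

# The perceived pitch is nevertheless 220 Hz: the common frequency
# interval between consecutive partials.
spacings = {b - a for a, b in zip(partials, partials[1:])}
print(spacings)  # {220.0}
```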
The pitch of inharmonic sounds depends on the amplitudes of the pure tones present and on their frequency intervals. The pitch of a random sound depends on the frequency range within which the components have relatively large amplitudes. Certain sounds have several formants, which means their pitch is ambiguous. Likewise, filtering percussive sounds (xylophone, for example) changes their pitch.
Besides the pitch defined above as the psychoacoustic scale, there is the harmonic or musical pitch, which corresponds to a logarithmic frequency scale. The harmonic pitch of Western polyphonic music and other traditional music is based on the octave interval: two sounds with an octave ratio are perceived as similar, or seem even blended together, whether they are heard simultaneously or successively. The octave unison of voices or instruments consists in making them perform the same melody, but one or several octaves higher or lower: only a single melody is heard.
If 2 sounds with an octave interval are considered equivalent, this leads to a relative musical pitch, which means the interval between these 2 sounds is referred to the same octave.
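Octave equivalence can be sketched as folding every frequency into a single reference octave; the reference used below (middle C, approximately 261.63 Hz) is an arbitrary choice for illustration:

```python
def fold_to_octave(f, low=261.63):
    """Fold a frequency into the reference octave [low, 2*low)."""
    while f >= 2 * low:
        f /= 2
    while f < low:
        f *= 2
    return f

# 440 Hz (A4) and 880 Hz (A5) are an octave apart; once folded,
# they land on the same octave-equivalent pitch:
print(fold_to_octave(440.0) == fold_to_octave(880.0))  # True
```

Two frequencies that fold to the same value thus share the same relative musical pitch, whatever their absolute octave.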
The harmonic pitch only constitutes an aural scale for listeners with a musical ear. A listener with perfect pitch is able to recognize the harmonic pitch of any single sound. This seems to be an inborn quality, although an ear can nevertheless be trained by associating, for a given instrument, the pitch and timbre of each note. The relative musical ear recognizes the interval between 2 sounds heard simultaneously or in succession. This is what truly matters for the musician, and practice steadily develops it.
Timbre is the attribute of auditory sensation that enables a listener to judge that 2 sounds, similarly presented and having the same loudness and pitch, are dissimilar. Timbre depends primarily upon the spectrum of the sound, but also on the formation and extinction transients (very fast variations of the amplitude).
For the psychoacoustician, who studies the human subjective perception of sounds, timbre often serves as a catch-all category: if 2 sounds of the same loudness and pitch are perceived as different, they must differ in timbre. A more or less pleasant coloration is often associated with timbre. It is described by subjective terms and qualifications also used for painting, such as warm, cold, full, round, piercing, sharp, brilliant, dull, etc. In music, timbre refers above all to the tones of the various instruments, allowing them to be distinguished. It is, however, more than this, because the manner of establishing the sound, in other words the attack, partly determines the timbre.
The musical attribute of sound or, conversely, the noise quality of sound, is associated with timbre: the former is pleasant, the latter unpleasant. Broadly speaking, a harmonic sound is more musical than a random one. This is why noise has often been defined as a complex inharmonic or random sound.
Consonance and dissonance
When listening simultaneously to 2 pure tones, a listener perceives an unpleasant or ill-defined impression, known as dissonance, as soon as the frequency difference exceeds about 6 to 8 Hz. The dissonance is at its strongest for an interval close to the tempered semitone.
When the interval between 2 sounds is an integer number of octaves, the dissonance disappears and is replaced by a consonance that conveys a calm impression of plenitude and completeness. The 2 sounds seem to become one, as if they were in unison, i.e. of the same frequency.
When simultaneously listening to harmonic sounds or series, the same impression of consonance is perceived if the ratio of their fundamentals is a simple fraction. This property can be explained by the existence of common harmonics: the greater their number, the better the consonance. Conversely, a dissonance is all the more marked, the fewer common harmonics there are and the smaller the frequency differences between the remaining harmonics (producing beats and roughness).
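The common-harmonics explanation can be checked with a short sketch (fundamental frequencies chosen purely for illustration): a perfect fifth shares far more harmonics than a major seventh over the same range.

```python
def common_harmonics(f1, f2, n=16):
    """Harmonic frequencies (up to the n-th harmonic) shared by two series, in Hz."""
    h1 = {k * f1 for k in range(1, n + 1)}
    h2 = {k * f2 for k in range(1, n + 1)}
    return sorted(h1 & h2)

# A perfect fifth (ratio 3/2) shares many harmonics ...
print(common_harmonics(200, 300))  # [600, 1200, 1800, 2400, 3000]
# ... while a major seventh (ratio 15/8) shares almost none:
print(common_harmonics(200, 375))  # [3000]
```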
The musical intervals may be categorized by decreasing consonance. We successively have: unison (1/1), octaves (n/1), perfect fifth (3/2), major third (5/4), perfect fourth (4/3), etc. The interval of a major seventh (15/8) is considered to be the most dissonant.
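These ratios can be compared on the logarithmic pitch scale by converting them to cents, the standard subdivision in which one octave spans 1200 cents; the helper below is a generic conversion, not tied to any particular library:

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200 * math.log2(ratio)

intervals = {"unison": 1 / 1, "octave": 2 / 1, "perfect fifth": 3 / 2,
             "major third": 5 / 4, "perfect fourth": 4 / 3,
             "major seventh": 15 / 8}
for name, r in intervals.items():
    print(f"{name}: {cents(r):.1f} cents")
# e.g. the perfect fifth comes out at ~702 cents,
# close to the 700-cent fifth of equal temperament.
```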
Finally, it should be pointed out that consonances and dissonances greatly depend on each individual's musical experience, and, more broadly speaking, musical culture.
When listening to 2 sounds of different levels, masking is observed: the more intense masking sound causes a decrease in the loudness of the less intense masked sound, compared to its loudness before the masking sound was introduced.
Masking depends on the characteristics of the sounds present and on the manner of listening. It is studied by determining how much the threshold of hearing of the masked sound is raised by the presence of the masking sound.
This behavior of human hearing was taken into account when defining the compression algorithms of lossy digital audio formats such as MP3. A linear audio stream, as on an audio CD, has a bitrate of 1411 kbps, while a typical Internet-quality MP3 file has a bitrate of 128 kbps: a compression ratio of approximately 11:1. Such a ratio cannot be achieved without discarding information, so the real question is how to discard information while keeping the sound quality at an acceptable level. One answer is quite simple: take advantage of the masking effect, which here means throwing away signals, or parts of the signal, that would be masked by our hearing anyway.
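The quoted figures follow directly from the CD format parameters, as a quick check shows:

```python
# A CD audio stream: 44.1 kHz sampling rate, 16 bits per sample, 2 channels.
cd_bitrate = 44_100 * 16 * 2          # 1_411_200 bits per second
mp3_bitrate = 128_000                 # typical Internet-quality MP3

ratio = cd_bitrate / mp3_bitrate
print(f"{cd_bitrate // 1000} kbps vs 128 kbps -> {ratio:.1f}:1")
# 1411 kbps vs 128 kbps -> 11.0:1
```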
Localization is the ability to locate a sound source in space. It is only possible with binaural listening, i.e. listening with both ears. While fairly precise in the horizontal plane, it is mediocre with respect to both the elevation angle and the distance. Two phenomena explain horizontal sound source localization:
- the difference of sound pressure between the 2 ears due to diffraction of the head;
- the difference of the sound paths between the source and ears.
The head constitutes an obstacle to the propagation of sound and casts an acoustic shadow for sounds whose wavelengths are smaller than its dimensions. For a lateral source, there is, therefore, a difference in loudness between the 2 ears, which allows the source to be located. This phenomenon only comes into play above approximately 400Hz.
As diffraction depends on frequency, the timbre of a complex sound is modified according to the position of the source, which yields a further means of locating it. This mode requires that sufficient knowledge be acquired concerning timbres and their modification in relation to the direction of the source; it therefore depends on each person's individual experience.
The difference in sound paths creates a time delay between the arrival times at each ear. In steady state, this delay corresponds to a phase shift, which only allows the source to be located unambiguously if the path difference is less than one-half wavelength. This is the case for frequencies below approximately 800 Hz.
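Both the size of the interaural delay and the ~800 Hz limit can be estimated with a back-of-the-envelope sketch; the ear-to-ear path difference of 0.21 m used below is an assumed typical value for a fully lateral source, not a figure from the text:

```python
c = 343.0   # speed of sound in air at room temperature, m/s
d = 0.21    # assumed ear-to-ear path difference for a fully lateral source, m

# Maximum interaural time difference:
max_itd = d / c
print(f"{max_itd * 1e6:.0f} microseconds")   # ~612 microseconds

# The phase shift becomes ambiguous once the path difference
# exceeds half a wavelength, i.e. above c / (2 * d):
f_limit = c / (2 * d)
print(f"{f_limit:.0f} Hz")   # ~817 Hz, consistent with the ~800 Hz figure
```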
Experiments on source localization, in particular those carried out using headphones, show that human beings are capable of detecting delays of around 10 microseconds. The normal otological threshold, i.e. normal sensitivity, is roughly equal to 40 microseconds.
The phenomena described above combine with each other and allow a source to be located with a precision that increases with the complexity of the sound. For low-frequency pure tones, there is an ambiguity between front and rear localization: the time delay alone is not enough in this case to localize the source correctly.
Localization with respect to the elevation angle may be explained by the diffraction caused by the head, but most of all by the ear pinna. This diffraction gives rise to peaks and dips in the spectrum of the incident sound and, therefore, induces different timbres. It would, indeed, seem that people acquire sufficient listening experience to be able to associate these timbre variations with the different elevation angles.