International Year of Sound 2020+
The thin line between an analog and a digital signal:
What we often ignore about the Nyquist-Shannon theorem
The perception of sound
In audio terms, which is what we deal with at the FN, analog is everything that happens in nature: it is what our auditory system picks up and sends to the brain, which then turns that information into the sensory experience we recognize as sound.
The analog signal
An analog signal is defined as an electrical signal with a continuous waveform that changes over time. This signal can be further classified as simple or composite. A simple analog signal has the shape of a sine wave.
A composite analog signal, on the other hand, has an irregular waveform, which, if necessary, can be further broken down into several sine waves.
An analog signal is measured and described in amplitude, frequency, and phase. The amplitude indicates the maximum magnitude of the signal: in electrical terms the voltage, in acoustic terms the level or volume. The frequency indicates how many times per second the signal completes a full cycle; it is the reciprocal of the period, i.e. the duration of one cycle. The phase indicates the position of the wave with respect to time "zero", that is, relative to the beginning of the cycle.
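A simple analog signal with these three parameters can be written as A·sin(2πft + φ). Purely as an illustrative sketch (the Python function below is mine, not part of any standard audio API):

```python
import math

def sine_value(t, amplitude, frequency, phase):
    """Instantaneous value of a simple analog signal: A * sin(2*pi*f*t + phi)."""
    return amplitude * math.sin(2 * math.pi * frequency * t + phase)

# A 440 Hz tone with unit amplitude and zero phase starts at 0 and
# peaks one quarter of a period after time "zero".
quarter_period = 1 / (4 * 440)
peak = sine_value(quarter_period, 1.0, 440, 0.0)
```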
The range of values in an analog signal is not fixed. An analog signal by definition can take on all values between the two edges of this range, representing the value of a physical magnitude as it is in reality.
The digital signal
A digital signal is a timed signal defined as discrete, i.e. not continuous, which assumes only two values, a high one for state "1" and a low one for state "0". It therefore carries information or data in binary form, representing the information as bits.
In audio terms, to make sense of a digital signal we describe the bit stream, i.e. the changes of state of the signal over time, by setting its resolution, which consists of the sampling frequency (represented in the following image by the vertical lines, the time units) and the quantization (the horizontal lines, the bit depth).
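Together, sampling frequency and bit depth also determine how much data the stream carries per second. A quick sketch (the function name and the channel count are my own illustrative choices):

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bit_depth * channels

# CD audio: 44,100 samples/s * 16 bits * 2 channels = 1,411,200 bit/s.
cd_rate = pcm_bit_rate(44100, 16, 2)
```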
Sampling and reconstruction
Sampling is the process by which a continuous signal is transformed into a discrete signal: the continuous, analog signal is first translated into a voltage, then broken down and transformed by an analog-to-digital converter (ADC) into a discrete one. An ADC thus samples the voltage and converts it into a digital signal.
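The sampling step itself can be sketched in a few lines (a simplified model, not how a hardware ADC is actually built):

```python
import math

def sample(continuous_signal, fs, n_samples):
    """Turn a continuous-time signal (a function of t, in seconds) into a
    discrete one by reading its value at regular intervals of 1/fs."""
    return [continuous_signal(n / fs) for n in range(n_samples)]

# One period of a 1 kHz sine sampled at 8 kHz yields 8 discrete values.
samples = sample(lambda t: math.sin(2 * math.pi * 1000 * t), 8000, 8)
```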
The opposite process to sampling is reconstruction. The fundamental difference between continuous and sampled signals is that a continuous signal is defined everywhere in time, whereas a sampled signal is defined only in the instants of sampling. Since a sampled signal is not defined between the samples, it cannot be used directly in a continuous system. To use a sampled signal in a continuous system, it must be transformed using a digital-to-analog converter (DAC). This conversion of the sampled signal to a continuous signal is called reconstruction.
The reconstruction is completed by interpolating the resulting signal. Interpolation is the process in which the value of the continuous signal between two samples is modeled by taking the previous and next contiguous values as a reference.
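The simplest form of interpolation draws a straight line between two contiguous samples. A minimal sketch (a real DAC uses a reconstruction filter rather than this naive scheme):

```python
def interpolate(samples, fs, t):
    """Estimate the continuous signal at time t by linear interpolation
    between the previous and next samples."""
    position = t * fs          # fractional index into the sample list
    i = int(position)
    if i + 1 >= len(samples):
        return samples[-1]     # past the last sample: hold the final value
    frac = position - i
    return samples[i] * (1 - frac) + samples[i + 1] * frac

# Halfway between a sample of 0.0 and a sample of 1.0 we estimate 0.5.
midpoint = interpolate([0.0, 1.0], 2, 0.25)
```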
The processes of sampling and reconstruction (note that the latter is shown before interpolation) are illustrated below:
... and, with some difficulty, aliasing
By ignoring what happens between the individual samples, the sampling process throws away information about the original signal. If we know the frequency of the original sine wave, we will be able to accurately predict the sampled signal. This is an easy concept to grasp and apply. But once sampled, the signal will not necessarily appear to be at the same frequency as the original signal. This means that, in some cases, given a pair of sampled signals, one of which is derived from a lower frequency sine wave and the other from a higher frequency, we will have no way to distinguish these signals from each other. This ambiguity between two signals of different frequencies (or two components of a signal) is called aliasing, and it occurs whenever we sample a signal in the real world.
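This ambiguity is easy to demonstrate numerically. In the sketch below (the frequencies are my own illustrative choice), a 200 Hz sine and a 1200 Hz sine sampled at 1 kHz produce, sample for sample, the same sequence:

```python
import math

fs = 1000                     # sampling rate in Hz
f_low, f_high = 200, 1200     # 1200 Hz is 200 Hz above the sampling rate

low = [math.sin(2 * math.pi * f_low * n / fs) for n in range(16)]
high = [math.sin(2 * math.pi * f_high * n / fs) for n in range(16)]

# Once sampled, the 1200 Hz tone is indistinguishable from its 200 Hz alias.
identical = all(abs(a - b) < 1e-9 for a, b in zip(low, high))
```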
Analog vs. digital, who will prevail?
The information in circulation often presents digital systems as perfect and analog systems as old, outdated, and inaccurate. The facts are different: an analog signal follows exactly the progress of the quantity it represents, while in a digital signal everything is converted and reduced to a sequence of "0"s and "1"s. However finely you divide time and amplitude, the result is still an approximation; in theory, therefore, it is the analog signal that is perfect.
But, taking a step back, what do we mean by "digital signal" in daily life? Is it really the waveform (electrical, square) given by the flow of bits? Or the sequence of numbers obtained from sampling the analog source, i.e. the interpretation of the bits? Or simply what our audio playback device, be it a simple cell phone, tablet, PC, or a more sophisticated device, processes and conveys to us as sound?
We could give an answer as simple as it is indisputable: digital sound does not exist, so what does it matter? A digital signal is just a means of transport for something that can only be consumed, and enjoyed, in analog form anyway.
So why do we digitize sound? One of the reasons digital signals are preferred to analog ones is the ease of transmission and reproduction, thanks to a kind of self-healing: as long as the distortion of the signal stays below a certain threshold, the errors can be corrected. Even in a partially deformed digital signal, the very restriction to the values "0" and "1" keeps the bits recognizable:
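A minimal sketch of this self-healing (the voltage levels and the decision threshold are my own illustrative choices):

```python
def regenerate(noisy_voltages, threshold=0.5):
    """Re-decide each bit from a distorted voltage level: anything above
    the threshold is read as 1, anything below as 0, so small
    deformations of the waveform simply disappear."""
    return [1 if v > threshold else 0 for v in noisy_voltages]

# The bits 1, 0, 1, 1 arrive distorted but are recovered exactly.
received = [0.93, 0.12, 0.78, 1.05]
bits = regenerate(received)
```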
Another advantage of using digital signals is the ease of storage: to store a bit, all you need is some sort of switch. Obviously, we are not talking about a mechanical device, but an electronic switch, which can withstand many on/off cycles per second and can be miniaturized.
If, thanks to the arguments presented, we now accept the idea, how do we determine the optimal resolution needed to capture a sound numerically? The more discerning will point out that for sampling, which, remember, is only the first step in the digitization process, there is the Nyquist-Shannon theorem.
What does the Nyquist-Shannon theorem say?
Simplifying as much as possible, according to the theorem, any analog signal can be reconstructed without errors by taking samples at regular time intervals, provided the sampling frequency is greater than or equal to twice the highest frequency occurring in the analog signal to be sampled.
A long and complicated sentence. In simple terms: if it is true that the audible bandwidth goes from 20Hz to 20kHz, then for its error-free reconstruction it would be enough to sample at a frequency slightly higher than 40kHz; for convenience, let's say 44.1kHz, as defined in the standard for audio CDs.
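The arithmetic is simple enough to sketch:

```python
def minimum_sampling_rate(max_frequency_hz):
    """Lower bound on the sampling rate given by the sampling theorem:
    twice the highest frequency present in the signal."""
    return 2 * max_frequency_hz

# For the 20 Hz - 20 kHz audible band the theoretical floor is 40 kHz;
# the CD standard's 44.1 kHz sits slightly above it.
audible_floor = minimum_sampling_rate(20_000)
```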
Unfortunately, however, assuming that the Nyquist-Shannon theorem is an easy and straightforward way to determine the minimum sampling rate for a system is a common misconception. The theorem sets some limits, yes, but it does not give easy answers. The main difficulty lies in the fact that the theorem assumes the signal to be sampled is perfectly band-limited, i.e. that it contains no energy at all above a given frequency. In reality, however, no signal has this characteristic.
What doesn't the Nyquist-Shannon theorem say?
The Nyquist-Shannon sampling theorem, with its limit on the sampling rate versus the spectral content of the signal, gives us some clearly stated limits, but as we see in practice, these limits are not as clear-cut as they are in theory. So, on the surface, the theorem seems to say things that are not true in practice. What the sampling theorem absolutely and positively does not say is that we can design a working system operating at the minimum rate it defines: if we aspire to a reasonable chance of success, we cannot.
This might mean that no system that samples data from the real world can do it perfectly. It is also true, however, that while not achieving perfection, with a little ingenuity and work you can design systems good enough so that the advantages you gain from processing discrete signals far outweigh the disadvantages of sampling, making many digital systems superior to their analog equivalents.
Let's put aside the Nyquist-Shannon theorem for a moment to approach the other, no less important, step of the digitization process: quantization. We now understand that the analog signal, which is continuous, is analyzed at regular intervals determined by the sampling frequency. For each sample, the ADC detects the momentary amplitude of the signal and assigns it a numerical value among those available. How many values are available depends on the bit depth defined by the resolution: in audio, most commonly 16 bits for so-called CD quality, which allows a little more than 65,000 values, or 24 bits for so-called Hi-Res quality, which allows a little more than 16,000,000 values, as shown in the table below:
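The number of available values follows directly from the bit depth, as 2 raised to the number of bits:

```python
def quantization_levels(bit_depth):
    """Number of distinct amplitude values representable with n bits."""
    return 2 ** bit_depth

cd_levels = quantization_levels(16)       # "a little more than 65,000"
hires_levels = quantization_levels(24)    # "a little more than 16,000,000"
```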
Why do we need so many values? This is a legitimate question, but giving a comprehensive answer is rather complicated. Let us emphasize two facts:
- The perception of sound is logarithmic, and it does not map evenly onto the binary numbering system. To explain: a 6dB increase in signal amplitude always corresponds to a doubling of the sound pressure (please note, not of the perceived volume, for which about 10dB is needed), no matter whether we are just above the threshold of audibility or near the threshold of pain. In binary terms, a 6dB increase corresponds to the addition of one bit, which means that the granularity, i.e. the number of available intervals, changes radically as we move from lower to higher values.
- The binary system knows only integer numbers. Every operation that modifies the signal, such as a simple change in volume, equalization, etc., introduces calculation errors, and therefore approximations, which accumulate with each intervention and alter the signal to be reconstructed.
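The "6dB per bit" relationship mentioned above can be checked directly: doubling the amplitude, which is exactly what one extra bit provides, corresponds to 20·log10(2), about 6.02 dB.

```python
import math

def amplitude_gain_db(ratio):
    """Level change, in decibels, for a given amplitude ratio."""
    return 20 * math.log10(ratio)

# Doubling the amplitude (one extra bit) adds roughly 6.02 dB.
one_extra_bit = amplitude_gain_db(2)
```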
Simplifying again, we can say that the limits of quantization, at the end of the chain, have a very big impact on sound quality.
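A toy sketch of how integer rounding makes even a trivial volume change non-reversible (the sample values here are arbitrary, chosen only to expose the effect):

```python
def apply_gain(samples, gain):
    """Scale integer samples and round each result back to an integer,
    as integer processing must; the rounding is where error creeps in."""
    return [round(s * gain) for s in samples]

original = [101, 202, 303]
halved = apply_gain(original, 0.5)    # rounding discards the .5 fractions
restored = apply_gain(halved, 2.0)    # no longer equal to the original
```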
And then came time
Up to this point, I don't pretend to have raised much of a fuss among the disciples of digital; after all, we have dealt with known and well-documented issues. But perhaps someone is beginning to reflect and wonder where we are heading.
Let's go back to the Nyquist-Shannon theorem. In the previous paragraphs, to try to determine the most suitable sampling frequency for our purposes, we considered the spectrum of a signal. But there is another aspect, especially when reconstructing the spatiality of a sound, which is almost always neglected: the time domain. We all know that the audible bandwidth (please note, of a young, healthy human being) goes from 20Hz to 20kHz, but this does not help us understand where a sound comes from. It doesn't allow us to place it in space. How can we understand if a sound comes from the front or from behind, from the right or from the left, from above or from below, and moreover with an almost millimetric precision?
An answer came from a brilliant physics professor with a passion for acoustics, whom I had the pleasure of meeting a few years ago at an Audio Engineering Society (AES) conference: Dr. Milind N. Kunchur. In his studies on hearing sensitivity, he showed that humans can discern very small temporal alterations, as small as 5 microseconds (µs)! If we relate this value to the Nyquist-Shannon theorem (please note, a cycle duration of 5µs is equivalent to a frequency of 200kHz), then to completely preserve the transparency of a sound we should not even consider sampling frequencies lower than 400kHz... and here a world collapses on us!
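The arithmetic behind that jump, sketched out:

```python
def period_to_frequency(period_seconds):
    """Frequency, in Hz, of a cycle with the given duration."""
    return 1 / period_seconds

temporal_resolution = 5e-6                      # 5 microseconds
f = period_to_frequency(temporal_resolution)    # roughly 200 kHz
nyquist_floor = 2 * f                           # roughly 400 kHz sampling rate
```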
It took only one more data point, added to the notions we already had, for us to be able to declare without a shadow of a doubt, albeit reluctantly, that "sound as we know it in nature cannot be captured and reproduced while maintaining all of its characteristics". Although this statement applies mainly to digital technology, where everything is fragmented, measured, and quantified, we must not delude ourselves: analog technology has great difficulties of its own, and we have to face that too.
Preferring to see the glass as half full rather than half empty, we can ask ourselves what the purpose of sound recording and reproduction really is. Is it an attempt to replace nature? Personally, I don't think so; I see it rather as a way of documenting history and events, one that later also developed into an art form capable of stimulating our senses.
Never has the saying "what goes around comes around" been so true as in this context. After all, if we like what we hear and it triggers emotions in us, can't we say that the purpose of a sound recording has been achieved?
- What Nyquist Didn't Say, and What to Do About It - https://www.wescottdesign.com/articles/Sampling/sampling.pdf
- Kunchur's Research Group - http://boson.physics.sc.edu/~kunchur//
- 2L High Resolution Music, test bench - http://www.2l.no/hires/
- Musicdoor hifi Sagl - https://www.musicdoor.com/
The Swiss National Sound Archives sincerely thanks Brandon Pletsch for the "Auditory Transduction" animation presented in the first paragraph of this project.
The Swiss National Sound Archives is part of the Swiss National Library