CDI Training

Signal processing

Signal vs. Noise

Perhaps the most important skill in signal processing[1] is the ability to separate important data (signal) from irrelevant data (noise). The human brain is naturally pretty good at this, which is whi you can sitll camprehend tihs sntence. Computers, on the other hand, are naturally very bad at it, which is why a single missing semicolon can crash an entire server. This has become a major problem, since modern science depends so heavily on computers, and has brought about some really creative techniques[2] to handle noisy data. Even with these clever algorithms, however, less noise is always better than more noise.

But what does it mean for data to have "less" noise? A strong wind can make it hard to hear someone whispering, but has almost no effect on a loud concert. The noise hasn’t changed, yet the first scenario seems much "noisier" than the second. This is because it’s only useful to quantify noise relative to the signal being measured. In other words, the important quantity isn’t the noise level, but rather the signal-to-noise ratio, defined as
$$ R=\frac{I_\mathrm{signal}}{I_\mathrm{noise}} $$
where $I$ is average intensity (or power, in the case of a time-dependent signal). Unless you’re a masochist, you want $R$ to be as large as possible. Whenever you try to clean up your signal by boosting or filtering, keep this ratio in mind. Otherwise, you might be scaling the signal and the noise by the same factor, which leaves the ratio exactly where it started.
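To make the "proportional amounts" warning concrete, here is a minimal sketch that computes $R$ as a ratio of average powers (mean of squared samples) and then applies a gain to the whole measurement. It assumes we can access the signal and the noise separately, which is only true in a simulation like this one. The sine wave, noise level, gain, and the helper names `mean_power` and `snr` are all hypothetical choices for illustration.

```python
import math
import random

def mean_power(samples):
    """Average power: the mean of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)

def snr(signal, noise):
    """Signal-to-noise ratio R = I_signal / I_noise, using average power."""
    return mean_power(signal) / mean_power(noise)

random.seed(0)  # fixed seed so the run is reproducible
n = 1000
# Hypothetical test data: a unit-amplitude sine wave and Gaussian noise,
# kept in separate lists (possible only because this is a simulation).
signal = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
noise = [random.gauss(0.0, 0.5) for _ in range(n)]

r_before = snr(signal, noise)

# An amplifier boosts signal and noise by the same factor...
gain = 10.0
r_after = snr([gain * s for s in signal], [gain * e for e in noise])

print(f"R before gain: {r_before:.3f}")
print(f"R after gain:  {r_after:.3f}")  # ...so R is unchanged
```

Both printed ratios agree, because the factor of `gain**2` appears in the numerator and denominator and cancels. Improving $R$ requires something that treats signal and noise differently, such as a filter that passes the signal’s frequency and rejects the rest.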

Nyquist Sampling and Aliasing

On paper, we treat variables as continuous, meaning they can change by arbitrarily small amounts. When we go to take data, however, we end up with a long list of discrete points, and no information about the spaces in between. The sample rate is defined as one over the spacing between data points, and it determines certain things about what can and can’t be measured.[3]
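As a small illustration of the definition, the sketch below estimates the sample rate from a list of evenly spaced timestamps by taking one over the average spacing. It assumes timestamps in seconds and uniform spacing; the function name `sample_rate` and the 1 ms example data are hypothetical.

```python
def sample_rate(timestamps):
    """Estimate the sample rate (samples per second) from evenly spaced timestamps.

    Assumes timestamps are in seconds and uniformly spaced. The average
    spacing is used so that small timing jitter does not dominate.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps to find a spacing")
    spacing = (timestamps[-1] - timestamps[0]) / (len(timestamps) - 1)
    return 1.0 / spacing

# Hypothetical data: one point every 0.001 s (1 ms)
times = [k * 0.001 for k in range(1000)]
print(sample_rate(times))  # about 1000 samples per second
```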

The most important constraint is known as the Nyquist frequency, folding frequency, or Nyquist limit. This is the highest frequency that can be faithfully measured at a given sample rate. The incredibly complicated equation used to find the Nyquist frequency is...

$$ f_N=f_s/2 $$

where $f_N$ is the Nyquist frequency and $f_s$ is the sampling frequency (the sample rate). In other words, you need more than two data points per oscillation to measure a signal accurately.
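The equation translates directly into code. The sketch below also shows why the requirement is *more than* two points per oscillation: a sine wave at exactly $f_N$ can be sampled precisely at its zero crossings, so the samples contain no trace of it. The 100 Hz sample rate and the names `nyquist_frequency` and `can_resolve` are hypothetical.

```python
import math

def nyquist_frequency(sample_rate):
    """f_N = f_s / 2: the highest frequency a given sample rate can represent."""
    return sample_rate / 2.0

def can_resolve(freq, sample_rate):
    """True if freq lies strictly below the Nyquist frequency."""
    return freq < nyquist_frequency(sample_rate)

fs = 100.0  # hypothetical sample rate, in Hz
print(nyquist_frequency(fs))  # 50.0
print(can_resolve(30.0, fs))  # True
print(can_resolve(70.0, fs))  # False

# Edge case: a sine at exactly f_N. With two samples per oscillation,
# every sample lands on sin(pi * k), which is zero for every integer k.
edge = [math.sin(2 * math.pi * nyquist_frequency(fs) * k / fs) for k in range(8)]
print(max(abs(x) for x in edge))  # tiny round-off value, not 1.0
```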

https://upload.wikimedia.org/wikipedia/commons/2/28/AliasingSines.svg

An example of signal aliasing. The red wave is correctly sampled at each point, but the data points are too widely spaced to create a faithful reproduction. Instead, the signal is aliased to the blue wave.

Any signal above the Nyquist frequency will be aliased to a lower frequency. To avoid this, you must sample more than twice as fast as the highest frequency component in your signal. The sample rate limits non-periodic features too: anything narrower than the spacing between data points may not be captured at all. The practice of picking a sample rate based on the expected signal frequency is called Nyquist sampling.
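To predict where an undersampled frequency ends up, a common approach (the standard "folding" rule, not something stated above) is to subtract the nearest integer multiple of the sample rate and take the magnitude, which always lands between 0 and $f_N$. The sketch below applies that rule and then checks it against raw samples. It uses cosines because a sine’s alias comes out phase-inverted (its samples flip sign), which would make the direct comparison less obvious. The 100 Hz sample rate and the name `aliased_frequency` are hypothetical.

```python
import math

def aliased_frequency(freq, sample_rate):
    """Apparent frequency of a sinusoid at `freq` after sampling at `sample_rate`.

    Folds the frequency into the range [0, f_s/2] by subtracting the nearest
    integer multiple of the sample rate and taking the magnitude.
    """
    return abs(freq - sample_rate * round(freq / sample_rate))

fs = 100.0  # hypothetical sample rate, in Hz (f_N = 50 Hz)
print(aliased_frequency(30.0, fs))   # 30.0 (below f_N, unchanged)
print(aliased_frequency(70.0, fs))   # 30.0 (folded back about f_N)
print(aliased_frequency(130.0, fs))  # 30.0

# The samples themselves cannot tell a 70 Hz cosine from a 30 Hz one.
ks = range(20)
high = [math.cos(2 * math.pi * 70.0 * k / fs) for k in ks]
low = [math.cos(2 * math.pi * 30.0 * k / fs) for k in ks]
print(max(abs(a - b) for a, b in zip(high, low)))  # near zero (round-off only)
```

This is the same situation as the red and blue waves in the figure: once the points are taken, nothing in the data distinguishes the true frequency from its alias.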

Oversampling

Notes

  1.  This is also an important skill in life.
  2.  If you’re interested, check out these videos on Hamming codes or lock-in detection, or browse the Wikipedia pages on noise reduction and related topics.
  3.  Fun fact: This is why many purists still prefer film and vinyl over digital formats. The physical processes of exposing film or engraving vinyl are continuous, and therefore avoid some of the issues caused by discrete data collection.