Nowadays, most headphones provide adaptive control to reduce or cancel external noise. In this section we list and summarize the algorithms developed by our team, which can be found in the GitHub repository. In particular, we considered the most widely commercialized methods so as to also reason in terms of market. In all these simulations we neglected the constraint of keeping the voice in the output, in order to keep the focus on the algorithms themselves.
LMS stands for Least Mean Squares: an adaptive filter that iteratively minimizes the mean squared error.
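A minimal NumPy sketch of the LMS loop (signal names are illustrative: `x` is the reference noise and `d` the signal at the error microphone; the actual implementation is in the GitHub repository):

```python
import numpy as np

def lms(x, d, L=50, mu=0.01):
    """LMS adaptive filter: returns the error e = d - y,
    which should tend to zero as the filter converges."""
    w = np.zeros(L)                         # filter weights
    e = np.zeros(len(x))
    for n in range(L - 1, len(x)):
        x_n = x[n - L + 1:n + 1][::-1]      # last L input samples, newest first
        y = w @ x_n                         # filter output (anti-noise estimate)
        e[n] = d[n] - y                     # residual error
        w += mu * e[n] * x_n                # gradient-descent weight update
    return e
```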
The stereo audio of a gunshot superimposed on a voice is shown in orange and blue, while the behaviour of the error is shown in green, using the parameters L = 50 and µ = 0.01.
Referring to the figure, you can notice at first sight some initial oscillations before convergence, while in the final part the compensation is very fast.
To speed up convergence without falling into instability, a possible solution is the NLMS algorithm, which updates the µ value in real time depending on the signal power in a given time window.
As a result, you can notice from the picture below that the green signal tends to zero in a shorter time and with fewer oscillations.
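The normalized update can be sketched the same way: dividing µ by the input power in the current window is what makes the effective step size adapt (again a sketch with illustrative defaults, not the repository code):

```python
import numpy as np

def nlms(x, d, L=50, mu=0.5, eps=1e-8):
    """NLMS: the step size is divided by the power of the last L
    input samples, so the effective mu adapts to the signal level."""
    w = np.zeros(L)
    e = np.zeros(len(x))
    for n in range(L - 1, len(x)):
        x_n = x[n - L + 1:n + 1][::-1]
        e[n] = d[n] - w @ x_n
        w += (mu / (eps + x_n @ x_n)) * e[n] * x_n   # power-normalized update
    return e
```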
From the simulations one may think that an adaptive filter could work for impulsive noise, since it converges in an acceptable number of iterations, but in reality you should also consider that:
- The computed error differs from the real one, so the anti-noise signal is not exact
- There are secondary paths, caused by the external environment, that modify the acquired signal or the transmitted anti-noise
- Microphones and speakers are non-ideal too, and the placement of the hardware also plays a critical role
- There is always a trade-off between filter order/time (L) and convergence time/instability (µ)
So, if you wish to obtain more accurate results, you also have to consider the secondary paths mentioned above and move to the so-called FXLMS and FXNLMS algorithms. Through NLMS you compute offline the secondary path (S) between the anti-noise speaker and the error microphone, using white noise (which is flat at all frequencies) as the excitation.
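A simplified FXLMS sketch: `s_hat` is the secondary-path estimate identified offline, and the reference is pre-filtered through it before entering the update. To keep the sketch self-contained, the measured error is simulated with the usual commutation approximation s\*(w\*x) ≈ w\*(s\*x); in a real system e[n] comes from the error microphone:

```python
import numpy as np

def fxlms(x, d, s_hat, L=50, mu=0.01):
    """Filtered-x LMS: the weight update uses the reference
    filtered through the secondary-path estimate s_hat."""
    xf = np.convolve(x, s_hat)[:len(x)]     # filtered reference
    w = np.zeros(L)
    e = np.zeros(len(x))
    for n in range(L - 1, len(x)):
        xf_n = xf[n - L + 1:n + 1][::-1]
        # in hardware, w @ x_n would drive the anti-noise speaker and e[n]
        # would be measured; here e is simulated via the commutation
        # approximation s*(w*x) ~ w*(s*x)
        e[n] = d[n] - w @ xf_n
        w += mu * e[n] * xf_n               # update uses the filtered reference
    return e
```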
In this simulation we took as input an audio track consisting of a voice covered by impulsive noise. From the first two plots it is possible to see the position and the amplitude of the peaks. In particular, the magnitude of the impulsive noise is much larger than the voice level, so it is reasonable to treat this as a problem solvable with a proper threshold.
We created a slow filter to follow the mean volume of the sound and a fast filter to drive the triggering mechanism. When the output of the fast filter exceeds the output of the slow filter by a margin that can be set by the user, this means there is some peaked noise to remove.
In other words, when a shot is recognized we cut a hole of a predicted duration and fill the gap with some white noise, in order to protect the listener's hearing.
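The trigger can be sketched with two one-pole envelope followers (the time constants, trigger margin, lookback, and fill level below are illustrative choices, not the values tuned in our code):

```python
import numpy as np

def gate_impulses(x, fs, fast_tau=0.001, slow_tau=0.1,
                  ratio=4.0, thresh=0.05, hole_ms=50):
    """Slow/fast envelope trigger: when the fast envelope exceeds the
    slow one (mean volume) by a margin, a hole of fixed duration is
    cut out and filled with low-level white noise."""
    a_f = np.exp(-1.0 / (fast_tau * fs))    # fast smoothing coefficient
    a_s = np.exp(-1.0 / (slow_tau * fs))    # slow smoothing coefficient
    env_f = env_s = 0.0
    pre = int(0.001 * fs)                   # small lookback to catch the attack
    hole = int(hole_ms * fs / 1000)
    y = x.copy()
    rng = np.random.default_rng(0)
    n = 0
    while n < len(x):
        env_f = a_f * env_f + (1 - a_f) * abs(x[n])   # fast envelope
        env_s = a_s * env_s + (1 - a_s) * abs(x[n])   # slow envelope (mean volume)
        if env_f > ratio * env_s + thresh:            # shot detected
            start, end = max(n - pre, 0), min(n + hole, len(y))
            y[start:end] = 0.01 * rng.standard_normal(end - start)  # quiet fill
            env_f = env_s                             # release the gate
            n = end
        else:
            n += 1
    return y
```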
The result, visible in the final plot, is that the noise is largely reduced: its magnitude is tiny compared with the initial one.
It is possible to see from the figure that in the first part the compression algorithm is switched off, and listening to the audio you can notice that in the first seconds it is difficult to hear the voice. When we turn on the compression we get two main advantages: first, all components that exceed a certain threshold are attenuated; second, weak sounds are amplified, so the voice becomes more audible. Unfortunately, this method cannot remove the gunshot noise, which still persists.
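The static gain curve can be sketched as follows (threshold, ratio, and makeup gain are illustrative values): levels above the threshold are compressed by the ratio, and the makeup gain lifts everything, so weak components such as the voice come out louder:

```python
import numpy as np

def compress(x, threshold=0.1, ratio=4.0, makeup=2.0):
    """Static compression curve: attenuate levels above `threshold`
    by `ratio`, then apply makeup gain to the whole signal."""
    env = np.abs(x) + 1e-12
    gain = np.where(env > threshold,
                    threshold * (env / threshold) ** (1.0 / ratio) / env,
                    1.0)                    # unity gain below threshold
    return makeup * gain * x
```

Note that, consistently with the text, the peaks are only attenuated, never removed: the gunshot still passes through at a reduced level.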
The aim of interpolation is to reconstruct the missing data caused by the 50 ms clipping; to do so we use the two methods already described on the homepage: cubic polynomial interpolation and an autoregressive model (ARIMA), respectively.
In addition to them, it is worth saying that we also tried other interpolation techniques such as “linear”, “makima”, “spline” and “nearest”, which gave less convincing results, so we did not report them. If you wish to have a look at them, you can find everything in the GitHub repository.
With cubic interpolation we can still hear the hole left by the clipping algorithm.
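For reference, a cubic-polynomial gap fill can be sketched like this (the context length `ctx` is an illustrative choice; a single cubic is fitted to the samples surrounding the hole):

```python
import numpy as np

def fill_gap_cubic(x, gap_start, gap_end, ctx=200):
    """Fill x[gap_start:gap_end] with a single cubic polynomial
    fitted to ctx samples on each side of the gap."""
    left = np.arange(max(gap_start - ctx, 0), gap_start)
    right = np.arange(gap_end, min(gap_end + ctx, len(x)))
    idx = np.concatenate([left, right])
    t = idx - gap_start                     # centre indices for conditioning
    coeffs = np.polyfit(t, x[idx], 3)       # cubic fit on surrounding samples
    y = x.copy()
    y[gap_start:gap_end] = np.polyval(coeffs, np.arange(gap_end - gap_start))
    return y
```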
The best results are achieved with the autoregressive model (ARIMA). One design parameter is the number of samples, L = 400, used before and after the shot to reconstruct the missing data inside the gap. The results of this simulation are very satisfying.
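A sketch of the AR-based reconstruction (the model order `p`, the least-squares fit, and the cross-fade are our illustrative choices and may differ from the repository code; L = 400 context samples on each side follows the text): an AR model is fitted to the samples before the gap and extrapolated forward, the same is done backward from the samples after the gap, and the two predictions are blended inside the hole.

```python
import numpy as np

def ar_fill(x, gap_start, gap_end, L=400, p=8):
    """Fill a gap by forward/backward AR extrapolation with cross-fade."""
    def ar_predict(ctx, steps):
        # least-squares fit of AR(p) coefficients on the context
        rows = np.array([ctx[i:i + p] for i in range(len(ctx) - p)])
        a, *_ = np.linalg.lstsq(rows, ctx[p:], rcond=None)
        buf = list(ctx[-p:])
        out = []
        for _ in range(steps):              # recursive extrapolation
            nxt = a @ np.array(buf[-p:])
            out.append(nxt)
            buf.append(nxt)
        return np.array(out)

    gap = gap_end - gap_start
    fwd = ar_predict(x[gap_start - L:gap_start], gap)          # forward pass
    bwd = ar_predict(x[gap_end:gap_end + L][::-1], gap)[::-1]  # backward pass
    w = np.linspace(0.0, 1.0, gap)          # cross-fade weights
    y = x.copy()
    y[gap_start:gap_end] = (1 - w) * fwd + w * bwd
    return y
```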
The main problem with this type of solution is that we get desirable results offline, but in real-time applications the computation can take minutes. Moreover, both algorithms were applied to a clipped signal with holes lasting 50 ms. We got worse results when we moved to a Python implementation in which the clipped sections lasted even longer because of the dynamic triggering levels; as a consequence, the reconstruction of the missing data was less efficient. At the same time, it is worth mentioning that we used an audio file with a 22 kHz sampling frequency, so we could get better results by changing some simulation parameters and employing Fs = 8 kHz.
As we can hear, the sound is cut in a smoother way, but the price to pay is that it is a little more distorted.