lpc encoding development
What is LPC?
Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form,
using the information of a linear predictive model. It is one of the most powerful speech analysis techniques,
and one of the most useful methods for encoding good quality speech at a low bit rate and provides highly accurate estimates of speech parameters.
LPC is the most widely used method in speech coding and speech synthesis.
Making an approximation of the reality of speech production, the vocal tract is seen as a tube which is characterized by its resonances.
These resonances give rise to enhanced frequency bands called formants.
Well, LPC estimates exactly this formants, using them to create a filter that will be applied to a carrier for speech reconstruction.
Because speech signals vary with time, this process is done on short chunks (frames).
Why LPC?
Sending bytes within a video stream make it necessary to optimize che amount of information needed. As LPC encoding return the coefficient
of a filter of given order, it is more efficient than a simple audio compression.
To make it even more simple, an audio frame with size 160 byte or one with size 320 byte could be either coded into a n-size array of bytes, where n is the LPC order.
Although we have now ensured a good compression of information, a trade-off between filter order and frame size is needed for an
understandable speech decoding.
Python implementation
speaker
The audio stream is managed with the sounddevice library, which implement stream functions characterized by a programmable callback:
once the frame size (160 on this project) is chosen, LPC encoding is applied on each frame returning the related coefficients.
For every frame the autocorrelation is computed: the result is a Toeplitz matrix, from which the parameters are extracted using the
Levinson-Durbin recursion.
As the LPC take place during a live stream that last the time of a frame, avoiding delays is fundamentals to maintain the quality of speech
and the Levinson process turned to be quite heavy if elaborated on Python, the solution was to implement this particular function on #C++ and call it
through a dll.
After this process, for each frame the RMS of the of the original signal is calculated. This value will be send with the coefficients.
Listener
The coefficients received has to be applied to a carrier: for this project we found out that a sawtooth signal with 100Hz frequency
was the better one for a good quality of speech.
The system recreate a filter based on LPC coefficients and applied to a slice of carrier with size equal to the one used by speaker.
8-bit float type
In order to limit as much as possible the number of byte to be sent, we implement a possible float8-type.
The solution accept numbers in range [-10:10] with 1/63 step and it is specifically designed for this project (that is, the typical range for LPC coeff).
The first bit (MSB) define the sign.
If second bit is 0 number was >1 and it has been multiplied by 0.1
Bits from 3rd to 8th indicate x, with x/63 the nearest number to the original, considering changes applied by 1st and 2nd bits.