Advanced speech and audio technology is now ubiquitous: most people in Western societies use iPods and mobile phones daily, make free internet phone calls, or watch streaming movies, likely without paying much attention to the technological marvel these applications represent. These applications have been enabled by the vast progress made in speech and audio coding and compression in recent decades, progress that is still ongoing.
In particular, effort is often focused on further improving the sound quality of the compressed, and sometimes corrupted, audio signal. For instance, perceptual audio coding aims at minimizing the perceived distortion at a given bit rate; in parametric audio coding, this is done by capturing most of the signal energy in a few well-chosen model parameters. Given the variability of audio signals, it is critical that the signal models can represent a wide range of different audio signals; transient signals in particular have proven troublesome in this respect over the years. In recent work, we have sought to address this problem by proposing an amplitude-modulated sinusoidal signal decomposition, which allows for improved compression of transient signals while maintaining the high audio quality of non-transient signals.
In other studies, we have examined how to accurately estimate the fundamental frequency, or pitch, of an audio signal. The fundamental frequency is an integral part of many audio-processing applications; for instance, long-term prediction in linear prediction-based speech coding requires accurate information about the pitch period. Similarly, parametric coding of speech and audio using a harmonic sinusoidal model is typically based on a pitch estimator. One difficulty in determining the fundamental frequency is that the number of harmonically related components is typically unknown, so the model order and the pitch must be estimated jointly. Another problem is that most audio signals contain multiple harmonic sources, whose pitches therefore need to be estimated jointly. Recently, we have also started to examine uncertainties in the harmonic pitch structure, allowing for further improved modelling of the audio signal.
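To illustrate what joint estimation of pitch and model order involves, the following is a minimal sketch and not the estimators developed in this project. It assumes a single real-valued harmonic source in white noise, approximates the nonlinear least-squares fit with a grid search over candidate pitches, and uses a BIC-style penalty as a stand-in for a proper order-selection rule. The function names `synth_harmonic` and `estimate_pitch` and all parameters are hypothetical.

```python
import numpy as np


def synth_harmonic(f0, amps, fs, num_samples, noise_std=0.0, seed=0):
    """Hypothetical helper: sum of harmonics of f0 plus white Gaussian noise."""
    rng = np.random.default_rng(seed)
    n = np.arange(num_samples)
    x = sum(a * np.cos(2 * np.pi * f0 * (k + 1) / fs * n)
            for k, a in enumerate(amps))
    return x + noise_std * rng.standard_normal(num_samples)


def estimate_pitch(x, fs, f0_grid, max_order=10):
    """Grid search over (pitch, number of harmonics) for one harmonic source.

    For each candidate pair, the harmonic amplitudes and phases are fitted by
    linear least squares; the pair with the lowest penalized cost is returned.
    The penalty (BIC-style) is an assumption made for this sketch.
    """
    x = np.asarray(x, dtype=float)
    num_samples = len(x)
    n = np.arange(num_samples)
    best_cost, best_f0, best_order = np.inf, None, None

    for f0 in f0_grid:
        # Keep every harmonic strictly below the Nyquist frequency.
        order_limit = min(max_order, int((fs / 2 - 1e-9) // f0))
        for order in range(1, order_limit + 1):
            harmonics = np.arange(1, order + 1)
            phase = 2 * np.pi * f0 / fs * np.outer(n, harmonics)
            # Cosine and sine columns encode amplitude and phase linearly.
            basis = np.hstack([np.cos(phase), np.sin(phase)])
            coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
            residual = x - basis @ coef
            noise_var = max(np.mean(residual ** 2), 1e-12)
            # Data fit term plus a penalty that grows with the parameter count.
            cost = (num_samples * np.log(noise_var)
                    + (2 * order + 1) * np.log(num_samples))
            if cost < best_cost:
                best_cost, best_f0, best_order = cost, f0, order

    return best_f0, best_order
```

Without the penalty term, a candidate at half the true pitch with twice as many harmonics would fit the data at least as well, which is one reason the order and the pitch cannot be chosen independently. Handling several simultaneous sources, as discussed above, would require extending the basis to multiple harmonic sets and is not covered by this sketch.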
This project is a close collaboration with Dr Mads Græsbøll Christensen and Prof. Søren Holdt Jensen at Aalborg University. We have also published a book on multi-pitch estimation.