Wavelet Features For Speech Recognition.

I've been participating in the TensorFlow speech recognition challenge.

https://www.kaggle.com/c/tensorflow-speech-recognition-challenge

It seems like the most common approach is to begin by turning the audio into a spectrogram and then feeding that into a 2D CNN. One trouble with spectrograms is that you have to trade off resolution in frequency against resolution in time. In principle you can get finer time resolution at higher frequencies than at lower ones, but once you pick a window length for your short-time Fourier transform, every frequency band gets the same temporal resolution, and you can't resolve anything much shorter than the window.
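To put rough numbers on that trade-off, here is a minimal sketch. It assumes 16 kHz audio and a handful of illustrative window lengths; neither the sample rate nor the `stft_resolution` helper comes from the challenge code. For a window of N samples at sample rate fs, the frequency bins are spaced fs/N apart and each frame covers N/fs seconds, so improving one makes the other worse.

```python
# Time/frequency trade-off of a short-time Fourier transform (STFT).
# Assumption: 16 kHz audio; the window lengths below are only illustrative.
SAMPLE_RATE_HZ = 16_000


def stft_resolution(window_length, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (frame duration in ms, frequency bin spacing in Hz)."""
    time_resolution_ms = 1000.0 * window_length / sample_rate_hz
    frequency_resolution_hz = sample_rate_hz / window_length
    return time_resolution_ms, frequency_resolution_hz


for window_length in (128, 256, 512, 1024):
    time_ms, freq_hz = stft_resolution(window_length)
    print(
        f"window={window_length:5d} samples  "
        f"frame={time_ms:5.1f} ms  bin spacing={freq_hz:7.3f} Hz"
    )
```

Doubling the window halves the bin spacing but doubles the frame duration, and that frame duration applies to every frequency band at once.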

Wavelets are one possible way around this limitation.
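As a toy illustration of the idea, the sketch below runs a plain Haar wavelet decomposition on a short signal. The Haar wavelet is picked purely because it fits in a few lines; it is an assumption for illustration and not necessarily the wavelet used for the features in this post, and the function names are made up here. The point is only that the finest level (highest frequencies) keeps many coefficients, each covering a short stretch of time, while coarser levels (lower frequencies) keep fewer coefficients that each cover a longer stretch.

```python
import math


def haar_step(samples):
    """One Haar level: scaled pairwise sums (approx) and differences (detail)."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = list(zip(samples[0::2], samples[1::2]))
    approx = [(a + b) * scale for a, b in pairs]
    detail = [(a - b) * scale for a, b in pairs]
    return approx, detail


def haar_decompose(samples, levels):
    """Return (detail coefficients per level, finest first; final approximation).

    Assumes len(samples) is divisible by 2 ** levels.
    """
    details = []
    approx = list(samples)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return details, approx


# Toy input: 16 samples of silence with a single click at sample 5.
signal = [0.0] * 16
signal[5] = 1.0

details, approx = haar_decompose(signal, levels=3)
for level, detail in enumerate(details, start=1):
    print(
        f"level {level}: {len(detail)} coefficients, "
        f"each spanning {2 ** level} samples"
    )
```

The click shows up as a single nonzero coefficient at each level, localized to 2 samples at the finest level and to 8 samples at the coarsest, which is the frequency-dependent time resolution a fixed-window spectrogram can't give you.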
