Speech signal processing begins with converting an analog signal to digital form, using procedures such as sampling, windowing, and filtering. In the lab, Matlab is the primary tool for signal processing because it handles mathematical functions well. Once a speech waveform is digitized, it can be analyzed and manipulated to obtain meaningful parameters such as intensity, phoneme or syllable boundaries, voicing, and pitch. Time-domain methods for estimating these parameters include root mean square (RMS) amplitude, correlation, autocorrelation, zero-crossing counts, and derivatives. In addition, the spectral properties of a speech signal can be obtained with the Fourier transform, which shows how the signal’s power is distributed across frequencies. These methods help us understand a wide range of underlying patterns in speech as it is represented in digital, mathematical, and graphical forms.
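
As a rough illustration of the RMS intensity measure mentioned above, here is a minimal sketch in plain Python rather than Matlab. The function name `frame_rms`, the frame length, the hop size, and the synthetic test signal are all assumptions made for this example and are not taken from the lab's own scripts.

```python
import math

def frame_rms(samples, frame_len, hop):
    """Return the RMS amplitude of each complete frame of `samples`."""
    values = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Root of the mean of the squared sample values in this frame
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values

# Hypothetical test signal: 0.1 s of silence, then 0.1 s of a 200 Hz sine
fs = 8000  # sampling rate in Hz (assumed)
signal = [0.0] * 800 + [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]

# 20 ms frames with no overlap (assumed settings)
rms = frame_rms(signal, frame_len=160, hop=160)
```

In this sketch the silent frames give an RMS of zero and the sine frames give a value near 0.707 (the amplitude divided by the square root of two), so the jump in the RMS contour marks where the sound begins.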

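- Autocorrelation for pitch estimation – The autocorrelation function measures the correlation between a signal and time-shifted (lagged) versions of itself. Its highest peak is always at zero lag, where the signal is compared with its unshifted self. For a periodic signal, the lag of the next highest peak estimates the pitch period, and the fundamental frequency is the sampling rate divided by that lag.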
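- Zero-crossing for pitch estimation – This method works well only when the signal is free of stochastic (noise) components. For a clean, simple periodic signal, the waveform crosses zero twice per period, so the number of zero crossings per second is twice the fundamental frequency.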
- Derivatives for syllable segmentation – A peak in the first derivative of the RMS amplitude, where intensity rises fastest, can indicate the beginning of a syllable. Equivalently, that point is where the second derivative crosses zero from positive to negative.
- Fourier transform – It shows the distribution of acoustic energy and the spectral properties of a speech signal by decomposing a complex wave into a set of sinusoids. A longer analysis window gives better spectral (frequency) resolution, because the spacing between frequency bins is the sampling rate divided by the window length in samples, whereas a shorter window gives better temporal resolution.
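
The two pitch estimators in the list above can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the lab's Matlab code: the function names, the 50 to 500 Hz search range, and the noise-free 200 Hz test tone are all hypothetical choices made for the example.

```python
import math

def autocorr_pitch(samples, fs, f_min=50.0, f_max=500.0):
    """Estimate F0 (Hz) from the lag of the highest autocorrelation
    value inside the assumed pitch range [f_min, f_max]."""
    lag_min = int(fs / f_max)  # shortest pitch period considered, in samples
    lag_max = int(fs / f_min)  # longest pitch period considered, in samples
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation: signal times its lagged copy
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag  # convert the period in samples to Hz

def zero_crossing_pitch(samples, fs):
    """Estimate F0 (Hz) as half the number of zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))  # sign change between neighbors
    duration = (len(samples) - 1) / fs      # time spanned by the samples
    return crossings / (2.0 * duration)

# Hypothetical test tone: 0.1 s of a clean 200 Hz sine at fs = 8000 Hz
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs + 0.3) for n in range(800)]

f0_ac = autocorr_pitch(tone, fs)       # expected to be close to 200 Hz
f0_zc = zero_crossing_pitch(tone, fs)  # expected to be close to 200 Hz
```

Restricting the lag search to a plausible pitch range is one simple way to skip the zero-lag peak. On real speech, the zero-crossing estimate would be expected to degrade quickly once noise or strong harmonics add extra crossings, which is the limitation noted in the list.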

Updated by Jieun Lee on 04/28/2016