ABSTRACT:
One-tube models and their corresponding all-pole representations have a long history in speech modeling, in particular for non-nasal vowels. To model the spectral components of nasal speech signals, a minimum of two connected tubes is necessary. The transfer function of such a branched-tube model has a pole-zero representation, but estimating the tube model is more difficult than estimating a general pole-zero model. On the other hand, a physical model may offer constraints in the time domain that are not available with the pole-zero model. Here, a variational Bayesian scheme under Gaussian assumptions is presented that estimates the tube areas directly from the log-spectrum of the speech signal. Probabilistic priors are used to enforce smoothness of the tubes.
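As a rough illustration of the correspondence between a one-tube model and its all-pole representation, the following Python sketch maps the section areas of an unbranched, lossless tube to an all-pole denominator and evaluates the resulting log-magnitude spectrum. It is not the variational Bayesian estimator presented here, and it does not cover the branched (nasal) case. The function names, the sign convention for the reflection coefficients, and the omission of glottal and lip terminations are assumptions made for the example.

```python
import numpy as np

def reflection_coefficients(areas):
    """Reflection coefficients at the interior junctions of a piecewise-uniform tube.

    Assumed convention: r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k).
    Other texts use the opposite sign.
    """
    areas = np.asarray(areas, dtype=float)
    return (areas[1:] - areas[:-1]) / (areas[1:] + areas[:-1])

def allpole_denominator(refl):
    """Step-up (Levinson) recursion: reflection coefficients -> coefficients of A(z)."""
    a = np.array([1.0])
    for k in refl:
        a_ext = np.append(a, 0.0)      # raise the polynomial order by one
        a = a_ext + k * a_ext[::-1]    # a_i <- a_i + k * a_{m-i}
    return a

def log_spectrum(a, n_freq=256):
    """Log-magnitude of H(e^{jw}) = 1 / A(e^{jw}) on a grid over [0, pi]."""
    w = np.linspace(0.0, np.pi, n_freq)
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a
    return -np.log(np.abs(A))

# Hypothetical area profile (arbitrary units), chosen only for illustration.
areas = [1.0, 3.0, 2.0, 0.5]
a = allpole_denominator(reflection_coefficients(areas))
spec = log_spectrum(a)
```

Because positive areas give reflection coefficients with magnitude below one, the step-up recursion yields a denominator whose roots lie inside the unit circle, so the log-spectrum is finite at every frequency. A smoothness prior on the areas would act on the `areas` vector in a forward model of this kind.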