VV DIGALAKIS, D RTISCHEV, LG NEUMEYER
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING 3(5) 357-366, September 1995, peer-reviewed
A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMMs). Despite the good recognition performance that these systems achieve on average in large-vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMMs, the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both native and nonnative speakers of American English. For nonnative speakers, the recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers. For native speakers, the recognition performance after adaptation improves to the accuracy of speaker-dependent systems that use six times as much training data.