A Diphone-Based Unit Selection Method for TTS Systems

Diphone-based Unit Selection in Text-to-Speech Conversion for Mandarin

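Abstract: To find a unit scheme that better handles both intra-syllable and inter-syllable coarticulation, this paper proposes using diphones as synthesis units, similar to those used in English text-to-speech (TTS) systems. Exploiting the fact that Mandarin speech contains only 410 full syllables, it further refines the application of diphones to Chinese. Experimental results show that the scheme covers all the transitional cues of continuous speech, so the transitions in the synthesized speech are smooth and natural.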

Abstract: Most Mandarin text-to-speech (TTS) systems are syllable-based: they model coarticulation within a syllable but ignore coarticulation across syllable boundaries. One solution is to abandon syllable-based units in favor of units that can model both syllable-internal and cross-syllable coarticulation. One such unit is the diphone, which has been widely used in English TTS systems. Because Mandarin contains only 410 syllables, the diphone concept can be refined for Chinese speech synthesis. Experiments show that diphone-based models capture cross-syllable coarticulation and produce more natural-sounding speech than syllable-based systems, which simply insert silence between syllables.
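A minimal sketch of how an utterance might be decomposed into diphone units, assuming each syllable has already been split into phones (here simply an initial plus a final, e.g. `n` + `i` for *ni*). The phone labels, the `sil` silence symbol and the function name `to_diphones` are illustrative assumptions, not the paper's own inventory or code. Each diphone spans from one phone to the next, so pairs that straddle a syllable boundary (such as `i-h` in *ni hao*) carry exactly the cross-syllable transition that a purely syllable-based inventory leaves out.

```python
from typing import List, Tuple

# Hypothetical label for the silence that precedes and follows an utterance.
SIL = "sil"


def to_diphones(syllables: List[List[str]]) -> List[Tuple[str, str, bool]]:
    """Split a phone-segmented utterance into diphones.

    Returns (left_phone, right_phone, crosses_boundary) tuples, where
    crosses_boundary is True when the pair spans a syllable boundary
    or an utterance-edge silence.
    """
    phones: List[str] = [SIL]
    owner: List[int] = [-1]  # index of the syllable each phone belongs to
    for i, syllable in enumerate(syllables):
        for phone in syllable:
            phones.append(phone)
            owner.append(i)
    phones.append(SIL)
    owner.append(len(syllables))

    # Every adjacent phone pair forms one diphone unit.
    return [
        (phones[k], phones[k + 1], owner[k] != owner[k + 1])
        for k in range(len(phones) - 1)
    ]


if __name__ == "__main__":
    # "ni hao", assuming an initial/final split of each syllable.
    for left, right, crosses in to_diphones([["n", "i"], ["h", "ao"]]):
        print(f"{left}-{right}", "cross" if crosses else "internal")
```

In a real system each diphone is usually cut at the middle of its two phones, where the spectrum is relatively stable, so concatenation joins fall in steady-state regions rather than inside transitions.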