Diphone-based Unit Selection in Text-to-Speech Conversion for Mandarin
-
Abstract
Most Mandarin text-to-speech (TTS) systems are syllable-based: they capture syllable-internal coarticulation but ignore cross-syllable coarticulation. One solution to this problem is to abandon syllable-based models in favor of units that can model both syllable-internal and cross-syllable coarticulation. One such unit is the diphone, which has been widely used in English TTS systems. The diphone concept can be adapted to Chinese speech synthesis because Mandarin has only 410 syllables. It is shown that diphone-based models can capture cross-syllable coarticulation and produce more natural-sounding speech than syllable-based systems, which simply insert silence between syllables.