Audio MP in Video Conference
-
Abstract
ITU-T H.323 describes the components of multimedia communication systems whose underlying transport is a packet-based network. The multipoint control unit (MCU) provides centralized processing of audio, video, and/or data streams in a multipoint conference. An MCU is composed of a multipoint processor (MP) and a multipoint controller (MC). Under the control of the MC, the MP collects the audio, video, and/or data streams from all terminals in the conference, processes them, and sends the processed data to the designated terminals. In this paper, the authors propose solutions for processing audio streams and present in particular a practical policy for mixing audio signals.

In the centralized multipoint conference mode, the speech from all audio channels must be mixed. The basic audio mixing technique has three steps. First, the MCU decodes the audio stream from each channel and sums all the decoded speech. Second, the target speech for each terminal is obtained by subtracting that terminal's own source signal from the sum. Finally, the target speech for each terminal is encoded separately and transmitted to that terminal. Each terminal thus receives an audio signal containing the signals of all the other terminals.

This method has several shortcomings. First, the more terminals join the videoconference, the more speech codecs the MCU must run, so its computational load becomes heavy. Second, it is not necessary to mix the speech from every channel equally: listeners find it difficult to pick out the useful information when more than 4 speech channels are fed into the audio mixer. We therefore design an improved audio mixer that employs a competitive mechanism. When more than 4 terminals are connected to the MCU, we select the 4 channels with the highest speech energy within a fixed time interval and feed them into the audio mixer. The speech signals of the other channels are attenuated by a certain amount and treated as background noise. The audio mixer calculates the speech energy of each channel over a fixed time interval and decides the state of every channel according to its speech energy. The state of each channel is then held until the end of the following time interval.
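
The competitive mixing policy described above can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions, not the authors' implementation. It operates on already-decoded sample frames and leaves out the codec stages entirely. The names frame_energy, select_active, and mix_interval are hypothetical, and the BACKGROUND_GAIN value of 0.25 is an assumed placeholder, since the abstract only states that the non-selected channels are attenuated by a certain amount. Energy is taken here as the sum of squared samples over one interval, which is also an assumption.

```python
from typing import Dict, Iterable, List

MAX_ACTIVE = 4          # number of channels mixed at full level (from the abstract)
BACKGROUND_GAIN = 0.25  # ASSUMED attenuation for non-selected channels


def frame_energy(frame: List[float]) -> float:
    """Energy of one channel over one fixed interval (sum of squares, assumed)."""
    return sum(s * s for s in frame)


def select_active(frames: Dict[str, List[float]],
                  max_active: int = MAX_ACTIVE) -> List[str]:
    """Return the ids of the (at most) max_active highest-energy channels."""
    ranked = sorted(frames, key=lambda cid: frame_energy(frames[cid]),
                    reverse=True)
    return ranked[:max_active]


def mix_interval(frames: Dict[str, List[float]],
                 active: Iterable[str]) -> Dict[str, List[float]]:
    """Mix one interval of decoded speech.

    `active` is the channel state decided from an earlier interval.
    Active channels are mixed at full level, the rest are attenuated.
    Each terminal receives the weighted sum minus its own contribution.
    """
    active = set(active)
    n = len(next(iter(frames.values())))
    gains = {cid: (1.0 if cid in active else BACKGROUND_GAIN)
             for cid in frames}

    # Step 1: weighted sum of all decoded channels.
    total = [0.0] * n
    for cid, frame in frames.items():
        for i, sample in enumerate(frame):
            total[i] += gains[cid] * sample

    # Step 2: subtract each terminal's own (scaled) signal from the sum.
    return {cid: [total[i] - gains[cid] * frame[i] for i in range(n)]
            for cid, frame in frames.items()}
```

In a running mixer, one plausible arrangement is to call select_active on interval k and pass its result to mix_interval for interval k+1, which matches the statement that a channel's state is held until the end of the following interval. With 4 or fewer terminals, select_active returns every channel and the function reduces to the basic sum-and-subtract mix.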