Cite this as:
Kwang Myung Jeon, Su Yeon Park, Chan Jun Chun, Nam In Park, and Hong Kook Kim, "Multi-band Approach to Deep Learning-Based Artificial Stereo Extension," ETRI Journal, vol. 39, no. 3, June 2017, pp. 398-405.
In this paper, we propose an artificial stereo extension method that creates stereophonic sound from a mono sound source. The method first trains deep neural networks (DNNs) to model the nonlinear relationship between the dominant and residual signals of the stereo channels. In the training stage, the band-wise log spectral magnitude and unwrapped phase of both the dominant and residual signals are used to model the nonlinearities of each sub-band through a deep architecture. Stereo extension is then performed in the sub-band domain: the trained DNN models estimate the residual signal that corresponds to the input mono signal. The performance of the proposed method was evaluated using a log spectral distortion (LSD) measure and a multiple stimuli with hidden reference and anchor (MUSHRA) test. The results showed that the proposed method achieved a lower LSD and higher MUSHRA scores than conventional methods based on hidden Markov models or on a DNN with full-band processing.
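The band-wise features and the LSD measure mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The frame length, the band edges, unwrapping the phase along frequency within each band, and the frame-averaged LSD formula (one common definition) are all assumptions. The dominant/residual decomposition and the DNNs themselves are not shown. A naive DFT keeps the sketch dependency-free.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT returning the non-negative-frequency bins 0..N/2."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def unwrap(phases):
    """Remove 2*pi jumps between consecutive phases.

    Assumes the inputs are principal values in (-pi, pi], so a single
    2*pi correction per step is enough (similar in spirit to numpy.unwrap).
    """
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        delta = cur - prev
        if delta > math.pi:
            offset -= 2 * math.pi
        elif delta < -math.pi:
            offset += 2 * math.pi
        out.append(cur + offset)
    return out


def band_features(frame, band_edges, eps=1e-12):
    """Return (log magnitude, unwrapped phase) lists for each sub-band.

    band_edges are hypothetical DFT-bin boundaries, e.g. [0, 2, 5].
    Unwrapping along frequency inside each band is an assumption.
    """
    spectrum = dft(frame)
    features = []
    for lo, hi in zip(band_edges, band_edges[1:]):
        bins = spectrum[lo:hi]
        log_mag = [math.log(abs(x) + eps) for x in bins]
        phase = unwrap([cmath.phase(x) for x in bins])
        features.append((log_mag, phase))
    return features


def log_spectral_distortion(ref_frames, est_frames, eps=1e-12):
    """Frame-averaged LSD in dB (a common definition, assumed here).

    For each frame: sqrt(mean_k (10*log10(P_ref(k) / P_est(k)))^2),
    where P(k) = |X(k)|^2; eps guards against log of zero.
    """
    total = 0.0
    for ref, est in zip(ref_frames, est_frames):
        r, e = dft(ref), dft(est)
        sq = [
            (10 * math.log10((abs(a) ** 2 + eps) / (abs(b) ** 2 + eps))) ** 2
            for a, b in zip(r, e)
        ]
        total += math.sqrt(sum(sq) / len(sq))
    return total / len(ref_frames)
```

For example, `band_features(frame, [0, 2, 5])` on an 8-sample frame yields two sub-bands of 2 and 3 bins. Doubling a signal's amplitude raises every bin's power by 10*log10(4) ≈ 6.02 dB, so its LSD against the original is about 6.02 dB.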