《计算机应用研究》|Application Research of Computers

Monaural speech enhancement algorithm based on region-aware multi-scale convolution

Authors 王钇翔, 吕忆蓝, 台文鑫, 孙建强, 蓝天
Affiliation 电子科技大学 信息与软件工程学院, 成都 610054
Article ID 1001-3695(2021)11-010-3264-04
DOI 10.19734/j.issn.1001-3695.2021.03.0131
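Abstract The receptive field of a convolutional neural network depends on the size of its convolution kernels. Conventional convolution uses kernels of a fixed size, which limits the feature perception ability of the network. In addition, convolutional neural networks rely on parameter sharing and therefore apply the same feature extraction to every sample point in a spatial region. In a noisy spectrogram, however, the noise and the clean speech follow different distributions, especially in complex noise environments, so conventional convolution struggles to extract and filter speech features with high quality. To address these problems, this paper proposes a multi-scale region-adaptive convolution module. It uses multi-scale information to improve the feature perception ability of the model, and it adaptively assigns regional convolution weights according to the feature values of the corresponding sampling points, realizing region-adaptive convolution and improving the model's ability to filter noise. Experiments on the public TIMIT dataset show that the proposed algorithm achieves better results on evaluation metrics of speech quality and intelligibility.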
Keywords speech enhancement; convolutional neural network; multi-scale convolution; region-adaptive
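Funding National Natural Science Foundation of China (U19B2028, 61772117)
Science and Technology Commission Innovation Special Zone Program (19-163-21-TS-001-042-01)
Key Project of the National Engineering Laboratory for Big Data Application on Improving Government Governance Capabilities (10-2018039)
Fundamental Research Funds for the Central Universities (ZYGX2019J077)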
Article URL http://www.arocmag.com/article/01-2021-11-010.html
Authors (English) Wang Yixiang, Lyu Yilan, Tai Wenxin, Sun Jianqiang, Lan Tian
Affiliation (English) School of Information & Software Engineering, University of Electronic Science & Technology of China, Chengdu 610054, China
Abstract (English) The size of the receptive field of a convolutional neural network is related to the size of the convolution kernel. Traditional convolution uses a fixed-size convolution kernel, which limits the feature perception ability of the network model. In addition, due to the parameter sharing mechanism of the convolutional neural network, the same feature extraction method is used for all pixels in the spatial region. However, the distributions of noise signals and clean speech signals in the noisy spectrogram differ, especially in complex noise environments, so it is difficult for general convolution to achieve high-quality extraction and filtering of speech signal features. To solve these problems, this paper proposes a multi-scale region-adaptive convolution module, which uses multi-scale information to improve the feature perception ability of the model and adaptively allocates region convolution weights to achieve region-adaptive convolution, improving the denoising ability of the model. Experiments on the public TIMIT dataset show that the proposed algorithm achieves satisfactory results on the metrics of speech quality and intelligibility.
Keywords (English) speech enhancement; convolutional neural network; multi-scale convolution; region-aware
Received 2021/3/5
Revised 2021/4/30
Pages 3264-3267
CLC number TP391.42
Document code A
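
The abstract describes two ideas: convolving at several kernel sizes, and letting each sampling point choose its convolution weights from its own feature values. The following NumPy sketch illustrates those two ideas only and is not the authors' implementation. The kernel sizes (3, 5, 7), the number of regions, the random weights, the hard argmax selection, and all function names are assumptions made for illustration; a trainable model would likely need learned weights and a differentiable selection.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x, kernel):
    """Zero-padded 'same' 2-D cross-correlation of one map with an odd-sized square kernel."""
    k = kernel.shape[0]
    padded = np.pad(x, k // 2, mode="constant")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    return np.einsum("hwij,ij->hw", windows, kernel)


def region_adaptive_conv(x, region_kernels, guide_kernels):
    """Each point uses the kernel of the region selected by its own guide scores.

    region_kernels, guide_kernels: arrays of shape (R, k, k) (hypothetical weights).
    """
    # One guide score map per region, computed from the input itself.
    scores = np.stack([conv2d_same(x, g) for g in guide_kernels])       # (R, H, W)
    region = scores.argmax(axis=0)                                       # (H, W)
    # One candidate output per region kernel.
    candidates = np.stack([conv2d_same(x, w) for w in region_kernels])  # (R, H, W)
    # Per point, keep the candidate that belongs to the selected region.
    out = np.take_along_axis(candidates, region[None], axis=0)[0]
    return out, region


def multi_scale_region_adaptive(x, kernel_sizes=(3, 5, 7), num_regions=2, seed=0):
    """Stack region-adaptive outputs computed at several kernel sizes."""
    rng = np.random.default_rng(seed)
    outputs = []
    for k in kernel_sizes:
        # Random weights stand in for learned parameters (assumption).
        region_kernels = rng.standard_normal((num_regions, k, k)) / k
        guide_kernels = rng.standard_normal((num_regions, k, k)) / k
        out, _ = region_adaptive_conv(x, region_kernels, guide_kernels)
        outputs.append(out)
    return np.stack(outputs)  # (num_scales, H, W)


# Toy "spectrogram": 16 frequency bins by 10 frames of random magnitudes.
spectrogram = np.random.default_rng(1).random((16, 10))
features = multi_scale_region_adaptive(spectrogram)
print(features.shape)
```

The call returns one feature map per kernel size. In a full enhancement model these maps would presumably be fused and passed to later layers, which the abstract does not specify.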