《计算机应用研究》|Application Research of Computers

多模态深度学习综述

Survey of multimodal deep learning

Authors 刘建伟, 丁熙浩, 罗雄麟
Affiliation 中国石油大学(北京)自动化系, 北京 102249
Article ID 1001-3695(2020)06-001-1601-14
DOI 10.19734/j.issn.1001-3695.2018.12.0857
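Abstract Written at an early stage in the development of multimodal deep learning, this survey summarizes current work in the field, identifies the problems common to its implementation across different modality combinations and learning objectives, classifies those problems, and describes methods for solving each class. Specifically, within multimodal learning that involves natural language, vision, and hearing, it considers research on language translation, event detection, information description, emotion recognition, voice recognition and synthesis, and multimedia retrieval, and divides the common problems into four categories: multimodal representation, multimodal interpretation, multimodal fusion, and multimodal alignment. Each category is further subdivided and discussed, and the neural network models developed to address it are listed. Finally, the survey discusses practical multimodal systems, the datasets and evaluation criteria commonly used in multimodal deep learning research, and the outlook for the field's development.
Keywords multimodal; deep learning; multiple neural networks; multimodal representation; multimodal interpretation; multimodal fusion; multimodal alignment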
URL http://www.arocmag.com/article/01-2020-06-001.html
English title Survey of multimodal deep learning
Authors (English) Liu Jianwei, Ding Xihao, Luo Xionglin
Affiliation (English) Dept. of Automation, China University of Petroleum, Beijing 102249, China
English abstract This paper summarizes current multimodal deep learning at an early stage of the field's development, identifies the problems common to implementing multimodal deep learning under different modality combinations and learning objectives, classifies these problems, and describes methods for solving each class. Specifically, it covers multimodal deep learning involving natural language, vision, and audition, considering research directions such as language translation, event detection, information description, emotion recognition, voice recognition and synthesis, and multimedia retrieval, and concludes that the common problems fall into four types: multimodal representation, multimodal interpretation, multimodal fusion, and multimodal alignment. The paper further subdivides and discusses each type and lists the neural network models developed to solve it. Finally, it introduces some practical multimodal systems, lists the commonly used datasets and evaluation criteria in multimodal deep learning, and outlines directions for future research.
English keywords multimodal; deep learning; neural network; multimodal representation; multimodal interpretation; multimodal fusion; multimodal alignment
Received 2018/12/11
Revised 2019/1/23
Pages 1601-1614
CLC number TP181
Document code A