《计算机应用研究》|Application Research of Computers

Design of TOC reward function in multi-target trajectory recovery with deep reinforcement learning

Authors He Liang, Xu Zhengguo, Jia Yu, Shen Chao, Li Yun
Affiliations 1. National Key Laboratory of Science & Technology on Blind Signal Processing, Chengdu 610041, China; 2. MOE Key Laboratory for Intelligent Networks & Network Security, Xi'an Jiaotong University, Xi'an 710049, China
Article number 1001-3695(2020)06-004-1626-07
DOI 10.19734/j.issn.1001-3695.2018.12.0886
Abstract In trajectory detection, the geographical positions a detector obtains within a single frame usually cannot be attributed to individual targets in a multi-target scene, so the trajectories must be recovered, and the targets distinguished, from position information alone. To address this problem, this paper proposes recovering target trajectories with deep reinforcement learning (DRL). Starting from the physical characteristics of target trajectories, it extracts a mathematical model and, combining trajectory direction and curvature, proposes a trajectory osculating circle (TOC) reward function that enables DRL to recover multi-target trajectories effectively and distinguish the individual targets. The paper first describes the multi-target trajectory recovery problem and casts it as a model that DRL can handle, then evaluates the TOC reward function experimentally on the recovery problem, and finally gives the mathematical derivation and a physical interpretation of the reward function. Experimental results show that a deep reinforcement network driven by the TOC reward function recovers target trajectories effectively, matching the actual trajectories in both course and speed.
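This page gives only the abstract, so the paper's exact reward definition is not reproducible here. The following is a minimal sketch of the geometric idea behind an osculating-circle reward, assuming a curvature-consistency formulation: a candidate next point is scored by how well it keeps the track on its current osculating circle. The function names (`menger_curvature`, `toc_reward`) and the exponential scoring are hypothetical illustrations, not the authors' implementation.

```python
import math

def menger_curvature(p1, p2, p3):
    """Curvature (1/R) of the circle through three 2D points.

    Uses the Menger formula kappa = 4 * area / (|p1p2| * |p2p3| * |p3p1|);
    collinear points lie on a 'circle' of infinite radius, so kappa = 0.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    area2 = abs((x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1))  # 2 * triangle area
    if area2 == 0.0:
        return 0.0
    d12 = math.dist(p1, p2)
    d23 = math.dist(p2, p3)
    d31 = math.dist(p3, p1)
    return 2.0 * area2 / (d12 * d23 * d31)  # 4 * area / (d12 * d23 * d31)

def toc_reward(track, candidate):
    """Hypothetical curvature-consistency reward in [0, 1].

    Compares the osculating-circle curvature of the last three track points
    with the curvature after appending the candidate point; a candidate that
    continues the same circle scores close to 1.
    """
    p1, p2, p3 = track[-3:]
    k_old = menger_curvature(p1, p2, p3)
    k_new = menger_curvature(p2, p3, candidate)
    return math.exp(-abs(k_new - k_old))
```

For example, for a track following the unit circle through (1, 0), (0, 1), (-1, 0), the candidate (0, -1) stays on the same circle and receives reward 1.0, while a candidate far off the circle scores strictly lower, which is the behavior a DRL agent would be rewarded for when assigning detections to tracks.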
Keywords deep reinforcement learning (DRL); sequential decision; Q function; trajectory osculating circle (TOC)
Funding National Natural Science Foundation of China Key Program (U1736205); National Natural Science Foundation of China (61773310)
URL http://www.arocmag.com/article/01-2020-06-004.html
Received 2018/12/27
Revised 2019/1/21
Pages 1626-1632
CLC number TP391
Document code A