《计算机应用研究》|Application Research of Computers

Sample evaluation method based on incremental similarity

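Authors Zhou Ruipeng (周瑞朋), Qin Jin (秦进)
Affiliation College of Computer Science and Technology, Guizhou University, Guiyang 550025, China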
Article ID 1001-3695(2021)07-008-1963-04
DOI 10.19734/j.issn.1001-3695.2020.11.0408
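Abstract Existing reinforcement learning algorithms suffer from low sample utilization, which weakens the agent's ability to find the optimal policy. To address this problem, this paper proposes a sample evaluation method based on incremental similarity. The method consists of a state novelty measure and a sample value evaluation function. It computes the similarity between each new state and a baseline state, derives the novelty of the state from that similarity, and then updates the baseline state incrementally until training ends. When computing a sample's value, it takes the novelty of the state into account and handles the two cases, reward greater than zero and reward not greater than zero, separately. Finally, it samples according to the sample values by combining rank-based selection with random selection. The method achieves higher reward values on the control problems of Playing Atari 2600, which indicates that it alleviates the low sample utilization problem, and computing similarity incrementally reduces the amount of computation.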
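The abstract does not give the similarity formula or the baseline update rule, so the following is only a minimal sketch of the general idea under stated assumptions: states are flat numeric vectors, similarity is cosine similarity, the baseline state is a running mean of all states seen so far, and novelty is one minus the similarity. The class name `IncrementalNovelty` and all of these choices are hypothetical and are not taken from the paper.

```python
import math


class IncrementalNovelty:
    """Hypothetical novelty tracker; NOT the paper's actual formulas.

    Assumptions: cosine similarity, running-mean baseline,
    novelty = 1 - similarity.
    """

    def __init__(self, dim):
        self.baseline = [0.0] * dim  # running mean of the states seen so far
        self.count = 0               # number of states folded into the baseline

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0  # assumption: a zero vector is similar to nothing
        return dot / (norm_a * norm_b)

    def score(self, state):
        """Return the novelty of `state`, then fold it into the baseline."""
        if self.count == 0:
            novelty = 1.0  # assumption: the first state counts as fully novel
        else:
            novelty = 1.0 - self._cosine(state, self.baseline)
        # Incremental mean update: O(dim) per state, with no pairwise
        # comparison against every stored state.
        self.count += 1
        self.baseline = [
            b + (s - b) / self.count for b, s in zip(self.baseline, state)
        ]
        return novelty


tracker = IncrementalNovelty(dim=2)
first = tracker.score([1.0, 0.0])   # no baseline yet
repeat = tracker.score([1.0, 0.0])  # same direction as the baseline
other = tracker.score([0.0, 1.0])   # orthogonal to the baseline
```

In this sketch each new state is compared with a single baseline vector rather than with every stored state, which is one way an incremental scheme can keep the per-step cost constant in the number of samples.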
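Keywords reinforcement learning; sample utilization; sample evaluation; incremental; similarity
Funding National Natural Science Foundation of China (61562009)
Guizhou Provincial Science and Technology Fund (黔科合支撑[2020]3Y004号)
Article URL http://www.arocmag.com/article/01-2021-07-008.html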
English title Sample evaluation method based on incremental similarity
Authors (English) Zhou Ruipeng, Qin Jin
Affiliation (English) College of Computer Science & Technology, Guizhou University, Guiyang 550025, China
English abstract Existing reinforcement learning algorithms have low sample utilization, which reduces the agent's ability to find the optimal policy. To solve this problem, this paper proposes a sample evaluation method based on incremental similarity. It designs a state novelty measure and a sample value evaluation function. It calculates the similarity between a new state and the baseline state, calculates the novelty of the state from that similarity, and then updates the baseline state incrementally until the end of training. When calculating the sample value, it takes the novelty of the state into account and computes the value separately according to whether the sample reward is greater than zero. Finally, it samples according to the sample value, combining rank-based selection and random selection. The method achieves higher reward values on the control problems of Playing Atari 2600, which shows that it alleviates the problem of low sample utilization and reduces the computational burden through incremental similarity calculation.
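The abstract states only that the sample value depends on state novelty and on whether the reward is greater than zero, and that sampling combines rank-based selection with random selection. The sketch below illustrates one possible reading. The functions `sample_value` and `mixed_sample`, the `bonus` and `rank_fraction` parameters, and the 1/rank weighting are all hypothetical assumptions, not the paper's definitions.

```python
import random


def sample_value(reward, novelty, bonus=1.0):
    """Hypothetical value function; the paper's actual formula is not given.

    Assumption: positive-reward samples are valued by reward plus a novelty
    bonus, all other samples by the novelty bonus alone.
    """
    if reward > 0:
        return reward + bonus * novelty
    return bonus * novelty


def mixed_sample(values, batch_size, rank_fraction=0.5, rng=None):
    """Draw `batch_size` buffer indices: part rank-based, part uniform."""
    if not values:
        raise ValueError("cannot sample from an empty buffer")
    rng = rng or random.Random()
    n = len(values)
    # Indices ordered from highest to lowest sample value.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    # Assumption: selection probability proportional to 1 / rank.
    weights = [1.0 / (rank + 1) for rank in range(n)]
    n_ranked = int(batch_size * rank_fraction)
    ranked = rng.choices(order, weights=weights, k=n_ranked)
    # The rest of the batch is drawn uniformly so low-value samples
    # are still replayed occasionally.
    uniform = [rng.randrange(n) for _ in range(batch_size - n_ranked)]
    return ranked + uniform


values = [
    sample_value(reward=1.0, novelty=0.5),
    sample_value(reward=-1.0, novelty=0.5),
    sample_value(reward=0.0, novelty=0.0),
]
batch = mixed_sample(values, batch_size=8, rng=random.Random(0))
```

Both draws here are with replacement, which keeps the sketch short; a real replay buffer might instead sample without replacement or apply importance weights.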
Keywords (English) reinforcement learning; sample utilization; sample evaluation; incremental; similarity
Received 2020/11/21
Revised 2021/1/6
Pages 1963-1966
CLC number TP181
Document code A