《计算机应用研究》|Application Research of Computers

Hierarchical reinforcement learning based on ant colony optimization algorithm

Authors ZHOU Xiao-ke, SUN Zhi-yi, PENG Zhi-ping
Affiliations 1. School of Information & Electrical Engineering, Taiyuan University of Science & Technology, Taiyuan 030000, China; 2. Dept. of Computer Science & Technology, Guangdong University of Petrochemical Technology, Maoming, Guangdong 525000, China
Article ID 1001-3695(2014)11-3214-03
DOI 10.3969/j.issn.1001-3695.2014.11.003
Abstract In autonomous systems, agents interact with the environment to carry out the tasks assigned to them, and hierarchical reinforcement learning helps an agent learn efficiently in large, complex environments. This paper proposes a new method that uses the ant colony optimization algorithm to identify hierarchy boundaries and discover subgoal states: ants deposit pheromone as they traverse the state space, the rate of change of the pheromone defines a roughness measure, and the roughness delimits the subgoals. Using the discovered subgoals, the agent creates abstractions and can explore more effectively. The performance of the algorithm was verified in the taxi domain; experimental results show that the method significantly improves the agent's learning efficiency.
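The pheromone-roughness idea described in the abstract can be sketched as follows. This is a minimal illustration under assumed details (random-walk ants that deposit one pheromone unit per visit, uniform evaporation after each walk, and roughness taken as the mean absolute pheromone difference between a state and its neighbours), run on a hypothetical two-room gridworld rather than the paper's taxi domain; it is not the authors' actual update rules.

```python
import random
from collections import defaultdict

# Hypothetical two-room gridworld (not from the paper): two 3x3 rooms
# connected by a single doorway cell, the kind of bottleneck state that
# a roughness measure over pheromone is meant to expose.
def build_two_rooms():
    cells = {(r, c) for r in range(3) for c in range(3)}        # left room
    cells |= {(r, c + 4) for r in range(3) for c in range(3)}   # right room
    door = (1, 3)                                               # doorway cell
    cells.add(door)

    def neighbors(s):
        r, c = s
        cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return [n for n in cand if n in cells]

    return cells, neighbors, door

def rank_subgoal_candidates(cells, neighbors, n_ants=400, walk_len=60,
                            evaporation=0.01, seed=0):
    """Ants random-walk the state graph, depositing one unit of pheromone
    per visit; all pheromone evaporates slightly after each walk.  The
    'roughness' of a state is the mean absolute pheromone difference to
    its neighbours; states are returned sorted by roughness, highest
    first, so bottleneck (subgoal) candidates come out on top."""
    rng = random.Random(seed)
    pheromone = defaultdict(float)
    states = sorted(cells)
    for _ in range(n_ants):
        s = rng.choice(states)
        for _ in range(walk_len):
            pheromone[s] += 1.0
            s = rng.choice(neighbors(s))
        for k in pheromone:                     # evaporation step
            pheromone[k] *= 1.0 - evaporation
    roughness = {
        s: sum(abs(pheromone[s] - pheromone[n]) for n in neighbors(s))
           / len(neighbors(s))
        for s in states
    }
    return sorted(states, key=lambda s: -roughness[s])
```

The doorway has only two neighbours while adjacent cells have four, so its pheromone level differs sharply from its surroundings and it ranks near the top of the candidate list; in an Option-based learner, such states would become subgoal (termination) states for new options.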
Keywords ant colony optimization algorithm; reinforcement learning; Option; bottleneck edges
Funding National Natural Science Foundation of China (61272382)
Article URL http://www.arocmag.com/article/01-2014-11-003.html
References
  [1] DIETTERICH T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
[2] HU Xiang-pei, DING Qiu-lei, LI Yong-xian. A review of research on ant colony optimization algorithms[J]. Journal of Industrial Engineering and Engineering Management, 2008, 2(2): 74-79.
[3] SUTTON R S, PRECUP D, SINGH S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning[J]. Artificial Intelligence, 1999, 112(1): 181-211.
[4] SHEN Jing. Research on hierarchical reinforcement learning methods[D]. Harbin: Harbin Engineering University, 2006.
[5] TAN Ying. Fundamentals of computational swarm intelligence[M]. Beijing: Tsinghua University Press, 2009: 253-259.
[6] DAVOODABADI M, BEIGY H. A new method for discovering subgoals and constructing options in reinforcement learning[C]// Proc of Indian International Conference on Artificial Intelligence. 2011: 441-450.
[7] STOLLE M, PRECUP D. Learning options in reinforcement learning[C]// Proc of the 5th International Symposium on Abstraction, Reformulation and Approximation. Palo Alto: AAAI Press, 2002.
[8] McGOVERN A. Autonomous discovery of temporal abstractions from interaction with an environment[D]. Amherst: University of Massachusetts, 2002.
[9] SIMSEK O, BARTO A G. Using relative novelty to identify useful temporal abstractions in reinforcement learning[C]// Proc of the 21st International Conference on Machine Learning. New York: ACM Press, 2004: 95.
[10] DIGNEY B. Learning hierarchical control structures for multiple tasks and changing environments[C]// Proc of the 5th International Conference on the Simulation of Adaptive Behavior. 1998: 321-330.
[11] MANNOR S, MENACHE I, HOZE A, et al. Dynamic abstraction in reinforcement learning via clustering[C]// Proc of the 21st International Conference on Machine Learning. New York: ACM Press, 2004: 71-78.
[12] KAZEMITABAR S J, BEIGY H. Automatic discovery of subgoals in reinforcement learning using strongly connected components[C]// Proc of the 15th International Conference on Advances in Neuro-Information Processing. Berlin: Springer-Verlag, 2008: 829-834.
[13] TAGHIZADEH N. Autonomous skill acquisition in reinforcement learning based on graph clustering[D]. Tehran: Sharif University of Technology, 2011.
[14] DRIES E J, PETERSON G L. Scaling ant colony optimization with hierarchical reinforcement learning partitioning[C]// Proc of the 10th Annual Conference on Genetic and Evolutionary Computation. New York: ACM Press, 2008: 25-32.
[15] LI Li-yun. Research on hierarchical reinforcement learning based on automatic Option generation[D]. Changsha: Changsha University of Science & Technology, 2008.
Received 2013/11/11
Revised 2013/12/28
Pages 3214-3216, 3220
CLC number TP181
Document code A