《计算机应用研究》|Application Research of Computers

A survey of multi-agent reinforcement learning applied to signal control methods for urban traffic networks

Multi-agent reinforcement learning based traffic signal control for integrated urban network: survey of state of art

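Authors 杨文臣, 张轮, Zhu Feng
Affiliations 1. Yunnan Provincial Transportation Planning and Design Institute, National Engineering Laboratory for Land Transportation Meteorological Disaster Prevention Technology, Kunming 650031; 2. Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, Shanghai 201804; 3. School of Civil and Environmental Engineering, Nanyang Technological University, Singapore 639798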
Article ID 1001-3695(2018)06-1613-06
DOI 10.3969/j.issn.1001-3695.2018.06.003
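Abstract Traffic signal control systems are distributed, both in physical location and in control logic, across a dynamically changing network traffic environment. Treating the signal controller at each intersection as a heterogeneous agent makes the problem well suited to modeling and description with model-free, self-learning, data-driven multi-agent reinforcement learning (MARL). To examine the current state, open problems, and prospects of this approach, this paper systematically tracks concrete applications of MARL to traffic control in China and abroad, covering the conceptual model of MARL traffic signal control, fully independent MARL control, MARL control with partial state cooperation, and MARL control with joint actions. It analyzes their technical characteristics and generational differences, discusses research trends for MARL in traffic signal control, and proposes that the key issues in developing integrated MARL control of network traffic signals are the reinforcement learning control mechanism, joint coordination, traffic state feature extraction, and multi-mode integrated control.
Keywords intelligent transportation; traffic control; multi-agent reinforcement learning; closed-loop feedback; joint coordination; data-driven
Funding Science and Technology Program of the Yunnan Provincial Department of Transportation (云交科2014(A)23)
National "863" Program of China (2012AA112307)
Article URL http://www.arocmag.com/article/01-2018-06-003.html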
Author names (English) Yang Wenchen, Zhang Lun, Zhu Feng
Affiliations (English) 1. National Engineering Laboratory for Surface Transportation Weather Impacts Prevention, Broadvision Engineering Consultants, Kunming 650031, China; 2. Key Laboratory of Road & Traffic Engineering for Ministry of Education, Tongji University, Shanghai 201804, China; 3. School of Civil & Environmental Engineering, Nanyang Technological University, Singapore 639798, Singapore
English abstract Urban traffic control (UTC) systems are geographically and logically distributed in dynamically changing traffic environments, and are well suited to the multi-agent reinforcement learning (MARL) approach because of its model-free, self-learning, and data-driven features. To investigate the state of the art, this paper comprehensively surveys the main challenges, recent trends, and the MARL methods and techniques applied to UTC systems, including the general framework of MARL for UTC, totally independent MARL, partially state-cooperative MARL, and joint-action MARL. By comparing the key characteristics and differences of the leading MARL approaches, it discusses several future directions toward the successful deployment of MARL technology in traffic control systems, and identifies four critical issues in developing agent-based traffic control systems for integrated networks: the mechanism of RL traffic signal control, joint-action coordination, feature partitioning of traffic state, and multi-model integrated control.
English keywords intelligent transportation; traffic control; multi-agent reinforcement learning (MARL); closed feedback; joint-coordinated cooperation; data driven
Received 2017/6/10
Revised 2017/7/24
Pages 1613-1618
CLC number TP181;U491.51
Document code A