Application Research of Computers (《计算机应用研究》)

Optimized load balancing algorithm for Kafka consumer/client

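Authors Wang Zhenghe (王郑合), Wang Feng (王锋), Deng Hui (邓辉), Liu Cuiyin (柳翠寅), Zhang Xiaoli (张晓丽)
Affiliation Yunnan Key Laboratory of Computer Technology Applications, Kunming University of Science and Technology, Kunming 650500, China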
Article ID 1001-3695(2017)08-2306-04
DOI 10.3969/j.issn.1001-3695.2017.08.015
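Abstract Kafka is a high-throughput distributed messaging system widely used in massive data processing and many other fields. The existing Kafka consumer/client load balancing algorithm has several shortcomings, however, and is prone to excessive overhead and even errors when handling massive data. This paper proposes an optimized Kafka consumer/client load balancing algorithm in which the load balancing process is controlled entirely by a consumer acting as manager; the other consumers need not perform load balancing individually, and the manager does not have to reassign the partitions consumed by every consumer. The system monitors the running state of all consumers, so that load can be rebalanced promptly after a consumer goes down. Test results indicate that the algorithm reduces the system overhead of Kafka consumers/clients during load balancing and avoids incorrect load balancing results, effectively ensuring the correctness of distributed scientific data processing.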
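Keywords distributed messaging system; publish-subscribe model; Kafka; load balancing; system overhead; reliability
Funding Key Project of the National Astronomy Joint Fund (U1231205)
Key Project of the Yunnan Province Applied Basic Research Fund (2014FA013, 2013FA032)
Article URL http://www.arocmag.com/article/01-2017-08-015.html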
Author names (English) Wang Zhenghe, Wang Feng, Deng Hui, Liu Cuiyin, Zhang Xiaoli
Affiliation (English) Yunnan Province Key Laboratory of Computer Technology Applications, Kunming University of Science & Technology, Kunming 650500, China
English abstract Kafka is a high-throughput distributed messaging system that is widely used in distributed massive data processing and other fields. However, the current load balancing algorithm for the Kafka consumer client has some shortcomings: it increases overhead and can even produce errors during massive data processing. This paper develops an optimized load balancing algorithm for the Kafka consumer client. The load balancing process is controlled entirely by the consumer that acts as manager, so the remaining consumers do not need to perform load balancing on their own, and the manager does not have to redistribute partitions among all consumers. The system monitors the condition of every consumer and rebalances the load promptly when a consumer crashes. Test results show that the algorithm reduces the system overhead caused by load balancing and avoids incorrect load balancing results, which effectively safeguards the correctness of distributed scientific data processing.
English keywords distributed messaging system; publish-subscribe model; Kafka; load balancing; system overhead; reliability
Received 2016/6/1
Revised 2016/7/18
Pages 2306-2309
CLC number TP301.6
Document code A
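
The abstract describes a manager consumer that handles rebalancing alone, monitors the other consumers, and avoids reassigning every consumer's partitions. The paper's actual procedure is not given on this page, so the following is only a minimal Python sketch of that idea under stated assumptions: liveness is judged by a heartbeat timeout, and only the partitions of a crashed consumer are moved, each to the currently least-loaded survivor. The names `find_crashed` and `reassign_orphans` are hypothetical and do not come from the paper or from the Kafka API.

```python
import heapq
from typing import Dict, List


def find_crashed(last_heartbeat: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Return consumers whose last heartbeat is older than `timeout` seconds.

    Assumption: the manager keeps a timestamp per consumer; the paper's real
    monitoring mechanism is not described in the abstract.
    """
    return sorted(c for c, t in last_heartbeat.items() if now - t > timeout)


def reassign_orphans(assignment: Dict[str, List[int]], crashed: str) -> Dict[str, List[int]]:
    """Return a new assignment in which only the crashed consumer's partitions move.

    Partitions already owned by surviving consumers stay where they are.
    """
    survivors = {c: list(p) for c, p in assignment.items() if c != crashed}
    if not survivors:
        raise RuntimeError("no live consumers left to take over partitions")

    # Min-heap keyed by current load, then by name so ties break deterministically.
    heap = [(len(parts), consumer) for consumer, parts in survivors.items()]
    heapq.heapify(heap)

    for partition in sorted(assignment.get(crashed, [])):
        load, consumer = heapq.heappop(heap)      # least-loaded survivor
        survivors[consumer].append(partition)
        heapq.heappush(heap, (load + 1, consumer))
    return survivors
```

For example, with `{"c1": [0, 1], "c2": [2, 3], "c3": [4, 5]}` and `c2` crashed, the sketch returns `{"c1": [0, 1, 2], "c3": [4, 5, 3]}`: `c1` and `c3` keep their original partitions and each picks up one orphan.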