Task Scheduling Strategy Based on Topology Structure in Storm
  • Title: Task scheduling strategy based on topology structure in Storm
  • Authors: LIU Su (刘粟); YU Jiong (于炯); LU Liang (鲁亮); LI Ziyang (李梓杨)
  • Affiliation: College of Information Science and Engineering, Xinjiang University
  • Keywords: Storm; stream computing; task scheduling; topology structure; communication cost
  • Journal: Journal of Computer Applications (计算机应用; journal code JSJY)
  • Publication date: 2018-07-25 08:20
  • Publisher: Journal of Computer Applications
  • Year: 2018
  • Volume: v.38; No.340
  • Funding: National Natural Science Foundation of China (61462079, 61562078, 61562086); National Science and Technology Support Program (2015BAH02F01)
  • Language: Chinese
  • Record ID: JSJY201812024
  • Number of pages: 9
  • Issue: 12
  • CN: 51-1307/TP
  • Pages: 133-141
Abstract
To address the high communication cost and load imbalance of the default round-robin scheduling strategy in the Storm stream computing platform, a Task Scheduling Strategy based on Topology Structure (TS²) is proposed. First, worker nodes with sufficient available CPU resources are selected and exactly one process is allocated to each, which eliminates inter-process communication within a node and optimizes process deployment. Next, the topology structure is analyzed to find the component with the largest degree, and that component's threads are assigned first. Finally, subject to the maximum number of threads each node can carry, associated tasks are deployed to the same node wherever possible, which reduces inter-node communication cost, improves cluster load balance, and optimizes thread deployment. Experimental results show that, in terms of system latency, the average optimization rate of TS² is 16.91% relative to the Storm default scheduling strategy and 5.69% relative to the offline scheduling strategy, which improves the real-time performance of the system. Compared with the Storm default scheduling strategy, TS² also reduces inter-node communication cost by an average of 15.75% and increases average throughput by an average of 14.21%.
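The three steps summarized in the abstract can be illustrated with a small greedy sketch. This is not the authors' TS² implementation and does not use Storm's scheduler interface. The function name `schedule`, the input shapes, the scoring rule, and the tie-breaking order are all assumptions made for illustration. Node selection by CPU availability is assumed to have happened already, so `nodes` holds only the chosen nodes, each running one worker process.

```python
from collections import defaultdict


def schedule(parallelism, edges, nodes, max_threads):
    """Hypothetical greedy placement; returns {node: [(component, thread_index), ...]}.

    parallelism: {component: number of threads}
    edges:       [(upstream_component, downstream_component), ...]
    nodes:       pre-selected nodes, one worker process each (assumption)
    max_threads: maximum number of threads a node may carry
    """
    # Undirected adjacency, so a component's degree counts inputs and outputs.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Highest-degree component first; ties broken by name for determinism.
    order = sorted(parallelism, key=lambda c: (-len(neighbors[c]), c))

    placement = {n: [] for n in nodes}
    for comp in order:
        for i in range(parallelism[comp]):
            candidates = [n for n in nodes if len(placement[n]) < max_threads]
            if not candidates:
                raise ValueError("not enough thread capacity in the cluster")

            def score(n):
                # Count threads already on n that belong to adjacent components.
                related = sum(1 for c, _ in placement[n] if c in neighbors[comp])
                # Prefer more related threads, then lighter load, then node order.
                return (-related, len(placement[n]), nodes.index(n))

            best = min(candidates, key=score)
            placement[best].append((comp, i))
    return placement


if __name__ == "__main__":
    # Hypothetical linear topology: spout -> split -> count -> report.
    demo = schedule(
        {"spout": 1, "split": 2, "count": 2, "report": 1},
        [("spout", "split"), ("split", "count"), ("count", "report")],
        ["n1", "n2"],
        max_threads=3,
    )
    for node, threads in demo.items():
        print(node, threads)
```

Ranking candidate nodes by related threads first and by current load second is one simple way to trade co-location against balance; the paper's actual criteria may differ.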
