Abstract
Existing improved HDFS load-balancing algorithms fall short in three respects: they measure node load one-sidedly, their load measurements lag behind the actual state of the cluster, and their balancing threshold is set statically. This paper proposes an improved HDFS dynamic load-balancing algorithm based on load prediction. The algorithm first examines the load characteristics that affect storage efficiency and defines a multi-index load function. It then uses an optimized double (second-order) exponential smoothing algorithm to predict each node's load at the next time step. Finally, using the prediction results and node performance, it jointly evaluates the cluster's busyness, response efficiency, and degree of balance, builds a dynamic threshold model from this evaluation, and uses the model to decide whether and how to rebalance the cluster, thereby achieving a better dynamic balancing effect. Theoretical analysis and experimental results show that the improved algorithm schedules load balancing efficiently in the storage system: it achieves better balance while also shortening job completion time and improving the overall response efficiency of the system.
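The three-stage pipeline described above (multi-index load, double exponential smoothing prediction, dynamic threshold decision) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: the four load indices (CPU, memory, disk, and network utilization) and their weights, the grid search over the smoothing coefficient alpha (one plausible reading of "optimized"), and the dispersion-based threshold rule with its parameters `k`, `floor`, and `cap` are all hypothetical. Only Brown's double exponential smoothing recurrences are standard. Heterogeneous node performance is not modeled here.

```python
"""Sketch of a prediction-based dynamic balancing decision.

Hypothetical throughout: the load indices and weights, the alpha grid search,
and the threshold rule are illustrative assumptions, not the paper's formulas.
"""
from statistics import mean, pstdev

# Hypothetical weights for (cpu, mem, disk, net) utilization; they sum to 1.
DEFAULT_WEIGHTS = (0.3, 0.2, 0.3, 0.2)


def node_load(cpu, mem, disk, net, weights=DEFAULT_WEIGHTS):
    """Hypothetical multi-index load: weighted sum of utilizations in [0, 1]."""
    return sum(w * u for w, u in zip(weights, (cpu, mem, disk, net)))


def _smooth(s1, s2, x, alpha):
    """One step of Brown's recurrences: first- and second-order smoothed values."""
    s1 = alpha * x + (1 - alpha) * s1
    s2 = alpha * s1 + (1 - alpha) * s2
    return s1, s2


def _predict(s1, s2, alpha, horizon=1):
    """Brown's forecast: level a = 2*S1 - S2, trend b = alpha/(1-alpha)*(S1 - S2)."""
    level = 2 * s1 - s2
    trend = alpha / (1 - alpha) * (s1 - s2)
    return level + trend * horizon


def brown_forecast(series, alpha, horizon=1):
    """Forecast `horizon` steps past the end of `series` (requires 0 < alpha < 1)."""
    s1 = s2 = series[0]  # common initialization: seed both with the first value
    for x in series[1:]:
        s1, s2 = _smooth(s1, s2, x, alpha)
    return _predict(s1, s2, alpha, horizon)


def one_step_sse(series, alpha):
    """Sum of squared one-step-ahead forecast errors over the history."""
    s1 = s2 = series[0]
    sse = 0.0
    for x in series[1:]:
        sse += (x - _predict(s1, s2, alpha)) ** 2  # forecast made before seeing x
        s1, s2 = _smooth(s1, s2, x, alpha)
    return sse


def best_alpha(series, grid=None):
    """Assumed 'optimization': pick the grid alpha with the lowest one-step SSE."""
    grid = grid or [i / 20 for i in range(1, 20)]  # 0.05 .. 0.95, excludes 1
    return min(grid, key=lambda a: one_step_sse(series, a))


def classify(predicted, k=1.0, floor=0.05, cap=0.2):
    """Hypothetical dynamic threshold: k * spread of predicted loads, clipped.

    Returns (threshold, overloaded nodes, underloaded nodes) relative to the
    cluster's mean predicted load.
    """
    values = list(predicted.values())
    avg = mean(values)
    threshold = min(max(k * pstdev(values), floor), cap)
    over = sorted(n for n, v in predicted.items() if v > avg + threshold)
    under = sorted(n for n, v in predicted.items() if v < avg - threshold)
    return threshold, over, under


if __name__ == "__main__":
    # Hypothetical per-DataNode load histories (already combined via node_load).
    history = {
        "dn1": [0.60, 0.68, 0.75, 0.83, 0.90],
        "dn2": [0.50, 0.52, 0.49, 0.51, 0.50],
        "dn3": [0.20, 0.18, 0.15, 0.12, 0.10],
    }
    predicted = {n: brown_forecast(s, best_alpha(s)) for n, s in history.items()}
    print(predicted)
    print(classify(predicted))
```

In use, one would compute `node_load` for each DataNode at every sampling interval, pass each node's history to `best_alpha` and `brown_forecast`, and feed the predictions to `classify` to obtain the current threshold and the over- and under-loaded node sets. Deriving the threshold from the spread of predicted loads, rather than fixing it in advance as the stock HDFS Balancer's `-threshold` percentage does, is what makes the threshold dynamic in this sketch.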