Short-term load forecasting based on Spark and gradient boosting decision tree model
  • English title: Short-term load forecasting based on Spark and gradient boosting decision tree model
  • Authors: 许贤泽; 刘静; 施元; 谭盛煌
  • English authors: XU Xianze; LIU Jing; SHI Yuan; TAN Shenghuang (School of Electronic Information, Wuhan University)
  • Keywords: load forecasting; distributed computing; big data; gradient boosting decision tree; Spark platform
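  • Journal code: HZLG
  • Journal: Journal of Huazhong University of Science and Technology (Natural Science Edition)
  • Institution: School of Electronic Information, Wuhan University
  • Publication date: 2019-05-15 17:19
  • Publisher: Journal of Huazhong University of Science and Technology (Natural Science Edition)
  • Year: 2019
  • Volume: v.47; No.437
  • Funding: National Natural Science Foundation of China (51705375)
  • Language: Chinese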
  • Article ID: HZLG201905016
  • Page count: 6
  • Issue: 05
  • CN: 42-1658/N
  • Pages: 89-94
Abstract
A parallel load forecasting method based on gradient boosting decision trees (GBDT) was proposed, and the Spark platform was used to analyze big data from the power user side. First, the historical load and weather datasets were partitioned and processed in parallel, and feature extraction and transformation methods were used to obtain the feature vectors required by the prediction model. Then, the number of Spark cluster nodes was set and the block size of the Hadoop Distributed File System (HDFS) was tuned. Finally, the parameter-tuned GBDT model was deployed on the Spark distributed platform for training and prediction, and its prediction accuracy was compared with that of other forecasting models. The results show that choosing an appropriate HDFS storage block size effectively improves the cluster's efficiency in processing big data, and that the distributed GBDT algorithm has considerable advantages in both speed and accuracy, meeting the requirements of power load forecasting.
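The abstract gives no code, feature list, or hyperparameters, so the following is only a minimal single-machine sketch of the gradient boosting idea under squared loss, not the authors' method. It assumes hypothetical features (the load 24 h earlier, the load 1 h earlier, and the temperature at the target hour), uses depth-1 regression trees (stumps) as weak learners, runs on made-up synthetic hourly data, and scores the last 24 hours with MAPE. It does not use Spark or HDFS; Spark MLlib does provide a gradient-boosted tree regressor (`GBTRegressor` in `pyspark.ml.regression`), but which API and settings the paper used is not stated here. The names `Stump`, `fit_stump`, `GBDTRegressor`, `make_features`, and `run_demo` are hypothetical.

```python
# Gradient boosting with regression stumps for hourly load forecasting.
# Illustrative sketch on synthetic data; not the paper's Spark implementation.
import math
import random
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one split, two constant leaves."""
    feature: int
    threshold: float
    left: float   # value when x[feature] <= threshold
    right: float  # value when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Least-squares stump fitted to the residuals r (sort + running sums)."""
    n, total = len(r), sum(r)
    best, best_gain = None, -math.inf
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            left_sum += r[order[k]]
            x_here, x_next = X[order[k]][j], X[order[k + 1]][j]
            if x_here == x_next:  # no threshold separates equal values
                continue
            n_left, right_sum = k + 1, total - left_sum
            # SSE = sum(r^2) - L^2/nL - R^2/nR, so maximizing this minimizes SSE.
            gain = left_sum ** 2 / n_left + right_sum ** 2 / (n - n_left)
            if gain > best_gain:
                best_gain = gain
                best = Stump(j, (x_here + x_next) / 2,
                             left_sum / n_left, right_sum / (n - n_left))
    if best is None:  # every feature is constant: fall back to the mean residual
        best = Stump(0, math.inf, total / n, total / n)
    return best


@dataclass
class GBDTRegressor:
    n_estimators: int = 200
    learning_rate: float = 0.1
    base: float = 0.0
    trees: List[Stump] = field(default_factory=list)

    def fit(self, X, y):
        self.base = sum(y) / len(y)  # F0: the constant that minimizes squared loss
        self.trees = []
        pred = [self.base] * len(y)
        for _ in range(self.n_estimators):
            # For L = (y - F)^2 / 2 the negative gradient is the residual y - F.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            tree = fit_stump(X, residuals)
            self.trees.append(tree)
            pred = [pi + self.learning_rate * tree.predict(x) for pi, x in zip(pred, X)]
        return self

    def predict(self, x):
        return self.base + self.learning_rate * sum(t.predict(x) for t in self.trees)


def make_features(load, temp, lag=24):
    """Hypothetical features: load at t-lag and t-1, temperature at t."""
    X, y = [], []
    for t in range(lag, len(load)):
        X.append([load[t - lag], load[t - 1], temp[t]])
        y.append(load[t])
    return X, y


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def run_demo(days=28, seed=42):
    """Train on all but the last 24 h of synthetic data; return (model MAPE, baseline MAPE)."""
    rng = random.Random(seed)
    hours = range(24 * days)
    temp = [20 + 8 * math.sin(2 * math.pi * (h % 24) / 24) + rng.gauss(0, 1) for h in hours]
    load = [300 + 40 * math.sin(2 * math.pi * (h % 24) / 24 - 1.0) + 2.0 * temp[h]
            + rng.gauss(0, 3) for h in hours]
    X, y = make_features(load, temp)
    split = len(X) - 24  # hold out the final 24 hours
    model = GBDTRegressor().fit(X[:split], y[:split])
    model_mape = mape(y[split:], [model.predict(x) for x in X[split:]])
    train_mean = sum(y[:split]) / split  # constant forecast used as a baseline
    return model_mape, mape(y[split:], [train_mean] * 24)


if __name__ == "__main__":
    m, b = run_demo()
    print(f"GBDT MAPE: {m:.2f}%  mean-baseline MAPE: {b:.2f}%")
```

Running it prints the held-out MAPE of the boosted model next to that of a constant training-mean forecast; both numbers come from the synthetic data and say nothing about the accuracy reported in the paper. Stumps keep the sketch short; practical GBDT configurations usually grow deeper trees and tune the learning rate and number of iterations.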