The Research Status of Parameter Optimization Based on Spark Platform

Title: The Research Status of Parameter Optimization Based on Spark Platform
Authors: WEI Yao-wen (尉耀稳); YU Bin (余彬); LI Hao-shuai (李豪帅); SHEN Hong-da (沈鸿达)
Affiliation: State Grid Zhejiang Hangzhou Xiaoshan District Power Supply Company (国网浙江杭州市萧山区供电有限公司)
Keywords: big data; Spark; performance; configuration parameter; parameter optimization
Journal code: DNZS
Journal: Computer Knowledge and Technology (电脑知识与技术)
Publication date: 2019-01-05
Publisher: Computer Knowledge and Technology (电脑知识与技术)
Year: 2019
Issue: v.15
Language: Chinese
Page: DNZS201901007
Page count: 3
CN: 01
ISSN: 34-1205/TP
Classification number: 17-19

Abstract

In recent years, a number of big data processing platforms, including Hadoop, Spark, and Storm, have emerged to meet the needs of the big data era, and Spark has become the most popular of them thanks to its distinctive advantages. Although Spark is widely deployed, it still has serious performance problems, and many researchers are working to find effective ways to improve its performance. Addressing this problem from the perspective of optimizing the relevant configuration parameters, the authors analyze and summarize the significant impact of parameter optimization on Spark platform performance, as well as the current Spark parameter optimization techniques in China and abroad. Finally, they summarize the main problems remaining in Spark parameter optimization and propose directions for further research.

