Abstract
To cope with the limited memory of embedded systems, compilers often disable optimizations that increase code size, such as loop unrolling and procedure inlining, at the cost of performance. In most programs, however, a small fraction of the code (the "hot" code) accounts for more than 90% of execution time. Exploiting this property, this paper proposes an executable code size optimization method based on hot code: execution profiling information is used to identify the "hot" and "cold" code of a program, and each is optimized with a different strategy. Experiments show that, compared with performance-oriented optimization, the proposed method reduces the code size of typical test programs by 15.2% on average while reducing performance by only 3.4%.
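The hot/cold partitioning idea can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a flat per-function time profile is already available (for example, from a profiler), marks the most expensive functions as "hot" until they cover 90% of total time, and maps hot code to a speed-oriented flag and cold code to a size-oriented one. The function names, the sample profile, and the choice of `-O2` and `-Os` as the two strategies are all assumptions made for illustration.

```python
from typing import Dict


def classify_hot_cold(profile: Dict[str, float], coverage: float = 0.90) -> Dict[str, str]:
    """Label each function 'hot' or 'cold' from a flat execution-time profile.

    Functions are visited in descending order of time; they are marked 'hot'
    until the accumulated time reaches `coverage` of the total. The rest
    stay 'cold'.
    """
    total = sum(profile.values())
    labels = {name: "cold" for name in profile}
    if total <= 0:
        return labels
    accumulated = 0.0
    for name, seconds in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if accumulated >= coverage * total:
            break
        labels[name] = "hot"
        accumulated += seconds
    return labels


def choose_flags(labels: Dict[str, str],
                 hot_flag: str = "-O2",
                 cold_flag: str = "-Os") -> Dict[str, str]:
    """Map each function to an optimization flag (assumed strategy, not the paper's)."""
    return {name: hot_flag if label == "hot" else cold_flag
            for name, label in labels.items()}


# Hypothetical profile: seconds spent in each function.
profile = {"fft_kernel": 9.2, "parse_args": 0.1, "init_tables": 0.3, "log_stats": 0.4}
labels = classify_hot_cold(profile)
flags = choose_flags(labels)
```

With this sample profile, only `fft_kernel` is labeled hot, so it alone would be compiled for speed while the remaining functions are compiled for size.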