ABSTRACT
Compilers perform a variety of advanced optimizations to improve the quality of the generated machine code. However, optimizations that depend on the data flow of a program are often limited by control-flow merges. Code duplication can solve this problem by hoisting, i.e. duplicating, instructions from merge blocks to their predecessors. However, finding optimization opportunities enabled by duplication is a non-trivial task that requires compile-time intensive analysis. This imposes a challenge on modern (just-in-time) compilers: Duplicating instructions tentatively at every control flow merge is not feasible because excessive duplication leads to uncontrolled code growth and compile time increases. Therefore, compilers need to find out whether a duplication is beneficial enough to be performed.
This paper proposes a novel approach to determine which duplication operations should be performed to increase performance. The approach is based on a duplication simulation that enables a compiler to evaluate different success metrics per potential duplication. Using this information, the compiler can then select the most promising candidates for optimization. We show how to map duplication candidates into an optimization cost model that allows us to trade-off between different success metrics including peak performance, code size and compile time.
We implemented the approach on top of the GraalVM and evaluated it with the benchmarks Java DaCapo, Scala DaCapo, JavaScript Octane and a micro-benchmark suite, in terms of performance, compilation time and code size increase.
We show that our optimization can reach peak performance improvements of up to 40% with a mean peak performance increase of 5.89%, while it generates a mean code size increase of 9.93% and mean compile time increase of 18.44%.
- David F. Bacon, Susan L. Graham, and Oliver J. Sharp. 1994. Compiler Transformations for High-performance Computing. ACM Comput. Surv. 26, 4 (Dec. 1994), 345–420. Google ScholarDigital Library
- J. Eugene Ball. 1979. Predicting the Effects of Optimization on a Procedure Body. SIGPLAN Not. 14, 8 (Aug. 1979), 214–220. Google ScholarDigital Library
- S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM Press, 169– 190. Google ScholarDigital Library
- Rastislav Bodík, Rajiv Gupta, and Mary Lou Soffa. 1998. Complete Removal of Redundant Expressions. In Proceedings of the ACM SIG-PLAN Conference on Programming Language Design and Implementation. ACM, 1–14. Google ScholarDigital Library
- Preston Briggs, Keith D. Cooper, and L. Taylor Simpson. 1997. Value numbering. Software-Practice and Experience 27, 6 (1997), 701–724. Google ScholarDigital Library
- Stefano Cazzulani. 2012. Octane: The JavaScript benchmark suite for the modern web. Retrieved December 21 (2012), 2015.Google Scholar
- Craig David Chambers. 1992. The Design and Implementation of the Self Compiler, an Optimizing Compiler for Object-oriented Programming Languages. Ph.D. Dissertation. Stanford, CA, USA. UMI Order No. GAX92-21602.Google Scholar
- Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu. 1991. Using Profile Information to Assist Classic Code Optimizations. Softw. Pract. Exper. 21, 12 (Dec. 1991), 1301–1321. Google ScholarDigital Library
- Cliff Click and Keith D. Cooper. 1995. Combining Analyses, Combining Optimizations. ACM Trans. Program. Lang. Syst. 17, 2 (March 1995), 16. Google ScholarDigital Library
- Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. 1991. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Trans. Program. Lang. Syst. 13, 4 (Oct. 1991), 40. Google ScholarDigital Library
- Gilles Duboscq, Lukas Stadler, Thomas Würthinger, Doug Simon, Christian Wimmer, and Hanspeter Mössenböck. 2013. Graal IR: An Extensible Declarative Intermediate Representation. In Proceedings of the Asia-Pacific Programming Languages and Compilers Workshop.Google Scholar
- Gilles Duboscq, Thomas Würthinger, Lukas Stadler, Christian Wimmer, Doug Simon, and Hanspeter Mössenböck. 2013. An Intermediate Representation for Speculative Optimizations in a Dynamic Compiler. In Proceedings of the ACM Workshop on Virtual Machines and Intermediate Languages. Google ScholarDigital Library
- Joseph A. Fisher. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter Trace Scheduling: A Technique for Global Microcode Compaction, 186–198. http://dl.acm.org/citation. cfm?id=201749.201766 Google ScholarDigital Library
- Michael Frigge, David C. Hoaglin, and Boris Iglewicz. 1989. Some Implementations of the Boxplot. The American Statistician 43, 1 (1989), 50–54. http://www.jstor.org/stable/2685173Google Scholar
- Yoshihiko Futamura. 1999. Partial Evaluation of Computation Process– An Approach to a Compiler-Compiler. Higher-Order and Symbolic Computation 12, 4 (01 Dec 1999), 381–391. Google ScholarDigital Library
- Google. 2012. V8 JavaScript Engine. (2012). http://code.google.com/p/ v8/Google Scholar
- Wen-Mei W. Hwu, Scott A. Mahlke, William Y. Chen, Pohua P. Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Quellette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G. Holm, and Daniel M. Lavery. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter The Superblock: An Effective Technique for VLIW and Superscalar Compilation, 234–253. http://dl.acm.org/citation.cfm?id=201749.201774 Google ScholarDigital Library
- Kazuaki Ishizaki, Motohiro Kawahito, Toshiaki Yasue, Hideaki Komatsu, and Toshio Nakatani. 2000. A Study of Devirtualization Techniques for a Java Just-In-Time Compiler. In Proceedings of the 15th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA ’00). ACM, 294–310. Google ScholarDigital Library
- David Leopoldseder, Lukas Stadler, Christian Wimmer, and Hanspeter Mössenböck. 2015. Java-to-JavaScript Translation via Structured Control Flow Reconstruction of Compiler IR. In Proceedings of the 11th Symposium on Dynamic Languages (DLS 2015). ACM, New York, NY, USA, 91–103. Google ScholarDigital Library
- Yau-Tsun Steven Li, Sharad Malik, and Andrew Wolfe. 1999. Performance Estimation of Embedded Software with Instruction Cache Modeling. ACM Trans. Des. Autom. Electron. Syst. 4, 3 (1999), 257–279. Google ScholarDigital Library
- Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. 2015. The Java Virtual Machine Specification, Java SE 8 Edition. http://docs. oracle.com/javase/specs/jvms/se8/jvms8.pdf Google ScholarDigital Library
- Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, and Roger A. Bringmann. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter Effective Compiler Support for Predicated Execution Using the Hyperblock, 161–170. http://dl.acm. org/citation.cfm?id=201749.201763 Google ScholarDigital Library
- Frank Mueller and David B. Whalley. 1992. Avoiding Unconditional Jumps by Code Replication. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 322–330. Google ScholarDigital Library
- Frank Mueller and David B. Whalley. 1995. Avoiding Conditional Branches by Code Replication. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 56–66. Google ScholarDigital Library
- OpenJDK 2013. Graal Project. (2013). http://openjdk.java.net/projects/ graalGoogle Scholar
- OpenJDK 2017. HotSpot Virtual Machine. (2017). http://openjdk.java. net/groups/hotspot/Google Scholar
- Michael Paleczny, Christopher Vick, and Cliff Click. 2001. The Java HotSpot™ Server Compiler. In Proceedings of the Java Virtual Machine Research and Technology Symposium. USENIX, 1–12. Google ScholarDigital Library
- Aleksandar Prokopec, David Leopoldseder, Gilles Duboscq, and Thomas Würthinger. 2017. Making Collection Operations Optimal with Aggressive JIT Compilation. In Proceedings of the 8th ACM SIG-PLAN International Symposium on Scala (SCALA 2017). ACM, New York, NY, USA, 29–40. Google ScholarDigital Library
- Andreas Sewe, Mira Mezini, Aibek Sarimbekov, and Walter Binder. 2011. Da Capo con Scala: design and analysis of a scala benchmark suite for the java virtual machine. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM Press, 657–676. Google ScholarDigital Library
- Lukas Stadler, Gilles Duboscq, Hanspeter Mössenböck, Thomas Würthinger, and Doug Simon. 2013. An Experimental Study of the Influence of Dynamic Compiler Optimizations on Scala Performance. In Proceedings of the 4th Workshop on Scala (SCALA ’13). ACM, New York, NY, USA, Article 9, 8 pages. Google ScholarDigital Library
- Lukas Stadler, Thomas Würthinger, and Hanspeter Mössenböck. 2014. Partial Escape Analysis and Scalar Replacement for Java. In Proceedings of the International Symposium on Code Generation and Optimization. ACM Press, 165–174. Google ScholarDigital Library
- April W. Wade, Prasad A. Kulkarni, and Michael R. Jantz. 2017. AOT vs. JIT: Impact of Profile Data on Code Quality. In Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2017). ACM, New York, NY, USA, 1–10. Google ScholarDigital Library
- David W. Wall. 1991. Limits of Instruction-level Parallelism. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV). ACM, New York, NY, USA, 176–188. Google ScholarDigital Library
- Thomas Würthinger, Christian Wimmer, Christian Humer, Andreas Wöß, Lukas Stadler, Chris Seaton, Gilles Duboscq, Doug Simon, and Matthias Grimmer. 2017. Practical Partial Evaluation for Highperformance Dynamic Language Runtimes. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 662–676. Google ScholarDigital Library
- Thomas Würthinger, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Doug Simon, and Christian Wimmer. 2012. Self-optimizing AST interpreters. In Proceedings of the 8th symposium on Dynamic languages (DLS ’12). ACM Press, 73–82. Google ScholarDigital Library
Index Terms
- Dominance-based duplication simulation (DBDS): code duplication to enable compiler optimizations
Recommendations
Fast-path loop unrolling of non-counted loops to enable subsequent compiler optimizations
ManLang '18: Proceedings of the 15th International Conference on Managed Languages & RuntimesJava programs can contain non-counted loops, that is, loops for which the iteration count can neither be determined at compile time nor at run time. State-of-the-art compilers do not aggressively optimize them, since unrolling non-counted loops often ...
Simulation-based code duplication for enhancing compiler optimizations
SPLASH Companion 2017: Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for HumanityThe scope of compiler optimizations is often limited by control flow, which prohibits optimizations across basic block boundaries. Code duplication can solve this problem by extending basic block sizes, thus enabling subsequent optimizations. However, ...
Archeology of Code Duplication: Recovering Duplication Chains from Small Duplication Fragments
SYNASC '05: Proceedings of the Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific ComputingCode duplication is a common problem, and a well-known sign of bad design. As a result of that, in the last decade, the issue of detecting code duplication led to various solutions and tools that can automatically find duplicated blocks of code. However,...
Comments