skip to main content
10.1145/3168811acmconferencesArticle/Chapter ViewAbstractPublication PagescgoConference Proceedingsconference-collections
research-article

Dominance-based duplication simulation (DBDS): code duplication to enable compiler optimizations

Published:24 February 2018Publication History

ABSTRACT

Compilers perform a variety of advanced optimizations to improve the quality of the generated machine code. However, optimizations that depend on the data flow of a program are often limited by control-flow merges. Code duplication can solve this problem by hoisting, i.e. duplicating, instructions from merge blocks to their predecessors. However, finding optimization opportunities enabled by duplication is a non-trivial task that requires compile-time intensive analysis. This imposes a challenge on modern (just-in-time) compilers: Duplicating instructions tentatively at every control flow merge is not feasible because excessive duplication leads to uncontrolled code growth and compile time increases. Therefore, compilers need to find out whether a duplication is beneficial enough to be performed.

This paper proposes a novel approach to determine which duplication operations should be performed to increase performance. The approach is based on a duplication simulation that enables a compiler to evaluate different success metrics per potential duplication. Using this information, the compiler can then select the most promising candidates for optimization. We show how to map duplication candidates into an optimization cost model that allows us to trade-off between different success metrics including peak performance, code size and compile time.

We implemented the approach on top of the GraalVM and evaluated it with the benchmarks Java DaCapo, Scala DaCapo, JavaScript Octane and a micro-benchmark suite, in terms of performance, compilation time and code size increase.

We show that our optimization can reach peak performance improvements of up to 40% with a mean peak performance increase of 5.89%, while it generates a mean code size increase of 9.93% and mean compile time increase of 18.44%.

References

  1. David F. Bacon, Susan L. Graham, and Oliver J. Sharp. 1994. Compiler Transformations for High-performance Computing. ACM Comput. Surv. 26, 4 (Dec. 1994), 345–420. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. J. Eugene Ball. 1979. Predicting the Effects of Optimization on a Procedure Body. SIGPLAN Not. 14, 8 (Aug. 1979), 214–220. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM Press, 169– 190. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Rastislav Bodík, Rajiv Gupta, and Mary Lou Soffa. 1998. Complete Removal of Redundant Expressions. In Proceedings of the ACM SIG-PLAN Conference on Programming Language Design and Implementation. ACM, 1–14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Preston Briggs, Keith D. Cooper, and L. Taylor Simpson. 1997. Value numbering. Software-Practice and Experience 27, 6 (1997), 701–724. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Stefano Cazzulani. 2012. Octane: The JavaScript benchmark suite for the modern web. Retrieved December 21 (2012), 2015.Google ScholarGoogle Scholar
  7. Craig David Chambers. 1992. The Design and Implementation of the Self Compiler, an Optimizing Compiler for Object-oriented Programming Languages. Ph.D. Dissertation. Stanford, CA, USA. UMI Order No. GAX92-21602.Google ScholarGoogle Scholar
  8. Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu. 1991. Using Profile Information to Assist Classic Code Optimizations. Softw. Pract. Exper. 21, 12 (Dec. 1991), 1301–1321. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Cliff Click and Keith D. Cooper. 1995. Combining Analyses, Combining Optimizations. ACM Trans. Program. Lang. Syst. 17, 2 (March 1995), 16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. 1991. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Trans. Program. Lang. Syst. 13, 4 (Oct. 1991), 40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Gilles Duboscq, Lukas Stadler, Thomas Würthinger, Doug Simon, Christian Wimmer, and Hanspeter Mössenböck. 2013. Graal IR: An Extensible Declarative Intermediate Representation. In Proceedings of the Asia-Pacific Programming Languages and Compilers Workshop.Google ScholarGoogle Scholar
  12. Gilles Duboscq, Thomas Würthinger, Lukas Stadler, Christian Wimmer, Doug Simon, and Hanspeter Mössenböck. 2013. An Intermediate Representation for Speculative Optimizations in a Dynamic Compiler. In Proceedings of the ACM Workshop on Virtual Machines and Intermediate Languages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Joseph A. Fisher. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter Trace Scheduling: A Technique for Global Microcode Compaction, 186–198. http://dl.acm.org/citation. cfm?id=201749.201766 Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Michael Frigge, David C. Hoaglin, and Boris Iglewicz. 1989. Some Implementations of the Boxplot. The American Statistician 43, 1 (1989), 50–54. http://www.jstor.org/stable/2685173Google ScholarGoogle Scholar
  15. Yoshihiko Futamura. 1999. Partial Evaluation of Computation Process– An Approach to a Compiler-Compiler. Higher-Order and Symbolic Computation 12, 4 (01 Dec 1999), 381–391. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Google. 2012. V8 JavaScript Engine. (2012). http://code.google.com/p/ v8/Google ScholarGoogle Scholar
  17. Wen-Mei W. Hwu, Scott A. Mahlke, William Y. Chen, Pohua P. Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Quellette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G. Holm, and Daniel M. Lavery. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter The Superblock: An Effective Technique for VLIW and Superscalar Compilation, 234–253. http://dl.acm.org/citation.cfm?id=201749.201774 Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Kazuaki Ishizaki, Motohiro Kawahito, Toshiaki Yasue, Hideaki Komatsu, and Toshio Nakatani. 2000. A Study of Devirtualization Techniques for a Java Just-In-Time Compiler. In Proceedings of the 15th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA ’00). ACM, 294–310. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. David Leopoldseder, Lukas Stadler, Christian Wimmer, and Hanspeter Mössenböck. 2015. Java-to-JavaScript Translation via Structured Control Flow Reconstruction of Compiler IR. In Proceedings of the 11th Symposium on Dynamic Languages (DLS 2015). ACM, New York, NY, USA, 91–103. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Yau-Tsun Steven Li, Sharad Malik, and Andrew Wolfe. 1999. Performance Estimation of Embedded Software with Instruction Cache Modeling. ACM Trans. Des. Autom. Electron. Syst. 4, 3 (1999), 257–279. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. 2015. The Java Virtual Machine Specification, Java SE 8 Edition. http://docs. oracle.com/javase/specs/jvms/se8/jvms8.pdf Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, and Roger A. Bringmann. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter Effective Compiler Support for Predicated Execution Using the Hyperblock, 161–170. http://dl.acm. org/citation.cfm?id=201749.201763 Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Frank Mueller and David B. Whalley. 1992. Avoiding Unconditional Jumps by Code Replication. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 322–330. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Frank Mueller and David B. Whalley. 1995. Avoiding Conditional Branches by Code Replication. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 56–66. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. OpenJDK 2013. Graal Project. (2013). http://openjdk.java.net/projects/ graalGoogle ScholarGoogle Scholar
  26. OpenJDK 2017. HotSpot Virtual Machine. (2017). http://openjdk.java. net/groups/hotspot/Google ScholarGoogle Scholar
  27. Michael Paleczny, Christopher Vick, and Cliff Click. 2001. The Java HotSpot™ Server Compiler. In Proceedings of the Java Virtual Machine Research and Technology Symposium. USENIX, 1–12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Aleksandar Prokopec, David Leopoldseder, Gilles Duboscq, and Thomas Würthinger. 2017. Making Collection Operations Optimal with Aggressive JIT Compilation. In Proceedings of the 8th ACM SIG-PLAN International Symposium on Scala (SCALA 2017). ACM, New York, NY, USA, 29–40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Andreas Sewe, Mira Mezini, Aibek Sarimbekov, and Walter Binder. 2011. Da Capo con Scala: design and analysis of a scala benchmark suite for the java virtual machine. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM Press, 657–676. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Lukas Stadler, Gilles Duboscq, Hanspeter Mössenböck, Thomas Würthinger, and Doug Simon. 2013. An Experimental Study of the Influence of Dynamic Compiler Optimizations on Scala Performance. In Proceedings of the 4th Workshop on Scala (SCALA ’13). ACM, New York, NY, USA, Article 9, 8 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Lukas Stadler, Thomas Würthinger, and Hanspeter Mössenböck. 2014. Partial Escape Analysis and Scalar Replacement for Java. In Proceedings of the International Symposium on Code Generation and Optimization. ACM Press, 165–174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. April W. Wade, Prasad A. Kulkarni, and Michael R. Jantz. 2017. AOT vs. JIT: Impact of Profile Data on Code Quality. In Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2017). ACM, New York, NY, USA, 1–10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. David W. Wall. 1991. Limits of Instruction-level Parallelism. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV). ACM, New York, NY, USA, 176–188. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Thomas Würthinger, Christian Wimmer, Christian Humer, Andreas Wöß, Lukas Stadler, Chris Seaton, Gilles Duboscq, Doug Simon, and Matthias Grimmer. 2017. Practical Partial Evaluation for Highperformance Dynamic Language Runtimes. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 662–676. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Thomas Würthinger, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Doug Simon, and Christian Wimmer. 2012. Self-optimizing AST interpreters. In Proceedings of the 8th symposium on Dynamic languages (DLS ’12). ACM Press, 73–82. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Dominance-based duplication simulation (DBDS): code duplication to enable compiler optimizations

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization
          February 2018
          377 pages
          ISBN:9781450356176
          DOI:10.1145/3179541

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 24 February 2018

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate312of1,061submissions,29%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader