research-article

Dominance-based duplication simulation (DBDS): code duplication to enable compiler optimizations

Authors:
David Leopoldseder

JKU Linz, Austria

JKU Linz, Austria
View Profile

,
Lukas Stadler

Oracle Labs, Austria

Oracle Labs, Austria
View Profile

,
Thomas Würthinger

Oracle Labs, Switzerland

Oracle Labs, Switzerland
View Profile

,
Josef Eisl

JKU Linz, Austria

JKU Linz, Austria
View Profile

,
Doug Simon

Oracle Labs, Switzerland

Oracle Labs, Switzerland
View Profile

,
Hanspeter Mössenböck

JKU Linz, Austria

JKU Linz, Austria
View Profile

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and OptimizationFebruary 2018Pages 126–137https://doi.org/10.1145/3168811

Published:24 February 2018Publication History

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization

Pages 126–137

ABSTRACT

Compilers perform a variety of advanced optimizations to improve the quality of the generated machine code. However, optimizations that depend on the data flow of a program are often limited by control-flow merges. Code duplication can solve this problem by hoisting, i.e. duplicating, instructions from merge blocks to their predecessors. However, finding optimization opportunities enabled by duplication is a non-trivial task that requires compile-time intensive analysis. This imposes a challenge on modern (just-in-time) compilers: Duplicating instructions tentatively at every control flow merge is not feasible because excessive duplication leads to uncontrolled code growth and compile time increases. Therefore, compilers need to find out whether a duplication is beneficial enough to be performed.

This paper proposes a novel approach to determine which duplication operations should be performed to increase performance. The approach is based on a duplication simulation that enables a compiler to evaluate different success metrics per potential duplication. Using this information, the compiler can then select the most promising candidates for optimization. We show how to map duplication candidates into an optimization cost model that allows us to trade-off between different success metrics including peak performance, code size and compile time.

We implemented the approach on top of the GraalVM and evaluated it with the benchmarks Java DaCapo, Scala DaCapo, JavaScript Octane and a micro-benchmark suite, in terms of performance, compilation time and code size increase.

We show that our optimization can reach peak performance improvements of up to 40% with a mean peak performance increase of 5.89%, while it generates a mean code size increase of 9.93% and mean compile time increase of 18.44%.

References

David F. Bacon, Susan L. Graham, and Oliver J. Sharp. 1994. Compiler Transformations for High-performance Computing. ACM Comput. Surv. 26, 4 (Dec. 1994), 345–420. Google ScholarDigital Library
J. Eugene Ball. 1979. Predicting the Effects of Optimization on a Procedure Body. SIGPLAN Not. 14, 8 (Aug. 1979), 214–220. Google ScholarDigital Library
S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM Press, 169– 190. Google ScholarDigital Library
Rastislav Bodík, Rajiv Gupta, and Mary Lou Soffa. 1998. Complete Removal of Redundant Expressions. In Proceedings of the ACM SIG-PLAN Conference on Programming Language Design and Implementation. ACM, 1–14. Google ScholarDigital Library
Preston Briggs, Keith D. Cooper, and L. Taylor Simpson. 1997. Value numbering. Software-Practice and Experience 27, 6 (1997), 701–724. Google ScholarDigital Library
Stefano Cazzulani. 2012. Octane: The JavaScript benchmark suite for the modern web. Retrieved December 21 (2012), 2015.Google Scholar
Craig David Chambers. 1992. The Design and Implementation of the Self Compiler, an Optimizing Compiler for Object-oriented Programming Languages. Ph.D. Dissertation. Stanford, CA, USA. UMI Order No. GAX92-21602.Google Scholar
Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu. 1991. Using Profile Information to Assist Classic Code Optimizations. Softw. Pract. Exper. 21, 12 (Dec. 1991), 1301–1321. Google ScholarDigital Library
Cliff Click and Keith D. Cooper. 1995. Combining Analyses, Combining Optimizations. ACM Trans. Program. Lang. Syst. 17, 2 (March 1995), 16. Google ScholarDigital Library
Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. 1991. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Trans. Program. Lang. Syst. 13, 4 (Oct. 1991), 40. Google ScholarDigital Library
Gilles Duboscq, Lukas Stadler, Thomas Würthinger, Doug Simon, Christian Wimmer, and Hanspeter Mössenböck. 2013. Graal IR: An Extensible Declarative Intermediate Representation. In Proceedings of the Asia-Pacific Programming Languages and Compilers Workshop.Google Scholar
Gilles Duboscq, Thomas Würthinger, Lukas Stadler, Christian Wimmer, Doug Simon, and Hanspeter Mössenböck. 2013. An Intermediate Representation for Speculative Optimizations in a Dynamic Compiler. In Proceedings of the ACM Workshop on Virtual Machines and Intermediate Languages. Google ScholarDigital Library
Joseph A. Fisher. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter Trace Scheduling: A Technique for Global Microcode Compaction, 186–198. http://dl.acm.org/citation. cfm?id=201749.201766 Google ScholarDigital Library
Michael Frigge, David C. Hoaglin, and Boris Iglewicz. 1989. Some Implementations of the Boxplot. The American Statistician 43, 1 (1989), 50–54. http://www.jstor.org/stable/2685173Google Scholar
Yoshihiko Futamura. 1999. Partial Evaluation of Computation Process– An Approach to a Compiler-Compiler. Higher-Order and Symbolic Computation 12, 4 (01 Dec 1999), 381–391. Google ScholarDigital Library
Google. 2012. V8 JavaScript Engine. (2012). http://code.google.com/p/ v8/Google Scholar
Wen-Mei W. Hwu, Scott A. Mahlke, William Y. Chen, Pohua P. Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Quellette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G. Holm, and Daniel M. Lavery. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter The Superblock: An Effective Technique for VLIW and Superscalar Compilation, 234–253. http://dl.acm.org/citation.cfm?id=201749.201774 Google ScholarDigital Library
Kazuaki Ishizaki, Motohiro Kawahito, Toshiaki Yasue, Hideaki Komatsu, and Toshio Nakatani. 2000. A Study of Devirtualization Techniques for a Java Just-In-Time Compiler. In Proceedings of the 15th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA ’00). ACM, 294–310. Google ScholarDigital Library
David Leopoldseder, Lukas Stadler, Christian Wimmer, and Hanspeter Mössenböck. 2015. Java-to-JavaScript Translation via Structured Control Flow Reconstruction of Compiler IR. In Proceedings of the 11th Symposium on Dynamic Languages (DLS 2015). ACM, New York, NY, USA, 91–103. Google ScholarDigital Library
Yau-Tsun Steven Li, Sharad Malik, and Andrew Wolfe. 1999. Performance Estimation of Embedded Software with Instruction Cache Modeling. ACM Trans. Des. Autom. Electron. Syst. 4, 3 (1999), 257–279. Google ScholarDigital Library
Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. 2015. The Java Virtual Machine Specification, Java SE 8 Edition. http://docs. oracle.com/javase/specs/jvms/se8/jvms8.pdf Google ScholarDigital Library
Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, and Roger A. Bringmann. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter Effective Compiler Support for Predicated Execution Using the Hyperblock, 161–170. http://dl.acm. org/citation.cfm?id=201749.201763 Google ScholarDigital Library
Frank Mueller and David B. Whalley. 1992. Avoiding Unconditional Jumps by Code Replication. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 322–330. Google ScholarDigital Library
Frank Mueller and David B. Whalley. 1995. Avoiding Conditional Branches by Code Replication. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 56–66. Google ScholarDigital Library
OpenJDK 2013. Graal Project. (2013). http://openjdk.java.net/projects/ graalGoogle Scholar
OpenJDK 2017. HotSpot Virtual Machine. (2017). http://openjdk.java. net/groups/hotspot/Google Scholar
Michael Paleczny, Christopher Vick, and Cliff Click. 2001. The Java HotSpot™ Server Compiler. In Proceedings of the Java Virtual Machine Research and Technology Symposium. USENIX, 1–12. Google ScholarDigital Library
Aleksandar Prokopec, David Leopoldseder, Gilles Duboscq, and Thomas Würthinger. 2017. Making Collection Operations Optimal with Aggressive JIT Compilation. In Proceedings of the 8th ACM SIG-PLAN International Symposium on Scala (SCALA 2017). ACM, New York, NY, USA, 29–40. Google ScholarDigital Library
Andreas Sewe, Mira Mezini, Aibek Sarimbekov, and Walter Binder. 2011. Da Capo con Scala: design and analysis of a scala benchmark suite for the java virtual machine. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM Press, 657–676. Google ScholarDigital Library
Lukas Stadler, Gilles Duboscq, Hanspeter Mössenböck, Thomas Würthinger, and Doug Simon. 2013. An Experimental Study of the Influence of Dynamic Compiler Optimizations on Scala Performance. In Proceedings of the 4th Workshop on Scala (SCALA ’13). ACM, New York, NY, USA, Article 9, 8 pages. Google ScholarDigital Library
Lukas Stadler, Thomas Würthinger, and Hanspeter Mössenböck. 2014. Partial Escape Analysis and Scalar Replacement for Java. In Proceedings of the International Symposium on Code Generation and Optimization. ACM Press, 165–174. Google ScholarDigital Library
April W. Wade, Prasad A. Kulkarni, and Michael R. Jantz. 2017. AOT vs. JIT: Impact of Profile Data on Code Quality. In Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2017). ACM, New York, NY, USA, 1–10. Google ScholarDigital Library
David W. Wall. 1991. Limits of Instruction-level Parallelism. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV). ACM, New York, NY, USA, 176–188. Google ScholarDigital Library
Thomas Würthinger, Christian Wimmer, Christian Humer, Andreas Wöß, Lukas Stadler, Chris Seaton, Gilles Duboscq, Doug Simon, and Matthias Grimmer. 2017. Practical Partial Evaluation for Highperformance Dynamic Language Runtimes. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 662–676. Google ScholarDigital Library
Thomas Würthinger, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Doug Simon, and Christian Wimmer. 2012. Self-optimizing AST interpreters. In Proceedings of the 8th symposium on Dynamic languages (DLS ’12). ACM Press, 73–82. Google ScholarDigital Library

Index Terms

Dominance-based duplication simulation (DBDS): code duplication to enable compiler optimizations
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Dynamic compilers
      2. Just-in-time compilers
  2. Software organization and properties
    1. Contextual software domains
      1. Software infrastructure
        Virtual machines

Recommendations

Fast-path loop unrolling of non-counted loops to enable subsequent compiler optimizations
ManLang '18: Proceedings of the 15th International Conference on Managed Languages & Runtimes

Java programs can contain non-counted loops, that is, loops for which the iteration count can neither be determined at compile time nor at run time. State-of-the-art compilers do not aggressively optimize them, since unrolling non-counted loops often ...
Read More
Simulation-based code duplication for enhancing compiler optimizations
SPLASH Companion 2017: Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity

The scope of compiler optimizations is often limited by control flow, which prohibits optimizations across basic block boundaries. Code duplication can solve this problem by extending basic block sizes, thus enabling subsequent optimizations. However, ...
Read More
Archeology of Code Duplication: Recovering Duplication Chains from Small Duplication Fragments
SYNASC '05: Proceedings of the Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing

Code duplication is a common problem, and a well-known sign of bad design. As a result of that, in the last decade, the issue of detecting code duplication led to various solutions and tools that can automatically find duplicated blocks of code. However,...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization
February 2018
377 pages
ISBN:9781450356176
DOI:10.1145/3179541
General Chairs:
Jens Knoop
Vienna University of Technology, Austria
,
Markus Schordan
Lawrence Livermore National Laboratory, USA
,
Program Chairs:
Teresa Johnson
Google, USA
,
Michael O'Boyle
University of Edinburgh, UK
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 February 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Code Duplication
Compiler Optimizations
Just-In-Time Compilation
Tail Duplication
Virtual Machines
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate312of1,061submissions,29%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 26
  Total Citations
  View Citations
- 432
  Total Downloads
- Downloads (Last 12 months)39
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Dominance-based duplication simulation (DBDS): code duplication to enable compiler optimizations

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization

ABSTRACT

References

Cited By

Index Terms

Recommendations

Fast-path loop unrolling of non-counted loops to enable subsequent compiler optimizations

Simulation-based code duplication for enhancing compiler optimizations

Archeology of Code Duplication: Recovering Duplication Chains from Small Duplication Fragments

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Dominance-based duplication simulation (DBDS): code duplication to enable compiler optimizations

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization

ABSTRACT

References

Cited By

Index Terms

Recommendations

Fast-path loop unrolling of non-counted loops to enable subsequent compiler optimizations

Simulation-based code duplication for enhancing compiler optimizations

Archeology of Code Duplication: Recovering Duplication Chains from Small Duplication Fragments

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media