## Kunle Olukotun

## List of Publications by Citations in Descending Order

Source: https://exaly.com/author-pdf/11557277/publications.pdf

Version: 2024-02-01

| Metric         | Value |
|----------------|-------|
| Papers         | 49    |
| Citations      | 3,160 |
| h-index        | 16    |
| g-index        | 26    |
| All docs       | 50    |
| Docs citations | 50    |
| Times ranked   | 50    |
| Citing authors | 1,377 |

| # | Article | IF | Citations |
|---|---------|----|-----------|
| 1 | The case for a single-chip multiprocessor. 1996. | | 417 |
| 2 | STAMP: Stanford Transactional Applications for Multi-Processing. 2008. | | 329 |
| 3 | Efficient Parallel Graph Exploration on Multi-Core CPU and GPU. 2011. | | 201 |
| 4 | An effective hybrid transactional memory system with strong isolation guarantees. 2007. | | 200 |
| 5 | Transactional Memory Coherence and Consistency. Computer Architecture News, 2004, 32, 102. | 2.5 | 162 |
| 6 | The Future of Microprocessors. Queue, 2005, 3, 26-29. | 1.1 | 146 |
| 7 | Delite. Transactions on Embedded Computing Systems, 2014, 13, 1-25. | 2.9 | 134 |
| 8 | A practical concurrent binary search tree. 2010. | | 119 |
| 9 | A Scalable, Non-blocking Approach to Transactional Memory. 2007. | | 113 |
| 10 | The Atomos transactional programming language. 2006. | | 97 |
| 11 | Accelerating CUDA graph algorithms at maximum warp. ACM SIGPLAN Notices, 2011, 46, 267-276. | 0.2 | 80 |
| 12 | Optimizing data structures in high-level programs. 2013. | | 73 |
| 13 | Exposing speculative thread parallelism in SPEC2000. 2005. | | 71 |
| 14 | The Jrpm system for dynamically parallelizing Java programs. 2003. | | 70 |
| 15 | Language virtualization for heterogeneous parallel computing. 2010. | | 61 |
| 16 | Evaluation of design alternatives for a multiprocessor microprocessor. 1996. | | 60 |
| 17 | A highly scalable Restricted Boltzmann Machine FPGA implementation. 2009. | | 53 |
| 18 | Using thread-level speculation to simplify manual parallelization. 2003. | | 52 |
| 19 | Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark. Operating Systems Review (ACM), 2019, 53, 14-25. | 1.9 | 51 |
| 20 | Composition and Reuse with Compiled Domain-Specific Languages. Lecture Notes in Computer Science, 2013, 52-78. | 1.3 | 51 |
| 21 | Hardware system synthesis from Domain-Specific Languages. 2014. | | 46 |
| 22 | The case for a single-chip multiprocessor. ACM SIGPLAN Notices, 1996, 31, 2-11. | 0.2 | 44 |
| 23 | Locality-Aware Mapping of Nested Parallel Patterns on GPUs. 2014. | | 41 |
| 24 | Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent. 2017. | | 41 |
| 25 | Architectural Semantics for Practical Transactional Memory. Computer Architecture News, 2006, 34, 53-65. | 2.5 | 39 |
| 26 | Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency. Synthesis Lectures on Computer Architecture, 2007, 2, 1-145. | 1.3 | 39 |
| 27 | The case for a single-chip multiprocessor. Operating Systems Review (ACM), 1996, 30, 2-11. | 1.9 | 38 |
| 28 | Eigenbench: A simple exploration tool for orthogonal TM characteristics. 2010. | | 35 |
| 29 | A domain-specific approach to heterogeneous parallelism. ACM SIGPLAN Notices, 2011, 46, 35-46. | 0.2 | 32 |
| 30 | Have abstraction and eat performance, too: optimized heterogeneous computing with parallel patterns. 2016. | | 32 |
| 31 | A Large-Scale Architecture for Restricted Boltzmann Machines. 2010. | | 31 |
| 32 | Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width. Advances in Neural Information Processing Systems, 2015, 28, 3079-3087. | 2.8 | 23 |
| 33 | The OpenTM Transactional Application Programming Interface. Parallel Architecture and Compilation Techniques (PACT), Proceedings of the International Conference on, 2007. | 0.0 | 22 |
| 34 | Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford. IEEE Micro, 2010, 30, 41-55. | 1.8 | 21 |
| 35 | Data speculation support for a chip multiprocessor. Operating Systems Review (ACM), 1998, 32, 58-69. | 1.9 | 20 |
| 36 | Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent. Computer Architecture News, 2017, 45, 561-574. | 2.5 | 19 |
| 37 | ATLAS: A Chip-Multiprocessor with Transactional Memory Support. 2007. | | 17 |
| 38 | FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures. 2010. | | 13 |
| 39 | Plasticine: A Reconfigurable Accelerator for Parallel Patterns. IEEE Micro, 2018, 38, 20-31. | 1.8 | 11 |
| 40 | Increasing cache port efficiency for dynamic superscalar microprocessors. Computer Architecture News, 1996, 24, 147-157. | 2.5 | 9 |
| 41 | Data speculation support for a chip multiprocessor. ACM SIGPLAN Notices, 1998, 33, 58-69. | 0.2 | 9 |
| 42 | Generating Configurable Hardware from Parallel Patterns. ACM SIGPLAN Notices, 2016, 51, 651-665. | 0.2 | 9 |
| 43 | Language virtualization for heterogeneous parallel computing. ACM SIGPLAN Notices, 2010, 45, 835-847. | 0.2 | 7 |
| 44 | Evaluation of design alternatives for a multiprocessor microprocessor. Computer Architecture News, 1996, 24, 67-77. | 2.5 | 6 |
| 45 | Implementing and Evaluating a Model Checker for Transactional Memory Systems. 2010. | | 6 |
| 46 | Hardware acceleration of transactional memory on commodity systems. ACM SIGPLAN Notices, 2011, 46, 27-38. | 0.2 | 6 |
| 47 | Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling. JMLR Workshop and Conference Proceedings, 2016, 48, 1567-1576. | 1.4 | 2 |
| 48 | Designing high bandwidth on-chip caches. Computer Architecture News, 1997, 25, 121-132. | 2.5 | 1 |
| 49 | High performance lattice regression on FPGAs via a high level hardware description language. 2021. | | 0 |