## Jason Cong

List of Publications in descending order of citations

Source: https://exaly.com/author-pdf/8563825/publications.pdf Version: 2024-02-01

| Metric         | Value |
|----------------|-------|
| citations      | 5,474 |
| h-index        | 16    |
| g-index        | 33    |
| docs citations | 112   |
| times ranked   | 112   |
| citing authors | 3514  |


| #  | Article                                                                                                                                                                                                      | IF   | CITATIONS |
|----|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|-----------|
| 1  | Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. , 2015, , .                                                                                                                 |      | 1,308     |
| 2  | High-Level Synthesis for FPGAs: From Prototyping to Deployment. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2011, 30, 473-491.                                            | 1.9  | 594       |
| 3  | Scaling for edge inference of deep neural networks. Nature Electronics, 2018, 1, 216-222.                                                                                                                    | 13.1 | 299       |
| 4  | Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs. , 2017, , .                                                                                                      |      | 267       |
| 5  | Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019, 38, 2072-2085. | 1.9  | 171       |
| 6  | SACNN: Self-Attention Convolutional Neural Network for Low-Dose CT Denoising With Self-Supervised Perceptual Loss Network. IEEE Transactions on Medical Imaging, 2020, 39, 2289-2301. | 5.4  | 170       |
| 7  | Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster. , 2016, , .                                                                                                                          |      | 158       |
| 8  | FPGA-based accelerator for long short-term memory recurrent neural networks. , 2017, , .                                                                                                                     |      | 120       |
| 9  | Thermal-Aware 3D IC Placement Via Transformation. , 2007, , .                                                                                                                                                |      | 113       |
| 10 | FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs. , 2009, , .                                                                                                                                |      | 110       |
| 11 | mrFPGA: A novel FPGA architecture with memristor-based reconfiguration. , 2011, , .                                                                                                                          |      | 105       |
| 12 | An automated lung segmentation approach using bidirectional chain codes to improve nodule detection accuracy. Computers in Biology and Medicine, 2015, 57, 139-149.                                          | 3.9  | 92        |
| 13 | A quantitative analysis on microarchitectures of modern CPU-FPGA platforms. , 2016, , .                                                                                                                      |      | 81        |
| 14 | Improving high level synthesis optimization opportunity through polyhedral transformations. , 2013, , . |      | 74        |
| 15 | HeteroCL. , 2019, , .                                                                                                                                                                                        |      | 73        |
| 16 | PolySA. , 2018, , .                                                                                                                                                                                          |      | 69        |
| 17 | SODA. , 2018, , . |      | 59        |
| 18 | Optimality Study of Logic Synthesis for LUT-Based FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2007, 26, 230-239. | 1.9  | 49        |
| 19 | A Fully Pipelined and Dynamically Composable Architecture of CGRA. , 2014, , .                                                                                                             |     | 47        |
| 20 | High-Level Power Estimation and Low-Power Design Space Exploration for FPGAs. , 2007, , .                                                                                                  |     | 45        |
| 21 | Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: A Race Between FPGA and GPU. , 2019, , .                                                                     |     | 45        |
| 22 | An Analytical Placement Framework for 3-D ICs and Its Extension on Thermal Awareness. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2013, 32, 510-523. | 1.9 | 43        |
| 23 | Optimal layout synthesis for quantum computing. , 2020, , .                                                                                                                                |     | 42        |
| 24 | A Novel High-Throughput Acceleration Engine for Read Alignment. , 2015, , .                                                                                                                |     | 41        |
| 25 | FPGA HLS Today: Successes, Challenges, and Opportunities. ACM Transactions on Reconfigurable Technology and Systems, 2022, 15, 1-42.                                                       | 1.9 | 40        |
| 26 | Multilevel Granularity Parallelism Synthesis on FPGAs. , 2011, , .                                                                                                                         |     | 39        |
| 27 | HBM Connect: High-Performance HLS Interconnect for FPGA HBM. , 2021, 2021, 116-126.                                                                                                        |     | 39        |
| 28 | Protecting Combinational Logic Synthesis Solutions. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2006, 25, 2687-2696. | 1.9 | 36        |
| 29 | Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management. , 2019, , .                                                                     |     | 36        |
| 30 | Optimality Study of Existing Quantum Computing Layout Synthesis Tools. IEEE Transactions on Computers, 2021, 70, 1363-1373.                                                                | 2.4 | 36        |
| 31 | Accelerator-rich CMPs: From concept to real hardware. , 2013, , .                                                                                                                          |     | 31        |
| 32 | Bonsai: High-Performance Adaptive Merge Tree Sorting. , 2020, , .                                                                                                                          |     | 29        |
| 33 | An energy-efficient adaptive hybrid cache. , 2011, , .                                                                                                                                     |     | 27        |
| 34 | Assuring application-level correctness against soft errors. , 2011, , .                                                                                                                    |     | 25        |
| 35 | Wire width planning for interconnect performance optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2002, 21, 319-329.                           | 1.9 | 24        |
| 36 | Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication. , 2022, , . |     | 24        |
| 37 | Energy efficient multiprocessor task scheduling under input-dependent variation. , 2009, , .                                                                                               |     | 23        |
| 38 | Frequency Improvement of Systolic Array-Based CNNs on FPGAs. , 2019, , .                                                                                                                   |     | 21        |
| 39 | AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators. ACM Transactions on Design Automation of Electronic Systems, 2022, 27, 1-27.                                 | 1.9 | 21        |
| 40 | HLScope: High-Level Performance Debugging for FPGA Designs. , 2017, , .                                                                                                                    |     | 20        |
| 41 | Fine grain 3D integration for microarchitecture design through cube packing exploration. , 2007, , .                                                                                       |     | 19        |
| 42 | Composable accelerator-rich microprocessor enhanced for adaptivity and longevity. , 2013, , .                                                                                              |     | 19        |
| 43 | A Millimeter-Wave CMOS Transceiver With Digitally Pre-Distorted PAM-4 Modulation for Contactless Communications. IEEE Journal of Solid-State Circuits, 2019, 54, 1600-1612. | 3.5 | 19        |
| 44 | Highly Efficient Gradient Computation for Density-Constrained Analytical Placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2008, 27, 2133-2144. | 1.9 | 18        |
| 45 | Evaluation of Static Analysis Techniques for Fixed-Point Precision Optimization. , 2009, , .                                                                                               |     | 18        |
| 46 | HLScope+: Fast and accurate performance estimation for FPGA HLS. , 2017, , . |     | 18        |
| 47 | Logic synthesis for better than worst-case designs. , 2009, , .                                                                                                                            |     | 17        |
| 48 | Technology Mapping and Clustering for FPGA Architectures With Dual Supply Voltages. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2010, 29, 1709-1722. | 1.9 | 17        |
| 49 | The DIMM tree architecture: A high bandwidth and scalable memory system. , 2011, , .                                                                                                       |     | 17        |
| 50 | A Hybrid Architecture for Compressive Sensing 3-D CT Reconstruction. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2012, 2, 616-625.                               | 2.7 | 17        |
| 51 | MC-Sim: An efficient simulation tool for MPSoC designs. , 2008, , .                                                                                                                        |     | 16        |
| 52 | Via design rule consideration in multilayer maze routing algorithms. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2000, 19, 215-223.                     | 1.9 | 15        |
| 53 | Accelerating Fluid Registration Algorithm on Multi-FPGA Platforms. , 2011, , .                                                                                                             |     | 15        |
| 54 | Platform characterization for Domain-Specific Computing. , 2012, , . |     | 15        |
| 55 | Software Infrastructure for Enabling FPGA-Based Accelerations in Data Centers. , 2016, , .                                                                                                 |      | 15        |
| 56 | S2FA. , 2018, , . |      | 15        |
| 57 | Logic-on-logic 3D integration and placement. , 2010, , .                                                                                                                                   |      | 14        |
| 58 | An integrated and automated memory optimization flow for FPGA behavioral synthesis. , 2012, , .                                                                                            |      | 14        |
| 59 | Energy-efficient computing using adaptive table lookup based on nonvolatile memories. , 2013, , .                                                                                          |      | 14        |
| 60 | Combined loop transformation and hierarchy allocation for data reuse optimization. , 2011, , .                                                                                             |      | 13        |
| 61 | Combining module selection and replication for throughput-driven streaming programs. , 2012, , .                                                                                           |      | 13        |
| 62 | Throughput optimization for streaming applications on CPU-FPGA heterogeneous systems. , 2017, , .                                                                                          |      | 13        |
| 63 | BLINK. , 2020, , .                                                                                                                                                                         |      | 13        |
| 64 | A scalable, high-performance customized priority queue. , 2014, , .                                                                                                                        |      | 12        |
| 65 | Customizable Computing—From Single Chip to Datacenters. Proceedings of the IEEE, 2019, 107, 185-203.                                                                                       | 16.4 | 12        |
| 66 | An efficient approach to simultaneous transistor and interconnect sizing. , 0, , .                                                                                                         |      | 11        |
| 67 | Synthesis Algorithm for Application-Specific Homogeneous Processor Networks. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2009, 17, 1318-1329. | 2.1  | 11        |
| 68 | FLASH: Fast, Parallel, and Accurate Simulator for HLS. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 39, 4828-4841.                                 | 1.9  | 11        |
| 69 | Architecture and Compiler Optimizations for Data Bandwidth Improvement in Configurable Processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2006, 14, 986-997. | 2.1  | 10        |
| 70 | Routability-driven placement and white space allocation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2007, 26, 858-871.                                 | 1.9  | 10        |
| 71 | A 3D physical design flow based on Open Access. , 2009, , .                                                                                                                                |      | 10        |
| 72 | Domain-specific processor with 3D integration for medical image processing. , 2011, , . |      | 10        |
| 73 | Scheduling with integer time budgeting for low-power optimization. , 2008, , .                                                                                                            |     | 9         |
| 74 | Utilizing Radio-Frequency Interconnect for a Many-DIMM DRAM System. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2012, 2, 210-227.                               | 2.7 | 9         |
| 75 | Computed Tomography Image Enhancement Using 3D Convolutional Neural Network. Lecture Notes in Computer Science, 2018, , 291-299. | 1.0 | 9         |
| 76 | Rapid Cycle-Accurate Simulator for High-Level Synthesis. , 2019, , .                                                                                                                      |     | 9         |
| 77 | LANMC. , 2019, , . |     | 9         |
| 78 | Extending High-Level Synthesis for Task-Parallel Programs. , 2021, 2021, .                                                                                                                |     | 9         |
| 79 | Accelerating vision and navigation applications on a customizable platform. , 2011, , .                                                                                                   |     | 8         |
| 80 | RC-NVM: Dual-Addressing Non-Volatile Memory Architecture Supporting Both Row and Column Memory Accesses. IEEE Transactions on Computers, 2019, 68, 239-254.                               | 2.4 | 8         |
| 81 | Platform-Based Resource Binding Using a Distributed Register-File Microarchitecture. IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, 2006, , . | 0.0 | 7         |
| 82 | Evaluating Statistical Power Optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2010, 29, 1750-1762.                                            | 1.9 | 6         |
| 83 | ATree-based topology synthesis for on-chip network. , 2011, , .                                                                                                                           |     | 6         |
| 84 | Rethinking thermal via planning with timing-power-temperature dependence for 3D ICs. , 2011, , .                                                                                          |     | 6         |
| 85 | FPGA Simulation Engine for Customized Construction of Neural Microcircuit. , 2013, 2013, 229.                                                                                             |     | 6         |
| 86 | FPGA Implementation of EM Algorithm for 3D CT Reconstruction. , 2014, , .                                                                                                                 |     | 6         |
| 87 | 3D recursive Gaussian IIR on GPU and FPGAs — A case for accelerating bandwidth-bounded applications. , 2011, , .                                                                          |     | 5         |
| 88 | Compilation and architecture support for customized vector instruction extension. , 2012, , .                                                                                             |     | 5         |
| 89 | ARACompiler: a prototyping flow and evaluation framework for accelerator-rich architectures. , 2015, , .                                                                                  |     | 5         |
| 90 | Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs. IEEE Transactions on Computers, 2020, 69, 931-943. | 2.4 | 5         |
| 91  | AutoDSE: Enabling Software Programmers Design Efficient FPGA Accelerators. , 2021, , .                                                                                                                 |     | 5         |
| 92  | Dataflow Systolic Array Implementations of Matrix Decomposition Using High Level Synthesis. , 2019, , .                                                                                                |     | 5         |
| 93  | FPGA simulation engine for customized construction of neural microcircuits. , 2013, , .                                                                                                                |     | 4         |
| 94  | Impact of loop transformations on software reliability. , 2015, , .                                                                                                                                    |     | 4         |
| 95  | An 8M Polygons/s 3-D Graphics SoC With Full Hardware Geometric and Rendering Engine for Mobile Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2011, 19, 1490-1495. | 2.1 | 3         |
| 96  | Better-Than-Worst-Case Design: Progress and Opportunities. Journal of Computer Science and Technology, 2014, 29, 656-663.                                                                              | 0.9 | 3         |
| 97  | Architectural synthesis Integrated with global placement for multi-cycle communication. , 0, , .                                                                                                       |     | 2         |
| 98  | Large-scale circuit placement: gap and promise. , 2003, , .                                                                                                                                            |     | 2         |
| 99  | Behavioral synthesis with activating unused flip-flops for reducing glitch power in FPGA. , 2008, , .                                                                                                  |     | 2         |
| 100 | A Comparative Study on the Architecture Templates for Dynamic Nested Loops. , 2010, , .                                                                                                                |     | 2         |
| 101 | Pattern-Mining for Behavioral Synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2011, 30, 939-944.                                                             | 1.9 | 2         |
| 102 | A unified optimization framework for simultaneous gate sizing and placement under density constraints. , 2011, , .                                                                                     |     | 2         |
| 103 | On the futility of statistical power optimization. , 2009, , .                                                                                                                                         |     | 1         |
| 104 | Task-Level Data Model for Hardware Synthesis Based on Concurrent Collections. Journal of Electrical and Computer Engineering, 2012, 2012, 1-24. | 0.6 | 1         |
| 105 | "High-level synthesis and beyond - From datacenters to IoTs". , 2015, , .                                                                                                                              |     | 1         |
| 106 | Automatic Interior I/O Elimination in Systolic Array Architecture. , 2018, , .                                                                                                                         |     | 1         |
| 107 | Channel Density Minimization by Pin Permutation. VLSI Design, 1994, 2, 171-183.                                                                                                                        | 0.5 | 0         |
| 108 | Accelerator-Rich Architectures — Computing Beyond Processors. , 2015, , 1-17. |     | 0         |
| 109 | PYXIS: An Open-Source Performance Dataset Of Sparse Accelerators. , 2022, , . |    | 0         |