## Kiyoung Choi

List of publications, sorted by citation count in descending order

Source: https://exaly.com/author-pdf/544903/publications.pdf

Version: 2024-02-01

| Papers | Citations | h-index | g-index |
|--------|-----------|---------|---------|
| 115    | 1,296     | 21      | 32      |

| All docs | Docs citations | Times ranked | Citing authors |
|----------|----------------|--------------|----------------|
| 115      | 115            | 115          | 871            |

| #  | Article                                                                                                                                                                                           | IF  | Citations |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|-----------|
| 1  | Deep neural networks with weighted spikes. Neurocomputing, 2018, 311, 373-386.                                                                                                                    | 3.5 | 78        |
| 2  | Power conscious fixed priority scheduling for hard real-time systems. , 0, , .                                                                                                                    |     | 70        |
| 3  | Partial bus-invert coding for power optimization of application-specific systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2001, 9, 377-383.                             | 2.1 | 70        |
| 4  | DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture. , 2014, , .                                                                                                                     |     | 70        |
| 5  | Mapping Multi-Domain Applications Onto Coarse-Grained Reconfigurable Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2011, 30, 637-650.             | 1.9 | 47        |
| 6  | Power optimization of real-time embedded systems on variable speed processors. , 0, , .                                                                                                           |     | 45        |
| 7  | Power-conscious configuration cache structure and code mapping for coarse-grained reconfigurable architecture. , 2006, , .                                                                        |     | 45        |
| 8  | ExtraV. Proceedings of the VLDB Endowment, 2017, 10, 1706-1717.                                                                                                                                   | 2.1 | 43        |
| 9  | FloRA: Coarse-grained reconfigurable architecture with floating-point operation capability. , 2009, , .                                                                                           |     | 41        |
| 10 | An energy-efficient random number generator for stochastic circuits. , 2016, , .                                                                                                                  |     | 41        |
| 11 | Prediction Hybrid Cache: An Energy-Efficient STT-RAM Cache Architecture. IEEE Transactions on Computers, 2016, 65, 940-951.                                                                       | 2.4 | 41        |
| 12 | Approximate de-randomizer for stochastic circuits. , 2015, , .                                                                                                                                    |     | 34        |
| 13 | Accurate and Efficient Stochastic Computing Hardware for Convolutional Neural Networks. , 2017, , .                                                                                               |     | 34        |
| 14 | Design Space Exploration for Efficient Resource Utilization in Coarse-Grained Reconfigurable Architecture. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010, 18, 1471-1482. | 2.1 | 31        |
| 15 | Exploiting New Interconnect Technologies in On-Chip Communication. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2012, 2, 124-136.                                        | 2.7 | 30        |
| 16 | Performance-driven high-level synthesis with bit-level chaining and clock selection. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2001, 20, 199-212.            | 1.9 | 29        |
| 17 | Low Power Reconfiguration Technique for Coarse-Grained Reconfigurable Architecture. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2009, 17, 593-603.                          | 2.1 | 27        |
| 18 | Scalable stochastic-computing accelerator for convolutional neural networks. , 2017, , .                                                                                                          |     | 27        |
| 19 | SoCDAL. ACM Transactions on Design Automation of Electronic Systems, 2008, 13, 1-38.                                                                                                                   | 1.9 | 24        |
| 20 | Write intensity prediction for energy-efficient non-volatile caches. , 2013, , .                                                                                                                       |     | 23        |
| 21 | Scheduler implementation in MP SoC design. , 2005, , .                                                                                                                                                 |     | 21        |
| 22 | Design space exploration of FPGA accelerators for convolutional neural networks. , 2017, , .                                                                                                           |     | 21        |
| 23 | Narrow bus encoding for low-power DSP systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2001, 9, 656-660.                                                                     | 2.1 | 20        |
| 24 | Communication Architecture Synthesis of Cascaded Bus Matrix. , 2007, , .                                                                                                                               |     | 20        |
| 25 | Lower-bits cache for low power STT-RAM caches. , 2012, , .                                                                                                                                             |     | 17        |
| 26 | Partial bus-invert coding for power optimization of system level bus. , 0, , .                                                                                                                         |     | 17        |
| 27 | Dynamic Power Management of Off-Chip Links for Hybrid Memory Cubes. , 2014, , .                                                                                                                        |     | 15        |
| 28 | Mapping and Scheduling of Tasks and Communications on Many-Core SoC Under Local Memory Constraint. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2013, 32, 1748-1761. | 1.9 | 14        |
| 29 | ComPEND. , 2018, , .                                                                                                                                                                                   |     | 14        |
| 30 | Automatic mapping of application to coarse-grained reconfigurable architecture based on high-level synthesis techniques. , 2008, , .                                                                   |     | 13        |
| 31 | An FPGA implementation of high-throughput key-value store using Bloom filter. , 2014, , .                                                                                                              |     | 12        |
| 32 | Delay Monitoring System With Multiple Generic Monitors for Wide Voltage Range Operation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2018, 26, 37-49.                            | 2.1 | 12        |
| 33 | Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture. , 2010, , .                                                                                          |     | 10        |
| 34 | A deadlock-free routing algorithm requiring no virtual channel on 3D-NoCs with partial vertical connections. , 2013, , .                                                                               |     | 10        |
| 35 | Low-Power Hybrid Memory Cubes With Link Power Management and Two-Level Prefetching. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2016, 24, 453-464.                               | 2.1 | 10        |
| 36 | Aging Compensation With Dynamic Computation Approximation. IEEE Transactions on Circuits and Systems I: Regular Papers, 2020, 67, 1319-1332.                                                           | 3.5 | 10        |
| 37 | Scheduling with accurate communication delay model and scheduler implementation for multiprocessor system-on-chip. Design Automation for Embedded Systems, 2007, 11, 167-191.             | 0.7 | 9         |
| 38 | Adaptively weighted round-robin arbitration for equality of service in a many-core network-on-chip. IET Computers and Digital Techniques, 2016, 10, 37-44.                                | 0.9 | 9         |
| 39 | Loop pipelining in hardware-software partitioning. , 0, , .                                                                                                                               |     | 8         |
| 40 | An approach to code compression for CGRA. , 2011, , .                                                                                                                                     |     | 8         |
| 41 | Software-Level Approaches for Tolerating Transient Faults in a Coarse-Grained Reconfigurable Architecture. IEEE Transactions on Dependable and Secure Computing, 2014, 11, 392-398.       | 3.7 | 8         |
| 42 | Buffered Compares: Excavating the Hidden Parallelism Inside DRAM Architectures with Lightweight Logic. , 2016, , .                                                                        |     | 8         |
| 43 | Enforcing schedulability of multi-task systems by hardware-software codesign. , 0, , .                                                                                                    |     | 7         |
| 44 | Isomorphism-Aware Identification of Custom Instructions With I/O Serialization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2013, 32, 34-46.           | 1.9 | 7         |
| 45 | Energy-efficient partitioning of hybrid caches in multi-core architecture. , 2014, , .                                                                                                    |     | 7         |
| 46 | FPGA implementation of convolutional neural network based on stochastic computing. , 2017, , .                                                                                            |     | 7         |
| 47 | VCAM: Variation Compensation through Activation Matching for Analog Binarized Neural Networks. , 2019, , .                                                                                |     | 7         |
| 48 | Interleaving partial bus-invert coding for low power reconfiguration of FPGAs. , 0, , .                                                                                                   |     | 6         |
| 49 | Optimizing Timed Cosimulation by Hybrid Synchronization. Design Automation for Embedded Systems, 2000, 5, 129-152.                                                                        | 0.7 | 6         |
| 50 | Performance improvement of multi-processor systems cosimulation based on SW analysis. , 0, , .                                                                                            |     | 6         |
| 51 | Position-based weighted round-robin arbitration for equality of service in many-core network-on-chips. , 2012, , .                                                                        |     | 6         |
| 52 | Exploration of trade-offs in the design of volatile STT–RAM cache. Journal of Systems Architecture, 2016, 71, 23-31.                                                                      | 2.5 | 6         |
| 53 | An Efficient and Accurate Stochastic Number Generator Using Even-Distribution Coding. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37, 3056-3066. | 1.9 | 6         |
| 54 | An Integrated Cosimulation Environment for Heterogeneous Systems Prototyping. Design Automation for Embedded Systems, 1998, 3, 163-186.                                                   | 0.7 | 5         |
| 55 | ODALRISC: A small, low power, and configurable 32-bit RISC processor. , 2008, , .                                                                                          |     | 5         |
| 56 | An adaptive routing algorithm for 3D mesh NoC with limited vertical bandwidth. , 2012, , .                                                                                 |     | 5         |
| 57 | Active Memory Processor for Network-on-Chip-Based Architecture. IEEE Transactions on Computers, 2012, 61, 622-635.                                                         | 2.4 | 5         |
| 58 | Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2017, 25, 1793-1806. | 2.1 | 5         |
| 59 | Optimizing geographically distributed timed cosimulation by hierarchically grouped messages. , 0, , .                                                                      |     | 4         |
| 60 | Fast cycle-approximate MPSoC simulation based on synchronization time-point prediction. Design Automation for Embedded Systems, 2007, 11, 223-247.                         | 0.7 | 4         |
| 61 | Automatic mapping of control-intensive kernels onto coarse-grained reconfigurable array architecture with speculative execution. , 2010, , .                               |     | 4         |
| 62 | A host-accelerator communication architecture design for efficient binary acceleration. , 2011, , .                                                                        |     | 4         |
| 63 | LASIC: Loop-Aware Sleepy Instruction Caches Based on STT-RAM Technology. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2014, 22, 1197-1201.            | 2.1 | 4         |
| 64 | Aging Gracefully with Approximation. , 2019, , .                                                                                                                           |     | 4         |
| 65 | Software synthesis through task decomposition by dependency analysis. , 0, , .                                                                                             |     | 3         |
| 66 | Adaptive Delay Monitoring for Wide Voltage-Range Operation. , 2016, , .                                                                                                    |     | 3         |
| 67 | VHDL simulation acceleration using specialized functions. , 0, , .                                                                                                         |     | 2         |
| 68 | Modified half rail differential logic for reduced internal logic swing. , 0, , .                                                                                           |     | 2         |
| 69 | Self-timed statistical carry lookahead adder using multiple-output DCVSL. , 0, , .                                                                                         |     | 2         |
| 70 | Automatic Bus Matrix Synthesis based on Hardware Interface Selection for Fast Communication Design Space Exploration. , 2007, , .                                          |     | 2         |
| 71 | Communication architecture design for reconfigurable multimedia SoC platform. Design Automation for Embedded Systems, 2010, 14, 1-20.                                      | 0.7 | 2         |
| 72 | Fast custom instruction generation under area constraint. , 2010, , .                                                                                                      |     | 2         |
| 73 | A polynomial-time custom instruction identification algorithm based on dynamic programming. , 2011, , .                                                                        |     | 2         |
| 74 | High-level synthesis with distributed controller for fast timing closure. , 2011, , .                                                                                          |     | 2         |
| 75 | Resource-shared custom instruction generation under performance/area constraints. , 2012, , .                                                                                  |     | 2         |
| 76 | Leveraging parallelism in the presence of control flow on CGRAs. , 2014, , .                                                                                                   |     | 2         |
| 77 | ComPreEND: Computation Pruning through Predictive Early Negative Detection for ReLU in a Deep Neural Network Accelerator. IEEE Transactions on Computers, 2022, 71, 1537-1550. | 2.4 | 2         |
| 78 | Hardware-software cosynthesis for run-time incrementally reconfigurable FPGAs. , 0, , .                                                                                        |     | 1         |
| 79 | Low power self-timed radix-2 division. , 2000, , .                                                                                                                             |     | 1         |
| 80 | Narrow bus encoding for low power systems. , 0, , .                                                                                                                            |     | 1         |
| 81 | An Efficient Simulation Environment and Simulation Techniques for Bluetooth Device Design. Design Automation for Embedded Systems, 2003, 8, 119-138.                           | 0.7 | 1         |
| 82 | Multi-codec variable length decoder design with configurable processor. , 2008, , .                                                                                            |     | 1         |
| 83 | Leakage power reduction of functional units in processors having zero-overhead loop counter. , 2009, , .                                                                       |     | 1         |
| 84 | A formal approach toward developing an equivalent circuit for high-speed coupled interconnects with intermediate ground insertion. , 2010, , .                                 |     | 1         |
| 85 | State-based full predication for low power coarse-grained reconfigurable architecture. , 2012, , .                                                                             |     | 1         |
| 86 | Hybrid spiking-stochastic Deep Neural Network. , 2017, , .                                                                                                                     |     | 1         |
| 87 | Optimal mapping of program overlays onto many-core platforms with limited memory capacity. Design Automation for Embedded Systems, 2017, 21, 173-194.                          | 0.7 | 1         |
| 88 | Energy Efficient Analog Synapse/Neuron Circuit for Binarized Neural Networks. , 2018, , .                                                                                      |     | 1         |
| 89 | Tapered-Ratio Compression for Residual Network. , 2018, , .                                                                                                                    |     | 1         |
| 90 | Power-conscious High Level Synthesis Using Loop Folding. , 0, , .                                                                                                              |     | 0         |
| 91  | Enhancing schedulability of hard real-time systems through codesign. , 0, , .                                                                                                                         |     | 0         |
| 92  | Design verification by concurrent simulation and automatic comparison. , 0, , .                                                                                                                       |     | 0         |
| 93  | Rate assignment for embedded reactive real-time systems. , 0, , .                                                                                                                                     |     | 0         |
| 94  | Hardware synthesis for stack type partitioned-bus architecture. , 0, , .                                                                                                                              |     | 0         |
| 95  | Performance-driven scheduling with bit-level chaining. , 0, , .                                                                                                                                       |     | 0         |
| 96  | Power minimization of functional units by partially guarded computation. , 2000, , .                                                                                                                  |     | 0         |
| 97  | Fast hardware-software coverification by optimistic execution of real processor. , 0, , .                                                                                                             |     | 0         |
| 98  | Behavior-to-placed RTL synthesis with performance-driven placement. , 0, , .                                                                                                                          |     | 0         |
| 99  | QoS-aware dynamic power management for coarse-grained reconfigurable architecture. , 2009, , .                                                                                                        |     | 0         |
| 100 | ESL Design Methodology. Journal of Electrical and Computer Engineering, 2012, 2012, 1-2.                                                                                                              | 0.6 | 0         |
| 101 | Guest Editorial New Interconnect Technologies in On-Chip Communication. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2012, 2, 121-123.                                       | 2.7 | 0         |
| 102 | Memory-aware mapping and scheduling of tasks and communications on many-core SoC. , 2012, , .                                                                                                         |     | 0         |
| 103 | A Memetic Quantum-Inspired Evolutionary Algorithm for circuit bipartitioning problem. , 2012, , .                                                                                                     |     | 0         |
| 104 | Selectively protecting error-correcting code for area-efficient and reliable STT-RAM caches. , 2013, , .                                                                                              |     | 0         |
| 105 | Deflection routing in 3D Network-on-Chip with TSV serialization. , 2013, , .                                                                                                                          |     | 0         |
| 106 | Guest Editorial for Special Issue on Emerging Memory Technologies—Modeling, Design, and Applications for Multi-Scale Computing. IEEE Transactions on Multi-Scale Computing Systems, 2015, 1, 125-126. | 2.5 | 0         |
| 107 | Dynamic error tracking and supply voltage adjustment for low power. , 2015, , .                                                                                                                       |     | 0         |
| 108 | A new approach to binarizing neural networks. , 2016, , .                                                                                                                                             |     | 0         |
| 109 | A new stochastic mutiplier for deep neural networks. , 2017, , .                                   |    | 0         |
| 110 | Reconfigurable Multi-Input Adder Design for Deep Neural Network Accelerators. , 2018, , .          |    | 0         |
| 111 | Speaker Verification based on Deep Neural Network for Text-Constrained Short Commands. , 2018, , . |    | 0         |
| 112 | An RRAM-based Analog Neuron Design for the Weighted Spiking Neural network. , 2019, , .            |    | 0         |
| 113 | A new cost model for high-level power optimization and its application. , 0, , .                   |    | 0         |
| 114 | Power minimization of functional units by partially guarded computation. , 0, , .                  |    | 0         |
| 115 | Low power self-timed radix-2 division. , 0, , .                                                    |    | 0         |